Lay press articles - methodology
Here we summarise the methods we have used for the lay press articles. The Google News search tool is used to capture data from online news sources. Articles matching the search terms are loaded into a database every hour. Product and location labels are applied automatically, by means of a pattern matching algorithm for product names, and an open source library called CLIFF for locations. Trained analysts then correct or refine automated labels and add further categorisation information to content in the database through a web-based interface. The labelled data then serve as training data for a machine learning algorithm that applies tentative labels to new data as it enters the system.
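The automatic product labelling step can be pictured with a minimal sketch, assuming a simple regular-expression matcher. The term list and the function name `tentative_product_labels` are hypothetical illustrations, not the vocabulary or code used by the system, and location tagging with CLIFF is not shown.

```python
import re

# Hypothetical product vocabulary; the real list used by the system is not given here.
PRODUCT_TERMS = ["amoxicillin", "artemether", "paracetamol", "insulin"]

# One case-insensitive pattern with word boundaries, so that "insulin"
# does not match inside a longer word such as "insulinoma".
PRODUCT_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in PRODUCT_TERMS) + r")\b",
    re.IGNORECASE,
)


def tentative_product_labels(text):
    """Return the sorted, de-duplicated product names found in the text."""
    return sorted({match.group(1).lower() for match in PRODUCT_PATTERN.finditer(text)})


if __name__ == "__main__":
    headline = "Fake Amoxicillin and paracetamol seized; amoxicillin tablets recalled"
    print(tentative_product_labels(headline))
```

Labels produced this way are only tentative: as described above, analysts correct or refine them before they are used as training data.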
The MQM Globe web application is based on the Cesium ion platform and the Angular web framework. These were chosen for the availability of community support and their expected longevity. The technology supports mobile phones and tablets, although the user interface is not currently optimised for them.
The MQM Globe search tool is based on the Apache Lucene open-source search software, running on the server. The indexes are updated incrementally twice a day and fully rebuilt once a week, so that searches cover recently added information. We have added synonym content to help the search find articles when users do not use strict medical terminology; for example, a search for tranquillizer will find articles about anxiolytic medicines. In addition, a search for Africa will find articles about individual countries in the African region.
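The effect of the synonym and region content can be illustrated with a minimal sketch, assuming the expansion happens when the query is processed. This is not the Lucene API and not the system's configuration: the tables `SYNONYMS` and `REGIONS`, their entries, and the functions `expand_query` and `matches` are hypothetical.

```python
# Hypothetical lookup tables; the real synonym and region content is not listed here.
SYNONYMS = {"tranquillizer": ["anxiolytic"]}
REGIONS = {"africa": ["ghana", "kenya", "nigeria"]}


def expand_query(query):
    """Return the query words plus any synonyms and member countries, in order."""
    terms = []
    for word in query.lower().split():
        for term in [word, *SYNONYMS.get(word, []), *REGIONS.get(word, [])]:
            if term not in terms:
                terms.append(term)
    return terms


def matches(article_text, query):
    """True if the article contains any term of the expanded query."""
    words = set(article_text.lower().split())
    return any(term in words for term in expand_query(query))


if __name__ == "__main__":
    print(expand_query("tranquillizer"))
    print(matches("Falsified anxiolytic medicines seized", "tranquillizer"))
    print(matches("Substandard antimalarials found in Ghana", "Africa"))
```

A real search index would also handle tokenisation, punctuation and multi-word names; the sketch only shows why a lay term or a region name can retrieve articles that never use that exact word.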
There are important limitations on the use of these data, and we have added warning statements to alert users to these issues. See the data categories for the data that analysts extract from the articles:
Lay press articles loaded into the database:
Type of products included
- Articles related to poor-quality medicines are retrieved mainly from Google News, in several languages, using our bespoke key terms.
- Any article regarding the quality of medicines/medical products, including, but not limited to, vaccines, medical devices, traditional and herbal medicines, vitamins, weight loss medicines, nutritional supplements, veterinary medicines, blood products and psychoactive substances (i.e. drugs used for their psychoactive properties, such as cocaine and cannabis, where the quality of these products is in question).
Type of articles included
- ‘Incidents’: Any article describing recalls, seizures, diversions, thefts, degradation, adulteration or contamination of medicines, or cases of patients suffering adverse effects or lack of efficacy after taking a medicine suspected to be substandard or falsified.
‘Incidents’ will appear in the list generated in the left side panel when the user performs a query in the search box, and as pins on the Globe.
- ‘General discussions’: Articles of general discussion, the development or marketing of new systems or technologies to identify or prevent poor-quality medicines, articles on new laws or regulations, legal proceedings against manufacturers or individuals involved in smuggling and/or manufacturing, or cases of medicines diverted for resale.
‘General discussions’ will appear in the list generated in the left side panel when the user performs a query in the search box, but will not appear as ‘pins’ on the Globe.
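The display rule for the two article types can be summarised in a minimal sketch. The `Article` class, the category strings and both function names are hypothetical and are used only to restate the rule; they are not taken from the application's code.

```python
from dataclasses import dataclass

# Hypothetical category values standing in for the two article types.
INCIDENT = "incident"
GENERAL_DISCUSSION = "general discussion"


@dataclass
class Article:
    title: str
    category: str


def side_panel_results(matching_articles):
    """Both article types are listed in the left side panel."""
    return list(matching_articles)


def globe_pins(matching_articles):
    """Only incidents are shown as pins on the Globe."""
    return [a for a in matching_articles if a.category == INCIDENT]


if __name__ == "__main__":
    results = [
        Article("Falsified antimalarials seized", INCIDENT),
        Article("New track-and-trace law proposed", GENERAL_DISCUSSION),
    ]
    print(len(side_panel_results(results)), len(globe_pins(results)))
```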
Type of sources included
- The majority of the articles are retrieved from Google News.
- In addition, some websites and articles from NGOs and from national and international organisations, which are not necessarily linked to a governance body, are added when team members find them or colleagues share them.
- Currently the ‘lay press’ section also displays some alerts from relevant US FDA RSS news feeds. Once the ‘regulatory alert’ section has the functionality to display separate alerts, the US FDA alerts will be transferred there.
- The articles are first automatically extracted and tagged as relevant/not relevant by the system. Analysts (with a public health and/or medical/pharmaceutical background, and fluent in the language of the report) then manually curate the data every working day. Curation began on 1 July 2018 for English, 1 February 2019 for French, 11 June 2019 for Mandarin, 14 August 2019 for Spanish and 25 September 2019 for Vietnamese.
- Searches in the MQM Globe on specific topics/locations/dates/report types can be performed using the search functionality, and reports can be exported as PDF. Please see the User guide.