Here we summarise the methods we have used. The Google News search tool is used to capture data from online news sources. Articles matching the search terms are loaded into a database every hour. Product and location labels are applied automatically: a pattern-matching algorithm identifies product names, and an open-source library called CLIFF identifies locations. Trained analysts then correct or refine the automated labels and add further categorisation information to the content in the database through a web-based interface. The labelled data then serve as training data for a machine learning algorithm that applies tentative labels to new data as they enter the system.
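The automatic product-labelling step described above can be illustrated with a minimal sketch. It assumes a small, hypothetical table of regular expressions (`PRODUCT_PATTERNS`); the actual patterns and product vocabulary used by the system are not given here, and location labelling with CLIFF is not shown.

```python
import re

# Hypothetical product patterns for illustration only; the real
# vocabulary used by the system is not described in this section.
PRODUCT_PATTERNS = {
    "amoxicillin": re.compile(r"\bamox[iy]cillin\b", re.IGNORECASE),
    "artemether": re.compile(r"\bartemether\b", re.IGNORECASE),
    "vaccine": re.compile(r"\bvaccines?\b", re.IGNORECASE),
}


def label_products(text):
    """Return the sorted product labels whose pattern occurs in the text."""
    return sorted(
        name for name, pattern in PRODUCT_PATTERNS.items() if pattern.search(text)
    )


article = "Police seized falsified Amoxicillin capsules and expired vaccines."
labels = label_products(article)
print(labels)
```

Labels produced this way are only tentative; as described above, analysts correct or refine them afterwards.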
The MQM Globe web application is based on the Cesium ion platform and the Angular web framework. These were chosen for the availability of community support and their expected longevity. The technology supports mobile phones and tablets, although the user interface is not currently optimised for them.
The MQM Globe search tool is based on the Apache Lucene open-source search software, running on the server. The indexes for the data are updated incrementally twice a day and fully rebuilt once a week, so that the most recently indexed information is searched and displayed. We have added synonym content to help the search find articles when users do not use strict medical terminology, so that, for example, a search on tranquillizer will find articles about anxiolytic medicines. Additionally, a search for Africa will find articles about individual countries located in the African region.
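The synonym and region behaviour described above can be sketched as a query-expansion step. This is a plain Python stand-in, not Lucene code: the `SYNONYMS` and `REGIONS` tables, their contents, and the substring matching are all assumptions made for illustration.

```python
# Hypothetical lookup tables; the real synonym and region content
# configured for the search tool is not listed in this section.
SYNONYMS = {"tranquillizer": {"anxiolytic"}}
REGIONS = {"africa": {"ghana", "kenya", "nigeria"}}


def expand_query(term):
    """Return the search term together with its synonyms and member countries."""
    key = term.strip().lower()
    expanded = {key}
    expanded |= SYNONYMS.get(key, set())
    expanded |= REGIONS.get(key, set())
    return expanded


def search(term, articles):
    """Return the articles that contain any of the expanded terms."""
    terms = expand_query(term)
    return [a for a in articles if any(t in a.lower() for t in terms)]


articles = [
    "Falsified anxiolytic tablets seized at the border",
    "Substandard antimalarials found in Kenya",
    "Vaccine recall announced in Peru",
]
print(search("tranquillizer", articles))
print(search("Africa", articles))
```

A real search index would apply such expansion through its analysis or query-parsing configuration rather than by substring matching; the sketch only shows the intended effect on results.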
There are important limitations on the use of these data, and we have added warning statements to alert users to these issues. See the data categories for the different data that the analysts extract from the articles:
- Articles related to poor quality medicines, retrieved from Google News in different languages using our bespoke key terms as well as the names of the different medicines and medicine classes from the ATC classification.
- Any articles regarding the quality of medicines/medical products, including, but not limited to, vaccines, medical devices, traditional and herbal medicines, vitamins, weight loss medicines, nutritional supplements, veterinary medicines, blood products and psychoactive substances (i.e. drugs used for their psychoactive properties, such as cocaine and cannabis, if the quality of these products is suspected).
- Any articles describing recalls, seizures, diversions, thefts, degradation, adulteration or contamination of medicines, or cases of patients suffering adverse effects or lack of efficacy after taking a medicine suspected to be substandard or falsified.
- Articles covering general discussion, the development/marketing of new systems/technologies to identify/prevent poor quality medicines, new laws/regulations, legal proceedings against manufacturers/individuals involved in smuggling and/or manufacturing, or cases of medicines diverted for resale appear in the list generated when the user performs a query in the search box, but do not appear on the Globe.
- The articles are first automatically extracted and tagged as relevant/not relevant by the system. Analysts (with a public health and/or medical/pharmaceutical background, and fluent in the language of the report) then manually curate the data every working day. Curation started on 1 July 2018 for English, 1 February 2019 for French, 11 June 2019 for Mandarin and 14 August 2019 for Spanish.
- Searches on specific topics/locations/dates can be performed using the search functionality, and the results can be exported as a report.
- See the full list of search terms here
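
Two of the rules in the list above, the per-language curation start dates and the search-only categories, can be written down as a short sketch. The dates are those stated above; the category label `general_discussion` and the function names are hypothetical and only illustrate the rule, not the system's actual data model.

```python
from datetime import date

# Curation start dates per language, as stated in the list above.
CURATION_START = {
    "English": date(2018, 7, 1),
    "French": date(2019, 2, 1),
    "Mandarin": date(2019, 6, 11),
    "Spanish": date(2019, 8, 14),
}

# Hypothetical label for articles that are listed in search results
# but are not displayed on the Globe.
SEARCH_ONLY_CATEGORIES = {"general_discussion"}


def curation_active(language, day):
    """Return True if manual curation had started for the language by this day.

    Working days versus weekends are not modelled here.
    """
    start = CURATION_START.get(language)
    return start is not None and day >= start


def shown_on_globe(category):
    """Return True unless the category is restricted to the search results list."""
    return category not in SEARCH_ONLY_CATEGORIES


print(curation_active("French", date(2018, 12, 31)))
print(shown_on_globe("general_discussion"))
```

A check of this kind may help when interpreting counts over time, since articles in a given language that predate its curation start date were not manually curated.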