Hundreds of delegates at the European CDISC Interchange listened as Kalynn Kennon and Sam Strudwick, of the Infectious Diseases Data Observatory (IDDO), shared their solutions to the challenges of applying the Study Data Tabulation Model (SDTM) standard in the context of a multi-study, multi-disease data platform.
Delegates learnt how IDDO is utilising a novel implementation of SDTM, a CDISC standard for organising and formatting data, to maximise the benefits of sharing legacy data on the IDDO platform.
With more non-standardised data on infectious diseases pouring into IDDO daily, and an increasing global emphasis on maximising the impact of data sharing, the data scientists had a major challenge on their hands.
Each day they deal with data submitted to the IDDO platform from across the world, covering different diseases, in different formats and usually with unique information. Their challenge was to harmonise these data to a single standard, producing meaningful pooled data that help answer key research questions and accelerate research towards better treatments.
With a combination of ‘thinking outside the box’ and CDISC, a solution was created.
CDISC is a data standards organisation shaping how data are optimally structured to improve quality and reusability. Each year it hosts conferences around the world where visionaries, thought leaders, scientists and global regulatory representatives speak about the latest trends and initiatives. Delegates come from a range of professional backgrounds including data managers, clinicians, medical writers, metadata modellers, programmers, study designers and biostatisticians.
For the last 10 years, scientists at the Worldwide Antimalarial Resistance Network (WWARN) have used a standard system to aggregate and harmonise malaria data to produce new evidence of better treatments that save lives. One WWARN analysis, by the WWARN Dose Impact Study Group, led to policy changes in the WHO ‘Guidelines for the Treatment of Malaria’. Building on the WWARN model, IDDO was launched to further work across neglected and emerging diseases such as Ebola, visceral leishmaniasis, schistosomiasis, soil-transmitted helminthiases, and Chagas disease.
IDDO’s work relies on a centralised data repository, where global data are shared and standardised. The data are re-used in new analyses to answer research questions posed by research collaborations. Their outputs inform guidelines and policy for the treatment of these diseases.
Early versions of the IDDO repository used an internal standard to harmonise data. However, as the number of diseases they worked on grew, the IDDO data scientists found that extending this internal system for every new disease was neither efficient nor scalable.
Data Team Leader Kalynn Kennon said: “What we needed was a single standard that could be applied to all of these diseases to help create an overarching IDDO data repository. Something that would make the data useful and reusable and increase the benefits of this data that had already been collected.
“With SDTM, we could have a disease-agnostic, all-encompassing standard for all diseases in a single repository and data curators could work across diseases and not have to have specialised knowledge of the different disease-specific standards.”
Kalynn said the team had this vision in mind, but there were issues they had to overcome. She said: “The SDTM standard itself is incredibly flexible and can accommodate the different types of data that we receive. However, the implementation of that standard was designed for the regulatory submission of individual clinical trial datasets. It meant our goal of curating highly heterogeneous legacy data into a single repository required a bit of learning and adjusting as we went.”
The ‘simple’ plan
Data Manager Sam Strudwick said the key to the plan was to leverage the strength and range of the SDTM standard to create the structure of the IDDO Data Repository. They would also create an SDTM data dictionary that would be disease- and data-agnostic, and make it accessible and user-friendly.
She said: “Of course it’s always easier said than done - when you are trying to put such disparate data together into a unified structure, the question isn’t just ‘where does this go’, but ‘does it make sense to put it there?’”
“We wanted to be able to retain the stories of the data that have been shared with us, and also ensure the accessibility and usability, not just of the data, but of the standard.”
Their task was complicated because the data are ‘legacy data’: they were not collected in a standard format and can include non-standard information that must be preserved.
Sam said: “We have bent some of the current implementation rules to incorporate our data into the SDTM model. But the model is inherently flexible so it was possible to create a usable and accessible data repository using a standard initially built for the special constraints of regulatory bodies.”
For IDDO’s data team this is a work in progress. As the IDDO portfolio of diseases and the size of the repository continue to grow, the team continues to build the data dictionary and apply the standards to new data.