Therapeutic efficacy in COVID-19 depends on disease severity (treatment effect heterogeneity). Unfortunately, definitions of severity vary widely. This compromises the meta-analysis of randomised controlled trials (RCTs) and the therapeutic guidelines derived from them. The World Health Organization (WHO) ‘living’ guidelines for the treatment of COVID-19 are based on a network meta-analysis (NMA) of published RCTs. We reviewed the 81 studies included in the WHO COVID-19 living NMA and compared their severity classifications with those employed by the international COVID-NMA initiative. The two were concordant in only 35% (24/68) of trials. Of the RCTs evaluated, 69% (55/77) were considered by the WHO group to include patients with a range of severities (12 mild-moderate; 3 mild-severe; 18 mild-critical; 5 moderate-severe; 8 moderate-critical; 10 severe-critical), but the distribution of disease severities within these groups usually could not be determined, and data on the duration of illness and/or oxygen saturation values were often missing. Where severity classifications were clear, there was substantial overlap in mortality across trials in different severity strata. This imprecision in severity assessment compromises the validity of some therapeutic recommendations; notably, extrapolating the “lack of therapeutic benefit” shown in hospitalised, severely ill patients on respiratory support to ambulant, mildly ill patients is not warranted. Both harmonised, unambiguous definitions of severity and individual patient data (IPD) meta-analyses are needed to guide and improve therapeutic recommendations in COVID-19. Achieving this goal will require improved coordination between the main stakeholders developing treatment guidelines and medicine regulatory agencies. Open science, including prompt data sharing, should become the standard to allow IPD meta-analyses.