Evidence and reporting standards in N-of-1 medical studies: a systematic review
Literature search
Inclusion/exclusion criteria
The review followed the 2020 PRISMA recommendations [6] (Supplementary Table 1) and the Cochrane Collaboration guidelines for data extraction and synthesis [7]. Included studies were empirical, peer-reviewed articles published in English in medical journals that examined medical outcomes and used a SCED/N-of-1 design. Only medical conditions listed in the International Classification of Diseases (ICD-10) [8] or the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [9] were included, to retain a meaningful scope and align with widely used clinical practice. Online Supplementary Table 1 gives the PRISMA checklist and how each item was met in the current study.
Search strategy
PubMed and Web of Science were searched. These databases were chosen because their search results are reproducible across locations and over time. The exact search terms were: “n-of-1*” OR “N-of-1 trial” OR “N-of-1 design” OR “single case design” OR “single subject design” OR “single case experimental design” AND “drug” OR “therapy” OR “intervention” OR “treatment” in the title, abstract, or keywords. Publication dates were restricted to January 1, 2013 through May 3, 2022 (the date the search ended) for relevance, sufficiency, and feasibility, as the WWC Standards for SCEDs were published in 2010 and again in 2013.
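The search string above can be assembled programmatically, which makes the Boolean grouping explicit. The following is a minimal sketch, not the authors' actual query file: the term lists are copied from the text, but the parenthesization (all design terms OR-ed together, then AND-ed with the OR-ed intervention terms) is an assumption, and the helper names `or_group` and `build_query` are hypothetical. Database-specific field tags for title, abstract, or keywords are deliberately omitted.

```python
# Term lists copied from the search strategy described in the text.
DESIGN_TERMS = [
    "n-of-1*",
    "N-of-1 trial",
    "N-of-1 design",
    "single case design",
    "single subject design",
    "single case experimental design",
]
INTERVENTION_TERMS = ["drug", "therapy", "intervention", "treatment"]


def or_group(terms):
    """Quote each term, join with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(design_terms, intervention_terms):
    """Combine the two OR-groups with AND (assumed grouping, see lead-in)."""
    return or_group(design_terms) + " AND " + or_group(intervention_terms)


query = build_query(DESIGN_TERMS, INTERVENTION_TERMS)
print(query)
```

Writing the grouping out this way matters because, without parentheses, databases may apply different operator precedence to a mixed AND/OR string.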
Data management
References and abstracts of the articles identified in the initial search were downloaded into the reference management software EndNote, and duplicate entries were removed. The remaining entries were exported to a Google Sheets file for independent review against the inclusion criteria by two co-authors (EM and BB). Reliability of the electronic search results was established by replicating the search and comparing the number of articles identified by each rater (100% agreement).
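The duplicate-removal step can be illustrated with a short sketch. This is not EndNote's matching algorithm, which the text does not describe; it assumes a simple rule (match on DOI when present, otherwise on a normalized title), and the record fields `title` and `doi` are hypothetical.

```python
import re


def normalize_title(title):
    """Lowercase the title and collapse non-alphanumeric runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Keep the first record seen for each DOI, or for each normalized
    title when the DOI is missing (assumed matching rule)."""
    seen = set()
    unique = []
    for record in records:
        doi = (record.get("doi") or "").strip().lower()
        key = ("doi", doi) if doi else ("title", normalize_title(record["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records, for illustration only.
example_records = [
    {"title": "A Trial of X", "doi": "10.1/abc"},
    {"title": "a trial of x.", "doi": "10.1/ABC"},
    {"title": "Another study", "doi": ""},
    {"title": "Another  Study!", "doi": None},
]
print(len(deduplicate(example_records)))
```

A rule this simple will miss a duplicate pair in which only one copy carries a DOI, so in practice the output would still need a manual check.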
Selection process
Two co-authors (EM and BB) independently screened the titles and abstracts of 341 articles to determine eligibility for the current review. This initial screening identified 189 potentially eligible articles, which were subjected to a second screening in which the same two co-authors independently reviewed the full text of each article. Articles that were not empirical work (e.g., protocols, commentaries), were not N-of-1 trials, or did not have a medical outcome variable were excluded, leaving 115 articles that met all inclusion criteria (100% agreement).
Coding process
There were four coders. Two were experts in statistical methodology and three were experts in SCEDs. One co-author (EM), as the primary coder, extracted data from the 115 eligible articles. To obtain inter-rater reliability estimates, 30 (26.09%) of the included articles were randomly assigned to two other co-authors (BB and SC) for additional coding. Before coding the included articles, the researchers calibrated coding reliability by applying the coding tool to studies that did not meet the inclusion criteria. Interobserver agreement during calibration was 94.3%. Disagreements over whether a specific study met an individual indicator were discussed until the researchers reached consensus. Once reliability above 90% was established, the researchers began coding the articles included in the review (coding tool available from the first author). Interobserver agreement for all coded articles was 93.1%. Finally, the first author (PNB) recoded all the articles to ensure 100% agreement between the first author and the coding of the other three co-authors.
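As an illustration of how an interobserver agreement figure of this kind can be obtained, the sketch below computes simple percent agreement between two coders. The text does not state which agreement formula was used, so item-by-item percent agreement is an assumption, and the function name and the example codes are hypothetical. The share of double-coded articles (30 of 115) is taken from the text.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of items on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must rate the same number of items.")
    if not codes_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Share of included articles that were double coded (figures from the text).
double_coded_share = round(100.0 * 30 / 115, 2)
print(double_coded_share)  # 26.09

# Hypothetical indicator codes (1 = met, 0 = not met) from two coders.
primary_coder = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
second_coder = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(percent_agreement(primary_coder, second_coder))  # 90.0
```

Percent agreement does not correct for chance agreement. A chance-corrected index such as Cohen's kappa is a common alternative, though the text gives no indication that one was used here.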
Risk of bias assessment
The risk of bias, as given by the Risk of Bias in Systematic Reviews Tool (ROBIS), is in Table 1. Additionally, online Supplementary Table 2 gives, for each study, the risk of bias in not meeting evidence standards, in reporting treatment effect, and in inappropriate data analysis.
Rating evidence
All studies were N-of-1 studies. According to the Oxford Centre for Evidence-Based Medicine (OCEBM), all these studies are level 3 studies because they manipulate the control arm of a randomized trial (Fig. 1).
Statistical analysis
Descriptive statistics (frequencies and percentages) are reported. Table 2 reports the number of studies meeting the WWC [3, 4] requirements for evidence standards. Table 3 reports the number of studies by how the treatment effect was determined per WWC standards [3, 4] (immediacy, changes in level/trend, consistency, and effect sizes with confidence or credible intervals) and by methodological characteristics (e.g., the type of analysis conducted, whether it was appropriate for the data [i.e., whether the data met the assumptions of the analysis], and whether autocorrelations were included in the models). Additionally, we coded study characteristics such as the number of phases, phase length, type of outcome variable, types of effect sizes, whether data distribution assumptions were met, whether carryover effects were accounted for, intraclass correlation, sensitivity analysis, and subgroup analysis.
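The frequency-and-percentage summaries described above can be produced with a few lines of code. This is a minimal sketch under stated assumptions: the function name `frequency_table` and the example category labels are hypothetical and are not data from the review.

```python
from collections import Counter


def frequency_table(values):
    """Map each category to (count, percentage of all values, one decimal)."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (n, round(100.0 * n / total, 1))
        for category, n in counts.most_common()
    }


# Hypothetical codes for one indicator across five studies.
example_codes = ["met", "met", "not met", "met with reservations", "met"]
for category, (n, pct) in frequency_table(example_codes).items():
    print(f"{category}: {n} ({pct}%)")
```

Applying the same function to each coded variable gives one row group per variable, which matches the layout of a count-and-percentage table.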