RoBuster: A Corpus Annotated with Risk of Bias Text Spans in Randomized Controlled Trials
Type of publication: Article
Publication status: Under review
Journal: JMIR Preprints
Year: 2024
Month: January
DOI: 10.2196/preprints.55127
Abstract:
Background: Risk of bias (RoB) assessment of randomized clinical trials (RCTs) is vital to answering systematic review questions accurately. Manual RoB assessment for hundreds of RCTs is a cognitively demanding and lengthy process. Automation has the potential to assist reviewers in rapidly identifying text descriptions in RCTs that indicate potential risks of bias. However, no corpus annotated with RoB text spans is available for fine-tuning or evaluating large language models (LLMs), and there are no established guidelines for annotating RoB spans in RCTs.
Objective: The revised Cochrane RoB Assessment 2 (RoB 2) tool provides comprehensive guidelines for RoB assessment; however, owing to its inherent subjectivity, the tool cannot be used directly as RoB annotation guidelines. Our objective was to develop precise RoB text span annotation instructions that address this subjectivity and thus aid corpus annotation.
Methods: We leveraged the RoB 2 guidelines to develop visual instructional placards that serve as text annotation guidelines for RoB spans and risk judgments. Expert annotators used these placards to annotate RoBuster, a dataset of 41 full-text RCTs from the domains of physiotherapy and rehabilitation. We report inter-annotator agreement (IAA) between two expert annotators for text span annotations before and after applying the visual instructions on a subset (9 of 41 RCTs) of RoBuster, and we report IAA on bias risk judgments using Cohen's kappa. We also used a portion of RoBuster (10 of 41 RCTs) to evaluate an LLM (GPT-3.5) on the challenging task of RoB span extraction, demonstrating the utility of the corpus with a straightforward evaluation framework.
Results: We present a corpus of 41 RCTs with fine-grained text span annotations comprising more than 28,427 tokens belonging to 22 RoB classes. The IAA at the text span level, calculated using the F1 measure, varies from 0% to 90%, while Cohen's kappa for risk judgments ranges between -0.235 and 1.0. Employing visual instructions for annotation increases the IAA by more than 17 percentage points. The LLM (GPT-3.5) shows promising but varied agreement with the expert annotations across the different bias questions.
Conclusions: Despite comprehensive bias assessment guidelines and visual instructional placards, RoB annotation remains a complex task. Using visual placards for bias assessment and annotation improves IAA relative to annotation without them; however, text annotation remains challenging for subjective questions and for questions whose supporting information is absent from the RCTs. Similarly, while GPT-3.5 demonstrates effectiveness, its accuracy diminishes with more subjective RoB questions and with low information availability.
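The abstract reports span-level IAA via the F1 measure and risk-judgment IAA via Cohen's kappa. The following is a minimal sketch, not the authors' released code, of how these two agreement measures might be computed; the exact-boundary span-matching criterion and the example annotations are assumptions for illustration only.

```python
# Sketch of the two IAA measures mentioned in the abstract.
# Assumption: spans match only on exact (start, end, label); the paper
# does not specify its matching criterion. Example data is hypothetical.
from collections import Counter

def span_f1(spans_a, spans_b):
    """F1 between two annotators' span sets under exact-match criteria."""
    a, b = set(spans_a), set(spans_b)
    tp = len(a & b)                       # spans both annotators marked
    precision = tp / len(a) if a else 0.0
    recall = tp / len(b) if b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for paired categorical judgments (e.g., low/some/high)."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # Hypothetical character-offset annotations: (start, end, RoB class).
    ann1 = [(10, 42, "randomization"), (100, 130, "blinding")]
    ann2 = [(10, 42, "randomization"), (95, 130, "blinding")]
    print(f"span F1: {span_f1(ann1, ann2):.2f}")        # 0.50: one exact match
    r1 = ["low", "high", "some", "low"]
    r2 = ["low", "high", "high", "low"]
    print(f"kappa:   {cohens_kappa(r1, r2):.3f}")        # 0.600
```

A token-level variant, scoring each token's label rather than whole spans, would be more lenient toward boundary disagreements such as the (100, 130) vs. (95, 130) pair above.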
Keywords: corpus, information extraction, large language model, natural language processing, risk of bias, systematic reviews
Authors: Dhrangadhariya, Anjani
Hilfiker, Roger
Sattelmayer, Martin
Naderi, Nona
Giacomino, Katia
Caliesch, Rahel
Higgins, Julian
Marchand-Maillet, Stéphane
Müller, Henning