| Extraction Task | Ground Truth Labels | Experiment | GPT-3.5 Avg Precision | GPT-3.5 Avg Recall | GPT-3.5 Weighted Avg F1 | GPT-4t Avg Precision | GPT-4t Avg Recall | GPT-4t Weighted Avg F1 | GPT-4o Avg Precision | GPT-4o Avg Recall | GPT-4o Weighted Avg F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Verbal Ability | VSS | MCP-1.5Y | .69 | .63 | .51 | .72 | .71 | .66 | .74 | .73 | .68 |
| Verbal Ability | CFCS | MCP-1.5Y | .69 | .63 | .52 | .68 | .67 | .63 | .72 | .71 | .66 |
| Ambulatory Ability | GMFCS | MCP-1.5Y | .73 | .75 | .69 | .86 | .90 | .88 | .88 | .91 | .90 |
- The precision, recall, and weighted-average F1 scores are reported across three GPT versions (GPT-3.5, GPT-4t, and GPT-4o) for the verbal and ambulatory ability extraction tasks. All extractions were performed using multi-class prediction on notes written within 1.5 years of patient enrollment in the registry. For the verbal ability extraction task, results were evaluated using ground truth labels from both the VSS and CFCS assessments. The top-scoring GPT model for each ground truth label set is bolded, and the top-scoring label and GPT model combination for each extraction task is underlined.
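The paper does not specify how these metrics were computed, but the weighted-average scores reported above can be reproduced with a standard multi-class evaluation routine. The sketch below is a minimal illustration using scikit-learn, assuming weighted averaging for all three metrics and hypothetical label lists (the actual GMFCS/CFCS/VSS levels and predictions are not shown in the source).

```python
# Illustrative sketch only: the paper does not state its evaluation code or
# averaging choices; weighted averaging and the example labels are assumptions.
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical ground truth (e.g., GMFCS levels I-V) and GPT-extracted predictions
y_true = ["I", "II", "III", "V", "II", "IV"]
y_pred = ["I", "II", "II", "V", "II", "III"]

# Weighted averaging weights each class by its support, accounting for
# imbalance across ability levels in a multi-class prediction setting.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"Avg Precision: {precision:.2f}  Avg Recall: {recall:.2f}  Weighted Avg F1: {f1:.2f}")
```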