GatorTronGPT is a clinical large language model built on the GPT-3 architecture that generates and evaluates healthcare text, such as clinical notes, medical reports, and prescriptions.
It also creates synthetic clinical data that can be used to train other AI models to understand medical language.
It was developed by a research team from the University of Florida and NVIDIA. The model runs on the University of Florida’s HiPerGator AI supercomputer and was presented in a paper published in npj Digital Medicine.
The project page and code for GatorTronGPT are publicly available; see the links at the end of this article.
Overview
GatorTronGPT was created for biomedical natural language processing (NLP) tasks such as generating and evaluating text related to medical research and healthcare. The model was developed in four stages:
a. Train the model from scratch using a GPT-3 architecture with up to 20 billion parameters. The model was trained on a massive corpus of 277 billion words, comprising 82 billion words of de-identified clinical text from University of Florida Health and 195 billion words of diverse English text from the Pile dataset.
b. Use the model to solve two important biomedical NLP tasks: biomedical relation extraction and question answering. The researchers used a unified, P-tuning-based text generation architecture for both tasks (sketched in code after this overview).
c. Use GatorTronGPT to generate 20 billion words of synthetic clinical text, which was then used to train a new BERT-architecture NLP model called GatorTronS ('S' stands for synthetic); a sampling sketch also appears below.
d. Conduct a Turing test to evaluate GatorTronGPT's output. The model generated 30 paragraphs of synthetic clinical text, which were randomly mixed with 30 real-world paragraphs written by University of Florida Health physicians.
The paragraphs were evaluated manually by two internal medicine subspecialists, one in endocrinology and one in cardiology, whose task was to identify which paragraphs were real and which were synthetic based on text quality, coherence, and relevance.
The results showed that the physicians were unable to distinguish between the text generated by GatorTronGPT and the text written by humans.
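The unified P-tuning setup in stage (b) can be pictured as follows: the language model's weights stay frozen, and a small set of trainable "virtual token" embeddings is prepended to the input so that both relation extraction and question answering become text generation problems. Below is a minimal sketch of the idea in PyTorch, using GPT-2 as a stand-in backbone; the virtual-token count and the input/output format are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

NUM_VIRTUAL_TOKENS = 20  # length of the learned soft prompt (assumed value)

class SoftPromptLM(nn.Module):
    """Frozen causal LM with a trainable soft prompt (P-tuning-style)."""
    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # only the soft prompt is trained
        self.soft_prompt = nn.Embedding(NUM_VIRTUAL_TOKENS,
                                        backbone.config.hidden_size)

    def forward(self, input_ids, labels=None):
        batch = input_ids.size(0)
        tok_emb = self.backbone.get_input_embeddings()(input_ids)
        idx = torch.arange(NUM_VIRTUAL_TOKENS, device=input_ids.device)
        prompt = self.soft_prompt(idx).unsqueeze(0).expand(batch, -1, -1)
        embeds = torch.cat([prompt, tok_emb], dim=1)
        if labels is not None:
            # mask the loss over the virtual-token positions
            pad = labels.new_full((batch, NUM_VIRTUAL_TOKENS), -100)
            labels = torch.cat([pad, labels], dim=1)
        return self.backbone(inputs_embeds=embeds, labels=labels)

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = SoftPromptLM(GPT2LMHeadModel.from_pretrained("gpt2"))

# Relation extraction cast as plain text generation (illustrative format):
sample = "aspirin and platelet aggregation . relation : inhibits"
ids = tok(sample, return_tensors="pt").input_ids
loss = model(ids, labels=ids).loss  # optimize this w.r.t. the soft prompt
```

Because only the prompt embeddings receive gradients, a single frozen copy of the LM can serve several tasks, each with its own small learned prompt.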
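Stage (c) then amounts to large-scale sampling from the trained model. A minimal sketch, again with GPT-2 standing in for GatorTronGPT (the prompt and sampling parameters here are assumptions, not the paper's settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Chief complaint: chest pain. History of present illness:"
ids = tok(prompt, return_tensors="pt").input_ids
out = lm.generate(ids, max_new_tokens=200, do_sample=True, top_p=0.9,
                  temperature=0.8, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
# Repeated at scale, sampling like this produced the 20-billion-word
# synthetic corpus used to pretrain GatorTronS.
```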
Training and validation
The GatorTronGPT models were trained on 560 NVIDIA A100 GPUs running NVIDIA’s Megatron-LM package. Training the 5 billion-parameter GatorTronGPT model took approximately 6 days, while the 20 billion-parameter model required around 20 days using the same hardware configuration.
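For a rough sense of scale, those wall-clock figures translate into GPU-hours as follows (simple arithmetic on the numbers above):

```python
gpus = 560  # NVIDIA A100s used for training
for params_b, days in [(5, 6), (20, 20)]:
    print(f"{params_b}B model: ~{gpus * 24 * days:,} GPU-hours")
# 5B model: ~80,640 GPU-hours
# 20B model: ~268,800 GPU-hours
```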
The next figure shows the training loss and validation loss for GatorTronGPT.
Evaluation of GatorTronGPT and GatorTronS
GatorTronGPT was evaluated and compared with other existing biomedical models on several biomedical NLP tasks, including:
- Clinical concept extraction
- Medical relation extraction
- Semantic textual similarity
- Natural language inference
- Question answering
The results showed that GatorTronGPT outperformed state-of-the-art models such as BioLinkBERT, ClinicalBERT, and BioGPT, as well as GPT-2, on most of these tasks. The researchers also compared GatorTronGPT with ChatGPT and showed that GatorTronGPT generated more relevant and accurate text for healthcare queries.
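Extraction-style tasks like these are typically scored with precision, recall, and F1 over the predicted concepts or relations. A minimal, self-contained scorer of that kind (illustrative only; the paper's actual evaluation scripts are not reproduced here):

```python
def f1(gold: set, pred: set) -> float:
    """Set-level F1 between gold and predicted (span, label) pairs."""
    tp = len(gold & pred)  # true positives: exact matches
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

gold = {("aspirin", "DRUG"), ("chest pain", "PROBLEM")}
pred = {("aspirin", "DRUG"), ("pain", "PROBLEM")}
print(f1(gold, pred))  # 0.5: one exact match, one partial-span miss
```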
GatorTronS (an NLP model trained on the 20 billion words of synthetic clinical text generated by GatorTronGPT) was compared with other existing BERT-based models:
- ClinicalBERT (trained on a publicly available dataset of 20 million clinical notes from the MIMIC-III database)
- GatorTron (a clinical NLP model trained on 90 billion words of real-world text).
The results showed that GatorTronS outperformed ClinicalBERT and GatorTron on most of the tasks.
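GatorTron-family encoder checkpoints have been released publicly on Hugging Face. Assuming an identifier like UFNLP/gatortronS (an assumption; check the project page for the exact published name), GatorTronS can be used as a drop-in BERT-style encoder for downstream clinical NLP:

```python
import torch
from transformers import AutoModel, AutoTokenizer

ckpt = "UFNLP/gatortronS"  # assumed Hugging Face identifier, verify first
tok = AutoTokenizer.from_pretrained(ckpt)
enc = AutoModel.from_pretrained(ckpt)

note = "Patient denies chest pain but reports shortness of breath."
with torch.no_grad():
    out = enc(**tok(note, return_tensors="pt"))
cls_vec = out.last_hidden_state[:, 0]  # [CLS] embedding for task heads
print(cls_vec.shape)
```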
Conclusion
GatorTronGPT is an LLM that uses the GPT-3 architecture to generate and evaluate text in the healthcare domain, including clinical notes, medical reports, and prescriptions. Its ability to process and comprehend biomedical language can enhance both patient care and medical research.
However, it is important to note that GatorTronGPT cannot replace human expertise and judgment.
Read more:
- Research paper: “A study of generative large language model for medical research and healthcare” (on npj Digital Medicine)
- Project page