BioGPT: generative pre-trained Transformer for biomedical text generation and mining

The Microsoft Research team presents BioGPT, a variant of the GPT-2 model adapted to the biomedical domain.

BioGPT follows the Transformer language-model architecture of GPT-2 and was pre-trained from scratch on a corpus of 15M PubMed abstracts. It outperforms previous models, such as BioBERT and PubMedBERT, on a range of biomedical NLP tasks, including text generation, relation extraction, and question answering.

On the PubMedQA question-answering task, BioGPT achieved a new record of 78.2% accuracy, and the larger BioGPT-Large model reached 81.0%.

Figure: Framework of BioGPT when adapting to downstream tasks.

BERT-like vs GPT-like models

BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are two popular families of pre-trained language models. BERT, developed by Google, is a bidirectional Transformer encoder trained with a masked-language-modeling objective; GPT, with its successors GPT-2 and GPT-3, was developed by OpenAI and is a left-to-right Transformer decoder trained to predict the next token.

In the biomedical domain, BERT-based models such as BioBERT and PubMedBERT have shown great success and have been widely adopted. Pre-trained on biomedical texts, they perform well on discriminative tasks such as sequence classification and sequence labeling.
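To make the contrast concrete: a BERT-style model is queried by masking a token and letting the encoder fill it in from context on both sides. Below is a minimal sketch using the Hugging Face fill-mask pipeline; the PubMedBERT checkpoint id is an assumption and should be replaced with whichever biomedical BERT checkpoint you actually use.

```python
from transformers import pipeline

# NOTE: the checkpoint id below is an assumption; substitute the
# PubMedBERT/BioBERT release you actually use.
fill_mask = pipeline(
    "fill-mask",
    model="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
)

# BERT predicts the masked token from both left and right context.
for pred in fill_mask("The patient was treated with [MASK] for hypertension."):
    print(f"{pred['token_str']}\t{pred['score']:.3f}")
```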

The main drawback of BERT-like models is that, being encoder-only models trained to fill in masked tokens rather than to predict the next token, they do not perform well on generation tasks.

In comparison, GPT-like models are well suited to generation tasks such as abstract generation or knowledge-triplet generation. BioGPT shows remarkable ability in biomedical text generation and can produce comprehensive, fluent descriptions of biomedical terms.
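BioGPT was subsequently released through the Hugging Face transformers library. The sketch below assumes the microsoft/biogpt checkpoint and the BioGpt model classes shipped with recent transformers versions; the prompt (a biomedical term from the paper's examples) and the decoding settings are illustrative, not the authors' exact setup.

```python
import torch
from transformers import BioGptForCausalLM, BioGptTokenizer, set_seed

# Assumes the microsoft/biogpt checkpoint released on the Hugging Face Hub.
tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

set_seed(42)
inputs = tokenizer("Janus kinase 3 (JAK-3) is", return_tensors="pt")

# Beam search tends to yield fluent, term-description-style continuations.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_length=100,
        num_beams=5,
        early_stopping=True,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```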

Conclusion and future research

The main contributions of BioGPT can be summarized as follows:

  • it is a generative pre-trained Transformer language model for the biomedical domain
  • it can be used for biomedical literature text generation and mining
  • it achieved top results on four benchmarks: relation extraction on BC5CDR, KD-DTI, and DDI, and question answering on PubMedQA
  • it has been shown to generate biomedical text better than a standard GPT model trained on general-domain text

The team suggests that future work should train BioGPT on larger-scale biomedical data and test it on more downstream applications.
