OpenAI has recently unveiled GPT-4 (Generative Pre-trained Transformer 4), a multimodal large language model (LLM) capable of processing both image and text inputs and generating text outputs.

GPT-4 surpasses its predecessor, GPT-3, not only in its ability to generate text that sounds more natural and to solve problems with greater precision, but also in its capability to process visual inputs.
GPT-4’s capability of understanding images is shown below:
User’s prompt: “What is unusual about this image?”
GPT-4: “The unusual thing about this image is that a man is ironing clothes on an ironing board attached to the roof of a moving taxi.”
GPT-4 exhibits impressive performance that’s comparable to human-level on diverse professional and academic benchmarks. It successfully passed a simulated version of the Uniform Bar Examination with a score in the top 10% of test takers. The exam questions consisted of both multiple choice and free-response questions, with images and separate prompts designed for each format.
GPT – main features of each version
There are currently four versions of GPT that have been released by OpenAI. Here is a brief overview of each version:
GPT-1: Released in 2018, GPT-1 was the first version of the GPT series. It consisted of a transformer architecture with 12 transformer layers, 117 million parameters, and was trained on a dataset of 40 GB of text. GPT-1 was primarily designed for language modeling, and it demonstrated significant improvements in language generation compared to earlier models.
GPT-2: Released in 2019, the second version of GPT is a larger and more powerful language model, with 1.5 billion parameters. GPT-2 demonstrated significant improvements over GPT-1 in language generation tasks, such as story writing and machine translation.
GPT-3: It was another significant upgrade in terms of size and performance. Released in 2020, its architecture comprised 175 billion parameters and 96 layers. GPT-3 has shown impressive performance on a variety of language tasks, such as language translation, text completion, and question-answering.
GPT-4: The latest model of GPT is currently the largest and most powerful model in the GPT series. Out of a total of 5,214 prompts that were presented to both GPT-3.5 and GPT-4, the responses generated by GPT-4 were considered to be better or more preferable than those generated by GPT-3.5 on 70.2% of the prompts.
Limitations of GPT-4
While GPT-4 is a significant improvement over previous models, it still has limitations and should not be relied upon entirely. It may produce errors and “hallucinate” facts, so caution should be exercised when using its outputs, especially in high-stakes contexts. GPT-4 has a limited context window and does not learn from experience.
Conclusion, future improvements
With the release of GPT-4, numerous researchers and AI enthusiasts are now able to thoroughly investigate the model’s strengths and weaknesses.
Those interested in implementing GPT-4 for their applications can request access, while those looking to engage in conversation with the program must subscribe to ChatGPT Plus. This premium service, priced at $20 per month, offers users the option to “talk” with a chatbot powered by either GPT-3.5 or GPT-4.
It should be noted that the increased capability of GPT-4 presents new risks, and efforts have been made to understand and improve its safety and alignment. While more work remains to be done, GPT-4 represents a significant step towards broadly useful and safely deployed AI systems.
Learn more:
- Announcement article
- Research paper: “GPT-4 Technical Report” (on arXiv)
- GPT-4 System Card
- GPT-4 Developer Livestream (demo showcasing some of its capabilities/limitations)







