On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that can work with different types of content, such as text, images, video, audio, and code. It comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano.
Gemini is a more powerful successor to LaMDA and PaLM, two previous Google models focused on natural language understanding and generation. It is a competitor to OpenAI’s GPT-4, which was released in March 2023.
Gemini is the result of a collaboration between DeepMind and Google Brain, two research teams that merged into Google DeepMind in April 2023. Together, these teams have produced many breakthroughs in the field of AI, such as:
- AlphaGo – a Go-playing AI model
- AlphaFold – a protein structure prediction model
- AlphaStar – an AI model that plays the video game StarCraft II at a professional level
- AlphaCode – a code-generating AI model
- Bard – an AI chatbot that can hold natural and engaging conversations with humans
Outstanding performance
According to Google, Gemini is its most capable model to date, and Gemini Ultra is the first AI model to surpass human experts on the Massive Multitask Language Understanding (MMLU) benchmark.
It is a “natively multimodal” model: it was pre-trained on multiple modalities from the start and then fine-tuned with additional multimodal data to further refine its effectiveness.
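To make “natively multimodal” concrete, the sketch below shows what a mixed image-and-text prompt could look like from a developer’s perspective, assuming the google-generativeai Python SDK and its gemini-pro-vision model; the API key and image file are placeholders, not details from the announcement.

```python
# Hedged sketch: a mixed image-and-text prompt, assuming the
# google-generativeai Python SDK and the "gemini-pro-vision" model.
# The API key and image path are placeholders.
import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")            # key from Google AI Studio (placeholder)
model = genai.GenerativeModel("gemini-pro-vision")

image = PIL.Image.open("chart.png")                # any local image (placeholder)
response = model.generate_content(
    [image, "Describe what this chart shows in one paragraph."]
)
print(response.text)
```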
Google DeepMind’s CEO Demis Hassabis said that Gemini combines the strong language capabilities of Google’s other LLMs with some of the capabilities of AlphaGo-type systems, such as planning and problem-solving.
Google did not disclose details of Gemini’s architecture, size, or training data, but claims it is the company’s largest and most capable model to date.
Google tested and evaluated Gemini’s performance on a wide range of tasks and benchmarks. These benchmarks measure Gemini’s ability to understand and generate natural language, images, audio, and video, as well as to perform mathematical and logical reasoning. The results show that Gemini Ultra exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks in LLM research and development.
Gemini Ultra surpasses human experts and GPT-4 on MMLU, a challenging benchmark that measures a model’s world knowledge and problem-solving skills across 57 subjects, such as math, physics, history, law, medicine, and ethics. Gemini Ultra scored 90.0% on MMLU, higher than the human expert score of 89.8% and GPT-4’s score of 86.4%. In other words, Gemini Ultra answered more questions correctly than both human experts and GPT-4 across these subjects.
Google also used a new prompting approach on MMLU that lets Gemini apply its reasoning capabilities to think more carefully before answering difficult questions (see the sketch below).
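Google’s technical report refers to this as an “uncertainty-routed chain-of-thought” setup: the model samples several reasoned answers and trusts the majority vote only when the samples agree strongly enough, otherwise falling back to a direct answer. The sketch below is a rough illustration of that idea, not Google’s implementation; the callables, sample count, and threshold are assumptions.

```python
# Rough sketch of an "uncertainty-routed chain-of-thought" answer selector.
# This illustrates the general idea only; the callables, sample count, and
# threshold are assumptions, not Google's actual implementation.
from collections import Counter
from typing import Callable, List


def uncertainty_routed_answer(
    sample_with_reasoning: Callable[[str], str],  # final answer after step-by-step reasoning
    answer_directly: Callable[[str], str],        # direct answer with no explicit reasoning
    question: str,
    k: int = 8,
    threshold: float = 0.7,
) -> str:
    # Draw k chain-of-thought samples and keep only their final answers.
    answers: List[str] = [sample_with_reasoning(question) for _ in range(k)]
    majority_answer, votes = Counter(answers).most_common(1)[0]
    # If the samples agree often enough, trust the majority vote;
    # otherwise fall back to the direct answer.
    return majority_answer if votes / k >= threshold else answer_directly(question)
```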
High flexibility
According to Google, Gemini is its most flexible model yet – it can run efficiently on everything from cloud servers to smartphones.
The first version of Gemini comes in three sizes:
- Gemini Ultra – the largest and most capable model, built for highly complex tasks; it will debut next year
- Gemini Pro – the best model for scaling across a wide range of tasks
- Gemini Nano – the most efficient model for on-device tasks
Safety
Google tries to identify and mitigate potential biases, errors, toxicity, and vulnerabilities in Gemini at each stage of its development.
The team conducted novel research into potential risk areas such as cyber-offense, persuasion, and autonomy, and applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues ahead of Gemini’s deployment.
Google recognizes that Gemini may face some challenges and risks, such as bias, misuse, and adversarial attacks. It may also have some social and ethical implications, such as the impact on human creativity, communication, and employment.
How to access Gemini
You can try Gemini inside Bard, which has been upgraded with a fine-tuned version of Gemini Pro. The upgrade enhances Bard’s capabilities in reasoning, planning, understanding, and more, and is the most significant improvement to Bard since its launch. It is available in English in more than 170 countries and territories, and Google plans to extend it to more modalities, languages, and regions in the future.
The team is also bringing Gemini to the Pixel 8 Pro, the first phone engineered to run Gemini Nano.
Gemini will also be available in more of Google’s products and services like Search, Ads, Chrome and Duet AI in the next few months. For example, Gemini will allow users to search for information using multiple modalities, such as text, images, or voice.
Google is already experimenting with Gemini in Search, making the Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in English in the U.S., along with quality enhancements.
Gemini for developers
Google is making Gemini available to developers through the Gemini API starting on December 13, 2023. Once Gemini is accessible through the API in Google AI Studio or Google Cloud Vertex AI, developers will be able to integrate it into their own applications, as in the sketch below.
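As a minimal sketch of what that integration could look like, the snippet below calls Gemini Pro through the google-generativeai Python SDK; the API key is a placeholder obtained from Google AI Studio, and details may differ from what Google ultimately ships.

```python
# Minimal sketch: calling Gemini Pro via the google-generativeai Python SDK.
# The API key is a placeholder obtained from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content("Summarize what a multimodal model is in two sentences.")
print(response.text)
```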