The Technology Innovation Institute (TII) has launched Falcon 180B, a scaled-up version of Falcon 40B. With 180 billion parameters, it is the largest openly available large language model (LLM), rivaling Google’s PaLM 2 in performance and, on some benchmarks, approaching OpenAI’s GPT-4.
Falcon 180B is the highest-performing open-source LLM on the Hugging Face Open LLM Leaderboard, outperforming other open models on a variety of tasks, including machine translation, text summarization, and question answering.
Falcon 180B inherits the multiquery attention technique from Falcon 40B and adds further improvements, such as:
- FlashAttention (a faster and more efficient way to compute attention)
- Grouped-query attention (a compromise between multiquery and multi-head attention that uses an intermediate number of key-value heads)
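To make the distinction concrete, here is a minimal NumPy sketch of grouped-query attention. This is illustrative only; the function name and shapes are my own, not Falcon’s actual implementation. Setting `n_kv_heads=1` recovers multiquery attention, while setting it equal to the number of query heads recovers standard multi-head attention.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Scaled dot-product attention in which n_heads query heads share
    n_kv_heads key/value heads (n_kv_heads=1 -> multiquery attention;
    n_kv_heads=n_heads -> standard multi-head attention)."""
    n_heads, seq_len, head_dim = q.shape
    group = n_heads // n_kv_heads          # query heads per KV head
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=0)        # (n_heads, seq_len, head_dim)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                     # (n_heads, seq_len, head_dim)

# Toy example: 8 query heads sharing 2 KV heads.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

The practical payoff is a smaller key-value cache at inference time: with 2 KV heads instead of 8, the cache shrinks 4x while the model keeps a full complement of query heads.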
Falcon 180B was trained on Amazon SageMaker on a massive dataset of 3.5 trillion tokens from TII’s RefinedWeb dataset, a single-epoch pretraining run that is the longest for any openly available model.
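As a back-of-the-envelope cross-check, the widely used ~6·N·D FLOPs heuristic for dense transformer training ties the token count to the roughly 7 million GPU-hours reported for training. This is an approximation, not TII’s published methodology, and the effective per-GPU throughput below is an assumed figure for illustration only.

```python
# Rough training-compute estimate via the common ~6 * N * D heuristic
# (N = parameters, D = training tokens). An approximation only.
n_params = 180e9              # 180 billion parameters
n_tokens = 3.5e12             # 3.5 trillion tokens, single epoch
total_flops = 6 * n_params * n_tokens
print(f"total compute: {total_flops:.2e} FLOPs")   # 3.78e+24

# Assuming ~150 TFLOP/s of effective (achieved, not peak) throughput
# per GPU -- an assumed figure, not a measured one.
effective_flops_per_gpu = 150e12
gpu_hours = total_flops / (effective_flops_per_gpu * 3600)
print(f"~{gpu_hours / 1e6:.1f} million GPU-hours")  # ~7.0
```

Under these assumptions the estimate lands near the ~7 million GPU-hours figure, which is why token count, parameter count, and GPU budget tend to be quoted together.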
How good is Falcon 180B?
Falcon 180B surpasses Llama 2 and GPT-3.5 on various natural language understanding tasks, including machine translation, text summarization, and question answering. It also matches the performance of PaLM 2-Large, a private LLM from Google that powers Bard, on several challenging benchmarks.
Falcon 180B is the best pre-trained public large language model (LLM), with an average score of 68.74 on the Hugging Face Open LLM Leaderboard. It beats Google’s PaLM and Meta’s Llama 2, the latter of which scores 67.35 (see the table below).
|Feature|Falcon 180B|Llama 2|
|---|---|---|
|Number of parameters|180 billion|70 billion|
|Dataset size|3.5 trillion tokens|2 trillion tokens|
|Pre-training time|~7 million GPU hours|~1.7 million GPU hours|
|Number of GPUs used|Up to 4096|Up to 1024|
|License|Falcon 180B TII License (Apache 2.0-based)|Llama 2 Community License|
As the table shows, Falcon 180B used roughly four times the compute (up to 4096 GPUs) and substantially more training data than Llama 2. The result is a larger, more capable model that outperforms Llama 2.
Falcon 180B is a powerful open-access LLM that can generate high-quality text and answer complex questions. It is free of licensing fees for most uses, making it a more affordable option than proprietary, pay-per-use models such as GPT-4.
The model is still under active development, and the community can improve it further by fine-tuning it on domain-specific datasets.
Release announcement: “Spread Your Wings: Falcon 180B is here” (on Hugging Face)