llamafile turns any LLM into a single binary that is easy to run and distribute

llamafile is an open source project that lets you package an LLM and its dependencies into a single executable file that runs on six operating systems (macOS, Windows, Linux, FreeBSD, OpenBSD, and NetBSD) with no installation required.

llamafile was released by Mozilla’s innovation group and Justine Tunney via the Mozilla Internet Ecosystem program (MIECO), and it aims to make open source LLMs easier to use for both developers and end users. The release was announced in this blog post.

LLMs are powerful AI systems that can generate natural language for many purposes, such as chatbots, text summarization, and code generation. However, they are also complex and resource-intensive: running one properly requires numerous dependencies, including libraries, frameworks, and model weights, along with capable hardware. This makes it hard for AI developers to distribute and run their LLMs across platforms and devices, and for end users to access and use them easily. llamafile simplifies the process by letting you run and share an LLM as a single binary file (called a “llamafile”).

Overview

llamafile combines two open source projects, llama.cpp and Cosmopolitan Libc, to achieve cross-platform, multi-OS compatibility:

  • llama.cpp is a popular C/C++ inference engine for LLMs that supports a wide range of models and file formats, along with features such as quantization for compressing model weights.
  • Cosmopolitan Libc is a project that makes C a build-once run-anywhere language. It allows C programs to be compiled and run on many platforms and architectures, without requiring any external dependencies or installation.
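
To make the build-once run-anywhere idea concrete, here is a minimal sketch using cosmocc, the C compiler toolchain the Cosmopolitan project distributes (flags and output behavior are best checked against the project’s current documentation):

    # Compile an ordinary C program once with the Cosmopolitan toolchain.
    # The result is an Actually Portable Executable (APE): one binary that
    # runs unmodified on Linux, macOS, Windows, and the BSDs.
    cosmocc -o hello hello.c
    ./hello

llamafile applies the same trick to llama.cpp, so a single inference binary works everywhere.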

How to use llamafile

There are two kinds of llamafiles: server llamafiles, which start a local web server so you can chat through a browser UI, and command-line llamafiles, which run in the terminal and accept command-line arguments.

  • The easiest way to get started is to download and run the example llamafile for LLaVA 1.5, then use the browser chat UI; follow the steps in the quickstart (sketched in the first example after this list). Other provided binaries are Mistral-7B-Instruct and WizardCoder-Python-13B.
  • You can create your own llamafiles from any compatible model weights; see the instructions here and the second example after this list.
  • llamafile can also be used with external weights, i.e. separate weight files that are not embedded in the llamafile itself. This is especially useful on Windows, where it lets you work around the 4 GB executable file size limit. Download the llamafile software without any weights included from the releases page, then point it at any external weights (third example after this list).
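
As a rough sketch of the quickstart flow on macOS, Linux, or BSD (the download URL and file name below are illustrative; the quickstart has the current ones):

    # Download the example LLaVA 1.5 llamafile (a multi-gigabyte file).
    curl -LO https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/llava-v1.5-7b-q4.llamafile

    # Mark it executable, then run it.
    chmod +x llava-v1.5-7b-q4.llamafile
    ./llava-v1.5-7b-q4.llamafile

    # A local web server starts; open the chat UI in your browser
    # (typically at http://127.0.0.1:8080).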
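
Creating your own llamafile amounts to embedding GGUF weights (plus default arguments) into the llamafile binary. A hedged sketch, assuming the zipalign tool shipped with llamafile releases and a hypothetical mymodel.gguf; the linked instructions have the authoritative steps:

    # .args holds the default command-line arguments baked into the binary.
    cat > .args <<'EOF'
    -m
    mymodel.gguf
    EOF

    # Start from the bare llamafile binary, then embed weights and args.
    cp llamafile mymodel.llamafile
    zipalign -j0 mymodel.llamafile mymodel.gguf .args

    # The result is a self-contained, shareable executable.
    ./mymodel.llamafile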
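
To use external weights instead, run the bare llamafile binary with the -m flag pointing at a GGUF weights file (the model file name here is a placeholder):

    # On Windows, rename the downloaded binary to llamafile.exe first.
    # -m loads external GGUF weights, exactly as if they were embedded.
    ./llamafile -m mistral-7b-instruct-v0.2.Q4_0.gguf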

The llamafile software is released under the Apache 2.0 license, and the team welcomes contributions from the community.
