Multimodal

Transfusion, a multimodal model for text and image generation

Transfusion is a multimodal AI tool designed to handle both text and images…

September 20, 2024
TinyChart, a powerful AI that understands charts

TinyChart is an open-source multimodal large language model specifically designed for chart understanding.…

May 6, 2024
Idefics2 by Hugging Face, a strong multimodal model with 8B parameters

Hugging Face has launched Idefics2, an 8B-parameter multimodal model that rivals the…

April 19, 2024
AniPortrait generates animations from portraits and audio

AniPortrait is a new framework that creates dynamic and expressive animated portraits from…

April 7, 2024
Google DeepMind’s SIMA, a generalist AI gaming partner

Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is a cutting-edge AI that…

March 18, 2024
NVIDIA Canary 1B, a speech recognition and translation model

Canary is a new multilingual speech recognition and translation model from the NVIDIA…

February 17, 2024
StreamDiffusion is a new diffusion pipeline for real-time image generation

StreamDiffusion is a new diffusion pipeline specifically tailored for real-time image generation. It…

January 13, 2024
TaskWeaver is a smart planning agent for data analytics

Microsoft’s TaskWeaver is a code-first framework for planning and executing complex data analytics…

December 20, 2023
Google launches Gemini, its most advanced AI model

On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that…

December 7, 2023
DiagrammerGPT generates better diagrams using LLMs

DiagrammerGPT is a new framework that uses large language models (LLMs) to generate…

November 2, 2023