Multimodal

Transfusion, a multi-modal model for text and image generation
Transfusion is a multi-modal AI tool designed to handle both text and images…

TinyChart, a powerful AI that understands charts
TinyChart is an open-source multimodal large language model specifically designed for chart understanding.…

Idefics2 by Hugging Face, a strong multimodal model with 8B parameters
Hugging Face has launched Idefics2, an 8B-parameter multimodal model that rivals the…

AniPortrait generates animations from portraits and audio
AniPortrait is a new framework that creates dynamic and expressive animated portraits from…

Google DeepMind’s SIMA, a generalist AI gaming partner
Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is a cutting-edge AI that…

NVIDIA Canary 1B, a speech recognition and translation model
Canary is a new multilingual speech-to-text recognition and translation model from the NVIDIA…

StreamDiffusion is a new AI model for real-time image generation
StreamDiffusion is a new diffusion pipeline specifically tailored for real-time image generation. It…

TaskWeaver is a smart planning agent for data analytics
Microsoft’s TaskWeaver is a code-first framework for planning and executing complex data analytics…

Google launches Gemini, its most advanced AI model
On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that…

DiagrammerGPT generates better diagrams using LLMs
DiagrammerGPT is a new framework that uses large language models (LLMs) to generate…