Multimodal
-
Alibaba released Qwen2.5 with more than 100 open-source AI models
Alibaba Cloud recently announced the release of over 100 open-source Qwen2.5 multimodal…
-
LLaMA-Omni lets you speak to LLMs and get instant responses
LLaMA-Omni is an open-source AI tool designed for real-time voice interaction with large…
-
Transfusion, a multimodal model for text and image generation
Transfusion is a multimodal AI tool designed to handle both text and images…
-
TinyChart, a powerful AI that understands charts
TinyChart is an open-source multimodal large language model specifically designed for chart understanding.…
-
Idefics2 by Hugging Face, a strong multimodal model with 8B parameters
Hugging Face has launched Idefics2, an 8B-parameter multimodal model that rivals the…
-
AniPortrait generates animations from portraits and audio
AniPortrait is a new framework that creates dynamic and expressive animated portraits from…
-
Google DeepMind’s SIMA, a generalist AI gaming partner
Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is a cutting-edge AI that…
-
NVIDIA Canary 1B, a speech recognition and translation model
Canary is a new multilingual speech-to-text recognition and translation model from the NVIDIA…
-
StreamDiffusion is a new AI model for real-time image generation
StreamDiffusion is a new diffusion pipeline specifically tailored for real-time image generation. It…
-
TaskWeaver is a smart planning agent for data analytics
Microsoft’s TaskWeaver is a code-first framework for planning and executing complex data analytics…