Multimodal
-
AniPortrait generates animations from portraits and audio
AniPortrait is a new framework that creates dynamic and expressive animated portraits from…
-
Google DeepMind’s SIMA, a generalist AI gaming partner
Google DeepMind’s new Scalable Instructable Multiworld Agent (SIMA) is a cutting-edge AI that…
-
NVIDIA Canary 1B, a speech recognition and translation model
Canary is a new multilingual speech recognition and translation model from the NVIDIA…
-
StreamDiffusion is a new AI model for real-time image generation
StreamDiffusion is a new diffusion pipeline specifically tailored for real-time image generation. It…
-
TaskWeaver is a smart planning agent for data analytics
Microsoft’s TaskWeaver is a code-first framework for planning and executing complex data analytics…
-
Google launches Gemini, its most advanced AI model
On December 6, 2023, Google launched Gemini, a cutting-edge multimodal AI model that…
-
DiagrammerGPT generates better diagrams using LLMs
DiagrammerGPT is a new framework that uses large language models (LLMs) to generate…
-
Introducing Lemur-70B and Lemur-70B-Chat: the open-source models harmonizing text and code for powerful language agents
Lemur and Lemur-Chat are openly accessible language models optimized for both natural language…
-
Multimodal foundation models: the future of AI assistants
Researchers from Google AI and Hugging Face present a comprehensive survey of multimodal…
-
Meta’s SeamlessM4T can translate and transcribe speech and text across nearly 100 languages
Meta launched SeamlessM4T (Massively Multilingual & Multimodal Machine Translation), a new AI model…