Multimodal
-
BAGEL from ByteDance, an open-source multimodal AI
Have you ever wondered how AI can generate detailed captions from images, answer…
-
Universal geometry links multimodal embedding spaces
What if all the embedding spaces used in AI — text, images, speech,…
-
Qwen3 by Alibaba, a new open-source model with hybrid reasoning
Released on April 28, 2025, Qwen3 is an open-source multimodal LLM that extends…
-
Meta’s Llama 4, advanced multimodal models with long context
Meta released Llama 4, a new suite of AI models that offers advanced…
-
InfiniteYou, photo customization with identity preservation
ByteDance introduced InfiniteYou (InfU), a powerful model that allows flexible photo modifications based…
-
Gemma 3 matches 98% of DeepSeek-R1 and runs on a single GPU or TPU
Gemma 3, Google’s latest AI model, offers multimodal capabilities and achieves 98% of…
-
Baidu released two advanced LLMs, ERNIE 4.5 and ERNIE X1
Chinese technology giant Baidu is challenging leading AI models with its most recent…
-
Cosmos simulates physical worlds for training AI systems
NVIDIA has released the Cosmos World Foundation Model Platform, an advanced AI toolkit…
-
Alibaba released Qwen2.5 with more than 100 open-source AI models
Alibaba Cloud recently announced the release of over 100 open-sourced Qwen 2.5 multimodal…
-
LLaMA-Omni lets you speak to LLMs and get instant responses
LLaMA-Omni is an open-source AI tool designed for real-time voice interaction with large…