Gemma 4 12B Brings Encoder-Free Multimodal Agentic AI to Local Devices

infoq / development 8h ago

Google's Gemma 4 12B is built to run agentic, multimodal AI directly on everyday laptops using a novel encoder-free architecture. Instead of separate vision and audio encoders, raw pixel patches and audio wave frames are projected straight into the decoder-only transformer, reducing latency and memory fragmentation. The model supports on-device coding, visual reasoning, and tool use via Google AI Edge, LiteRT-LM, and llama.cpp, and is available through Hugging Face and Ollama. Early users praise its local performance and context handling, though some note it excels at simpler tasks rather than replacing larger coding models.
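The encoder-free idea described above can be illustrated with a toy sketch. The summary gives no layer names, patch sizes, frame lengths, or embedding widths, so every function name and number below is a hypothetical chosen for illustration, and the random matrices merely stand in for learned projection weights. The sketch shows only the general shape of the technique: image patches and audio frames are each mapped by a single linear projection into the same embedding space as the text tokens, then concatenated into one sequence for a decoder-only transformer.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches

def frame_audio(samples, frame_len):
    """Split a 1-D waveform into non-overlapping frames, dropping any remainder."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def make_projection(in_dim, d_model, rng):
    """Random in_dim x d_model matrix standing in for a learned linear layer."""
    return [[rng.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(in_dim)]

def project(vectors, weights):
    """Map each input vector to d_model dimensions with one matrix multiply."""
    d_model = len(weights[0])
    return [[sum(v[i] * weights[i][j] for i in range(len(v)))
             for j in range(d_model)]
            for v in vectors]

def build_sequence(text_embeddings, image, samples,
                   d_model=8, patch=2, frame_len=4, seed=0):
    """Return one token sequence: image tokens, audio tokens, then text tokens.

    No separate vision or audio encoder runs here; each modality gets only
    a linear projection before joining the decoder's input sequence.
    """
    rng = random.Random(seed)
    patches = patchify(image, patch)
    frames = frame_audio(samples, frame_len)
    image_tokens = project(patches, make_projection(patch * patch, d_model, rng))
    audio_tokens = project(frames, make_projection(frame_len, d_model, rng))
    return image_tokens + audio_tokens + text_embeddings
```

With a 4 x 4 image, 2 x 2 patches, ten audio samples in frames of four, and three text embeddings, the sketch produces a sequence of nine tokens (four image, two audio, three text), each of width `d_model`. A production model would also need positional information and modality markers, which this sketch omits.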
