Gemma 4 QAT checkpoints bring sub-1GB local AI to mobile and laptops


Google DeepMind has released new Quantization-Aware Training (QAT) checkpoints for the Gemma 4 model family, sharply reducing the memory needed for local deployment. The new Q4_0 and mobile-specific formats cut the Gemma 4 E2B footprint to roughly 1GB, and because quantization is simulated during training rather than applied afterwards, model quality is largely preserved. A custom mobile schema combines static activations, channel-wise quantization, and targeted 2-bit compression to improve performance on edge hardware. The checkpoints are available on Hugging Face and work with popular tools including llama.cpp, Ollama, vLLM, and LiteRT-LM for on-device deployment.
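To make "training-time quantization simulation" and "channel-wise quantization" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Gemma's actual recipe: the function names, the symmetric signed 4-bit grid, and the one-scale-per-row layout are all chosen for the example, and it does not reproduce the Q4_0 file format or the mobile schema. The idea is that each weight is rounded to the low-precision grid and immediately mapped back to a float, so the forward pass already sees the rounding error the deployed model will have.

```python
import random


def fake_quantize_channel(weights, bits=4):
    """Quantize-dequantize one channel using a single symmetric scale.

    Hypothetical helper for illustration; returns (dequantized_weights, scale).
    """
    qmax = 2 ** (bits - 1) - 1       # 7 for signed 4-bit
    qmin = -(2 ** (bits - 1))        # -8 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        # An all-zero channel needs no rounding; any scale works.
        return list(weights), 1.0
    scale = max_abs / qmax           # one scale per channel ("channel-wise")
    dequantized = []
    for w in weights:
        q = max(qmin, min(qmax, round(w / scale)))  # snap to the integer grid
        dequantized.append(q * scale)               # map back to float
    return dequantized, scale


def fake_quantize_matrix(matrix, bits=4):
    """Apply the channel-wise simulation to every row (assumed: row = channel)."""
    return [fake_quantize_channel(row, bits)[0] for row in matrix]


if __name__ == "__main__":
    random.seed(0)
    # Toy weight matrix whose rows have different magnitudes, which is the
    # case where per-channel scales help compared with one global scale.
    matrix = [[random.gauss(0.0, 0.02 * (i + 1)) for _ in range(8)]
              for i in range(4)]
    simulated = fake_quantize_matrix(matrix)
    worst = max(abs(a - b)
                for row_a, row_b in zip(matrix, simulated)
                for a, b in zip(row_a, row_b))
    print(f"largest rounding error introduced: {worst:.6f}")
```

In an actual QAT setup this quantize-dequantize step sits inside the training loop, and gradients are usually passed through the rounding with a straight-through estimator so the weights can adapt to the grid. The sketch covers only the forward simulation.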
