Gemma 4 QAT checkpoints bring sub-1GB local AI to mobile and laptops
Google DeepMind has released new Quantization-Aware Training (QAT) checkpoints for the Gemma 4 model family, sharply reducing the memory needed for local deployment. The new Q4_0 and mobile-specific formats cut the Gemma 4 E2B footprint to just 1GB while preserving model quality by simulating quantization during training, so the weights adapt to low-precision rounding before they are exported. A custom mobile schema combines static activations, channel-wise quantization, and targeted 2-bit compression to improve performance on edge hardware. The checkpoints are available on Hugging Face and work with popular tools including llama.cpp, Ollama, and vLLM, as well as LiteRT-LM for on-device deployment.
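
To make "training-time quantization simulation" concrete, here is a minimal sketch of the fake-quantization step that QAT methods generally rely on: weights are rounded to a 4-bit grid and immediately converted back to floats in the forward pass, so training sees the rounding error. The block layout (32 weights sharing one scale) is loosely modeled on llama.cpp's Q4_0 format; the function names are hypothetical, and the actual recipe used for the Gemma 4 checkpoints is not described here, so treat this as an illustration rather than the released method.

```python
BLOCK_SIZE = 32  # weights per block; assumption modeled on llama.cpp's Q4_0 layout


def fake_quantize_block(block):
    """Round one block to 16 levels (4 bits), then map back to floats."""
    # Use the largest-magnitude weight to set the scale; it maps to level -8.
    extreme = max(block, key=abs)
    if extreme == 0.0:
        return [0.0] * len(block)
    scale = extreme / -8.0
    out = []
    for w in block:
        # Shift into [0, 16], round to nearest, clamp to the 16 codes 0..15.
        q = min(15, max(0, int(w / scale + 8.5)))
        out.append((q - 8) * scale)  # dequantize: what the forward pass would see
    return out


def fake_quantize(weights):
    """Apply block-wise fake quantization to a flat list of weights."""
    result = []
    for start in range(0, len(weights), BLOCK_SIZE):
        result.extend(fake_quantize_block(weights[start:start + BLOCK_SIZE]))
    return result


def q4_0_bits_per_weight(block_size=BLOCK_SIZE, scale_bits=16):
    """Storage cost per weight: 4-bit codes plus one shared 16-bit scale per block."""
    return (block_size * 4 + scale_bits) / block_size


if __name__ == "__main__":
    # Deterministic toy weights in [-1, 1]; not taken from any real model.
    weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]
    simulated = fake_quantize(weights)
    worst = max(abs(a - b) for a, b in zip(weights, simulated))
    print(f"max rounding error: {worst:.4f}")
    print(f"bits per weight: {q4_0_bits_per_weight()}")
```

In a real QAT loop the rounding would sit inside the training graph, typically with a straight-through estimator so gradients pass through the non-differentiable rounding; that part is omitted here. The last helper shows why a 4-bit block format costs slightly more than 4 bits per weight once the per-block scale is counted.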