Title: Blazing fast on-device GenAI with LiteRT-LM - Google Developers Blog
Description: Google AI Edge’s LiteRT-LM provides production-proven, highly optimized infrastructure for running Gemma 4 on mobile and edge platforms. It exposes the model’s native multimodal and agentic features on-device through memory-efficient dynamic loading, Multi-Token Prediction for up to a 2.2x speedup, and orchestration tools such as Thinking Mode and Constrained Decoding. The engine is also expanding beyond Android, with new native Swift APIs for Apple platforms and WebGPU-accelerated JavaScript APIs for high-performance, serverless browser inference.
Domain: googledevelopers.blogspot.com
{
"@context": "https://schema.org",
"@type": "Article",
"headline": "Blazing fast on-device GenAI with LiteRT-LM",
"image": "https://storage.googleapis.com/gweb-developer-goog-blog-assets/images/may2026_liteRT-LM_v2_2x.2e16d0ba.fill-800x400.png",
"datePublished": "2026-05-19",
"author": [
{ "@type": "Person", "name": "Tenghui Zhu", "url": "/search/?author=Tenghui+Zhu" },
{ "@type": "Person", "name": "Yu-hui Chen", "url": "/search/?author=Yu-hui+Chen" },
{ "@type": "Person", "name": "Ram Iyengar", "url": "/search/?author=Ram+Iyengar" }
]
}