On-Device AI for Sub-₹15K Android Phones: The Real Constraints

Dr Ishit Karoli
May 9, 2026

"On-device AI on Android" usually means a Pixel or a flagship Samsung in a developer’s pocket. The Indian mass-market device profile — Redmi 14C, Realme C-series, Samsung A1x — has 4GB RAM, mid-tier Snapdragon 4-series silicon, and storage that fills up monthly. Models that run beautifully on a Pixel quietly OOM-kill on these. Here is the constraint set you actually engineer to.

The numbers that matter

  • RAM budget for your model: 150–300MB peak. The OS, your app, and other background apps share the rest of the 4GB. Go higher and Android’s low-memory killer is likely to terminate your process mid-inference.
  • Disk budget: 50–100MB for the model file. Users delete apps over storage long before they delete them over RAM.
  • Cold-start budget: <800ms to first inference. Above that, users perceive the app as slow.
  • Battery budget: more than 5 minutes of sustained ML use shouldn’t drain the battery faster than video playback over the same period. Otherwise users blame your app.
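
One way to keep these numbers honest is to encode them as a check that runs against profiling results from each test device. The sketch below is a minimal illustration, not a prescribed tool: the `ModelProfile` structure and `budget_violations` helper are hypothetical names, the thresholds are the upper bounds from the list above, and the measurements are assumed to come from your own profiling runs.

```python
from dataclasses import dataclass

# Upper bounds taken from the budget list above.
MAX_PEAK_RAM_MB = 300
MAX_MODEL_FILE_MB = 100
MAX_COLD_START_MS = 800


@dataclass
class ModelProfile:
    """Measurements for one model on one device (hypothetical structure)."""
    name: str
    peak_ram_mb: float
    file_size_mb: float
    cold_start_ms: float


def budget_violations(p: ModelProfile) -> list[str]:
    """Return a readable list of the budgets this profile exceeds."""
    violations = []
    if p.peak_ram_mb > MAX_PEAK_RAM_MB:
        violations.append(f"peak RAM {p.peak_ram_mb} MB > {MAX_PEAK_RAM_MB} MB")
    if p.file_size_mb > MAX_MODEL_FILE_MB:
        violations.append(f"model file {p.file_size_mb} MB > {MAX_MODEL_FILE_MB} MB")
    # The cold-start budget is strictly below 800 ms, so 800 itself is a violation.
    if p.cold_start_ms >= MAX_COLD_START_MS:
        violations.append(f"cold start {p.cold_start_ms} ms >= {MAX_COLD_START_MS} ms")
    return violations


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real device.
    for profile in (
        ModelProfile("small-classifier", 180, 40, 600),
        ModelProfile("oversized-model", 450, 120, 900),
    ):
        print(profile.name, budget_violations(profile) or "within budget")
```

A check like this can gate a release build: if any device in the test rack returns a non-empty list, the model does not ship as-is.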

What survives these constraints

  • Quantised vision models for OCR, document detection, simple classification. INT8 quantised MobileNet / EfficientNet-Lite class models work well.
  • Small embedding models for on-device semantic search over user-local content. 50–100MB models like the smaller sentence-transformers run fine.
  • Small classifiers and detectors for utility tasks — language detection, content safety, NSFW filter, simple intent classification.
  • Speech-to-text via OS-native APIs — Android offers offline ASR for many languages now. Use it; don’t ship your own.

What does not work and never will on this profile

On-device LLMs for free-form generation. The 1–3B parameter models that run on flagships either don’t fit, run at 1 token/second, or melt the device. Use cloud for generation, on-device for everything else. The hybrid pattern is real; the all-on-device fantasy is not, for this device class.
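
The arithmetic behind "don’t fit" is short. The sketch below computes memory for the weights alone at common quantisation levels and compares it with the upper end of the RAM budget from earlier in this post; it ignores KV cache, activations, and runtime overhead, so real usage is higher. The `route` helper and its task names are hypothetical, included only to show the shape of the hybrid pattern.

```python
RAM_BUDGET_MB = 300  # upper end of the peak-RAM budget quoted earlier in this post


def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in MB (10^6 bytes)."""
    return num_params * bits_per_weight / 8 / 1_000_000


# Hypothetical task names; adjust to your own feature set.
ON_DEVICE_TASKS = {"ocr", "classification", "embedding_search", "language_detection"}


def route(task: str) -> str:
    """Hybrid routing: listed utility tasks stay local, everything else goes to the cloud."""
    return "on_device" if task in ON_DEVICE_TASKS else "cloud"


if __name__ == "__main__":
    for params in (1e9, 3e9):
        for bits in (8, 4):
            mb = weight_memory_mb(params, bits)
            verdict = "fits" if mb <= RAM_BUDGET_MB else "exceeds budget"
            print(f"{params / 1e9:.0f}B params @ {bits}-bit: {mb:.0f} MB ({verdict})")
    print(route("ocr"), route("free_form_generation"))
```

Even a 1B-parameter model at 4-bit needs about 500MB for weights, well over the 300MB ceiling before a single token is generated.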

Tooling that we keep coming back to

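  • TensorFlow Lite (renamed LiteRT in 2024) with the GPU delegate: the most widely supported option across Indian Android chipsets. NNAPI is deprecated from Android 15 onward, so treat the NNAPI delegate as a legacy path for older OS versions.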
  • ONNX Runtime Mobile — better cross-platform if you also ship iOS.
  • MediaPipe — fastest path for hand tracking, face detection, pose estimation if those fit your use case.
  • MLC LLM — only consider for flagship-tier Android, not mass market.

Testing on the right devices

If your test rack is an iPhone 15 Pro and a Pixel 9, you have no idea how your app behaves for 70% of your Indian users. Build a test rack with a Redmi 14C, Samsung A15, Realme C61, and Itel A70. Performance and OOM behaviour on these devices are what count.

Battery behaviour matters as much as latency

Indian users decide which apps to keep based on the battery usage rankings in Settings. If your app appears in the "draining battery" view, the uninstall rate spikes. Measure mAh per task and optimise it.
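
The mAh-per-task figure is simply average current multiplied by task duration. A minimal sketch, assuming you already have an average current draw in milliamps for the task (on Android, `BatteryManager.BATTERY_PROPERTY_CURRENT_NOW` is one possible source, though reporting quality varies by device); the function names and the 5000mAh capacity are illustrative assumptions.

```python
def mah_per_task(avg_current_ma: float, task_seconds: float) -> float:
    """Charge consumed by one task: current (mA) multiplied by time (hours)."""
    return avg_current_ma * task_seconds / 3600.0


def tasks_per_battery_percent(mah_task: float, battery_capacity_mah: float) -> float:
    """How many tasks it takes to drain 1% of the battery."""
    return (battery_capacity_mah / 100.0) / mah_task


if __name__ == "__main__":
    # Illustrative numbers only: 600 mA average draw over a 6-second task.
    cost = mah_per_task(avg_current_ma=600, task_seconds=6)
    print(f"{cost:.2f} mAh per task")
    print(f"{tasks_per_battery_percent(cost, 5000):.0f} tasks per 1% of a 5000 mAh battery")
```

Tracking this number per release on each device in the test rack makes battery regressions visible before users find them in Settings.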

How we approach this at Velura Labs

Our Mobile App Development service ships on-device AI tuned for the Indian device profile, not just flagships. Pair it with Custom LLM Applications when you need hybrid on-device + cloud inference. Read our Flutter vs React Native comparison for the framework decision and our Bharat design patterns post for the broader UX framing. Talk to us if your retention drops sharply on entry-level Android: odds are an ML feature is the culprit.
