Computer Vision in Indian Retail: What Works on a Crowded Shop Floor

Shelf-monitoring CV models trained on European modern-trade footage fail spectacularly in Indian retail. SKUs are stacked three-deep, lighting is mixed fluorescent and daylight, and the same brand has six pack variants on the same shelf. The CV stack you ship for India looks meaningfully different from the global one. Here is what we’ve learned shipping it.

The conditions that break off-the-shelf models

Stacking and occlusion. SKUs partially hidden by other SKUs. Detection models trained on cleanly faced shelves miss 30–50% of stock.
Pack variant proliferation. One brand, six sizes, three flavours, two packaging refreshes a year. A closed-set image classifier needs retraining at every refresh.
Mixed lighting. Tube light + door daylight + phone flash. White balance drifts across the same store visit.
Phone-captured input. Reps walk the aisle and snap photos. Angles, distances, and motion blur vary wildly.
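
White-balance drift is the easiest of these conditions to illustrate. One standard, cheap mitigation is gray-world balancing: assume the average colour of a shelf photo should be neutral, then rescale each channel to match. The sketch below is an illustration under our own assumptions (pixels as (R, G, B) tuples in the 0 to 255 range; `gray_world_balance` is a hypothetical helper name), not a description of a production pipeline, which would operate on image arrays.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]


def gray_world_balance(pixels: List[Pixel]) -> List[Pixel]:
    """Scale each channel so its mean matches the overall gray mean.

    Gray-world assumption: on average a scene is neutral gray, so a
    colour cast shows up as unequal channel means. Hypothetical helper
    for illustration only.
    """
    n = len(pixels)
    if n == 0:
        return []
    # Per-channel means across the whole image.
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # One gain per channel; leave an all-zero channel untouched.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    # Apply the gains and clip to the valid 0-255 range.
    return [
        tuple(min(255.0, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]
```

A warm tube-light cast such as `[(200, 150, 100), (220, 170, 120)]` comes back with all three channel means at 160. Gray-world breaks down when one colour genuinely dominates the frame (a wall of yellow packs, for instance), so treat it as a preprocessing baseline rather than a fix.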

The stack that performs

SKU-level detection with India-specific fine-tuning. Start with a strong general detector (YOLO11, RT-DETR) and fine-tune it on locally captured shop-floor data: at least 5,000 frames across store formats. Generic models score 60–70% on Indian shelves; tuned models reach 85–92%.
Embedding-based recognition for SKU identity. Detection finds the box; embedding retrieval matches it to the SKU catalogue. The embedding gallery is updated whenever a pack is refreshed, which is far cheaper than retraining the classifier.
Pack-variant matrix. Don’t collapse "Lay’s 50g classic" and "Lay’s 50g cream & onion" into "Lay’s 50g." Maintain pack-level granularity in both the catalogue and the eval.
Field-app feedback loop. Reps confirm or correct detections on-device. Corrections go back to the training pipeline.
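
Embedding retrieval keyed on a pack-level catalogue entry can be sketched as follows. Everything here is an assumption for illustration: `PackKey`, `EmbeddingGallery`, the 0.8 similarity threshold and the toy 3-dimensional vectors are hypothetical. In practice the embeddings would come from whatever image-embedding model runs on the detector’s crops, and the lookup would use an approximate-nearest-neighbour index rather than a linear scan.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PackKey:
    """Pack-level catalogue key: size and flavour are never collapsed."""
    brand: str
    size: str
    variant: str


def _normalise(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


class EmbeddingGallery:
    """Hypothetical nearest-neighbour gallery mapping crop embeddings to packs.

    A pack refresh is handled by adding reference embeddings for the new
    artwork; the detector and the embedding model are not retrained.
    """

    def __init__(self, min_similarity: float = 0.8) -> None:
        self.min_similarity = min_similarity
        self._refs: Dict[PackKey, List[List[float]]] = {}

    def add(self, key: PackKey, embedding: List[float]) -> None:
        # Several references per pack: old and new artwork can coexist.
        self._refs.setdefault(key, []).append(_normalise(embedding))

    def match(self, embedding: List[float]) -> Optional[Tuple[PackKey, float]]:
        """Best pack by cosine similarity, or None if below the threshold."""
        query = _normalise(embedding)
        best: Optional[Tuple[PackKey, float]] = None
        for key, refs in self._refs.items():
            for ref in refs:
                sim = sum(a * b for a, b in zip(query, ref))
                if best is None or sim > best[1]:
                    best = (key, sim)
        if best is None or best[1] < self.min_similarity:
            return None  # Unknown pack: a candidate for rep confirmation.
        return best
```

Because a refresh only adds gallery entries, the classic and cream & onion packs stay distinct keys, and the threshold decides when a crop is treated as unknown rather than forced onto the nearest pack.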

Planogram compliance vs share-of-shelf

Planogram compliance and share-of-shelf are different problems and need different evals. Planogram compliance asks "is this SKU in its designated facing?", which needs spatial accuracy. Share-of-shelf asks "what percentage of the visible shelf is brand X?", which tolerates spatial error but needs accurate SKU classification. Build the metrics for the question the brand is actually paying for.
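
To make the distinction concrete, here is a minimal sketch of both metrics over one photo’s detections. The assumptions are ours: share-of-shelf is measured as a fraction of detected facing width (counts or area are also common), a planogram slot counts as met when a detection of the expected SKU overlaps it with IoU of at least 0.5, and the `Box`, `Detection` and `Slot` types are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


@dataclass(frozen=True)
class Detection:
    sku: str
    brand: str
    box: Box


@dataclass(frozen=True)
class Slot:
    sku: str
    box: Box  # designated facing from the planogram


def iou(a: Box, b: Box) -> float:
    """Intersection over union; 0.0 for disjoint boxes."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def share_of_shelf(dets: List[Detection], brand: str) -> float:
    """Brand's share of detected facing width. Position is ignored."""
    total = sum(d.box.x2 - d.box.x1 for d in dets)
    ours = sum(d.box.x2 - d.box.x1 for d in dets if d.brand == brand)
    return ours / total if total > 0 else 0.0


def planogram_compliance(dets: List[Detection], slots: List[Slot],
                         min_iou: float = 0.5) -> float:
    """Fraction of slots holding the right SKU in the right place."""
    if not slots:
        return 1.0
    hits = sum(
        1 for s in slots
        if any(d.sku == s.sku and iou(d.box, s.box) >= min_iou for d in dets)
    )
    return hits / len(slots)
```

With three equal-width facings, two of them Lay’s but with the cream & onion pack one slot to the right of its designated facing, share-of-shelf for Lay’s is 2/3 while planogram compliance is 0.5: the same detections give two very different answers.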

On-device vs server inference

Field connectivity is variable. We default to on-device detection for immediate rep feedback (TensorFlow Lite or ONNX Runtime) and server-side reprocessing for analytics-grade accuracy. The two paths use the same training data; the on-device model is quantised and pruned. Don’t make reps wait for the cloud.
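
For a sense of what "quantised" means at the arithmetic level, here is a minimal sketch of symmetric per-tensor int8 quantisation: each float weight w is stored as q = round(w / scale), with scale = max|w| / 127. This is our own stdlib illustration of the general technique; the TensorFlow Lite and ONNX Runtime toolchains ship their own converters, which also handle activations, per-channel scales and calibration data.

```python
from typing import List, Tuple


def quantise_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: List[int], scale: float) -> List[float]:
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]
```

Each weight shrinks from 4 bytes (float32) to 1 byte, and the round-trip error per weight is at most scale / 2. For example, `[0.5, -1.27, 0.003]` quantises to `[50, -127, 0]` with a scale of about 0.01, so the smallest weight rounds away entirely: the kind of precision loss the server-side path does not pay.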

The label problem nobody warns you about

Shop-floor data is expensive to label well. SKUs are tiny in frame, partially occluded, and sometimes upside-down. Plan for 8–12 seconds per box, not 2. Set up an annotation team in a Tier-2 city: costs are 3–5× lower than in metros, and quality is comparable when the work is supervised properly.
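
The gap between 2 and 8 to 12 seconds per box is easy to underestimate, so it is worth running the arithmetic before budgeting. A minimal sketch, where 40 boxes per frame is a hypothetical density for a crowded shelf (measure your own) and 5,000 frames is the fine-tuning floor given earlier in this piece:

```python
def annotation_hours(frames: int, boxes_per_frame: float,
                     seconds_per_box: float) -> float:
    """Annotator-hours for one labelling pass (excludes QA and re-review)."""
    return frames * boxes_per_frame * seconds_per_box / 3600


if __name__ == "__main__":
    FRAMES = 5_000         # fine-tuning floor from the stack section
    BOXES_PER_FRAME = 40   # hypothetical density; measure on your own data
    for secs in (2, 8, 10, 12):
        hours = annotation_hours(FRAMES, BOXES_PER_FRAME, secs)
        print(f"{secs:>2} s/box -> {hours:,.0f} annotator-hours")
```

At 2 seconds per box that is about 111 annotator-hours; at 8 to 12 seconds it is roughly 444 to 667, before any QA pass.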

How we approach this at Velura Labs

Our AI & Data Solutions service handles end-to-end retail CV: data collection, annotation, model training, and on-device deployment. Pair it with Mobile App Development for the rep-facing field app. Read our annotation pipelines piece for the labelling layer and our Bharat design patterns piece for the rep UX. Talk to us if your shelf-monitoring accuracy has plateaued after the initial rollout; that is usually the embedding gallery drifting.

Available to businesses across the United States (Washington, California, Texas, New York), Europe (France, Italy and the wider EU), the Middle East (Dubai and the Gulf) and India. Get in touch to scope your build.
