
KYC at Scale with Vision-LLMs: PAN, Aadhaar, Passport Extraction in Production

Dr Ishit Karoli
March 30, 2026

KYC document extraction was a regex-and-rules problem until 2024. By the end of 2025, vision-LLMs had quietly displaced much of the traditional OCR-plus-template stack in production, with classic OCR surviving mainly as a cheap first pass. The accuracy lift is real, the cost is manageable, and the deployment patterns are different enough that it’s worth resetting your mental model.

Why the old stack stopped working

Traditional KYC pipelines used Tesseract or AWS Textract for OCR plus a per-document-type template to extract fields. Accuracy was acceptable on clean originals but brittle the moment a customer photographed a slightly tilted PAN card with a phone camera in poor light. Banks accepted the tradeoff and built large human-review queues to absorb the noise.

Vision-LLMs (Claude’s vision, Gemini 2.5 multimodal, GPT-5 vision) handle document variance natively. A photo of a folded passport, a slightly water-damaged Aadhaar card, or a printed PAN with non-standard spacing can all be extracted with reasonable accuracy and no per-template engineering.

What you give up — and how to get it back

Vision-LLMs are slower (4–8 seconds per document vs sub-second for traditional OCR) and more expensive (~$0.01–0.04 per document vs sub-cent). For high-volume use cases, that matters. Two patterns recover most of the gap:

  • Hybrid pipeline. Run cheap OCR first; if its confidence is high, ship the result. If confidence drops below a set threshold, fall back to the vision-LLM. Most documents go through the cheap path.
  • Field-level confidence routing. For each extracted field, ask the model for a confidence score. High-confidence fields skip review; low-confidence ones go to the human-review queue. Self-reported scores are not guaranteed to be well calibrated, so set the cut-off against a labelled sample.
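
The two patterns above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the Field type, the function names, both thresholds, and the stub extractors are hypothetical, and the real OCR and vision-LLM clients would be plugged in as callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Field:
    value: str
    confidence: float  # 0.0-1.0, as reported by whichever extractor produced it


Extraction = Dict[str, Field]
Extractor = Callable[[bytes], Extraction]

# Assumed thresholds; tune both against a labelled sample of real documents.
OCR_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.85


def extract_hybrid(image: bytes, cheap_ocr: Extractor,
                   vision_llm: Extractor) -> Tuple[str, Extraction]:
    """Run cheap OCR first; fall back to the vision-LLM if any field is weak."""
    result = cheap_ocr(image)
    if result and min(f.confidence for f in result.values()) >= OCR_THRESHOLD:
        return "ocr", result
    return "vision_llm", vision_llm(image)


def route_fields(result: Extraction) -> Tuple[Dict[str, str], List[str]]:
    """Split fields into auto-accepted values and field names queued for review."""
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for name, field in result.items():
        if field.confidence >= REVIEW_THRESHOLD:
            accepted[name] = field.value
        else:
            needs_review.append(name)
    return accepted, needs_review


def blended_cost(fallback_rate: float, ocr_cost: float, llm_cost: float) -> float:
    """Average cost per document: all documents pay for OCR, fallbacks also pay for the LLM."""
    return ocr_cost + fallback_rate * llm_cost


# Stub extractors standing in for real OCR and vision-LLM clients (hypothetical data).
def stub_ocr(image: bytes) -> Extraction:
    return {"pan": Field("ABCDE1234F", 0.97), "name": Field("R4VI KUMAR", 0.41)}


def stub_llm(image: bytes) -> Extraction:
    return {"pan": Field("ABCDE1234F", 0.99), "name": Field("RAVI KUMAR", 0.80)}


path, fields = extract_hybrid(b"image-bytes", stub_ocr, stub_llm)
accepted, needs_review = route_fields(fields)
print(path, accepted, needs_review)
```

Here the weak OCR read of the name forces the fallback, the PAN is auto-accepted, and the name still goes to review. As an illustration with assumed numbers inside the ranges quoted above, a 20% fallback rate with $0.001 OCR and $0.02 per vision-LLM call gives blended_cost(0.2, 0.001, 0.02), or about $0.005 per document.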

The validation layer matters more than the extraction

Vision-LLMs sometimes confidently extract wrong values. Your pipeline must include both per-field and cross-field validation:

  • Aadhaar number must be 12 digits and pass the Verhoeff checksum (the last digit is the check digit).
  • PAN format must match the structural pattern (5 letters + 4 digits + 1 letter, for example ABCDE1234F).
  • Date of birth must be plausible (between 1900 and today).
  • Name fields must match across documents (PAN name vs Aadhaar name) with appropriate fuzzy matching.
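
The four checks above could look something like this. It is a sketch under stated assumptions: the function names and the 0.85 name-similarity threshold are illustrative, the "first digit is 2-9" Aadhaar rule is the commonly cited convention rather than something stated in this post, and difflib is used here only as a simple stand-in for whatever fuzzy matcher you prefer.

```python
import re
from datetime import date
from difflib import SequenceMatcher

# Standard Verhoeff tables: dihedral-group multiplication (D) and permutation (P).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]

PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
NAME_THRESHOLD = 0.85  # assumed; tune on real name pairs


def verhoeff_valid(digits: str) -> bool:
    """True if the digit string, including its trailing check digit, passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def aadhaar_valid(raw: str) -> bool:
    """12 digits, first digit 2-9 (commonly cited rule), Verhoeff check passes."""
    digits = re.sub(r"\s", "", raw)
    return bool(re.fullmatch(r"[2-9][0-9]{11}", digits)) and verhoeff_valid(digits)


def pan_valid(raw: str) -> bool:
    """Structural check only: 5 letters + 4 digits + 1 letter."""
    return bool(PAN_RE.fullmatch(raw.strip().upper()))


def dob_plausible(dob: date) -> bool:
    """Date of birth between 1900-01-01 and today, inclusive."""
    return date(1900, 1, 1) <= dob <= date.today()


def names_match(a: str, b: str) -> bool:
    """Fuzzy match that ignores case, extra spaces, and token order."""
    norm = lambda s: " ".join(sorted(s.upper().split()))
    return SequenceMatcher(None, norm(a), norm(b)).ratio() >= NAME_THRESHOLD


print(pan_valid("abcde1234f"), names_match("Ravi  Kumar", "KUMAR RAVI"))
```

Sorting name tokens before comparing is a deliberate choice: Indian identity documents often print the same name in different orders, and a plain character-level ratio would penalise that. A stricter or more lenient matcher is a policy decision, not something the sketch settles.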

Catching extraction errors at validation is cheaper than catching them downstream — and necessary for regulatory compliance.

Citations as a compliance feature

Some modern vision-LLMs can return bounding-box citations along with extracted values. Persist these in your audit log. When a regulator asks "where did this date of birth come from," you can show the exact pixel region of the source document. This is becoming a baseline ask in BFSI (banking, financial services, and insurance) audits, and we ship it by default.
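
One way to persist such citations is a single JSON line per document, as sketched below. The record layout, the field names, and the convention of boxes normalised to the 0-1 range are assumptions made for illustration; the coordinates your model returns may use a different convention, and the model name in the example is a placeholder.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class FieldCitation:
    field: str
    value: str
    page: int
    # (x_min, y_min, x_max, y_max), normalised to 0-1 of page size (assumed convention)
    bbox: Tuple[float, float, float, float]
    model: str


def audit_entry(document_bytes: bytes, citations: List[FieldCitation]) -> str:
    """Build one JSON line tying each extracted value to its source region."""
    record = {
        # The hash identifies the exact source file without storing it in the log.
        "document_sha256": hashlib.sha256(document_bytes).hexdigest(),
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "fields": [asdict(c) for c in citations],
    }
    return json.dumps(record, sort_keys=True)


def to_pixels(bbox: Tuple[float, float, float, float],
              width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalised box to pixel coordinates for display to an auditor."""
    x0, y0, x1, y1 = bbox
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))


citation = FieldCitation(
    field="date_of_birth",
    value="1990-05-17",
    page=1,
    bbox=(0.12, 0.40, 0.48, 0.46),
    model="example-vision-model",  # placeholder name
)
line = audit_entry(b"raw-image-bytes", [citation])
print(line)
print(to_pixels((0.25, 0.5, 0.75, 1.0), 1000, 800))
```

Storing the document hash alongside the box means the cited region can later be re-rendered from the archived original and shown to be from the same file.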

On-prem deployment

Several Indian PSU banks now require on-prem deployment for KYC processing. Open-weight vision models (Llama 3.2 Vision, Qwen 2.5 VL) deployed on a small GPU cluster can match commercial APIs on accuracy for clean Indian KYC documents. The throughput is lower but the compliance posture is stronger.

How we approach this at Velura Labs

Our Document Processing service ships KYC pipelines using the hybrid pattern above, with citations, cross-field validation, and audit logging built in. For larger archives needing scanning before extraction, see Document Digitisation & Scanning. Read our guardrails guide for the broader compliance posture. Talk to us if your KYC review queue is bigger than it should be.
