Every engagement ships these as concrete artifacts you own — not slides, not hand-waving.
Connectors for SharePoint, Confluence, Drive, S3, databases. Incremental sync, ACL-aware so people only retrieve what they're allowed to read.
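A minimal sketch of the incremental, ACL-aware sync loop, assuming hypothetical `connector` and `index` objects rather than any specific SharePoint, Confluence, or vector-store API:

```python
from datetime import datetime, timezone

def sync_incremental(connector, index, last_sync: datetime) -> datetime:
    """Pull only documents changed since the last run and carry their ACLs along."""
    now = datetime.now(timezone.utc)
    for doc in connector.fetch_changed_since(last_sync):      # hypothetical connector call
        index.upsert(
            doc_id=doc.id,
            text=doc.text,
            metadata={
                "source_url": doc.url,
                "modified_at": doc.modified_at.isoformat(),
                "allowed_groups": doc.allowed_groups,         # ACLs copied from the source system
            },
        )
    for doc_id in connector.fetch_deleted_since(last_sync):   # hypothetical connector call
        index.delete(doc_id)
    return now  # persist this as the next sync cursor
```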
Dense (BGE / OpenAI embeddings) + sparse (BM25) + reranker (Cohere / cross-encoder) — tuned to your corpus, not stock defaults.
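A minimal sketch of the hybrid part: BM25 and a dense index queried separately, then fused with reciprocal rank fusion before the reranker rescores the shortlist. `embed` and `dense_index.search` are stand-ins for whatever embedding model and vector store you run:

```python
from collections import defaultdict
from rank_bm25 import BM25Okapi

def hybrid_search(query: str, corpus: list[str], dense_index, embed,
                  k: int = 20, rrf_k: int = 60) -> list[int]:
    # Sparse leg: BM25 over whitespace-tokenized chunks (built once per corpus in practice).
    bm25 = BM25Okapi([c.split() for c in corpus])
    sparse_scores = bm25.get_scores(query.split())
    sparse_ranked = sorted(range(len(corpus)), key=lambda i: -sparse_scores[i])[:k]

    # Dense leg: nearest neighbours in embedding space (hypothetical API, returns chunk indices).
    dense_ranked = dense_index.search(embed(query), top_k=k)

    # Reciprocal rank fusion: score = sum over legs of 1 / (rrf_k + rank).
    fused = defaultdict(float)
    for ranked in (sparse_ranked, dense_ranked):
        for rank, idx in enumerate(ranked, start=1):
            fused[idx] += 1.0 / (rrf_k + rank)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```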
Golden Q&A set, RAGAS faithfulness / answer-relevance / context-precision scores, regression checks on every change.
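A minimal sketch of the scoring loop, following the classic ragas `evaluate()` interface (exact imports vary by version, and the judge model needs an LLM key configured); the one-row dataset is a toy stand-in for a real golden set:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

golden = Dataset.from_dict({
    "question": ["What is the notice period for resignation?"],
    "answer": ["The notice period is 30 days, per the HR policy."],            # pipeline output
    "contexts": [["Employees must give 30 days' written notice to resign."]],  # retrieved chunks
    "ground_truth": ["30 days"],                                               # human-verified answer
})

scores = evaluate(golden, metrics=[faithfulness, answer_relevancy, context_precision])
print(scores)  # run on every change; fail the build if a metric drops below its gate
```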
Every answer shows source chunks with page-level deep-links. “I don’t know” is a first-class response, not a failure.
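A minimal sketch of that answer contract; field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    doc_title: str
    chunk_text: str
    deep_link: str            # e.g. ".../policy.pdf#page=12"

@dataclass
class Answer:
    text: str
    citations: list[Citation] = field(default_factory=list)
    abstained: bool = False   # "I don't know" is a first-class outcome

def answer_or_abstain(question, retrieved, generate, min_chunks: int = 1) -> Answer:
    """`retrieved` is the reranked chunk list; `generate` is your grounded-generation call."""
    if len(retrieved) < min_chunks:
        return Answer(text="I don't know: the corpus has nothing on this.", abstained=True)
    return Answer(
        text=generate(question, retrieved),
        citations=[Citation(c.title, c.text, c.link) for c in retrieved],
    )
```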
Sample your documents, profile structure, identify the long-tail formats (scanned PDFs, tables, code) that wreck naive RAG.
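A minimal sketch of that audit pass using pypdf: count formats and flag PDFs whose first pages have no extractable text layer. The 100-character cutoff is an arbitrary heuristic, not a fixed rule:

```python
from collections import Counter
from pathlib import Path
from pypdf import PdfReader

def profile_corpus(root: str):
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    formats = Counter(p.suffix.lower() for p in files)

    likely_scanned = []
    for pdf in (p for p in files if p.suffix.lower() == ".pdf"):
        reader = PdfReader(pdf)
        text = ""
        for i, page in enumerate(reader.pages):
            if i >= 3:                       # first few pages are enough for a signal
                break
            text += page.extract_text() or ""
        if len(text.strip()) < 100:          # little or no text layer: probably a scan
            likely_scanned.append(str(pdf))
    return formats, likely_scanned
```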
Build the ingestion + chunking + retrieval pipeline. Tune chunk sizes (300–600 tokens with overlap is the starting point, but every corpus is different).
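A minimal sketch of that starting point: fixed token windows with overlap, using tiktoken's cl100k_base encoding as a stand-in for whatever tokenizer matches your embedding model, with 500/50 as defaults to tune rather than recommendations:

```python
import tiktoken

def chunk_text(text: str, chunk_tokens: int = 500, overlap: int = 50) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        window = tokens[start:start + chunk_tokens]
        chunks.append(enc.decode(window))
        start += chunk_tokens - overlap      # slide forward, keeping `overlap` tokens of context
    return chunks
```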
Build the golden Q&A set, score with RAGAS, iterate on chunking, the reranker, and the prompts. Ship when faithfulness ≥85%.
Wire to your front-end, add ACL filtering, deploy, hand off ops runbook.
Best-in-class where it matters; boring and battle-tested everywhere else.
Cost depends on corpus size, format diversity (PDFs / scans / structured), connector count, and ACL complexity. Typical engagements run 4–8 weeks. Audit + scope conversation is free.
Usually yes, but not always. For corpora under ~50K chunks, Postgres + pgvector is often enough. Above that, Pinecone / Weaviate / Qdrant become worth the line item.
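A minimal sketch of the pgvector path, following the pgvector-python + psycopg usage; the 1536-dim column and table name are illustrative:

```python
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=rag", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

conn.execute("""
    CREATE TABLE IF NOT EXISTS chunks (
        id bigserial PRIMARY KEY,
        text text,
        embedding vector(1536)
    )
""")

def top_k(query_embedding, k: int = 10):
    """query_embedding: a numpy array from your embedding model; <=> is cosine distance."""
    return conn.execute(
        "SELECT id, text FROM chunks ORDER BY embedding <=> %s LIMIT %s",
        (query_embedding, k),
    ).fetchall()
```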
We use layout- and vision-aware parsers (LlamaParse, Docling) for structured docs, and store table content separately from prose. Naive PDF-to-text throws away a large share of the signal in tables, scans, and multi-column layouts.
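A minimal sketch of the idea rather than Docling's or LlamaParse's actual API: the parser is assumed to return typed (kind, content, page) elements, and tables are indexed as their own chunks, rendered to markdown, with page metadata instead of being flattened into prose:

```python
def index_parsed_doc(elements, index, doc_id: str):
    """`elements` and `index.upsert` are hypothetical placeholders for your parser and store."""
    for i, (kind, content, page) in enumerate(elements):
        index.upsert(
            doc_id=f"{doc_id}:{i}",
            text=content,                  # prose text, or the table rendered as markdown
            metadata={
                "kind": kind,              # "prose" | "table"
                "page": page,              # keeps page-level deep-links possible
                "parent_doc": doc_id,
            },
        )
```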
Yes — we propagate ACLs from the source system into retrieval-time filters. Critical for HR, legal, and finance corpora.
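A minimal sketch of the query-time enforcement: resolve the caller's groups first, and let retrieval see only chunks whose allowed_groups overlap with them. `resolve_groups` and the filter syntax are placeholders for your identity provider and vector store:

```python
def retrieve_for_user(query: str, user_id: str, index, embed, resolve_groups, k: int = 10):
    groups = resolve_groups(user_id)                     # e.g. SSO / directory group lookup
    return index.search(
        vector=embed(query),
        top_k=k,
        filter={"allowed_groups": {"$in": groups}},      # visible if any group matches
    )
```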
Almost never. Domain-specific reranking + better chunking gets you 90% of the win at 5% of the cost.
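A minimal sketch of the reranking half, using an off-the-shelf cross-encoder from sentence-transformers; the model name is illustrative, and Cohere's rerank endpoint slots in the same way:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, chunks: list[str], top_n: int = 5) -> list[str]:
    scores = reranker.predict([(query, chunk) for chunk in chunks])
    ranked = sorted(zip(chunks, scores), key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]
```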