Plug AI into your docs.
An AI assistant that queries your internal documents in real time, cites its sources, and respects your ACLs. Confluence, Notion, SharePoint, Google Drive, your own database: we index what you have, no migration required.
What's broken.
A company's knowledge sits scattered across ten tools. Wikis are dead, drives overflow, tacit knowledge walks out with senior leavers. Off-the-shelf SaaS tools (Glean, Mendable, Microsoft Copilot for M365) are either expensive or non-EU, and homemade RAGs stall at the prototype stage. Here's what we keep finding.
- 01
Your knowledge workers lose over an hour a day searching for info
McKinsey and Gartner studies converge: a knowledge worker spends 19 to 25% of their time searching for information. In a 100-person company, that's the equivalent of 20 FTEs searching instead of producing.
- 02
Knowledge bases are graveyards
Everyone knows you should document. No one can ever find anything again. The internal wiki's search bar is universally hated. The useful pages date from 2022 and no one updates them.
- 03
Knowledge walks out with the people
When a senior leaves, their tacit expertise (historical trade-offs, the why-we-do-it-this-way) is nowhere in the docs. Three months later, the team can't reproduce their decisions and stumbles.
- 04
SaaS solutions don't pass compliance
Glean, Mendable, Notion AI, and Microsoft Copilot for M365 are SaaS, often non-EU, with data processing terms that don't fly for compliance-heavy mid-market companies. And retrofitting compliance onto a third-party platform is rarely possible.
- 05
80% of homemade RAGs stall at POC
Bad chunking, wrong embeddings, no reranking, no proper eval set. Homemade RAGs work in demos and collapse in prod: 50% hallucinations, 5-second latency, ignored ACLs. It's the graveyard of internal AI projects.
How we do it.
RAG is a data quality project, not an AI project. The LLM is almost interchangeable; what makes the difference is everything upstream. Here's how we approach it.
70% of the work is upstream of the LLM
Ingestion, cleaning, semantic chunking, metadata enrichment, deduplication: that's what determines RAG quality. The choice of LLM downstream (GPT, Claude, Mistral) accounts for maybe 10% of the final result.
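A minimal sketch of what that upstream work looks like, assuming Markdown-ish sources; the Chunk shape, the metadata fields, and the 1,500-character budget are illustrative, not a fixed schema:

```python
# Heading-aware chunking with metadata enrichment: chunks follow the
# document's own structure, and every chunk carries the source metadata
# (author, date, ACL) needed for filtering and citation at query time.
import re
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def chunk_by_headings(doc_text: str, doc_meta: dict, max_chars: int = 1500) -> list[Chunk]:
    # Split on Markdown headings so a chunk never straddles two sections.
    sections = re.split(r"(?m)^(?=#{1,3} )", doc_text)
    chunks: list[Chunk] = []
    for section in (s.strip() for s in sections):
        if not section:
            continue
        # Oversized sections are cut on paragraph boundaries, not mid-sentence.
        while len(section) > max_chars:
            cut = section.rfind("\n\n", 0, max_chars)
            cut = cut if cut > 0 else max_chars
            chunks.append(Chunk(section[:cut].strip(), dict(doc_meta)))
            section = section[cut:].strip()
        chunks.append(Chunk(section, dict(doc_meta)))
    return chunks

chunks = chunk_by_headings(
    "# Runbook\nRestart the worker...\n\n## Rollback\nPin the previous image tag...",
    {"source": "confluence", "author": "ops-team", "updated": "2026-01-12", "acl": ["eng"]},
)
```

Cutting on the document's own structure instead of fixed-size windows is most of what "semantic chunking" buys: a chunk that straddles two unrelated sections pollutes retrieval for both.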
Incremental indexing, not batch
Your docs change every day, and the index has to keep up in real time. We hook into Confluence, Notion, Drive, and Slack webhooks to reindex a document the moment it changes. No midnight cron that lags new content by a day.
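A minimal sketch of the receiving end, assuming FastAPI; the endpoint path, payload fields, and the two helpers are illustrative stand-ins for the real connector code:

```python
# Webhook-driven reindexing: one change event, one document reindexed.
# The index stays seconds behind the source instead of a day behind a cron.
from fastapi import FastAPI

app = FastAPI()

def reindex_document(doc_id: str) -> None:
    ...  # fetch the latest version, re-chunk, re-embed, upsert into the index

def delete_from_index(doc_id: str) -> None:
    ...  # drop all chunks of a deleted document so answers never cite it

@app.post("/webhooks/doc-changed")
async def on_doc_changed(event: dict):
    # Hypothetical payload shape; real Confluence/Notion/Drive events differ.
    if event.get("action") == "deleted":
        delete_from_index(event["document_id"])
    else:
        reindex_document(event["document_id"])
    return {"status": "ok"}
```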
Hybrid retrieval and reranking are mandatory
Pure vector similarity misses too many short or technical queries. We mix BM25 (lexical) with dense retrieval, then rerank with Cohere, Voyage, or a local model. Typical gain: 20 to 30% better precision on the top-K results.
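A minimal sketch of one common way to fuse the two rankings, Reciprocal Rank Fusion, before the reranker sees the merged candidates; the document IDs are made up:

```python
# Reciprocal Rank Fusion: a document ranked high by either BM25 or dense
# retrieval accumulates score, so lexical and semantic hits both survive.
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["doc_7", "doc_2", "doc_9"]   # lexical: catches short/technical queries
dense_top = ["doc_2", "doc_4", "doc_7"]  # semantic: catches paraphrases
candidates = reciprocal_rank_fusion([bm25_top, dense_top])
# candidates[:2] == ["doc_2", "doc_7"]; this fused list is what the reranker scores
```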
Systematic evaluation with your domain experts
Together with your experts, we build a proprietary eval set of 100 to 500 reference questions. Precision@k, hallucination rate, latency: everything is measured against this set at every deployment. No silent regressions.
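A minimal sketch of the precision@k half of that harness; the eval entries and the retrieve stub are placeholders for the expert-written set and the real pipeline:

```python
# Precision@k over the reference set: what fraction of the top-k retrieved
# chunks did the domain experts mark as relevant for that question?
def precision_at_k(retrieved: list[str], relevant: set[str], k: int = 5) -> float:
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k

def retrieve(question: str) -> list[str]:
    ...  # stand-in for the pipeline's hybrid retrieval step

eval_set = [
    {"question": "How do we roll back a failed release?",
     "relevant": {"runbook_12", "postmortem_3"}},
    # ...100 to 500 expert-written entries in practice
]
scores = [precision_at_k(retrieve(e["question"]) or [], e["relevant"]) for e in eval_set]
print(f"mean precision@5: {sum(scores) / len(scores):.2f}")
```

Running this on every deployment is what turns "the assistant feels worse" into a number you can bisect.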
Sovereign by default
Self-hosted vector store (Qdrant, Weaviate, pgvector) or a SecNumCloud-qualified sovereign cloud. Local embeddings available (BGE-M3, Mistral Embed). Local LLM on an on-premise GPU possible. No data leaves the EU if that's your requirement.
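A minimal sketch of the local-embedding path, assuming the sentence-transformers library and the open BAAI/bge-m3 model; after the one-time model download, nothing leaves your machine:

```python
# Local dense embeddings with BGE-M3: document text is vectorized on your
# own hardware, so no third-party embedding API ever sees the content.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")  # runs fully offline once cached

chunks = [
    "Restart the worker with the previous image tag.",
    "Rollback procedure for the billing service.",
]
vectors = model.encode(chunks, normalize_embeddings=True)
print(vectors.shape)  # (2, 1024): one 1024-dim vector per chunk, ready to index
```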
Native ACL enforcement
Filtering at indexing and query time. The agent only sees documents the current user has access to in the source. ACL synchronization via SSO or via the source's native APIs (Confluence, SharePoint).
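A minimal sketch of the query-time half with Qdrant; the collection name and the acl_groups payload field are conventions of this sketch, written onto every chunk at indexing time:

```python
# Query-time ACL enforcement: the filter runs inside the vector store, so
# chunks outside the user's groups are never scored, returned, or shown
# to the LLM. Group membership comes from SSO at request time.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny

client = QdrantClient(url="http://localhost:6333")

def search_as_user(query_vector: list[float], user_groups: list[str], top_k: int = 10):
    return client.search(
        collection_name="docs",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="acl_groups", match=MatchAny(any=user_groups))]
        ),
        limit=top_k,
    )
```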
Shipped on a real mission.
For Peps Digital, we shipped a RAG assistant plugged into all the product documentation. Incremental indexing, systematic source citation, hallucination guardrails. 80% of questions get an answer without human intervention, and the index follows product changes in real time.
80% of customer support digitized
A RAG-powered AI chatbot integrated into the Peps Digital platform, answering PSDM users' questions directly from the interface, 24/7.
Our process.
Inventory and source mapping
We list your documentary sources and measure their volume, quality, and update cycle. We identify priority sources and those to exclude (obsolete docs, confidential scope). You walk away with an actionable map, valuable independently of the AI project.
Ingestion and indexing pipeline
We build the pipeline that extracts, cleans, chunks, embeds, and indexes. Choice of vector store, embedding model, and semantic chunking strategy. Setup of reranking and metadata (author, date, type, ACL).
Build the eval set with your domain team
A 2-to-3-day workshop with your experts to build a reference question-and-answer set. It becomes the benchmark for every future deployment. Without an eval set, you fly blind.
Pilot, instrumentation, industrialization
Deployment on a restricted channel (one team, one doc segment). Full instrumentation (latency, hallucinations, user satisfaction). Iteration on chunks and retrieval. Progressive rollout to other sources.
Frequently asked questions.
Got a question before we go further? Reach out directly.
01. What documentary sources can be indexed?
Anything that's text or structured. Confluence, Notion, SharePoint, Google Drive, GitHub wiki, Slack messages, Linear, Jira, Salesforce notes, PDF or DOCX files on a NAS, your own database. We've shipped every one of these combinations on past missions.
02. What document volume is manageable?
From a few thousand to several million documents without architectural concerns. Beyond that, we move to sharding and hierarchical retrieval, but it's doable. Infra cost scales linearly with indexed volume.
03. How do you handle permissions and ACLs?
Two-level filtering: at indexing (sensitive docs are never indexed or are marked), and at query time (the agent only sees documents the user can access in the source). Sync via SSO and source APIs. No risk of lateral leakage.
04. How do you guarantee GDPR compliance?
Self-hosted vector store (Qdrant, Weaviate, pgvector) or a SecNumCloud-qualified cloud. Embeddings computed locally with BGE-M3 or Mistral Embed so nothing is exposed to a third party. Local LLM on an on-premise GPU possible. No data leaves the EU if that's your requirement.
05. How long to ship?
For a working POC on a single source with a simple access channel, 3 to 4 weeks. For a multi-source production deployment with a proper eval set and instrumentation, plan for 8 to 12 weeks depending on complexity.
06. Which models do you use?
For the LLM: GPT-4o, Claude Sonnet, or Mistral Large depending on the case. For embeddings: OpenAI text-embedding-3 or BGE-M3 locally. For reranking: Cohere Rerank, Voyage Rerank, or a local model. The choice is informed during the audit based on your constraints (GDPR, latency, budget).
07. How do you limit hallucinations?
Systematic source citation. Explicit refusal to answer when the retrieved context isn't relevant. Hallucination rate measured against the eval set at every deployment. On comparable missions we sit under 2% in production. See also our insight on AI agent security in 2026.
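A minimal sketch of that refusal guardrail; the 0.55 threshold and the generate_answer stub are illustrative, and in practice the threshold is tuned against the eval set:

```python
# Refuse rather than guess: if no retrieved chunk clears the relevance
# threshold, the assistant says so instead of inventing an answer.
REFUSAL = "I can't find this in the indexed documentation."

def generate_answer(question: str, passages: list[str]) -> str:
    ...  # stand-in for the LLM call, prompted to answer only from the passages

def answer_with_guardrail(question: str, hits: list[dict], min_score: float = 0.55) -> dict:
    grounded = [h for h in hits if h["score"] >= min_score]
    if not grounded:
        return {"answer": REFUSAL, "sources": []}
    return {
        "answer": generate_answer(question, [h["text"] for h in grounded]),
        "sources": [h["doc_id"] for h in grounded],  # citations ship with every answer
    }
```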
Local LLMs in 2026: which open-source model to pick for your enterprise
Mistral, Llama, Qwen, DeepSeek, Gemma: the local LLM landscape is rich but tricky to navigate. Our pragmatic buyer's guide for enterprises planning an on-premise deployment.
Sovereign note-takers: why big enterprises want 100% local
Large enterprises refuse consumer SaaS note-takers that ship their sensitive meetings to public LLMs. We build custom, fully local alternatives.
AI agent security: the real attack surface in 2026
Prompt injection, tool poisoning, silent exfiltration, RAG poisoning. What can break an AI agent in production today, and the layered defense framework we apply on every engagement.
Ready to automate everything?
We listen. We analyze. We build. With you.