Finding information about test coverage today means opening TestRail to search cases, digging through Confluence for spec docs, or asking teammates who might remember. It's slow and inconsistent. The RAG Chat tab turns your entire QA knowledge base into a single queryable interface: questions are answered in seconds, grounded in your actual documentation.

The Problem It Solves

Ask any QA engineer what happens when someone on the team asks "do we have test coverage for the 2FA login flow?" The answer is a combination of: opening TestRail and typing in the search box, opening Confluence and hunting through nested pages, pinging someone in Slack who might know, and hoping the answer is right. This is not a knowledge problem — the documentation exists. It's an access problem. The information is scattered across three systems and a human's memory.

The Talk to Tests chat interface solves this by indexing all of that documentation — TestRail cases, Confluence pages, uploaded PDFs and Word docs — into a vector database, then letting engineers query all of it through a single conversational interface. Ask in plain English and get an answer in seconds, traceable to specific documents.

What Powers It: The Knowledge Base

Admin dashboard showing knowledge base stats
Admin dashboard — the knowledge base behind Talk to Tests. Documents are ingested, chunked, and embedded into ChromaDB. The system statistics block shows live document and chunk counts.
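The dashboard doesn't show the chunking strategy itself; as a rough sketch of that ingest step, a fixed-size word-window splitter with overlap looks like the following (the chunk size and overlap are illustrative values, not the portal's actual settings):

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split a document into overlapping word-window chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from either neighbouring chunk.
    """
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += max_words - overlap
    return chunks
```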

The knowledge base is built from three sources synced via the Admin panel: TestRail test cases, Confluence pages, and uploaded documents (PDF and Word).

TestRail sync streaming in real time
TestRail sync streaming in real time — thousands of test cases pulled across multiple projects and indexed into ChromaDB. Every case becomes a retrievable chunk.
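The sync logic isn't shown on screen, but a minimal version of it, built on TestRail's public REST API (get_cases), could look like the sketch below. The URL, credentials, and case fields are placeholders; which custom fields hold steps and expected results varies per TestRail template.

```python
import requests

TESTRAIL_URL = "https://yourcompany.testrail.io"        # placeholder instance
AUTH = ("qa-bot@yourcompany.com", "TESTRAIL_API_KEY")   # user + API key

def fetch_cases(project_id: int) -> list[dict]:
    # TestRail REST API v2; newer versions wrap the result in a "cases" key.
    resp = requests.get(
        f"{TESTRAIL_URL}/index.php?/api/v2/get_cases/{project_id}",
        auth=AUTH, headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    data = resp.json()
    return data["cases"] if isinstance(data, dict) else data

def case_to_chunk(case: dict) -> str:
    # One retrievable chunk per case: title plus whatever step/expected
    # fields the project's case template actually uses.
    parts = [f"[C{case['id']}] {case['title']}"]
    for field in ("custom_steps", "custom_expected"):
        if case.get(field):
            parts.append(case[field])
    return "\n".join(parts)
```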

All embeddings are stored locally in ChromaDB — no test case content or documentation leaves your infrastructure when using a local LLM. The vector database is persistent across restarts and only updated when you trigger a sync or upload a new document.
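A minimal sketch of that local, persistent setup with the chromadb client follows; the collection name and path are assumptions, and the collection's default embedding function embeds the documents automatically when they're added.

```python
import chromadb

# Persistent client: the index lives on disk and survives restarts.
client = chromadb.PersistentClient(path="./chroma_db")

# Cosine distance to match the similarity search described below.
collection = client.get_or_create_collection(
    name="qa_knowledge", metadata={"hnsw:space": "cosine"})

def index_chunks(source: str, chunks: list[str]) -> None:
    # upsert() makes re-running a sync idempotent: existing IDs are
    # overwritten instead of duplicated, so the index only changes
    # when a sync or upload is triggered.
    collection.upsert(
        ids=[f"{source}-{i}" for i in range(len(chunks))],
        documents=chunks,
        metadatas=[{"source": source}] * len(chunks),
    )
```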

How RAG Works Here

RAG — Retrieval-Augmented Generation — means the LLM generates its answer from retrieved context, not from its training data. Here's the retrieval pipeline for every question asked in Talk to Tests (a code sketch of the full pipeline follows the four steps):

1. Query embedding

The engineer's question is converted into a vector embedding using the same model used during document indexing. The question and the documents now live in the same vector space, making similarity comparison possible.

2. Similarity search

ChromaDB performs a cosine similarity search across all stored chunks, returning the top-K most relevant pieces. These chunks might come from TestRail cases, Confluence pages, or uploaded documents — all searched simultaneously.

3. Context injection

The retrieved chunks are injected into the LLM prompt as context, along with the original question. The prompt instructs the model to answer only from the provided context and to say "I don't know" if the answer isn't there.

4. Grounded generation

The LLM generates an answer strictly from the provided context, not from general training knowledge. This is what prevents hallucination — the answer is always traceable to a specific document or TestRail case. If the information isn't in your knowledge base, the model says so.
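Put together, the four steps reduce to a short retrieval-and-generation function. This is a sketch against the chromadb query API, not the portal's actual code; qa_knowledge and generate_answer() are the hypothetical names used in the other sketches on this page.

```python
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")      # same store the sync writes to
collection = client.get_or_create_collection("qa_knowledge")

def answer(question: str, k: int = 5) -> str:
    # Steps 1-2: embed the question and run a similarity search over all
    # chunks. query_texts uses the collection's embedding function, so the
    # question lands in the same vector space as the documents.
    hits = collection.query(query_texts=[question], n_results=k)
    context = "\n\n".join(hits["documents"][0])

    # Step 3: inject the retrieved chunks into the prompt.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # Step 4: grounded generation. generate_answer() dispatches to the
    # configured LLM provider (hypothetical helper, see Multi-LLM Support).
    return generate_answer(prompt)
```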

Why "grounded" matters

A general-purpose LLM asked "what are our card payment test cases?" will either refuse or hallucinate plausible-sounding test cases that don't exist. A RAG system asked the same question retrieves the actual cases from your TestRail instance and quotes them directly. The difference between a useful answer and a dangerous one is grounding.

Using the Chat

Talk to Tests chat interface with starter questions
Talk to Tests — the chat interface with suggested starter questions. Engineers can ask anything about test coverage, product flows, or documentation without leaving the portal.

The chat interface opens in the Customer portal under the "Talk to Tests" tab. Engineers can ask questions in plain English — the system handles the retrieval and generation. Common use cases are coverage checks for a feature, product flow questions, and documentation lookups, as in the example below:

A question typed into the Talk to Tests chat
A question typed into the chat — "Give me the top test cases for Card Payments in SG". The system will retrieve relevant TestRail cases and product spec chunks before generating the answer.
AI preparing a response in Talk to Tests
The AI preparing a response — grounded in the actual TestRail cases and Confluence docs indexed during sync. The "AI Thinking..." state shows the retrieval and generation pipeline running.

Responses stream back token by token, so engineers see the answer forming in real time rather than waiting for a complete response. The streaming is handled via Server-Sent Events on the Flask backend.
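A stripped-down version of that SSE endpoint in Flask could look like this; the route name and the answer_stream() token generator are placeholders, not the actual backend code.

```python
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)

@app.route("/api/chat", methods=["POST"])
def chat():
    question = request.json["question"]

    def event_stream():
        # answer_stream() is a hypothetical generator that yields tokens
        # as the LLM produces them.
        for token in answer_stream(question):
            yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"

    return Response(stream_with_context(event_stream()),
                    mimetype="text/event-stream")
```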

Talk to Tests final result

The complete Talk to Tests response — the AI surfaces the relevant test cases and documentation, grounded in the indexed TestRail and Confluence data.

Multi-LLM Support

The chat backend supports three LLM providers, configurable via environment variable. ChromaDB remains local regardless of which LLM is chosen — only the generation step changes.

OpenAI GPT-4: highest quality · cloud
Google Gemini: cost-effective · cloud
Ollama (local): fully on-premise · no data egress
Vector DB: ChromaDB · always local

Switch provider with LLM_PROVIDER=openai|google|ollama in your environment config. When using Ollama, the LLM runs on your machine — no data leaves your network at any stage of the pipeline. This is important for teams with strict data residency requirements.
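A plausible shape for that provider switch is sketched below; the model names are assumptions, and ollama_generate() is the hypothetical local helper shown in the on-premise sketch that follows.

```python
import os

def generate_answer(prompt: str) -> str:
    provider = os.getenv("LLM_PROVIDER", "ollama")

    if provider == "openai":
        from openai import OpenAI
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}])
        return resp.choices[0].message.content

    if provider == "google":
        import google.generativeai as genai
        genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
        model = genai.GenerativeModel("gemini-1.5-flash")
        return model.generate_content(prompt).text

    # Default: local Ollama, no data egress (see the on-premise sketch below).
    return ollama_generate(prompt)
```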

On-premise option

Teams that cannot send test case content or product documentation to cloud LLMs can run the full stack locally: Ollama for generation, ChromaDB for vector storage, Flask for the API. Zero data egress. The trade-off is response quality — GPT-4 and Gemini produce notably more coherent answers for complex multi-document questions.
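For the generation half of that local stack, a minimal call to a local Ollama server via its HTTP API might look like this (the model name is an assumption):

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def ollama_generate(prompt: str, model: str = "llama3") -> str:
    # Non-streaming call to the Ollama server running on this machine;
    # the prompt and the retrieved context never leave the network.
    resp = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]
```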

You've seen the full system

Five features, three repositories, one continuous loop — generation, authoring, triaging, healing, and now the knowledge layer that ties it all together. Read the full architecture overview or explore another feature.