IBM GoldenRetriever

Ask your documents.
Get answers.

Upload documents into collections, build a RAG pipeline automatically, and let AI agents answer questions grounded in your data. With inline citations, multi-format parsing, and six LLM providers — enterprise document intelligence in one platform.

Document collections · RAG pipeline · Inline citations · 6 LLM providers · Vector memory · MCP tools · AI agents · Enterprise auth
API docs

Retrieval-Augmented Generation

From documents to answers,
in four steps.

01

Upload documents

Drop PDFs, CSVs, HTML, JSON, or plain text into a collection. Multi-format parsing extracts clean text automatically.

02

Chunk & embed

Text is split into overlapping chunks and converted to vector embeddings using sentence transformers — stored for fast similarity search.
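The splitting step above can be sketched as a sliding character window; the size and overlap values here are illustrative defaults, not the platform's actual settings:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Each chunk starts `size - overlap` characters after the previous
    one, so neighbouring chunks share `overlap` characters of context
    and no sentence is lost at a hard boundary.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk is then passed to the embedding model and stored alongside its source offsets, which is what makes character-range citations possible later.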

03

Retrieve context

When a question arrives, the most relevant chunks are found via semantic search and assembled into a context window for the LLM.

04

Generate answer

The LLM produces a grounded answer with inline citations pointing back to exact source documents and character ranges.
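The four steps can be strung together in a few lines. This sketch stands in a toy bag-of-words "embedding" for the platform's sentence transformers, purely to show the retrieval data flow:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; the real pipeline uses
    # sentence-transformer embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every stored chunk by similarity to the question embedding.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "GoldenRetriever parses PDFs, CSVs, HTML, JSON, and plain text.",
    "Retrieved chunks are assembled into a context window for the LLM.",
    "Answers carry inline citations back to the source documents.",
]
context = retrieve("Which formats does GoldenRetriever parse?", chunks, k=1)
# `context` is a one-element list holding the formats chunk,
# ready to be injected into the LLM prompt.
```

In the real pipeline the ranking runs against the vector database rather than an in-memory list, but the shape of the flow (embed question, rank chunks, take top-k) is the same.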

Document ingestion

Upload anything.
We handle the rest.

Drop files into a collection and the pipeline parses, chunks, embeds, and indexes them automatically. No preprocessing scripts, no format conversion — just upload and ask.

  • PDF documents with page-level navigation in the viewer
  • Spreadsheets — CSV and XLSX with row-level chunking
  • Web content — HTML pages and scraped URLs
  • Structured data — JSON and JSON Lines
  • Text — Markdown, plain text, and rich documents
Upload documents

Retrieval-Augmented Generation

Answers grounded in
your actual data.

RAG combines document retrieval with LLM generation. Instead of relying on the model's training data, the system searches your uploaded documents, finds the most relevant passages, and generates an answer that cites exact sources — so you can verify every claim.

  • Retrieve — your question is embedded and matched against document chunks via semantic similarity
  • Augment — the top matching passages are injected into the LLM prompt as context
  • Generate — the LLM produces an answer constrained to the provided context, with inline citations
  • Reduces hallucination — answers are grounded in your documents rather than the model's parametric memory, and every claim can be checked against its citation
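The augment step is essentially prompt assembly. A minimal sketch with a made-up template (the platform's actual prompt is not shown here); numbering the passages is what lets the model cite them inline:

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Inject retrieved passages into the LLM prompt, numbered so the
    model can cite them inline as [1], [2], ..."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "Answer using ONLY the context below. "
        "Cite sources inline as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What search modes are supported?",
    ["Collections support semantic, keyword, and hybrid search modes."],
)
```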
Try collection Q&A

Document collections

Your documents become
a searchable knowledge base.

Create collections, upload files in any format, and instantly get a fully indexed knowledge base. Ask questions directly or attach a collection to an agent for grounded conversations.

  • Multi-format ingestion — PDF, CSV, XLSX, HTML, JSON, Markdown, plain text
  • Automatic chunking with configurable overlap and size
  • Semantic, keyword, and hybrid search modes
  • Streaming Q&A with inline source citations and document viewer
  • Per-collection embedding model and LLM selection
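Hybrid search fuses the semantic and keyword rankings into one result list. A common way to do this is reciprocal rank fusion; this sketch assumes RRF, which may differ from the fusion the platform actually uses:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each document by the sum of
    1 / (k + rank) over every ranking it appears in, so documents
    ranked well by BOTH searches rise to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, 1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc-a", "doc-b", "doc-c"]  # similarity order
keyword  = ["doc-b", "doc-c", "doc-a"]  # keyword-match order
fused = rrf([semantic, keyword])
# doc-b ranks first: it placed 2nd and 1st, beating doc-a's 1st and 3rd.
```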
Manage collections
Remote MCP

MCP protocol servers

Connect any remote MCP endpoint. The agent gets structured, typed tools it can call directly — full tool schema, parallel execution, iteration tracking.

STDIO transport · HTTP/SSE transport · Parallel tool calls
API

REST API wrapper

Point to any REST base URL. The agent gets a call_api tool to reach any endpoint with full method and payload control — no MCP server required.

Any HTTPS endpoint · call_api("POST", "/run") · No extra setup

Tools

Two ways to connect
external capabilities.

Attach tool servers to any agent. Tool calls show live in the chat UI — the agent's reasoning, which tools it used, and what they returned are always visible.

Manage tools

Multi-provider

Use the model that fits the task.

Each agent independently selects its provider and model. One interface, any model underneath.

Anthropic Claude
OpenAI GPT
AWS Bedrock
IBM WatsonX
Ollama (local)
llama.cpp (local)
Per-agent model selection
Every agent independently selects its provider and model from a configurable list.
Streaming responses
All providers stream token-by-token via SSE. Tool calls and reasoning render live.
OIDC authentication
Enterprise SSO via AWS Cognito or any OIDC provider. User context tracked per session.
Docker-native scaling
Stateless workers behind a load balancer. Redis + PostgreSQL + Vector DB for shared state.

Analytics

Understand how your agents
are being used.

Track conversation volume, model usage, tool calls, and response patterns across your entire agent fleet. Scheduled runs, channel traffic, and memory usage — all in one dashboard.

  • Conversation and token usage over time
  • Per-agent and per-channel activity breakdown
  • Tool call frequency and success rates
  • Full session history with message-level replay
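Metrics such as tool call success rate reduce to simple aggregation over logged calls; the record shape below is invented for illustration and is not the platform's log schema:

```python
from collections import defaultdict

def success_rates(calls: list[dict]) -> dict[str, float]:
    """Per-tool success rate from {'tool': str, 'ok': bool} records."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [ok, all]
    for call in calls:
        totals[call["tool"]][1] += 1
        if call["ok"]:
            totals[call["tool"]][0] += 1
    return {tool: ok / n for tool, (ok, n) in totals.items()}

rates = success_rates([
    {"tool": "call_api", "ok": True},
    {"tool": "call_api", "ok": False},
    {"tool": "search", "ok": True},
])
# rates == {"call_api": 0.5, "search": 1.0}
```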
View analytics

Collections

Build a knowledge base

Upload documents, configure search modes, and get AI-generated answers with inline citations — all from one interface.

Create collection

Agents

AI-powered assistants

Build agents with system instructions, tools, skills, and knowledge. Attach a collection for grounded document Q&A.

Create agent

Chat

Start a conversation

Multi-turn streaming conversations with any agent. File uploads, tool visibility, and collection-backed Q&A built in.

Open chat
API Keys

Create New Key

Name your key to identify its purpose. A key is shown only once — copy it immediately after creation.
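The "shown only once" behaviour follows the standard pattern of persisting only a hash of the key; this sketch (including the `gr_` prefix) shows the idea, not the platform's actual scheme:

```python
import hashlib
import secrets

def create_key() -> tuple[str, str]:
    """Generate an API key and the SHA-256 digest the server stores.
    The plaintext is returned exactly once and never persisted, which
    is why it cannot be viewed again later."""
    plaintext = "gr_" + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, digest

def verify(presented: str, stored_digest: str) -> bool:
    # Constant-time comparison avoids leaking digest prefixes via timing.
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)

key, digest = create_key()  # show `key` to the user once; store `digest`
```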

Your Keys


Tutorials
Getting Started Create a collection, upload documents, build an agent, configure tools & skills, and test in chat — with API references at every step
Advanced Skills Master template variables, auto-load, context-aware skills, and advanced patterns — includes API endpoints for each feature