Searching Your Knowledge Base

Inherent provides three search modes across every document in your workspace: semantic (vector similarity), hybrid (BM25 + vector fusion), and keyword (pure BM25). Pick the mode that matches your query shape.

Note: Searching requires an API key with search permission. See Authentication for details.

How Search Works

  1. Your query is routed to the selected mode (semantic by default).
  2. Semantic: the query is converted into a vector embedding and matched against chunk embeddings via Weaviate nearVector.
  3. Hybrid: BM25 term-frequency scoring is fused with vector similarity, weighted by alpha (default 0.7, vector-heavy).
  4. Keyword: pure BM25 -- no embeddings needed.
  5. Results are scored, ranked, and returned with the matching chunk content.

This means a natural-language query like "revenue growth in Q1" will find passages about "first quarter earnings increase" (semantic/hybrid), while an exact query like ErrRateLimitExceeded will find literal occurrences (keyword/hybrid).
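
To make the role of alpha in hybrid mode concrete, here is a minimal sketch of a weighted score blend. It assumes both scores are already normalised to the 0.0–1.0 range and that alpha itself lies in that range; the function name fuse_scores is hypothetical, and the fusion the server actually performs may differ from this linear form.

```python
def fuse_scores(vector_score: float, bm25_score: float, alpha: float = 0.7) -> float:
    """Blend a vector-similarity score with a BM25 score.

    Illustrative only: assumes both inputs are normalised to 0.0-1.0.
    alpha close to 1.0 favours the vector score, close to 0.0 the BM25 score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * bm25_score
```

With the default alpha of 0.7, a chunk that matches strongly on meaning but weakly on literal terms still ranks well, which is why the default is described as vector-heavy.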

Search Request

Send a POST request to /v1/search with a JSON body.

curl -X POST https://api.inherent.sh/v1/search \
-H "X-API-Key: $INHERENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "What was the revenue growth in Q1?",
"limit": 5,
"min_score": 0.3
}'
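
For callers who prefer Python over curl, the same request might look like the sketch below. It uses only the standard library; the helper names build_search_request and search are hypothetical, and error handling (HTTP errors, retries) is left out for brevity.

```python
import json
import urllib.request

API_URL = "https://api.inherent.sh/v1/search"


def build_search_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (mirrors the curl call)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def search(payload: dict, api_key: str, timeout: float = 10.0) -> dict:
    """Send the search request and decode the JSON response body."""
    request = build_search_request(payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (requires a real key, so it is left commented out):
# import os
# hits = search(
#     {"query": "What was the revenue growth in Q1?", "limit": 5, "min_score": 0.3},
#     os.environ["INHERENT_API_KEY"],
# )
```

Separating request construction from sending keeps the payload and headers easy to inspect in tests without touching the network.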

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | - | The search query. Maximum 1,000 characters. |
| limit | integer | No | 10 | Number of results to return. Range: 1–100. |
| min_score | float | No | 0.0 | Minimum relevance score. Range: 0.0–1.0. Results below this threshold are excluded. |
| document_ids | string[] | No | - | Restrict search to specific documents by ID. |
| search_mode | string | No | "semantic" | One of "semantic", "hybrid", or "keyword". |
| alpha | float | No | 0.7 | Hybrid fusion weight (1.0 = vector-heavy, 0.0 = keyword-heavy). Ignored unless search_mode="hybrid". |
| include_context | boolean | No | false | If true, includes neighbouring chunks in context_before / context_after. |
| context_window | integer | No | 2 | Chunks before AND after each match (0–5). Ignored when include_context=false. |
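
The documented ranges can be checked client-side before a request is sent. The following is a sketch only: build_search_payload is a hypothetical helper, rejecting an empty query is an assumption based on query being required, and the server remains the authority on validation.

```python
VALID_MODES = {"semantic", "hybrid", "keyword"}


def build_search_payload(
    query: str,
    limit: int = 10,
    min_score: float = 0.0,
    search_mode: str = "semantic",
    alpha: float = 0.7,
    document_ids=None,
    include_context: bool = False,
    context_window: int = 2,
) -> dict:
    """Validate arguments against the documented ranges and build the JSON body."""
    if not query or len(query) > 1000:
        raise ValueError("query must be 1-1,000 characters")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    if search_mode not in VALID_MODES:
        raise ValueError(f"search_mode must be one of {sorted(VALID_MODES)}")
    if not 0 <= context_window <= 5:
        raise ValueError("context_window must be between 0 and 5")

    payload = {
        "query": query,
        "limit": limit,
        "min_score": min_score,
        "search_mode": search_mode,
    }
    # alpha is ignored by the server unless the mode is hybrid, so omit it otherwise.
    if search_mode == "hybrid":
        payload["alpha"] = alpha
    if document_ids:
        payload["document_ids"] = list(document_ids)
    # context_window only matters when include_context is true.
    if include_context:
        payload["include_context"] = True
        payload["context_window"] = context_window
    return payload
```

Omitting fields the server would ignore keeps the request body small and makes logs easier to read.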

Choosing a Search Mode

The search_mode field controls which retrieval engine the server uses. The response always echoes back search_mode so you can confirm routing.

| Mode | Best for |
| --- | --- |
| semantic (default) | Natural-language questions, paraphrased queries, conceptual lookups |
| hybrid | Queries that mix prose and literals -- e.g. "rate limit error 429" |
| keyword | Exact-match queries -- error codes, identifiers, code snippets |

# Semantic is the default -- no need to specify search_mode
curl -X POST https://api.inherent.sh/v1/search \
-H "X-API-Key: $INHERENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "What was the revenue growth in Q1?",
"limit": 5
}'

Availability

Semantic and hybrid modes require chunk embeddings to be present in Weaviate. Keyword mode delivers full retrieval quality in all environments because it does not depend on embeddings. Semantic and hybrid quality will improve once ingestion-side chunk embeddings land (tracked as ENG-S083).
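
The mode guidance in this section can be approximated with a small client-side heuristic. This is a sketch under loose assumptions: pick_search_mode and its notion of a "literal" token (camelCase identifiers, snake_case names, numbers of three or more digits) are hypothetical and will need tuning for your own queries.

```python
import re

# Hypothetical definition of a "literal-looking" token:
# an identifier with an internal capital, a snake_case name, or a 3+ digit number.
_LITERAL = re.compile(r"[A-Za-z]+[A-Z][A-Za-z0-9]*|\w+_\w+|\d{3,}")


def pick_search_mode(query: str) -> str:
    """Guess a search_mode from the shape of the query."""
    tokens = re.findall(r"\w+", query)
    literals = [t for t in tokens if _LITERAL.fullmatch(t)]
    if not literals:
        return "semantic"  # plain prose
    if len(literals) == len(tokens):
        return "keyword"   # nothing but identifiers or codes
    return "hybrid"        # prose mixed with literals
```

Because the response echoes search_mode, a caller using a heuristic like this can log the routed mode and refine the rules over time.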

Context-aware Results

Set include_context: true to receive the chunks immediately before and after each match inside the same document. This gives your LLM coherent surrounding context without a second round-trip.

curl -X POST https://api.inherent.sh/v1/search \
-H "X-API-Key: $INHERENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "What was the revenue growth in Q1?",
"limit": 5,
"include_context": true,
"context_window": 2
}'

Each result in the response includes context_before and context_after arrays, with up to context_window chunks on each side. The top-level total_tokens field sums token_count across every returned chunk, so you can plan LLM prompt budgets without re-tokenising client-side.

{
"query": "What was the revenue growth in Q1?",
"total_results": 1,
"processing_time_ms": 52.1,
"search_mode": "semantic",
"total_tokens": 980,
"results": [
{
"chunk_id": "chk_x1y2z3",
"document_id": "doc_abc123",
"document_name": "q1-2026-revenue-report.pdf",
"content": "Revenue grew 23% year-over-year in Q1 2026...",
"score": 0.87,
"metadata": { "quarter": "Q1-2026" },
"context_before": [
{
"chunk_id": "chk_x1y2z1",
"chunk_index": 11,
"content": "The board approved the Q1 forecast in January...",
"token_count": 195
}
],
"context_after": [
{
"chunk_id": "chk_x1y2z4",
"chunk_index": 13,
"content": "Enterprise segment drove the majority of the increase...",
"token_count": 210
}
]
}
]
}

Key rules:

  • include_context: true with context_window: 0 -- context_before / context_after are present but empty ([]).
  • include_context: false (default) -- context_before / context_after are null and no extra DB query runs.
  • Arrays are clamped at document edges: a match near the start or end of a document returns fewer neighbours, and the first or last chunk gets an empty array on that side.
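
A client has to treat null and empty context arrays the same way when stitching a passage together. The sketch below shows one way to do that; assemble_passage is a hypothetical helper, and sorting by chunk_index is a defensive choice in case the arrays are not already in document order.

```python
def assemble_passage(result: dict) -> str:
    """Join neighbouring chunks and the matched chunk into one passage."""
    # `or []` covers both null (include_context=false) and empty arrays.
    before = sorted(result.get("context_before") or [], key=lambda c: c["chunk_index"])
    after = sorted(result.get("context_after") or [], key=lambda c: c["chunk_index"])
    parts = [c["content"] for c in before]
    parts.append(result["content"])
    parts.extend(c["content"] for c in after)
    return "\n".join(parts)
```

Applied to the example response above, this would return the January forecast chunk, the matched revenue chunk, and the enterprise segment chunk, in that order.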

Response

The response top level includes search_mode (echoed back) and total_tokens (sum of all chunk token counts, useful for LLM prompt budgeting):

{
"query": "What was the revenue growth in Q1?",
"total_results": 2,
"processing_time_ms": 48.3,
"search_mode": "semantic",
"total_tokens": 720,
"results": [
{
"chunk_id": "chk_x1y2z3",
"document_id": "doc_abc123",
"document_name": "q1-2026-revenue-report.pdf",
"content": "Revenue grew 23% year-over-year in Q1 2026, driven primarily by expansion in the enterprise segment...",
"score": 0.87,
"metadata": {
"department": "finance",
"quarter": "Q1-2026"
},
"context_before": null,
"context_after": null
},
{
"chunk_id": "chk_a4b5c6",
"document_id": "doc_def456",
"document_name": "board-deck-march-2026.pdf",
"content": "First quarter results exceeded projections by 8%, with revenue reaching $4.2M...",
"score": 0.72,
"metadata": {},
"context_before": null,
"context_after": null
}
]
}
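
As an illustration of how these fields might be consumed, the sketch below checks total_tokens against a prompt budget and formats the results as numbered sources. Both function names and the reserved_tokens parameter are hypothetical; the token limit itself depends on the model you call.

```python
def fits_budget(response: dict, max_prompt_tokens: int, reserved_tokens: int = 0) -> bool:
    """Return True if the returned chunks fit the prompt budget.

    reserved_tokens is room set aside for instructions and the question.
    """
    return response["total_tokens"] + reserved_tokens <= max_prompt_tokens


def format_sources(response: dict) -> str:
    """Render results as numbered sources for inclusion in an LLM prompt."""
    blocks = []
    for rank, hit in enumerate(response["results"], start=1):
        header = f"[{rank}] {hit['document_name']} (score {hit['score']:.2f})"
        blocks.append(f"{header}\n{hit['content']}")
    return "\n\n".join(blocks)
```

If fits_budget returns False, one option is to repeat the search with a smaller limit or a higher min_score rather than truncating chunks client-side.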

Tuning Search Results

Precision vs. Recall

  • Higher min_score (e.g., 0.7): Fewer results, but each one is highly relevant. Good for RAG pipelines where you need precise context.
  • Lower min_score (e.g., 0.1) or omitted: More results, including tangentially related passages. Good for exploratory search.
  • Smaller limit: Return only the top matches. Useful when feeding results into an LLM with a limited context window.
  • Larger limit: Cast a wider net. Useful when you need comprehensive coverage of a topic.
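
To get a feel for these trade-offs without issuing many requests, one approach is to run a single broad search and replay different thresholds locally. The sketch below models the documented behaviour (results below min_score are excluded, then the top limit are kept); apply_threshold is a hypothetical helper, not part of the API.

```python
def apply_threshold(results: list, min_score: float = 0.0, limit: int = 10) -> list:
    """Local model of min_score and limit applied to an existing result list."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:limit]


# Hypothetical scores from one broad search:
sample = [{"score": 0.87}, {"score": 0.25}, {"score": 0.72}]
precise = apply_threshold(sample, min_score=0.7)    # keeps 0.87 and 0.72
broad = apply_threshold(sample, min_score=0.1)      # keeps all three
top_one = apply_threshold(sample, limit=1)          # keeps only 0.87
```

Once a threshold looks right on saved results, pass it as min_score in the request so the filtering happens server-side.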

Filtering by Document

Use document_ids to restrict search to specific documents. This is useful when you know which documents are relevant and want to avoid noise from unrelated content.

curl -X POST https://api.inherent.sh/v1/search \
-H "X-API-Key: $INHERENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "quarterly revenue",
"document_ids": ["doc_abc123", "doc_def456"],
"limit": 10
}'
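
When the relevant documents are not known up front, one possible pattern is a two-pass search: run a broad query first, then restrict a follow-up query to the best-scoring documents. The helpers below are hypothetical and only build the second request body.

```python
def top_document_ids(results: list, max_documents: int = 3) -> list:
    """Return distinct document IDs, best-scoring document first."""
    ordered = []
    for hit in sorted(results, key=lambda r: r["score"], reverse=True):
        if len(ordered) >= max_documents:
            break
        if hit["document_id"] not in ordered:
            ordered.append(hit["document_id"])
    return ordered


def narrowed_payload(query: str, results: list, max_documents: int = 3, limit: int = 10) -> dict:
    """Build a follow-up request body restricted to the top documents."""
    return {
        "query": query,
        "document_ids": top_document_ids(results, max_documents),
        "limit": limit,
    }
```

The second pass trades recall for focus, so it suits follow-up questions about a document set the first pass has already identified.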

Best Practices

  • Write natural-language queries for semantic/hybrid mode. "What was the revenue in Q1?" works better than "revenue Q1" because the embedding model captures more meaning from complete sentences.
  • Use keyword or hybrid mode for exact tokens. Error codes, SDK function names, and identifiers are better served by BM25 (keyword) or fusion (hybrid) than by pure vector similarity.
  • Be specific. "How did the enterprise segment perform in Q1 2026?" returns more targeted results than "enterprise performance."
  • Start with a moderate min_score. A threshold of 0.3 is a reasonable starting point. Adjust up or down based on result quality.
  • Use include_context for RAG. Passing neighbouring chunks to your LLM gives it a coherent passage rather than an isolated fragment, which can reduce hallucination. Use total_tokens to stay within the model's context window.
  • Handle empty results gracefully. If results is empty, the query did not match any chunks above your min_score threshold. Try lowering the threshold, rephrasing the query, or switching modes.
  • Combine search with context retrieval. Use search to find the most relevant chunks, then use the context retrieval endpoints to get the full document context for your LLM.
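
The advice on empty results can be turned into a simple retry plan. The sketch below is one possible ordering, not a prescribed one: fallback_payloads is a hypothetical helper that first halves min_score and then tries the other two modes. Scores may not be directly comparable across modes, so treat the relaxed threshold as a starting point.

```python
MODES = ("semantic", "hybrid", "keyword")


def fallback_payloads(payload: dict) -> list[dict]:
    """Return progressively more permissive request bodies to try in order."""
    attempts = []
    relaxed = dict(payload)
    min_score = payload.get("min_score", 0.0)
    # Step 1: same mode, lower threshold (skipped if there is no threshold).
    if min_score > 0.0:
        relaxed["min_score"] = round(min_score / 2, 3)
        attempts.append(dict(relaxed))
    # Step 2: keep the lower threshold and try each of the other modes.
    current = payload.get("search_mode", "semantic")
    for mode in MODES:
        if mode != current:
            attempts.append({**relaxed, "search_mode": mode})
    return attempts
```

A caller would send each body in turn and stop at the first response whose results array is non-empty.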