Searching Your Knowledge Base

Inherent provides three search modes across every document in your workspace: semantic (vector similarity), hybrid (BM25 + vector fusion), and keyword (pure BM25). Pick the mode that matches your query shape.

note

Searching requires an API key with search permission. See Authentication for details.

How Search Works

Your query is routed to the selected mode (semantic by default).
Semantic: the query is converted into a vector embedding and matched against chunk embeddings via Weaviate nearVector.
Hybrid: BM25 term-frequency scoring is fused with vector similarity, weighted by alpha (default 0.7, vector-heavy).
Keyword: pure BM25 -- no embeddings needed.
Results are scored, ranked, and returned with the matching chunk content.

This means a natural-language query like "revenue growth in Q1" will find passages about "first quarter earnings increase" (semantic/hybrid), while an exact query like ErrRateLimitExceeded will find literal occurrences (keyword/hybrid).

Search Request

Send a POST request to /v1/search with a JSON body.

cURL
Python
JavaScript

curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the revenue growth in Q1?",
    "limit": 5,
    "min_score": 0.3
  }'

import requests
import os

response = requests.post(
    "https://api.inherent.sh/v1/search",
    headers={
        "X-API-Key": os.environ["INHERENT_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "query": "What was the revenue growth in Q1?",
        "limit": 5,
        "min_score": 0.3,
    },
)

results = response.json()["results"]
for r in results:
    print(f"[{r['score']:.2f}] {r['document_name']}: {r['content'][:100]}")

const response = await fetch("https://api.inherent.sh/v1/search", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.INHERENT_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: "What was the revenue growth in Q1?",
    limit: 5,
    min_score: 0.3,
  }),
});

const { results } = await response.json();
results.forEach((r) => {
  console.log(`[${r.score.toFixed(2)}] ${r.document_name}: ${r.content.slice(0, 100)}`);
});

Request Parameters

Parameter	Type	Required	Default	Description
`query`	string	Yes	—	The search query. Maximum 1,000 characters.
`limit`	integer	No	10	Number of results to return. Range: 1–100.
`min_score`	float	No	0.0	Minimum relevance score. Range: 0.0–1.0. Results below this threshold are excluded.
`document_ids`	string[]	No	—	Restrict search to specific documents by ID.
`search_mode`	string	No	`"semantic"`	One of `"semantic"`, `"hybrid"`, or `"keyword"`.
`alpha`	float	No	`0.7`	Hybrid fusion weight (`1.0` = vector-heavy, `0.0` = keyword-heavy). Ignored unless `search_mode="hybrid"`.
`include_context`	boolean	No	`false`	If `true`, includes neighbouring chunks in `context_before` / `context_after`.
`context_window`	integer	No	`2`	Chunks before AND after each match (0–5). Ignored when `include_context=false`.

Choosing a Search Mode

The search_mode field controls which retrieval engine the server uses. The response always echoes back search_mode so you can confirm routing.

Mode	Best for
`semantic` (default)	Natural-language questions, paraphrased queries, conceptual lookups
`hybrid`	Queries that mix prose and literals -- e.g. "rate limit error `429`"
`keyword`	Exact-match queries -- error codes, identifiers, code snippets

Semantic (default)
Hybrid
Keyword

# Semantic is the default -- no need to specify search_mode
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the revenue growth in Q1?",
    "limit": 5
  }'

# Hybrid: BM25 fused with vector similarity
# alpha=0.7 (default) = vector-weighted; alpha=0.0 = pure keyword
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ErrRateLimitExceeded connection reset",
    "search_mode": "hybrid",
    "alpha": 0.7,
    "limit": 5
  }'

# Pure BM25 -- great for identifiers and error codes
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ink_",
    "search_mode": "keyword",
    "limit": 10
  }'

Availability

Semantic and hybrid modes require chunk embeddings to be present in Weaviate. Keyword mode has full retrieval quality on all environments. Semantic/hybrid quality improves once ingestion-side chunk embeddings land (tracked as ENG-S083).

Context-aware Results

Set include_context: true to receive the chunks immediately before and after each match inside the same document. This gives your LLM coherent surrounding context without a second round-trip.

curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What was the revenue growth in Q1?",
    "limit": 5,
    "include_context": true,
    "context_window": 2
  }'

The response for each result includes context_before and context_after arrays (up to context_window chunks on each side), and the top-level total_tokens field sums token_count across every returned chunk so you can plan LLM prompt budgets without re-tokenising client-side.

{
  "query": "What was the revenue growth in Q1?",
  "total_results": 1,
  "processing_time_ms": 52.1,
  "search_mode": "semantic",
  "total_tokens": 980,
  "results": [
    {
      "chunk_id": "chk_x1y2z3",
      "document_id": "doc_abc123",
      "document_name": "q1-2026-revenue-report.pdf",
      "content": "Revenue grew 23% year-over-year in Q1 2026...",
      "score": 0.87,
      "metadata": { "quarter": "Q1-2026" },
      "context_before": [
        {
          "chunk_id": "chk_x1y2z1",
          "chunk_index": 11,
          "content": "The board approved the Q1 forecast in January...",
          "token_count": 195
        }
      ],
      "context_after": [
        {
          "chunk_id": "chk_x1y2z4",
          "chunk_index": 13,
          "content": "Enterprise segment drove the majority of the increase...",
          "token_count": 210
        }
      ]
    }
  ]
}

Key rules:

context_window: 0 -- arrays are present but empty ([]).
include_context: false (default) -- context_before / context_after are null and no extra DB query runs.
Arrays are clamped at document edges (first/last chunk gets an empty neighbour array on that side).

Response

The response top level includes search_mode (echoed back) and total_tokens (sum of all chunk token counts, useful for LLM prompt budgeting):

{
  "query": "What was the revenue growth in Q1?",
  "total_results": 2,
  "processing_time_ms": 48.3,
  "search_mode": "semantic",
  "total_tokens": 720,
  "results": [
    {
      "chunk_id": "chk_x1y2z3",
      "document_id": "doc_abc123",
      "document_name": "q1-2026-revenue-report.pdf",
      "content": "Revenue grew 23% year-over-year in Q1 2026, driven primarily by expansion in the enterprise segment...",
      "score": 0.87,
      "metadata": {
        "department": "finance",
        "quarter": "Q1-2026"
      },
      "context_before": null,
      "context_after": null
    },
    {
      "chunk_id": "chk_a4b5c6",
      "document_id": "doc_def456",
      "document_name": "board-deck-march-2026.pdf",
      "content": "First quarter results exceeded projections by 8%, with revenue reaching $4.2M...",
      "score": 0.72,
      "metadata": {},
      "context_before": null,
      "context_after": null
    }
  ]
}

Tuning Search Results

Precision vs. Recall

Higher min_score (e.g., 0.7): Fewer results, but each one is highly relevant. Good for RAG pipelines where you need precise context.
Lower min_score (e.g., 0.1) or omitted: More results, including tangentially related passages. Good for exploratory search.
Smaller limit: Return only the top matches. Useful when feeding results into an LLM with a limited context window.
Larger limit: Cast a wider net. Useful when you need comprehensive coverage of a topic.

Filtering by Document

Use document_ids to restrict search to specific documents. This is useful when you know which documents are relevant and want to avoid noise from unrelated content.

curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "quarterly revenue",
    "document_ids": ["doc_abc123", "doc_def456"],
    "limit": 10
  }'

Best Practices

Write natural language queries for semantic/hybrid mode. "What was the revenue in Q1?" works better than "revenue Q1" because the embedding model captures more meaning from complete sentences.
Use keyword or hybrid mode for exact tokens. Error codes, SDK function names, and identifiers are better served by BM25 (keyword) or a fusion (hybrid) than pure vector similarity.
Be specific. "How did the enterprise segment perform in Q1 2026?" returns more targeted results than "enterprise performance."
Start with a moderate min_score. A threshold of 0.3 is a reasonable starting point. Adjust up or down based on result quality.
Use include_context for RAG. Passing neighbouring chunks to your LLM reduces hallucination and provides coherent passage context. Use total_tokens to stay within the model's context window.
Handle empty results gracefully. If results is empty, the query did not match any chunks above your min_score threshold. Try lowering the threshold, rephrasing the query, or switching modes.
Combine search with context retrieval. Use search to find the most relevant chunks, then use the context retrieval endpoints to get the full document context for your LLM.

How Search Works​

Search Request​

Request Parameters​

Choosing a Search Mode​

Context-aware Results​

Response​

Tuning Search Results​

Precision vs. Recall​

Filtering by Document​

Best Practices​