
Search

Perform semantic search across all documents in your workspace.

Need auth and endpoint context first? See the API Overview and Authentication guides.

POST /v1/search

Authentication

Requires an API key with the search permission.

| Header | Value |
|---|---|
| X-API-Key | ink_live_abc123... |
| Content-Type | application/json |

Request Body

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The search query. Must be between 1 and 1000 characters. |
| limit | integer | No | 10 | Maximum number of results to return. Range: 1 to 100. |
| min_score | float | No | 0.0 | Minimum similarity score threshold. Range: 0.0 to 1.0. |
| document_ids | string[] | No | null | Restrict the search to specific documents by ID. |
| search_mode | string | No | "semantic" | One of "semantic" (vector similarity), "hybrid" (BM25 + vector fusion), or "keyword" (pure BM25). |
| alpha | float | No | 0.7 | Hybrid fusion weight: 1.0 = vector-heavy, 0.0 = keyword-heavy. Ignored unless search_mode is "hybrid". Range: 0.0 to 1.0. |
| include_context | boolean | No | false | If true, each result includes neighbouring chunks in context_before and context_after. |
| context_window | integer | No | 2 | Number of chunks returned before and after each match. Range: 0 to 5. Ignored when include_context is false. |
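
Because out-of-range values come back as 400 or 422 errors, a client may want to check the documented ranges before sending anything. The sketch below is a hypothetical Python helper (SearchRequest, validate, and to_payload are illustrative names, not part of any published SDK) that mirrors the table above and emits only the fields that differ from their defaults. It assumes the server counts query length the same way Python's len() does, which this reference does not state.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional

VALID_MODES = ("semantic", "hybrid", "keyword")


@dataclass
class SearchRequest:
    """Hypothetical client-side mirror of the POST /v1/search request body."""

    query: str
    limit: int = 10
    min_score: float = 0.0
    document_ids: Optional[List[str]] = None
    search_mode: str = "semantic"
    alpha: float = 0.7
    include_context: bool = False
    context_window: int = 2

    def validate(self) -> None:
        """Raise ValueError for values outside the ranges in the Request Body table."""
        if not 1 <= len(self.query) <= 1000:
            raise ValueError("query must be between 1 and 1000 characters")
        if not 1 <= self.limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        if not 0.0 <= self.min_score <= 1.0:
            raise ValueError("min_score must be between 0.0 and 1.0")
        if self.search_mode not in VALID_MODES:
            raise ValueError(f"search_mode must be one of {VALID_MODES}")
        if not 0.0 <= self.alpha <= 1.0:
            raise ValueError("alpha must be between 0.0 and 1.0")
        if not 0 <= self.context_window <= 5:
            raise ValueError("context_window must be between 0 and 5")

    def to_payload(self) -> Dict[str, Any]:
        """Validate, then keep only fields that differ from their defaults.

        query has no default, so it is always included.
        """
        self.validate()
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) != f.default
        }
```

The server remains the source of truth, so treat this as a fast pre-flight check rather than a replacement for handling 422 responses.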

Search Modes

Pick the mode that matches your query shape. The response echoes back search_mode so you can confirm routing.

| Mode | Engine | When to use |
|---|---|---|
| semantic (default) | Vector similarity (nearVector) over 384-dimensional embeddings | Natural-language questions, paraphrased queries, conceptual lookups |
| hybrid | BM25 fused with vector similarity, weighted by alpha | Combines both signals; use when queries mix prose and literals (e.g. "rate limit error 429") |
| keyword | Pure BM25 | Exact-match queries: error codes, identifiers, code snippets, and short tokens that semantic embeddings blur |

Availability

Semantic and hybrid modes require chunk embeddings to be present in Weaviate. In current local and dev environments, keyword mode is the only mode whose retrieval quality has been fully measured. Semantic and hybrid quality should improve once chunk embeddings are generated during ingestion (tracked as ENG-S083).
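
The "When to use" column of the mode table can be approximated with a simple check on query shape. The following is a hypothetical heuristic, not server behaviour: choose_search_mode and its regular expression are assumptions made for illustration, and given the availability note above you may prefer to force "keyword" on local/dev environments regardless.

```python
import re

# Hypothetical pattern for tokens that embeddings tend to blur: inline code,
# numeric codes (429), snake_case or prefixes (ink_), camelCase and PascalCase
# identifiers (nearVector, ErrRateLimitExceeded).
_LITERAL = re.compile(
    r"`[^`]+`"
    r"|\b\d+\b"
    r"|\b\w*_\w*\b"
    r"|\b[a-z]+[A-Z]\w*\b"
    r"|\b[A-Z][a-z]+[A-Z]\w*\b"
)


def choose_search_mode(query: str) -> str:
    """Pick a search_mode following the 'When to use' column of the mode table."""
    words = query.split()
    if len(words) <= 1:
        # A single bare token (error code, identifier, prefix): exact match.
        return "keyword"
    if _LITERAL.search(query):
        # Prose mixed with literals, e.g. "rate limit error 429".
        return "hybrid"
    # Plain natural-language question or paraphrase.
    return "semantic"
```

Because the response echoes search_mode, a client using a rule like this can log the chosen mode next to the echoed one to confirm routing.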

Context Window

Setting include_context: true enriches every result with the chunks immediately before and after the match inside the same document. This gives an LLM coherent surrounding context without a second round-trip, at the cost of extra tokens on the wire.

  • context_window: 2 (default) returns up to 2 chunks before and 2 chunks after each match, clamped at document edges.
  • context_window: 0 returns present but empty arrays ([]), signalling "context was requested, window size zero".
  • When include_context: false, context_before and context_after are null and no extra DB query runs.
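
The clamping and null-versus-empty rules above can be modelled as plain list slicing. This is a hypothetical sketch of the documented behaviour, not the server's implementation; it assumes a document's chunks sit in a list indexed by chunk_index, starting at 0 with no gaps.

```python
from typing import List, Optional, Sequence, Tuple

Context = Optional[List[str]]


def context_slices(
    chunks: Sequence[str], match_index: int, include_context: bool, window: int
) -> Tuple[Context, Context]:
    """Return (context_before, context_after) for the chunk at match_index."""
    if not include_context:
        # Context not requested: both fields are null (None).
        return None, None
    # Clamp the start at the first chunk; slicing past the end clamps automatically.
    start = max(0, match_index - window)
    before = list(chunks[start:match_index])
    after = list(chunks[match_index + 1 : match_index + 1 + window])
    # window == 0 yields two empty lists: "requested, window size zero".
    return before, after
```

With a hypothetical seven-chunk document, a match at chunk_index 5 and the default window of 2 yields chunks 3 and 4 before and only chunk 6 after, the same shape as the example response.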

The response also adds a top-level total_tokens field, the sum of token_count across every match and every context chunk, so you can plan LLM prompt budgets without re-tokenising client-side.
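
Since total_tokens covers both the matched chunks and their context, and each context chunk carries its own token_count, the token cost of the matched chunks alone can be derived by subtraction. The helpers below are a hypothetical sketch operating on the decoded JSON response; the function names are illustrative, and they assume total_tokens is present in the response.

```python
from typing import Any, Dict


def context_token_count(response: Dict[str, Any]) -> int:
    """Sum token_count over every context chunk in a decoded /v1/search response."""
    total = 0
    for result in response.get("results", []):
        for key in ("context_before", "context_after"):
            # null (None) when include_context was false; treat as no chunks.
            for chunk in result.get(key) or []:
                total += chunk["token_count"]
    return total


def match_token_count(response: Dict[str, Any]) -> int:
    """Tokens in the matched chunks alone: total_tokens minus all context tokens."""
    return response["total_tokens"] - context_token_count(response)


def fits_budget(response: Dict[str, Any], max_prompt_tokens: int) -> bool:
    """True if every match and context chunk together fits in the prompt budget."""
    return response["total_tokens"] <= max_prompt_tokens
```

If a response does not fit, lowering context_window or limit on the next request is usually cheaper than trimming text client-side.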

Code Examples

# Default semantic search with context window
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I authenticate API requests?",
    "limit": 5,
    "min_score": 0.5,
    "include_context": true,
    "context_window": 2
  }'

# Hybrid search for exact-match + semantic fusion
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ErrRateLimitExceeded",
    "search_mode": "hybrid",
    "alpha": 0.7
  }'

# Pure keyword search for literal strings
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ink_",
    "search_mode": "keyword"
  }'
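
For callers not using curl, the same request can be built with Python's standard library alone. This is a minimal sketch, not an official client: build_search_request and search are hypothetical names, and the 30-second timeout is an arbitrary choice.

```python
import json
import os
import urllib.request
from typing import Any, Dict

SEARCH_URL = "https://api.inherent.sh/v1/search"


def build_search_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build, without sending, the same POST request the curl examples issue."""
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def search(payload: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request with the key from INHERENT_API_KEY and decode the JSON reply.

    Non-2xx responses raise urllib.error.HTTPError (status in its .code attribute).
    """
    request = build_search_request(os.environ["INHERENT_API_KEY"], payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, search({"query": "ink_", "search_mode": "keyword"}) mirrors the third curl example.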

Response

Status: 200 OK

This response corresponds to the first request above (default semantic mode with include_context: true) and is abridged: total_results is 2, but only the first result is shown.

{
  "query": "How do I authenticate API requests?",
  "total_results": 2,
  "processing_time_ms": 45.2,
  "search_mode": "semantic",
  "total_tokens": 1840,
  "results": [
    {
      "chunk_id": "a1b2c3d4-0001-4000-8000-000000000001",
      "document_id": "a1b2c3d4-0001-4000-8000-000000000010",
      "document_name": "API Authentication Guide.md",
      "content": "All API requests require authentication using a Bearer token or X-API-Key header...",
      "score": 0.94,
      "metadata": {
        "chunk_index": 5,
        "category": "documentation"
      },
      "context_before": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000003",
          "chunk_index": 3,
          "content": "Before you can make any request, you must generate an API key...",
          "token_count": 210
        },
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000004",
          "chunk_index": 4,
          "content": "API keys start with `ink_` and can be scoped per workspace...",
          "token_count": 190
        }
      ],
      "context_after": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000006",
          "chunk_index": 6,
          "content": "The same header is used for Bearer tokens via OAuth...",
          "token_count": 180
        }
      ]
    }
  ]
}

Response Fields

| Field | Type | Description |
|---|---|---|
| query | string | The original search query, echoed back |
| total_results | integer | Number of results returned |
| processing_time_ms | float | Server-side query processing time, in milliseconds |
| search_mode | string | The mode the server actually used ("semantic", "hybrid", or "keyword") |
| total_tokens | integer | Sum of token_count across all match and context chunks in this response |
| results | array | Matching search results, ordered by score descending |
| results[].chunk_id | string | Unique identifier of the matched chunk |
| results[].document_id | string | ID of the parent document |
| results[].document_name | string | Name of the parent document |
| results[].content | string | Text content of the matched chunk |
| results[].score | float | Relevance score; higher is more relevant, and the scale depends on the mode |
| results[].metadata | object or null | Chunk-level metadata, including chunk_index |
| results[].context_before | array or null | Up to context_window chunks immediately before the match. null when include_context is false; [] when context_window is 0 or the match is the document's first chunk |
| results[].context_after | array or null | Up to context_window chunks immediately after the match. null when include_context is false; [] when context_window is 0 or the match is the document's last chunk |

Each ContextChunk object has chunk_id, chunk_index, content, and token_count.
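
A typed view of the response makes the null-versus-empty distinction for the context fields explicit. The dataclasses and the passage() helper below are a hypothetical client-side sketch built from the field table above; passage() assumes you want context and match joined in document order with a blank line between chunks.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ContextChunk:
    chunk_id: str
    chunk_index: int
    content: str
    token_count: int


@dataclass
class SearchResult:
    chunk_id: str
    document_id: str
    document_name: str
    content: str
    score: float
    metadata: Optional[Dict[str, Any]]
    context_before: Optional[List[ContextChunk]]  # None = context not requested
    context_after: Optional[List[ContextChunk]]


def _context(raw: Optional[List[Dict[str, Any]]]) -> Optional[List[ContextChunk]]:
    # Keep null distinct from []: only None means include_context was false.
    if raw is None:
        return None
    return [
        ContextChunk(c["chunk_id"], c["chunk_index"], c["content"], c["token_count"])
        for c in raw
    ]


def parse_results(body: Dict[str, Any]) -> List[SearchResult]:
    """Convert the decoded results array into typed objects, preserving score order."""
    return [
        SearchResult(
            chunk_id=r["chunk_id"],
            document_id=r["document_id"],
            document_name=r["document_name"],
            content=r["content"],
            score=r["score"],
            metadata=r.get("metadata"),
            context_before=_context(r.get("context_before")),
            context_after=_context(r.get("context_after")),
        )
        for r in body["results"]
    ]


def passage(result: SearchResult) -> str:
    """Join context and match in document order, one blank line between chunks."""
    parts = [c.content for c in result.context_before or []]
    parts.append(result.content)
    parts.extend(c.content for c in result.context_after or [])
    return "\n\n".join(parts)
```

Reading keys explicitly, rather than unpacking each dict with **, keeps parsing working if the server later adds fields.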

Errors

| Status | Error Type | Description |
|---|---|---|
| 400 | bad-request | Empty or missing query field |
| 401 | unauthorized | Missing or invalid API key |
| 403 | forbidden | API key does not have the search permission |
| 422 | validation-error | Request body failed validation (e.g. context_window > 5, alpha > 1.0, invalid search_mode) |
| 429 | rate-limit-exceeded | Rate limit exceeded |
| 500 | internal-error | Weaviate or PostgreSQL error. Errors propagate explicitly; there is no silent fallback between modes. |
| 503 | service-unavailable | Upstream vector or metadata store is temporarily unreachable |
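
The table suggests a split between statuses worth retrying and those that need a change to the request or key. The sketch below treats only 429 and 503 as retryable and pairs them with exponential backoff using full jitter; the status set, the base and cap values, and the names is_retryable, backoff_delays, and call_with_retries are all assumptions, not documented server guidance. 500 is left out here because the table describes it as a propagated backend error, though a client could still choose to retry it once.

```python
import random
import time
from typing import Any, Callable, Iterator, Tuple

# Assumption: only these statuses from the Errors table are treated as transient.
RETRYABLE_STATUSES = frozenset({429, 503})


def is_retryable(status: int) -> bool:
    """400, 401, 403 and 422 need a fixed request or key; 500 is left to the caller."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Yield one sleep per retry: exponential backoff with full jitter.

    Retry n (0-based) sleeps a uniform random time in [0, min(cap, base * 2**n)].
    """
    for n in range(attempts):
        yield random.uniform(0.0, min(cap, base * 2 ** n))


def call_with_retries(
    send: Callable[[], Tuple[int, Any]],
    attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a non-retryable status or retries run out."""
    status, body = send()
    for delay in backoff_delays(attempts):
        if not is_retryable(status):
            break
        sleep(delay)
        status, body = send()
    return status, body
```

Here send is any zero-argument callable returning (status, body), which keeps the retry policy independent of the HTTP library in use.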

Example Error Response

{
  "type": "https://api.inherent.sh/errors/validation-error",
  "title": "Unprocessable Entity",
  "status": 422,
  "detail": "context_window must be between 0 and 5",
  "instance": "/v1/search",
  "trace_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "timestamp": "2026-04-24T12:34:56.789Z"
}
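
The error body appears to follow the RFC 9457 Problem Details shape (type, title, status, detail, instance) with trace_id and timestamp as extension members. The hypothetical helper below turns such a body into a Python exception and keeps trace_id for support requests; SearchAPIError and parse_error are illustrative names, and the fallback for non-JSON bodies (for example from a proxy) is an assumption.

```python
import json
from typing import Optional


class SearchAPIError(Exception):
    """Hypothetical exception carrying the fields of a /v1/search error body."""

    def __init__(self, status: int, error_type: str, detail: str, trace_id: Optional[str]):
        super().__init__(f"{status} {error_type}: {detail} (trace_id={trace_id})")
        self.status = status
        self.error_type = error_type
        self.detail = detail
        self.trace_id = trace_id


def parse_error(body: bytes, http_status: int) -> SearchAPIError:
    """Build a SearchAPIError from a raw error response body."""
    try:
        problem = json.loads(body)
    except ValueError:  # also covers JSONDecodeError and UnicodeDecodeError
        problem = None
    if not isinstance(problem, dict):
        # Non-JSON or unexpected body: keep the HTTP status and raw text only.
        return SearchAPIError(http_status, "unknown", body.decode("utf-8", "replace"), None)
    # "type" is a URI whose last path segment matches the Error Type column.
    error_type = str(problem.get("type", "")).rstrip("/").rsplit("/", 1)[-1] or "unknown"
    return SearchAPIError(
        problem.get("status", http_status),
        error_type,
        problem.get("detail") or problem.get("title", ""),
        problem.get("trace_id"),
    )
```

With urllib, for instance, a handler might read `except urllib.error.HTTPError as e: raise parse_error(e.read(), e.code) from e`.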