Search
Perform semantic search across all documents in your workspace.
Need auth and endpoint context first? See the API Overview and Authentication guides.
POST /v1/search
Authentication
Requires an API key with the search permission.
| Header | Value |
|---|---|
| X-API-Key | ink_live_abc123... |
| Content-Type | application/json |
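A minimal sketch of building these two headers in Python without hardcoding the key. The helper name `build_headers` is hypothetical, and the `INHERENT_API_KEY` environment variable is borrowed from the cURL examples on this page, not from an official SDK.

```python
from __future__ import annotations

import os


def build_headers(env_var: str = "INHERENT_API_KEY") -> dict[str, str]:
    """Return the two documented request headers, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early on the client rather than waiting for a 401 from the server.
        raise RuntimeError(f"{env_var} is not set")
    return {"X-API-Key": api_key, "Content-Type": "application/json"}
```

Reading the key from the environment keeps it out of source control; the returned dictionary can be passed directly as `headers=` to an HTTP client.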
Request Body
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | -- | The search query. Must be between 1 and 1000 characters. |
| limit | integer | No | 10 | Maximum number of results to return. Range: 1--100. |
| min_score | float | No | 0.0 | Minimum similarity score threshold. Range: 0.0--1.0. |
| document_ids | string[] | No | null | Restrict search to specific documents by their IDs. |
| search_mode | string | No | "semantic" | One of "semantic" (vector similarity), "hybrid" (BM25 + vector fusion), or "keyword" (pure BM25). |
| alpha | float | No | 0.7 | Hybrid fusion weight. 1.0 = vector-heavy, 0.0 = keyword-heavy. Ignored unless search_mode="hybrid". Range: 0.0--1.0. |
| include_context | boolean | No | false | If true, each result includes neighbouring chunks in context_before / context_after. |
| context_window | integer | No | 2 | Number of chunks before AND after each match. Range: 0--5. Ignored when include_context=false. |
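The ranges in the table above can be checked on the client before a request is sent, which avoids a round-trip that would end in a 400 or 422. The following is a sketch only: `build_search_payload` is a hypothetical helper, not part of any official client, and the server remains the authority on validation.

```python
from __future__ import annotations

from typing import Any, Optional

VALID_MODES = {"semantic", "hybrid", "keyword"}


def build_search_payload(
    query: str,
    limit: int = 10,
    min_score: float = 0.0,
    document_ids: Optional[list[str]] = None,
    search_mode: str = "semantic",
    alpha: float = 0.7,
    include_context: bool = False,
    context_window: int = 2,
) -> dict[str, Any]:
    """Validate arguments against the documented ranges and return a JSON-ready body."""
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be between 1 and 1000 characters")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    if search_mode not in VALID_MODES:
        raise ValueError(f"search_mode must be one of {sorted(VALID_MODES)}")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if not 0 <= context_window <= 5:
        raise ValueError("context_window must be between 0 and 5")

    payload: dict[str, Any] = {
        "query": query,
        "limit": limit,
        "min_score": min_score,
        "search_mode": search_mode,
        "include_context": include_context,
    }
    if document_ids is not None:
        payload["document_ids"] = list(document_ids)
    # alpha is documented as ignored outside hybrid mode, so only send it there.
    if search_mode == "hybrid":
        payload["alpha"] = alpha
    # context_window is documented as ignored when include_context is false.
    if include_context:
        payload["context_window"] = context_window
    return payload
```

Omitting the ignored fields is a design choice of this sketch; sending them with their defaults should be equally valid according to the table.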
Search Modes
Pick the mode that matches your query shape. The response echoes back search_mode so you can confirm routing.
| Mode | Engine | When to use |
|---|---|---|
| semantic (default) | Vector similarity (nearVector) over 384-dim embeddings | Natural-language questions, paraphrased queries, conceptual lookups |
| hybrid | BM25 fused with vector similarity using alpha | Queries that mix prose with literals (e.g. "rate limit error 429"), where both meaning and exact tokens matter |
| keyword | Pure BM25 | Exact-match queries: error codes, identifiers, code snippets, short tokens that semantic embeddings blur |
Semantic and hybrid modes require chunk embeddings to be present in Weaviate. On current local/dev environments, keyword is the only mode whose retrieval quality has been fully measured. Semantic and hybrid quality is expected to improve once ingestion-side chunk embeddings land (tracked as ENG-S083).
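To build intuition for how alpha shifts a hybrid ranking, here is a small illustration. It assumes a plain linear blend of two scores that are already normalised to the range 0.0 to 1.0. This is an assumption made for illustration: the fusion algorithm the server actually uses is not specified on this page, and `fuse_scores` is a hypothetical function.

```python
def fuse_scores(vector_score: float, keyword_score: float, alpha: float = 0.7) -> float:
    """Illustrative linear fusion: alpha weights the vector score, (1 - alpha) the BM25 score.

    Assumes both inputs are already normalised to [0, 1]; the real server-side
    fusion may normalise and combine scores differently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


# A chunk that matches the literal token strongly but is semantically distant.
literal_hit = {"vector": 0.2, "keyword": 0.9}
# A chunk that paraphrases the query but lacks the literal token.
paraphrase_hit = {"vector": 0.9, "keyword": 0.2}

for alpha in (0.0, 0.7, 1.0):
    a = fuse_scores(literal_hit["vector"], literal_hit["keyword"], alpha)
    b = fuse_scores(paraphrase_hit["vector"], paraphrase_hit["keyword"], alpha)
    winner = "literal" if a > b else "paraphrase"
    print(f"alpha={alpha}: {winner} hit ranks first")
```

Under this assumption, lowering alpha favours the chunk containing the exact token, while raising it favours the paraphrase, which matches the "vector-heavy" and "keyword-heavy" descriptions of the parameter.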
Context Window
Setting include_context: true enriches every result with the chunks immediately before and after the match inside the same document. This gives an LLM coherent surrounding context without a second round-trip, at the cost of extra tokens on the wire.
- context_window: 2 (default) -- up to 2 chunks before + 2 chunks after each match, clamped at document edges.
- context_window: 0 -- arrays are present but empty ([]), signalling "context was requested, window size zero".
- When include_context: false, context_before / context_after are null and no extra DB query runs.
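The clamping rule can be sketched as index arithmetic. This is an illustration under two assumptions: chunk_index is zero-based and contiguous within a document, and `context_indices` is a hypothetical helper rather than the server's implementation.

```python
from __future__ import annotations


def context_indices(match_index: int, window: int, chunk_count: int) -> tuple[list[int], list[int]]:
    """Return (before, after) chunk indices around a match, clamped at document edges."""
    if not 0 <= match_index < chunk_count:
        raise ValueError("match_index must refer to a chunk in the document")
    if not 0 <= window <= 5:
        raise ValueError("window must be between 0 and 5")
    # Clamp the lower bound at the first chunk and the upper bound at the last chunk.
    before = list(range(max(0, match_index - window), match_index))
    after = list(range(match_index + 1, min(chunk_count, match_index + window + 1)))
    return before, after
```

For a match at index 5 in a seven-chunk document with a window of 2, this yields two chunks before (3 and 4) and a single chunk after (6), because the document ends there. A window of 0 yields two empty lists.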
The response also adds a top-level total_tokens field -- the sum of token_count across every match and every context chunk -- so you can plan LLM prompt budgets without re-tokenising client-side.
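A sketch of how total_tokens might be used for prompt budgeting. The helper names and the `reserved_tokens` parameter (space set aside for the system prompt and the model's answer) are hypothetical. Token counts are whatever the server reports; a given LLM's tokeniser may count differently.

```python
from __future__ import annotations

from typing import Any


def context_token_count(result: dict[str, Any]) -> int:
    """Sum token_count over one result's context chunks; null context fields count as zero."""
    chunks = (result.get("context_before") or []) + (result.get("context_after") or [])
    return sum(chunk["token_count"] for chunk in chunks)


def fits_budget(response: dict[str, Any], max_prompt_tokens: int, reserved_tokens: int = 0) -> bool:
    """Check the server-reported total_tokens against a prompt budget."""
    return response["total_tokens"] + reserved_tokens <= max_prompt_tokens
```

If a response does not fit, one option is to repeat the request with a smaller context_window or limit rather than truncating chunks on the client.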
Code Examples
cURL

```bash
# Default semantic search with context window
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I authenticate API requests?",
    "limit": 5,
    "min_score": 0.5,
    "include_context": true,
    "context_window": 2
  }'

# Hybrid search for exact-match + semantic fusion
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ErrRateLimitExceeded",
    "search_mode": "hybrid",
    "alpha": 0.7
  }'

# Pure keyword search for literal strings
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ink_",
    "search_mode": "keyword"
  }'
```
Python

```python
import requests

response = requests.post(
    "https://api.inherent.sh/v1/search",
    headers={
        "X-API-Key": "ink_live_abc123...",
        "Content-Type": "application/json",
    },
    json={
        "query": "How do I authenticate API requests?",
        "limit": 5,
        "search_mode": "hybrid",  # vector + BM25 fusion
        "alpha": 0.7,  # vector-weighted
        "include_context": True,  # grab neighbouring chunks
        "context_window": 2,
    },
    timeout=10,  # requests has no default timeout
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as results
data = response.json()

print(f"mode used: {data['search_mode']}, total tokens: {data['total_tokens']}")
for result in data["results"]:
    print(f"{result['document_name']}: {result['score']:.2f}")
    print(f"  match: {result['content'][:80]}...")
    # context_before / context_after are null when include_context is false
    for ctx in (result.get("context_before") or []):
        print(f"  before: [{ctx['chunk_index']}] {ctx['content'][:60]}...")
    for ctx in (result.get("context_after") or []):
        print(f"  after: [{ctx['chunk_index']}] {ctx['content'][:60]}...")
```
JavaScript

```javascript
const response = await fetch("https://api.inherent.sh/v1/search", {
  method: "POST",
  headers: {
    "X-API-Key": "ink_live_abc123...",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: "How do I authenticate API requests?",
    limit: 5,
    search_mode: "hybrid",
    alpha: 0.7,
    include_context: true,
    context_window: 2,
  }),
});

const data = await response.json();
console.log(`mode=${data.search_mode} tokens=${data.total_tokens}`);
data.results.forEach((result) => {
  console.log(`${result.document_name}: ${result.score}`);
  (result.context_before ?? []).forEach((c) =>
    console.log(`  before [${c.chunk_index}]: ${c.content.slice(0, 60)}...`),
  );
  (result.context_after ?? []).forEach((c) =>
    console.log(`  after [${c.chunk_index}]: ${c.content.slice(0, 60)}...`),
  );
});
```
Response
Status: 200 OK
```json
{
  "query": "How do I authenticate API requests?",
  "total_results": 1,
  "processing_time_ms": 45.2,
  "search_mode": "hybrid",
  "total_tokens": 1840,
  "results": [
    {
      "chunk_id": "a1b2c3d4-0001-4000-8000-000000000001",
      "document_id": "a1b2c3d4-0001-4000-8000-000000000010",
      "document_name": "API Authentication Guide.md",
      "content": "All API requests require authentication using a Bearer token or X-API-Key header...",
      "score": 0.94,
      "metadata": {
        "chunk_index": 5,
        "category": "documentation"
      },
      "context_before": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000003",
          "chunk_index": 3,
          "content": "Before you can make any request, you must generate an API key...",
          "token_count": 210
        },
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000004",
          "chunk_index": 4,
          "content": "API keys start with `ink_` and can be scoped per workspace...",
          "token_count": 190
        }
      ],
      "context_after": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000006",
          "chunk_index": 6,
          "content": "The same header is used for Bearer tokens via OAuth...",
          "token_count": 180
        }
      ]
    }
  ]
}
```
Response Fields
| Field | Type | Description |
|---|---|---|
| query | string | The original search query (echoed back) |
| total_results | integer | Number of results returned |
| processing_time_ms | float | Server-side query processing time in milliseconds |
| search_mode | string | The mode the server actually used ("semantic", "hybrid", or "keyword") |
| total_tokens | integer | Sum of token_count across all match and context chunks in this response |
| results | array | Array of matching search results, ordered by score descending |
| results[].chunk_id | string | Unique identifier for the matched chunk |
| results[].document_id | string | ID of the parent document |
| results[].document_name | string | Name of the parent document |
| results[].content | string | Text content of the matched chunk |
| results[].score | float | Relevance score (higher is more relevant; scale depends on mode) |
| results[].metadata | object \| null | Chunk-level metadata, including chunk_index |
| results[].context_before | array \| null | Up to context_window chunks immediately before the match (null when include_context=false, [] when context_window=0 or match is at the first chunk) |
| results[].context_after | array \| null | Up to context_window chunks immediately after the match (same nullability rules) |
Each ContextChunk object has chunk_id, chunk_index, content, and token_count.
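One way to consume these fields is to stitch a match and its context chunks back into a single passage for an LLM prompt. This is a sketch: `assemble_passage` is a hypothetical helper, and sorting by chunk_index is a defensive assumption, since the page does not state the order of chunks inside the context arrays.

```python
from __future__ import annotations

from typing import Any


def assemble_passage(result: dict[str, Any], separator: str = "\n\n") -> str:
    """Join context_before, the matched chunk, and context_after in document order."""
    # Both context fields are null when include_context is false, so fall back to [].
    before = sorted(result.get("context_before") or [], key=lambda c: c["chunk_index"])
    after = sorted(result.get("context_after") or [], key=lambda c: c["chunk_index"])
    parts = [chunk["content"] for chunk in before]
    parts.append(result["content"])
    parts.extend(chunk["content"] for chunk in after)
    return separator.join(parts)
```

Because the null case is handled, the same function works whether or not context was requested; without context it returns just the matched chunk's content.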
Errors
| Status | Error Type | Description |
|---|---|---|
| 400 | bad-request | Empty or missing query field |
| 401 | unauthorized | Missing or invalid API key |
| 403 | forbidden | API key does not have search permission |
| 422 | validation-error | Request body failed validation (e.g., context_window > 5, alpha > 1.0, invalid search_mode) |
| 429 | rate-limit-exceeded | Rate limit exceeded |
| 500 | internal-error | Weaviate or PostgreSQL error. Errors propagate explicitly -- there is no silent fallback between modes. |
| 503 | service-unavailable | Upstream vector or metadata store is temporarily unreachable |
Example Error Response
```json
{
  "type": "https://api.inherent.sh/errors/validation-error",
  "title": "Unprocessable Entity",
  "status": 422,
  "detail": "context_window must be between 0 and 5",
  "instance": "/v1/search",
  "trace_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "timestamp": "2026-04-24T12:34:56.789Z"
}
```
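A sketch of parsing an error body of this shape on the client. `ApiProblem`, `parse_problem`, and `is_retryable` are hypothetical names. Treating 429 and 503 as retryable is a client-side policy assumed here, not something this page prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Assumed client policy: only rate limiting and temporary unavailability are retried.
RETRYABLE_STATUSES = {429, 503}


@dataclass
class ApiProblem:
    status: int
    error_type: str
    title: str
    detail: str
    trace_id: str


def parse_problem(body: dict[str, Any]) -> ApiProblem:
    """Extract the fields shown in the example error response."""
    return ApiProblem(
        status=int(body["status"]),
        # The short error type is the last path segment of the "type" URL.
        error_type=body["type"].rsplit("/", 1)[-1],
        title=body.get("title", ""),
        detail=body.get("detail", ""),
        trace_id=body.get("trace_id", ""),
    )


def is_retryable(problem: ApiProblem) -> bool:
    """Return True for statuses this sketch treats as transient."""
    return problem.status in RETRYABLE_STATUSES
```

Logging trace_id alongside the detail message should make it easier to correlate a failed request with server-side logs when reporting an issue.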