Search

Perform semantic search across all documents in your workspace.

Need auth and endpoint context first? See the API Overview and Authentication guides.

POST /v1/search

Authentication

Requires an API key with the search permission.

Header	Value
`X-API-Key`	`ink_live_abc123...`
`Content-Type`	`application/json`

Request Body

Field	Type	Required	Default	Description
`query`	string	Yes	--	The search query. Must be between 1 and 1000 characters.
`limit`	integer	No	`10`	Maximum number of results to return. Range: 1 to 100.
`min_score`	float	No	`0.0`	Minimum similarity score threshold. Range: 0.0 to 1.0.
`document_ids`	string[]	No	`null`	Restrict search to specific documents by their IDs.
`search_mode`	string	No	`"semantic"`	One of `"semantic"` (vector similarity), `"hybrid"` (BM25 + vector fusion), or `"keyword"` (pure BM25).
`alpha`	float	No	`0.7`	Hybrid fusion weight. Values near `1.0` favour vector similarity; values near `0.0` favour BM25 keyword scoring. Ignored unless `search_mode="hybrid"`. Range: 0.0 to 1.0.
`include_context`	boolean	No	`false`	If `true`, each result includes neighbouring chunks in `context_before` / `context_after`.
`context_window`	integer	No	`2`	Number of chunks to include on each side of a match (before and after). Range: 0 to 5. Ignored when `include_context=false`.
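The ranges above can be mirrored client-side so that obviously invalid requests fail before a network round-trip. A minimal sketch in Python, assuming the server enforces exactly the documented ranges; `build_search_payload` is a hypothetical helper, not part of any official SDK, and the server's `422` response remains authoritative.

```python
from typing import Optional, Sequence

SEARCH_MODES = {"semantic", "hybrid", "keyword"}


def build_search_payload(
    query: str,
    limit: int = 10,
    min_score: float = 0.0,
    document_ids: Optional[Sequence[str]] = None,
    search_mode: str = "semantic",
    alpha: float = 0.7,
    include_context: bool = False,
    context_window: int = 2,
) -> dict:
    """Hypothetical helper: check arguments against the documented ranges
    and return a JSON-serialisable body for POST /v1/search."""
    if not 1 <= len(query) <= 1000:
        raise ValueError("query must be between 1 and 1000 characters")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    if search_mode not in SEARCH_MODES:
        raise ValueError(f"search_mode must be one of {sorted(SEARCH_MODES)}")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    if not 0 <= context_window <= 5:
        raise ValueError("context_window must be between 0 and 5")

    payload = {
        "query": query,
        "limit": limit,
        "min_score": min_score,
        "search_mode": search_mode,
        "include_context": include_context,
    }
    if document_ids is not None:
        payload["document_ids"] = list(document_ids)
    # The server ignores alpha outside hybrid mode and context_window when
    # context is off, so this sketch simply omits them in those cases.
    if search_mode == "hybrid":
        payload["alpha"] = alpha
    if include_context:
        payload["context_window"] = context_window
    return payload
```

Omitting the ignored fields is a stylistic choice; because the table says the server ignores them, sending them would also be harmless.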

Search Modes

Pick the mode that matches the shape of your query. The response echoes `search_mode`, so you can confirm which mode handled the request.

Mode	Engine	When to use
`semantic` (default)	Vector similarity (`nearVector`) over 384-dim embeddings	Natural-language questions, paraphrased queries, conceptual lookups
`hybrid`	BM25 fused with vector similarity, weighted by `alpha`	Queries that mix prose and literals (e.g. "rate limit error `429`"), where you want the strengths of both engines
`keyword`	Pure BM25	Exact-match queries: error codes, identifiers, code snippets, and short tokens that semantic embeddings blur
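To build intuition for `alpha`, the sketch below fuses two score lists with a convex combination of min-max normalised scores, `alpha * vector + (1 - alpha) * bm25`. This is an assumed, simplified scheme for illustration only: this page does not specify the server's fusion algorithm or how it normalises BM25 scores, so real hybrid scores and orderings may differ.

```python
from typing import Dict, List, Tuple


def min_max_normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; a set of identical scores maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(vector: Dict[str, float], bm25: Dict[str, float], alpha: float) -> List[Tuple[str, float]]:
    """Illustrative hybrid fusion: alpha=1.0 uses only vector scores,
    alpha=0.0 only BM25. A chunk missing from one ranking scores 0 there."""
    v = min_max_normalise(vector)
    k = min_max_normalise(bm25)
    fused = {
        chunk: alpha * v.get(chunk, 0.0) + (1 - alpha) * k.get(chunk, 0.0)
        for chunk in set(v) | set(k)
    }
    # Highest fused score first, mirroring "ordered by score descending"
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: chunk "a" is semantically close, chunk "b" matches the literal tokens.
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
bm25_scores = {"b": 12.0, "c": 3.0}
for alpha in (1.0, 0.7, 0.0):
    print(alpha, fuse(vector_scores, bm25_scores, alpha)[0][0])
```

In this toy data the top chunk is `a` at `alpha=1.0` and `0.7`, and `b` at `0.0`, which shows how lowering `alpha` lets literal matches overtake semantic ones.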

Availability

Semantic and hybrid modes require chunk embeddings to be present in Weaviate. In current local/dev environments, keyword mode is the only mode whose retrieval quality has been fully measured; semantic and hybrid quality should improve once ingestion-side chunk embeddings land (tracked as ENG-S083).

Context Window

Setting include_context: true enriches every result with the chunks immediately before and after the match inside the same document. This gives an LLM coherent surrounding context without a second round-trip, at the cost of extra tokens on the wire.

- `context_window: 2` (default): up to 2 chunks before and 2 chunks after each match, clamped at document edges.
- `context_window: 0`: both arrays are present but empty (`[]`), signalling that context was requested with a window size of zero.
- With `include_context: false`, `context_before` and `context_after` are `null` and no extra database query runs.
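The nullability and clamping rules above can be modelled as a small function. This is a hypothetical sketch of the documented behaviour, not server code, and it assumes chunk indices within a document are contiguous and start at 0; `chunk_count` is an assumed input, since the response does not report a document's chunk count.

```python
from typing import List, Optional, Tuple


def expected_context_indices(
    match_index: int,
    chunk_count: int,
    include_context: bool,
    context_window: int = 2,
) -> Tuple[Optional[List[int]], Optional[List[int]]]:
    """Return the chunk indices expected in (context_before, context_after)."""
    if not include_context:
        # Fields are null and the server skips the extra lookup
        return None, None
    # Clamp at the start of the document (index 0)
    before = list(range(max(0, match_index - context_window), match_index))
    # Clamp at the end of the document (last index is chunk_count - 1)
    after = list(range(match_index + 1, min(chunk_count, match_index + 1 + context_window)))
    return before, after
```

With the values from the example response (match at index 5, window 2, and a document assumed to have 7 chunks), the sketch yields `[3, 4]` before and `[6]` after, matching the one-sided clamp shown there.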

The response also includes a top-level `total_tokens` field: the sum of `token_count` across every match and every context chunk. Use it to plan LLM prompt budgets without re-tokenising client-side.
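Because every context chunk carries `chunk_index`, a client can stitch each result back into one contiguous passage before placing it in a prompt, and use `total_tokens` as a pre-check against a token budget. A minimal sketch, assuming the response shape shown under Response; `assemble_passage` and `build_prompt_context` are hypothetical helper names.

```python
def assemble_passage(result: dict) -> str:
    """Rebuild one contiguous passage: context_before, the match, then context_after.

    Works whether the context fields are null (include_context=false) or lists.
    """
    before = sorted(result.get("context_before") or [], key=lambda c: c["chunk_index"])
    after = sorted(result.get("context_after") or [], key=lambda c: c["chunk_index"])
    texts = [c["content"] for c in before] + [result["content"]] + [c["content"] for c in after]
    return "\n\n".join(texts)


def build_prompt_context(data: dict, token_budget: int) -> str:
    """Join every result's passage, refusing responses larger than the budget.

    total_tokens already counts match and context chunks, so no client-side
    tokeniser is needed; the separators added here are not counted.
    """
    if data["total_tokens"] > token_budget:
        raise ValueError(
            f"response uses {data['total_tokens']} tokens, budget is {token_budget}; "
            "try a smaller limit or context_window"
        )
    return "\n\n---\n\n".join(assemble_passage(r) for r in data["results"])
```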

Code Examples

cURL

```bash
# Default semantic search with context window
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I authenticate API requests?",
    "limit": 5,
    "min_score": 0.5,
    "include_context": true,
    "context_window": 2
  }'

# Hybrid search for exact-match + semantic fusion
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ErrRateLimitExceeded",
    "search_mode": "hybrid",
    "alpha": 0.7
  }'

# Pure keyword search for literal strings
curl -X POST https://api.inherent.sh/v1/search \
  -H "X-API-Key: $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "ink_",
    "search_mode": "keyword"
  }'
```

Python

```python
import os

import requests

response = requests.post(
    "https://api.inherent.sh/v1/search",
    headers={
        # Read the key from the environment, as in the cURL examples
        "X-API-Key": os.environ["INHERENT_API_KEY"],
        "Content-Type": "application/json",
    },
    json={
        "query": "How do I authenticate API requests?",
        "limit": 5,
        "search_mode": "hybrid",       # vector + BM25 fusion
        "alpha": 0.7,                  # vector-weighted
        "include_context": True,       # grab neighbouring chunks
        "context_window": 2,
    },
    timeout=30,                        # client-side timeout in seconds
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body as results

data = response.json()
print(f"mode used: {data['search_mode']}, total tokens: {data['total_tokens']}")
for result in data["results"]:
    print(f"{result['document_name']}: {result['score']:.2f}")
    print(f"  match:   {result['content'][:80]}...")
    # context_before / context_after are null when include_context is false
    for ctx in (result.get("context_before") or []):
        print(f"  before:  [{ctx['chunk_index']}] {ctx['content'][:60]}...")
    for ctx in (result.get("context_after") or []):
        print(f"  after:   [{ctx['chunk_index']}] {ctx['content'][:60]}...")
```

JavaScript

```javascript
// Node.js 18+ (global fetch; top-level await requires an ES module)
const response = await fetch("https://api.inherent.sh/v1/search", {
  method: "POST",
  headers: {
    "X-API-Key": process.env.INHERENT_API_KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: "How do I authenticate API requests?",
    limit: 5,
    search_mode: "hybrid",
    alpha: 0.7,
    include_context: true,
    context_window: 2,
  }),
});

// fetch only rejects on network failure, so check HTTP errors explicitly
if (!response.ok) {
  throw new Error(`search failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
console.log(`mode=${data.search_mode} tokens=${data.total_tokens}`);
data.results.forEach((result) => {
  console.log(`${result.document_name}: ${result.score.toFixed(2)}`);
  // context arrays are null when include_context is false
  (result.context_before ?? []).forEach((c) =>
    console.log(`  before [${c.chunk_index}]: ${c.content.slice(0, 60)}...`),
  );
  (result.context_after ?? []).forEach((c) =>
    console.log(`  after  [${c.chunk_index}]: ${c.content.slice(0, 60)}...`),
  );
});
```

Response

Status: 200 OK

The example below is abbreviated: `total_results` is 2, but only the first result is shown.

```json
{
  "query": "How do I authenticate API requests?",
  "total_results": 2,
  "processing_time_ms": 45.2,
  "search_mode": "hybrid",
  "total_tokens": 1840,
  "results": [
    {
      "chunk_id": "a1b2c3d4-0001-4000-8000-000000000001",
      "document_id": "a1b2c3d4-0001-4000-8000-000000000010",
      "document_name": "API Authentication Guide.md",
      "content": "All API requests require authentication using a Bearer token or X-API-Key header...",
      "score": 0.94,
      "metadata": {
        "chunk_index": 5,
        "category": "documentation"
      },
      "context_before": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000003",
          "chunk_index": 3,
          "content": "Before you can make any request, you must generate an API key...",
          "token_count": 210
        },
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000004",
          "chunk_index": 4,
          "content": "API keys start with `ink_` and can be scoped per workspace...",
          "token_count": 190
        }
      ],
      "context_after": [
        {
          "chunk_id": "a1b2c3d4-0001-4000-8000-000000000006",
          "chunk_index": 6,
          "content": "The same header is used for Bearer tokens via OAuth...",
          "token_count": 180
        }
      ]
    }
  ]
}
```

Response Fields

Field	Type	Description
`query`	string	The original search query (echoed back)
`total_results`	integer	Number of results returned
`processing_time_ms`	float	Server-side query processing time in milliseconds
`search_mode`	string	The mode the server actually used (`"semantic"`, `"hybrid"`, or `"keyword"`)
`total_tokens`	integer	Sum of `token_count` across all match and context chunks in this response
`results`	array	Array of matching search results, ordered by score descending
`results[].chunk_id`	string	Unique identifier for the matched chunk
`results[].document_id`	string	ID of the parent document
`results[].document_name`	string	Name of the parent document
`results[].content`	string	Text content of the matched chunk
`results[].score`	float	Relevance score (higher is more relevant; scale depends on mode)
`results[].metadata`	object | null	Chunk-level metadata, including `chunk_index`
`results[].context_before`	array | null	Up to `context_window` chunks immediately before the match (`null` when `include_context=false`; `[]` when `context_window=0` or the match is the document's first chunk)
`results[].context_after`	array | null	Up to `context_window` chunks immediately after the match (same nullability rules; `[]` when the match is the document's last chunk)

Each `ContextChunk` object in `context_before` / `context_after` contains `chunk_id` (string), `chunk_index` (integer), `content` (string), and `token_count` (integer).
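For typed client code, the response can be mapped onto small data classes. The sketch below is one assumed way a client might model the documented fields (the class and function names are hypothetical); the detail worth copying is keeping `None` and `[]` distinct, since they mean different things for the context fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ContextChunk:
    chunk_id: str
    chunk_index: int
    content: str
    token_count: int


@dataclass(frozen=True)
class SearchResult:
    chunk_id: str
    document_id: str
    document_name: str
    content: str
    score: float
    metadata: Optional[Dict[str, Any]]
    context_before: Optional[List[ContextChunk]]  # None: context was not requested
    context_after: Optional[List[ContextChunk]]


def _parse_chunks(raw: Optional[List[dict]]) -> Optional[List[ContextChunk]]:
    # Keep null and [] distinct: null = include_context=false, [] = empty window
    if raw is None:
        return None
    return [
        ContextChunk(
            chunk_id=c["chunk_id"],
            chunk_index=c["chunk_index"],
            content=c["content"],
            token_count=c["token_count"],
        )
        for c in raw
    ]


def parse_results(data: dict) -> List[SearchResult]:
    """Convert the raw `results` array into typed objects, preserving server order."""
    return [
        SearchResult(
            chunk_id=r["chunk_id"],
            document_id=r["document_id"],
            document_name=r["document_name"],
            content=r["content"],
            score=float(r["score"]),
            metadata=r.get("metadata"),
            context_before=_parse_chunks(r.get("context_before")),
            context_after=_parse_chunks(r.get("context_after")),
        )
        for r in data["results"]
    ]
```

Reading only the documented keys, rather than unpacking whole dictionaries, keeps the parser tolerant of any extra fields the server might add later.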

Errors

Status	Error Type	Description
`400`	`bad-request`	Empty or missing `query` field
`401`	`unauthorized`	Missing or invalid API key
`403`	`forbidden`	API key does not have `search` permission
`422`	`validation-error`	Request body failed validation (e.g., `context_window > 5`, `alpha > 1.0`, invalid `search_mode`)
`429`	`rate-limit-exceeded`	Rate limit exceeded
`500`	`internal-error`	Weaviate or PostgreSQL error. Errors propagate explicitly; there is no silent fallback between modes.
`503`	`service-unavailable`	Upstream vector or metadata store is temporarily unreachable
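Of the errors above, `429` and `503` describe conditions that may clear on their own, while the other `4xx` responses will fail the same way if repeated. A minimal retry sketch under that assumption, using exponential backoff with full jitter; `send` stands in for whatever function performs the HTTP call, and the attempt count and delays are arbitrary. This page does not say whether the API sends a `Retry-After` header, so the sketch does not rely on one.

```python
import random
import time
from typing import Any, Callable, Tuple

# Assumption: only these statuses are worth retrying (see the Errors table).
RETRYABLE_STATUSES = {429, 503}


def search_with_retry(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call send() until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable returning (status_code, parsed_body).
    At least one attempt is always made; the last response is returned as-is.
    """
    attempt = 1
    while True:
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            return status, body
        # Full jitter: wait a random time between 0 and base_delay * 2^(attempt - 1) seconds
        sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
        attempt += 1
```

In practice `send` could wrap the `requests.post` call from the Python example and return `(response.status_code, response.json())`; injecting `sleep` keeps the function testable without real delays. The sketch deliberately does not retry `500`, since the table describes it as an explicitly propagated backend error and does not say whether it is transient.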

Example Error Response

```json
{
  "type": "https://api.inherent.sh/errors/validation-error",
  "title": "Unprocessable Entity",
  "status": 422,
  "detail": "context_window must be between 0 and 5",
  "instance": "/v1/search",
  "trace_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "timestamp": "2026-04-24T12:34:56.789Z"
}
```
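The error body appears to follow the Problem Details layout standardised in RFC 9457 (`type`, `title`, `status`, `detail`, `instance`), extended with `trace_id` and `timestamp`. A minimal sketch of turning it into a Python exception; `SearchAPIError` is a hypothetical name, and deriving the error slug from the last path segment of `type` assumes every `type` URL follows the pattern in the example above.

```python
from typing import Optional


class SearchAPIError(Exception):
    """Hypothetical client-side exception built from a problem-details error body."""

    def __init__(self, status: int, error_type: str, detail: str, trace_id: Optional[str] = None):
        super().__init__(f"{status} {error_type}: {detail}")
        self.status = status
        self.error_type = error_type
        self.detail = detail
        # Keep for debugging; presumably correlates with server-side logs
        self.trace_id = trace_id

    @classmethod
    def from_problem(cls, body: dict) -> "SearchAPIError":
        # Assumption: the last path segment of `type` is the slug from the Error Type column
        slug = body.get("type", "").rstrip("/").rsplit("/", 1)[-1]
        return cls(
            status=int(body.get("status", 0)),
            error_type=slug,
            detail=body.get("detail", body.get("title", "")),
            trace_id=body.get("trace_id"),
        )
```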
