Retrieval

Retrieval is how you search your knowledge base and get relevant context for your AI applications.

Perform a semantic search:

curl -X POST https://api.inherent.systems/v1/search \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I authenticate API requests?",
    "limit": 5
  }'

Response:

{
  "chunks": [
    {
      "id": "chunk_abc123",
      "content": "Include your API key in the Authorization header...",
      "score": 0.94,
      "document_id": "doc_xyz789",
      "metadata": {
        "title": "Authentication Guide"
      }
    }
  ],
  "query_id": "qry_def456"
}
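
The same request can be made from Python. The following is a minimal sketch using only the standard library; the function names `search` and `summarize` are illustrative and not part of any official client. It assumes the response has the shape shown above.

```python
import json
import os
import urllib.request

API_URL = "https://api.inherent.systems/v1/search"


def search(query, limit=5, api_key=None):
    """POST a search query and return the decoded JSON response."""
    api_key = api_key or os.environ["INHERENT_API_KEY"]
    request = urllib.request.Request(
        API_URL,
        data=json.dumps({"query": query, "limit": limit}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(result):
    """Return (score, title, content) tuples, highest score first."""
    rows = [
        (chunk["score"], chunk.get("metadata", {}).get("title"), chunk["content"])
        for chunk in result["chunks"]
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)
```

Calling `summarize(search("How do I authenticate API requests?"))` would give one tuple per returned chunk. The title is `None` when a chunk carries no metadata.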

Search Parameters

Parameter         Type     Default   Description
query             string   required  Your search query
limit             integer  10        Max results (1-100)
threshold         float    0.0       Min similarity score (0-1)
filter            object   null      Metadata filters
include_metadata  boolean  true      Include metadata in results
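
To catch out-of-range values before sending a request, the parameters can be validated on the client. This sketch uses a hypothetical helper, `build_search_payload`, that applies the defaults and ranges from the table. How the API itself responds to invalid values is not covered here.

```python
def build_search_payload(query, limit=10, threshold=0.0, filter=None,
                         include_metadata=True):
    """Build a /search request body, enforcing the documented ranges."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    payload = {
        "query": query,
        "limit": limit,
        "threshold": threshold,
        "include_metadata": include_metadata,
    }
    # Leave the filter out entirely when none is given (documented default: null).
    if filter is not None:
        payload["filter"] = filter
    return payload
```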

Filtering by Metadata

Filter search results by metadata values:

{
  "query": "authentication",
  "filter": {
    "category": "documentation",
    "version": "2.0"
  }
}

Filter Operators

{
  "query": "API endpoints",
  "filter": {
    "category": {"$eq": "api"},
    "version": {"$gte": "2.0"},
    "tags": {"$in": ["api", "reference"]},
    "deprecated": {"$ne": true}
  }
}

Operator   Description
$eq        Equals
$ne        Not equals
$gt, $gte  Greater than (or equal)
$lt, $lte  Less than (or equal)
$in        Value in array
$nin       Value not in array
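
To make the operator semantics concrete, here is a client-side sketch that evaluates a filter against a single metadata dictionary. It illustrates the table above and is not the service's implementation. The `matches` function is hypothetical, it handles scalar metadata values only, and it assumes that a missing field satisfies only `$ne` and `$nin`.

```python
import operator

_OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, filter):
    """Return True if every condition in the filter holds for the metadata."""
    for field, condition in filter.items():
        # A bare value such as {"category": "api"} is shorthand for $eq.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        if field not in metadata:
            # Assumption: an absent field can only satisfy negative operators.
            if all(op in ("$ne", "$nin") for op in condition):
                continue
            return False
        value = metadata[field]
        for op, operand in condition.items():
            if not _OPERATORS[op](value, operand):
                return False
    return True
```

In this sketch, string values such as "2.0" are compared lexicographically, so "10.0" would sort before "2.0". Whether the API compares version strings the same way is not stated on this page.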

Hybrid Search

Combine semantic search with keyword matching:

{
  "query": "OAuth2 authentication flow",
  "hybrid": {
    "enabled": true,
    "alpha": 0.7
  }
}

The alpha parameter controls the balance:

  • 1.0 = Pure semantic search
  • 0.0 = Pure keyword search
  • 0.7 = 70% semantic, 30% keyword (recommended)
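
One way to read the weighting is as a linear blend of two scores. The sketch below assumes that interpretation and assumes both scores are already on a 0-1 scale. The page does not specify how the service combines or normalizes scores, so `blend_scores` is a hypothetical illustration only.

```python
def blend_scores(semantic, keyword, alpha=0.7):
    """Weight a semantic score by alpha and a keyword score by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword
```

With a semantic score of 0.9 and a keyword score of 0.5, alpha = 1.0 keeps the semantic score unchanged, alpha = 0.0 keeps the keyword score unchanged, and alpha = 0.7 lands between the two, closer to the semantic score.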

Getting Full Documents

Retrieve a complete document:

curl https://api.inherent.systems/v1/documents/doc_abc123 \
  -H "Authorization: Bearer $INHERENT_API_KEY"

Get all chunks from a document:

curl https://api.inherent.systems/v1/documents/doc_abc123/chunks \
  -H "Authorization: Bearer $INHERENT_API_KEY"
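
Both document endpoints can be called from Python in the same way. This sketch builds the two GET requests with the standard library. The `document_request` helper is hypothetical, and the response bodies are not parsed because their schema is not shown on this page.

```python
import os
import urllib.request

BASE_URL = "https://api.inherent.systems/v1"


def document_request(document_id, chunks=False, api_key=None):
    """Build a GET request for a document, or for its chunks if chunks=True."""
    api_key = api_key or os.environ["INHERENT_API_KEY"]
    url = f"{BASE_URL}/documents/{document_id}"
    if chunks:
        url += "/chunks"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
```

Pass the result to `urllib.request.urlopen(...)` to send the request, for example `urllib.request.urlopen(document_request("doc_abc123", chunks=True))`.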

Context Window Management

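When building prompts for LLMs, keep the retrieved context within a token budget. The function below requests 20 candidate chunks and adds them in the order returned until the next one would exceed max_tokens: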

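import os

import requests
import tiktoken

# Base URL and API key as used in the curl examples above.
base_url = "https://api.inherent.systems/v1"
api_key = os.environ["INHERENT_API_KEY"]


def get_context(query, max_tokens=4000):
    # Fetch more candidates than will likely fit, then trim to the budget.
    response = requests.post(
        f"{base_url}/search",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": query, "limit": 20},
    )
    response.raise_for_status()

    chunks = response.json()["chunks"]
    context = []
    total_tokens = 0
    # cl100k_base is one tokenizer; use the encoding that matches your model.
    encoder = tiktoken.get_encoding("cl100k_base")

    for chunk in chunks:
        chunk_tokens = len(encoder.encode(chunk["content"]))
        # Stop at the first chunk that would exceed the budget.
        if total_tokens + chunk_tokens > max_tokens:
            break
        context.append(chunk["content"])
        total_tokens += chunk_tokens

    return "\n\n".join(context)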
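
The budgeting step can also be separated from the HTTP call so it can be tested without network access or a tokenizer. In this sketch, `pack_chunks` is a hypothetical helper, and the default whitespace-based token count is only a rough stand-in. Pass a real counter such as `lambda text: len(encoder.encode(text))` for accurate budgets.

```python
def pack_chunks(chunks, max_tokens, count_tokens=lambda text: len(text.split())):
    """Join chunk contents in order until the token budget would be exceeded.

    Returns the joined context and the number of tokens counted.
    """
    selected = []
    used = 0
    for chunk in chunks:
        cost = count_tokens(chunk["content"])
        if used + cost > max_tokens:
            break
        selected.append(chunk["content"])
        used += cost
    return "\n\n".join(selected), used
```

Like the original loop, this stops at the first chunk that does not fit. It does not skip that chunk to try smaller ones later in the list, which keeps the context in the order the chunks were returned.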

Query Logging

All queries are logged for audit and debugging. Retrieve the logs with:

curl https://api.inherent.systems/v1/queries \
  -H "Authorization: Bearer $INHERENT_API_KEY"

Each query log includes:

  • Query text
  • Results returned
  • Latency
  • Timestamp
  • User/API key used

Best Practices

  1. Be specific - "How do I authenticate with OAuth2?" beats "authentication"
  2. Use filters - Narrow results with metadata filters
  3. Set thresholds - Use threshold: 0.7 to filter low-quality matches
  4. Limit results - Don't retrieve more than you need
  5. Cache wisely - Cache frequent queries at the application level
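
Application-level caching, as suggested in the last item, might look like the following sketch. `CachedSearch` is a hypothetical wrapper around any search function. It keys entries on the query plus its parameters and expires them after a time-to-live. The 300-second default is an arbitrary choice, not a recommendation from the API.

```python
import json
import time


class CachedSearch:
    """Wrap a search function with a small in-memory TTL cache."""

    def __init__(self, search_fn, ttl_seconds=300.0, clock=time.monotonic):
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # cache key -> (stored_at, result)

    def __call__(self, query, **params):
        # Serialize with sorted keys so that nested filters produce a stable key.
        key = json.dumps({"query": query, **params}, sort_keys=True)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        result = self._search_fn(query, **params)
        self._entries[key] = (now, result)
        return result
```

Wrap whichever function issues the search request, for example `cached = CachedSearch(my_search)`, and call `cached("authentication", limit=5)`. Repeated identical calls within the TTL return the stored result without contacting the API. The cache has no size limit, so a long-running process with many distinct queries would also need an eviction policy.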
