Retrieval
Retrieval is how you search your knowledge base and get relevant context for your AI applications.
Basic Search
Perform a semantic search:
curl -X POST https://api.inherent.systems/v1/search \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "How do I authenticate API requests?",
    "limit": 5
  }'
Response:
{
  "chunks": [
    {
      "id": "chunk_abc123",
      "content": "Include your API key in the Authorization header...",
      "score": 0.94,
      "document_id": "doc_xyz789",
      "metadata": {
        "title": "Authentication Guide"
      }
    }
  ],
  "query_id": "qry_def456"
}
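As a rough illustration of consuming this response shape, the sketch below flattens each chunk into a (title, score, content) tuple. `summarize_chunks` is a hypothetical helper, not part of any Inherent SDK. It relies only on the fields shown in the sample response and assumes that `metadata` and `title` may be missing.

```python
from typing import Any


def summarize_chunks(response: dict[str, Any]) -> list[tuple[str, float, str]]:
    """Return (title, score, content) for each chunk, highest score first."""
    rows = []
    for chunk in response.get("chunks", []):
        # Assumption: metadata can be absent, e.g. when include_metadata is false.
        title = (chunk.get("metadata") or {}).get("title", "(untitled)")
        rows.append((title, chunk["score"], chunk["content"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# The sample response from the documentation above.
sample = {
    "chunks": [
        {
            "id": "chunk_abc123",
            "content": "Include your API key in the Authorization header...",
            "score": 0.94,
            "document_id": "doc_xyz789",
            "metadata": {"title": "Authentication Guide"},
        }
    ],
    "query_id": "qry_def456",
}

for title, score, content in summarize_chunks(sample):
    print(f"{score:.2f}  {title}: {content}")
```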
Search Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Your search query |
| limit | integer | 10 | Max results (1-100) |
| threshold | float | 0.0 | Min similarity score (0-1) |
| filter | object | null | Metadata filters |
| include_metadata | boolean | true | Include metadata in results |
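The parameters in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_search_payload` and its `metadata_filter` argument are hypothetical names, and the defaults and ranges are copied from the table.

```python
from typing import Any, Optional


def build_search_payload(
    query: str,
    limit: int = 10,
    threshold: float = 0.0,
    metadata_filter: Optional[dict[str, Any]] = None,
    include_metadata: bool = True,
) -> dict[str, Any]:
    """Build a search request body, enforcing the documented ranges."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")

    payload: dict[str, Any] = {
        "query": query,
        "limit": limit,
        "threshold": threshold,
        "include_metadata": include_metadata,
    }
    # Assumption: omitting "filter" is equivalent to sending null.
    if metadata_filter is not None:
        payload["filter"] = metadata_filter
    return payload


payload = build_search_payload(
    "How do I authenticate API requests?", limit=5, threshold=0.7
)
print(payload)
```

The resulting dictionary can be passed as the JSON body of the POST request shown under Basic Search.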
Filtering by Metadata
Filter search results by metadata values:
{
  "query": "authentication",
  "filter": {
    "category": "documentation",
    "version": "2.0"
  }
}
Filter Operators
{
  "query": "API endpoints",
  "filter": {
    "category": {"$eq": "api"},
    "version": {"$gte": "2.0"},
    "tags": {"$in": ["api", "reference"]},
    "deprecated": {"$ne": true}
  }
}
| Operator | Description |
|---|---|
| $eq | Equals |
| $ne | Not equals |
| $gt, $gte | Greater than (or equal) |
| $lt, $lte | Less than (or equal) |
| $in | Value in array |
| $nin | Value not in array |
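To make the operator semantics concrete, here is a small local evaluator that applies a filter to one metadata dictionary. It is an illustrative sketch only: `matches` is a hypothetical function, and the service's real behavior may differ. In particular, it assumes that a bare value means equality, that all conditions must hold together, that a missing field is treated as `None`, and that fields hold scalar values rather than arrays.

```python
from typing import Any

# One comparison function per documented operator.
OPERATORS = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value is not None and value > target,
    "$gte": lambda value, target: value is not None and value >= target,
    "$lt": lambda value, target: value is not None and value < target,
    "$lte": lambda value, target: value is not None and value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def matches(metadata: dict[str, Any], metadata_filter: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every condition in the filter."""
    for field, condition in metadata_filter.items():
        value = metadata.get(field)
        if not isinstance(condition, dict):
            # A bare value is treated as shorthand for {"$eq": value}.
            condition = {"$eq": condition}
        for op, target in condition.items():
            if op not in OPERATORS:
                raise ValueError(f"unknown operator: {op}")
            if not OPERATORS[op](value, target):
                return False
    return True


doc_metadata = {"category": "api", "version": "2.1", "deprecated": False}
print(matches(doc_metadata, {"category": "api", "version": {"$gte": "2.0"}}))
print(matches(doc_metadata, {"category": {"$in": ["guide", "tutorial"]}}))
```

Note that this sketch compares strings lexicographically, so "10.0" sorts before "2.0". How the service orders string values such as version numbers is not stated here, so it is worth verifying before relying on range operators for versions.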
Hybrid Search
Combine semantic search with keyword matching:
{
  "query": "OAuth2 authentication flow",
  "hybrid": {
    "enabled": true,
    "alpha": 0.7
  }
}
The alpha parameter controls the balance:
- 1.0 = Pure semantic search
- 0.0 = Pure keyword search
- 0.7 = 70% semantic, 30% keyword (recommended)
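One way to read the alpha weighting is as a linear interpolation between a semantic score and a keyword score. The sketch below shows that interpretation. It is an assumption for illustration: the exact fusion method the service uses is not described here, and `blend_scores` is a hypothetical function.

```python
def blend_scores(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Weight the semantic score by alpha and the keyword score by 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * semantic + (1.0 - alpha) * keyword


# A chunk that matches well semantically but only moderately on keywords.
for alpha in (1.0, 0.7, 0.0):
    print(f"alpha={alpha}: {blend_scores(0.9, 0.5, alpha):.2f}")
```

Under this reading, lowering alpha favors chunks that contain the exact query terms, which can help with identifiers such as "OAuth2" that embeddings may treat loosely.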
Getting Full Documents
Retrieve a complete document:
curl https://api.inherent.systems/v1/documents/doc_abc123 \
  -H "Authorization: Bearer $INHERENT_API_KEY"
Get all chunks from a document:
curl https://api.inherent.systems/v1/documents/doc_abc123/chunks \
  -H "Authorization: Bearer $INHERENT_API_KEY"
Context Window Management
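When building prompts for LLMs, keep the retrieved context within a token budget. The function below requests up to 20 chunks and adds them in ranked order until the next chunk would exceed max_tokens: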
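import os

import requests
import tiktoken

# Base URL and key variable follow the curl examples above.
base_url = "https://api.inherent.systems/v1"
api_key = os.environ["INHERENT_API_KEY"]


def get_context(query, max_tokens=4000):
    """Return the top-ranked chunks that fit in max_tokens, separated by blank lines."""
    response = requests.post(
        f"{base_url}/search",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": query, "limit": 20},
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on HTTP errors
    chunks = response.json()["chunks"]

    context = []
    total_tokens = 0
    encoder = tiktoken.get_encoding("cl100k_base")

    for chunk in chunks:
        chunk_tokens = len(encoder.encode(chunk["content"]))
        # Stop at the first chunk that would exceed the budget.
        if total_tokens + chunk_tokens > max_tokens:
            break
        context.append(chunk["content"])
        total_tokens += chunk_tokens

    return "\n\n".join(context)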
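The function above stops at the first chunk that does not fit. A variation is to skip an oversized chunk and keep looking for smaller ones further down the ranking. The sketch below separates that budgeting logic from the network call so it can be tested offline. `pack_chunks` and `word_count` are hypothetical names, and the whitespace word count is only a stand-in for a real tokenizer.

```python
from typing import Callable


def pack_chunks(
    texts: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> list[str]:
    """Greedily keep texts, in ranked order, that fit within the token budget."""
    kept = []
    used = 0
    for text in texts:
        cost = count_tokens(text)
        if used + cost > max_tokens:
            continue  # skip this one; a later, shorter chunk may still fit
        kept.append(text)
        used += cost
    return kept


def word_count(text: str) -> int:
    """Rough stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


ranked = ["one two three", "four five six seven eight", "nine ten"]
print(pack_chunks(ranked, max_tokens=5, count_tokens=word_count))
```

Skipping trades strict relevance order for fuller use of the budget. If the order of evidence matters to your prompt, stopping at the first overflow may be the better choice.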
Query Logging
All queries are logged for audit and debugging:
curl https://api.inherent.systems/v1/queries \
  -H "Authorization: Bearer $INHERENT_API_KEY"
Each query log includes:
- Query text
- Results returned
- Latency
- Timestamp
- User/API key used
Best Practices
- Be specific - "How do I authenticate with OAuth2?" beats "authentication"
- Use filters - Narrow results with metadata filters
- Set thresholds - Use threshold: 0.7 to filter low-quality matches
- Limit results - Don't retrieve more than you need
- Cache wisely - Cache frequent queries at the application level
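For the caching advice above, a simple in-process cache with a time-to-live is one option. This is a minimal sketch under stated assumptions: `QueryCache` is a hypothetical class, the wrapped `search_fn` stands for whatever function performs your search request, and collapsing whitespace in the cache key assumes that extra spaces do not change results.

```python
import time
from typing import Any, Callable


class QueryCache:
    """Cache search results per query string for a fixed number of seconds."""

    def __init__(
        self,
        search_fn: Callable[[str], Any],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search_fn = search_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def search(self, query: str) -> Any:
        key = " ".join(query.split())  # collapse repeated whitespace
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from cache
        result = self._search_fn(query)
        self._entries[key] = (now, result)
        return result


calls = []


def fake_search(query: str) -> dict:
    """Stand-in for a real search call; records each invocation."""
    calls.append(query)
    return {"chunks": [], "query_id": f"qry_{len(calls)}"}


cache = QueryCache(fake_search, ttl_seconds=60)
cache.search("How do I authenticate?")
cache.search("How do I  authenticate?")  # same key after whitespace collapse
print(len(calls))
```

Choose a time-to-live shorter than the interval at which your knowledge base changes, since cached results will not reflect newly ingested documents until they expire.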