Ingestion
Ingestion is the process of adding documents to your Inherent knowledge base. This guide covers all ingestion methods.
Ingestion Methods
| Method | Best For |
|---|---|
| Direct Text Upload | Text content, small files |
| File Upload | PDFs, Word docs, code files |
| URL Import | Web pages, public documents |
| Integrations | GitHub, Notion, Confluence |
Direct Text Upload
The simplest way to add content:
curl -X POST https://api.inherent.systems/v1/documents \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your document content here...",
    "metadata": {
      "title": "Document Title",
      "category": "documentation"
    }
  }'
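For callers who prefer a script to curl, the same request can be sketched in Python with only the standard library. The endpoint, headers, and body fields are taken from the curl example above; the helper name `build_document_request` is hypothetical, and the function only builds the request so it can be inspected before anything is sent.

```python
import json
import os
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.inherent.systems/v1"


def build_document_request(content, metadata=None, api_key=None):
    """Build (but do not send) the POST /documents request.

    Hypothetical helper: the name and signature are illustrative only.
    """
    api_key = api_key or os.environ.get("INHERENT_API_KEY", "")
    body = {"content": content}
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{API_BASE}/documents",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_document_request(
        "Your document content here...",
        {"title": "Document Title", "category": "documentation"},
    )
    # Sending requires a valid API key and network access:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
    print(req.get_method(), req.full_url)
```

Keeping construction separate from sending makes it easy to log or unit-test the payload without calling the API.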
File Upload
Upload files directly (PDF, DOCX, TXT, MD, code files):
curl -X POST https://api.inherent.systems/v1/documents/upload \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "metadata={\"title\": \"My PDF\"}"
Supported File Types
| Type | Extensions | Max Size |
|---|---|---|
| Text | .txt, .md, .rst | 10 MB |
| Documents | .pdf, .docx, .doc | 50 MB |
| Code | .py, .js, .ts, .go, etc. | 10 MB |
| Data | .json, .yaml, .csv | 10 MB |
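A client-side check against the table above can catch oversized or unsupported files before an upload is attempted. The following is a minimal sketch, not part of the Inherent API: the name `check_upload` is hypothetical, "MB" is assumed to mean 2**20 bytes, and the Code row is incomplete because the table itself ends in "etc.".

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the table's "MB" is treated as 2**20 bytes

# Limits copied from the table above. The Code row ends in "etc." in the
# source, so only the extensions it names are listed here.
SIZE_LIMITS = {
    ".txt": 10 * MB, ".md": 10 * MB, ".rst": 10 * MB,
    ".pdf": 50 * MB, ".docx": 50 * MB, ".doc": 50 * MB,
    ".py": 10 * MB, ".js": 10 * MB, ".ts": 10 * MB, ".go": 10 * MB,
    ".json": 10 * MB, ".yaml": 10 * MB, ".csv": 10 * MB,
}


def check_upload(filename, size_bytes):
    """Return (ok, reason) for a file name and size, using the local table."""
    ext = Path(filename).suffix.lower()
    limit = SIZE_LIMITS.get(ext)
    if limit is None:
        return False, f"extension {ext!r} is not in the local table"
    if size_bytes > limit:
        return False, f"{size_bytes} bytes exceeds the {limit // MB} MB limit for {ext}"
    return True, "ok"
```

For a file on disk, `check_upload(path, os.path.getsize(path))` would supply the size. The server remains the authority on what it accepts; this check only avoids obviously doomed uploads.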
URL Import
Import content from public URLs:
curl -X POST https://api.inherent.systems/v1/documents/url \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs/getting-started",
    "metadata": {
      "source": "website"
    }
  }'
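Because URL import expects a public URL, it can help to reject relative paths and non-web schemes before building the request body. This is a small sketch under assumptions: `build_url_import_body` is a hypothetical helper, and restricting input to `http` and `https` is a guess at what "public URL" means, not a documented rule.

```python
from urllib.parse import urlparse


def build_url_import_body(url, metadata=None):
    """Return the JSON body for POST /documents/url after a basic URL check."""
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs count as importable.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    body = {"url": url}
    if metadata:
        body["metadata"] = metadata
    return body
```

The returned dictionary matches the shape of the `-d` payload in the curl example and can be serialized with `json.dumps`.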
Chunking Strategies
Inherent automatically chunks documents for optimal retrieval. Choose a strategy based on your content:
| Strategy | Best For | Description |
|---|---|---|
| tokens | General | Fixed token windows (default: 512 tokens) |
| paragraphs | Prose | Splits on paragraph boundaries |
| headings | Documentation | Splits on markdown/HTML headings |
Specify the chunking strategy in the request body:
{
  "content": "...",
  "chunking": {
    "strategy": "headings",
    "max_tokens": 1024,
    "overlap": 50
  }
}
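To build intuition for how `max_tokens` and `overlap` interact, here is an illustrative fixed-window chunker. It is not Inherent's implementation: whitespace-separated words stand in for real tokens, `overlap` is assumed to be a token count shared by consecutive windows, and the function name `chunk_tokens` is hypothetical. The defaults mirror the 512-token default from the table and the `overlap` value from the example payload.

```python
def chunk_tokens(text, max_tokens=512, overlap=50):
    """Split text into fixed windows of at most max_tokens "tokens".

    Assumption: tokens are whitespace-separated words, and each window
    repeats the last `overlap` tokens of the previous one.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    tokens = text.split()
    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the current window already reached the end
    return chunks
```

With ten tokens, `max_tokens=4`, and `overlap=1`, each window advances by three tokens, so consecutive chunks share exactly one token. A larger overlap keeps more context across chunk boundaries at the cost of more chunks to store and embed.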
Metadata
Add metadata to improve search and filtering:
{
  "content": "...",
  "metadata": {
    "title": "API Reference",
    "category": "documentation",
    "version": "2.0",
    "author": "engineering-team",
    "tags": ["api", "reference", "v2"]
  }
}
Reserved Metadata Fields
| Field | Type | Description |
|---|---|---|
| title | string | Document title (used in UI) |
| source | string | Origin of the document |
| created_at | datetime | Auto-set on creation |
| updated_at | datetime | Auto-set on update |
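Since `created_at` and `updated_at` are set automatically, a client may want to avoid sending its own values for them. The sketch below drops those keys before upload and reports what was dropped. How the API treats client-supplied values for these fields is not stated here, so dropping them is an assumption, and `prepare_metadata` is a hypothetical helper name.

```python
# Fields the table above marks as auto-set by the service.
AUTO_SET_FIELDS = {"created_at", "updated_at"}


def prepare_metadata(metadata):
    """Return (cleaned_metadata, dropped_field_names).

    Assumption: auto-set fields should not be sent by the client.
    The input dictionary is not modified.
    """
    cleaned = {k: v for k, v in metadata.items() if k not in AUTO_SET_FIELDS}
    dropped = sorted(set(metadata) & AUTO_SET_FIELDS)
    return cleaned, dropped
```

Returning the dropped names lets the caller log a warning instead of silently discarding data.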
Async Processing
Document ingestion is asynchronous. After uploading, the response includes a document ID and an initial status:
{
  "id": "doc_abc123",
  "status": "processing"
}
Check processing status:
curl https://api.inherent.systems/v1/documents/doc_abc123/status \
  -H "Authorization: Bearer $INHERENT_API_KEY"
Possible statuses:
- processing - Document is being chunked and embedded
- completed - Ready for search
- failed - Processing failed (check the error field)
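The three statuses above suggest a simple polling loop: keep checking until the document is `completed` or `failed`, and give up after a timeout. The following is a minimal sketch. `wait_for_document` and its `fetch_status` parameter are hypothetical; `fetch_status` stands for any caller-supplied function that performs the status request and returns the parsed JSON as a dictionary. The interval and timeout defaults are arbitrary choices, not documented recommendations.

```python
import time


def wait_for_document(doc_id, fetch_status, interval=2.0, timeout=300.0,
                      sleep=time.sleep):
    """Poll until the document is completed, failed, or the timeout passes.

    fetch_status(doc_id) must return a dict with a "status" key and,
    on failure, an "error" key (as described in the status list above).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(doc_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"{doc_id} failed: {result.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{doc_id} still {status!r} after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

Passing the fetch and sleep functions in as parameters keeps the loop testable without network access or real delays.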
Batch Ingestion
For large imports, use batch ingestion:
curl -X POST https://api.inherent.systems/v1/documents/batch \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"content": "Doc 1...", "metadata": {"title": "Doc 1"}},
      {"content": "Doc 2...", "metadata": {"title": "Doc 2"}}
    ]
  }'
Batch limits:
- Maximum 100 documents per batch
- Maximum 10 MB total payload size
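An import larger than these limits has to be split across several batch requests. One way to do that is sketched below: documents are added to the current batch until either limit would be exceeded, then a new batch starts. The helper names are hypothetical, "10 MB" is assumed to mean 10 * 2**20 bytes, and payload size is assumed to be the length of the UTF-8 encoded JSON body.

```python
import json

MAX_DOCS = 100                # from the batch limits above
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB means 10 * 2**20 bytes


def payload_size(docs):
    """Size in bytes of the JSON body {"documents": docs}."""
    return len(json.dumps({"documents": docs}).encode("utf-8"))


def split_into_batches(documents, max_docs=MAX_DOCS, max_bytes=MAX_BYTES):
    """Group documents into batches that respect both limits."""
    batches, current = [], []
    for doc in documents:
        if payload_size([doc]) > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        candidate = current + [doc]
        if len(candidate) > max_docs or payload_size(candidate) > max_bytes:
            batches.append(current)  # close the full batch
            current = [doc]          # start a new one with this document
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can be sent as the `documents` array of one request. The sketch re-serializes the batch on every step for simplicity; for very large imports, tracking a running byte count would be cheaper.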