Ingestion

Ingestion is the process of adding documents to your Inherent knowledge base. This guide covers all ingestion methods.

Ingestion Methods

| Method | Best For |
| --- | --- |
| Direct Upload | Text content, small files |
| File Upload | PDFs, Word docs, code files |
| URL Import | Web pages, public documents |
| Integrations | GitHub, Notion, Confluence |

Direct Text Upload

The simplest way to add content:

curl -X POST https://api.inherent.systems/v1/documents \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your document content here...",
    "metadata": {
      "title": "Document Title",
      "category": "documentation"
    }
  }'
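For callers who prefer a script to curl, the same request can be sketched in Python with only the standard library. The endpoint, headers, and body fields come from the curl example; the function names `build_document_request` and `create_document` are illustrative, not part of any Inherent SDK.

```python
import json
import urllib.request

API_BASE = "https://api.inherent.systems/v1"  # base URL taken from the curl example


def build_document_request(content, metadata, api_key):
    """Build (but do not send) the POST /documents request from the curl example."""
    body = json.dumps({"content": content, "metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/documents",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def create_document(content, metadata, api_key):
    """Send the request and return the decoded JSON response."""
    request = build_document_request(content, metadata, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload without touching the network.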

File Upload

Upload files directly (PDF, DOCX, TXT, MD, code files):

curl -X POST https://api.inherent.systems/v1/documents/upload \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "metadata={\"title\": \"My PDF\"}"

Supported File Types

| Type | Extensions | Max Size |
| --- | --- | --- |
| Text | .txt, .md, .rst | 10 MB |
| Documents | .pdf, .docx, .doc | 50 MB |
| Code | .py, .js, .ts, .go, etc. | 10 MB |
| Data | .json, .yaml, .csv | 10 MB |
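A client can check a file against these limits before uploading. The sketch below is a local pre-flight check only. It assumes "MB" means 1024 × 1024 bytes, and because the Code row ends in "etc.", it lists only the extensions named explicitly. The helper name `check_upload` is hypothetical.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: the table's "MB" means mebibytes

# Size limits per extension, copied from the table above. The Code row ends
# in "etc.", so only the extensions it names explicitly are listed here.
MAX_BYTES = {
    **dict.fromkeys([".txt", ".md", ".rst"], 10 * MB),
    **dict.fromkeys([".pdf", ".docx", ".doc"], 50 * MB),
    **dict.fromkeys([".py", ".js", ".ts", ".go"], 10 * MB),
    **dict.fromkeys([".json", ".yaml", ".csv"], 10 * MB),
}


def check_upload(filename, size_bytes):
    """Return None if the file looks acceptable, else a reason string."""
    extension = PurePath(filename).suffix.lower()
    limit = MAX_BYTES.get(extension)
    if limit is None:
        return f"extension {extension!r} is not in the local list"
    if size_bytes > limit:
        return f"{size_bytes} bytes exceeds the {limit} byte limit for {extension}"
    return None
```

A rejection here only means the file is outside the locally known list; the server remains the authority on what it accepts.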

URL Import

Import content from public URLs:

curl -X POST https://api.inherent.systems/v1/documents/url \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs/getting-started",
    "metadata": {
      "source": "website"
    }
  }'

Chunking Strategies

Inherent automatically chunks documents for optimal retrieval. Choose a strategy based on your content:

| Strategy | Best For | Description |
| --- | --- | --- |
| tokens | General | Fixed token windows (default: 512 tokens) |
| paragraphs | Prose | Splits on paragraph boundaries |
| headings | Documentation | Splits on markdown/HTML headings |

Specify the chunking strategy in the request body:

{
  "content": "...",
  "chunking": {
    "strategy": "headings",
    "max_tokens": 1024,
    "overlap": 50
  }
}
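To build intuition for how `max_tokens` and `overlap` interact, here is a minimal sketch of fixed-window chunking with overlap. It is not Inherent's implementation. It assumes `overlap` is measured in tokens and that each window repeats the tail of the previous one, neither of which the text above confirms, and it takes an already tokenized list as input.

```python
def chunk_tokens(tokens, max_tokens=512, overlap=0):
    """Split a token list into windows of at most max_tokens, where each
    window repeats the last `overlap` tokens of the previous one."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    if not tokens:
        return []
    step = max_tokens - overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return chunks
```

Under these assumptions, a larger overlap preserves more context across chunk boundaries at the cost of producing more chunks for the same document.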

Metadata

Add metadata to improve search and filtering:

{
  "content": "...",
  "metadata": {
    "title": "API Reference",
    "category": "documentation",
    "version": "2.0",
    "author": "engineering-team",
    "tags": ["api", "reference", "v2"]
  }
}

Reserved Metadata Fields

| Field | Type | Description |
| --- | --- | --- |
| title | string | Document title (used in UI) |
| source | string | Origin of the document |
| created_at | datetime | Auto-set on creation |
| updated_at | datetime | Auto-set on update |
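Because `created_at` and `updated_at` are set automatically, a cautious client might drop them from user-supplied metadata before sending. How the API treats a request that includes them is not stated here, so the sketch below is a defensive assumption rather than a requirement. The helper name `strip_auto_fields` is hypothetical.

```python
AUTO_SET_FIELDS = {"created_at", "updated_at"}  # listed as auto-set in the table


def strip_auto_fields(metadata):
    """Return (cleaned_metadata, dropped_keys) without mutating the input."""
    cleaned = {k: v for k, v in metadata.items() if k not in AUTO_SET_FIELDS}
    dropped = sorted(k for k in metadata if k in AUTO_SET_FIELDS)
    return cleaned, dropped
```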

Async Processing

Document ingestion is asynchronous. The upload response contains a document ID and an initial status:

{
  "id": "doc_abc123",
  "status": "processing"
}

Check processing status:

curl https://api.inherent.systems/v1/documents/doc_abc123/status \
  -H "Authorization: Bearer $INHERENT_API_KEY"

Possible statuses:

  • processing - Document is being chunked and embedded
  • completed - Ready for search
  • failed - Processing failed (check error field)
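Since a document only becomes searchable once it reaches `completed`, callers typically poll the status endpoint. The sketch below shows one way to do that. It assumes the status response is a JSON object with a `status` key and, on failure, an `error` key. The function names, the polling interval, and the timeout are illustrative choices, not documented values.

```python
import json
import time
import urllib.request

API_BASE = "https://api.inherent.systems/v1"  # base URL taken from the curl example


def fetch_status(doc_id, api_key):
    """GET /documents/{id}/status and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/documents/{doc_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_document(doc_id, fetch, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the status is completed or failed, or the timeout elapses.

    `fetch` is any callable that takes a document ID and returns a dict with
    a "status" key (assumed response shape).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(doc_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"{doc_id} failed: {result.get('error')}")
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"{doc_id} still {status!r} after {timeout}s")
        sleep(interval)
```

A typical call would be `wait_for_document("doc_abc123", lambda d: fetch_status(d, api_key))`. Passing the fetch function in as an argument keeps the loop testable without network access.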

Batch Ingestion

For large imports, use batch ingestion:

curl -X POST https://api.inherent.systems/v1/documents/batch \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"content": "Doc 1...", "metadata": {"title": "Doc 1"}},
      {"content": "Doc 2...", "metadata": {"title": "Doc 2"}}
    ]
  }'

Batch limits:

  • Maximum 100 documents per batch
  • Maximum 10 MB total payload size
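An import larger than these limits has to be split into several batch requests. The sketch below groups documents greedily so that each batch respects both limits. It reads 10 MB conservatively as 10,000,000 bytes and measures the size of the serialized JSON body, both of which are assumptions. The helper names are hypothetical.

```python
import json

MAX_DOCS = 100                  # from the batch limits above
MAX_PAYLOAD_BYTES = 10_000_000  # assumption: 10 MB read conservatively as 10^7 bytes


def payload_size(documents):
    """Size in bytes of the JSON body {"documents": [...]} encoded as UTF-8."""
    return len(json.dumps({"documents": documents}).encode("utf-8"))


def make_batches(documents, max_docs=MAX_DOCS, max_bytes=MAX_PAYLOAD_BYTES):
    """Greedily group documents so that each batch respects both limits."""
    batches, current = [], []
    for doc in documents:
        if payload_size([doc]) > max_bytes:
            raise ValueError("a single document exceeds the payload limit")
        candidate = current + [doc]
        if len(candidate) > max_docs or payload_size(candidate) > max_bytes:
            batches.append(current)  # close the full batch, start a new one
            current = [doc]
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be sent as the `documents` array of one batch request. Re-serializing the candidate batch on every step is simple rather than fast, which is acceptable at no more than 100 documents per batch.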

Next Steps