Ingestion

Ingestion is the process of adding documents to your Inherent knowledge base. This guide covers direct text upload, file upload, URL import, and batch ingestion, along with chunking, metadata, and processing status.

Ingestion Methods

Method	Best For
Direct Text Upload	Text content, small files
File Upload	PDFs, Word docs, code files
URL Import	Web pages, public documents
Integrations	GitHub, Notion, Confluence

Direct Text Upload

The simplest way to add content:

curl -X POST https://api.inherent.systems/v1/documents \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Your document content here...",
    "metadata": {
      "title": "Document Title",
      "category": "documentation"
    }
  }'
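
For callers working from Python rather than the shell, the same request can be sketched with the standard library. This is a minimal sketch, not an official client: `build_document_request` and `API_BASE` are illustrative names, and the function only builds the request so it can be inspected before anything is sent.

```python
import json
import os
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.inherent.systems/v1"


def build_document_request(content, metadata=None, api_key=None):
    """Build (but do not send) the POST /documents request from the curl example."""
    # Assumption: the key lives in INHERENT_API_KEY, as in the curl example.
    api_key = api_key or os.environ.get("INHERENT_API_KEY", "")
    body = {"content": content}
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        f"{API_BASE}/documents",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it, which requires a valid API key and network access.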

File Upload

Upload files directly (PDF, DOCX, TXT, MD, code files):

curl -X POST https://api.inherent.systems/v1/documents/upload \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -F "file=@/path/to/document.pdf" \
  -F "metadata={\"title\": \"My PDF\"}"

Supported File Types

Type	Extensions	Max Size
Text	`.txt`, `.md`, `.rst`	10 MB
Documents	`.pdf`, `.docx`, `.doc`	50 MB
Code	`.py`, `.js`, `.ts`, `.go`, etc.	10 MB
Data	`.json`, `.yaml`, `.csv`	10 MB
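
Checking a file against these limits before uploading avoids a wasted request. The helper below is a rough client-side sketch: `check_upload` and `MAX_BYTES_BY_EXT` are illustrative names, the table only includes extensions listed explicitly above (the code row ends in "etc.", so the real list is longer), and it assumes "MB" means 2**20 bytes.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the table's "MB" means 2**20 bytes

# Size limits per extension, copied from the table above.
MAX_BYTES_BY_EXT = {
    **dict.fromkeys([".txt", ".md", ".rst"], 10 * MB),
    **dict.fromkeys([".pdf", ".docx", ".doc"], 50 * MB),
    **dict.fromkeys([".py", ".js", ".ts", ".go"], 10 * MB),
    **dict.fromkeys([".json", ".yaml", ".csv"], 10 * MB),
}


def check_upload(filename, size_bytes):
    """Return None if the file passes the local checks, else a short reason string."""
    ext = Path(filename).suffix.lower()
    limit = MAX_BYTES_BY_EXT.get(ext)
    if limit is None:
        return f"extension {ext or '(none)'} is not in the local table"
    if size_bytes > limit:
        return f"{size_bytes} bytes exceeds the {limit // MB} MB limit for {ext}"
    return None
```

A `None` result only means the local checks passed; the server remains the authority on what it accepts.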

URL Import

Import content from public URLs:

curl -X POST https://api.inherent.systems/v1/documents/url \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs/getting-started",
    "metadata": {
      "source": "website"
    }
  }'

Chunking Strategies

Inherent automatically chunks documents for optimal retrieval. Choose a strategy based on your content:

Strategy	Best For	Description
`tokens`	General	Fixed token windows (default: 512 tokens)
`paragraphs`	Prose	Splits on paragraph boundaries
`headings`	Documentation	Splits on markdown/HTML headings

Specify the chunking strategy in the request body:

{
  "content": "...",
  "chunking": {
    "strategy": "headings",
    "max_tokens": 1024,
    "overlap": 50
  }
}
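
To build intuition for how `max_tokens` and `overlap` interact, the sketch below splits a token list into fixed windows. It is an illustration only, not Inherent's implementation: it assumes `overlap` is the number of tokens shared by consecutive chunks, and the usage note stands in whitespace-separated words for real tokens.

```python
def token_windows(tokens, max_tokens=512, overlap=0):
    """Split a token list into windows of at most max_tokens that share `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap  # how far each new window advances
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the input
    return windows
```

For example, `token_windows("a b c d e f g".split(), max_tokens=3, overlap=1)` yields three windows, each starting with the last token of the previous one.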

Metadata

Add metadata to improve search and filtering:

{
  "content": "...",
  "metadata": {
    "title": "API Reference",
    "category": "documentation",
    "version": "2.0",
    "author": "engineering-team",
    "tags": ["api", "reference", "v2"]
  }
}

Reserved Metadata Fields

Field	Type	Description
`title`	string	Document title (used in UI)
`source`	string	Origin of the document
`created_at`	datetime	Auto-set on creation
`updated_at`	datetime	Auto-set on update
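
Because `created_at` and `updated_at` are set automatically, a client may want to strip them from its own metadata before sending. The sketch below does that under one assumption the table does not confirm: that client-supplied values for these two fields are not meant to be sent. `prepare_metadata` and `AUTO_SET_FIELDS` are illustrative names.

```python
# Fields marked "Auto-set" in the table above.
AUTO_SET_FIELDS = {"created_at", "updated_at"}


def prepare_metadata(metadata):
    """Return (clean_metadata, dropped_keys) with the auto-set fields removed."""
    clean = {k: v for k, v in metadata.items() if k not in AUTO_SET_FIELDS}
    dropped = sorted(k for k in metadata if k in AUTO_SET_FIELDS)
    return clean, dropped
```

Returning the dropped keys lets the caller log what was removed instead of discarding it silently.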

Async Processing

Document ingestion is asynchronous. Each upload request returns a document ID and an initial status:

{
  "id": "doc_abc123",
  "status": "processing"
}

Check processing status:

curl https://api.inherent.systems/v1/documents/doc_abc123/status \
  -H "Authorization: Bearer $INHERENT_API_KEY"

Possible statuses:

`processing` - Document is being chunked and embedded
`completed` - Ready for search
`failed` - Processing failed (check the `error` field)
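
Since a document stays in `processing` for an unknown amount of time, clients typically poll the status endpoint until it changes. A minimal polling sketch follows. `wait_until_done` is an illustrative name, and `fetch_status` is a hypothetical callable you supply that performs the GET request above and returns the parsed JSON as a dict. The interval and timeout defaults are arbitrary choices, not documented values.

```python
import time


def wait_until_done(fetch_status, doc_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the document leaves "processing"; return the final status payload."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(doc_id)  # expected shape: {"status": ..., ...}
        if payload.get("status") != "processing":
            return payload  # "completed" or "failed"; the caller inspects it
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{doc_id} still processing after {timeout} seconds")
        sleep(interval)
```

Injecting `fetch_status` and `sleep` keeps the loop independent of any HTTP library and makes it easy to exercise with canned responses.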

Batch Ingestion

For large imports, send multiple documents in a single request to the batch endpoint:

curl -X POST https://api.inherent.systems/v1/documents/batch \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {"content": "Doc 1...", "metadata": {"title": "Doc 1"}},
      {"content": "Doc 2...", "metadata": {"title": "Doc 2"}}
    ]
  }'

Batch limits:

Maximum 100 documents per batch
Maximum 10 MB total payload size
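
An import larger than these limits has to be split across several batch requests. The sketch below groups documents so each group respects both limits. It assumes "10 MB" means 10 * 2**20 bytes and that payload size is the length of the JSON body as serialized by `json.dumps`. `split_into_batches` and the two constants are illustrative names.

```python
import json

MAX_BATCH_DOCS = 100                 # "Maximum 100 documents per batch"
MAX_BATCH_BYTES = 10 * 1024 * 1024   # assumption: "10 MB" means 10 * 2**20 bytes


def split_into_batches(documents, max_docs=MAX_BATCH_DOCS, max_bytes=MAX_BATCH_BYTES):
    """Group documents into lists that each fit both batch limits."""
    # Bytes used by the empty {"documents": []} wrapper.
    overhead = len(json.dumps({"documents": []}).encode("utf-8"))
    batches, current, current_bytes = [], [], overhead
    for doc in documents:
        # +2 covers the ", " separator json.dumps writes between list items,
        # which slightly overestimates the size and keeps the check conservative.
        doc_bytes = len(json.dumps(doc).encode("utf-8")) + 2
        if overhead + doc_bytes > max_bytes:
            raise ValueError("a single document exceeds the batch payload limit")
        if current and (len(current) >= max_docs or current_bytes + doc_bytes > max_bytes):
            batches.append(current)
            current, current_bytes = [], overhead
        current.append(doc)
        current_bytes += doc_bytes
    if current:
        batches.append(current)
    return batches
```

Each returned list can be sent as the `documents` array of one batch request.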
