Skip to main content

Upload from URL

Import a document from a public URL.

POST /v1/documents/url

Request Body

FieldTypeRequiredDescription
urlstringYesPublic URL to import
metadataobjectNoAdditional metadata
chunkingobjectNoChunking configuration

Example Request

curl -X POST https://api.inherent.systems/v1/documents/url \
-H "Authorization: Bearer $INHERENT_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/docs/getting-started.html",
"metadata": {
"source": "website",
"category": "documentation"
}
}'

Response

{
"id": "doc_url123xyz",
"status": "processing",
"version": 1,
"source_url": "https://example.com/docs/getting-started.html",
"metadata": {
"source": "website",
"category": "documentation",
"title": "Getting Started"
},
"created_at": "2024-01-15T10:30:00Z"
}

Supported URL Types

TypeSupportNotes
HTML pagesFullExtracts text content
Markdown filesFullRaw .md files
PDF filesFullPublic PDF URLs
Plain textFull.txt files
Protected URLsNoRequires authentication

Automatic Metadata

When importing from URLs, Inherent automatically extracts:

  • title - From page title or H1
  • source_url - Original URL
  • description - From meta description
  • fetched_at - Import timestamp

Errors

CodeDescription
400Invalid URL format
403URL is not accessible
404URL not found
415Unsupported content type
422Could not extract content
{
"error": {
"code": "url_not_accessible",
"message": "Could not fetch content from the provided URL",
"request_id": "req_abc123"
}
}