Upload from URL
Import a document from a public URL.
POST /v1/documents/url
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | Publicly accessible URL to import |
| `metadata` | object | No | Custom metadata to attach to the document |
| `chunking` | object | No | Chunking configuration |
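Only `url` is required; `metadata` and `chunking` can be left out. Below is a minimal sketch of assembling the request body in Python. It assumes the endpoint accepts only absolute `http` or `https` URLs. That scheme check is our own assumption, added to catch the `400 Invalid URL format` case before sending. `build_import_body` is a hypothetical helper, not part of an Inherent SDK. Because this page does not document the fields inside `chunking`, the helper passes that object through unchanged.

```python
from urllib.parse import urlparse


def build_import_body(url, metadata=None, chunking=None):
    """Build a JSON body for POST /v1/documents/url (hypothetical helper).

    Only `url` is required; optional fields are omitted when not given.
    """
    parsed = urlparse(url)
    # Assumption: only absolute http(s) URLs are accepted by the endpoint.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {url!r}")

    body = {"url": url}
    if metadata is not None:
        body["metadata"] = dict(metadata)
    if chunking is not None:
        # Chunking options are not specified on this page; pass them through as-is.
        body["chunking"] = chunking
    return body


body = build_import_body(
    "https://example.com/docs/getting-started.html",
    metadata={"source": "website", "category": "documentation"},
)
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.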
Example Request
```bash
curl -X POST https://api.inherent.systems/v1/documents/url \
  -H "Authorization: Bearer $INHERENT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/docs/getting-started.html",
    "metadata": {
      "source": "website",
      "category": "documentation"
    }
  }'
```
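```python
import os

import requests

# Read the API key from the same environment variable used in the cURL example
api_key = os.environ["INHERENT_API_KEY"]

response = requests.post(
    "https://api.inherent.systems/v1/documents/url",
    headers={"Authorization": f"Bearer {api_key}"},
    # json= serializes the body and sets Content-Type: application/json
    json={
        "url": "https://example.com/docs/getting-started.html",
        "metadata": {
            "source": "website",
            "category": "documentation",
        },
    },
)
response.raise_for_status()  # raise on 4xx/5xx error responses
document = response.json()
print(document["id"], document["status"])
```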
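```javascript
// Node.js: read the API key from the environment.
// Top-level await requires an ES module (or wrap this in an async function).
const apiKey = process.env.INHERENT_API_KEY;

const response = await fetch('https://api.inherent.systems/v1/documents/url', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://example.com/docs/getting-started.html',
    metadata: {
      source: 'website',
      category: 'documentation',
    },
  }),
});

if (!response.ok) {
  throw new Error(`Upload failed with status ${response.status}`);
}
const doc = await response.json();
```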
Response
```json
{
  "id": "doc_url123xyz",
  "status": "processing",
  "version": 1,
  "source_url": "https://example.com/docs/getting-started.html",
  "metadata": {
    "source": "website",
    "category": "documentation",
    "title": "Getting Started"
  },
  "created_at": "2024-01-15T10:30:00Z"
}
```
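The response fields above map naturally onto a small typed record. The following is a minimal sketch that assumes every successful response contains these six fields; `ImportedDocument` and `parse_document` are illustrative names, not part of an Inherent SDK. The example `status` of `processing` suggests that content extraction may finish after the call returns, so check `status` before relying on the document's content.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ImportedDocument:
    id: str
    status: str
    version: int
    source_url: str
    created_at: datetime
    metadata: dict = field(default_factory=dict)


def parse_document(payload):
    """Convert a decoded JSON response into an ImportedDocument."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    created = payload["created_at"].replace("Z", "+00:00")
    return ImportedDocument(
        id=payload["id"],
        status=payload["status"],
        version=payload["version"],
        source_url=payload["source_url"],
        created_at=datetime.fromisoformat(created),
        metadata=payload.get("metadata", {}),
    )


sample = {
    "id": "doc_url123xyz",
    "status": "processing",
    "version": 1,
    "source_url": "https://example.com/docs/getting-started.html",
    "metadata": {"source": "website", "category": "documentation", "title": "Getting Started"},
    "created_at": "2024-01-15T10:30:00Z",
}
doc = parse_document(sample)
```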
Supported URL Types
| Type | Support | Notes |
|---|---|---|
| HTML pages | Full | Extracts text content |
| Markdown files | Full | Raw .md files |
| PDF files | Full | Public PDF URLs |
| Plain text | Full | .txt files |
| Protected URLs | No | URLs that require authentication cannot be fetched |
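The endpoint returns `415 Unsupported content type` for formats outside this table, so a client may want to check a URL before submitting it. Below is a minimal sketch that maps a `Content-Type` header value to one of the supported types. This page does not say how Inherent detects the type, so the MIME types listed are our assumption; they are the conventional ones for each format. `classify_content_type` is a hypothetical helper.

```python
# Assumed mapping from conventional MIME types to the supported URL types above.
SUPPORTED_TYPES = {
    "text/html": "HTML pages",
    "text/markdown": "Markdown files",
    "text/x-markdown": "Markdown files",
    "application/pdf": "PDF files",
    "text/plain": "Plain text",
}


def classify_content_type(content_type):
    """Return the supported type for a Content-Type header value, or None."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return SUPPORTED_TYPES.get(mime)
```

In practice you would read the header from a `HEAD` request, for example `requests.head(url, allow_redirects=True).headers.get("Content-Type", "")`. Some servers label `.md` files as `text/plain`, and this sketch would report those files as plain text.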
Automatic Metadata
When importing from URLs, Inherent automatically extracts:
- `title`: From the page title or H1
- `source_url`: Original URL
- `description`: From the meta description
- `fetched_at`: Import timestamp
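To show what these fields correspond to in an HTML page, here is a minimal sketch built on Python's standard-library `html.parser`. It is not Inherent's extractor. It assumes that `title` prefers the `<title>` element and falls back to the first `<h1>`, that `description` comes from `<meta name="description">`, and that `fetched_at` is simply the UTC time of the call. `MetadataExtractor` and `extract_metadata` are hypothetical names.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collects <title>, the first <h1>, and <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.h1 = None
        self.description = None
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "title" and self.title is None:
            self._capture, self._buffer = "title", []
        elif tag == "h1" and self.h1 is None:
            self._capture, self._buffer = "h1", []

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = "".join(self._buffer).strip()
            setattr(self, tag, text or None)  # sets self.title or self.h1
            self._capture = None

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)


def extract_metadata(html, source_url):
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    # Assumption: fetched_at is the UTC time at which extraction runs.
    metadata = {
        "source_url": source_url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
    }
    title = parser.title or parser.h1  # assumed fallback order
    if title:
        metadata["title"] = title
    if parser.description:
        metadata["description"] = parser.description
    return metadata
```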
Errors
| HTTP status | Description |
|---|---|
| 400 | Invalid URL format |
| 403 | URL is not accessible |
| 404 | URL not found |
| 415 | Unsupported content type |
| 422 | Could not extract content |
Example error response:

```json
{
  "error": {
    "code": "url_not_accessible",
    "message": "Could not fetch content from the provided URL",
    "request_id": "req_abc123"
  }
}
```
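Below is a minimal sketch of turning these responses into a Python exception. It assumes every error body carries the `error.code`, `error.message`, and `error.request_id` fields shown above. `InherentAPIError` and `check_response` are hypothetical names. The helper takes a status code and a decoded JSON body, so it works with any HTTP client.

```python
# Status codes documented in the Errors table above.
ERROR_DESCRIPTIONS = {
    400: "Invalid URL format",
    403: "URL is not accessible",
    404: "URL not found",
    415: "Unsupported content type",
    422: "Could not extract content",
}


class InherentAPIError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status, code, message, request_id):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id


def check_response(status, body):
    """Return body unchanged for 2xx statuses; otherwise raise InherentAPIError."""
    if 200 <= status < 300:
        return body
    error = (body or {}).get("error", {})
    raise InherentAPIError(
        status=status,
        code=error.get("code", "unknown_error"),  # placeholder when no code is sent
        message=error.get("message", ERROR_DESCRIPTIONS.get(status, "Unexpected error")),
        request_id=error.get("request_id"),
    )
```

With `requests`, call `check_response(response.status_code, response.json())`. Keeping `request_id` on the exception makes it easy to include in a support request.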