MCP Tools Reference
Nagelfluh exposes a subset of its REST API as MCP (Model Context Protocol) tools via fastapi-mcp, mounted at /mcp using the Streamable HTTP transport.
Authentication
All tools require a project-scoped API key:
Authorization: Bearer apk_<key>
API keys are scoped to a single project, so no project_id selection is needed at the session level — the key carries it.
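As a minimal sketch, the header above could be built in a client like this. The helper name `auth_headers` is hypothetical, and the prefix check assumes that every project-scoped key starts with `apk_` as shown in the example header.

```python
def auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header for a project-scoped API key."""
    # Assumption: project-scoped keys always carry the "apk_" prefix.
    if not api_key.startswith("apk_"):
        raise ValueError("expected a project-scoped key starting with 'apk_'")
    return {"Authorization": f"Bearer {api_key}"}
```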
Typical Workflow
1. list_environments — discover available environments and process type names
2. get_process_type_schema — fetch the JSON Schema for the specific type you want to run
3. upload_file — upload local input data (or use request_upload_token + curl for large files)
4. create_process — submit the job; save the returned id and version
5. get_process — poll until versions[-1].state is 'done' or 'failed'
6. get_dataset — resolve output URLs from versions[-1].outputs
7. curl '{url}' — download results; /files/ URLs need no authentication
Use describe_dataset before downloading to check columns, record counts, and bounding box.
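Steps 1, 2, and 4 of the workflow can be sketched as below. This is an illustration under assumptions, not the client the project ships: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the way arguments are passed to each tool (a flat dictionary) is assumed. The check against the schema's `required` list uses the standard JSON Schema keyword.

```python
from typing import Any, Callable

# Hypothetical MCP client hook: (tool_name, arguments) -> decoded result.
ToolCall = Callable[[str, dict[str, Any]], Any]


def submit_job(
    call_tool: ToolCall,
    type_name: str,
    params: dict[str, Any],
    project_id: str,
) -> tuple[str, int]:
    """Find an environment offering type_name, check params, submit the job."""
    # Step 1: discover environments and the process types they offer.
    environments = call_tool("list_environments", {})
    env = next((e for e in environments if type_name in e["process_types"]), None)
    if env is None:
        raise LookupError(f"no environment offers process type {type_name!r}")

    # Step 2: fetch the schema for this one type and check required params.
    schema = call_tool(
        "get_process_type_schema",
        {"env_id": env["id"], "type_name": type_name},
    )
    missing = [key for key in schema.get("required", []) if key not in params]
    if missing:
        raise ValueError(f"missing required params: {missing}")

    # Step 4: submit and keep the id and version for later polling.
    created = call_tool(
        "create_process",
        {
            "project_id": project_id,
            "type": type_name,
            "environment_id": env["id"],
            "params": params,
        },
    )
    return created["id"], created["versions"][-1]["version"]
```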
Which endpoints are NOT exposed
Binary data download endpoints are excluded from MCP because they overflow LLM context windows:
- GET /dataset/{id}/data and /geography — use the url field from get_dataset + curl instead
- GET /files/{path} — auth-free, download directly with curl
- GET /uploads/{id} — use the url returned by upload_file
Processes
create_process
POST /process
Submit any type of job — data import, processing, inversion, forward modelling, etc. The job is queued and runs asynchronously in Kubernetes. Returns immediately with the process id and version number.
Retry vs. new process: If retrying a failed job or correcting parameters, pass the original id in the body to append a new version. Do NOT create a new process — that loses history. Omit id only when starting a genuinely new workflow.
Resource sizing for inversions: Never use defaults for inversions. The defaults (1 CPU, 2Gi RAM, 1 h deadline) will cause OOM-kills or deadline failures with no output produced. Set resource_requests and deadline_seconds explicitly based on dataset size.
Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | string | Yes | Project ID the job belongs to. |
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Process type key, e.g. aem_inversion. Obtain from list_environments / get_process_type_schema. |
| environment_id | string | Yes | ID of the compute environment. Obtain from list_environments. |
| params | object | No | Process-type-specific parameters defined by the process type's JSON Schema. Fields with x-format: dataset expect a file URL from search_datasets or get_dataset. |
| id | string | No | Existing process ID. Provide to add a new version (retry/correction). Omit to create a new process. |
| name | string | No | Human-readable display name. Defaults to <type>-process. |
| resource_requests | object | No | Kubernetes resource requests. See below. |
| deadline_seconds | integer | No | Max wall-clock time before the job is killed. Default: 3600. Always set explicitly for inversions. |
resource_requests fields:
| Field | Type | Default | Description |
|---|---|---|---|
| cpu | string | "1000m" | CPU request in Kubernetes notation, e.g. "500m" or "4". |
| memory | string | "2Gi" | Memory request, e.g. "512Mi" or "16Gi". |
| ephemeral-storage | string | "10Gi" | Temporary disk space for the job. |
Returns: {"id": "<process_id>", "versions": [{"version": <n>}]}
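A request body with explicit sizing might be assembled as in the sketch below. The helper name `build_process_body` is hypothetical; the field names come from the tables above. Making `cpu`, `memory`, and `deadline_seconds` required keyword arguments is a design choice of this sketch, so that a caller cannot fall back to the small defaults by accident.

```python
from typing import Any, Optional


def build_process_body(
    type_name: str,
    environment_id: str,
    params: dict[str, Any],
    *,
    cpu: str,
    memory: str,
    deadline_seconds: int,
    process_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a create_process request body with explicit sizing."""
    body: dict[str, Any] = {
        "type": type_name,
        "environment_id": environment_id,
        "params": params,
        # Sizes are always sent, so the server defaults never apply.
        "resource_requests": {"cpu": cpu, "memory": memory},
        "deadline_seconds": deadline_seconds,
    }
    if process_id is not None:
        # An existing id appends a new version (retry or correction).
        body["id"] = process_id
    return body
```

The sizes to pass depend on the dataset; values such as "4" CPUs or "16Gi" memory are only the notation examples from the table, not recommendations.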
list_processes
GET /processes
List all processes the current user can access, with the state and outputs of each version.
Each process has a versions array sorted ascending by version number; versions[-1] is the most recent run. Each version includes state, outputs, and parameters. Logs are not included — use get_process_logs for those.
Important: The URLs in outputs are /dataset/{id} metadata URLs, not directly usable as input_data. Call get_dataset to resolve the actual file URL.
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | string | No | Filter to a specific project. Without this, returns all processes across all projects the user can access. With a project-scoped API key, only the key's project is returned. |
Returns: Array of process objects.
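Because outputs carry /dataset/{id} metadata URLs, a client usually needs the dataset ID before it can call get_dataset. A minimal sketch follows; the helper name is hypothetical, and it assumes the ID is the final path segment and contains no slash.

```python
from urllib.parse import urlparse


def dataset_id_from_url(url: str) -> str:
    """Extract {id} from a /dataset/{id} metadata URL (absolute or relative)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Assumption: the ID is the last segment, directly after "dataset".
    if len(segments) >= 2 and segments[-2] == "dataset":
        return segments[-1]
    raise ValueError(f"not a /dataset/{{id}} metadata URL: {url!r}")
```

The returned ID can then be passed to get_dataset, whose url field is the value to use as input_data.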
get_process
GET /process/{process_id}
Get a single process by ID, including all versions with state, parameters, and outputs. Prefer this over list_processes when you already have the ID — it fetches only the one record.
After create_process returns an id, poll this endpoint until versions[-1].state is done or failed, then read versions[-1].outputs for dataset URLs.
| Parameter | Type | Required | Description |
|---|---|---|---|
| process_id | string | Yes | Process ID from create_process. |
Returns: Single process object. Returns 404 if the process is not found or the caller is not a member of its project.
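The polling described above can be sketched as a small loop. `fetch_process` is a hypothetical callable that wraps get_process and returns the decoded process object; the interval and timeout defaults are arbitrary choices for this example, not values from the API.

```python
import time
from typing import Any, Callable

TERMINAL_STATES = {"done", "failed"}


def wait_for_process(
    fetch_process: Callable[[str], dict[str, Any]],
    process_id: str,
    *,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll until the latest version is done or failed, then return it."""
    deadline = clock() + timeout_seconds
    while True:
        process = fetch_process(process_id)
        latest = process["versions"][-1]  # versions are sorted ascending
        if latest["state"] in TERMINAL_STATES:
            return latest
        if clock() >= deadline:
            raise TimeoutError(f"process {process_id} still {latest['state']!r}")
        sleep(interval_seconds)
```

The caller should still check the returned state: on "failed", get_process_logs is the next step; on "done", the outputs field holds the dataset URLs.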
get_process_logs
GET /process/{process_id}/logs
Retrieve execution logs for a process job. Use this to diagnose why a job failed (state == 'failed').
Always pass version when diagnosing a specific run — omitting it returns logs from all versions interleaved.
Pagination examples:
- offset=0, limit=100 → first 100 lines
- offset=100, limit=100 → next 100 lines
- offset=-50 → last 50 lines (tail)
- offset=-100, limit=50 → 50 lines starting 100 from the end
| Parameter | Type | Required | Description |
|---|---|---|---|
| process_id | string | Yes | Process ID. |
| version | integer | No | Version number. Omitting returns logs from all versions interleaved. |
| offset | integer | No | Positive = from start; negative = from end. Default: 0. |
| limit | integer | No | Maximum number of log entries to return. Omit for all entries from offset. |
Returns: Array of log entry objects with timestamps and messages.
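The offset and limit semantics from the pagination examples can be modelled locally on a plain list. This is only a model of the documented behaviour, not the server's implementation; in particular, clamping a negative offset that reaches past the start is an assumption.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def select_log_lines(
    lines: Sequence[T], offset: int = 0, limit: Optional[int] = None
) -> Sequence[T]:
    """Apply offset/limit the way the pagination examples describe."""
    if offset >= 0:
        start = offset
    else:
        # Negative offsets count from the end; clamp at the first line.
        start = max(len(lines) + offset, 0)
    end = None if limit is None else start + limit
    return lines[start:end]
```

With 200 log lines, `offset=-100, limit=50` selects lines 100 through 149 (zero-based), which matches "50 lines starting 100 from the end".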
clone_process_version
POST /process/{process_id}/versions/{version}/clone
Create a new version of a process by copying parameters from an existing version with optional overrides. Enables iterative tuning: run → inspect results → adjust one parameter → re-run, without re-specifying everything.
Resource requests and deadline are inherited from the source version unless explicitly overridden. For inversions, always override resource_requests, because the source version may have been created with the small defaults.
Returns the same {"id", "versions": [{"version"}]} format as create_process. Poll get_process to track state.
Path parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| process_id | string | Yes | Process ID. |
| version | integer | Yes | Source version number to clone. |
Request body (optional):
| Field | Type | Description |
|---|---|---|
| parameter_overrides | object | Keys to change relative to the source version. All other parameters are copied unchanged. |
| resource_requests | object | Override resource requests (same fields as in create_process). |
| deadline_seconds | integer | Override the deadline (seconds). If omitted, inherits from source version. |
Returns: {"id": "<process_id>", "versions": [{"version": <n>}]}
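A clone request for iterative tuning might be built as follows. The helper name is hypothetical, and so is any parameter name passed in `parameter_overrides`; real keys come from the process type's JSON Schema. Fields left as None are omitted from the body so that the source version's values are inherited.

```python
from typing import Any, Optional


def build_clone_body(
    parameter_overrides: Optional[dict[str, Any]] = None,
    *,
    cpu: Optional[str] = None,
    memory: Optional[str] = None,
    deadline_seconds: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble the optional request body for clone_process_version."""
    body: dict[str, Any] = {}
    if parameter_overrides:
        body["parameter_overrides"] = dict(parameter_overrides)
    # Only send the resource fields the caller actually set.
    resources = {k: v for k, v in (("cpu", cpu), ("memory", memory)) if v is not None}
    if resources:
        body["resource_requests"] = resources
    if deadline_seconds is not None:
        body["deadline_seconds"] = deadline_seconds
    return body
```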
cancel_process_version
POST /process/{process_id}/versions/{version}/cancel
Cancel a process version that is currently queued or running. Deletes the Kubernetes job (if submitted) and marks the version as failed. Returns 409 if the version is already in a terminal state (done or failed).
| Parameter | Type | Required | Description |
|---|---|---|---|
| process_id | string | Yes | Process ID. |
| version | integer | Yes | Version number to cancel. |
Returns: {"status": "cancelled"}
Datasets
search_datasets
GET /datasets
Search for datasets produced by processing jobs. By default only datasets from completed jobs are returned (see completed_only). Each result includes id, url (for use as input_data), dataset_name, process_name, and mime_type. The url can be downloaded directly with curl; no authentication is required.
The search string is matched case-insensitively against <process_name> / v<version> / <dataset_name>.
| Parameter | Type | Required | Description |
|---|---|---|---|
| search | string | No | Name fragment to filter by. Default: "" (all datasets). |
| project_id | string | No | Restrict to one project. |
| completed_only | boolean | No | Default: true. Set false to include datasets from still-running or failed jobs. |
Returns: Array of dataset metadata objects.
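To predict which search strings will hit a given dataset, the matching rule can be approximated locally. The sketch assumes substring matching and the exact " / " separators shown in the pattern above; the server may differ in both respects.

```python
def matches_search(search: str, process_name: str, version: int, dataset_name: str) -> bool:
    """Approximate the case-insensitive match against the composed name."""
    # Assumption: the server matches a substring of this exact string.
    haystack = f"{process_name} / v{version} / {dataset_name}"
    return search.lower() in haystack.lower()
```

Under these assumptions an empty search string matches every dataset, which agrees with the documented default.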
get_dataset
GET /dataset/{dataset_id}
Return metadata for a specific dataset including its mime_type, parts structure, and the process version that produced it.
The url field in the response is the actual file URL — downloadable directly with curl (curl "{url}" -o /tmp/result.msgpack). Use this url as input_data when passing this dataset to create_process, not the /dataset/{id} URL from list_processes outputs.
| Parameter | Type | Required | Description |
|---|---|---|---|
| dataset_id | string | Yes | Dataset ID. |
Returns: Dataset metadata object including url, mime_type, and parts.
describe_dataset
GET /dataset/{dataset_id}/describe
Return compact statistics for a dataset without downloading the full content. Much cheaper than downloading, especially for large AEM files.
Returns (depending on mime_type):
- XYZ/AEM (application/x-aarhusxyz-msgpack): flightline_count, columns, value_ranges for numeric columns, bbox, crs
- GeoJSON (application/geo+json): feature_count, bbox
- JSON (application/json): record_count (if array), keys
| Parameter | Type | Required | Description |
|---|---|---|---|
| dataset_id | string | Yes | Dataset ID. |
Returns: Statistics object appropriate for the dataset's mime type.
Environments
list_environments
GET /environments
List available compute environments. Returns each environment's id, name, and process_types. By default process_types is a list of type name strings only.
| Parameter | Type | Required | Description |
|---|---|---|---|
| include_schemas | boolean | No | Include full JSON Schemas for each process type. Default: false. Use get_process_type_schema to fetch a single type's schema instead of embedding all schemas here. |
Returns: Array of environment objects.
get_process_types
GET /environments/{env_id}/process-types
Return all process types available in an environment, keyed by type name. Each entry is a JSON Schema describing the required and optional params for that process type. Fields with x-format: dataset expect a file URL from search_datasets.
Returns an empty dict if the environment has not finished registering its process types yet (environment setup is itself a process — check list_processes to see if it has completed).
| Parameter | Type | Required | Description |
|---|---|---|---|
| env_id | string | Yes | Environment ID from list_environments. |
Returns: Object mapping type name → JSON Schema.
get_process_type_schema
GET /environments/{env_id}/process-types/{type_name}
Return the JSON Schema for exactly one named process type. Even the largest schemas (~44 KB) fit in a single response.
| Parameter | Type | Required | Description |
|---|---|---|---|
| env_id | string | Yes | Environment ID from list_environments. |
| type_name | string | Yes | Process type key, e.g. import_skytem. |
Returns: JSON Schema object. Returns 404 if the environment or type name is not found.
create_environment
POST /environments
Register a Docker image as a named compute environment. Typically called automatically by a build pipeline after pushing a new image. The environment is immediately available for create_process; its process_types are populated once the environment's setup job completes.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable display name. |
| docker_image | string | Yes | Fully-qualified Docker image reference, e.g. registry.example.com/myenv:latest. |
| process_id | string | No | ID of the process that built this environment, if any. Links the environment back to its build job. |
Returns: Environment object including the generated id.
Uploads
upload_file
POST /upload
Upload a raw input file (e.g. AEM survey data, CSV) that is not the output of any process. The response url is a direct HTTP file URL ready to pass as input_data to create_process.
Supports two body formats, auto-detected from Content-Type:
Multipart/form-data (any file size):
curl -F "file=@data.xyz" "https://host/upload?project_id=..."
JSON + base64 (MCP-friendly, up to ~20 MB):
{
  "filename": "data.xyz",
  "content": "<base64-encoded bytes>",
  "content_type": "application/x-aarhusxyz-msgpack"
}
For files larger than ~20 MB, use request_upload_token to get a short-lived token, then upload via curl:
curl -X POST "https://host/upload?project_id=..." \
  -H "Authorization: Bearer upt_..." \
  -F "file=@survey.xyz"
| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | string | No* | Project ID. Required unless using an upload token (upt_...) that already encodes the project. |
Returns: {"id": "<upload_id>", "filename": "<name>", "url": "<http_url>"}
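The JSON + base64 body can be produced from raw bytes as in this sketch. The helper name is hypothetical, and the size check assumes the ~20 MB guideline applies to the raw file size before encoding, which the text does not specify.

```python
import base64

# Assumption: the ~20 MB guideline refers to the raw, unencoded size.
MAX_INLINE_BYTES = 20 * 1024 * 1024


def build_upload_body(filename: str, data: bytes, content_type: str) -> dict[str, str]:
    """Build the JSON + base64 request body for upload_file."""
    if len(data) > MAX_INLINE_BYTES:
        raise ValueError("file too large for inline upload; use request_upload_token")
    return {
        "filename": filename,
        "content": base64.b64encode(data).decode("ascii"),
        "content_type": content_type,
    }
```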
request_upload_token
POST /upload/request-token
Issue a short-lived Bearer token (prefix upt_) for uploading large files via curl, without passing full session credentials. The token is a signed JWT that expires after 1 hour and is scoped to the same project as the current session.
Requires a project-scoped API key session.
No parameters.
Returns: {"token": "upt_<jwt>", "expires_in": 3600}
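Once a token has been issued, the curl invocation for a large upload could be assembled as an argument list. The function name and `base_url` parameter are hypothetical. The project_id query parameter is left out on the assumption that the upload token already encodes the project, as the upload_file parameter table states.

```python
def curl_upload_command(base_url: str, token: str, file_path: str) -> list[str]:
    """Build the curl argument list for a token-authenticated upload."""
    if not token.startswith("upt_"):
        raise ValueError("expected an upload token starting with 'upt_'")
    return [
        "curl",
        "-X", "POST",
        f"{base_url.rstrip('/')}/upload",
        "-H", f"Authorization: Bearer {token}",
        # curl reads the file from disk, so it never passes through the LLM.
        "-F", f"file=@{file_path}",
    ]
```

The list can be handed to `subprocess.run(cmd, check=True)`; passing arguments as a list avoids shell quoting problems with unusual file names.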
Workspaces
list_workspaces
GET /workspaces
List all saved workspaces. Returns id, title, and timestamps for each workspace (layout tree is not included).
No parameters.
Returns: Array of workspace summary objects.
get_workspace
GET /workspace/{workspace_id}
Get the full layout tree for a workspace. Returns a recursive JSON tree of nodes with id, widget, optional children, and widget-specific layoutConfig.
Call get_workspace_schema first to understand valid node structures and widget types.
| Parameter | Type | Required | Description |
|---|---|---|---|
| workspace_id | string | Yes | Workspace ID from list_workspaces. |
Returns: Workspace object including the full layout tree.
create_workspace
POST /workspace
Create a new workspace with a title and layout tree. If an id is provided and already exists, the workspace is updated (upsert behaviour).
The layout must conform to the schema from get_workspace_schema. Always call get_workspace_schema before constructing a layout to discover valid widget types and their layoutConfig schemas.
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| title | string | No | Display name. Default: "Untitled Workspace". |
| layout | object | No | Recursive node tree. Must conform to the schema from get_workspace_schema. |
| id | string | No | If provided and exists, updates the workspace (upsert). Omit to always create a new one. |
Returns: Created or updated workspace object including the generated id and full layout.
get_workspace_schema
GET /workspace-schema
Return the JSON Schema for the workspace layout format. The schema describes a recursive tree of layout nodes; container widgets (VerticalSplit, HorizontalSplit, TabSet) hold children arrays, and leaf widgets hold layoutConfig.
Returns 503 if widget schemas have not been generated yet. To generate them:
cd frontend && npm run export-schemas
No parameters.
Returns: JSON Schema object with $defs for all registered widget types.
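The recursive layout structure can be illustrated with a small structural check. This is a rough sanity check only, under the assumption that leaf widgets never carry children; real validation should use the JSON Schema returned by this tool. The leaf widget name "MapView" in the usage example is hypothetical.

```python
from typing import Any

# Container widget names as listed in the schema description.
CONTAINER_WIDGETS = {"VerticalSplit", "HorizontalSplit", "TabSet"}


def count_layout_nodes(node: dict[str, Any]) -> int:
    """Walk a layout tree, check basic structure, return the node count."""
    for key in ("id", "widget"):
        if key not in node:
            raise ValueError(f"layout node is missing {key!r}")
    children = node.get("children", [])
    if node["widget"] in CONTAINER_WIDGETS:
        if not isinstance(children, list):
            raise ValueError(f"node {node['id']!r}: children must be a list")
    elif children:
        # Assumption: leaf widgets hold layoutConfig, never children.
        raise ValueError(f"leaf node {node['id']!r} must not have children")
    return 1 + sum(count_layout_nodes(child) for child in children)
```

For example, a VerticalSplit root holding one "MapView" leaf and one empty TabSet counts as three nodes.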
get_app_url
GET /workspace/app-url
Build a deep-link URL that opens the app with specific state pre-selected. All parameters after workspace_id are optional — omit trailing ones to link at a coarser level.
| Parameter | Type | Required | Description |
|---|---|---|---|
| workspace_id | string | Yes | Workspace to open. |
| project_id | string | No | Pre-select a project. |
| process_id | string | No | Pre-select a process. |
| version | integer | No | Pre-select a specific process version. |
| part | string | No | Pre-select a dataset part path. |
| sounding | integer | No | Pre-select a specific sounding index. |
Returns: {"url": "https://..."}
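The query string for this call can be assembled by dropping every parameter that is not set. The helper name is hypothetical, and it makes no attempt to enforce that only trailing parameters are omitted; whether the server accepts gaps is not stated here.

```python
from typing import Optional
from urllib.parse import urlencode


def app_url_query(
    workspace_id: str,
    project_id: Optional[str] = None,
    process_id: Optional[str] = None,
    version: Optional[int] = None,
    part: Optional[str] = None,
    sounding: Optional[int] = None,
) -> str:
    """Build the query string for get_app_url, skipping unset parameters."""
    candidates = {
        "workspace_id": workspace_id,
        "project_id": project_id,
        "process_id": process_id,
        "version": version,
        "part": part,
        "sounding": sounding,
    }
    # urlencode percent-escapes values and converts integers to strings.
    return urlencode({k: v for k, v in candidates.items() if v is not None})
```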