System Architecture

Overview

Nagelfluh uses a distributed architecture with a React frontend, a FastAPI backend, and Kubernetes-based process execution:

Frontend (React) → Backend (FastAPI) → Kubernetes Cluster
                                       ├─> Kueue → Job → Pod (process execution)
                                       ├─> Log streaming via WebSocket
                                       └─> MinIO (development) / GCS/S3 (production)
                                           └─> Per-project buckets with IAM

Data Model

The user-facing data model consists of environments, process types, processes, and datasets:

┌─────────────────────────────────────────────────────────────┐
│ Environment                                                  │
│  - Collection of available process types                    │
│  - Defines Docker image and dependencies                    │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ Process Type (e.g., "fft", "inversion")              │  │
│  │  - Defines process behavior                          │  │
│  │  - JSON Schema for parameters                        │  │
│  │                                                       │  │
│  │  ┌────────────────────────────────────────────────┐ │  │
│  │  │ Process Instance                                │ │  │
│  │  │  - User-created execution                       │ │  │
│  │  │  - Name, resource requirements                  │ │  │
│  │  │  - Versions (parameter snapshots)               │ │  │
│  │  │                                                  │ │  │
│  │  │  ┌──────────────────────────────────────────┐  │ │  │
│  │  │  │ Parameters                                │  │ │  │
│  │  │  │  - Validated against schema              │  │ │  │
│  │  │  │  - May reference input datasets          │  │ │  │
│  │  │  │    (URLs to other process outputs)       │  │ │  │
│  │  │  └──────────────────────────────────────────┘  │ │  │
│  │  │                                                  │ │  │
│  │  │  ┌──────────────────────────────────────────┐  │ │  │
│  │  │  │ Output Datasets                          │  │ │  │
│  │  │  │  - Created by process execution          │  │ │  │
│  │  │  │  - Stored in project bucket              │  │ │  │
│  │  │  │  - Can be inputs to other processes      │  │ │  │
│  │  │  └──────────────────────────────────────────┘  │ │  │
│  │  └────────────────────────────────────────────────┘ │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Relationships:
  Environment → has many → Process Types
  Process Type → has schema → defines Parameters
  Process Type → instantiated as → Process Instances
  Process Instance → has → Parameters (validated by schema)
  Process Instance → creates → Output Datasets
  Process Instance → references → Input Datasets (from other processes)
  Dataset → stored in → Project Bucket (per-project isolation)
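
The relationships above can be sketched as plain Python dataclasses. This is an illustrative model only: the class and field names (image, schema, url, input_urls) and the example bucket URL are assumptions made for this sketch, not the actual Nagelfluh database schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Dataset:
    name: str
    url: str  # location inside the project bucket (hypothetical URL form)


@dataclass
class ProcessType:
    name: str                # e.g. "fft", "inversion"
    schema: dict[str, Any]   # JSON Schema describing the parameters


@dataclass
class Environment:
    image: str               # Docker image that provides the process types
    process_types: list[ProcessType] = field(default_factory=list)


@dataclass
class ProcessInstance:
    name: str
    process_type: ProcessType
    parameters: dict[str, Any]  # may contain URLs of input datasets
    outputs: list[Dataset] = field(default_factory=list)

    def input_urls(self, known: list[Dataset]) -> list[str]:
        """Return parameter values that match the URL of a known dataset."""
        urls = {d.url for d in known}
        return [v for v in self.parameters.values()
                if isinstance(v, str) and v in urls]


# Chain two processes: the output of "fft" becomes an input of "inversion".
fft_type = ProcessType("fft", {"type": "object"})
inv_type = ProcessType("inversion", {"type": "object"})
env = Environment("example/image:latest", [fft_type, inv_type])

fft = ProcessInstance("fft-1", fft_type, {"window": 256})
fft.outputs.append(Dataset("spectrum", "s3://project-bucket/fft-1/spectrum"))

inv = ProcessInstance("inv-1", inv_type, {"spectrum": fft.outputs[0].url})
print(inv.input_urls(fft.outputs))
```

The last lines show the chaining described in the relationships: a dataset produced by one process instance appears as a URL in the parameters of the next.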

Key concepts:

Data flow example:
  1. User selects Environment → sees available Process Types
  2. User creates Process Instance → fills Parameters (validated by schema)
  3. Parameters may reference Input Datasets (outputs from previous processes)
  4. Process executes → creates Output Datasets
  5. Output Datasets → available as inputs to subsequent processes

See also:
  - Process Types - Creating and registering process types
  - Environment - Docker images, entrypoints, and schema extraction

Backend Components

FastAPI Server

Kubernetes Client

Job Orchestrator

Log Collector

Database Schema

Frontend Components

Flexout Layout System

A custom drag-and-drop layout engine for flexible UI arrangement:
  - LayoutContext: Manages recursive layout tree structure
  - Built-in widgets: Split (vertical/horizontal), TabSet, Empty
  - Pane component: Individual draggable/droppable pane with header controls
  - Popout support: Detach panes to separate windows
  - MenuContext: Global menu registration system

Layout tree structure:

{
  id: "unique-id",
  widget: "WidgetName",  // e.g., "FlowView", "VerticalSplit"
  children: [...]         // For Split/TabSet widgets
}

State Management

ProcessContext

Global state for:
  - All processes and their versions
  - Active process selection
  - Real-time updates via WebSocket
  - Process creation and editing

API Client

Core Widgets

FlowView

ProcessEditor

Dual-mode editor:
  - Create mode (no active process): Form to create new process
    - Select process type
    - Fill JSON Schema form with parameters
    - Resource configuration (CPU, memory, deadline)
    - Cost estimation
  - Edit mode (active process): View/edit existing process
    - Create new versions with modified parameters
    - Cancel a queued or running version
    - View output datasets

ProcessLog

PlotView

Plotly-based visualization with:
  - Plot elements registry: Extensible element types (Line, Points, etc.)
  - Unit matching: Automatic axis assignment based on data units
  - Dynamic trace building: Loads data from datasets, builds Plotly traces
  - Configuration form: Add/configure plot elements with dataset selection

MapView

Geographic visualization of survey data with interactive features.

WebSocket Clients

Kubernetes Resources

Namespace

Kueue Configuration

Job Structure

Each process creates a Kubernetes Job with:
  - Name: process-{process_id}-v{version}
  - Labels: nagelfluh.app=process, process-id={id}, version={v}
  - Annotations: Kueue queue-name
  - Resource requests/limits: User-specified CPU/memory
  - Deadline: activeDeadlineSeconds for timeout enforcement
  - Backoff limit: 0 (no automatic retries)
  - TTL: 3600 seconds (1 hour cleanup after completion)
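
As a rough illustration, the Job settings listed above could be assembled into a batch/v1 manifest like this. The function name build_job_manifest, the container name, the annotation key kueue.x-k8s.io/queue-name, and the restartPolicy value are assumptions for this sketch; the document only states that a Kueue queue-name annotation is set.

```python
from typing import Any


def build_job_manifest(process_id: int, version: int, image: str, queue: str,
                       cpu: str, memory: str, deadline_s: int) -> dict[str, Any]:
    """Build a batch/v1 Job manifest as a plain dict (hypothetical helper)."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"process-{process_id}-v{version}",
            # Label values must be strings in Kubernetes.
            "labels": {
                "nagelfluh.app": "process",
                "process-id": str(process_id),
                "version": str(version),
            },
            # Assumed key; the document only says "Kueue queue-name".
            "annotations": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            "activeDeadlineSeconds": deadline_s,  # timeout enforcement
            "backoffLimit": 0,                    # no automatic retries
            "ttlSecondsAfterFinished": 3600,      # cleanup one hour after completion
            "template": {
                "spec": {
                    "restartPolicy": "Never",     # assumed; Jobs need Never or OnFailure
                    "containers": [{
                        "name": "process",
                        "image": image,
                        "resources": {
                            "requests": dict(resources),
                            "limits": dict(resources),
                        },
                    }],
                }
            },
        },
    }


manifest = build_job_manifest(42, 3, "example/image:latest", "default-queue",
                              cpu="2", memory="4Gi", deadline_s=1800)
print(manifest["metadata"]["name"])  # process-42-v3
```

A dict of this shape can be serialized to YAML or JSON, or passed to a Kubernetes client, without further conversion.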

Pod Configuration

Data Flow

Process Creation

  1. User fills form in ProcessEditor
  2. Frontend validates parameters against JSON Schema
  3. POST to /process with:
     - Process type
     - Parameters (may include dataset URLs)
     - Resource requirements
  4. Backend:
     - Checks user balance vs. estimated cost
     - Creates HOLD transaction
     - Creates ProcessVersion record
     - Creates Kubernetes Job with Kueue annotations
     - Returns process ID
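
The balance check and HOLD step might look roughly like the following. This is a minimal sketch, not the backend's actual accounting code: Account, place_hold, InsufficientBalance, and the use of integer cents are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


class InsufficientBalance(Exception):
    """Raised when the estimated cost exceeds the available balance."""


@dataclass
class Account:
    balance_cents: int
    holds: dict[str, int] = field(default_factory=dict)  # hold id -> amount

    def available(self) -> int:
        """Balance minus everything currently on hold."""
        return self.balance_cents - sum(self.holds.values())

    def place_hold(self, hold_id: str, estimated_cost_cents: int) -> None:
        """Reserve the estimated cost, or refuse if funds are insufficient."""
        if estimated_cost_cents > self.available():
            raise InsufficientBalance(
                f"need {estimated_cost_cents}, have {self.available()}")
        self.holds[hold_id] = estimated_cost_cents


account = Account(balance_cents=1000)
account.place_hold("process-1-v1", 600)
print(account.available())  # 400

try:
    account.place_hold("process-2-v1", 500)  # exceeds the remaining 400
except InsufficientBalance as exc:
    print("rejected:", exc)
```

Counting held amounts against the available balance is what prevents a user from queuing several processes whose combined estimated cost exceeds their funds.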

Process Cancellation

  1. User clicks "Cancel" in ProcessEditor (visible only for queued/running versions)
  2. POST to /process/{id}/versions/{version}/cancel
  3. Backend:
     - Verifies version is in queued or running state (returns 409 otherwise)
     - Deletes the Kubernetes Job if one was submitted
     - Adds a log entry "Process cancelled by user"
     - Marks version as failed and broadcasts state update via WebSocket
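
The backend side of cancellation can be sketched as a small state check followed by the side effects listed above. The State enum (including the "completed" value), the Conflict exception, the dict-based version record, and the delete_job and broadcast callbacks are hypothetical stand-ins for the real ProcessVersion model, HTTP error handling, Kubernetes client, and WebSocket layer.

```python
from enum import Enum
from typing import Any, Callable


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"  # assumed name for the success state
    FAILED = "failed"


CANCELLABLE = {State.QUEUED, State.RUNNING}


class Conflict(Exception):
    """Maps to an HTTP 409 response in this sketch."""
    status_code = 409


def cancel_version(version: dict[str, Any],
                   delete_job: Callable[[str], None],
                   broadcast: Callable[[dict[str, Any]], None]) -> None:
    # Only queued or running versions may be cancelled.
    if version["state"] not in CANCELLABLE:
        raise Conflict(f"cannot cancel version in state {version['state'].value}")
    # Delete the Kubernetes Job only if one was actually submitted.
    if version.get("job_name"):
        delete_job(version["job_name"])
    version["logs"].append("Process cancelled by user")
    version["state"] = State.FAILED
    broadcast(version)


deleted: list[str] = []
sent: list[dict[str, Any]] = []
v = {"state": State.RUNNING, "job_name": "process-7-v1", "logs": []}
cancel_version(v, deleted.append, sent.append)
print(v["state"].value, deleted, v["logs"])
```

Calling cancel_version a second time on the same record raises Conflict, because the version is now in the failed state.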

Process Execution

  1. Kueue admits Job when resources available
  2. Kubernetes creates Pod
  3. Pod container runs runner.py:
     - Loads process type class via entrypoints
     - Deserializes parameters
     - Calls process_class.run(storage_context, **params)
     - Writes outputs to storage
     - Reports results to backend
  4. Backend:
     - Collects logs from pod
     - Updates ProcessVersion state
     - Creates Dataset records for outputs
     - Calculates actual cost
     - Creates DEBIT and RELEASE transactions
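
The runner step can be illustrated with a simplified dispatch loop. This is a sketch under stated assumptions, not the contents of runner.py: a plain dict (REGISTRY) stands in for entry-point discovery, InMemoryStorage stands in for the storage context, parameters are assumed to arrive as JSON, and the Scale process type and its return value are invented for the example. Only the call shape process_class.run(storage_context, **params) comes from the steps above.

```python
import json


class InMemoryStorage:
    """Hypothetical stand-in for the storage context passed to a process."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def write(self, name: str, data: bytes) -> None:
        self.objects[name] = data


class Scale:
    """Toy process type: multiplies a list of numbers by a factor."""

    @staticmethod
    def run(storage: InMemoryStorage, values: list[float], factor: float) -> list[str]:
        result = [v * factor for v in values]
        storage.write("scaled.json", json.dumps(result).encode())
        return ["scaled.json"]  # names of the outputs written


# The real runner resolves classes through entry points; a dict stands in here.
REGISTRY = {"scale": Scale}


def run_process(process_type: str, raw_params: str,
                storage: InMemoryStorage) -> list[str]:
    process_class = REGISTRY[process_type]        # load the process type class
    params = json.loads(raw_params)               # deserialize parameters
    return process_class.run(storage, **params)   # run and collect output names


storage = InMemoryStorage()
outputs = run_process("scale", '{"values": [1, 2, 3], "factor": 2}', storage)
print(outputs, json.loads(storage.objects["scaled.json"]))
```

Passing the parameters as keyword arguments means the process type's run signature mirrors the keys of its JSON Schema.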

Dataset Access

  1. Frontend requests dataset: GET /dataset/{id}
  2. Backend:
     - Looks up dataset metadata (storage URL, mime type)
     - Verifies user has access to parent project
     - Fetches data from storage (S3/GCS/MinIO)
     - Returns data with appropriate content-type
  3. Frontend consumes dataset (plots, downloads, etc.)

Real-time Updates

  1. Backend monitors pod logs via Kubernetes API
  2. New log lines:
     - Stored in ProcessLog table
     - Broadcast to WebSocket clients
  3. State changes:
     - ProcessVersion.state updated
     - Broadcast to WebSocket clients
  4. Frontend:
     - ProcessLog widget displays logs
     - FlowView updates process node status
     - ProcessEditor shows current state
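
The broadcast step can be sketched as a small in-memory publish/subscribe hub. Here each subscriber's asyncio.Queue stands in for one WebSocket connection. The Broadcaster class and the message shapes ({"type": "log", ...}, {"type": "state", ...}) are assumptions for illustration, not the backend's actual wire format.

```python
import asyncio
from typing import Any


class Broadcaster:
    """Fan out each message to every subscriber (one queue per client)."""

    def __init__(self) -> None:
        self._subscribers: set[asyncio.Queue] = set()

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        self._subscribers.discard(queue)

    def publish(self, message: dict[str, Any]) -> None:
        for queue in self._subscribers:
            queue.put_nowait(message)


async def demo() -> list[dict[str, Any]]:
    hub = Broadcaster()
    inbox = hub.subscribe()  # would be tied to one WebSocket connection
    hub.publish({"type": "log", "line": "starting"})
    hub.publish({"type": "state", "state": "running"})
    return [await inbox.get(), await inbox.get()]


messages = asyncio.run(demo())
print(messages)
```

In a real server, each WebSocket handler would own one queue, forward its items to the client, and call unsubscribe when the connection closes.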

Security Model

Authentication

Storage Access Control

Network Policies

Future enhancement: Pod network isolation

Monitoring and Observability

Current Implementation

Future Enhancements