System Architecture
Overview
Nagelfluh uses a distributed architecture with React frontend, FastAPI backend, and Kubernetes-based process execution:
Frontend (React) → Backend (FastAPI) → Kubernetes Cluster
├─> Kueue → Job → Pod (process execution)
├─> Log streaming via WebSocket
└─> MinIO (development) / GCS/S3 (production)
└─> Per-project buckets with IAM
Data Model
The user-facing data model consists of environments, process types, processes, and datasets:
┌─────────────────────────────────────────────────────────────┐
│ Environment │
│ - Collection of available process types │
│ - Defines Docker image and dependencies │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Process Type (e.g., "fft", "inversion") │ │
│ │ - Defines process behavior │ │
│ │ - JSON Schema for parameters │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ Process Instance │ │ │
│ │ │ - User-created execution │ │ │
│ │ │ - Name, resource requirements │ │ │
│ │ │ - Versions (parameter snapshots) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────┐ │ │ │
│ │ │ │ Parameters │ │ │ │
│ │ │ │ - Validated against schema │ │ │ │
│ │ │ │ - May reference input datasets │ │ │ │
│ │ │ │ (URLs to other process outputs) │ │ │ │
│ │ │ └──────────────────────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────┐ │ │ │
│ │ │ │ Output Datasets │ │ │ │
│ │ │ │ - Created by process execution │ │ │ │
│ │ │ │ - Stored in project bucket │ │ │ │
│ │ │ │ - Can be inputs to other processes │ │ │ │
│ │ │ └──────────────────────────────────────────┘ │ │ │
│ │ └────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Relationships:
Environment → has many → Process Types
Process Type → has schema → defines Parameters
Process Type → instantiated as → Process Instances
Process Instance → has → Parameters (validated by schema)
Process Instance → creates → Output Datasets
Process Instance → references → Input Datasets (from other processes)
Dataset → stored in → Project Bucket (per-project isolation)
Key concepts:
- Environment: A container environment with specific process types available (e.g., "Bootstrap" with basic types, custom environments with specialized libraries)
- Process Type: A template defining what a process does (FFT, inversion, etc.) and what parameters it accepts (via JSON Schema)
- Process Instance: A specific execution of a process type with user-provided parameters and resource requirements
- Parameters: User inputs validated against the process type's schema, may include references to datasets from other processes
- Datasets: Output files from process execution, stored in per-project S3/GCS buckets, can be used as inputs to other processes
Data flow example: 1. User selects Environment → sees available Process Types 2. User creates Process Instance → fills Parameters (validated by schema) 3. Parameters may reference Input Datasets (outputs from previous processes) 4. Process executes → creates Output Datasets 5. Output Datasets → available as inputs to subsequent processes
See also: - Process Types - Creating and registering process types - Environment - Docker images, entrypoints, and schema extraction
Backend Components
FastAPI Server
- REST API: Process creation, dataset retrieval, process type schemas
- WebSocket endpoints: Real-time log streaming and state updates
- Authentication: JWT-based user authentication
- Database: SQLite (development) / PostgreSQL (production) with Alembic migrations
Kubernetes Client
- Kubernetes API integration for job management
- Creates and monitors Jobs in the
nagelfluh-jobsnamespace - Handles job lifecycle and cleanup (TTL 1 hour after completion)
Job Orchestrator
- Creates Kubernetes Job manifests with:
- Resource limits (CPU, memory, ephemeral storage)
- Deadline enforcement
- Kueue annotations for queuing
- Storage credentials via Kubernetes secrets
- Environment variables for process configuration
Log Collector
- Streams pod logs to ProcessLog database table
- Broadcasts logs to connected WebSocket clients
- Real-time updates for process state changes
- Persistent log storage for historical viewing
Database Schema
- Users: Authentication and billing
- Projects: Multi-tenant project isolation
- Processes: Process definitions with versioning
- ProcessVersions: Parameter snapshots, outputs, state
- ProcessLogs: Timestamped log entries
- Datasets: Dataset metadata and references
- Uploads: User-uploaded files
- Transactions: Billing history (HOLD, DEBIT, RELEASE)
Frontend Components
Flexout Layout System
A custom drag-and-drop layout engine for flexible UI arrangement: - LayoutContext: Manages recursive layout tree structure - Built-in widgets: Split (vertical/horizontal), TabSet, Empty - Pane component: Individual draggable/droppable pane with header controls - Popout support: Detach panes to separate windows - MenuContext: Global menu registration system
Layout tree structure:
{
id: "unique-id",
widget: "WidgetName", // e.g., "FlowView", "VerticalSplit"
children: [...] // For Split/TabSet widgets
}
State Management
ProcessContext
Global state for: - All processes and their versions - Active process selection - Real-time updates via WebSocket - Process creation and editing
API Client
- Centralized API calls to backend (
http://localhost:8000) - Process CRUD operations
- Dataset retrieval
- Process type schema fetching
Core Widgets
FlowView
- Visual graph of processes using ReactFlow
- Shows process dependencies (input/output relationships)
- Click to set active process
- Drag to rearrange graph layout
ProcessEditor
Dual-mode editor: - Create mode (no active process): Form to create new process - Select process type - Fill JSON Schema form with parameters - Resource configuration (CPU, memory, deadline) - Cost estimation - Edit mode (active process): View/edit existing process - Create new versions with modified parameters - Cancel a queued or running version - View output datasets
ProcessLog
- Real-time log streaming with status badges
- WebSocket connection to backend
- Filterable by process
- Persistent across sessions
PlotView
Plotly-based visualization with: - Plot elements registry: Extensible element types (Line, Points, etc.) - Unit matching: Automatic axis assignment based on data units - Dynamic trace building: Loads data from datasets, builds Plotly traces - Configuration form: Add/configure plot elements with dataset selection
MapView
Geographic visualization of survey data with interactive features.
WebSocket Clients
- Log streaming: Real-time process logs
- State updates: Process state changes (running, completed, failed)
- Automatic reconnection on disconnect
- Multiplexed updates for multiple processes
Kubernetes Resources
Namespace
- Name:
nagelfluh-jobs - Purpose: Isolated environment for process execution
- Resources: Jobs, Pods, Secrets, ConfigMaps
Kueue Configuration
- Local queue:
nagelfluh-queue(namespace-scoped) - Cluster queue:
nagelfluh-cluster-queue(cluster-wide resource management) - Resource limits: Enforced CPU, memory, ephemeral storage
- Job queuing: Automatic queuing when resources unavailable
- Admission control: Jobs admitted based on available quota
Job Structure
Each process creates a Kubernetes Job with:
- Name: process-{process_id}-v{version}
- Labels: nagelfluh.app=process, process-id={id}, version={v}
- Annotations: Kueue queue-name
- Resource requests/limits: User-specified CPU/memory
- Deadline: activeDeadlineSeconds for timeout enforcement
- Backoff limit: 0 (no automatic retries)
- TTL: 3600 seconds (1 hour cleanup after completion)
Pod Configuration
- Image:
nagelfluh-base-runner:latest(or environment-specific image) - Environment variables:
PROCESS_TYPE: Type of process to runPROCESS_ID: Unique process identifierVERSION: Process version numberPROJECT_ID: Project identifier for multi-tenancyPARAMETERS_JSON: Serialized process parametersBACKEND_URL: Backend API endpointSTORAGE_BASE: S3/GCS bucket pathSTORAGE_ENDPOINT: MinIO endpoint (development only)AWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY: From Kubernetes secret- Restart policy: Never (controlled by Kueue)
Data Flow
Process Creation
- User fills form in ProcessEditor
- Frontend validates parameters against JSON Schema
- POST to
/processwith: - Process type
- Parameters (may include dataset URLs)
- Resource requirements
- Backend:
- Checks user balance vs. estimated cost
- Creates HOLD transaction
- Creates ProcessVersion record
- Creates Kubernetes Job with Kueue annotations
- Returns process ID
Process Cancellation
- User clicks "Cancel" in ProcessEditor (visible only for queued/running versions)
- POST to
/process/{id}/versions/{version}/cancel - Backend:
- Verifies version is in
queuedorrunningstate (returns 409 otherwise) - Deletes the Kubernetes Job if one was submitted
- Adds a log entry "Process cancelled by user"
- Marks version as
failedand broadcasts state update via WebSocket
Process Execution
- Kueue admits Job when resources available
- Kubernetes creates Pod
- Pod container runs
runner.py: - Loads process type class via entrypoints
- Deserializes parameters
- Calls
process_class.run(storage_context, **params) - Writes outputs to storage
- Reports results to backend
- Backend:
- Collects logs from pod
- Updates ProcessVersion state
- Creates Dataset records for outputs
- Calculates actual cost
- Creates DEBIT and RELEASE transactions
Dataset Access
- Frontend requests dataset: GET
/dataset/{id} - Backend:
- Looks up dataset metadata (storage URL, mime type)
- Verifies user has access to parent project
- Fetches data from storage (S3/GCS/MinIO)
- Returns data with appropriate content-type
- Frontend consumes dataset (plots, downloads, etc.)
Real-time Updates
- Backend monitors pod logs via Kubernetes API
- New log lines:
- Stored in ProcessLog table
- Broadcast to WebSocket clients
- State changes:
- ProcessVersion.state updated
- Broadcast to WebSocket clients
- Frontend:
- ProcessLog widget displays logs
- FlowView updates process node status
- ProcessEditor shows current state
Security Model
Authentication
- JWT tokens for user sessions
- Secrets stored in environment variables
- Per-user database isolation
Storage Access Control
- Per-project S3/GCS buckets
- IAM policies with path-based conditions
- Process pods get scoped credentials:
- READ: All uploads and datasets in project
- WRITE: Only to own process directory
- No cross-project access
- See Storage Architecture for details
Network Policies
Future enhancement: Pod network isolation
Monitoring and Observability
Current Implementation
- Real-time log streaming to UI
- Process state tracking (pending, running, completed, failed)
- Job events from Kubernetes API
Future Enhancements
- Prometheus metrics collection
- CPU/memory usage tracking
- Grafana dashboards
- Alert notifications
- Performance profiling