Nagelfluh User Guide
This guide explains how to use Nagelfluh for geophysics data processing.
Getting Started
After installation (see Deployment Guide), open your browser to http://localhost:3000.
First Time Setup
- Select Environment: Choose "Bootstrap" from the environment dropdown (top of screen)
- Explore the Interface: The default layout shows:
  - FlowView (left): Visual graph of processes
  - ProcessEditor (top right): Create/edit processes
  - ProcessLog (bottom right): Real-time logs
You can rearrange these widgets by dragging panes, creating splits, or opening tabs.
Understanding the Interface
Main Widgets
FlowView - Process Graph
Shows a visual graph of all processes and their dependencies:
- Nodes: Each process appears as a node
- Connections: Lines show data flow (input → process → output)
- Active Process: Highlighted node (click to select)
- Drag: Rearrange nodes for better visibility
- Zoom: Mouse wheel or pinch to zoom in/out
ProcessEditor - Create and Edit Processes
Dual-mode editor that changes based on whether a process is selected:
Create Mode (no process selected):
1. Select Process Type: Choose from dropdown (e.g., "fft", "inversion")
2. Enter Process Name: Give your process a meaningful name
3. Configure Resources:
   - CPU: 0.1 - 8 cores (default: 1)
   - Memory: 0.5 - 32 GB (default: 2 GB)
   - Deadline: 1 - 1440 minutes (default: 60 minutes)
4. View Cost Estimate: See maximum possible cost
5. Fill Parameters: Form fields based on process type
6. Submit: Click to create and run process
Edit Mode (process selected):
- View current parameters
- See output datasets
- Create new version with modified parameters
- Cancel a version that is still queued or running
- View process status and history
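The resource limits listed for Create Mode can be expressed as a small range check. This is a minimal sketch, not Nagelfluh's own validation code: the `ResourceRequest` class and `validate_resources` function are hypothetical names, and only the ranges and defaults come from this guide.

```python
from dataclasses import dataclass

# Limits as listed for Create Mode in this guide.
CPU_RANGE = (0.1, 8.0)          # cores
MEMORY_RANGE_GB = (0.5, 32.0)   # GB
DEADLINE_RANGE_MIN = (1, 1440)  # minutes


@dataclass
class ResourceRequest:
    """Hypothetical container for the three resource settings (guide defaults)."""
    cpu_cores: float = 1.0
    memory_gb: float = 2.0
    deadline_min: int = 60


def validate_resources(req: ResourceRequest) -> list[str]:
    """Return human-readable problems; an empty list means everything is in range."""
    problems = []
    checks = [
        ("CPU", req.cpu_cores, CPU_RANGE, "cores"),
        ("Memory", req.memory_gb, MEMORY_RANGE_GB, "GB"),
        ("Deadline", req.deadline_min, DEADLINE_RANGE_MIN, "minutes"),
    ]
    for label, value, (low, high), unit in checks:
        if not low <= value <= high:
            problems.append(
                f"{label} must be between {low} and {high} {unit}, got {value}"
            )
    return problems
```

Calling `validate_resources(ResourceRequest())` with the defaults returns an empty list; a request such as `ResourceRequest(cpu_cores=16)` returns one message describing the CPU limit.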
ProcessLog - Real-time Logs
Displays logs from running and completed processes:
- Status Badges: Color-coded process states
  - 🔵 Pending: Queued, waiting for resources
  - 🟡 Running: Currently executing
  - 🟢 Completed: Finished successfully
  - 🔴 Failed: Execution error
- Auto-scroll: Automatically scrolls to latest logs
- Filter: Click process to see only its logs
- Persistent: Logs remain after process completes
PlotView - Data Visualization
Interactive scientific plotting:
- Add Plot Elements: Configure data visualization
  - Select dataset from process outputs
  - Choose plot type (Line, Points, etc.)
  - Configure colors, labels, units
- Interactive: Zoom, pan, hover for details
- Multi-dataset: Overlay multiple datasets
- Unit Matching: Automatic axis assignment by units
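One way to picture unit matching is that every plotted series carries a unit, and series with the same unit share an axis. The sketch below only illustrates that idea: `assign_axes` is a hypothetical helper, and how PlotView actually assigns axes is not described in this guide.

```python
def assign_axes(series_units: dict[str, str]) -> dict[str, int]:
    """Give every series an axis index; series sharing a unit share an axis.

    Axis indices are handed out in the order units are first seen.
    """
    axis_for_unit: dict[str, int] = {}
    assignment: dict[str, int] = {}
    for name, unit in series_units.items():
        if unit not in axis_for_unit:
            axis_for_unit[unit] = len(axis_for_unit)
        assignment[name] = axis_for_unit[unit]
    return assignment
```

For example, two resistivity series in the same unit would land on one axis, while a depth series in metres would get its own.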
MapView - Geographic Visualization
Display survey data on interactive maps:
- Flight Lines: Visualize survey paths
- Data Coverage: See spatial distribution
- Interactive: Pan, zoom, click for details
Layout Customization
Creating Splits
Right-click the pane header → "Split Horizontal" or "Split Vertical".
Or drag a pane to the edge of another pane to create a split.
Creating Tabs
Drag a pane to the center of another pane to create tabs.
Popout Windows
Click ⧉ button in pane header to open in separate window (great for multi-monitor setups).
Changing Widget Type
Use dropdown in pane header to switch widget (e.g., PlotView → MapView).
Closing Panes
Click × button in pane header.
Creating and Running Processes
Process Lifecycle
- Create Process: Define parameters and resources
- Estimation: System calculates maximum cost
- Validation: Checks balance and parameter schema
- Hold Funds: Reserves maximum possible cost
- Queuing: Kueue queues job until resources available
- Execution: Kubernetes pod runs process, streams logs
- Completion: Actual cost charged, held funds released
- Outputs: Datasets registered and available for visualization
To stop a process before it finishes, click the Cancel button in ProcessEditor while the version is shown as pending or running. The Kubernetes job is deleted immediately and the version is marked as failed.
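The lifecycle and the cancel behaviour can be summarised as a small state machine. This is a sketch under assumptions: the four states are the ones shown as ProcessLog badges, but the transition table is inferred from the steps above and the `State`, `advance`, and `cancel` names are hypothetical.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"      # queued, waiting for resources
    RUNNING = "running"      # pod is executing
    COMPLETED = "completed"  # finished successfully
    FAILED = "failed"        # execution error, timeout, or cancellation


# Allowed transitions, inferred from the lifecycle steps (an assumption).
TRANSITIONS = {
    State.PENDING: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.COMPLETED, State.FAILED},
    State.COMPLETED: set(),
    State.FAILED: set(),
}


def advance(current: State, new: State) -> State:
    """Move to a new state, rejecting transitions the lifecycle does not allow."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def cancel(current: State) -> State:
    """Cancel is only possible while pending or running; the version ends up failed."""
    if current not in (State.PENDING, State.RUNNING):
        raise ValueError(f"cannot cancel a {current.value} version")
    return advance(current, State.FAILED)
```

Completed and failed are terminal here, which matches the guide's advice to create a new version rather than re-run an old one.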
Step-by-Step: Creating a Process
- Deselect any process: Click an empty area in FlowView (ProcessEditor shows "Create" mode)
- Select process type: Choose from dropdown
  - fft: Fast Fourier Transform analysis
  - inversion: Geophysical inversion
  - processing: AEM data processing
  - import_data: Import external data
- Name your process: Enter descriptive name (e.g., "FFT Analysis - Line 1")
- Configure resources:
  - CPU Cores:
    - 0.1 cores: Light processing
    - 1 core: Standard processing (default)
    - 2-4 cores: Heavy computation
    - 8 cores: Maximum (very intensive)
  - Memory:
    - 0.5 GB: Minimal data
    - 2 GB: Standard (default)
    - 4-8 GB: Large datasets
    - 16-32 GB: Very large datasets
  - Deadline:
    - How long the process is allowed to run before timeout
    - Be generous - unused time doesn't cost extra
    - Default: 60 minutes
- Review cost estimate: Shows maximum possible cost based on deadline
  - Actual cost is based on actual runtime, so it is at most this estimate
  - Example: 1 core, 2 GB, 60 min → ~$0.0084 max (using the rates in the Cost Formula section)
- Fill in parameters:
  - Parameters depend on process type
  - Dataset fields: Use searchable dropdown to select from previous process outputs
    - Type to search by process name or dataset name
    - Format: "Process Name / v123 / dataset-name"
    - Grouped when >4 datasets from same process
  - Other fields: Numbers, text, dropdowns as needed
- Submit: Click "Create Process" button
- Monitor progress:
  - Process appears in FlowView
  - Logs stream to ProcessLog
  - Status updates in real-time
Example: Running FFT on Imported Data
Assume you've already run an "Import Data" process:
- Switch ProcessEditor to Create mode (click an empty area in FlowView to deselect any process)
- Select process type: fft
- Name: "FFT - Survey Line 1"
- Resources: 1 core, 2 GB, 60 minutes (defaults are fine)
- Parameters:
  - Input Data: Search "Import", select the import process output
- Click "Submit"
- Watch FlowView - new "FFT - Survey Line 1" node appears
- Watch ProcessLog - see "Starting FFT...", progress messages
- When complete, status shows 🟢 Completed
- Click the process to view outputs in ProcessEditor
Working with Datasets
What are Datasets?
Datasets are output files from processes. Each process can produce multiple datasets (e.g., "result", "diagnostics", "metadata").
Dataset Types
- AEM Data (.msgpack): Airborne electromagnetic survey data
- Resistivity Models (.msgpack): Inversion results
- Plots (.png, .jpg): Generated figures
- Tables (.csv): Tabular data
- Maps (.geojson, .geotiff): Geographic data
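When scripting against downloaded outputs, the extension list above can be turned into a simple lookup. This is an illustrative sketch: `dataset_category` is a hypothetical helper, and because `.msgpack` is shared by AEM data and resistivity models, the extension alone cannot tell those two apart.

```python
from pathlib import PurePosixPath

# Extension-to-category table copied from the list above.
CATEGORY_BY_EXTENSION = {
    ".msgpack": "AEM data or resistivity model",  # ambiguous by extension alone
    ".png": "plot",
    ".jpg": "plot",
    ".csv": "table",
    ".geojson": "map",
    ".geotiff": "map",
}


def dataset_category(name: str) -> str:
    """Classify a dataset file name by its (case-insensitive) extension."""
    suffix = PurePosixPath(name).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")
```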
Accessing Datasets
In ProcessEditor (when process selected):
- "Outputs" section lists all datasets
- Click dataset name to download
- Copy URL to share or use in API calls
In PlotView:
- Add plot element
- Select dataset from searchable dropdown
- Visualize immediately
In MapView:
- Select geographic datasets
- Overlay on map
Dataset Search
When selecting datasets in forms:
- Type to search: Searches process names and dataset names
- Auto-complete: Matches partial names
- Grouped results: Many datasets from same process → shows count
- Click group: Refines search to that process
- Debounced: Waits 300ms after typing before searching
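The matching and grouping rules can be sketched as follows. This is not the selector's actual implementation: `DatasetRef` and `search` are hypothetical names, case-insensitive matching is an assumption, and the 300ms debounce (a UI timing concern) is not modelled. The label format and the "more than 4 datasets" grouping threshold are taken from this guide.

```python
from collections import defaultdict
from dataclasses import dataclass

GROUP_THRESHOLD = 4  # results are grouped when >4 datasets share a process


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical record for one selectable dataset."""
    process_name: str
    version: int
    dataset_name: str

    @property
    def label(self) -> str:
        # Format from the guide: "Process Name / v123 / dataset-name"
        return f"{self.process_name} / v{self.version} / {self.dataset_name}"


def search(datasets: list[DatasetRef], query: str) -> list[str]:
    """Partial match on process or dataset name, collapsing large groups."""
    q = query.strip().lower()
    by_process = defaultdict(list)
    for d in datasets:
        if q in d.process_name.lower() or q in d.dataset_name.lower():
            by_process[d.process_name].append(d)

    results = []
    for process_name, matches in by_process.items():
        if len(matches) > GROUP_THRESHOLD:
            # Too many hits from one process: show a single group entry.
            results.append(f"{process_name} ({len(matches)} datasets)")
        else:
            results.extend(d.label for d in matches)
    return results
```

Clicking a group in the UI corresponds to calling `search` again with the process name as the query.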
Monitoring Processes
In the UI
FlowView Status
- Node color: Indicates process state
- Connections: Show data dependencies
- Click node: Select process to see details
ProcessLog
Real-time log streaming:
- All processes: Shows logs from all processes by default
- Filter by process: Click process in FlowView to filter
- Auto-scroll: Keeps latest logs visible
- Status badges: Quick state overview
ProcessEditor Status
When a process is selected:
- State: Current process state
- Parameters: What settings were used
- Outputs: Links to result datasets
- History: Version history if process was modified
Via Command Line (Advanced)
For administrators or developers:
# Check jobs
kubectl get jobs -n nagelfluh-jobs
# Check pods
kubectl get pods -n nagelfluh-jobs
# Stream logs
kubectl logs -f <pod-name> -n nagelfluh-jobs
# Check queue status
kubectl get workloads -n nagelfluh-jobs
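For scripted monitoring, the job listing can also be read as JSON and summarised. The sketch below assumes `kubectl` is installed and has access to the cluster; `summarize_jobs` and `fetch_jobs` are hypothetical helpers. The `active`, `succeeded`, and `failed` counters are standard Kubernetes Job status fields, while the coarse state labels are this example's own simplification (a job with failed pods may still be retrying).

```python
import json
import subprocess

NAMESPACE = "nagelfluh-jobs"  # namespace used in the commands above


def summarize_jobs(job_list: dict) -> dict[str, str]:
    """Map each job name to a coarse state derived from its status counters."""
    summary = {}
    for job in job_list.get("items", []):
        status = job.get("status", {})
        if status.get("succeeded", 0) > 0:
            state = "succeeded"
        elif status.get("failed", 0) > 0:
            state = "failed"
        elif status.get("active", 0) > 0:
            state = "active"
        else:
            state = "waiting"  # no pods started yet, e.g. still queued
        summary[job["metadata"]["name"]] = state
    return summary


def fetch_jobs(namespace: str = NAMESPACE) -> dict:
    """Run `kubectl get jobs -o json`; needs kubectl and cluster access."""
    out = subprocess.run(
        ["kubectl", "get", "jobs", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

With cluster access, `summarize_jobs(fetch_jobs())` returns a dictionary of job names and states.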
Billing and Costs
How Billing Works
Nagelfluh uses a hold/release billing model to ensure fair resource pricing:
- Process Creation:
  - System calculates maximum possible cost (deadline × resources)
  - Checks your account balance
  - If sufficient: Creates HOLD transaction (reserves funds)
  - If insufficient: Rejects process creation
- Process Execution:
  - Pod runs in Kubernetes cluster
  - Uses resources (CPU, memory)
  - Streams logs to UI
- Process Completion:
  - System calculates actual cost (actual runtime × resources used)
  - Creates DEBIT transaction (charges actual cost)
  - Creates RELEASE transaction (frees remaining held funds)
  - Updates account balance
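The hold/release flow can be illustrated with a small in-memory ledger. This is a sketch under assumptions, not the billing service itself: the `Account` class is hypothetical, and tracking held funds separately from the balance (so that only the DEBIT reduces the balance) is one plausible reading of the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical in-memory account; the real ledger is not described here."""
    balance: float
    held: float = 0.0
    transactions: list = field(default_factory=list)

    @property
    def available(self) -> float:
        """Funds not reserved by any hold."""
        return self.balance - self.held

    def hold(self, max_cost: float) -> None:
        """Process creation: reserve the maximum cost or reject the process."""
        if max_cost > self.available:
            raise ValueError("insufficient funds: process creation rejected")
        self.held += max_cost
        self.transactions.append(("HOLD", max_cost))

    def settle(self, max_cost: float, actual_cost: float) -> None:
        """Process completion: charge the actual cost, free the rest of the hold."""
        self.balance -= actual_cost
        self.transactions.append(("DEBIT", actual_cost))
        self.held -= max_cost
        self.transactions.append(("RELEASE", max_cost - actual_cost))
```

A process with a maximum cost of 4.0 that actually costs 1.0 produces a HOLD of 4.0, a DEBIT of 1.0, and a RELEASE of 3.0.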
Cost Formula
Max Cost = ((CPU cores × $0.0001/minute) + (Memory GB × $0.00002/minute)) × Deadline
Actual Cost = ((CPU cores × $0.0001/minute) + (Memory GB × $0.00002/minute)) × Actual Runtime
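The formula translates directly into a small function. This sketch reads the formula as the combined per-minute rate (CPU plus memory) multiplied by the number of minutes; the `cost` function name is hypothetical, and the two rates are the ones quoted in the formula.

```python
CPU_RATE = 0.0001      # dollars per core per minute
MEMORY_RATE = 0.00002  # dollars per GB per minute


def cost(cpu_cores: float, memory_gb: float, minutes: float) -> float:
    """Cost in dollars for running the given resources for `minutes`.

    Pass the deadline for the maximum cost, or the actual runtime
    for the actual cost.
    """
    per_minute = cpu_cores * CPU_RATE + memory_gb * MEMORY_RATE
    return per_minute * minutes


# Default resources with a 60-minute deadline (maximum cost).
max_cost = cost(1, 2, 60)
# The same resources, charged for a 5-second run (actual cost).
actual_cost = cost(1, 2, 5 / 60)
```

Because the same function serves both cases, the actual cost can never exceed the maximum as long as the runtime stays within the deadline.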
Example Costs
| Configuration | Deadline | Max Cost | 5-second Runtime | 60-minute Runtime |
|---|---|---|---|---|
| 1 core, 2 GB | 60 min | $0.0084 | ~$0.000012 | $0.0084 |
| 4 cores, 8 GB | 120 min | $0.0672 | ~$0.000047 | $0.0336 |
| 8 cores, 16 GB | 240 min | $0.2688 | ~$0.000093 | $0.0672 |
Tips for Managing Costs
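- Set realistic deadlines: You're not charged for unused time, but the hold reserves the maximum cost until the process completes, so an oversized deadline ties up more of your balance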
- Right-size resources: Start with defaults (1 core, 2 GB), increase if needed
- Monitor usage: Check ProcessLog to see how long processes actually run
- Reuse results: Datasets persist - don't re-run unnecessarily
- Test with small data: Validate workflow before processing full datasets
Viewing Balance and Transactions
(UI features coming soon)
- View current balance
- See transaction history
- Filter by transaction type (HOLD, DEBIT, RELEASE)
- Download billing statements
Managing Projects and Environments
Projects
Each project has:
- Isolated storage bucket: Per-project S3/GCS bucket
- Separate processes: Processes don't cross projects
- Dedicated credentials: Scoped IAM permissions
- Independent billing: Track costs per project
Environments
Environments define the available process types and Docker images:
- Bootstrap: Default environment with basic process types
- Custom environments: Created via "create_environment" process
  - Define custom Docker images
  - Install specific libraries
  - Configure environment variables
Creating Custom Environments
(Coming soon: Process-based environment builder)
- Run "create_environment" process
- Specify base image and dependencies
- System builds Docker image
- New environment appears in dropdown
Troubleshooting
Process Stuck in "Pending"
Cause: Insufficient cluster resources
Solutions:
1. Wait - Kueue will schedule when resources free up
2. Check cluster capacity: kubectl get nodes
3. Reduce resource requirements (fewer cores/memory)
4. Contact administrator to scale cluster
Process Failed Immediately
Cause: Parameter validation error or missing dependencies
Solutions:
1. Check ProcessLog for error messages
2. Verify all required parameters are filled
3. Check dataset URLs are valid
4. Ensure input datasets exist
Can't Find Dataset in Selector
Cause: Dataset not created yet or search too broad
Solutions:
1. Verify source process completed successfully
2. Refine search - type more specific process name
3. Click grouped results to narrow search
4. Check ProcessEditor outputs of source process
Logs Not Updating
Cause: WebSocket connection lost
Solutions:
1. Refresh browser page
2. Check browser console for errors
3. Verify backend is running: curl http://localhost:8000
4. Check network connectivity
Process Exceeded Deadline
Cause: Process took longer than deadline setting
Solutions:
1. Increase deadline in next run
2. Optimize process parameters (smaller dataset, fewer iterations)
3. Increase CPU cores to speed up processing
4. Check if process hung (logs stopped updating)
Storage Permission Denied
Cause: IAM policy misconfiguration
Solutions:
1. Verify project storage was created automatically
2. Check Kubernetes secret exists: kubectl get secret project-{id}-storage
3. Contact administrator to verify IAM policies
4. Check ProcessLog for specific error message
Out of Balance
Cause: Insufficient funds for process creation
Solutions:
1. Check current balance (UI coming soon)
2. Contact administrator to add funds
3. Reduce resource requirements or deadline
4. Delete unnecessary processes to free held funds (if cancelled)
Best Practices
Process Naming
- Be descriptive: "FFT Line 1 - High Frequency" not "test1"
- Include context: Survey name, line number, variant
- Use consistent format: Makes searching easier
Resource Allocation
- Start conservative: Use defaults, increase if needed
- Monitor actual usage: Check logs for "actual runtime"
- Right-size: Don't request 8 cores for simple tasks
- Generous deadlines: Better to overestimate than timeout
Dataset Management
- Descriptive output names: Name outputs clearly in process code
- Document parameters: Include metadata in outputs
- Clean up old data: Delete unnecessary datasets (UI coming soon)
- Organize by project: Keep related work in same project
Workflow Organization
- Use FlowView: Arrange nodes to show workflow clearly
- Version control: Create new process versions rather than deleting
- Document decisions: Use process names to indicate variations
- Save layouts: Layout persists in browser
Performance Tips
- Parallel processing: Run independent processes simultaneously
- Reuse datasets: Don't re-import or re-process unnecessarily
- Optimize parameters: Reduce iterations, simplify models for testing
- Use smaller samples: Test workflows on subset before full dataset
Keyboard Shortcuts
(Coming soon)
- Ctrl+N: New process
- Ctrl+S: Save layout
- Ctrl+F: Search datasets
- Esc: Deselect process
- Delete: Remove selected process
Getting Help
Documentation
- Architecture Docs: Understand how it works
- Development Guide: For contributors
- Deployment Guide: For administrators
Support
- GitHub Issues: https://github.com/emerald-geomodelling/nagelfluh/issues
- Documentation: Check the /help command in the application
- Logs: Always include ProcessLog output when reporting issues
Glossary
- Process: A computational job that transforms data
- Dataset: Output file from a process
- Environment: Collection of available process types
- Project: Isolated workspace with own storage and billing
- Widget: UI component (FlowView, ProcessEditor, etc.)
- Pane: Container for a widget in the layout
- Kueue: Job queuing system that manages cluster resources
- Pod: Kubernetes unit of one or more containers that runs a process
- AEM: Airborne Electromagnetic (geophysical survey method)
- Inversion: Geophysical processing to estimate resistivity