Common Infrastructure
This section documents the shared infrastructure components used by all scientific pipelines in Nagelfluh.
Sensitivity Matrix Caching
Both magnetic inversion systems use a hash-keyed caching mechanism to avoid recomputing the sensitivity (Jacobian) matrix when the survey geometry and parameters are unchanged.
Algorithm
Source: sensitivity_cache.py (simplemag)
sensitivity_hash(receiver_locations, field_params, mesh_params, model_type) → SHA-256 hex string
The hash is computed from:
- Receiver 3D locations (array of x, y, z coordinates)
- Earth field (B₀ intensity, inclination, declination)
- Mesh parameters (cell size, padding, depth, refinement levels)
- Model type (scalar/vector)
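A minimal sketch of how such a key could be derived, assuming the inputs are plain Python sequences and dicts. The canonicalisation used here (JSON with sorted keys, coordinates rounded to six decimals) and the dict key names are assumptions made for illustration; sensitivity_cache.py may encode its inputs differently.

```python
import hashlib
import json


def sensitivity_hash(receiver_locations, field_params, mesh_params, model_type):
    """Return a SHA-256 hex digest identifying one survey geometry.

    Illustrative only: the real sensitivity_cache.py may canonicalise
    its inputs in another way.
    """
    payload = {
        # Round coordinates so floating-point noise does not change the key
        # (the 6-decimal precision is an assumption).
        "receivers": [[round(float(c), 6) for c in xyz] for xyz in receiver_locations],
        "field": field_params,
        "mesh": mesh_params,
        "model_type": model_type,
    }
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Hypothetical inputs for a two-receiver survey.
receivers = [(0.0, 0.0, 30.0), (25.0, 0.0, 30.0)]
field = {"intensity": 50000.0, "inclination": 70.0, "declination": 2.0}
mesh = {"cell_size": 25.0, "padding": 500.0, "depth": 1000.0, "refinement_levels": 3}

key = sensitivity_hash(receivers, field, mesh, "scalar")
```

Any change to a receiver position, field parameter, mesh parameter, or the model type yields a different key, so a stale G is never picked up for a new geometry.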
Storage
Three modes:
1. "disk" (default for both systems): The sensitivity matrix G is computed once, written to sensitivity_path/<hash>/ as .npy files, and memory-mapped during inversion. This makes large surveys feasible, where G exceeds a few GB.
2. "ram": The full G is held in memory. Fastest, but limited by available memory.
3. "forward_only": G is never cached; each Jvec product is recomputed. Slowest.
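The "disk" mode can be sketched with NumPy's memory-mapping support. The function name load_or_compute_G, the file name G.npy, and the compute_G callback are hypothetical and only illustrate the compute-once, map-on-demand pattern.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_or_compute_G(sensitivity_path, key, compute_G):
    """Return G as a read-only memory map, computing it only on a cache miss."""
    cache_dir = Path(sensitivity_path) / key
    g_file = cache_dir / "G.npy"  # hypothetical file name
    if not g_file.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        np.save(g_file, compute_G())
    # mmap_mode="r" pages data in from disk on demand instead of loading it all.
    return np.load(g_file, mmap_mode="r")


calls = []


def toy_G():
    """Stand-in for the expensive sensitivity computation."""
    calls.append(1)
    return np.arange(12, dtype=np.float64).reshape(3, 4)


with tempfile.TemporaryDirectory() as tmp:
    G1 = load_or_compute_G(tmp, "abc123", toy_G)
    G2 = load_or_compute_G(tmp, "abc123", toy_G)  # served from the cached file
    product = np.asarray(G2 @ np.ones(4))          # a Jvec-style product on the map
    n_computations = len(calls)
    del G1, G2  # release the maps before the directory is removed
```

The second call finds the file on disk and never invokes toy_G, which is the behaviour the cache relies on.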
Blob Storage Sync
The process wrappers (equiv_source_process.py, inversion_3d_process.py) sync the hash-keyed sensitivity directories to/from blob storage:
- Download before inversion: Any existing cached G for this geometry is retrieved
- Upload after inversion: Updated G is persisted for future runs
This enables efficient iterative refinement — the expensive sensitivity computation is a one-time cost per survey geometry.
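The download-then-upload flow can be sketched as follows, with a local directory standing in for the blob store. The helper names download_cache and upload_cache are hypothetical; the actual process wrappers talk to a real blob storage backend whose API is not shown here.

```python
import shutil
import tempfile
from pathlib import Path


def download_cache(blob_root, local_root, key):
    """Copy a cached sensitivity directory from the store, if one exists."""
    src = Path(blob_root) / key
    if not src.is_dir():
        return False  # cache miss: G must be computed locally
    shutil.copytree(src, Path(local_root) / key, dirs_exist_ok=True)
    return True


def upload_cache(blob_root, local_root, key):
    """Persist the local sensitivity directory back to the store."""
    shutil.copytree(Path(local_root) / key, Path(blob_root) / key, dirs_exist_ok=True)


with tempfile.TemporaryDirectory() as blob, tempfile.TemporaryDirectory() as local:
    key = "abc123"
    first_hit = download_cache(blob, local, key)  # nothing stored yet
    (Path(local) / key).mkdir()
    (Path(local) / key / "G.npy").write_bytes(b"placeholder")  # stand-in for G
    upload_cache(blob, local, key)
    shutil.rmtree(Path(local) / key)  # simulate a fresh worker
    second_hit = download_cache(blob, local, key)
    restored = (Path(local) / key / "G.npy").read_bytes()
```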
Data Formats
msgpack (Primary AEM data container)
AEM data flows as msgpack-serialized libaarhusxyz.XYZ objects. The format embeds:
- flightlines: DataFrame of per-sounding attributes (position, altitude, tilt, current)
- layer_data: Dict of per-gate DataFrames (Gate, InUse, STD, relErr)
- model_info: Metadata (gate times, projection, scalefactor)
- layer_params: Layer geometry (dep_top, dep_bot) for model data
The libaarhusxyz.export.msgpack module handles serialization/deserialization. The format supports numpy arrays efficiently via msgpack-numpy.
webxtile (Gridded output format)
Both AEM gridding and magnetic inversion outputs use webxtile — a tiled format designed for WebGL rendering (used by the gladly frontend). An xarray Dataset is partitioned into tiles along spatial dimensions, enabling progressive loading and client-side visualization without downloading the full volume.
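The idea of partitioning a grid along its spatial dimensions can be illustrated with a small index calculation. This is a generic tiling sketch, not the webxtile on-disk layout; tile_bounds and its parameters are hypothetical.

```python
def tile_bounds(n_x, n_y, tile_size):
    """Yield (x_start, x_stop, y_start, y_stop) index ranges covering an n_x by n_y grid."""
    for x0 in range(0, n_x, tile_size):
        for y0 in range(0, n_y, tile_size):
            # Edge tiles are clipped to the grid extent.
            yield x0, min(x0 + tile_size, n_x), y0, min(y0 + tile_size, n_y)


# A 5 x 3 grid split into tiles of at most 2 x 2 cells.
tiles = list(tile_bounds(5, 3, 2))
```

A client that needs only part of the volume can then request just the tiles whose ranges intersect its viewport.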
MagData (Magnetics container)
AirMagTools.MagData wraps a pandas DataFrame with:
- MultiIndex: (line, fidcount)
- Columns: easting, northing, gpsalt, magcom, diurnal, surface, utctime, flight
- meta dict: CRS, field parameters, sample frequency
Persisted via pandas.to_pickle().
Entry Point Registration
The project uses Python's importlib.metadata entry points for plugin-like registration of all process types, system descriptions, and processing steps:
# setup.py or pyproject.toml
[entry_points]
nagelfluh.process_types =
    import = aem_processes.aem_processes.import_process:Import
    processing = aem_processes.aem_processes.processing_process:Processing
    inversion = aem_processes.aem_processes.inversion_process:Inversion
    forward = aem_processes.aem_processes.forward_process:Forward
    gridding = aem_processes.aem_processes.gridding_process:Gridding
    mag_import = mag_processes.mag_processes.import_process:MagImport
    mag_processing = mag_processes.mag_processes.processing_process:MagProcessing
    mag_equiv_source = mag_processes.mag_processes.equiv_source_process:MagEquivSource
    mag_inversion_3d = mag_processes.mag_processes.inversion_3d_process:MagInversion3D
simpeg.static_instrument =
    SingleMomentTEMXYZSystem = simpeg.electromagnetics.utils.static_instrument.single:SingleMomentTEMXYZSystem
    DualMomentTEMXYZSystem = simpeg.electromagnetics.utils.static_instrument.dual:DualMomentTEMXYZSystem
emeraldprocessing.pipeline_step =
    correct_altitude_and_topo = emeraldprocessing.tem.corrections:correct_altitude_and_topo
    cull_roll_pitch_alt = emeraldprocessing.tem.culling:cull_roll_pitch_alt
    moving_average_filter = emeraldprocessing.tem.corrections:moving_average_filter
    ...
mag_pipeline.filters =
    set_constants = AirMagTools.magfilters:set_constants
    diurnal_qc_for_15s_chord = AirMagTools.magfilters:diurnal_qc_for_15s_chord
    noise_qc = AirMagTools.magfilters:noise_qc
    ...
nagelfluh.mag_equiv_source_systems =
    MagEquivalentSourceSystem = mag_inversion.equivalent_source:MagEquivalentSourceSystem
nagelfluh.mag_inversion_3d_systems =
    MagInversion3DSystem = mag_inversion.full_3d:MagInversion3DSystem
swaggerspect Schema Generation
The swaggerspect library dynamically generates JSON Schemas by introspecting entry-point groups. For process types that have variable parameters (system descriptions, processing steps), the schema is built at runtime:
# Inversion schema: reads simpeg.static_instrument entry points
schema["properties"]["system"] = swaggerspect.swagger_to_json_schema(
    swaggerspect.get_apis("simpeg.static_instrument"),
    multi=False
)
# Processing schema: reads emeraldprocessing.pipeline_step entry points
schema["properties"]["steps"] = swaggerspect.swagger_to_json_schema(
    swaggerspect.get_apis("emeraldprocessing.pipeline_step"),
    multi=True
)
This enables the frontend to auto-generate configuration forms from the available system descriptions and processing steps without hardcoding any UI.
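To illustrate the general idea of deriving a schema from code, the sketch below builds a small JSON Schema from a function signature using only the standard library. It is not swaggerspect's implementation; schema_from_callable, the type mapping, and example_step are all hypothetical.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_from_callable(func):
    """Build a small JSON Schema object describing func's parameters."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string" in this sketch.
        prop = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            prop["default"] = param.default
        properties[name] = prop
    return {"type": "object", "properties": properties, "required": required}


def example_step(window: int, weight: float = 1.0):
    """Hypothetical processing step, used only for this example."""


schema = schema_from_callable(example_step)
```

A form generator can render one input per entry in "properties" and mark those listed in "required" as mandatory.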
Dataset Writing Utilities
Source: docker/base-runner/aem_processes/dataset_utils.py
The write_dataset() function handles:
1. Creating a UUID-based dataset directory under the process's storage path
2. Writing msgpack, GeoJSON, XYZ, and differential msgpack representations
3. Writing an info.json manifest with MIME types and file references
The magnetic pipeline uses write_webxtile_dataset() for gridded outputs, which writes the xarray Dataset as webxtile tiles and creates the dataset manifest.
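The directory-plus-manifest pattern can be sketched as follows. The function write_dataset_sketch, the manifest keys ("parts", "mime", "file"), and the file names are assumptions for illustration; the actual write_dataset() signature and info.json layout are not shown in this document.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_dataset_sketch(storage_path, files):
    """Write files into a new UUID-named directory plus an info.json manifest.

    files maps file name -> (mime_type, payload bytes).
    The manifest layout used here is hypothetical.
    """
    dataset_dir = Path(storage_path) / str(uuid.uuid4())
    dataset_dir.mkdir(parents=True)
    manifest = {"parts": {}}
    for name, (mime_type, payload) in files.items():
        (dataset_dir / name).write_bytes(payload)
        manifest["parts"][name] = {"mime": mime_type, "file": name}
    (dataset_dir / "info.json").write_text(json.dumps(manifest, indent=2))
    return dataset_dir


with tempfile.TemporaryDirectory() as tmp:
    out = write_dataset_sketch(tmp, {
        "data.msgpack": ("application/x-msgpack", b"\x80"),
        "data.geojson": ("application/geo+json",
                         b'{"type": "FeatureCollection", "features": []}'),
    })
    dataset_id = out.name
    info = json.loads((out / "info.json").read_text())
    written = sorted(p.name for p in out.iterdir())
```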