DaFab’s Data Management with DASI
By Metin Cakircali, Simon Smart
Workflows processing Earth Observation (EO) data have a problem: the body of available EO data is vast and growing rapidly. Within the DaFab EU project, AI-driven workflows must process massive quantities of EO data, made available by the Copernicus programme, efficiently and reliably. This presents a range of problems, including locating the relevant data, decoupling relatively fast and scalable compute tasks from slower data transfers, storing the data in a form that the workflows can use, and managing the lifetime of any temporary copies required. This is where DASI (the Data Access and Storage Interface) plays a critical role. It provides the smart bridge between storage systems and compute environments. DASI’s semantically driven data management design helps build intelligent, scalable, and optimized AI workflows in the DaFab project.
What is DASI?
DASI is a library and a set of tools for managing data across a range of backend storage systems. It provides an interface and API to store and access data according to semantic metadata, without applications needing to know where or how the data is physically stored.

Figure 1. DASI architecture
Think of DASI as a smart layer between applications and the underlying storage infrastructure, whether that is a local file system, cloud object storage, or a dedicated high-performance computing (HPC) parallel filesystem (see Figure 1). Instead of relying on rigid file paths or UUIDs, DASI uses domain-specific metadata, such as model, experiment, or date, to identify and retrieve data (see Figure 2). This approach enables fast, efficient, and reproducible workflows across diverse environments, which can be easily written and understood by domain scientists.

Figure 2. Semantic description of data.
DASI is built on ECMWF’s Fields Database (FDB), a production-grade object store that handles all of the operational data throughput from ECMWF’s time-critical forecasting systems. It also inherits ECMWF’s long experience of managing data according to semantic metadata, including the 650 PiB of meteorological data archived in the Meteorological Archival and Retrieval System (MARS).
DASI Data Schema
At the heart of DASI’s semantically driven architecture is a schema, which defines how data is described, organized, and accessed. A schema is a collection of rules, and each rule is a hierarchical tree of keywords that reflects the logical structure of the data. Each data object is described using a set of key-value pairs (for example project: DaFab, city: Bonn, year: 2025), which together form a unique identifier.
This approach allows users to query data semantically, for example: “Retrieve all data from year 2025.” Queries and data keys can be constructed systematically and even in advance of the production of data itself.
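The idea can be illustrated with a toy in-memory store. This is a minimal sketch and not the DASI API: the class `ToyStore` and its `archive` and `retrieve` methods are hypothetical names, used only to show how a complete set of key-value pairs identifies one object and how a partial set acts as a query.

```python
class ToyStore:
    """Toy semantic store: objects are addressed by key-value pairs, not paths."""

    def __init__(self, keywords):
        # Keywords that every complete key must provide (hypothetical schema).
        self.keywords = tuple(keywords)
        self._objects = {}

    def archive(self, key, data):
        # A complete key must supply exactly the schema's keywords.
        if set(key) != set(self.keywords):
            raise KeyError(f"key must define exactly {self.keywords}")
        ident = tuple(str(key[k]) for k in self.keywords)
        self._objects[ident] = data

    def retrieve(self, query):
        # A query is a partial key: every keyword it names must match.
        unknown = set(query) - set(self.keywords)
        if unknown:
            raise KeyError(f"unknown keywords: {sorted(unknown)}")
        for ident, data in self._objects.items():
            key = dict(zip(self.keywords, ident))
            if all(key[k] == str(v) for k, v in query.items()):
                yield key, data


store = ToyStore(["project", "city", "year"])
store.archive({"project": "DaFab", "city": "Bonn", "year": 2025}, b"bonn-2025")
store.archive({"project": "DaFab", "city": "Bonn", "year": 2024}, b"bonn-2024")
store.archive({"project": "DaFab", "city": "Reading", "year": 2025}, b"reading-2025")

# "Retrieve all data from year 2025."
for key, data in store.retrieve({"year": 2025}):
    print(key["city"], data)
```

Because a key depends only on metadata, a consumer can build the query `{"year": 2025}` before any matching data has been produced.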
Further, the hierarchical levels of the schema drive the way in which DASI collocates data. Data that is most closely related, and most likely to be produced or accessed together, is stored physically close together on the underlying storage media. Figure 3 illustrates how a DASI rule abstracts the storage backend and enables intelligent data access.
Figure 3. DASI rule that describes data and abstracts storage backends.
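How a hierarchical rule can drive collocation is sketched below. The three-level layout, the keyword names (including `variable`), and the mapping of levels to a directory, a file, and an entry within that file are illustrative assumptions, not DASI's actual storage format.

```python
from collections import defaultdict

# Hypothetical rule: three hierarchical levels of keywords, outermost first.
RULE = (
    ("project", "city"),  # level 1: which dataset
    ("year",),            # level 2: which group within the dataset
    ("variable",),        # level 3: which object within the group
)


def locate(key):
    """Split a full key by rule level into (directory, file, entry) names."""
    parts = []
    for level in RULE:
        parts.append(":".join(str(key[k]) for k in level))
    return tuple(parts)


keys = [
    {"project": "DaFab", "city": "Bonn", "year": 2025, "variable": "ndvi"},
    {"project": "DaFab", "city": "Bonn", "year": 2025, "variable": "lst"},
    {"project": "DaFab", "city": "Bonn", "year": 2024, "variable": "ndvi"},
]

# Group entries by (directory, file): keys sharing the outer levels land together.
layout = defaultdict(list)
for key in keys:
    directory, filename, entry = locate(key)
    layout[(directory, filename)].append(entry)

for (directory, filename), entries in layout.items():
    print(f"{directory}/{filename}: {entries}")
```

In this sketch the two 2025 objects for Bonn end up in the same group, while the 2024 object is placed in a neighbouring one, so data likely to be read together sits together.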
Learn more in the official documentation: https://dasi.readthedocs.io/en/latest/
Explore the source code on GitHub: https://github.com/DaFab-AI-eu/dasi
Earth Observation (EO) Data and Copernicus Integration
Within the DaFab ecosystem, DASI is used to manage the EO data originating in the Copernicus programme, Europe’s flagship initiative for monitoring the environment using satellite and in-situ data. Copernicus produces vast amounts of EO data daily, covering climate, land, marine, and atmospheric domains. These datasets are:
- Large-scale: terabytes of assets per day,
- Metadata-rich: sensor type, resolution, acquisition time, geolocation, etc.,
- Multi-dimensional: spatial, temporal, spectral.
These datasets are far too large to be consumed in one go or stored locally in full for processing within the DaFab project. Instead, data must be pulled incrementally into intermediate storage and processed on a rolling basis. In the product ingestion path, the EO metadata and assets are fetched and stored via DASI in the DaFab ecosystem. The process is depicted in Figure 4 and can be briefly summarized as:
- Searching products in Copernicus STAC catalogue,
- Extracting metadata that describes the product and its assets,
- Fetching the assets from Copernicus EO S3 endpoints,
- Storing the data in DASI using a defined schema rule.

Figure 4. Product ingestion (Metadata and Assets) from Copernicus via DASI.
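The four ingestion steps can be sketched as a small pipeline. This is a sketch under stated assumptions rather than the DaFab implementation: the `search`, `fetch`, and `archive` callables are hypothetical stand-ins for a STAC client, an S3 download, and a DASI archive call, and the schema keywords (`collection`, `product`, `date`, `asset`) as well as the example product and bucket names are invented for illustration. Only the item layout (`id`, `collection`, `properties.datetime`, `assets[...].href`) follows the STAC Item convention.

```python
def extract_metadata(item):
    """Step 2: derive the semantic key fields shared by a product's assets."""
    return {
        "collection": item["collection"],
        "product": item["id"],
        "date": item["properties"]["datetime"][:10],  # keep YYYY-MM-DD
    }


def ingest(search, fetch, archive, **criteria):
    """Run search -> extract -> fetch -> store; return the number of assets stored."""
    stored = 0
    for item in search(**criteria):                     # step 1: search catalogue
        base_key = extract_metadata(item)               # step 2: extract metadata
        for name, asset in item["assets"].items():
            data = fetch(asset["href"])                 # step 3: fetch asset bytes
            archive({**base_key, "asset": name}, data)  # step 4: store under a key
            stored += 1
    return stored


# --- Stand-ins so the sketch runs without network access (all hypothetical) ---
FAKE_CATALOGUE = [
    {
        "id": "S2_EXAMPLE_PRODUCT",
        "collection": "sentinel-2-l2a",
        "properties": {"datetime": "2025-01-15T10:30:00Z"},
        "assets": {
            "B04": {"href": "s3://example-bucket/S2_EXAMPLE_PRODUCT/B04.jp2"},
            "B08": {"href": "s3://example-bucket/S2_EXAMPLE_PRODUCT/B08.jp2"},
        },
    }
]

archived = {}


def fake_search(collection):
    return [i for i in FAKE_CATALOGUE if i["collection"] == collection]


def fake_fetch(href):
    return f"bytes of {href}".encode()


def fake_archive(key, data):
    archived[tuple(sorted(key.items()))] = data


count = ingest(fake_search, fake_fetch, fake_archive, collection="sentinel-2-l2a")
print(f"stored {count} assets")
```

Passing the three external steps in as callables keeps the key-building logic separate from the slow transfers, so each can be replaced or tested on its own.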
Once ingested, EO products are consumed by AI-driven workflows via the DASI API (see Figure 5). These workflows include feature extraction from satellite imagery, climate variable classification and model training using EO time-series data.

Figure 5. AI workflows consuming original data via DASI API.
DASI ensures that data access is fast, reproducible, and infrastructure-independent—empowering researchers to focus on insights rather than logistics.
Final Thoughts
The wide availability of EO data presents huge opportunities for society, but there are real obstacles in the way of realising these benefits. Accessing such volumes of data is hard, and the workflows that process it need to be carefully designed.
We believe that a semantically driven approach, with domain-specialised tools for handling, storing, and making data available to downstream workflows, will simplify the development of those workflows and accelerate the path to real, actionable, scientifically driven insight from EO datasets.
In the DaFab project, DASI provides this approach to data handling.
Stay tuned for future developments!
Metin Cakircali (ECMWF), Simon Smart (ECMWF)