Rucio’s New Metadata Intelligence
Usability, Impact, and a New Horizon for DaFab and the Global Rucio Community
Over the past year, the DaFab project has become a catalyst for the evolution of the Rucio data management system. While initially designed to support the ATLAS experiment at CERN, today Rucio serves a far wider community of scientific collaborations with complex data needs. The DaFab initiative, centered on extracting value from massive Copernicus Earth Observation archives, has pushed Rucio into new territory, beyond file cataloguing and distributed data placement, and into the realm of rich semantic metadata and powerful filtering.
DaFab’s Data Management with DASI
Workflows processing Earth Observation (EO) data have a problem: the body of available EO data is vast and growing rapidly. Within the DaFab EU project, AI-driven workflows must process massive quantities of EO data, made available by the Copernicus project, efficiently and reliably. This presents a range of problems, including locating the relevant data, decoupling relatively fast and scalable compute tasks from slower data transfers, storing the data in a form the workflows can use, and managing the lifetime of any temporary copies required. This is where DASI (the Data Access and Storage Interface) plays a critical role. It provides the smart bridge between storage systems and compute environments. DASI’s semantically driven data management design helps build intelligent, scalable, and optimized AI workflows in the DaFab project.
Summer School: AI for Earth Observation and Scalable Data Management
Held in Ljubljana from 22 to 25 September 2025, this intensive summer school offers a unique opportunity to delve into the cutting-edge intersection of Artificial Intelligence (AI) and Earth Observation (EO), combined with the essential skills for managing large-scale data and workflows in modern computing environments. Participants will gain theoretical knowledge and practical experience in applying AI techniques to analyze EO data, optimizing AI performance, managing complex workflows with Kubernetes, and handling massive datasets.