Making Dark Data FAIR
The Making Dark Data FAIR project is generously funded by the EOSC.
Modern high-performance computing (HPC) facilities generate a colossal amount of data. Recent studies have shown that a significant percentage (up to 3.41% of total storage capacity) of the data these facilities produce might never be used again, for reasons as mundane as improper labelling. Since large amounts of public money often go into producing that data, this is a serious problem.
The reasons for the proliferation of this so-called dark data (i.e., data that cannot be reused, for a number of different reasons) are varied. Missing or incomplete metadata, non-standardised storage methods, and researchers simply forgetting that the data exists are just some of the reasons so much data becomes non-reusable.
Recently, the FAIR principles (Findable, Accessible, Interoperable, Reusable) for proper data management were introduced in an attempt to reduce the amount of non-reusable data. The principles outline a number of pragmatic measures that institutions can enact to that end.
The project “Making Dark Data FAIR” has three aims: to analyse why dark data goes dark in the first place, to develop concrete strategies for enacting the FAIR principles so as to reduce dark data, and to interrogate the FAIR principles themselves, to ensure they’re fulfilling the role intended for them (i.e., reducing dark data). While our primary focus is the generation of dark data at HPC facilities, the results of this research are widely applicable; the need for the FAIR principles is recognised by a wide variety of parties, not only those interested in high-performance computing.
As part of the project, we are running a series of workshops in late 2020 and early 2021, which will serve as a platform to discuss and disseminate the results of the project. Workshops will be held both online and in person, including a workshop at TU Delft. These workshops will bring together a wide variety of stakeholders, including researchers, policymakers, data stewards, and HPC facility personnel.
The project is financed by the European Open Science Cloud Secretariat. The lead coordinator of the project is Juan M. Durán (TU Delft). Jack Casey (TU Delft) is the postdoc and contact person.
The project is a collaboration between TU Delft (The Netherlands), the University of Exeter (UK), the University of Stuttgart (Germany), the CNR-IOM Center (Italy), and the ERC
- Tue 30 Mar, Zoom
- Data, Society, and Open Science II: Roundtable on the FAIR principles and data-driven scientific practice, Mon 1 Mar, Zoom
Program Announced: Data, Society, and Open Science III Workshop - 30th March 2021
Program: Data, Society, and Open Science III – 30th March 2021
11:30-11:45 - Introduction
11:45-12:15 - Saeedeh Babaii - Institute for Humanities and Cultural Studies, Tehran, Iran - GAN; A Promising Approach to Mitigate the Problem of Bias in AI
Break 2 – 12:15-12:30
12:30-13:00 - Martin Thomas Horsch, Taras Petrenko and Björn Schembera – HLRS Stuttgart - Automated metadata extraction and epistemic FAIRness in the engineering sciences
14:30-15:00 - Juniper Lovato and Randall Harp – The Vermont Complex Systems Center - Ethical Considerations of Dark Data: Making FAIR Data Fair
Break 3 – 15:00-15:15
15:15-15:45 - Fionn McGrath – Trinity College Dublin - An Adornoian critique of machine learning
Break 4 – 15:45-16:00
16:00-16:30 - Giovanni Galli – University of Urbino - Understanding the data-centric sciences, from dark data to Covid-19 tracking
Second Workshop Announced: Data, Society, and Open Science II: Roundtable on the FAIR principles and data-driven scientific practice
We are delighted to announce the second workshop in the Data, Society, and Open Science series, which will take the form of a roundtable discussion between invited participants Professor Sabina Leonelli (University of Exeter), Dr Marta Teperek (TU Delft), Dr Niccolò Tempini (University of Exeter), Dr Manuela Fernandez Pinto (University of Los Andes), and Professor Michael Resch (University of Stuttgart). The speakers will address a number of issues concerning the FAIR data principles and data-driven scientific practice. The discussion will take place on the 1st March 2021 (15:30-17:00 CET) and will be held fully online. Please click the link below for more information on this and other upcoming events.
First Workshop Announced: Data, Society, and Open Science
We are pleased to announce the first in our series of three workshops we will be holding as part of the Making Dark Data FAIR project. The workshop will be held on the 10th November 2020, and will be completely online. Click the link below for more details.
First Speaker Announced: Mark Alfano
We are delighted to announce that our first speaker for the Data, Society, and Open Science workshop (10th Nov 2020) will be Mark Alfano, with a talk entitled 'A case study on open data for the public good: The international collaboration on social & moral psychology: COVID-19'.
Click the link for more details.
Second Speaker Announced: Kees Vuik
We are excited to confirm our second speaker for the Data, Society, and Open Science workshop (10th Nov 2020): TU Delft's Kees Vuik. Professor Vuik will be giving a talk entitled 'Reproducible Computational Science'.
Click the link below for more details.
Third Speaker Announced: Björn Schembera
We are delighted to announce that our third speaker for the Data, Society, and Open Science workshop (10th Nov 2020) will be Björn Schembera, with a talk entitled 'Update on Dark Data numbers and activity'.
Click the link below for more details.
Fourth Speaker Announced: Mark Wilkinson
We are excited to confirm that our fourth and final speaker for the Data, Society, and Open Science workshop (10th Nov 2020) will be Mark Wilkinson, with a talk entitled 'FAIR data principles'.
Click the link below for more details.