Five Steps to make your analysis KM3NeT FAIR
As a basic mental model:
Assume that everything related to your analysis will be destroyed (including your private laptop and workspace), perhaps within the next five minutes, and that the only things left are the tape drive in Lyon, the KM3NeT database and the Git server at ECAP. Everything you produced must be easy to reproduce from these sources.
Assume that you are new to KM3NeT and your supervisor asks you to redo the analysis from five years ago with new data and new models. You don’t know where to start and would like to find everything you need as easily as possible.
Step 1 - Put it in Git!
An (internally) public repository for all your processing scripts and final results is crucial for a transparent analysis, so put all your work in our GitLab instance and update it regularly!
Step 2 - Reference your data!
Your data should not be stored in Git, but everyone needs to be able to find the data you worked with and use your scripts to reproduce your results.
At a minimum, add a text list of the files you used for your processing and, if data was retrieved from the database, the scripts that were used to pull the relevant data.
Add your (batch) data access scripts for CCLyon (or for your favorite computing cluster) to the repository, e.g. scripts containing xrootd calls or similar.
If you stored intermediate data (e.g. your own summary files), point to the current storage location of these files (as a text list).
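As a minimal sketch of the file list and data access script described above, the Python snippet below reads a plain-text file list (one path per line, `#` marks comments) and turns it into `xrdcp` command lines, `xrdcp` being the XRootD copy client. The server URL `root://xrootd.example.org`, the example paths and the helper names are hypothetical placeholders; adapt them to the actual storage at CCLyon or on your cluster.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical placeholder, not a real KM3NeT storage endpoint.
XROOTD_SERVER = "root://xrootd.example.org"


def read_file_list(text):
    """Return the paths in a file list, skipping blank lines and '#' comments."""
    paths = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(line)
    return paths


def xrdcp_commands(paths, dest_dir):
    """Build one 'xrdcp <source> <destination>' command line per listed file."""
    commands = []
    for path in paths:
        # An absolute path gives the usual root://host//abs/path form.
        source = f"{XROOTD_SERVER}/{path}"
        target = str(PurePosixPath(dest_dir) / PurePosixPath(path).name)
        commands.append(f"xrdcp {shlex.quote(source)} {shlex.quote(target)}")
    return commands


if __name__ == "__main__":
    # Hypothetical example list; in practice read your committed list file.
    example_list = "# files used in this analysis\n/data/run_001.root\n/data/run_002.root\n"
    for command in xrdcp_commands(read_file_list(example_list), "local_data"):
        print(command)
```

In a real repository the list would live in its own text file (read e.g. with `Path("filelist.txt").read_text()`, file name hypothetical), so that the same file both documents your input and drives the download.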
Step 3 - Add your processing scripts!
Everything that is “done” to the data needs to be reproducible in the same computing environment:
Add all your processing scripts to the repository, e.g. data reduction steps and data fits (written with aanet, Python, etc.).
- Capture your software environment:
minimum: a setenv script and a note of the modules loaded in Lyon
better: report the software versions, or provide/use Docker/Singularity containers for the software
Follow coding standards (your favorite search engine will find tutorials on these)! Note in particular the standards for software development provided for KM3NeT members.
Make it nice: use a workflow language (e.g. Nextflow) and get in touch with the experts on this in the software WG.
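One way to go beyond a setenv script is to write the environment into a small file that is committed next to the results. The sketch below assumes a Python-based analysis. It records the Python version, the platform, the installed versions of the packages you name (the names shown are only examples) and, assuming the cluster uses Environment Modules or Lmod, the modules listed in the `LOADEDMODULES` variable. It complements, but does not replace, a container.

```python
import json
import os
import platform
import sys
from importlib import metadata


def capture_environment(packages):
    """Return Python, OS, module and package version information."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    # Environment Modules and Lmod list loaded modules, colon-separated,
    # in LOADEDMODULES; the list is empty if the variable is not set.
    loaded = [m for m in os.environ.get("LOADEDMODULES", "").split(":") if m]
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "loaded_modules": loaded,
        "packages": versions,
    }


if __name__ == "__main__":
    # Example package names only; list what your analysis actually uses.
    env = capture_environment(["numpy", "km3pipe"])
    print(json.dumps(env, indent=2, sort_keys=True))
```

Saved as e.g. a hypothetical `capture_env.py`, it can be run as `python capture_env.py > environment.json` at the end of each processing job, so the recorded versions match the run that produced the output.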
Step 4 - Store your high-level data for plots!
The “public plots” produced in your analysis should be reproducible by everyone else, and it should be easy to modify them.
Store your highest-level data (only that, i.e. the data displayed in your final plots, a few MB at most) in the repository. If necessary, create this data by exporting e.g. the bin contents of the plot to a file, and provide a script that recreates the plot.
For an example, see the public plot template.
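For a histogram, the exported data can be as simple as one CSV row per bin. The following sketch writes the bin edges and contents with the standard `csv` module and reads them back. The column names are an illustrative choice, not the format used by the public plot template, and `plot_histogram` assumes matplotlib 3.4 or later (for `Axes.stairs`).

```python
import csv
import io


def write_histogram(fh, edges, counts):
    """Write one CSV row per bin: lower edge, upper edge, bin content."""
    if len(edges) != len(counts) + 1:
        raise ValueError("expected exactly one more bin edge than bin contents")
    writer = csv.writer(fh)
    writer.writerow(["bin_low", "bin_high", "content"])
    for low, high, content in zip(edges[:-1], edges[1:], counts):
        writer.writerow([low, high, content])


def read_histogram(fh):
    """Read back (edges, counts) as written by write_histogram."""
    rows = list(csv.DictReader(fh))
    edges = [float(r["bin_low"]) for r in rows] + [float(rows[-1]["bin_high"])]
    counts = [float(r["content"]) for r in rows]
    return edges, counts


def plot_histogram(edges, counts, outfile="plot.pdf"):
    """Recreate the plot from the stored numbers (needs matplotlib >= 3.4)."""
    import matplotlib.pyplot as plt  # imported here so the CSV part stays stdlib-only

    fig, ax = plt.subplots()
    ax.stairs(counts, edges)
    fig.savefig(outfile)
    plt.close(fig)


if __name__ == "__main__":
    # Round trip through an in-memory buffer; for a real file open it
    # with newline="" as recommended by the csv module documentation.
    buf = io.StringIO()
    write_histogram(buf, [0.0, 1.0, 2.0, 3.0], [4.0, 7.0, 2.0])
    buf.seek(0)
    print(read_histogram(buf))
```

Storing edges rather than bin centres keeps variable-width binning exact, and a CSV file stays readable without any KM3NeT software installed.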
Step 5 - Always document what you are doing!
No data has any meaning unless you document it.
Start now! While developing your analysis, write short notes for yourself or inline code comments; they will help you understand later what you did.
Provide your internal note, proceedings, etc. as a reproducible and accessible document (e.g. TeX, Google Doc, autobuild with Git pages, …) and link it from your Git repository or place it there; this makes it easy for others to comment on, reuse and share it.
Add a README file to all your folders!
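To check that no folder was forgotten, a short script can walk the repository and report folders without a README. This sketch treats any file whose name starts with `README` (case-insensitive) as a README and skips hidden folders such as `.git`; both rules are assumptions you may want to change.

```python
import os
import sys


def folders_without_readme(root):
    """Return folders below root (relative paths, sorted) that lack a README."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories (e.g. .git) in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        if not any(name.upper().startswith("README") for name in filenames):
            missing.append(os.path.relpath(dirpath, root))
    return sorted(missing)


if __name__ == "__main__":
    # Check the folder given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for folder in folders_without_readme(root):
        print(f"missing README: {folder}")
```

Run from the top of your repository, for example before each push, it prints one line per folder that still needs a README.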