Dear Juan,
Many thanks, that would be great.
We have a collaboration of ~30 groups spread across a number of institutions, some at the same university, others at different universities. One group (us) will host the openBIS instance for everyone, but we can neither expose the filesystems nor provide storage for everyone.
A number of groups will generate “small” data that we can (at least in the current plan) attach to the ELN/LIMS. Alternatively, or in addition, we would create an object to represent each measurement and then use parent/child relationships in the ELN to represent samples, measurement machines, the measured values themselves, and some additional information.
This will all fit into the openBIS instance (we think).
A smaller number of groups (around 5 of these 30) will generate a larger number of datasets. These can be simulations as well as data created by, for example, electron microscopes with several dedicated detectors, or by other machines.
The upper limit that at least one group says they can reach is 1 TB of data per day, although I would guess that most of the groups producing more data will, at least initially and for the near future, be somewhere around 10…100 TB of data (in total, for the foreseeable future).
The various groups will need to exchange data, at least to some extent, and since most groups will not have the compute capacity to handle these data locally, we will need to do that on the central university cluster. It would be good to avoid copying data around and instead serve it from the central data store, which in our case is only this S3 storage we have access to…
(The cluster also has storage, but that is considered volatile, so we cannot store data there long-term and expect it to still be there.)
Some of the data should be “pipelined” across the groups, i.e. group A records a large-ish amount of data, which is then analysed by group B at a different institution and passed on to group C, etc.
(Ideally, or so we hope, we can automate at least part of the process.)
Budget is, for all intents and purposes, zero, since all funding is going to different RDM initiatives that, at least for now and the near future, do not meet the requirements of the collaboration. Maybe that will change in the future, but for now we don’t see an alternative to using openBIS.
Many thanks in advance and all the best,
Ulrich