File size limits of DataSets?

farman · 12 February 2024 09:12

Hey everyone!
We have a need to store larger Machine Learning models as datasets. Sometimes we’d need to store a tarball at sizes up to 1 TB.
Have you tried storing large volumes in one DataSet, or do you recommend another way to store it in openBIS?

My idea is to use the Datamover service on our servers to rsync the data to the openBIS generic dropbox. Please let me know if you have any fancier solution in place!

Best,
Filip

artur.pedziwilk · 14 February 2024 10:38

Dear @farman,
please clarify a bit more here.

Are you asking whether openBIS can correctly handle single files of 1 TB?
Or are you asking about our experience and recommendations for designing a regular data flow of 1 TB files into openBIS dropboxes?

I ask because I find your question a bit ambiguous. Yes, we have experience storing large volumes in one dataset, but does that mean large volumes made up of several 1 TB files? Yes, we have several data flows designed for large datasets, with thousands of different files reaching up to 1 TB as well as single large files per import.

Datamover is meant for complex connectivity situations where you cannot reach the data source directly from the openBIS instance. It is unlikely to help with a small-file or large-file problem.
When designing the data flow, the choice to be made is whether to use the openBIS dropbox with “auto-detection” or “marker-file” as the “incoming-data-completeness-condition”.
Fast bandwidth and fewer, smaller files favour “auto-detection”, while slow bandwidth and more, larger files favour “marker-file”.
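
To illustrate the “marker-file” approach, here is a minimal Python sketch of the sending side: copy the data into the dropbox incoming folder first, and only create the marker once the copy has finished. The function name `deliver_to_dropbox` and the paths are hypothetical, and the marker prefix `.MARKER_is_finished_` is an assumption about the naming convention that should be checked against the dropbox documentation for your openBIS version.

```python
import shutil
from pathlib import Path

# Assumed marker naming convention; verify against your openBIS version.
MARKER_PREFIX = ".MARKER_is_finished_"


def deliver_to_dropbox(source: Path, dropbox_dir: Path) -> Path:
    """Copy `source` into `dropbox_dir`, then create the marker file.

    The marker is written only after the copy has returned, so the
    dropbox never sees a marker for a half-transferred file.
    """
    target = dropbox_dir / source.name
    if source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)

    # Empty marker file signalling that the transfer is complete.
    marker = dropbox_dir / f"{MARKER_PREFIX}{source.name}"
    marker.touch()
    return marker
```

For example, `deliver_to_dropbox(Path("model.tar"), Path("/path/to/incoming"))` would copy the tarball and then place the marker next to it. With rsync or Datamover the same ordering applies: transfer the data first, create the marker last.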

farman · 14 February 2024 11:02

Thank you Artur for an elaborate answer.
I was most interested in knowing whether openBIS can handle a single file of 1 TB in a dataset. For now, these larger files will probably be 10-20 single 1 TB files, each connected to its own dataset.
We’d use the dropbox importer with a marker-file as “incoming-data-completeness-condition”.

Based on your answer, I feel confident we can proceed with the strategy we have in mind.

Thank you!
