openBIS for collaboration of independent groups with (some) replication?

Dear all,

We’re aiming to use openBIS as the research data management infrastructure for our proposed collaborative research effort. By the nature of the scheme, the participants belong to independent research groups across multiple universities and research facilities, i.e. there is no central IT department that would serve them all.

We are currently investigating the best way to set this up. Ideally, we would like to have it all (yes, I know…): all data and metadata of this collaboration should be available to all of its researchers. At the same time, each researcher (at least, some of them) is part of multiple collaborations, and we hope they will all come to like openBIS and manage the data of their other research efforts with it as well. As these research efforts are independent of each other, their data cannot be shared across collaborations.

From my (still limited) understanding, our options would be:

  • have one central openBIS instance that is used by all researchers within the collaboration for this research effort. That would enable us to share experiments, standard operating procedures, data, etc., but it would put the burden of hosting the system and all data on the “volunteer” group that runs the system.

  • Use openBIS - Sync and have more than one instance of openBIS
    However, I’m not sure I understand the concept fully (or if it falls within our intended use-case)

    • is it limited to two instances (source/harvester) or can we have multiple instances syncing each other?
    • Do I understand correctly that the synchronisation happens both ways, i.e. data/metadata created on the source will be transferred to the harvester and the other way around? What happens with shared experiments/results/data?
    • If we create new templates/experiments/standard operating procedures/etc, is it sufficient to create them on one system and they will be propagated to others?
    • can the synchronisation happen partially (the documentation says something about this?)?
      I.e. if the research groups would want to use openBIS for multiple projects, can we specify which parts to synchronise (i.e. only the parts that are related to this project - but not the ones related to other projects)?

Many thanks again for your help and support

Thank you for the message. We are working on your questions and will keep you updated.
Just a reminder: all specific, openBIS-related questions can also be submitted to the openBIS help desk:
openbis-support@id.ethz.ch

With best regards.
openBIS team

Dear kerzel

As a former member of the openBIS development team, I will try to answer your questions about the sync feature:

  1. openBIS sync isn’t limited to two instances. But you have to deal with name clashes by using prefixes on the harvester side if you sync with more than one data source. Look for the configuration flag translate-using-data-source-alias in the documentation.
  2. Synchronization is always one way, from data source to harvester. But you can set up a harvester maintenance task on every openBIS instance. Sharing data should be possible, but note that data with the same identifier will be replaced by the data coming from the data source.
  3. Data created on one openBIS instance can be transferred to other instances if they have configured a harvester maintenance task.
  4. Partial synchronization is possible only on the level of spaces.
    Note that we have used the sync feature only to copy data from one openBIS instance into a freshly set-up openBIS instance. I wouldn’t be surprised if a more complex scenario revealed some bugs.
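
To illustrate points 1 and 2, here is a minimal toy sketch in Python. It is not openBIS code and does not use any openBIS API: plain dictionaries stand in for instances, the function `harvest` is hypothetical, and the prefix format `<alias>_<space>` is an assumption made for illustration only (check the documentation of translate-using-data-source-alias for the actual behaviour).

```python
# Toy model only: dicts stand in for openBIS instances (assumption).
def harvest(harvester, source, alias=None):
    """Copy every entity of `source` into `harvester`.

    If `alias` is given, the space code (first path component) is
    prefixed with it. The "<alias>_<space>" format is hypothetical.
    """
    for identifier, payload in source.items():
        if alias is not None:
            parts = identifier.strip("/").split("/")
            parts[0] = f"{alias}_{parts[0]}"
            identifier = "/" + "/".join(parts)
        # An entry with the same identifier is replaced by the incoming one.
        harvester[identifier] = payload
    return harvester


group_a = {"/SHARED/PROJECT/EXP1": "data from group A"}
group_b = {"/SHARED/PROJECT/EXP1": "data from group B"}

# Without prefixes, the source harvested last wins.
plain = harvest(harvest({}, group_a), group_b)

# With one prefix per data source, both entries survive side by side.
prefixed = harvest(harvest({}, group_a, alias="A"), group_b, alias="B")

print(plain)
print(prefixed)
```

In this toy model the unprefixed harvester ends up with a single entry whose content depends on which source was harvested last, while the prefixed harvester keeps one entry per source.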

Franz-Josef

Dear Franz-Josef,

Happy new year and best wishes for 2023!
Many thanks for the detailed answer. In particular, it’s very helpful to know that a more complex distributed setup would likely be more of an exploratory endeavour.

Best wishes
Ulrich

Dear Franz-Josef,

maybe a follow-up question on your comment: “Sharing data should be possible but note that data with same identifier will be replaced by the data coming from the data source.”
How exactly is this identifier determined, i.e. can we use something like namespaces to avoid such a clash?

many thanks again
Ulrich

Dear Ulrich

This should be possible by using the flag translate-using-data-source-alias as mentioned in my last reply.

Franz-Josef

Dear kerzel,
just to add my own experience:
I work at the scientific IT services of Empa, where each group has a separate instance (hosted by SIS through their RDM service).

We have similar requests from our users and realised synchronisation is always going to be a bit tricky because of name clashes. Perhaps one workaround would be to have one “reference” instance, to and from which all other instances synchronise. Otherwise, the number of synchronisation tasks will grow with the square of the number of instances. In this case the same caveat about name clashes still applies: if you don’t prefix the sources, you will end up with a different reference instance state depending on the order of synchronisation.
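
As a rough back-of-the-envelope sketch of that scaling argument (my own counting, not something taken from the openBIS documentation): assume each one-way data source → harvester link is one synchronisation task, and that the reference instance is an extra instance in addition to the n group instances. The function names below are made up for illustration.

```python
def mesh_tasks(n: int) -> int:
    """Every instance harvests from every other instance: n * (n - 1) one-way tasks."""
    return n * (n - 1)


def star_tasks(n: int) -> int:
    """Each of the n instances harvests from the reference instance,
    and the reference instance harvests from each of them: 2 * n one-way tasks."""
    return 2 * n


# Compare both topologies for a few collaboration sizes.
for n in (2, 3, 5, 10):
    print(f"{n} instances: full mesh = {mesh_tasks(n)}, star = {star_tasks(n)}")
```

Under these assumptions the full mesh only becomes more expensive than the star from four instances onwards, but the gap then widens quickly.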

All things considered, if data exchange and a uniform data model are of importance, I would consider the option of having only one instance and organising permissions using spaces.

Best

Simone

Dear Simone

many thanks - if I understand you correctly, a “star-shaped” topology might work best if we were to move to a distributed system at some point. I think we’ll start with one instance for the whole collaboration, based on your and others’ kind advice, but we may have to expand on this, as the research groups are separate entities that may (or may not) want their own openBIS server, while not having multiple servers for multiple projects…
All the best
Ulrich


Dear Ulrich,
this is my understanding based on the advice of other people above. As I am not affiliated with ETH, do not take my advice as an official recommendation!

In my experience, separate instances can cause more problems should the need for collaboration arise. Through spaces you can already manage permissions for groups, while keeping a single DSS and AS instance.

Best

Simone
