More performant way of accessing children of objects pybis?

fringwald · 13 September 2023 08:29

Dear openBIS enthusiasts,

for some processing we need to check if objects of type ‘SAMPLE’ have a child of a certain type. The child object in question is many hierarchical levels further down. Currently we access the children using pybis as follows:

# Get objects of type 'SAMPLE' and store them in a DataFrame.
samples = o.get_samples(type='SAMPLE').df

# Iterate over the rows of the 'samples' DataFrame.
for _, row in samples.iterrows():

    # Get an object using the identifier from the current row.
    obj = o.get_object(row["identifier"])

    # Iterate through the children of the current object.
    for child in obj.children:
        # Get the child object.
        obj1 = o.get_object(child)
        for child2 in obj1.children:
            obj2 = o.get_object(child2)
            # ... repeat and eventually check some condition

Currently, this is not very performant, especially since we have many objects of the type ‘SAMPLE’.
Can you think of a more performant way to run this query?

Thanks in advance and cheers
Friedemann

cedric.kessler · 13 September 2023 14:49

One possibility would be to search for the specific condition and iterate through those results. It may be quicker if you have very few samples that match the requirement.
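
A rough sketch of that idea, not tested against a real openBIS instance: instead of descending from every SAMPLE, start from the objects of the child type and walk up through their parents until objects of type SAMPLE are reached. The helper is plain Python. `parents_of` and `type_of` are hypothetical lookup tables that you would have to fill from pybis yourself (for example from a few bulk `o.get_samples(...)` calls), and the identifiers and the `INTERMEDIATE` and `MEASUREMENT` type codes are made up for illustration.

```python
from collections import deque


def ancestors_of_type(start_ids, parents_of, type_of, wanted_type):
    """Return the ids of all ancestors of start_ids whose type is wanted_type.

    parents_of: dict mapping an object id to a list of its parent ids
    type_of:    dict mapping an object id to its type code
    """
    found = set()
    seen = set(start_ids)
    queue = deque(start_ids)
    while queue:
        current = queue.popleft()
        for parent in parents_of.get(current, ()):
            if parent in seen:
                continue  # each object is visited only once
            seen.add(parent)
            if type_of.get(parent) == wanted_type:
                found.add(parent)
            queue.append(parent)
    return found


# Toy data standing in for what would come from openBIS (hypothetical).
parents_of = {
    "/S/C1": ["/S/B1"],
    "/S/B1": ["/S/A1"],
    "/S/C2": ["/S/B2"],
    "/S/B2": ["/S/A1", "/S/A2"],
}
type_of = {
    "/S/A1": "SAMPLE",
    "/S/A2": "SAMPLE",
    "/S/A3": "SAMPLE",  # has no matching descendant
    "/S/B1": "INTERMEDIATE",
    "/S/B2": "INTERMEDIATE",
    "/S/C1": "MEASUREMENT",
    "/S/C2": "MEASUREMENT",
}

matching = ancestors_of_type(["/S/C1", "/S/C2"], parents_of, type_of, "SAMPLE")
print(sorted(matching))
```

Because every object is visited at most once, the work grows with the number of objects that actually sit above a matching child, not with the total number of SAMPLE objects.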

Another way could be to use Jython within Python to iterate. The Java API is faster.

The last solution could be to use tags. You could add those manually with a Python script, which would save you some time afterwards, or run an entity validation script. You could also add a non-editable dynamic property to every object, but this could be quite disruptive.

bafe · 14 September 2023 07:58

Another way to keep using Python is to construct a specific query using JSON directly. You could have a look at how pybis does searches internally and modify the code to achieve this:

A good way to see how search queries are built is to start a global search in the ELN/LIMS UI and intercept the API calls using your browser’s developer tools.
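
To make this a bit more concrete, here is a minimal sketch of building such a request by hand. It only assembles the JSON-RPC payload and does not send anything. The `@type` strings and the `searchSamples` method name are written from memory of the V3 DTO names, so please treat them as assumptions and compare them with what you capture in the developer tools. The helper names, the `MEASUREMENT` type code and the fixed depth of 3 are made up for illustration.

```python
import json


def type_criteria(type_code):
    """Sub-criteria restricting a sample search to one type code."""
    return {
        "@type": "as.dto.sample.search.SampleTypeSearchCriteria",
        "criteria": [
            {
                "@type": "as.dto.common.search.CodeSearchCriteria",
                "fieldValue": {
                    "@type": "as.dto.common.search.StringEqualToValue",
                    "value": type_code,
                },
            }
        ],
    }


def nested_children_criteria(child_type, depth):
    """Wrap the child type criteria in `depth` levels of children criteria."""
    inner = [type_criteria(child_type)]
    for _ in range(depth):
        inner = [
            {
                "@type": "as.dto.sample.search.SampleChildrenSearchCriteria",
                "criteria": inner,
            }
        ]
    return inner


def build_search_request(session_token, parent_type, child_type, depth):
    """Assemble a JSON-RPC request body for a nested sample search."""
    criteria = {
        "@type": "as.dto.sample.search.SampleSearchCriteria",
        "operator": "AND",
        "criteria": [type_criteria(parent_type)]
        + nested_children_criteria(child_type, depth),
    }
    fetch_options = {"@type": "as.dto.sample.fetchoptions.SampleFetchOptions"}
    return {
        "jsonrpc": "2.0",
        "id": "1",
        "method": "searchSamples",
        "params": [session_token, criteria, fetch_options],
    }


request = build_search_request("<session token>", "SAMPLE", "MEASUREMENT", depth=3)
print(json.dumps(request, indent=2))
```

Note that a query built this way only matches children at exactly the given depth, so it helps only if the hierarchy between SAMPLE and the child type has a known, fixed number of levels.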

In general, your use case shows that there needs to be an advanced Python API which mirrors the Java/JavaScript V3 API.

juanf · 1 December 2023 11:01

I would support anyone who wants to create the advanced Python API mirroring the Java/JavaScript API.
We could even take over the maintenance.
But someone needs to translate 1000+ DTOs first and write the facade client.

bafe · 1 December 2023 12:45

I guess the better way would be to write an API schema first, then use code generation to generate bindings for the various languages. In the best case the schema can be generated from annotations in the Java code, considering that to be the “golden” implementation.
Translating all of the DTOs manually looks like a fool’s errand.

juanf · 15 January 2024 12:03

Going the API Schema route would be ideal.

Our API uses both inheritance and polymorphism, which should in the end restrict the possible target languages.
Also, DTOs sometimes contain methods that will need manual translation.
Overall, we would need to make some changes. That said, I think those changes are worth it.

Do you know of any tools that could be used to generate the API schema starting from the Java API?

bafe · 15 January 2024 13:21

Hello Juan,
after my last post, I started working on a small prototype to add an API schema to openBIS.
Since there’s no widespread standard to add a schema to JSON-RPC (not to mention the lack of tooling), I thought the best way would be to add a REST controller to the openBIS AS using Spring Boot. This would simply call the corresponding methods from IApplicationServerApi.
Once we have that, we can generate the OpenAPI schema using Springdoc with their Gradle plug-in. I tried and it seems to work pretty well.

The advantage of going the REST/OpenAPI route is that you can use the official OpenAPI code generators to generate DTOs and clients for Python, TypeScript etc. There are still some open questions regarding the style of parameter passing, whether to pass them as query parameters, path parameters or body parameters, but thanks to the annotations provided by Spring MVC it is very easy to try different styles.

You can find the prototype in my clone of the openBIS repo here:

A more sophisticated approach would use a different protocol like gRPC. The main advantage of that would be the higher compression in the wire protocol as the data is serialized/deserialized as binary data. The drawback is that there’s hardly any tooling for generating Protobuf interfaces from Java code, which means you would have to write your custom annotation processor.

Yet another API to consider would be GraphQL. There seem to be annotation processors able to generate GraphQL controllers and schema from Java, e.g. here

I would be happy to discuss these ideas in a meeting soon.

bafe · 19 January 2024 17:17

Update:

OpenAPI schema generation works very well with springdoc. All it needs is method calls written for the ~50 methods of the openBIS facade (or for IApplicationServerApi) and a decision on which parameter style to use.
GraphQL schema generation using SPQR-GraphQL proves to be difficult because the API methods use polymorphism and inheritance, and the base classes (e.g. SampleSearchCriteria) don’t implement any field, resulting in compile-time errors because GraphQL expects every type to declare fields. To make it work, we might need a customized annotation processor. However, if that route is chosen, a step-wise migration with a schema-first approach and a new API endpoint might be more feasible. In that case, using spring-boot GraphQL could prove very useful as the rest of the application is also based on Spring.

In terms of which API to choose, I think migrating to OpenAPI and REST would be relatively easy, but the client API would look very verbose and similar to the current Java API. GraphQL, on the other hand, would allow the clients to easily query for only the information they need. In a way it doesn’t look very different from the JSON-RPC calls where we nest multiple SearchCriteria and FetchOptions to retrieve only the fields we are interested in.
