pyBIS: 'Missing Scheme' for all dataset actions

Hi everybody,

I’m using pyBIS to query my data in a non-production local openBIS installation running via Docker & Kubernetes. I can retrieve and add objects/samples; however, once I move on to datasets that I’ve added in the ELN, various functions and properties (e.g. file_list or save()) all fail with the same error:

Exception in thread Thread-5 (upload_file):
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.10/site-packages/pybis/dataset.py", line 1062, in upload_file
    resp = requests.post(upload_url, data=f, verify=verify_certificates)
  File "/opt/conda/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 573, in request
    prep = self.prepare_request(req)
  File "/opt/conda/lib/python3.10/site-packages/requests/sessions.py", line 484, in prepare_request
    p.prepare(
  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 368, in prepare
    self.prepare_url(url, params)
  File "/opt/conda/lib/python3.10/site-packages/requests/models.py", line 439, in prepare_url
    raise MissingSchema(
requests.exceptions.MissingSchema: Invalid URL '/datastore_server/session_workspace_file_upload?filename=abc.jpg&id=1&startByte=0&endByte=0&sessionID=admin-123': No scheme supplied. Perhaps you meant http:///datastore_server/session_workspace_file_upload?filename=abc.jpg&id=1&startByte=0&endByte=0&sessionID=admin-13?

Do you know what is happening there, and why? I was wondering whether the requests API has changed since pyBIS was initially released for Python 3.3.

I’m using Python 3.10 in JupyterHub. I should add that I don’t have a DSS mounted; however, I do have local access to the whole openbis folder, including the DSS store. What would be the correct way to integrate it into pyBIS without using mount or sshfs? For now I’ve written my own mount_local method that simply sets the mountpoint to a local folder:

import pybis

def mount_local(self, mountpoint):
    # point pyBIS at a locally accessible copy of the DSS store
    # instead of mounting it via sshfs
    self.mountpoint = mountpoint
    return self.mountpoint

# monkey-patch the method onto the Openbis class
setattr(pybis.Openbis, 'mount_local', mount_local)

o = pybis.Openbis()
o.mount_local("/home/jovyan/openbis/dss_store/")

Thank you for your help and for all your efforts in general!

Best wishes
Henning

Dear Henning,
Thank you for the message.
Could you please specify the pyBIS version you have been using?
Here is the link to the latest pyBIS release.

Hi rkuzyakiv,
Thanks for your willingness to help.
Sorry, I didn’t mention it. It’s 1.34.5 actually.
Best wishes
Henning

Hi Henning,

Looks like you are not specifying a host in your code, e.g.

o = pybis.Openbis("https://openbis.yourorg.domain")

Regards, Swen

Hi Swen,

Thanks for your suggestion. The connection itself is actually working fine (with certificate verification disabled); I can retrieve and add objects, for instance. The issue only occurs when I work with datasets.

I had a look at the lines mentioned in the traceback. The full download URL is composed of two elements (dataset.py, line 408):

full_url = urljoin(self._get_download_url(), dss_endpoint)

dss_endpoint is also defined in dataset.py:

dss_endpoint = "/datastore_server/rmi-data-store-server-v3.json"

_get_download_url is defined in dataset.py like this:

def _get_download_url(self):
    download_url = ""
    if "downloadUrl" in self.data["dataStore"]:
        download_url = self.data["dataStore"]["downloadUrl"]
    else:
        # fallback, if there is no dataStore defined
        datastores = self.openbis.get_datastores()
        download_url = datastores["downloadUrl"][0]
    return download_url
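To double-check, I reproduced the URL construction outside pyBIS; a minimal sketch using only the standard library:

```python
from urllib.parse import urljoin

dss_endpoint = "/datastore_server/rmi-data-store-server-v3.json"

# Empty downloadUrl (my case): urljoin returns the bare path,
# which requests then rejects with MissingSchema.
print(urljoin("", dss_endpoint))
# -> /datastore_server/rmi-data-store-server-v3.json

# A populated downloadUrl yields a complete URL with a scheme:
print(urljoin("http://127.0.0.1:8081", dss_endpoint))
# -> http://127.0.0.1:8081/datastore_server/rmi-data-store-server-v3.json
```

So an empty downloadUrl is enough to produce exactly the scheme-less URL from the traceback.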

I noticed that in my dataset the dataStore comes with an empty downloadUrl:

ds1 = o.get_datasets()[0]
ds1

attribute 	value
code 	20221208132749237-47
permId 	20221208132749237-47
type 	ATTACHMENT
kind 	PHYSICAL
experiment 	/MATERIALS/YEASTS/YEAST_COLLECTION
sample 	/MATERIALS/YEASTS/YEA1
parents 	[]
children 	[]
components 	[]
containers 	[]
tags 	[]
accessDate 	2022-12-08 13:28:24
...
ds1.data["dataStore"]

{'@type': 'as.dto.datastore.DataStore',
 '@id': 22,
 'fetchOptions': {'@type': 'as.dto.datastore.fetchoptions.DataStoreFetchOptions',
  '@id': 23,
  'count': None,
  'from': None,
  'cacheMode': 'NO_CACHE',
  'sort': None,
  'sortBy': None},
 'code': 'DSS1',
 'downloadUrl': '',
 'remoteUrl': 'http://127.0.0.1:8081',
 'registrationDate': 1670498755098,
 'modificationDate': 1670519790938}
o.get_datastores()

	code 	downloadUrl 	remoteUrl
0 	DSS1 		http://127.0.0.1:8081

Can you imagine why this happens?

Best wishes
Henning

Check your DSS service.properties; it should have a download-url key. This key is the download URL for clients and is not used by the servers to communicate with each other. It is often an address behind a proxy, which the DSS has no way of knowing, so it needs to be set explicitly. Let us know if this solved your issue, and in the future please use OTRS for support.
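For reference, a minimal sketch of the relevant entry (the hostname is a placeholder; use the address clients actually reach your DSS under):

```properties
# DSS service.properties (typical location: servers/datastore_server/etc/,
# may differ per installation)
# download-url: the externally reachable address clients use for the DSS,
# e.g. the proxy address
download-url = https://openbis.example.org
```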

Alright, I got it working by setting the SERVER_HOST_PORT environment variable to the correct address when starting the Docker container. I didn’t know that this variable was propagated all the way through to the configuration of the individual components. Thank you for pointing me in the right direction!
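In case it helps others, this is roughly how I start the container now (image name, hostname, and port are placeholders from my setup, not an official reference):

```shell
# SERVER_HOST_PORT must be the externally reachable host:port,
# since the DSS cannot derive it on its own behind a proxy.
docker run -d \
  -e SERVER_HOST_PORT="openbis.example.org:443" \
  my-openbis-image   # placeholder; use your own image name
```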

General question: I thought it would be good to share this kind of question via the forum so that others can read it in case the same question arises. However, it sounds like you prefer that such questions be asked via email and the ticketing system. Is there a general guideline for this?

Best wishes
Henning