How to download models from HuggingFace through Azure Machine Learning Registry?

While I can download any model from my own Azure Machine Learning registry, and even from the “azureml” registry, running the exact same code against the HuggingFace registry fails with the error “Exception: Registry asset URI could not be parsed”.

Steps to reproduce (in my case I used an Azure Compute Instance):

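from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Assumption: the original snippet used a pre-existing `credential`; DefaultAzureCredential is one common choice
credential = DefaultAzureCredential()

registry_name = "HuggingFace"
ml_client_registry = MLClient(credential=credential, registry_name=registry_name)

m_name    = "openai-gpt"
m_version = "12"  # SDK v2 model versions are strings

# Fetching the model metadata works
m = ml_client_registry.models.get(name=m_name, version=m_version)

m_local_base_path = "./models_from_huggings_registry"

# Downloading fails with "Registry asset URI could not be parsed"
ml_client_registry.models.download(name=m_name, version=m_version, download_path=m_local_base_path)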
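The traceback below shows that `models.download` starts with `self.get(name=name, version=version).path` and parses that value as a storage URI, so a quick sanity check before downloading is to look at the asset's `path`. The helper below is a minimal sketch under that assumption: `is_downloadable` is a hypothetical name, not part of the SDK, and the objects are `SimpleNamespace` stand-ins (the blob URL in `custom_model` is a made-up placeholder) so the snippet runs without Azure access.

```python
from types import SimpleNamespace


def is_downloadable(model) -> bool:
    """Hypothetical helper: True only if the asset carries a non-empty string path.

    ModelOperations.download feeds model.path to a regex-based URI parser,
    so an asset whose path is None cannot be downloaded that way.
    """
    path = getattr(model, "path", None)
    return isinstance(path, str) and path != ""


# Stand-in mimicking the metadata printed for openai-gpt version 12 (path is None)
hf_model = SimpleNamespace(name="openai-gpt", version="12", path=None, type="preset_model")

# Stand-in for an ordinary model; the URL is a hypothetical placeholder
custom_model = SimpleNamespace(
    name="my-model",
    version="1",
    path="https://example.blob.core.windows.net/container/my-model/1",
    type="custom_model",
)

print(is_downloadable(hf_model))      # False
print(is_downloadable(custom_model))  # True
```

With the real SDK you would pass the `Model` returned by `ml_client_registry.models.get(...)` instead of the stand-ins; for the asset printed below, `path` is None, so the check returns False.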

If I print the “m” variable, it shows the model metadata:

Model({'job_name': None, 'is_anonymous': False,
'auto_increment_version': False, 'name': 'openai-gpt', 'description':
'openai-gpt is a pre-trained language model available on the Hugging
Face Hub. It's specifically designed for the text-generation task
in the transformers library. If you want to learn more about the
model's architecture, hyperparameters, limitations, and biases, you
can find this information on the model's dedicated Model Card on the
Hugging Face Hub.\n\nHere's an example API request payload that you can
use to obtain predictions from the model:\n\n{\n "inputs": "My name is Julien and I like to"\n}\n\n',
'tags': {'modelId': 'openai-gpt', 'task': 'text-generation',
'library': 'transformers', 'license': 'mit'},
'properties': {'skuBasedEngineIds':
'azureml://registries/HuggingFace/models/transformers-cpu-small/labels/latest,azureml://registries/HuggingFace/models/transformers-gpu-medium/labels/latest',
'engineEnvironmentVariableOverrides': '{"AZUREML_HF_MODEL_ID":
"openai-gpt", "AZUREML_HF_TASK": "text-generation"}'},
'print_as_yaml': True, 'id':
'azureml://registries/HuggingFace/models/openai-gpt/versions/12',
'Resource__source_path': None, 'base_path':
'/mnt/batch/tasks/shared/LS_root/mounts/clusters/dsvm-general-optimized01/code/Users/mauro.minella/git_repos/azuremlnotebooks/MLOPS/notebooks AMLv2',
'creation_context': <azure.ai.ml.entities._system_data.SystemData object at
0x7f2602efdf60>, 'serialize': <msrest.serialization.Serializer object
at 0x7f25bf52c130>, 'version': '12', 'latest_version': None, 'path':
None, 'datastore': None, 'utc_time_created': None, 'flavors': None,
'arm_type': 'model_version', 'type': 'preset_model'})

However, the last instruction, which should download the model, fails with the error above. Here is the full traceback:

TypeError                                 Traceback (most recent call last)
File /anaconda/envs/azuremlsdkv2mm/lib/python3.10/site-packages/azure/ai/ml/_utils/_storage_utils.py:187, in get_ds_name_and_path_prefix(asset_uri, registry_name)
    186 try:
--> 187     split_paths = re.findall(STORAGE_URI_REGEX, asset_uri)
    188     path_prefix = split_paths[0][3]

File /anaconda/envs/azuremlsdkv2mm/lib/python3.10/re.py:240, in findall(pattern, string, flags)
    233 """Return a list of all non-overlapping matches in the string.
    234 
    235 If one or more capturing groups are present in the pattern, return
   (...)
    238 
    239 Empty matches are included in the result."""
--> 240 return _compile(pattern, flags).findall(string)

TypeError: expected string or bytes-like object

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
Cell In[21], line 6
      2 import mlflow
      4 m_local_base_path = "./models_from_huggings_registry"
----> 6 ml_client_registry.models.download(name=m_name, version=m_version, download_path=m_local_base_path)

File /anaconda/envs/azuremlsdkv2mm/lib/python3.10/site-packages/azure/ai/ml/_telemetry/activity.py:263, in monitor_with_activity.<locals>.monitor.<locals>.wrapper(*args, **kwargs)
    260 @functools.wraps(f)
    261 def wrapper(*args, **kwargs):
    262     with log_activity(logger, activity_name or f.__name__, activity_type, custom_dimensions):
--> 263         return f(*args, **kwargs)

File /anaconda/envs/azuremlsdkv2mm/lib/python3.10/site-packages/azure/ai/ml/operations/_model_operations.py:305, in ModelOperations.download(self, name, version, download_path)
    295 """Download files related to a model.
    296 
    297 :param str name: Name of the model.
   (...)
    301 :raise: ResourceNotFoundError if can't find a model matching provided name.
    302 """
    304 model_uri = self.get(name=name, version=version).path
--> 305 ds_name, path_prefix = get_ds_name_and_path_prefix(model_uri, self._registry_name)
    306 if self._registry_name:
    307     sas_uri = get_storage_details_for_registry_assets(
    308         service_client=self._service_client,
    309         asset_name=name,
   (...)
    314         uri=model_uri,
    315     )

File /anaconda/envs/azuremlsdkv2mm/lib/python3.10/site-packages/azure/ai/ml/_utils/_storage_utils.py:190, in get_ds_name_and_path_prefix(asset_uri, registry_name)
    188         path_prefix = split_paths[0][3]
    189     except Exception:
--> 190         raise Exception("Registry asset URI could not be parsed.")
    191     ds_name = None
    192 else:

Exception: Registry asset URI could not be parsed.
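The two chained exceptions appear to fit together like this: `get_ds_name_and_path_prefix` calls `re.findall` on the model URI, which per the printed metadata is `None` here, so `re` raises `TypeError`, and the `except Exception` branch replaces it with the generic "could not be parsed" message. Below is a minimal sketch of that pattern. The regex is a placeholder, because the real `STORAGE_URI_REGEX` value is not shown in the traceback, and `parse_path_prefix` is a hypothetical stand-in that mirrors the try/except shape of the SDK frame, not its actual code.

```python
import re

# Placeholder pattern, NOT the SDK's STORAGE_URI_REGEX (its value is not shown above).
# It has four capture groups so that split_paths[0][3] exists, as in the traceback.
PLACEHOLDER_URI_REGEX = r"(https)://([^.]+)\.blob\.core\.windows\.net/([^/]+)/(.+)"


def parse_path_prefix(asset_uri):
    """Hypothetical stand-in mirroring the try/except in _storage_utils.py."""
    try:
        split_paths = re.findall(PLACEHOLDER_URI_REGEX, asset_uri)
        return split_paths[0][3]
    except Exception:
        # Any failure (TypeError for None, IndexError for no match) becomes this message
        raise Exception("Registry asset URI could not be parsed.")


# A well-formed URI parses fine
print(parse_path_prefix("https://acct.blob.core.windows.net/container/models/m1"))

# A None URI reproduces the chained TypeError -> Exception seen above
try:
    parse_path_prefix(None)
except Exception as exc:
    print(type(exc.__context__).__name__, "->", exc)
```

The `__context__` attribute holds the original `TypeError`, which is why the traceback prints "During handling of the above exception, another exception occurred".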