I’m working with the huggingface_hub client library, which works so smoothly!
I’m creating this post because I noticed that when calling the function list_datasets with the parameter full=True, the siblings field (which has the names of the repository files) is always None.
However, when calling the function list_repo_files with the parameter repo_type set to “dataset”, we can retrieve all the files in the repository.
Also, the entries of the siblings attribute are of the class
ModelFile. Is there an ongoing implementation for the DatasetFile, or another way to retrieve the filenames of a dataset repository with the list_datasets function?
Thank you in advance.
The list_datasets function filters all datasets on the Hub with a given filter, whereas
list_repo_files iterates over the files of a given repository, and thus you get the siblings. Since they serve different purposes at different scopes, I’d suggest you use
list_repo_files to list siblings in a given repository.
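To make the suggestion above concrete, here is a minimal sketch that first filters datasets with list_datasets and then calls list_repo_files once per repository. The helper names files_per_dataset and files_for_author are made up for illustration, and the author value in the example call is a placeholder, not something from this thread. Note that this issues one request per repository, so it is only reasonable for a small number of datasets.

```python
from typing import Callable, Dict, Iterable, List


def files_per_dataset(
    dataset_ids: Iterable[str],
    list_files: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each dataset repo id to its file names, one lookup per repo."""
    return {repo_id: list(list_files(repo_id)) for repo_id in dataset_ids}


def files_for_author(author: str, limit: int = 3) -> Dict[str, List[str]]:
    """Filter datasets with list_datasets, then list files per repository."""
    # Imported here so the helper above stays usable without the library.
    from huggingface_hub import HfApi

    api = HfApi()
    # Step 1: Hub-wide filtering (siblings may be None on these results).
    dataset_ids = [info.id for info in api.list_datasets(author=author, limit=limit)]
    # Step 2: one request per repository to get the actual file names.
    return files_per_dataset(
        dataset_ids,
        lambda repo_id: api.list_repo_files(repo_id, repo_type="dataset"),
    )


# Example call (needs network access; the author name is a placeholder):
# print(files_for_author("some-user-or-org"))
```

The lookup is passed in as a callable so the mapping logic can be tried out without touching the Hub.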
I understand the idea here, but what I was wondering is why there is a siblings field in the
list_datasets results when it never gives back the list of files.
Hi! I agree we should either fetch this info with
full=True or remove the field. cc @Wauplin
The idea is that we have a
ModelInfo object to describe a model. Depending on the use case, not all information is fetched from the server (in particular, the files of each repo are not listed when listing all repos). If a piece of information is not fetched, it is set to
None. This is maybe not optimal, but I’m not sure we want to change it anytime soon. One way to change it would be to have different classes
ModelInfoXXX depending on the context, but I’m not sure that would be easier to use from a user perspective.
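Given that unfetched fields come back as None, calling code can treat siblings as optional and fall back to a per-repository lookup. The sketch below is only an illustration: repo_filenames is a hypothetical helper, and it assumes each sibling entry exposes its path as rfilename and that the info object has an id attribute. Please check both against the version of huggingface_hub you have installed.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List


def repo_filenames(info, fetch_files: Callable[[str], Iterable[str]]) -> List[str]:
    """Return file names from info.siblings, or fetch them if it is None."""
    siblings = getattr(info, "siblings", None)
    if siblings is None:
        # The listing call did not fetch the files: ask for them explicitly.
        return list(fetch_files(info.id))
    # Assumption: each sibling entry stores its path in `rfilename`.
    return [sibling.rfilename for sibling in siblings]


# Stand-in objects for illustration; real ones would come from list_datasets.
not_fetched = SimpleNamespace(id="user/a", siblings=None)
fetched = SimpleNamespace(
    id="user/b", siblings=[SimpleNamespace(rfilename="data.csv")]
)

print(repo_filenames(not_fetched, lambda repo_id: ["README.md"]))
print(repo_filenames(fetched, lambda repo_id: []))
```

With the real client, fetch_files could be something like lambda repo_id: api.list_repo_files(repo_id, repo_type="dataset").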