PLATFORM: Kaggle
The progress bar stays at 0% while the dataset is loading, even though the dataset loads completely and without any error.
Here’s the code snippet:
# INPUT
dataset = load_dataset("audiofolder", data_dir=f"{DATASET_DIR}/train")
# OUTPUT
Resolving data files: 0%| | 0/275 [00:00<?, ?it/s]
Similar behaviour happens with the dataset.map() function. Since the progress bar doesn’t show any progress, I can’t tell whether my dataset.map() code is correct or not.
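One way to check a map function without relying on the progress bar is to run it on a few examples first. The following is only a minimal sketch: `preview_map`, `add_length`, and the toy examples are hypothetical names made up for illustration, and it assumes a non-batched function that takes one example dict and returns a dict. It works on any iterable of dicts; a single split of a loaded dataset should also work, assuming that iterating over it yields one dict per example.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List

Example = Dict[str, Any]


def preview_map(examples: Iterable[Example],
                fn: Callable[[Example], Example],
                n: int = 3) -> List[Example]:
    """Apply `fn` to the first `n` examples and return the results.

    Hypothetical helper (not part of any library): it only mimics what a
    non-batched map call does to each example, so logic errors show up
    quickly and independently of any progress bar.
    """
    results = []
    for example in islice(examples, n):
        # Pass a shallow copy so the original example is not modified.
        out = fn(dict(example))
        if not isinstance(out, dict):
            raise TypeError(f"expected a dict, got {type(out).__name__}")
        results.append(out)
    return results


# Toy data standing in for real dataset examples (assumption for the demo).
toy_examples = [{"text": "a"}, {"text": "bb"}, {"text": "ccc"}]


def add_length(example: Example) -> Example:
    # Example transformation: add a new column derived from an existing one.
    example["length"] = len(example["text"])
    return example


preview = preview_map(toy_examples, add_length, n=2)
print(preview)
```

If the previewed outputs look right, the full dataset.map() call is probably doing the right thing too, even when the bar itself stays at 0%.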
The progress bars in the HF libraries are quite fragile: sometimes they appear and sometimes they don’t. In the past, there have been cases where they didn’t appear due to bugs.
They may also fail to work properly if the installed tqdm version is not compatible.
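To narrow down a version problem, it can help to print what is actually installed in the Kaggle session. This is a rough diagnostic sketch using only the standard library. The package list is an assumption about what might be relevant here (ipywidgets is included because tqdm's notebook bar depends on it), and `installed_versions` is a hypothetical helper name.

```python
import os
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that plausibly affect progress bars (an assumption, not a definitive list).
report = installed_versions(["tqdm", "datasets", "huggingface_hub", "ipywidgets"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")

# This environment variable disables huggingface_hub progress bars globally when set.
print("HF_HUB_DISABLE_PROGRESS_BARS =",
      os.environ.get("HF_HUB_DISABLE_PROGRESS_BARS", "<unset>"))
```

Posting this output along with the problem makes it easier for others to reproduce the behaviour.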
Related GitHub issue (opened 16 Aug 2023, closed 29 Apr 2024; label: good first issue):
At the moment, `huggingface_hub` overrides tqdm progress bars to be able to enable and disable them globally either using [HF_HUB_DISABLE_PROGRESS_BARS](https://huggingface.co/docs/huggingface_hub/package_reference/environment_variables#hfhubdisableprogressbars) environment variable or [`enable_progress_bars`/`disable_progress_bars`](https://huggingface.co/docs/huggingface_hub/package_reference/utilities#configure-progress-bars) helpers (env variable having priority). These helpers are meant to be reused by other HF libraries built on top of `huggingface_hub`.
As mentioned by @BenjaminBossan on [slack](https://huggingface.slack.com/archives/C038TF8B7UZ/p1692184828234889) (private), the problem with this approach is that a library can disable progress bars from other libraries which can be misleading. A possible solution would be to add a new parameter `group: Optional[str] = None` to the progress bars working similarly to logger names. Then, we can enable/disable progress bars only from specific libs:
```py
disable_progress_bars() # disable everything
disable_progress_bars("peft") # disable progress bar in a group starting by "peft" (e.g. `peft.foo.bar`)
disable_progress_bars("peft.foo")
```
Having something like this could prove useful to enable/disable progress bars when running PEFT, a transformers training or a huggingface_hub push without enabling/disabling other cases (typically disable some progress bars in a local training). And still have HF_HUB_DISABLE_PROGRESS_BARS to disable everything (typically disable all progress bars in a production environment). Such a change would be backward compatible and we could iteratively add groups to the existing progress bars we have.
No need to be as complete as the logging module. We can just work with prefixes and that's all. No need to handle configuration files or more detailed environment variables.
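To make the prefix idea concrete, here is a toy sketch of how group-based switching could work. It is not the `huggingface_hub` implementation: the module-level state (`_group_states`, `_all_disabled`) and the lookup logic are assumptions made for illustration, and the environment variable override is left out. Groups are matched on dotted segments, like logger names, so `"peft"` covers `"peft.foo.bar"` but not `"peftx"`.

```python
from typing import Dict, Optional

# Hypothetical state: per-group overrides plus a global default.
_group_states: Dict[str, bool] = {}  # group name -> True if disabled
_all_disabled: bool = False


def disable_progress_bars(group: Optional[str] = None) -> None:
    """Disable bars for `group` and its sub-groups, or everything if None."""
    global _all_disabled
    if group is None:
        _group_states.clear()  # a global call resets all per-group overrides
        _all_disabled = True
    else:
        _group_states[group] = True


def enable_progress_bars(group: Optional[str] = None) -> None:
    """Enable bars for `group` and its sub-groups, or everything if None."""
    global _all_disabled
    if group is None:
        _group_states.clear()
        _all_disabled = False
    else:
        _group_states[group] = False


def are_progress_bars_disabled(group: Optional[str] = None) -> bool:
    """Look up the most specific matching prefix, falling back to the global state."""
    name = group or ""
    while name:
        if name in _group_states:
            return _group_states[name]
        # Drop the last dotted segment: "peft.foo.bar" -> "peft.foo" -> "peft" -> ""
        name = name.rpartition(".")[0]
    return _all_disabled


disable_progress_bars("peft")     # turn off everything under "peft"
enable_progress_bars("peft.foo")  # but re-enable the "peft.foo" subtree
print(are_progress_bars_disabled("peft.foo.bar"))
print(are_progress_bars_disabled("peft.other"))
print(are_progress_bars_disabled("transformers"))
```

The most specific prefix wins, which is what lets a library turn off its own bars without touching bars that belong to other libraries.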
---
Other suggestion: instead of setting `group="peft"` manually in the library implementation, we could retrieve it from the stack traceback with a bit of Python magic (when `group=None`). Haven't tried it myself but it would make the change immediately backward compatible in existing libraries. I'm not so opinionated about it. **EDIT:** dropping this idea. Explicit is better than implicit.