I have a folder with 56818 files - totalling c. 135GB. Each file is of type .htm.
I am trying to push this to HF hub as a dataset.
I am using the huggingface_hub Python library (version 0.20.2). I also have hf_transfer enabled (version 0.1.4).
I am using this:
api.upload_folder(
    folder_path=docs.name,
    path_in_repo="<path>",
    repo_id="<repo>",
    repo_type="dataset",
    multi_commits=True,
    multi_commits_verbose=True,
)
The command keeps erroring out with 400 - Comment must be less than 65536 chars.
Looking through the code, it appears that the issue is likely in the function multi_commit_generate_comment (_multi_commits.py), on this line:
multi_commit_strategy="\n".join(str(commit) for commit in strategy.deletion_commits + strategy.addition_commits)
I am unclear what the purpose of this line is.
Therefore, I am wondering whether I am using the multi_commits=True argument incorrectly or whether this is an edge case that I have hit.
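For what it's worth, here is a rough back-of-the-envelope sketch of why I suspect the joined string goes over the limit. It is not the library's code: the function estimate_comment_length is my own, and the values for files per commit (50) and characters per commit line (100) are guesses that I have not confirmed against huggingface_hub. Only the file count and the 65536 limit come from my setup and the error message.

```python
# Illustrative estimate only; this is NOT code from huggingface_hub.
COMMENT_LIMIT = 65536  # taken from the 400 error message


def estimate_comment_length(num_files: int, files_per_commit: int, chars_per_line: int) -> int:
    """Estimate len("\n".join(lines)) when there is one line per commit."""
    # Ceiling division: number of commits needed to cover all files.
    num_commits = -(-num_files // files_per_commit)
    # N lines of equal length joined by "\n" need N - 1 newline characters.
    return num_commits * chars_per_line + max(num_commits - 1, 0)


# ASSUMPTIONS: 50 files per commit and ~100 characters per commit line are
# guesses, not values I have verified in the library source.
length = estimate_comment_length(num_files=56818, files_per_commit=50, chars_per_line=100)
print(f"estimated comment length: {length}, over limit: {length > COMMENT_LIMIT}")
```

If those guesses are anywhere near right, one line per commit for this many files would be well over the limit, which is why I suspect the size of the folder rather than my arguments.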
Any help appreciated on how to get my data on HF quickly. Thanks.