Hey everyone o/
I’m still trying to get decent results training SpeechT5 on Japanese and wanted to try switching to the ReazonSpeech (medium) dataset.
Sadly, I’m experiencing some strange behaviour when downloading this particular dataset. Once started, the download speeds up, but after the first couple of MB (it’s inconsistent exactly when) it slows down to double-digit kB/s before crashing with ‘ChunkedEncodingError: (Connection broken: IncompleteRead(…))’.
Here is my output:
E:\Programming\python\projects\SpeechT5-jp\venv\Scripts\python.exe E:\Programming\python\projects\SpeechT5-jp\tts_fine-tune.py
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to C:\Users\Maximilian\.cache\huggingface\token
Login successful
Found cached dataset common_voice_13_0 (C:/Users/Maximilian/.cache/huggingface/datasets/mozilla-foundation___common_voice_13_0/ja/13.0.0/2506e9a8950f5807ceae08c2920e814222909fd7f477b74f5d225802e9f04055)
Downloading and preparing dataset reazonspeech/medium to C:/Users/Maximilian/.cache/huggingface/datasets/reazon-research___reazonspeech/medium/1.0.0/00f9d8f336dd718ea4c26dba7be9a2ce3795b9d92903c626baa912de3021ba2d...
Downloading data files: 0%| | 0/64 [00:00<?, ?it/s]
Downloading data: 0%| | 0.00/328M [00:00<?, ?B/s]
Downloading data: 0%| | 4.10k/328M [00:00<5:49:20, 15.7kB/s]
Downloading data: 0%| | 42.0k/328M [00:00<59:53, 91.4kB/s]
Downloading data: 0%| | 91.1k/328M [00:00<40:23, 135kB/s]
Downloading data: 0%| | 206k/328M [00:01<21:27, 255kB/s]
Downloading data: 0%| | 435k/328M [00:01<11:25, 479kB/s]
Downloading data: 0%| | 697k/328M [00:01<08:19, 656kB/s]
Downloading data: 0%| | 1.22M/328M [00:01<04:58, 1.10MB/s]
Downloading data: 1%| | 2.29M/328M [00:02<02:40, 2.04MB/s]
Downloading data: 1%| | 3.58M/328M [00:02<01:29, 3.65MB/s]
Downloading data: 1%|▏ | 4.12M/328M [00:13<28:47, 188kB/s]
Downloading data: 1%|▏ | 4.14M/328M [00:15<32:56, 164kB/s]
Downloading data: 1%|▏ | 4.28M/328M [00:27<32:55, 164kB/s]
Downloading data: 1%|▏ | 4.29M/328M [00:28<1:25:53, 62.9kB/s]
Downloading data: 1%|▏ | 4.30M/328M [00:29<1:27:27, 61.8kB/s]
Downloading data: 1%|▏ | 4.56M/328M [00:35<1:39:41, 54.1kB/s]
Downloading data: 2%|▏ | 6.07M/328M [00:36<31:08, 173kB/s]
Downloading data: 2%|▏ | 6.20M/328M [00:47<31:07, 173kB/s]
Downloading data: 2%|▏ | 6.20M/328M [00:48<1:11:01, 75.6kB/s]
Downloading data: 2%|▏ | 6.22M/328M [00:50<1:17:19, 69.4kB/s]
Downloading data: 2%|▏ | 6.35M/328M [01:00<2:03:35, 43.4kB/s]
Downloading data: 2%|▏ | 6.44M/328M [01:11<3:07:18, 28.6kB/s]
Downloading data: 2%|▏ | 6.45M/328M [01:13<3:28:36, 25.7kB/s]
Downloading data: 2%|▏ | 6.51M/328M [01:17<3:42:53, 24.1kB/s]
Downloading data: 2%|▏ | 6.56M/328M [01:23<4:45:41, 18.8kB/s]
Downloading data: 2%|▏ | 6.59M/328M [01:26<5:02:33, 17.7kB/s]
Downloading data: 2%|▏ | 6.62M/328M [01:29<5:47:07, 15.4kB/s]
Downloading data: 2%|▏ | 6.63M/328M [01:30<5:46:45, 15.5kB/s]
Downloading data: 2%|▏ | 6.64M/328M [01:31<6:05:06, 14.7kB/s]
Downloading data: 2%|▏ | 6.66M/328M [01:32<5:48:51, 15.4kB/s]
Downloading data: 2%|▏ | 6.68M/328M [01:35<7:34:16, 11.8kB/s]
Downloading data: 2%|▏ | 6.69M/328M [01:36<6:54:30, 12.9kB/s]
Downloading data: 2%|▏ | 6.71M/328M [01:38<8:14:41, 10.8kB/s]
Downloading data: 2%|▏ | 6.73M/328M [01:41<9:42:48, 9.20kB/s]
Downloading data: 2%|▏ | 6.74M/328M [01:41<7:34:59, 11.8kB/s]
Downloading data: 2%|▏ | 6.76M/328M [01:41<6:18:47, 14.2kB/s]
Downloading data: 2%|▏ | 6.78M/328M [01:44<8:31:51, 10.5kB/s]
Downloading data: 2%|▏ | 6.79M/328M [01:46<9:45:47, 9.15kB/s]
Downloading data: 2%|▏ | 6.81M/328M [01:47<8:11:04, 10.9kB/s]
Downloading data: 2%|▏ | 6.82M/328M [01:50<10:22:57, 8.60kB/s]
Downloading data: 2%|▏ | 6.84M/328M [01:51<9:25:45, 9.47kB/s]
Downloading data: 2%|▏ | 6.86M/328M [01:52<7:29:03, 11.9kB/s]
Downloading data: 2%|▏ | 6.87M/328M [01:53<6:57:28, 12.8kB/s]
Downloading data: 2%|▏ | 6.89M/328M [01:56<9:34:03, 9.33kB/s]
Downloading data: 2%|▏ | 6.91M/328M [01:57<8:24:59, 10.6kB/s]
Downloading data: 2%|▏ | 6.92M/328M [01:58<7:10:51, 12.4kB/s]
Downloading data: 2%|▏ | 6.94M/328M [01:59<7:35:52, 11.8kB/s]
Downloading data: 2%|▏ | 6.96M/328M [02:02<10:01:34, 8.90kB/s]
Downloading data: 2%|▏ | 6.97M/328M [02:03<8:18:23, 10.7kB/s]
Downloading data: 2%|▏ | 6.99M/328M [02:05<9:40:12, 9.23kB/s]
Downloading data: 2%|▏ | 7.01M/328M [02:07<10:11:46, 8.75kB/s]
Downloading data: 2%|▏ | 7.02M/328M [02:08<8:25:23, 10.6kB/s]
Downloading data: 2%|▏ | 7.04M/328M [02:09<6:45:15, 13.2kB/s]
Downloading data: 2%|▏ | 7.05M/328M [02:11<9:00:33, 9.91kB/s]
Downloading data: 2%|▏ | 7.07M/328M [02:13<9:18:13, 9.59kB/s]
Downloading data: 2%|▏ | 7.09M/328M [02:14<8:13:34, 10.8kB/s]
Downloading data: 2%|▏ | 7.10M/328M [02:17<10:28:12, 8.52kB/s]
Downloading data: 2%|▏ | 7.12M/328M [02:18<8:36:43, 10.4kB/s]
Downloading data: 2%|▏ | 7.14M/328M [02:19<7:18:50, 12.2kB/s]
Downloading data: 2%|▏ | 7.15M/328M [02:20<7:15:32, 12.3kB/s]
Downloading data: 2%|▏ | 7.17M/328M [02:23<9:47:37, 9.11kB/s]
Downloading data: 2%|▏ | 7.19M/328M [02:23<7:42:37, 11.6kB/s]
Downloading data: 2%|▏ | 7.20M/328M [02:26<10:06:27, 8.83kB/s]
Downloading data: 2%|▏ | 7.22M/328M [02:29<10:55:36, 8.16kB/s]
Downloading data: 2%|▏ | 7.23M/328M [02:29<8:30:16, 10.5kB/s]
Downloading data: 2%|▏ | 7.25M/328M [02:30<6:48:33, 13.1kB/s]
Downloading data: 2%|▏ | 7.27M/328M [02:33<9:47:49, 9.10kB/s]
Downloading data: 2%|▏ | 7.28M/328M [02:36<12:27:32, 7.16kB/s]
Downloading data: 2%|▏ | 7.30M/328M [02:40<14:51:08, 6.00kB/s]
Downloading data: 2%|▏ | 7.32M/328M [02:43<16:00:56, 5.57kB/s]
Downloading data: 2%|▏ | 7.33M/328M [02:47<16:49:50, 5.30kB/s]
Downloading data: 2%|▏ | 7.35M/328M [02:51<17:54:42, 4.98kB/s]
Downloading data: 2%|▏ | 7.37M/328M [02:54<18:09:36, 4.91kB/s]
Downloading data: 2%|▏ | 7.38M/328M [02:57<18:19:16, 4.87kB/s]
Downloading data: 2%|▏ | 7.40M/328M [03:01<18:26:22, 4.83kB/s]
Downloading data: 2%|▏ | 7.41M/328M [03:05<19:02:48, 4.68kB/s]
Downloading data: 2%|▏ | 7.43M/328M [03:08<18:57:13, 4.70kB/s]
Downloading data: 2%|▏ | 7.45M/328M [03:12<18:53:02, 4.72kB/s]
Downloading data: 2%|▏ | 7.46M/328M [03:15<19:21:29, 4.60kB/s]
Downloading data: 2%|▏ | 7.48M/328M [03:19<19:09:43, 4.65kB/s]
Downloading data: 2%|▏ | 7.50M/328M [03:22<19:01:21, 4.69kB/s]
Downloading data: 2%|▏ | 7.51M/328M [03:26<19:25:14, 4.59kB/s]
Downloading data: 2%|▏ | 7.53M/328M [03:29<19:11:59, 4.64kB/s]
Downloading data: 2%|▏ | 7.55M/328M [03:33<19:03:17, 4.68kB/s]
Downloading data: 2%|▏ | 7.56M/328M [03:37<19:28:06, 4.58kB/s]
Downloading data: 2%|▏ | 7.58M/328M [03:40<19:14:08, 4.63kB/s]
Downloading data: 2%|▏ | 7.60M/328M [03:43<19:04:41, 4.67kB/s]
Downloading data: 2%|▏ | 7.61M/328M [03:47<18:57:47, 4.70kB/s]
Downloading data: 2%|▏ | 7.63M/328M [03:51<19:24:04, 4.59kB/s]
Downloading data: 2%|▏ | 7.64M/328M [03:54<19:11:02, 4.64kB/s]
Downloading data: 2%|▏ | 7.66M/328M [03:57<19:02:39, 4.68kB/s]
Downloading data: 2%|▏ | 7.68M/328M [04:01<19:26:22, 4.58kB/s]
Downloading data: 2%|▏ | 7.69M/328M [04:05<19:13:04, 4.63kB/s]
Downloading data: 2%|▏ | 7.71M/328M [04:08<19:03:07, 4.67kB/s]
Downloading data: 2%|▏ | 7.73M/328M [04:12<19:27:00, 4.58kB/s]
Downloading data: 2%|▏ | 7.74M/328M [04:15<19:13:19, 4.63kB/s]
Downloading data: 2%|▏ | 7.76M/328M [04:19<19:04:06, 4.67kB/s]
Downloading data: 2%|▏ | 7.78M/328M [04:21<17:07:55, 5.20kB/s]
Downloading data: 2%|▏ | 7.79M/328M [04:21<12:25:10, 7.17kB/s]
Downloading data: 2%|▏ | 7.81M/328M [04:22<9:07:10, 9.76kB/s]
Downloading data: 2%|▏ | 7.86M/328M [04:22<4:15:23, 20.9kB/s]
Downloading data: 2%|▏ | 7.96M/328M [04:22<1:46:42, 50.0kB/s]
Downloading data: 2%|▏ | 8.14M/328M [04:22<45:35, 117kB/s]
Downloading data: 3%|▎ | 8.51M/328M [04:23<18:24, 290kB/s]
Downloading data: 3%|▎ | 9.25M/328M [04:23<07:45, 686kB/s]
Downloading data: 3%|▎ | 10.5M/328M [04:23<03:16, 1.61MB/s]
Downloading data: 3%|▎ | 11.0M/328M [04:23<02:58, 1.78MB/s]
Downloading data: 4%|▎ | 12.3M/328M [04:23<1:53:19, 46.5kB/s]
Traceback (most recent call last):
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 710, in _error_catcher
yield
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 835, in _raw_read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(12264344 bytes read, 316091496 more expected)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\requests\models.py", line 816, in generate
yield from self.raw.stream(chunk_size, decode_content=True)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 940, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 911, in read
data = self._raw_read(amt)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 835, in _raw_read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "C:\Users\Maximilian\AppData\Local\Programs\Python\Python38\lib\contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\urllib3\response.py", line 727, in _error_catcher
raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(12264344 bytes read, 316091496 more expected)', IncompleteRead(12264344 bytes read, 316091496 more expected))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:\Programming\python\projects\SpeechT5-jp\tts_fine-tune.py", line 33, in <module>
dataset_train = load_dataset("reazon-research/reazonspeech", "medium", split="train")
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\load.py", line 1809, in load_dataset
builder_instance.download_and_prepare(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\builder.py", line 909, in download_and_prepare
self._download_and_prepare(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\builder.py", line 1670, in _download_and_prepare
super()._download_and_prepare(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\builder.py", line 982, in _download_and_prepare
split_generators = self._split_generators(dl_manager, **split_generators_kwargs)
File "C:\Users\Maximilian\.cache\huggingface\modules\datasets_modules\datasets\reazon-research--reazonspeech\00f9d8f336dd718ea4c26dba7be9a2ce3795b9d92903c626baa912de3021ba2d\reazonspeech.py", line 84, in _split_generators
archive_paths = dl_manager.download(url)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\download\download_manager.py", line 427, in download
downloaded_path_or_paths = map_nested(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\py_utils.py", line 444, in map_nested
mapped = [
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\py_utils.py", line 445, in <listcomp>
_single_map_nested((function, obj, types, None, True, None))
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\py_utils.py", line 347, in _single_map_nested
return function(data_struct)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\download\download_manager.py", line 453, in _download
return cached_path(url_or_filename, download_config=download_config)
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\file_utils.py", line 182, in cached_path
output_path = get_from_cache(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\file_utils.py", line 610, in get_from_cache
http_get(
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\datasets\utils\file_utils.py", line 402, in http_get
for chunk in response.iter_content(chunk_size=1024):
File "E:\Programming\python\projects\SpeechT5-jp\venv\lib\site-packages\requests\models.py", line 818, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(12264344 bytes read, 316091496 more expected)', IncompleteRead(12264344 bytes read, 316091496 more expected))
Downloading data files: 14%|█▍ | 9/64 [04:26<27:06, 29.57s/it]
Process finished with exit code 1
I would love to hear your thoughts on this issue. If you need any additional information, feel free to ask and I’ll provide it.
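If the flakiness turns out to be on my connection, my fallback plan is to fetch the archives manually with resumable ranged requests and point the loader at the local files afterwards. A rough sketch of what I mean (a hypothetical helper of my own, not anything built into `datasets`):

```python
import os
import requests  # third-party, but already a dependency of `datasets`

def resume_header(path):
    """Build a Range header that resumes after the bytes already on disk."""
    size = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": f"bytes={size}-"} if size else {}

def download_with_resume(url, path, max_retries=10, chunk_size=1 << 20):
    """Stream `url` to `path`, resuming from the last good byte whenever the
    connection breaks instead of starting the whole archive over."""
    for _ in range(max_retries):
        try:
            with requests.get(url, headers=resume_header(path),
                              stream=True, timeout=60) as r:
                r.raise_for_status()
                # 206 Partial Content -> append to the partial file;
                # anything else means the server ignored Range -> start over.
                mode = "ab" if r.status_code == 206 else "wb"
                with open(path, mode) as f:
                    for chunk in r.iter_content(chunk_size=chunk_size):
                        f.write(chunk)
            return path
        except (requests.exceptions.ChunkedEncodingError,
                requests.exceptions.ConnectionError):
            continue  # partial file stays on disk; next attempt resumes via Range
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```

Does something like that seem reasonable, or is there retry handling built into `datasets` that I’m missing?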