Make Datasets use a Google Cloud Storage bucket as the cache path

I am training a BERT model on GCP (Linux VM) and don't have enough storage on my VM, so I would like to tell the datasets library to use my GCS bucket as its cache path.

Can you tell me how to do this please?


Hi! Have you considered using a streaming dataset?
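
In case it helps, here is a minimal sketch of what streaming could look like. It assumes the dataset you need supports `streaming=True`; the `stream_dataset` and `peek` helpers and the dataset name you pass in are placeholders of mine, not part of the library.

```python
from itertools import islice


def stream_dataset(name, split="train"):
    """Return an iterable dataset that yields examples lazily.

    `name` is a placeholder for whatever dataset you are training on.
    """
    # Imported lazily so `peek` below can be used without `datasets` installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset instead of downloading
    # and caching the whole dataset on disk first.
    return load_dataset(name, split=split, streaming=True)


def peek(examples, n=3):
    """Return the first n items of any iterable as a list."""
    return list(islice(examples, n))
```

Something like `peek(stream_dataset("my_dataset"), 3)` would then fetch only the first three examples, so very little should need to be stored on the VM.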

Otherwise, I guess it should be possible to mount your GCS bucket on your VM (for example with Cloud Storage FUSE, `gcsfuse`) and point your cache to the local mount path of the bucket.
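
A rough sketch of the mounted-bucket approach, assuming the bucket is already mounted somewhere on the VM. The mount point, the `hf_datasets_cache` subdirectory, and both helper names are hypothetical. I have not benchmarked this, and reads through a mounted bucket will likely be slower than local disk.

```python
import os
from pathlib import Path


def use_bucket_cache(mount_point, subdir="hf_datasets_cache"):
    """Create a cache directory under the mounted bucket and make it the default.

    `mount_point` is assumed to be the local path where the bucket is mounted.
    """
    cache_dir = Path(mount_point) / subdir
    cache_dir.mkdir(parents=True, exist_ok=True)
    # HF_DATASETS_CACHE is read when `datasets` is imported,
    # so set it before the first import.
    os.environ["HF_DATASETS_CACHE"] = str(cache_dir)
    return str(cache_dir)


def load_with_bucket_cache(name, mount_point, split="train"):
    """Load a dataset with its cache stored under the mounted bucket."""
    cache_dir = use_bucket_cache(mount_point)
    from datasets import load_dataset

    # cache_dir can also be passed explicitly for a single call.
    return load_dataset(name, split=split, cache_dir=cache_dir)
```

For example, `load_with_bucket_cache("my_dataset", "/mnt/gcs-bucket")` would write the downloaded and processed files under `/mnt/gcs-bucket/hf_datasets_cache` instead of the default cache in your home directory.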