Use SQL database as dataset?

Hey, I am doing some experiments with transformer models and I find it much easier to work with data stored in a SQL database than in Arrow/Parquet files.

It is easier to browse and select rows, or to create subsets for different models and store them in separate tables.

Is there any good library for using SQL as a data source for :hugs: models? Specifically, I am looking for a lightweight way to let Google Colab read data from my PostgreSQL database (running on a dedicated host).


You could try implementing your own PyTorch Dataset class on top of a Postgres client in Python.
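A rough sketch of what that could look like, not a tested Postgres setup. To keep it runnable anywhere it uses the standard-library sqlite3 module as a stand-in for a Postgres client such as psycopg2, and it only implements the `__len__`/`__getitem__` protocol that map-style datasets need; in real use you would normally subclass `torch.utils.data.Dataset`. The class name `SQLTextDataset`, the table `samples` and its columns `id`, `text`, `label` are all made up for illustration.

```python
import sqlite3  # stand-in so the sketch runs anywhere; for Postgres use e.g. psycopg2.connect(...)


class SQLTextDataset:
    """Map-style dataset: one row per index, fetched lazily from a SQL table."""

    def __init__(self, conn, table="samples"):
        # `table` must be a trusted name, because it is interpolated into the SQL string.
        self.conn = conn
        self.table = table
        cur = conn.cursor()
        cur.execute(f"SELECT id FROM {table} ORDER BY id")
        # Keep only the primary keys in memory; row contents are fetched on demand.
        self.ids = [row[0] for row in cur.fetchall()]

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        cur = self.conn.cursor()
        # sqlite3 uses "?" placeholders; psycopg2 uses "%s" instead.
        cur.execute(
            f"SELECT text, label FROM {self.table} WHERE id = ?",
            (self.ids[idx],),
        )
        text, label = cur.fetchone()
        return {"text": text, "label": label}


# Demo with an in-memory database (hypothetical schema: id, text, label).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, text TEXT, label INTEGER)")
conn.executemany(
    "INSERT INTO samples (text, label) VALUES (?, ?)",
    [("good movie", 1), ("bad movie", 0), ("fine", 1)],
)
dataset = SQLTextDataset(conn)
print(len(dataset), dataset[1])
```

One query per item is simple but slow over a network, so for a remote database you would probably want to fetch rows in batches or cache them locally. Also keep in mind that a database connection generally cannot be shared across DataLoader worker processes.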

Hello alexxale, thanks a lot for posting this question. Have you found an answer to it?


You can use IterableDataset.from_generator with a generator that fetches the data in batches (see the Stack Overflow question "Fetching data from postgres database in batch (python)").

PostgresML also looks promising for this task given its integration with Transformers (but it requires Docker, so it is not Colab-friendly).