Hey, I am running some experiments with transformer models, and I find it much easier to work with data stored in an SQL database than in Arrow/Parquet files.
It is easier to browse and select rows, create subsets for different models, store them in separate tables, and so on.
Is there a good library for using SQL as a data source for models? Specifically, I am looking for a lightweight way to let Google Colab read data from my PostgreSQL database (running on a dedicated host).
You could try implementing your own PyTorch Dataset class on top of a Postgres client in Python.
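A minimal sketch of that idea, under some loud assumptions: the table name `samples` and its columns `id`, `text`, `label` are hypothetical, and an in-memory SQLite database stands in for Postgres so the snippet runs anywhere. With psycopg2 you would pass the connection returned by `psycopg2.connect(...)` and keep the default `%s` placeholder.

```python
import sqlite3

try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch still runs without PyTorch installed
    Dataset = object


class SQLTextDataset(Dataset):
    """Map-style dataset that reads rows from a hypothetical `samples` table."""

    def __init__(self, conn, placeholder="%s"):
        # conn: any DB-API 2.0 connection, e.g. from psycopg2.connect(...).
        # placeholder: "%s" for psycopg2, "?" for sqlite3.
        self.conn = conn
        self.query = f"SELECT text, label FROM samples WHERE id = {placeholder}"
        cur = conn.cursor()
        cur.execute("SELECT id FROM samples ORDER BY id")
        # Keep only the primary keys in memory; full rows are fetched lazily.
        self.ids = [row[0] for row in cur.fetchall()]
        cur.close()

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        cur = self.conn.cursor()
        cur.execute(self.query, (self.ids[index],))
        text, label = cur.fetchone()
        cur.close()
        return {"text": text, "label": label}


# Demo: an in-memory SQLite database stands in for the Postgres host.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE samples (id INTEGER PRIMARY KEY, text TEXT, label INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO samples (id, text, label) VALUES (?, ?, ?)",
    [(1, "good movie", 1), (2, "bad movie", 0), (5, "fine", 1)],
)
dataset = SQLTextDataset(demo_conn, placeholder="?")
print(len(dataset), dataset[0])
```

One query per item is simple but slow over a network, and a single connection should probably not be shared across DataLoader worker processes, so this is likely only suitable for small tables or `num_workers=0`.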
Hello alexxale… thanks a lot for posting this question. Have you found any information about it?
You can use IterableDataset.from_generator with a generator that fetches the data in batches (see the Stack Overflow question "Fetching data from postgres database in batch (python)").
PostgresML also looks promising for this task, considering its integration with Transformers (but it requires Docker, so it is not Colab-friendly).