Is there an easier way to query a dataset via duckdb?

Hi,
I’m testing some duckdb query on this dataset:
https://huggingface.co/datasets/aborruso/open_cup_complessivo

Below a sample query and the output. It works great.

Is there an easier way to build it, without listing, for example, all the parquet files from which it is composed? Can wildcards be used?

Thank you

duckdb -c "
SELECT
    CODICE_NATURA_INTERVENTO,
    NATURA_INTERVENTO,
    COUNT(*) AS conteggio
FROM 
    read_parquet(
        list_transform(
            generate_series(0, 19), 
            n -> 'hf://datasets/aborruso/open_cup_complessivo/opencup.parquet/data_' || n || '.parquet' 
        )
    )
GROUP BY
    CODICE_NATURA_INTERVENTO,
    NATURA_INTERVENTO
ORDER BY
    conteggio DESC;
"
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ CODICE_NATURA_INTE…  β”‚                                          NATURA_INTERVENTO                                          β”‚ conteggio β”‚
β”‚       varchar        β”‚                                               varchar                                               β”‚   int64   β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 07                   β”‚ CONCESSIONE DI INCENTIVI AD UNITA' PRODUTTIVE                                                       β”‚  15941069 β”‚
β”‚ 02                   β”‚ ACQUISTO O REALIZZAZIONE DI SERVIZI                                                                 β”‚   2649987 β”‚
β”‚ 06                   β”‚ CONCESSIONE DI CONTRIBUTI AD ALTRI SOGGETTI (DIVERSI DA UNITA' PRODUTTIVE)                          β”‚   2419348 β”‚
β”‚ 03                   β”‚ REALIZZAZIONE DI LAVORI PUBBLICI (OPERE ED IMPIANTISTICA)                                           β”‚   2136547 β”‚
β”‚ 01                   β”‚ ACQUISTO DI BENI                                                                                    β”‚    493804 β”‚
β”‚ 08                   β”‚ SOTTOSCRIZIONE INIZIALE O AUMENTO DI CAPITALE SOCIALE (COMPRESI SPIN OFF), FONDI DI RISCHIO O DI …  β”‚      3699 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
2 Likes

Yes,
I can use wildcards:

duckdb -c "
SELECT
    CODICE_NATURA_INTERVENTO,
    NATURA_INTERVENTO,
    COUNT(*) AS conteggio
FROM 'hf://datasets/aborruso/open_cup_complessivo/opencup.parquet/data_*.parquet'
GROUP BY
    CODICE_NATURA_INTERVENTO,
    NATURA_INTERVENTO
ORDER BY
    conteggio DESC;
"
1 Like

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.