Hello, I hope you’re well.
I’m writing because I’ve noticed a strange error when using polars to read a dataset hosted on HF.
The code and dataset I use are as follows:
import polars as pl

dataframe = pl.read_parquet(
    "hf://datasets/louisbrulenaudet/code-voirie-routiere/data/train-00000-of-00001.parquet"
)
For another dataset, however, the same code works without any problem:
import polars as pl

dataframe = pl.read_parquet(
    "hf://datasets/louisbrulenaudet/bofip/data/train-00000-of-00001.parquet"
)
Here’s the error message I’m getting. I don’t know whether it’s related to the automatic conversion to Parquet:
at Function.wrapKernelMethodImpl (/Users/~/.vscode/extensions/ms-toolsai.jupyter-2024.8.2024080201-darwin-x64/dist/extension.node.js:304:82402)
09:26:01.466 [info] Process Execution: ~/.pyenv/versions/3.11.7/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
09:26:01.480 [info] Process Execution: ~/.pyenv/versions/3.11.7/bin/python -m ipykernel_launcher --f=/Users/~/Library/Jupyter/runtime/kernel-v2-7904DltItC7jYZG5.json
> cwd: ~/Desktop
09:26:02.722 [info] Restarted 002ef1e7-54fc-423e-9ef9-1d94beba5809
09:26:06.790 [error] Disposing session as kernel process died ExitCode: undefined, Reason: thread 'polars-7' panicked at crates/polars-parquet/src/parquet/read/compression.rs:222:17:
Found compressed page in the middle of the pages
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'polars-2' panicked at crates/polars-parquet/src/parquet/read/compression.rs:222:17:
Found compressed page in the middle of the pages
thread 'polars-6' panicked at crates/polars-parquet/src/parquet/read/compression.rs:222:17:
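One way to narrow this down might be to run each read in a separate Python process, so that a panic does not take the notebook kernel down with it, and to compare the native polars reader against pyarrow on a downloaded copy of the same file. The following is only a minimal sketch under assumptions: it assumes `huggingface_hub` and `pyarrow` are installed, and `try_in_subprocess`, `SNIPPETS`, and `report` are hypothetical helper names used for illustration, not part of polars. A pyarrow success would only suggest that the problem is specific to the polars reader; it would not confirm the cause.

```python
import subprocess
import sys

REPO_ID = "louisbrulenaudet/code-voirie-routiere"
FILENAME = "data/train-00000-of-00001.parquet"
HF_PATH = f"hf://datasets/{REPO_ID}/{FILENAME}"

# One-line programs, each run in its own interpreter (hypothetical names).
SNIPPETS = {
    # Same call as in the failing example above.
    "polars, native reader": (
        f"import polars as pl; pl.read_parquet({HF_PATH!r})"
    ),
    # Assumption: huggingface_hub and pyarrow are available.
    "pyarrow, local copy": (
        "from huggingface_hub import hf_hub_download; "
        "import pyarrow.parquet as pq; "
        f"p = hf_hub_download(repo_id={REPO_ID!r}, "
        f"filename={FILENAME!r}, repo_type='dataset'); "
        "pq.read_table(p)"
    ),
}


def try_in_subprocess(snippet: str, timeout: float = 300.0) -> tuple[bool, str]:
    """Run a snippet in a child interpreter.

    Returns (succeeded, last line of stderr). A crash or panic in the
    child only affects the child process, not the caller.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    stderr_lines = proc.stderr.strip().splitlines()
    last_line = stderr_lines[-1] if stderr_lines else ""
    return proc.returncode == 0, last_line


def report() -> None:
    """Try every snippet and print whether it succeeded (needs network access)."""
    for label, snippet in SNIPPETS.items():
        ok, last_line = try_in_subprocess(snippet)
        status = "ok" if ok else f"failed: {last_line}"
        print(f"{label}: {status}")
```

Calling `report()` from a notebook cell or a script would print one line per reader. Swapping `REPO_ID` for `louisbrulenaudet/bofip` would give the comparison against the dataset that reads correctly.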
Thank you in advance. I’m happy to test any potential solutions.
Best regards, Louis