Dataset preview: pyarrow.lib.ArrowTypeError: ("Expected bytes, got a 'float' object"

Hello there,

I am uploading a dataset with a wide annotation table: test/metadata.csv · stefanches/genomic-bioimaging at main

pyarrow apparently struggles to read in the metadata.

I was able to solve it locally by following pyarrow.lib.ArrowTypeError: "Expected a string or bytes object, got a 'int' object" · Issue #349 · wesm/feather · GitHub, but I could not find a way to correct the code that runs the dataset preview on HF. I would appreciate any advice and am happy to provide more info.

Stacktrace from HF:

cc @lhoestq

I guess the issue comes from some column containing a mix of strings and floats?

Can you check that your CSV doesn’t have mixed types?
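
One way to run that check, as a minimal sketch using only the Python standard library: it reads every cell as raw text and reports columns that contain both number-like and non-numeric values. The helper names (`classify`, `find_mixed_columns`) and the sample rows are made up for illustration; in practice you would open the real metadata file instead of the in-memory sample.

```python
import csv
import io


def classify(value: str) -> str:
    """Label a raw CSV cell as 'empty', 'number', or 'text'."""
    if value.strip() == "":
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def find_mixed_columns(csv_file) -> dict:
    """Return {column: kinds} for columns holding both numbers and text."""
    reader = csv.DictReader(csv_file)
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            # Short rows yield None for missing cells; treat them as empty.
            kinds[name].add(classify(row[name] or ""))
    return {name: k for name, k in kinds.items() if {"number", "text"} <= k}


# Hypothetical sample; replace with open("metadata.csv", newline="").
sample = io.StringIO("file_name,Cluster\na.png,4\nb.png,4+5\nc.png,7\n")
mixed = find_mixed_columns(sample)
print(mixed)
```

Any column listed in the result is a candidate for the type error, since a reader that infers one type per column may not handle both kinds of values.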

It is indeed the problem. In the column “Cluster” I had e.g. entries like “4+5”, which I think were read as strings on par with single-cluster entries.

I could not figure out any sane way to save these mixed-type columns as string columns, and quoting did not help, so I dropped the columns entirely as a quick fix. Of course, that is not good practice.

Ok, btw don’t hesitate to try formats that make typing easier, such as JSON Lines or even Parquet.
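
For example, here is a rough sketch of converting the CSV to JSON Lines with the standard library. JSON keeps "4" (a string) distinct from 4 (a number), so a column like "Cluster" can stay a string in every row instead of being dropped. The function name `csv_to_jsonl`, the `numeric_columns` parameter, and the sample columns `file_name` and `score` are assumptions for illustration, not part of any library.

```python
import csv
import io
import json


def csv_to_jsonl(csv_file, jsonl_file, numeric_columns=()):
    """Write one JSON object per CSV row.

    Cells stay strings unless their column is listed in numeric_columns;
    empty or missing cells become null.
    """
    for row in csv.DictReader(csv_file):
        record = {}
        for name, value in row.items():
            if not value:
                record[name] = None
            elif name in numeric_columns:
                record[name] = float(value)
            else:
                record[name] = value  # e.g. "4" and "4+5" both stay strings
        jsonl_file.write(json.dumps(record) + "\n")


# Hypothetical sample; use real files opened with open(...) in practice.
source = io.StringIO("file_name,Cluster,score\na.png,4,0.5\nb.png,4+5,\n")
target = io.StringIO()
csv_to_jsonl(source, target, numeric_columns=("score",))
lines = target.getvalue().splitlines()
print("\n".join(lines))
```

Only the columns you name explicitly are converted to numbers, so nothing depends on type inference over the raw CSV text.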

Oh, great tip - I will try that next time!
