I have a series of 3D seismic volumes (NumPy arrays) that I would like to upload as a dataset, but I see that the .npy
data type is not supported. Is there a workaround for uploading the 3D arrays to the HF Dataset Hub?
Thanks!
How about converting it into Parquet? (It's the recommended format; see Uploading datasets.)
The following steps should be enough if you can use pandas:
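For example, something along these lines should work (a minimal sketch; the file names are placeholders):

import numpy as np
import pandas as pd

# Load one 3D volume and flatten it to 2D so it fits a tabular format
array = np.load('volume.npy')
df = pd.DataFrame(array.reshape(-1, array.shape[-1]))
# Parquet requires string column names
# (pandas to_parquet also needs pyarrow or fastparquet installed)
df.columns = [str(c) for c in df.columns]
df.to_parquet('volume.parquet')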
Thanks for this @mahmutc
Here is the solution I came up with. My 3D seismic volumes are arrays with a shape of (300, 300, 1259). The code below converts one volume to a Parquet file; I loop it over all seismic files in the training dataset to create .parquet versions.
import numpy as np
import pandas as pd

def convert_to_parquet(array, file_name, folder):
    # Reshape the 3D array into a 2D array where each row is a 1D trace
    # along the last axis of the original volume
    reshaped_array = array.reshape(-1, array.shape[2])
    # Column names must be strings for Parquet
    column_names = [f'{i}' for i in range(array.shape[2])]
    # Create a pandas DataFrame with the string-based column names
    df = pd.DataFrame(reshaped_array, columns=column_names)
    # Add 'Row' and 'Col' identifiers recording the original 3D coordinates
    df['Row'] = np.repeat(np.arange(array.shape[0]), array.shape[1])
    df['Col'] = np.tile(np.arange(array.shape[1]), array.shape[0])
    # Reorder the columns to have 'Row' and 'Col' first
    df = df[['Row', 'Col'] + column_names]
    df.to_parquet(f'{folder}/{file_name}.parquet')
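For completeness, here is a quick sanity check that the original volume can be rebuilt from a Parquet file written this way (a sketch assuming the (300, 300, 1259) shape above; paths are placeholders):

import numpy as np
import pandas as pd

# Rebuild the (300, 300, 1259) volume from the saved Parquet file
df = pd.read_parquet('folder/file_name.parquet')
depth_cols = [c for c in df.columns if c not in ('Row', 'Col')]
restored = (df.sort_values(['Row', 'Col'])[depth_cols]
              .to_numpy()
              .reshape(300, 300, len(depth_cols)))
assert np.array_equal(restored, array)  # array = the original volume

Once the .parquet files exist, they can be pushed to the Hub with huggingface_hub, e.g. upload_folder (the repo_id below is a placeholder):

from huggingface_hub import HfApi

api = HfApi()
# Upload every Parquet file in the local folder to a dataset repo
api.upload_folder(
    folder_path='folder',
    repo_id='username/seismic-volumes',  # placeholder repo id
    repo_type='dataset',
)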