Avoid standardizing keys for feature values which are a list of dictionaries

Hi, I'm trying to create a Hugging Face dataset from a list using Dataset.from_list.

Each sample in the list is a dict with the same keys (which become my features). The value of each feature is a list of dictionaries, and each of those dictionaries can have a different set of keys. However, the datasets library standardizes the dictionaries under a feature: every dictionary ends up with every key that appears in any dictionary under that feature, and the keys it did not originally have are set to None.

How can I keep the same set of keys as in the original list for each dictionary under a feature?

Here’s a simple example:

from datasets import Dataset

# Build one sample whose "feature_1" value is a list of dicts with different keys
def generate_sample():
    # Base sample data (fixed values, for illustration)
    sample_data = {
        "text": "Sample text",
        "feature_1": []
    }

    # Each dictionary under feature_1 has its own set of keys
    feature_1 = [{"key1": "value1"}, {"key2": "value2"}]
    sample_data["feature_1"].extend(feature_1)
    
    return sample_data

# Generate multiple samples
num_samples = 10
samples = [generate_sample() for _ in range(num_samples)]

# Create a Hugging Face Dataset
dataset = Dataset.from_list(samples)
print(dataset[0])

The output is:

{'text': 'Sample text', 'feature_1': [{'key1': 'value1', 'key2': None}, {'key1': None, 'key2': 'value2'}]}

Instead, I want to construct the dataset so that I get this:
{'text': 'Sample text', 'feature_1': [{'key1': 'value1'}, {'key2': 'value2'}]}
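
A possible workaround, sketched here under the assumption that it is acceptable to store each inner dictionary as a JSON string rather than as a nested structure: encode the dictionaries before building the dataset and decode them when reading a row. The helpers encode_sample and decode_sample are hypothetical names used only for illustration and are not part of the datasets library. The sketch uses only the standard library; the encoded samples are plain lists of strings, so they should be usable with Dataset.from_list without any key merging, though that step is not shown or verified here.

```python
import json

def encode_sample(sample, key="feature_1"):
    """Return a copy of `sample` with each dict under `key` stored as a JSON string."""
    encoded = dict(sample)
    encoded[key] = [json.dumps(d) for d in sample[key]]
    return encoded

def decode_sample(sample, key="feature_1"):
    """Inverse of encode_sample: parse each JSON string under `key` back into a dict."""
    decoded = dict(sample)
    decoded[key] = [json.loads(s) for s in sample[key]]
    return decoded

# Same shape of data as in the example above
samples = [
    {"text": "Sample text", "feature_1": [{"key1": "value1"}, {"key2": "value2"}]}
    for _ in range(10)
]

# feature_1 is now a list of strings, so there are no dict keys to standardize.
# Assumption: these encoded samples are what would be passed to Dataset.from_list.
encoded_samples = [encode_sample(s) for s in samples]

# Decoding a row restores the original per-dictionary keys
restored = decode_sample(encoded_samples[0])
print(restored)
# {'text': 'Sample text', 'feature_1': [{'key1': 'value1'}, {'key2': 'value2'}]}
```

The trade-off is that the feature is no longer queryable as a nested structure inside the dataset; every consumer has to decode it first.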