Static Quantization with Own dataset

Hi! I want to apply static quantization to my model, using my own dataset for calibration.
The documentation does not explain how to use get_calibration_dataset with an "external" dataset. Since this is not the same as using a Hugging Face dataset, I don't know whether I need to convert my dataset to a particular format or pass some configuration. I've tried passing the path to my CSV file, and also tried a JSON version.

Update: it seems to be mandatory to use a dataset from the Hugging Face datasets library, since load_dataset is called inside the function and its arguments appear to target Hugging Face datasets (there is no data_files argument available).

I would appreciate any info.

Thanks!

hi @aroger, would this help: Load tabular data?

Then you could follow the examples for calibration, e.g. optimum/run_glue.py at main · huggingface/optimum · GitHub

hi @fxmarty,
The problem is that in the source code of quantizer.get_calibration_dataset(), load_dataset is called like this:

calib_dataset = load_dataset(
    dataset_name,
    name=dataset_config_name,
    split=dataset_split,
    use_auth_token=use_auth_token,
)

As you can see, there is no way to pass data_files, so the approach explained in Load tabular data cannot be used here.

The example uses GLUE, which is easier, but I want to calibrate the quantization on my own dataset because it better matches my requirements and my use case.

Hi @aroger, so if dataset_name="csv" is used and get_calibration_dataset offers a way to pass the data_files argument, that should work. Would you like to open a PR to enable this?