Static Quantization with Own dataset

aroger · May 31, 2023, 4:15pm

Hi! I want to apply static quantization to my model, using my own dataset for calibration.
There is no specific explanation of how to use get_calibration_dataset with an “external dataset”. Since it’s not the same as with a Hugging Face dataset, I don’t understand whether I need to convert my dataset to a special format or specify any configuration. I’ve tried passing the path to my CSV file, and also tried JSON.

Update: it seems to be mandatory to use a dataset from datasets, since load_dataset is called inside the function and its arguments seem to be meant for Hugging Face datasets (no data_files argument is available).

I’d appreciate any info.

Thanks!

fxmarty · June 1, 2023, 1:40pm

Hi @aroger, would this help: Load tabular data?

Then you could follow the examples for calibration, e.g. optimum/run_glue.py at main · huggingface/optimum · GitHub

aroger · June 1, 2023, 1:47pm

Hi @fxmarty,
The problem is that in the source code of quantizer.get_calibration_dataset(), load_dataset is called like this:

calib_dataset = load_dataset(
    dataset_name,
    name=dataset_config_name,
    split=dataset_split,
    use_auth_token=use_auth_token,
)

As you can see, this call leaves no way to pass data_files, so the approach explained in Load tabular data can’t be used here.

In the example, they use GLUE, which is easier, but I want to calibrate my quantization on my own dataset because it better matches my requirements and use case.

regisss · July 1, 2023, 9:59am

Hi @aroger, if get_calibration_dataset offered a way to pass the data_files argument, then calling it with dataset_name="csv" should work. Would you like to open a PR to enable this?
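
Until such an option exists, one possible workaround is to build the calibration dataset directly with load_dataset and the "csv" builder. The sketch below is a minimal, hypothetical stand-in and not part of Optimum: the helper name load_local_calibration_dataset, its parameters, and the defaults are assumptions made for illustration. It mirrors the call quoted earlier in the thread, adds data_files, and keeps a random subset of rows.

```python
from typing import Callable, Optional


def load_local_calibration_dataset(
    data_files,
    dataset_name: str = "csv",
    dataset_split: str = "train",
    num_samples: Optional[int] = 100,
    preprocess_function: Optional[Callable] = None,
    seed: int = 2016,
):
    """Hypothetical helper (not an Optimum API): load local files for calibration."""
    # Imported lazily so the helper can be defined even if `datasets` is missing.
    from datasets import load_dataset

    # Same call shape as the snippet quoted in the thread, plus `data_files`.
    # With the "csv" builder, a single local file ends up in the "train" split.
    calib_dataset = load_dataset(
        dataset_name,
        data_files=data_files,
        split=dataset_split,
    )

    # Keep a random subset, capped at the number of available rows.
    if num_samples is not None:
        num_samples = min(num_samples, len(calib_dataset))
        calib_dataset = calib_dataset.shuffle(seed=seed).select(range(num_samples))

    # Optionally tokenize or otherwise transform the rows in batches.
    if preprocess_function is not None:
        calib_dataset = calib_dataset.map(preprocess_function, batched=True)

    return calib_dataset
```

A call such as load_local_calibration_dataset("my_data.csv", num_samples=50) returns a regular datasets.Dataset (the file name here is a placeholder). Presumably this object can then be used wherever the calibration examples use the return value of get_calibration_dataset, though that has not been verified in this thread.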
