Extracting Training Data from GPT-2 (+ Differential Privacy)

n2cholas · June 6, 2021, 6:12am

Carlini et al. (2020) (https://arxiv.org/pdf/2012.07805.pdf) show that it is possible to extract portions of training examples from language models. It would be cool to demo this with HuggingFace, then show that we can prevent this extraction by training these models in a differentially private manner. JAX is particularly well suited to running DPSGD efficiently, so this project is based on the Flax GPT-2 implementation.

So far, in this notebook, I have fine-tuned GPT-2 on wikitext and tried to extract training examples from the model using the techniques proposed in Carlini et al. I have not been able to extract any passages of wikitext, and I no longer have the bandwidth to continue this project.
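
For anyone picking this up, here is a minimal sketch of one of the ranking heuristics from the paper as I understand it: divide the model's log-perplexity on a generated sample by the sample's zlib-compressed size, so that text the model finds easy but a generic compressor finds hard floats to the top. The function names and the hard-coded log-perplexity values are hypothetical; in practice the log-perplexities would come from the fine-tuned GPT-2, which is not shown here.

```python
import zlib
from typing import Dict, List, Tuple


def zlib_entropy_bits(text: str) -> int:
    """Size in bits of the zlib-compressed UTF-8 encoding of `text`."""
    return 8 * len(zlib.compress(text.encode("utf-8")))


def rank_candidates(log_perplexities: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort samples by log-perplexity / zlib bits, lowest ratio first.

    A low ratio means the model is confident about text that does not
    compress well, which is the pattern this heuristic treats as suspicious.
    """
    scored = [
        (text, log_ppl / zlib_entropy_bits(text))
        for text, log_ppl in log_perplexities.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical inputs: both samples get the same made-up log-perplexity,
# so only the zlib term separates them.
repetitive = "the " * 50
varied = "The quick brown fox jumps over the lazy dog near the old stone bridge."
ranking = rank_candidates({repetitive: 1.0, varied: 1.0})

for text, score in ranking:
    print(f"{score:.5f}  {text[:40]!r}")
```

The repetitive sample compresses to very few bits, so its ratio is high and it ranks last. That is the point of the zlib term: it pushes down degenerate repeated text that a language model also scores as very likely.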

If anyone’s interested in continuing this project, I’d be happy to help you get started.

Roughly, here are some potential next steps:

Successfully extract some training samples from the fine-tuned GPT-2.
Use the filtering techniques described in the paper to extract training examples in a sample-efficient way (i.e. a large proportion of candidates are really from the training data).
Fine-tune GPT-2 using DPSGD (example linked in notebook), ideally achieving a perplexity similar to the original.
Demonstrate that no training samples can be extracted from the differentially private version.
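
To make the DPSGD step concrete, here is a framework-agnostic sketch of a single update on a toy linear model, using only the Python standard library: compute one gradient per example, clip each to a maximum L2 norm, sum them, add Gaussian noise scaled to the clipping norm, then average. This is an illustration under stated assumptions, not the notebook's code. All names (`dpsgd_step`, `max_norm`, `noise_multiplier`) are hypothetical, and no privacy accounting is done. In JAX, the per-example gradients would typically come from composing `jax.vmap` with `jax.grad` rather than from a Python loop.

```python
import math
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]


def per_example_grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of 0.5 * (w . x - y)^2 with respect to w, for one example."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def clip_to_norm(g: Sequence[float], max_norm: float) -> List[float]:
    """Scale g down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(gi * gi for gi in g))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [gi * scale for gi in g]


def dpsgd_step(
    w: Sequence[float],
    batch: Sequence[Example],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One noisy, clipped gradient step (toy sketch, no privacy accounting)."""
    clipped = [clip_to_norm(per_example_grad(w, x, y), max_norm) for x, y in batch]
    summed = [sum(column) for column in zip(*clipped)]
    # Noise standard deviation is proportional to the clipping norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [wi - lr * ni / len(batch) for wi, ni in zip(w, noisy)]


# Toy usage with made-up data generated from y = 2 * x0 - x1.
rng = random.Random(0)
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = [0.0, 0.0]
for _ in range(200):
    w = dpsgd_step(w, data, lr=0.1, max_norm=1.0, noise_multiplier=0.5, rng=rng)
print(w)
```

Clipping bounds how much any single example can move the weights, and the noise hides what remains of that contribution. Turning `noise_multiplier` and the number of steps into an actual (epsilon, delta) guarantee needs a separate privacy accountant, which this sketch leaves out.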

NikolaSelic · July 24, 2022, 1:40pm

I am interested in continuing this project. I have experience with differential privacy and other privacy-preserving AI methods.

I took a look at the paper and the provided notebook and got the gist of it. Is there anything else I should keep in mind about this project?

lzy337 · November 9, 2023, 3:43am

Nice to find this post as a rookie in the DP field. Any updates here?
