Convert a T5 model into a variational autoencoder for text.
I have already built a project that does this in PyTorch, but it has never been trained at scale.
This project will port the autoencoder to Flax so it can be trained efficiently on a TPU, producing the largest Transformer-VAE ever trained!
Language
The model will be trained in English.
Model
Built on T5-base, this will match the Optimus model.
The only additional parameters come from a small autoencoder module that sits between the encoder and decoder.
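The bottleneck idea can be sketched in plain NumPy (a toy illustration, not the project's actual Flax module — the mean-pooling strategy, `latent_dim`, and all function names here are assumptions): pool the encoder's hidden states, project to a Gaussian latent, sample with the reparameterisation trick, and project back to the decoder's hidden size.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_bottleneck(hidden_dim, latent_dim):
    """Randomly initialise the three projections of a toy VAE bottleneck."""
    return {
        "to_mu": rng.normal(0, 0.02, (hidden_dim, latent_dim)),
        "to_logvar": rng.normal(0, 0.02, (hidden_dim, latent_dim)),
        "to_decoder": rng.normal(0, 0.02, (latent_dim, hidden_dim)),
    }

def bottleneck(params, encoder_hidden):
    """Pool encoder states, sample a latent code, project back for the decoder."""
    pooled = encoder_hidden.mean(axis=1)         # (batch, hidden): mean-pool over tokens
    mu = pooled @ params["to_mu"]                # (batch, latent)
    logvar = pooled @ params["to_logvar"]
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps          # reparameterisation trick
    decoder_in = z @ params["to_decoder"]        # (batch, hidden), fed to the decoder
    # KL divergence of N(mu, sigma^2) from N(0, 1), summed over latent dims
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return decoder_in, z, kl

params = init_bottleneck(hidden_dim=768, latent_dim=32)
hidden = rng.normal(size=(2, 16, 768))           # (batch, seq_len, hidden)
decoder_in, z, kl = bottleneck(params, hidden)
```

In the real model the same shapes apply, but the projections live in a Flax module and the KL term is weighted into the T5 reconstruction loss.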
Datasets
Use the Wikipedia sentences dataset from Optimus.
It comes pre-tokenized, so we’ll need to use its tokenizer with T5.
Training scripts
The original PyTorch training script was adapted from the old Hugging Face CLM training script, so the Flax CLM script should be a good base to build on.
Challenges
The original model was written in PyTorch, so some features may not port over directly. For example, I added a prism layer to the PyTorch code which requires FFTs.
Desired project outcome
A Colab notebook where people can explore the Transformer-VAE’s latent space:
Interpolate between sentences.
Transfer the style/content of one sentence to another.
Run gradient descent on a sentence’s latent code to reach a desired sentiment, classification score, etc.
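The interpolation demo reduces to moving between two sentences' latent codes and decoding each intermediate point. A NumPy sketch of spherical interpolation (slerp), which is often preferred over linear mixing for Gaussian latents (the function name and latent size are illustrative, not from the project):

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherically interpolate between two latent vectors, t in [0, 1]."""
    z0_n = z0 / np.linalg.norm(z0)
    z1_n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0_n, z1_n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1   # (nearly) parallel vectors: fall back to lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=32), rng.normal(size=32)   # latent codes of two sentences
path = [slerp(z_a, z_b, t) for t in np.linspace(0, 1, 7)]
```

In the notebook, each vector on `path` would be fed through the decoder to produce the intermediate sentences; the gradient-descent demo works on the same latent codes, optimising them against an external classifier's score.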
Reads
Here are some background links to understand the context behind this project: