Train the best ever transformer-VAE

Fraser · June 24, 2021, 7:05am

Use the Funnel Transformer + T5 model from the huggingface hub with some subclassing to convert them into a VAE for text.

The current SOTA VAE is OPTIMUS which still suffers from some posterior collapse.

Its training data is all open source.

github.com

ChunyuanLI/Optimus/blob/master/download_datasets.md

# Download/Pre-process Datasets

## Wikipedia

Download processed files (11.78G) below, and unzip it (298 files)

https://textae.blob.core.windows.net/optimus/data/datasets/wikipedia_json_64_filtered.zip

Download raw file (11.79G):

https://textae.blob.core.windows.net/optimus/data/datasets/wikipedia.segmented.nltk.txt

Our pre-processing protocol: We split the original wiki text into 298 files, and loop over files in one epoch.

We filter each sentence in wiki based on two constraints: (1) The sentence length is smaller than 64. (2) The tokenized sentence length is smaller than 256 (so that the encoder can take the entire sentence).

To filter the sentence, please change the data folders and run the script:

    sh scripts/scripts_local/run_data_filtering_wiki.sh
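The two filtering constraints quoted above can be sketched roughly as follows. This is only an illustration, not the OPTIMUS script: `filter_sentences` and `char_tokenize` are hypothetical names, "sentence length" is assumed to mean whitespace-separated words, and the stand-in tokenizer simply splits into characters so the example runs without any model files.

```python
from typing import Callable, Iterable, Iterator, List

MAX_WORDS = 64    # constraint (1): sentence length smaller than 64
MAX_TOKENS = 256  # constraint (2): tokenized length smaller than 256


def filter_sentences(
    sentences: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> Iterator[str]:
    """Yield only the sentences that satisfy both length constraints."""
    for sentence in sentences:
        # Assumption: "length" in constraint (1) counts whitespace-separated words.
        if len(sentence.split()) >= MAX_WORDS:
            continue
        if len(tokenize(sentence)) >= MAX_TOKENS:
            continue
        yield sentence


def char_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in tokenizer: one token per character."""
    return list(text)


candidates = [
    "a short sentence",  # passes both checks
    "word " * 70,        # 70 words, fails constraint (1)
    "x" * 300,           # 300 tokens, fails constraint (2)
]
kept = list(filter_sentences(candidates, char_tokenize))
```

In practice you would pass the encoder's real tokenizer in place of `char_tokenize`.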


From my experiments, an MMD-VAE doesn’t suffer from as much posterior collapse in smaller-scale models, so why not try to scale it up?
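For anyone unfamiliar with the idea: an MMD-VAE replaces the per-sample KL term with a maximum mean discrepancy (MMD) penalty between a batch of encoded latents and samples from the prior. A minimal pure-Python sketch of that penalty with a Gaussian kernel is below. It is not the code from the repo; the function names, the bandwidth, and the toy data are all assumptions chosen for illustration.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: List[Vector], ys: List[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(
    latents: List[Vector],
    prior_samples: List[Vector],
    bandwidth: float = 1.0,
) -> float:
    """Biased estimate of MMD^2 between encoder latents and prior samples."""
    return (
        mean_kernel(latents, latents, bandwidth)
        + mean_kernel(prior_samples, prior_samples, bandwidth)
        - 2.0 * mean_kernel(latents, prior_samples, bandwidth)
    )


rng = random.Random(0)


def sample_gaussian(n: int, dim: int, shift: float = 0.0) -> List[List[float]]:
    """Draw n points from a unit-variance Gaussian centred at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


prior = sample_gaussian(50, 2)               # samples from the N(0, I) prior
matched = sample_gaussian(50, 2)             # latents that already follow the prior
shifted = sample_gaussian(50, 2, shift=3.0)  # latents far away from the prior

mmd_matched = mmd_squared(matched, prior)
mmd_shifted = mmd_squared(shifted, prior)
```

The penalty is small when the latents look like the prior and grows as they drift away, so the training loss would be the reconstruction loss plus a weighted `mmd_squared` term.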

I’ve actually already got the code working to turn them into a PyTorch MMD-VAE, so why not just convert it to JAX/Flax?

github.com

Fraser-Greenlee/transformer-vae/blob/master/transformer_vae/model.py

"""
    Base transformer-VAE model.
"""
import torch
from torch import nn
from typing import Dict, Any
from transformers.utils import logging
from transformers.modeling_utils import PreTrainedModel
from transformers.modeling_outputs import BaseModelOutput
from transformers.models.funnel.modeling_funnel import upsample
from transformers import AutoModelForSeq2SeqLM, AutoModelForMaskedLM

from transformer_vae.custom_t5 import modify_t5_stack
from transformer_vae.autoencoders import VAE_ENCODER_MODELS, VAE_DECODER_MODELS, EncoderDecoderVAE
from transformer_vae.critic import CRITIC
from transformer_vae.model_outputs import BaseTransformerVAE_Output
from transformer_vae.config import Funnel_T5_VAE_Config


logger = logging.get_logger(__name__)


Fraser · June 24, 2021, 7:06am

Maybe this could make the basis for a new kind of search engine?

Fraser · June 24, 2021, 7:07pm

Thanks Mina!

Fraser · June 24, 2021, 7:37pm

Are you in the slack? Would be good to make a group for this.

schrilax · June 24, 2021, 10:16pm

Would love to contribute to this. I have worked on VAEs before (here is a paper I wrote to handle the posterior collapse issue). Pinged you, Fraser, on Slack. Would love to chat when you are available.

Vaibhavbrkn · June 25, 2021, 3:15am

@Fraser very interesting idea. I have good experience with T5 and have also worked on VAEs, and I would love to be part of this project.

Fraser · June 25, 2021, 7:06am

I’ve started a Slack so we can make plans and form a proper team.

Would be good to organise a group call when you’re free?

https://join.slack.com/t/transformer-vae/shared_invite/zt-s5yv7h9h-~d3m7UJlfVPu5tlj8iUvvQ

Fraser · June 25, 2021, 7:06am

Hey Mina, I’ve started a Slack so we can make plans and form a proper team.

Would be good to organise a call when you’re free?

https://join.slack.com/t/transformer-vae/shared_invite/zt-s5yv7h9h-~d3m7UJlfVPu5tlj8iUvvQ

gigant · June 25, 2021, 9:04am

Hi, I would love to contribute to this work! I am interested in VAEs & transformers and have already worked with both (never at the same time though).

srisweet · June 27, 2021, 5:55am

Hello @Fraser & Team,
I am interested in being part of such an amazing project & team, and I will try my best to contribute to this VAE project. It would be nice if we could discuss some more learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team. I also read your awesome article, 'Interpolating the internet', and found it very interesting!

Fraser · June 28, 2021, 5:49pm

I’ve made a clearer project post here, be sure to give it a like if you’re interested in helping out!

patrickvonplaten · June 29, 2021, 3:20pm

Think this is a really cool project - let’s define it officially! Sadly we are limited to using TPU - if it’s too complicated to turn the code into JAX maybe options with PyTorch/XLA can be explored as well …

But cool idea, let’s define it

Fraser · June 29, 2021, 6:28pm

Hi Patrick, thanks for the feedback!

I’ve linked a revised plan below.

The idea here is just to take an existing flax-T5 model and stick an autoencoder between the encoder & decoder. As seen here.
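To make the "autoencoder between the encoder & decoder" idea concrete, here is a rough plain-Python sketch of one possible layout: flatten the encoder's hidden states, project them down to a small latent vector, then project back up to the shape the decoder expects. This is an assumption for illustration only. `LatentBottleneck`, its random weights, and the toy sizes are hypothetical, and a real version would be written as Flax modules wrapped around the T5 encoder and decoder.

```python
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix: Matrix, vector: List[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LatentBottleneck:
    """Hypothetical autoencoder placed between a seq2seq encoder and decoder."""

    def __init__(self, seq_len: int, d_model: int, latent_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.seq_len = seq_len
        self.d_model = d_model
        self.to_latent = random_matrix(latent_size, seq_len * d_model, rng)
        self.from_latent = random_matrix(seq_len * d_model, latent_size, rng)

    def encode(self, hidden: Matrix) -> List[float]:
        """Compress encoder hidden states (seq_len x d_model) into one latent vector."""
        flat = [value for token in hidden for value in token]
        return matvec(self.to_latent, flat)

    def decode(self, latent: List[float]) -> Matrix:
        """Expand a latent vector back into decoder-ready hidden states."""
        flat = matvec(self.from_latent, latent)
        d = self.d_model
        return [flat[i * d:(i + 1) * d] for i in range(self.seq_len)]


# Toy sizes: 4 tokens with 8-dimensional hidden states, squeezed into 3 numbers.
bottleneck = LatentBottleneck(seq_len=4, d_model=8, latent_size=3)
encoder_hidden = [[0.5] * 8 for _ in range(4)]  # stand-in for T5 encoder output
latent = bottleneck.encode(encoder_hidden)
reconstructed = bottleneck.decode(latent)       # would be fed to the T5 decoder
```

The regulariser (KL or MMD) would be applied to `latent`, while the reconstruction loss comes from the decoder's output.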

So far I’ve had calls with 3 other team members and we’re really excited to see what this produces!

Fraser · June 29, 2021, 6:50pm

Forgot to include memory requirements…

Preferably we will train on a dataset of input & output sequences with length 256, batch size 24, with a T5-base model.

T5-base has 220M params while OPTIMUS has 227M and was trained with the same settings as above using 8 V100 GPUs.
The TPUv3-8 is equivalent to 4 V100 GPUs, so it should be able to train with at least batch size 12 or with shorter sequences.
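The back-of-the-envelope arithmetic behind that estimate, as a small sketch. The linear scaling of batch size with hardware is the post's rule of thumb, and the optimizer-state figure assumes fp32 weights with an Adam-style optimizer (gradients plus two moment buffers), which is an assumption rather than a measured number. Activation memory is not included.

```python
T5_BASE_PARAMS = 220_000_000  # parameter count quoted in the post
OPTIMUS_BATCH_SIZE = 24       # batch size quoted in the post
OPTIMUS_NUM_V100 = 8          # GPUs used for OPTIMUS, per the post
TPU_V3_8_IN_V100S = 4         # the post's equivalence for a TPUv3-8

# Scale the batch size linearly with the available hardware.
scaled_batch_size = OPTIMUS_BATCH_SIZE * TPU_V3_8_IN_V100S // OPTIMUS_NUM_V100

BYTES_PER_FP32 = 4
weights_gb = T5_BASE_PARAMS * BYTES_PER_FP32 / 1e9

# Assumption: weights + gradients + two Adam moment buffers, all in fp32.
COPIES_PER_PARAM = 4
train_state_gb = weights_gb * COPIES_PER_PARAM

print(f"batch size on TPUv3-8: {scaled_batch_size}")
print(f"weights: {weights_gb:.2f} GB, weights + optimizer state: {train_state_gb:.2f} GB")
```

The weights and optimizer state are a small part of the budget, so the sequence length and batch size (that is, activations) are what will decide whether training fits.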

OPTIMUS (current SOTA VAE) https://arxiv.org/pdf/2004.04092.pdf

Fraser · July 15, 2021, 9:04pm

DEMO

Here’s a demo of the model trained on lines of Python code!

https://huggingface.co/spaces/flax-community/t5-vae

TristanBehrens · August 26, 2021, 10:30am

Hi everyone! This looks so awesome! I am using Deep Neural Networks for composing music. In the past I used MusicVAE (LSTMs), and quite recently GPT2. Your work here would be a nice experimental ground!

Is the slack still active?
