Fraser
June 24, 2021, 7:05am
1
Use the Funnel Transformer + T5 models from the Hugging Face Hub with some subclassing to convert them into a VAE for text.
The current SOTA VAE is OPTIMUS, which still suffers from some posterior collapse (the decoder learning to ignore the latent code).
When trained effectively, the Variational Autoencoder (VAE) (Kingma and Welling, 2013; Bowman et al., 2016) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the...
Its training data is all open source.
# Download/Pre-process Datasets
## Wikipedia
Download the processed files (11.78G) below and unzip them (298 files):
https://textae.blob.core.windows.net/optimus/data/datasets/wikipedia_json_64_filtered.zip
Download raw file (11.79G):
https://textae.blob.core.windows.net/optimus/data/datasets/wikipedia.segmented.nltk.txt
Our pre-processing protocol: We split the original wiki text into 298 files, and loop over files in one epoch.
We filter each sentence in wiki based on two constraints: (1) The sentence length is smaller than 64. (2) The tokenized sentence length is smaller than 256 (so that the encoder can take the entire sentence).
To filter the sentences, please change the data folders and run the script:
sh scripts/scripts_local/run_data_filtering_wiki.sh
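For reference, those two filters boil down to something like the following (a rough sketch, not the actual OPTIMUS code: the `keep_sentence` name, the whitespace-based word count, and the BERT tokenizer standing in for the encoder's tokenizer are all my assumptions):

```python
from transformers import AutoTokenizer

# Assumption: a BERT tokenizer stands in for the OPTIMUS encoder's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def keep_sentence(sentence: str) -> bool:
    # (1) sentence shorter than 64 words, and
    # (2) shorter than 256 subword tokens, so the encoder can take the entire sentence.
    return len(sentence.split()) < 64 and len(tokenizer.encode(sentence)) < 256
```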
From my experiments, an MMD-VAE doesn't suffer from as much posterior collapse in smaller-scale models, so why not try to scale up?
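For anyone unfamiliar: the MMD-VAE (a.k.a. InfoVAE) swaps the KL term for a maximum mean discrepancy penalty between posterior samples and the prior. A minimal sketch of that penalty, assuming a standard-normal prior (the RBF kernel choice and bandwidth are illustrative):

```python
import torch

def rbf_kernel(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Pairwise RBF kernel between two batches of latent vectors.
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd_loss(posterior_samples: torch.Tensor) -> torch.Tensor:
    # Penalise the distance between the aggregate posterior and a N(0, I) prior.
    prior_samples = torch.randn_like(posterior_samples)
    return (
        rbf_kernel(prior_samples, prior_samples).mean()
        + rbf_kernel(posterior_samples, posterior_samples).mean()
        - 2 * rbf_kernel(posterior_samples, prior_samples).mean()
    )
```

Since MMD only matches the aggregate posterior to the prior (rather than every per-example posterior, as the KL term does), individual latent codes are freer to stay informative, which is the usual explanation for why it collapses less.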
I’ve actually already got the code working to turn them into a PyTorch MMD-VAE, so why not just convert it to JAX/Flax?
"""
Base transformer-VAE model.
"""
import torch
from torch import nn
from typing import Dict, Any
from transformers.utils import logging
from transformers.modeling_utils import PreTrainedModel
from transformers.modeling_outputs import BaseModelOutput
from transformers.models.funnel.modeling_funnel import upsample
from transformers import AutoModelForSeq2SeqLM, AutoModelForMaskedLM
from transformer_vae.custom_t5 import modify_t5_stack
from transformer_vae.autoencoders import VAE_ENCODER_MODELS, VAE_DECODER_MODELS, EncoderDecoderVAE
from transformer_vae.critic import CRITIC
from transformer_vae.model_outputs import BaseTransformerVAE_Output
from transformer_vae.config import Funnel_T5_VAE_Config
logger = logging.get_logger(__name__)
Fraser
June 24, 2021, 7:06am
2
Maybe this could form the basis for a new kind of search engine?
Fraser
June 24, 2021, 7:37pm
5
Are you in the Slack? Would be good to make a group for this.
Would love to contribute to this. I have worked on VAEs before (here is a paper I wrote to handle the posterior collapse issue). Pinged you, Fraser, on Slack. Would love to chat when you are available.
@Fraser very interesting idea. I have good experience with T5 and have also worked on VAEs. I would love to be part of this project.
Fraser
June 25, 2021, 7:06am
9
I’ve started a Slack so we can make plans and form a proper team.
Would be good to organise a group call when you’re free?
https://join.slack.com/t/transformer-vae/shared_invite/zt-s5yv7h9h-~d3m7UJlfVPu5tlj8iUvvQ
Fraser
June 25, 2021, 7:06am
10
Hey Mina, I’ve started a Slack so we can make plans and form a proper team.
Would be good to organise a call when you’re free?
https://join.slack.com/t/transformer-vae/shared_invite/zt-s5yv7h9h-~d3m7UJlfVPu5tlj8iUvvQ
gigant
June 25, 2021, 9:04am
11
Hi, I would love to contribute to this work! I am interested in VAEs & transformers and have already worked with both (never at the same time, though).
Hello @Fraser & Team,
I am interested in being part of such an amazing project & team, and I will try my best to contribute to this VAE project. It would be nice if we could discuss some more learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team. I also read your awesome article, ‘Interpolating the internet’, and found it very interesting!
Fraser
June 28, 2021, 5:49pm
14
I’ve made a clearer project post here; be sure to give it a like if you’re interested in helping out!
Transformer-VAE
Convert a T5 model into a variational autoencoder for text.
I have already made a project that does this in PyTorch, but it’s never been trained at scale.
This project is to convert the autoencoder into Flax so it can be trained efficiently on a TPU to train the largest ever Transformer-VAE!
Language
The model will be trained in English.
Model
Built on T5-base, this will match the Optimus model.
Only additional parameters come from a small Autoencoder module that will …
Think this is a really cool project - let’s define it officially! Sadly we are limited to using TPUs - if it’s too complicated to turn the code into JAX, maybe options with PyTorch/XLA could be explored as well…
But cool idea, let’s define it!
Fraser
June 29, 2021, 6:28pm
16
Hi Patrick, thanks for the feedback!
I’ve linked a revised plan below.
The idea here is just to take an existing Flax T5 model and stick an autoencoder between the encoder & decoder, as seen here.
I’ve currently had calls with 3 other team members and we’re really excited to see what this produces!
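For a concrete picture, this is roughly what sticking an autoencoder between T5’s encoder & decoder looks like in PyTorch (a minimal sketch, not the project code: the mean-pooling, the latent size of 32, and the `Bottleneck` class are all illustrative choices):

```python
import torch
from torch import nn
from transformers import T5ForConditionalGeneration, T5TokenizerFast
from transformers.modeling_outputs import BaseModelOutput

class Bottleneck(nn.Module):
    """Compress encoder states into a single latent code, then expand back."""
    def __init__(self, d_model: int, latent_dim: int):
        super().__init__()
        self.compress = nn.Linear(d_model, latent_dim)
        self.expand = nn.Linear(latent_dim, d_model)

    def forward(self, encoder_hidden: torch.Tensor):
        pooled = encoder_hidden.mean(dim=1)          # (batch, d_model) - crude pooling
        latent = self.compress(pooled)               # (batch, latent_dim)
        expanded = self.expand(latent).unsqueeze(1)  # (batch, 1, d_model)
        return latent, expanded

t5 = T5ForConditionalGeneration.from_pretrained("t5-base")
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
bottleneck = Bottleneck(t5.config.d_model, latent_dim=32)

batch = tokenizer(["hello world"], return_tensors="pt")
encoder_hidden = t5.encoder(input_ids=batch.input_ids).last_hidden_state
latent, expanded = bottleneck(encoder_hidden)

# The decoder now only sees the expanded latent, not the full encoder output.
out = t5(encoder_outputs=BaseModelOutput(last_hidden_state=expanded),
         labels=batch.input_ids)
print(out.loss)
```

Making the decoder attend only to the expanded latent forces all information through the bottleneck; adding the MMD penalty on `latent` turns this plain autoencoder into the MMD-VAE discussed above.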
Transformer-VAE
Convert a T5 model into a variational autoencoder for text.
I have already made a project that does this in PyTorch, but it’s never been trained at scale.
This project is to convert the autoencoder into Flax so it can be trained efficiently on a TPU to train the largest ever Transformer-VAE!
Language
The model will be trained in English.
Model
Built on T5-base, this will match the Optimus model.
Only additional parameters come from a small Autoencoder module that will …
Fraser
June 29, 2021, 6:50pm
17
Forgot to include memory requirements…
Preferably this will train on a dataset of input & output sequences of length 256, at batch size 24, with a T5-base model.
T5-base has 220M parameters, while OPTIMUS has 227M and was trained with the settings above using 8 V100 GPUs.
A TPUv3-8 is roughly equivalent to 4 V100 GPUs, so it should be able to train with at least batch size 12, or with shorter sequences.
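Spelling out that estimate (the 4-V100 equivalence is the rough assumption here):

```python
# OPTIMUS: batch size 24 across 8 V100s; TPUv3-8 assumed equivalent to ~4 V100s.
optimus_batch, optimus_v100s = 24, 8
tpu_v100_equivalent = 4
tpu_batch = optimus_batch * tpu_v100_equivalent // optimus_v100s
print(tpu_batch)  # 12 - hence "at least batch size 12" at the same sequence length
```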
OPTIMUS (current SOTA VAE) https://arxiv.org/pdf/2004.04092.pdf
Fraser
July 15, 2021, 9:04pm
18
DEMO
Here’s a demo of the model trained on lines of Python code!
https://huggingface.co/spaces/flax-community/t5-vae
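If you want to poke at what such a demo does under the hood, the core trick is interpolating between two latent codes and decoding each point. A sketch below; the random latents are placeholders, and `decode` stands in for the real trained model:

```python
import torch

def interpolate(z_a: torch.Tensor, z_b: torch.Tensor, steps: int = 5):
    # Evenly spaced points on the straight line between two latent codes.
    return [(1 - t) * z_a + t * z_b for t in torch.linspace(0.0, 1.0, steps)]

# Placeholder latents; in the demo these would come from encoding two lines of Python.
z_a, z_b = torch.randn(32), torch.randn(32)
for z in interpolate(z_a, z_b):
    print(z[:4])  # with the real model, decode(z) would print a line of code here
```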
Hi everyone! This looks so awesome! I am using deep neural networks for composing music. In the past I used MusicVAE (LSTMs), and quite recently GPT-2. Your work here would be a nice experimental ground!
Is the Slack still active?