The issue is that I’m using Colab to run Textual Inversion, and there are only so many training steps one can run before being disconnected. Because of that, I’m wondering if there is a way to continue training from the last .bin file that I generated. I would be thankful if someone could share some example code for this.
Yes, you’ll need to load the text embeddings from your trained concept, similar to what’s done in the sd inference notebook:
Download the `learned_embeds.bin` file from the Hugging Face Hub:
# Colab cell: download a pre-trained Textual Inversion concept (a
# `learned_embeds.bin` plus its `token_identifier.txt`) either from the
# Hugging Face Hub or from a user-supplied URL/path, and record the
# placeholder token string for later use.
import os
import shutil
import urllib.request

from IPython.display import Markdown, display  # `display` was used below but never imported
from huggingface_hub import hf_hub_download

#@title Load pre-existing SD concept
#@markdown Enter the `repo_id` for a concept you like (you can find pre-learned concepts in the public [SD Concepts Library](https://huggingface.co/sd-concepts-library))
repo_id_embeds = "sd-concepts-library/" #@param {type:"string"}

#@markdown (Optional) in case you have a `learned_embeds.bin` file and not a `repo_id`, add the path to `learned_embeds.bin` to the `embeds_url` variable
embeds_url = ""  # URL or local path to a learned_embeds.bin file, in case you have one
placeholder_token_string = ""  # token string, in case you are uploading your own embed

downloaded_embedding_folder = "./downloaded_embedding"
if not os.path.exists(downloaded_embedding_folder):
    os.mkdir(downloaded_embedding_folder)

if not embeds_url:
    # No explicit URL: fetch both concept files from the Hub repo.
    embeds_path = hf_hub_download(repo_id=repo_id_embeds, filename="learned_embeds.bin")
    token_path = hf_hub_download(repo_id=repo_id_embeds, filename="token_identifier.txt")
    # Plain-Python replacements for the original `!cp` shell magics.
    shutil.copy(embeds_path, downloaded_embedding_folder)
    shutil.copy(token_path, downloaded_embedding_folder)
    with open(f'{downloaded_embedding_folder}/token_identifier.txt', 'r') as file:
        placeholder_token_string = file.read()
else:
    # A direct URL (or local path) was supplied instead of a repo_id.
    # Plain-Python replacement for the original `!wget` shell magic.
    dest = f"{downloaded_embedding_folder}/learned_embeds.bin"
    if embeds_url.startswith(("http://", "https://")):
        urllib.request.urlretrieve(embeds_url, dest)
    else:
        shutil.copy(embeds_url, dest)

# Either branch leaves the embedding at this canonical location.
learned_embeds_path = f"{downloaded_embedding_folder}/learned_embeds.bin"

display(Markdown("## The placeholder token for your concept is `%s`"%(placeholder_token_string)))
Then load the embeddings into your `text_encoder`:
def load_learned_embed_in_clip(learned_embeds_path, text_encoder, tokenizer, token=None):
    """Load a Textual Inversion embedding into a CLIP text encoder + tokenizer.

    The ``.bin`` file is expected to be a dict mapping one placeholder token
    string to its trained embedding tensor (the format produced by the
    diffusers textual-inversion training script).

    Args:
        learned_embeds_path: path to the ``learned_embeds.bin`` file.
        text_encoder: model exposing ``get_input_embeddings()`` and
            ``resize_token_embeddings()`` (e.g. ``CLIPTextModel``).
        tokenizer: tokenizer exposing ``add_tokens`` / ``convert_tokens_to_ids``.
        token: optional override for the placeholder token string; defaults
            to the token stored in the file.

    Raises:
        ValueError: if ``token`` already exists in the tokenizer's vocabulary.
    """
    # NOTE(security): torch.load unpickles arbitrary code — only load
    # embedding files from sources you trust.
    loaded_learned_embeds = torch.load(learned_embeds_path, map_location="cpu")

    # Separate the token string and its embedding (single-entry dict).
    trained_token = list(loaded_learned_embeds.keys())[0]
    embeds = loaded_learned_embeds[trained_token]

    # Cast to the text encoder's dtype. BUG FIX: Tensor.to() returns a NEW
    # tensor rather than casting in place, so the result must be rebound —
    # the original `embeds.to(dtype)` silently discarded the cast.
    dtype = text_encoder.get_input_embeddings().weight.dtype
    embeds = embeds.to(dtype)

    # Register the placeholder token with the tokenizer.
    token = token if token is not None else trained_token
    num_added_tokens = tokenizer.add_tokens(token)
    if num_added_tokens == 0:
        raise ValueError(f"The tokenizer already contains the token {token}. Please pass a different `token` that is not already in the tokenizer.")

    # Grow the embedding matrix to cover the new vocabulary size, then
    # write the trained vector into the new token's row.
    text_encoder.resize_token_embeddings(len(tokenizer))
    token_id = tokenizer.convert_tokens_to_ids(token)
    text_encoder.get_input_embeddings().weight.data[token_id] = embeds
# Inject the downloaded concept into the pipeline's text encoder/tokenizer;
# `token` is omitted, so the placeholder stored inside the .bin file is used.
load_learned_embed_in_clip(learned_embeds_path, text_encoder, tokenizer)