Sampling: what's the secret sauce?

Just a practical question: np.random.choice is very slow to return a sample when sampling from a large distribution, say a 52K-token vocabulary.
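
For concreteness, this is roughly the pattern I mean (the vocabulary size and the dummy distribution are just placeholders):

```python
import numpy as np

vocab_size = 52_000

# Dummy next-token distribution over the vocabulary (placeholder for model output)
probs = np.random.dirichlet(np.ones(vocab_size))

# Drawing a single token id like this, once per generated token, is the slow part
token_id = np.random.choice(vocab_size, p=probs)
```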

How do HuggingFace’s implementations of sampling methods actually sample? Nucleus sampling makes sense to me because we only sample from a truncated, high-probability subset of the distribution, but for beam search, which tries to find the maximum-likelihood sequence over several candidates, mightn’t we need to sample from close to the full distribution at any given step? How is this done in practice?

Hey @chrisdoyle :wave:

We use PyTorch/TensorFlow/JAX sampling operations, which are optimized for GPU usage. See here, for example: transformers/generation_utils.py at e54a1b49aa6268c484625c6374f952f318914743 · huggingface/transformers · GitHub
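
For illustration, the multinomial sampling step boils down to something like this in PyTorch (a simplified sketch, not the exact transformers code; the random logits just stand in for a model forward pass):

```python
import torch

def sample_next_token(logits: torch.Tensor) -> torch.Tensor:
    """Draw one token id per batch row from the softmax of the last-step logits."""
    probs = torch.softmax(logits, dim=-1)
    # torch.multinomial runs on whatever device the tensor lives on (CPU or GPU)
    return torch.multinomial(probs, num_samples=1)  # shape: (batch_size, 1)

logits = torch.randn(2, 52_000)  # stand-in for model output of shape (batch, vocab)
next_tokens = sample_next_token(logits)
```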

Beam search does no sampling – it takes the beams/tokens with the highest score at each iteration, which is deterministic after running the model forward pass :slight_smile:
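
To make that concrete, here is a heavily simplified sketch of the selection step (the random log-probs stand in for a model forward pass; the real beam search code also handles EOS, length penalties, etc.):

```python
import torch

num_beams, vocab_size = 4, 52_000

# Running log-probabilities of the current beams
beam_scores = torch.zeros(num_beams)

# Next-token log-probs for each beam, standing in for the model forward pass
next_token_logprobs = torch.log_softmax(torch.randn(num_beams, vocab_size), dim=-1)

# Score of every possible (beam, token) continuation, flattened
candidate_scores = (beam_scores[:, None] + next_token_logprobs).view(-1)

# Deterministically keep the highest-scoring continuations: no sampling step here
top_scores, top_indices = torch.topk(candidate_scores, k=num_beams)
beam_indices = top_indices // vocab_size   # which beam each winner extends
token_indices = top_indices % vocab_size   # which token it appends
```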

Thanks @joaogante, I think I’ll find the answer I’m looking for inside the torch.multinomial implementation!