How to decode CSM tokens into audio tensors for streaming

HHolzhauer · June 23, 2025, 7:31am

Using the new ‘sesame/csm-1b’ model and the CsmForConditionalGeneration class, I am attempting to stream the audio generation to minimize latency. I have successfully set up the ‘Optional[“BaseStreamer”]’ interface, which receives tokens as they are generated, but I am at a loss as to how to decode the tokens into audio tensors so that I can stream them onward.

I tried to work this out from the source code, but I was unable to find a solution.
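A minimal sketch of the buffering side of this problem follows, assuming batch size 1 and that each put() call after the prompt delivers one generation step (one code per codebook). ChunkedAudioStreamer, decode_fn and on_audio are hypothetical names. The class only duck-types the put()/end() methods that transformers' BaseStreamer defines. It skips the first put() on the assumption that generation sends the prompt first, as it does for the built-in TextStreamer (whose skip_prompt option is mirrored here). The exact frame layout CSM hands to the streamer, and how to call its audio codec on a list of frames, are not confirmed here and should be checked against the installed transformers source.

```python
from typing import Callable, List


class ChunkedAudioStreamer:
    """Hypothetical streamer exposing the put()/end() methods of transformers' BaseStreamer.

    Assumptions (not confirmed against CSM): batch size 1, each put() after the
    prompt carries one generation step with one code per codebook, and decode_fn
    turns a list of such frames into audio samples.
    """

    def __init__(
        self,
        decode_fn: Callable[[List[List[int]]], List[float]],
        on_audio: Callable[[List[float]], None],
        frames_per_chunk: int = 4,
        skip_prompt: bool = True,
    ) -> None:
        if frames_per_chunk < 1:
            raise ValueError("frames_per_chunk must be at least 1")
        self.decode_fn = decode_fn
        self.on_audio = on_audio
        self.frames_per_chunk = frames_per_chunk
        self._skip_next = skip_prompt  # assumed: the first put() carries the prompt
        self._buffer: List[List[int]] = []

    def put(self, value) -> None:
        if self._skip_next:
            self._skip_next = False
            return
        # Accept torch/numpy values via .tolist(), or plain Python sequences.
        frame = value.tolist() if hasattr(value, "tolist") else list(value)
        if frame and isinstance(frame[0], list):
            frame = frame[0]  # drop the (assumed) batch dimension of size 1
        self._buffer.append(frame)
        if len(self._buffer) >= self.frames_per_chunk:
            self._flush()

    def end(self) -> None:
        # Decode whatever frames remain once generation finishes.
        if self._buffer:
            self._flush()

    def _flush(self) -> None:
        # Decode the buffered frames as one chunk and hand the samples onward.
        samples = self.decode_fn(self._buffer)
        self._buffer = []
        self.on_audio(samples)
```

A smaller frames_per_chunk lowers latency but calls the decoder more often, and decoding chunks independently may leave audible seams at chunk boundaries, so the chunk size is worth tuning against the actual codec.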

John6666 · June 23, 2025, 8:16am

I found this.

Or with this function?

github.com/huggingface/transformers, src/transformers/models/csm/processing_csm.py (branch: main)

                padding_right = extra_padding
            else:
                padding_left = padding_left
                padding_right = padding_right + extra_padding

            cur_length = cur_length + padding_left + padding_right
            cur_length = (cur_length - dilation * (kernel_size - 1) - 1) // stride + 1

        return cur_length

    def save_audio(
        self,
        audio: AudioInput,
        saving_path: Union[str, Path, list[Union[str, Path]]],
        **kwargs: Unpack[CsmProcessorKwargs],
    ):
        # TODO: @eustlb, this should be in AudioProcessor
        if not is_soundfile_available():
            raise ImportError("Please install `soundfile` to save audio files.")

        # ensure correct audio input
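The length update inside the quoted loop is the standard output-length formula for a 1-D convolution (the same one given in the PyTorch Conv1d documentation), with the padding split into left and right parts. The standalone sketch below reproduces it; conv1d_output_length and stacked_output_length are hypothetical helper names, not transformers functions, and the layer parameters in the example are illustrative rather than the real codec configuration.

```python
from typing import Iterable, Tuple


def conv1d_output_length(
    length: int,
    kernel_size: int,
    stride: int = 1,
    dilation: int = 1,
    padding_left: int = 0,
    padding_right: int = 0,
) -> int:
    """Number of output steps of a 1-D convolution over `length` input steps."""
    padded = length + padding_left + padding_right
    # Same update as the quoted line: (L - d*(k-1) - 1) // s + 1
    return (padded - dilation * (kernel_size - 1) - 1) // stride + 1


def stacked_output_length(length: int, layers: Iterable[Tuple[int, int, int]]) -> int:
    """Apply the formula layer by layer for unpadded (kernel_size, stride, dilation) triples."""
    for kernel_size, stride, dilation in layers:
        length = conv1d_output_length(length, kernel_size, stride, dilation)
    return length
```

For example, 100 input steps through a kernel-3 layer and then a kernel-4, stride-2 layer give 98 and then 48 output steps.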
