How to get sp_model variable from T5Tokenizer?

Is there a way to get the sp_model attribute from T5Tokenizer? The related code is at https://github.com/huggingface/transformers/blob/2e35bac4e73558d334ea5bbf96a1116f7c0d7fb3/src/transformers/models/t5/tokenization_t5.py#L155
When the tokenizer is instantiated with, e.g., "t5-base", the from_pretrained function returns only the fast version, which does not have an sp_model attribute.

My aim is to instantiate text.SentencepieceTokenizer from sp_model. If there is a way to do that, it would make it easy to convert the T5 tokenizer to TensorFlow.

Or if there is an alternative way, I'll be glad to learn :)

Thanks all in advance,

I solved it this way:

from transformers import AutoTokenizer
import sentencepiece as spm
from tensorflow_text import SentencepieceTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")

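# The fast tokenizer has no sp_model, but it still exposes the path to the
# SentencePiece model file, so load that file into a fresh processor.
sp_model = spm.SentencePieceProcessor()
sp_model.Load(tokenizer.vocab_file)

# Serialized model proto (bytes), which is what tensorflow_text expects.
sp_proto = sp_model.serialized_model_proto()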

tf_sp = SentencepieceTokenizer(
    model=sp_proto,
    alpha=0.1,
    nbest_size=0,
    add_bos=False,
    add_eos=True,
    reverse=False
)
# Now you can tokenize or detokenize with tf_sp.
input_ids = tf_sp.tokenize("some text")
# detokenize returns "some text" as a string tensor
text = tf_sp.detokenize(input_ids)