Where does .tokens() come from/inherit from in hugging face

abubakarilyas624 · April 15, 2024, 8:36am

I have been reading NLP with Transformers, where I saw the method .tokens(). Where is .tokens() actually defined in the library? I want to know how to navigate the Hugging Face libraries and use their functions on my own.
text = "Jack Sparrow loves New York!"
bert_tokens = bert_tokenizer(text).tokens()

dblakely · April 20, 2024, 6:50pm

Here’s a “teach a man to fish” answer that I’m not intending to be snarky but it might come across that way:

1. Go to the Hugging Face transformers GitHub repository.
2. In the search bar at the top right, enter .tokens() or def tokens. (Since what you’re asking about is written .tokens(), it’s a function/method, so it’s defined somewhere in the code as def tokens(...).)
3. Then read the code and learn.

In this case we can see that bert_tokenizer(text) returns a BatchEncoding object, and tokens is a method of the BatchEncoding class. So it’s defined in that class, in src/transformers/tokenization_utils_base.py.
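The same lookup can also be done from Python itself. Below is a minimal sketch, not part of the library: `find_definition` and `demo` are hypothetical helper names made up for illustration, and the `bert-base-cased` checkpoint is an assumption, since the question does not say which tokenizer was loaded. The helper walks the object's class hierarchy and reports which class defines a method and in which file.

```python
import inspect


def find_definition(obj, method_name):
    """Return (class name, source file, first line) for a method of obj.

    Walks the method resolution order of obj's class and stops at the
    first class whose own namespace contains method_name. Intended for
    plain Python methods; built-in (C) methods have no source file.
    """
    for cls in type(obj).__mro__:
        if method_name in vars(cls):
            member = vars(cls)[method_name]
            source_file = inspect.getsourcefile(member)
            first_line = inspect.getsourcelines(member)[1]
            return cls.__name__, source_file, first_line
    raise AttributeError(f"{type(obj).__name__} has no attribute {method_name!r}")


def demo():
    # Requires `pip install transformers` and network access to fetch
    # the tokenizer files. The checkpoint name is an assumption.
    from transformers import AutoTokenizer

    bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    encoding = bert_tokenizer("Jack Sparrow loves New York!")

    # The type of the returned object tells you which class to search for.
    print(type(encoding).__name__)
    # Where that class (or one of its parents) defines .tokens().
    print(find_definition(encoding, "tokens"))
```

Calling `demo()` should print the class name of the returned object followed by the class, file path, and line number where `tokens` is defined in your installed copy of transformers. The path points at your local install, so you can open it directly in an editor.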

Another thing: if you use an IDE like VS Code, you can right-click something like “.tokens()” in your own code and then click “Go to Definition”, and it’ll take you to where it is defined in the Hugging Face code.

Either way, the best way to learn a library is to use it a lot and read the code when you don’t understand something.

abubakarilyas624 · April 21, 2024, 8:16am

Thank you so much @dblakely. I had gone through so much of their documentation but could not find it anywhere. This is one of the great things I have learned today. Thanks for helping me out and teaching me how to catch a fish.

