[source code] Why do many (not all) transformer classes have their docstring starting with r"""?

Why do many (not all) transformer classes have their docstring starting with r"""?

grep -r -A 2 'class ' src/transformers 

Here is just a sample of both:

src/transformers/modeling_tf_xlnet.py:# class TFXLNetForQuestionAnswering(TFXLNetPreTrainedModel):
src/transformers/modeling_tf_xlnet.py-#     r"""
src/transformers/tokenization_distilbert.py:class DistilBertTokenizer(BertTokenizer):
src/transformers/tokenization_distilbert.py-    r"""
src/transformers/tokenization_gpt2.py:class GPT2TokenizerFast(PreTrainedTokenizerFast):
src/transformers/tokenization_gpt2.py-    """
src/transformers/tokenization_auto.py:class AutoTokenizer:
src/transformers/tokenization_auto.py-    r""":class:`~transformers.AutoTokenizer` is a generic tokenizer class

Is there a special purpose for using a regex string? There doesn’t seem to be any pattern: it appears in some classes’ docstrings but not in others.

Thank you.

It’s r for raw, not regex. I’m not sure exactly why they’re used; it may be to make sure some Sphinx markup doesn’t get interpreted weirdly. I know I added r in the new docstrings I created.
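
As a small illustration of what the raw prefix changes in a docstring, here is a minimal sketch. The two functions and the `:math:` snippet are made up for this example and are not taken from transformers:

```python
def plain_doc():
    """Computes :math:`\theta` times x."""


def raw_doc():
    r"""Computes :math:`\theta` times x."""


# Without the r prefix, Python reads "\t" as a tab character,
# so the backslash never reaches the docstring.
print("\\" in plain_doc.__doc__)  # False

# With the r prefix, the backslash is kept exactly as written,
# which is what a documentation tool would need to see.
print("\\" in raw_doc.__doc__)  # True
```

The two docstrings look identical in the source, but only the raw one still contains `\theta` at runtime.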


Thank you for the explanation, @sgugger!

So, when creating new classes you’re suggesting to use r""", correct?

In my experience, r is most often used as a prefix on regex patterns; that’s what I meant by a “regex string”, so I assumed it was a regex thing ;) I understand it could also be needed elsewhere, whenever backslashes are to be treated literally, so “raw” does make more sense. Thank you for this, @sgugger!
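
For anyone who made the same association, a quick sketch of why r shows up so often next to regexes even though it is a plain Python string-literal feature. The sample text is invented for the example:

```python
import re

# The r prefix only changes how the literal is parsed:
# a raw "\d+" is the same string as a normal "\\d+".
assert r"\d+" == "\\d+"

# Regex patterns are full of backslashes, so raw literals are
# a common convenience there, not a requirement of the re module.
pattern = re.compile(r"\d+")
print(pattern.findall("layers=12, heads=8"))  # ['12', '8']

# Outside of regexes the effect is the same: r"\n" is two
# characters (backslash + n), while "\n" is a single newline.
print(len(r"\n"), len("\n"))  # 2 1
```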

Yes, I think r""" is a good default to keep in mind.
