Chapter 2 questions

You don’t need any secret Hugging Face magic.
That “nice” output is just:

  1. A plain Python dict / BatchEncoding being printed.
  2. PyTorch’s built-in tensor repr.
  3. A notebook / docs theme giving it a light background.

You can reproduce it in your own Jupyter / VS Code setup, and you can go further with rich if you want a boxed “card” look.


1. What the Hugging Face screenshot actually is

On the course page the code is (Hugging Face):

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)

and the output text is literally:

{
    'input_ids': tensor([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172, 2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}

No special pretty-printing library is involved. The background and spacing come from the Hugging Face docs CSS, not from Python. (Hugging Face)

So: if you print the same object in a notebook, you should already be very close.


2. Minimal way to match it in Jupyter / VS Code

In a notebook cell (JupyterLab or VS Code’s Jupyter):

from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]

inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

# Option A: let Jupyter display the last expression
inputs

# Option B: explicit print(), same text output
# print(inputs)

Jupyter will render that in a monospace block with a background that depends on your theme. That's essentially what you see on the Hugging Face page.

If your tensors are printing on one too-long line, adjust PyTorch’s print options:

import torch

torch.set_printoptions(
    linewidth=80,   # characters per line before wrapping
    edgeitems=16,   # how many elements shown at each edge
    profile="default",
)

linewidth is the key knob: it controls where PyTorch inserts line breaks in the tensor repr. (docs.pytorch.org)


3. Getting a boxed “card” with Rich (works in VS Code + Jupyter)

If you want something nicer than the HF screenshot (clear box, consistent indentation), use rich.

3.1 Install Rich (with Jupyter extras)

In the environment backing your notebook kernel:

pip install "rich[jupyter]"

Rich is designed for pretty printing and renders well both in terminals and in notebooks. (rich.readthedocs.io)

3.2 Simple pretty print

from rich.pretty import pprint

pprint(inputs)

That already improves indentation and adds color, but you can push it further.
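If you can't (or don't want to) install Rich, the standard library's pprint module gives a similar, uncolored effect; its width parameter plays roughly the role of the console width:

```python
from pprint import pformat

# Same shape of data as the tokenizer output, as plain lists.
data = {
    "input_ids": [[101, 1045, 5223, 2023, 2061, 2172, 999, 102]],
    "attention_mask": [[1, 1, 1, 1, 1, 1, 1, 1]],
}

# Anything whose repr exceeds `width` characters gets wrapped
# onto indented lines, one nested structure per line.
print(pformat(data, width=40))
```

Note that pprint sorts dict keys by default; pass sort_dicts=False (Python 3.8+) to keep insertion order.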

3.3 Boxed, backgrounded output like a “card”

from rich import print
from rich.pretty import Pretty
from rich.panel import Panel

def show_batch(enc, title="BatchEncoding"):
    # Convert the BatchEncoding to a plain dict so Rich
    # pretty-prints it like any other mapping.
    obj = dict(enc)
    pretty = Pretty(
        obj,
        indent_size=2,
        indent_guides=True,
        max_length=None,
        max_depth=None,
        expand_all=True,
    )
    print(Panel(pretty, title=title))

show_batch(inputs, title="Tokenized inputs")

Key points:

  • Pretty(...) does the smart multi-line formatting. (rich.readthedocs.io)
  • Panel(...) draws the rectangle with a separated background. (Stack Overflow)
  • VS Code’s Jupyter and JupyterLab both render Rich’s ANSI / HTML output, so you get a visually distinct block instead of raw text.

If Rich behaves oddly in VS Code (extra cells, spacing issues), that's a known interaction; recent issues document workarounds. (GitHub)
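One defensive pattern (my own sketch, not part of Rich) is a helper that uses Rich when it is installed and falls back to the standard library otherwise, so notebooks stay portable:

```python
def show_safe(obj, title=None):
    """Pretty-print obj in a Rich panel if Rich is available,
    otherwise fall back to the standard-library pprint."""
    try:
        from rich import print as rprint
        from rich.panel import Panel
        from rich.pretty import Pretty

        rprint(Panel(Pretty(obj, indent_guides=True, expand_all=True),
                     title=title))
    except ImportError:
        # No Rich in this environment: plain but still readable.
        from pprint import pprint
        if title:
            print(f"--- {title} ---")
        pprint(obj)

show_safe({"input_ids": [[101, 102]]}, title="demo")
```

The helper name and fallback formatting are arbitrary; the point is that nothing breaks when the kernel's environment lacks Rich.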


4. If your current rich.pretty.pprint output looks cramped

Typical causes:

  1. The tensor repr is wide
    If PyTorch's linewidth is set high, each tensor row stays on one long line. Fix it with torch.set_printoptions(linewidth=80) before calling pprint. (docs.pytorch.org)

  2. You’re printing the dict directly with print()
    Use Rich’s Pretty or pprint instead of plain print so nested structures are wrapped and indented.

  3. Too much data at once
    Printing entire batches with long sequences will always look messy. Slice first:

    subset = {
        "input_ids": inputs["input_ids"][:2, :16],
        "attention_mask": inputs["attention_mask"][:2, :16],
    }
    show_batch(subset, title="First 2 examples, 16 tokens")
    
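The same slicing idea works without tensors; here is a dependency-free illustration on plain nested lists (the data is made up for the demo):

```python
# Fake batch: 4 sequences of 32 token ids each, as plain lists.
batch = {
    "input_ids": [[101] + list(range(2000, 2030)) + [102]] * 4,
    "attention_mask": [[1] * 32] * 4,
}

# Keep only the first 2 rows and the first 8 tokens of each,
# mirroring inputs["input_ids"][:2, :16] on tensors.
subset = {k: [row[:8] for row in v[:2]] for k, v in batch.items()}
print(subset)
```

Slicing before printing keeps the output small enough that any pretty-printer can lay it out cleanly.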

5. Recommended simple setup

For what you asked (context, background, clear and easy-to-read) I’d do:

  1. In every notebook:

    import torch
    torch.set_printoptions(linewidth=80, profile="default")
    
  2. Use a helper:

    from rich import print
    from rich.pretty import Pretty
    from rich.panel import Panel
    
    def show(obj, title=None):
        pretty = Pretty(obj, indent_guides=True, expand_all=True)
        print(Panel(pretty, title=title))
    
  3. When working with tokenizers:

    inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
    show(inputs, title="Tokenized inputs")
    

That gives:

  • A boxed block with a background.
  • Indented, multi-line dict + tensors.
  • Consistent behavior in VS Code and classic Jupyter.

Short summary

  • The Hugging Face screenshot is just print(inputs) from the tokenizer example plus the docs-site CSS. (Hugging Face)
  • To match it: print the BatchEncoding in a notebook and, if needed, lower torch.set_printoptions(linewidth=80) so tensors wrap nicely. (docs.pytorch.org)
  • For even nicer output with a visible background and border in VS Code / Jupyter, wrap Rich's Pretty in a Panel as shown above. (rich.readthedocs.io)