You don’t need any secret Hugging Face magic.
That “nice” output is just:

- A plain Python dict / `BatchEncoding` being printed.
- PyTorch’s built-in tensor repr.
- A notebook / docs theme giving it a light background.

You can reproduce it in your own Jupyter / VS Code setup, and you can go further with Rich if you want a boxed “card” look.
1. What the Hugging Face screenshot actually is
On the course page the code is (Hugging Face):

```python
raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
```
and the output text is literally:
```python
{
    'input_ids': tensor([
        [ 101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
        [ 101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}
```
No special pretty-printing library. The background and spacing come from the Hugging Face docs CSS, not from Python. (Hugging Face)
So: if you print the same object in a notebook, you should already be very close.
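To convince yourself that no pretty-printer is involved, here is a minimal stand-in that needs no transformers install: `BatchEncoding` is a dict subclass, so printing it just uses the ordinary dict repr. The token IDs below are illustrative values, not a real tokenization.

```python
# Stand-in for the BatchEncoding (a dict subclass): printing it
# uses the plain dict repr, which is all the screenshot shows.
inputs = {
    "input_ids": [
        [101, 1045, 1005, 2310, 102],
        [101, 1045, 5223, 102, 0],
    ],
    "attention_mask": [
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 0],
    ],
}
print(inputs)
```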
2. Minimal way to match it in Jupyter / VS Code
In a notebook cell (JupyterLab or VS Code’s Jupyter):
```python
from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")

# Option A: let Jupyter display the last expression
inputs

# Option B: explicit print(), same text output
# print(inputs)
```
Jupyter will render that in a monospace block with a background according to your theme. That’s essentially what you see on the Hugging Face page.
If your tensors are printing on one too-long line, adjust PyTorch’s print options:
```python
import torch

torch.set_printoptions(
    linewidth=80,       # characters per line before wrapping
    edgeitems=16,       # how many elements shown at each edge
    profile="default",
)
```
`linewidth` is the key knob that controls where PyTorch inserts line breaks in the tensor repr. (docs.pytorch.org)
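A quick way to see the effect, as a sketch (exact wrap points depend on your PyTorch version's repr):

```python
import torch

t = torch.arange(24)

torch.set_printoptions(linewidth=200)
print(t)  # short enough to fit on one line

torch.set_printoptions(linewidth=40)
print(t)  # now wraps across several lines
```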
3. Getting a boxed “card” with Rich (works in VS Code + Jupyter)
If you want something nicer than the HF screenshot (clear box, consistent indentation), use rich.
3.1 Install Rich (with Jupyter extras)
In the environment backing your notebook kernel:
```bash
pip install "rich[jupyter]"
```
Rich is designed for pretty printing and renders well in both terminals and notebooks. (rich.readthedocs.io)
3.2 Simple pretty print
```python
from rich.pretty import pprint

pprint(inputs)
```
That already improves indentation and adds color, but you can push it further.
3.3 Boxed, backgrounded output like a “card”
```python
from rich import print
from rich.pretty import Pretty
from rich.panel import Panel

def show_batch(enc, title="BatchEncoding"):
    # Convert the BatchEncoding to a plain dict so Rich formats it directly.
    obj = {k: v for k, v in enc.items()}
    pretty = Pretty(
        obj,
        indent_size=2,
        indent_guides=True,
        max_length=None,
        max_depth=None,
        expand_all=True,
    )
    print(Panel(pretty, title=title))

show_batch(inputs, title="Tokenized inputs")
```
Key points:
- `Pretty(...)` does the smart multi-line formatting. (rich.readthedocs.io)
- `Panel(...)` draws the rectangle with a separated background. (Stack Overflow)
- VS Code’s Jupyter and JupyterLab both render Rich’s ANSI / HTML output, so you get a visually distinct block instead of raw text.
If Rich behaves oddly in VS Code (extra cells, spacing issues), that’s a known interaction; recent issues document workarounds.(GitHub)
4. If your current rich.pretty.pprint output looks cramped
Typical causes:
- Tensor repr is wide: PyTorch’s default `linewidth` might be large, so each row stays on one long line. Fix with `torch.set_printoptions(linewidth=80)` before calling `pprint`. (docs.pytorch.org)
- You’re printing the dict directly with `print()`: use Rich’s `Pretty` or `pprint` instead of plain `print` so nested structures are wrapped and indented.
- Too much data at once: printing entire batches with long sequences will always look messy. Slice first:

  ```python
  subset = {
      "input_ids": inputs["input_ids"][:2, :16],
      "attention_mask": inputs["attention_mask"][:2, :16],
  }
  show_batch(subset, title="First 2 examples, 16 tokens")
  ```
5. Recommended simple setup
For what you asked (context, background, clear and easy-to-read) I’d do:
- In every notebook:

  ```python
  import torch
  torch.set_printoptions(linewidth=80, profile="default")
  ```

- Use a helper:

  ```python
  from rich import print
  from rich.pretty import Pretty
  from rich.panel import Panel

  def show(obj, title=None):
      pretty = Pretty(obj, indent_guides=True, expand_all=True)
      print(Panel(pretty, title=title))
  ```

- When working with tokenizers:

  ```python
  inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
  show(inputs, title="Tokenized inputs")
  ```
That gives:
- A boxed block with a background.
- Indented, multi-line dict + tensors.
- Consistent behavior in VS Code and classic Jupyter.
Short summary
- The Hugging Face screenshot is just `print(inputs)` from the tokenizer example plus their docs CSS. (Hugging Face)
- To match it: print the `BatchEncoding` in a notebook and, if necessary, lower `torch.set_printoptions(linewidth=80)` so tensors wrap nicely. (docs.pytorch.org)
- For even “nicer” output with a visible background and borders in VS Code / Jupyter, use Rich’s `Pretty` inside a `Panel` as shown above. (rich.readthedocs.io)