Chapter 2 questions

John6666 · December 9, 2025, 7:56am

Hmm…?

What you want and what Python actually does are different things.

The Hugging Face snippet is hand-formatted for the docs. It is not the literal result of print(inputs). Your output is what Python + PyTorch really produce, and you can’t get the HF layout by just tweaking torch.set_printoptions.

I’ll go through your numbered points and then give a practical workaround.

1. Why your layout looks “wrong”

You are printing a dict that contains tensors:

inputs = {
    'input_ids': tensor(...),
    'attention_mask': tensor(...),
}
print(inputs)

Two independent formatters are involved:

PyTorch tensor repr
Produces strings like
```
tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]])
```
Newlines and the indentation of the second row come from PyTorch. Controlled by torch.set_printoptions.
Python dict repr
Takes key/value ('input_ids', <that tensor repr string>) and glues them into:
```
"{'input_ids': " + tensor_repr + "}"
```
It does not insert extra newlines or re-indent the tensor; it just concatenates strings. The built-in dict repr has no layout options of its own; the pprint module is the standard alternative if you want some, and PyTorch has no say in either. (Python documentation)

So the real Python output is essentially:

{'input_ids': tensor([[ ... first row ...],
        [ ... second row ...]]), 'attention_mask': tensor([[ ... ]])}

This is exactly what you see in your screenshot. The Hugging Face page shows a manually cleaned version:

{
    'input_ids': tensor([
        [ ... ],
        [ ... ]
    ]),
    'attention_mask': tensor([
        [ ... ],
        [ ... ]
    ])
}

The extra line breaks after tensor([ and before each ]) are simply edited into the Markdown on the site. You will not get them from plain print(inputs).

That explains:

'input_ids' on the first row: that’s dict repr, no padding step.
First list “continuing” on the next row: that’s either
- a newline coming from the tensor repr itself, or
- VS Code’s output word wrap breaking the line visually.
Second list not starting exactly under the first: again, that’s VS Code wrapping a long physical line into multiple visual lines.
Closing ]]) not on its own row, and 'attention_mask' appearing right after: dict repr has no reason to insert a newline between key–value pairs.

There is no PyTorch option that changes how dictionaries align keys or where newlines between items go.
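
The gluing described in this section can be reproduced without PyTorch at all. This is only a sketch: FakeTensor is a made-up stand-in whose repr spans two lines, the way a real tensor repr does.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor with a two-line repr."""

    def __repr__(self):
        return "tensor([[0, 1, 2, 3],\n        [4, 5, 6, 7]])"


inputs = {"input_ids": FakeTensor(), "attention_mask": FakeTensor()}
dict_text = repr(inputs)

# Build the same string by hand: key repr, ": ", value repr, joined by ", ".
t = repr(FakeTensor())
manual = "{'input_ids': " + t + ", 'attention_mask': " + t + "}"

print(dict_text)
print(dict_text == manual)
```

The only newlines in the result are the ones that were already inside each value's repr; the dict adds none of its own.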

2. Why your `torch.set_printoptions(...)` changes did nothing

torch.set_printoptions affects only tensor formatting, not the dict surrounding it. Options like linewidth, edgeitems, profile apply when the tensor builds its own string.

Example:

torch.set_printoptions(linewidth=120)

print(inputs["input_ids"])  # affected
print(inputs)               # tensor part is affected, dict layout is not

The dict is still a single logical line with embedded newlines from the tensor. Dict doesn’t know about linewidth and won’t decide to put 'attention_mask' on a new line.

So your experiments in (2) were correctly applied to the tensors, but they can’t fix the dict layout you dislike.

3. Why `pprint()` and `rich.pretty` still look “off”

pprint(inputs) again uses the dict’s keys and values; for values it just calls repr(value) and uses that as an opaque string when deciding where to break lines. It doesn’t reformat the innards of that tensor string. (Python documentation)
rich.pretty.pprint(inputs) does the same thing, just with color. It does not rewrite tensor reprs or move 'attention_mask' to its own block.

So they can:

put each key–value pair on its own line,
change order (sort_dicts), indentation, etc.,

but they will not give you the exact HF style automatically.
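
A small sketch of what pprint does and does not do here. FakeTensor is again a hypothetical stand-in for a tensor (so this runs without PyTorch), and width=40 is just a value chosen to force the dict onto several lines.

```python
from pprint import pformat


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor: one row per line in its repr."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        return "tensor([" + ",\n        ".join(str(r) for r in self.rows) + "])"


batch = {
    "input_ids": FakeTensor([[101, 1045, 102], [101, 999, 102]]),
    "attention_mask": FakeTensor([[1, 1, 1], [1, 1, 0]]),
}

# sort_dicts=False keeps insertion order (available since Python 3.8).
text = pformat(batch, width=40, sort_dicts=False)
print(text)
```

Each key starts on its own line, but the value's repr is pasted in unchanged, so the second tensor row keeps the indentation it had when the repr was built rather than lining up under the first.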

4. “I actually want to see a long line”

If you want the entire row [101, 1045, ...] on one unbroken line, then:

PyTorch must not insert internal breaks inside that row (controlled by linewidth), and
Your front-end (VS Code / Jupyter) must not soft-wrap the output.

You already tried the first part. For typical 2×16 tensors the PyTorch repr of a single row is shorter than 120 characters, so linewidth=120 is fine.

The second part is VS Code’s setting, not Python:

VS Code has a setting "notebook.output.wordWrap" that controls whether output cells wrap or scroll horizontally. (Stack Overflow)
If this is true, very long physical lines are wrapped visually into multiple lines, which is exactly what you’re seeing.
To get “one long line with a scrollbar”, you want wrapping off (notebook.output.wordWrap: false) or a wider notebook pane.

So: to see truly long lines, you must handle visual wrapping in VS Code; Python cannot override that from inside the notebook.
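
As a sketch, assuming you edit your user or workspace settings.json by hand (the same option can also be toggled from the Settings UI), the entry would look like this:

```json
{
    "notebook.output.wordWrap": false
}
```

With wrapping off, a long physical line should stay on one visual line and the output area scrolls horizontally instead.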

5. Practical workaround: custom “HF-style” printer

If you really want something very close to the Hugging Face layout, you need to format it yourself.

Here is a small helper that:

Prints braces on their own lines.
Puts each key on its own line.
Indents the tensor nicely.
Preserves dict key order.

import torch

def hf_print(batch, indent=4):
    """
    Pretty-print a dict like the HF docs example.

    hf_print(inputs)
    """
    items = list(batch.items())
    sp = " " * indent
    print("{")
    for i, (k, v) in enumerate(items):
        v_str = repr(v)
        # indent tensor's own internal lines one extra level
        v_str = ("\n" + sp * 2).join(v_str.splitlines())
        comma = "," if i < len(items) - 1 else ""
        print(f"{sp}{k!r}: {v_str}{comma}")
    print("}")

Usage:

inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
hf_print(inputs)

This will produce something close to:

{
    'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
                2607,  2026,  2878,  2166,  1012,   102],
                [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
                        0,     0,     0,     0,     0,     0]]),
    'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
}
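
As the sample shows, hf_print pads every continuation line by a fixed amount, so the rows drift out of alignment. If that bothers you, one possible variant pads by the width of the key prefix instead, which keeps the rows lined up under the first one. This is only a sketch: format_batch and FakeTensor are hypothetical names, and FakeTensor is there just so the example runs without PyTorch.

```python
def format_batch(batch, indent=4):
    """Return an HF-docs-like string with continuation lines kept aligned."""
    sp = " " * indent
    items = list(batch.items())
    lines = ["{"]
    for i, (key, value) in enumerate(items):
        prefix = f"{sp}{key!r}: "
        first, *rest = repr(value).splitlines() or [""]
        # Shift continuation lines right by the prefix width so they keep
        # their original position relative to the opening "tensor(".
        body = [prefix + first] + [" " * len(prefix) + line for line in rest]
        if i < len(items) - 1:
            body[-1] += ","
        lines.extend(body)
    lines.append("}")
    return "\n".join(lines)


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor with a two-line repr."""

    def __repr__(self):
        return "tensor([[0, 1],\n        [2, 3]])"


out = format_batch({"input_ids": FakeTensor(), "attention_mask": FakeTensor()})
print(out)
```

Returning a string instead of printing also makes it easier to hand the result to something else, such as a logger or a Rich renderable.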

If you want to combine this with Rich:

from contextlib import redirect_stdout
from io import StringIO

from rich import print as rprint
from rich.panel import Panel
from rich.text import Text

def hf_print_rich(batch, title="BatchEncoding"):
    buf = StringIO()
    # Capture what hf_print writes to stdout. Patching __builtins__["print"]
    # fails in a script or notebook, where __builtins__ is a module, not a dict.
    with redirect_stdout(buf):
        hf_print(batch)
    # Wrapping in Text keeps the square brackets literal instead of
    # letting Rich try to parse them as markup.
    text = Text(buf.getvalue().rstrip("\n"))
    rprint(Panel(text, title=title))

That gets you both a “nice” textual format and a boxed background.

6. What you can and cannot control

Can control with Python / PyTorch:
- How tensors format internally (torch.set_printoptions).
- Whether dict items go on separate logical lines (pprint / custom printer).
- Whether keys are in insertion order (pprint(..., sort_dicts=False)). (Python documentation)
Cannot control from Python:
- VS Code’s choice to wrap output lines instead of horizontal scroll.
- The exact HF-docs layout; that output in the docs is curated Markdown, not the raw console dump.

Condensed summary

Your output is the correct default repr of a dict containing PyTorch tensors.
The Hugging Face page shows a manually formatted version; you won’t get that exact style from plain print(inputs). (Hugging Face)
torch.set_printoptions only affects how the tensors themselves are printed, not the dict keys or where 'attention_mask' starts.
Rows broken in the middle come either from PyTorch’s own linewidth limit (raise it) or from VS Code / notebook word wrap; for the latter, toggle "notebook.output.wordWrap" or widen the output pane to see unbroken lines. (Stack Overflow)
To get HF-style layout, write a small custom printer (like hf_print above), and wrap its output in a Rich Panel if you also want backgrounds.
