Chapter 2 questions

Hmm…?


What you want and what Python actually does are different things.

The Hugging Face snippet is hand-formatted for the docs. It is not the literal result of print(inputs). Your output is what Python + PyTorch really produce, and you can’t get the HF layout by just tweaking torch.set_printoptions.

I’ll go through your numbered points and then give a practical workaround.


1. Why your layout looks “wrong”

You are printing a dict that contains tensors:

inputs = {
    'input_ids': tensor(...),
    'attention_mask': tensor(...),
}
print(inputs)

Two independent formatters are involved:

  1. PyTorch tensor repr
    Produces strings like

    tensor([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7]])
    

    Newlines and the indentation of the second row come from PyTorch. Controlled by torch.set_printoptions.

  2. Python dict repr
    Takes key/value ('input_ids', <that tensor repr string>) and glues them into:

    "{'input_ids': " + tensor_repr + "}"
    

    It does not insert extra newlines or re-indent the tensor; it just concatenates strings. PyTorch has no control over this step; to change the dict layout you need a different printer, such as the pprint module. (Python documentation)

So the real Python output is essentially:

{'input_ids': tensor([[ ... first row ...],
        [ ... second row ...]]), 'attention_mask': tensor([[ ... ]])}

This is exactly what you see in your screenshot. The Hugging Face page shows a manually cleaned version:

{
    'input_ids': tensor([
        [ ... ],
        [ ... ]
    ]),
    'attention_mask': tensor([
        [ ... ],
        [ ... ]
    ])
}

Those extra line breaks after tensor( and after each ]), are simply edited into the Markdown on the site. You will not get them from plain print(inputs).

That explains:

  • 'input_ids' on the first row: that’s dict repr, no padding step.

  • First list “continuing” on the next row: that’s either

    • a newline coming from the tensor repr itself, or
    • VS Code’s output word wrap breaking the line visually.
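  • Second list not starting exactly under the first: the tensor repr indents its continuation rows as if tensor( started at column 0, but dict repr puts {'input_ids':  in front of the first row only, so the first row is shifted right and the later rows are not. VS Code wrapping a long physical line into multiple visual lines can add to the effect.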

  • Closing ]]) not on its own row, and 'attention_mask' appearing right after: dict repr has no reason to insert a newline between key–value pairs.

There is no PyTorch option that changes how dictionaries align keys or where newlines between items go.
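To see the concatenation without installing anything, here is a minimal sketch. FakeTensor is a hypothetical stand-in (not a PyTorch class) whose repr only imitates the multi-line shape of a 2-D tensor repr.

```python
class FakeTensor:
    """Hypothetical stand-in whose repr mimics a 2-D torch.Tensor."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Like PyTorch, indent continuation rows by len("tensor([") == 8,
        # i.e. as if the repr itself started at column 0.
        return "tensor([" + ",\n        ".join(map(repr, self.rows)) + "])"


batch = {
    "input_ids": FakeTensor([[101, 1045], [101, 2023]]),
    "attention_mask": FakeTensor([[1, 1], [1, 1]]),
}

text = repr(batch)
first, second = text.split("\n")[:2]

print(text)
# Column where each row of input_ids starts: the dict prefix
# "{'input_ids': " shifts only the first row to the right.
print(first.index("[101"), second.index("[101"))  # 22 8
```

The second key follows the closing ]]) on the same physical line, and the two rows start at different columns, which matches the layout in your screenshot.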


2. Why your torch.set_printoptions(...) changes did nothing

torch.set_printoptions affects only tensor formatting, not the dict surrounding it. Options like linewidth, edgeitems, profile apply when the tensor builds its own string.

Example:

torch.set_printoptions(linewidth=120)

print(inputs["input_ids"])  # affected
print(inputs)               # tensor part is affected, dict layout is not

The dict is still a single logical line with embedded newlines from the tensor. Dict doesn’t know about linewidth and won’t decide to put 'attention_mask' on a new line.

So your experiments in (2) were correctly applied to the tensors, but they can’t fix the dict layout you dislike.
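If you want to measure this instead of eyeballing it, a small sketch like the following counts the real newlines in each repr. The 2×16 tensor of small integers is an assumption chosen for illustration, not your tokenizer output.

```python
import torch

t = torch.arange(32).reshape(2, 16)  # illustrative 2x16 integer tensor

torch.set_printoptions(linewidth=40)
narrow = repr(t)   # each row is split across several physical lines

torch.set_printoptions(linewidth=120)
wide = repr(t)     # each row now fits on one physical line
combined = repr({"input_ids": t, "attention_mask": torch.ones_like(t)})

torch.set_printoptions(profile="default")  # restore the defaults

print(narrow.count("\n"), wide.count("\n"))
# The dict layout ignores linewidth: the next key still follows the
# closing brackets of the first tensor on the same physical line.
print("), 'attention_mask': tensor(" in combined)
```

The first print shows that linewidth changes the number of line breaks inside the tensor; the second shows that the seam between the two dict items stays the same.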


3. Why pprint() and rich.pretty still look “off”

  • pprint(inputs) again uses the dict’s keys and values; for values it just calls repr(value) and uses that as an opaque string when deciding where to break lines. It doesn’t reformat the innards of that tensor string. (Python documentation)
  • rich.pretty.pprint(inputs) does the same thing, just with color. It does not rewrite tensor reprs or move 'attention_mask' to its own block.

So they can:

  • put each key–value pair on its own line,
  • change order (sort_dicts), indentation, etc.,

but they will not give you the exact HF style automatically.
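Here is a rough sketch of what pprint does and does not change. FakeTensor is again a hypothetical stand-in with a tensor-like multi-line repr, and width=40 is only there to force pprint to break the dict.

```python
from pprint import pformat


class FakeTensor:
    """Hypothetical stand-in with a multi-line repr, like a 2-D tensor."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        return "tensor([" + ",\n        ".join(map(repr, self.rows)) + "])"


batch = {
    "input_ids": FakeTensor([[101, 1045, 102], [101, 2023, 102]]),
    "attention_mask": FakeTensor([[1, 1, 1], [1, 1, 1]]),
}

# width=40 forces pprint to break the dict; sort_dicts=False keeps key order.
out = pformat(batch, width=40, sort_dicts=False)
print(out)
```

Each key now starts on its own line, but every value is still its unmodified repr, so the rows inside it are not re-indented.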


4. “I actually want to see a long line”

If you want the entire row [101, 1045, ...] on one unbroken line, then:

  • PyTorch must not insert internal breaks inside that row (controlled by linewidth), and
  • Your front-end (VS Code / Jupyter) must not soft-wrap the output.

You already tried the first part. For typical 2×16 tensors the PyTorch repr of a single row is shorter than 120 characters, so linewidth=120 is fine.

The second part is VS Code’s setting, not Python:

  • VS Code has a setting "notebook.output.wordWrap" that controls whether output cells wrap or scroll horizontally. (Stack Overflow)
  • If this is true, very long physical lines are wrapped visually into multiple lines, which is exactly what you’re seeing.
  • To get “one long line with a scrollbar”, you want wrapping off (notebook.output.wordWrap: false) or a wider notebook pane.

So: to see truly long lines, you must handle visual wrapping in VS Code; Python cannot override that from inside the notebook.
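One way to tell real newlines from visual wrapping is to count the physical lines in the repr yourself. This is a small sketch; physical_lines is a hypothetical helper name and the sample dict is made up for illustration.

```python
def physical_lines(obj):
    """Return (number of real lines, longest line length) for repr(obj)."""
    lines = repr(obj).split("\n")
    return len(lines), max(len(line) for line in lines)


# Illustrative data: one long row, no newlines in its repr.
sample = {"input_ids": list(range(100, 140))}
count, longest = physical_lines(sample)
print(count, longest)
```

If the count is 1 but the output cell shows several rows, every break you see was added by the front-end, not by Python.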


5. Practical workaround: custom “HF-style” printer

If you really want something very close to the Hugging Face layout, you need to format it yourself.

Here is a small helper that:

  • Prints braces on their own lines.
  • Puts each key on its own line.
  • Indents the tensor nicely.
  • Preserves dict key order.

import torch

def hf_print(batch, indent=4):
    """
    Pretty-print a dict like the HF docs example.

    hf_print(inputs)
    """
    items = list(batch.items())
    sp = " " * indent
    print("{")
    for i, (k, v) in enumerate(items):
        v_str = repr(v)
        # indent tensor's own internal lines one extra level
        v_str = ("\n" + sp * 2).join(v_str.splitlines())
        comma = "," if i < len(items) - 1 else ""
        print(f"{sp}{k!r}: {v_str}{comma}")
    print("}")

Usage:

inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
hf_print(inputs)

This will produce something close to:

{
    'input_ids': tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
                2607,  2026,  2878,  2166,  1012,   102],
                [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,
                        0,     0,     0,     0,     0,     0]]),
    'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])
}

If you want to combine this with Rich:

from contextlib import redirect_stdout
from io import StringIO

from rich import print as rprint
from rich.panel import Panel
from rich.text import Text

def hf_print_rich(batch, title="BatchEncoding"):
    buf = StringIO()
    # capture hf_print's output in a string; this avoids patching
    # __builtins__, which is a module (not a dict) in __main__
    with redirect_stdout(buf):
        hf_print(batch)
    # plain Text so brackets in the repr are not parsed as Rich markup
    text = Text(buf.getvalue().rstrip("\n"))
    rprint(Panel(text, title=title))

That gets you both a “nice” textual format and a boxed background.
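The helper above keeps PyTorch's own row layout. If you want one row per line, as in the docs snippet, a sketch along these lines could work. hf_format is a hypothetical name; it assumes every value is 2-D and either exposes .tolist() (tensors, NumPy arrays) or is already a nested list. It also drops PyTorch's column padding.

```python
def hf_format(batch, indent=4):
    """Return a docs-style string: one key per block, one row per line."""
    sp = " " * indent
    items = list(batch.items())
    out = ["{"]
    for i, (key, value) in enumerate(items):
        # Tensors and NumPy arrays expose .tolist(); plain lists pass through.
        rows = value.tolist() if hasattr(value, "tolist") else value
        out.append(f"{sp}{key!r}: tensor([")
        for j, row in enumerate(rows):
            row_comma = "," if j < len(rows) - 1 else ""
            out.append(f"{sp * 2}{row!r}{row_comma}")
        item_comma = "," if i < len(items) - 1 else ""
        out.append(f"{sp}]){item_comma}")
    out.append("}")
    return "\n".join(out)


# Plain nested lists stand in for tensors here.
demo = {
    "input_ids": [[101, 102], [101, 0]],
    "attention_mask": [[1, 1], [1, 0]],
}
formatted = hf_format(demo)
print(formatted)
```

Returning a string instead of printing makes it easy to pass the result to a Rich Panel or a log message.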


6. What you can and cannot control

  • Can control with Python / PyTorch:

    • How tensors format internally (torch.set_printoptions).
    • Whether dict items go on separate logical lines (pprint / custom printer).
    • Whether keys are in insertion order (pprint(..., sort_dicts=False)). (Python documentation)
  • Cannot control from Python:

    • VS Code’s choice to wrap output lines instead of horizontal scroll.
    • Getting the exact HF-docs layout out of plain print(inputs); that output in the docs is curated Markdown, not the raw console dump, so it takes a custom printer.

Condensed summary

  • Your output is the correct default repr of a dict containing PyTorch tensors.
  • The Hugging Face page shows a manually formatted version; you won’t get that exact style from plain print(inputs). (Hugging Face)
  • torch.set_printoptions only affects how the tensors themselves are printed, not the dict keys or where 'attention_mask' starts.
  • Long lines broken in the middle are caused by VS Code / notebook word wrap, not by PyTorch; toggle "notebook.output.wordWrap" or widen the output pane to see unbroken lines. (Stack Overflow)
  • To get HF-style layout, write a small custom printer (like hf_print above) or wrap its output in a Rich Panel if you want backgrounds.