I have tried both torchview and hiddenlayer to visualize the operations inside LlamaDecoderLayer, and both produce similarly flawed results. Since both methods lead to the same problem, I suspect something is wrong with the transformers library (or my understanding of it) or with my code.
TL;DR:
torchview and hiddenlayer both produce an incorrect visualization of LlamaDecoderLayer: the nodes that represent tensor operations contain no information, although the edges connect the nodes correctly.
Model/Module to visualize:
My code (below) wraps LlamaDecoderLayer, passing a LlamaConfig and a layer_idx, so that I can hand the resulting module directly to torchview or hiddenlayer.
import torch
import torch.nn as nn
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

class SingleLayerLlama(nn.Module):
    def __init__(self, attn_mask, config, layer_idx):
        super().__init__()
        self.attn_mask = attn_mask
        self.layer = LlamaDecoderLayer(config=config, layer_idx=layer_idx)

    def forward(self, hidden_states):
        # Dummy rotary position embeddings (cos, sin); head_dim assumed to be hidden_size / 32
        position_embedding = torch.rand(1, int(hidden_states.shape[1]), int(hidden_states.shape[2] / 32))
        position_embeddings = (position_embedding, position_embedding)
        hidden_states = self.layer(
            hidden_states,
            attention_mask=self.attn_mask,
            # position_ids=position_ids,
            position_embeddings=position_embeddings,
            layer_head_mask=None,
            past_key_value=None,
            use_cache=False,
        )[0]
        return hidden_states
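For reference, the wrapper used in the visualization code further below is instantiated roughly like this (the config, sequence length, and mask values here are only illustrative placeholders):

from transformers import LlamaConfig

# Illustrative setup; the actual shapes/values in my runs may differ.
config = LlamaConfig()                               # default Llama config
seq_len = 16
hidden_states = torch.rand(1, seq_len, config.hidden_size)
attn_mask = torch.zeros(1, 1, seq_len, seq_len)      # additive attention mask
_layer = SingleLayerLlama(attn_mask, config, layer_idx=0)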
Visualization results:
Both results share the same flaw: the operation nodes contain no information.
torchview
hiddenlayer
(Since this is my first post, I am only allowed to embed one image.)
Questions:
- Is this expected behavior for transformers models/modules?
- Is there something wrong with my approach?
Other info:
Code used to produce the visualizations:
torchview
from torchview import draw_graph

model_graph = draw_graph(
    _layer,                          # SingleLayerLlama instance defined above
    device="cpu",
    input_size=hidden_states.shape,
    expand_nested=True,
    save_graph=True,
    hide_inner_tensors=False,
    hide_module_functions=False,
    show_shapes=True,
    graph_name='LlamaDecoderLayer',
    directory='<path>/',             # directory where the .gv file is saved
)
import graphviz

# Re-render the saved .gv file as an SVG.
with open('<path>/LlamaDecoderLayer.gv', 'r') as f:
    dot_graph = f.read()

graph = graphviz.Source(dot_graph)
graph.render(
    '<path>/LlamaDecoderLayer',
    format='svg',
    cleanup=True,
    engine='neato',
)
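(As an aside, if I understand the torchview API correctly, the object returned by draw_graph already holds a graphviz.Digraph, so the re-render step could probably be done directly; the snippet below is just a sketch of that shortcut, with a hypothetical output name.)

# Sketch: render directly from the object returned by draw_graph
# (model_graph.visual_graph should be a graphviz.Digraph).
model_graph.visual_graph.render(
    '<path>/LlamaDecoderLayer_direct',   # hypothetical output name
    format='svg',
    cleanup=True,
)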
hiddenlayer
import torch
import hiddenlayer as hl

# From https://github.com/pytorch/pytorch/blob/2efe4d809fdc94501fc38bf429e9a8d4205b51b6/torch/utils/tensorboard/_pytorch_graph.py#L384
def _node_get(node: torch._C.Node, key: str):
    """Gets attributes of a node which is polymorphic over return type."""
    sel = node.kindOf(key)
    return getattr(node, sel)(key)

torch._C.Node.__getitem__ = _node_get

im = hl.build_graph(_layer, torch.zeros(hidden_states.shape))
print(f" >>> im = {type(im)}")
im.save(path="<path>/LlamaDecoderLayer_hl", format="svg")
P.S.
I have also tested loading the weights from a Llama 3.1 model into the layer, but the result is the same.
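For context, the weight loading was done roughly as follows (the checkpoint name and layer index below are only placeholders, and this assumes the wrapper was built with the matching config):

from transformers import AutoModelForCausalLM

# Sketch of copying one pretrained decoder layer into the wrapper;
# the checkpoint name and the layer index are illustrative.
full_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
pretrained_layer = full_model.model.layers[0]           # layer_idx = 0
_layer.layer.load_state_dict(pretrained_layer.state_dict())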