Not sure how to pipeline/tokenize graphs for inference


I am trying to use a pre-trained Graphormer fine-tuned on a custom dataset for regression. Training with the Trainer works without a problem, but I can’t work out how to run the test set through the model correctly.
I first tried the pipeline snippet I found on the web,
pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")
which throws the following error,

KeyError Traceback (most recent call last)
Cell In[188], line 4
1 # Use a pipeline as a high-level helper
2 from transformers import pipeline
----> 4 pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
740 pipeline_class = get_class_from_dynamic_module(
741 class_ref, model, revision=revision, use_auth_token=use_auth_token
742 )
743 else:
--> 744 normalized_task, targeted_task, task_options = check_task(task)
745 if pipeline_class is None:
746 pipeline_class = targeted_task["impl"]

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/, in check_task(task)
445 def check_task(task: str) -> Tuple[str, Dict, Any]:
446 """
447 Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
448 default models if they exist.
486 """
--> 487 return PIPELINE_REGISTRY.check_task(task)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/, in PipelineRegistry.check_task(self, task)
1194 return task, targeted_task, (tokens[1], tokens[3])
1195 raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
-> 1197 raise KeyError(
1198 f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
1199 )

KeyError: "Unknown task graph-ml, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

and then I tried the tokenizer,

tokenizer = AutoTokenizer.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

which throws the following error,

KeyError Traceback (most recent call last)
Cell In[187], line 6
2 from transformers import AutoTokenizer, GraphormerForGraphClassification
4 model_checkpoint = "Mpro_pcqm4mv2_trnsformer" # pre-trained model from which to fine-tune
----> 6 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
7 model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
716 model_type = config_class_to_model_type(type(config).__name__)
717 if model_type is not None:
--> 718 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
719 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
720 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/, in _LazyAutoMapping.__getitem__(self, key)
672 model_name = self._model_mapping[mtype]
673 return self._load_attr_from_module(mtype, model_name)
--> 674 raise KeyError(key)

KeyError: <class 'transformers.models.graphormer.configuration_graphormer.GraphormerConfig'>


During training, the GraphormerDataCollator() is used, but I don’t know how to pass it as an argument when predicting.

I tried the script below to call the model for prediction,

with torch.no_grad():
    outputs = model(
        n_node=num_nodes,
        ignore_mismatched_sizes=True
    )
I still get dimension errors, even though I am using the preprocessed test split of the same dataset. The splits were preprocessed following the tutorial, like this:

from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator

train = train_test_valid_dataset['train'].map(preprocess_item, batched=False)
test = train_test_valid_dataset['test'].map(preprocess_item, batched=False)
val = train_test_valid_dataset['valid'].map(preprocess_item, batched=False)

The test data before preprocessing:
features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y'],
num_rows: 6

After preprocessing:
features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y', 'input_nodes', 'attn_bias', 'attn_edge_type', 'spatial_pos', 'in_degree', 'out_degree', 'input_edges', 'labels'],
num_rows: 6

Could someone be kind enough to give me an example of how to call the model for inference on one instance? @clefourrier, should I use a tokenizer to feed graphs through a pipeline? Does the data need to be processed differently for testing?
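
To make the question concrete, here is a minimal sketch of what single-instance inference might look like. Graphormer has no tokenizer registered (hence the KeyError from AutoTokenizer), so the sketch assumes that preprocess_item plus GraphormerDataCollator play that role at test time, just as they do in training. The helper names select_model_inputs and predict_graphs are hypothetical, and the FORWARD_KEYS list reflects my reading of the GraphormerForGraphClassification.forward signature in transformers 4.31, so treat it as an assumption rather than confirmed API.

```python
# Assumed keyword arguments of GraphormerForGraphClassification.forward
# (transformers 4.31); "labels" is optional there and is left out for inference.
FORWARD_KEYS = (
    "input_nodes",
    "input_edges",
    "attn_bias",
    "in_degree",
    "out_degree",
    "spatial_pos",
    "attn_edge_type",
)


def select_model_inputs(batch):
    """Keep only the entries the forward pass expects (hypothetical helper).

    Raw columns such as 'edge_index' or 'num_nodes', and 'labels', are dropped.
    """
    missing = [key for key in FORWARD_KEYS if key not in batch]
    if missing:
        raise KeyError(f"batch is missing model inputs: {missing}")
    return {key: batch[key] for key in FORWARD_KEYS}


def predict_graphs(model, collator, items):
    """Collate preprocessed dataset rows and run one forward pass (hypothetical helper)."""
    import torch  # imported lazily so select_model_inputs works without torch

    # The collator pads the graphs and turns the list-valued columns into tensors.
    # Assumption: it still expects a "labels" field in every row, even at inference.
    batch = collator(items)
    inputs = select_model_inputs(batch)

    model.eval()  # switch off dropout
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs


# Intended usage, assuming `test` is the preprocessed split shown above and
# `model` is the fine-tuned GraphormerForGraphClassification checkpoint:
#
#   from transformers.models.graphormer.collating_graphormer import GraphormerDataCollator
#   outputs = predict_graphs(model, GraphormerDataCollator(), [test[0]])
#   prediction = outputs.logits  # regression output, if the model returns a ModelOutput
```

If a Trainer built with data_collator=GraphormerDataCollator() is already available, trainer.predict(test) should route the test split through the same collator, which may be the simpler option.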


  • transformers version: 4.31.0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script? No
  • Using distributed or parallel set-up in script? No