Not sure how to pipeline/tokenize graphs for inference

Hi,

I am trying to use a pre-trained Graphormer, fine-tuned on a custom dataset, for regression. Training through the Trainer works without a problem, but what I can't do correctly is feed-forward the test set for inference.
I tried the pipeline snippet from the model page,

pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")

which throws the following error:


KeyError                                  Traceback (most recent call last)
Cell In[188], line 4
      1 # Use a pipeline as a high-level helper
      2 from transformers import pipeline
----> 4 pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:744, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    740     pipeline_class = get_class_from_dynamic_module(
    741         class_ref, model, revision=revision, use_auth_token=use_auth_token
    742     )
    743 else:
--> 744     normalized_task, targeted_task, task_options = check_task(task)
    745     if pipeline_class is None:
    746         pipeline_class = targeted_task["impl"]

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:487, in check_task(task)
    445 def check_task(task: str) -> Tuple[str, Dict, Any]:
    446     """
    447     Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
    448     default models if they exist.
   (...)
    485
    486     """
--> 487     return PIPELINE_REGISTRY.check_task(task)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/base.py:1197, in PipelineRegistry.check_task(self, task)
   1194         return task, targeted_task, (tokens[1], tokens[3])
   1195     raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
-> 1197 raise KeyError(
   1198     f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
   1199 )

KeyError: "Unknown task graph-ml, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

Since graph-ml is apparently not a registered pipeline task in this version of transformers, I then tried the tokenizer,

tokenizer = AutoTokenizer.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

which throws the following error,

KeyError                                  Traceback (most recent call last)
Cell In[187], line 6
      2 from transformers import AutoTokenizer, GraphormerForGraphClassification
      4 model_checkpoint = "Mpro_pcqm4mv2_trnsformer"  # pre-trained model from which to fine-tune
----> 6 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
      7 model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:718, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    716 model_type = config_class_to_model_type(type(config).__name__)
    717 if model_type is not None:
--> 718     tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    719     if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
    720         return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:674, in _LazyAutoMapping.__getitem__(self, key)
    672 model_name = self._model_mapping[mtype]
    673 return self._load_attr_from_module(mtype, model_name)
--> 674 raise KeyError(key)

KeyError: <class 'transformers.models.graphormer.configuration_graphormer.GraphormerConfig'>


From that KeyError I gather that no tokenizer class is registered for GraphormerConfig at all. In the training process, GraphormerDataCollator() is used as the Trainer's data collator, but I don't know how to pass it as an argument when predicting.
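One workaround I considered is reusing the Trainer purely for prediction, so that the collator pads and batches the test set exactly as it did during training. A sketch of what I mean (the output_dir is just a throwaway placeholder, and test is the preprocessed split shown further down); I don't know whether this is the intended route:

from transformers import Trainer, TrainingArguments
from transformers.models.graphormer.collating_graphormer import GraphormerDataCollator

# Reuse the Trainer so GraphormerDataCollator pads and batches the test set
# the same way it did during training; "tmp_predict" is a throwaway directory.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_predict", per_device_eval_batch_size=1),
    data_collator=GraphormerDataCollator(),
)
result = trainer.predict(test)  # PredictionOutput: predictions, label_ids, metrics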

I also tried calling the model directly for prediction with the script below,

with torch.no_grad():
    outputs = model(
        input_nodes=input_nodes,
        input_edges=input_edges,
        n_node=num_nodes,
        attn_bias=attn_bias,
        in_degree=in_degree,
        out_degree=out_degree,
        spatial_pos=spatial_pos,
        attn_edge_type=attn_edge_type,
        ignore_mismatched_sizes=True,
    )

I still get dimension errors despite using the preprocessed test split of the training data; I suspect the raw dataset rows are unpadded lists without a batch dimension, which the collator would normally add. My training set was processed following the tutorial, like this:

from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator

train = train_test_valid_dataset["train"].map(preprocess_item, batched=False)
test = train_test_valid_dataset["test"].map(preprocess_item, batched=False)
val = train_test_valid_dataset["valid"].map(preprocess_item, batched=False)
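(If I read collating_graphormer correctly, the explicit map step might even be optional, since the collator seems to accept an on_the_fly_processing flag; I haven't verified this:)

# Assumption from reading collating_graphormer: with on_the_fly_processing=True
# the collator would run preprocess_item itself, so raw graphs could be passed directly.
collator = GraphormerDataCollator(on_the_fly_processing=True)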

How the test data looks before processing:

Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y'],
    num_rows: 6
})

After processing:

Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y', 'input_nodes', 'attn_bias', 'attn_edge_type', 'spatial_pos', 'in_degree', 'out_degree', 'input_edges', 'labels'],
    num_rows: 6
})
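My current best guess for a single-instance forward pass is to call the collator directly on a one-element list, sketched below. This assumes GraphormerDataCollator can be invoked on a plain list of preprocessed dataset rows and that its output keys match the model's forward signature; I haven't been able to confirm either:

import torch
from transformers.models.graphormer.collating_graphormer import GraphormerDataCollator

collator = GraphormerDataCollator()
batch = collator([test[0]])  # pad one preprocessed graph and add the batch dimension

model.eval()
with torch.no_grad():
    # Expected keys: input_nodes, input_edges, attn_bias, in_degree, out_degree,
    # spatial_pos, attn_edge_type, labels; with "labels" present the output
    # should also contain a loss alongside the prediction.
    outputs = model(**batch)

print(outputs.logits)  # one value per graph for regression (num_classes=1)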

Could someone be kind enough to give me an example of how I can call the model for inference on one instance? @clefourrier, should I use a tokenizer for piping graphs? Does the data need to be processed differently for testing?

Specs:

  • transformers version: 4.31.0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script? No
  • Using distributed or parallel set-up in script? No