Not sure how to pipeline/tokenize graphs for inference

Hi,

I am trying to use a pre-trained Graphormer, fine-tuned on a custom dataset, for regression. Training through the Trainer works without a problem, but what I can't do correctly is feed-forward the test set for inference.
I tried the pipeline snippet from the model page,

pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")

which throws the following error:


KeyError                                  Traceback (most recent call last)
Cell In[188], line 4
      1 # Use a pipeline as a high-level helper
      2 from transformers import pipeline
----> 4 pipe = pipeline("graph-ml", model="clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:744, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, use_auth_token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    740     pipeline_class = get_class_from_dynamic_module(
    741         class_ref, model, revision=revision, use_auth_token=use_auth_token
    742     )
    743 else:
--> 744     normalized_task, targeted_task, task_options = check_task(task)
    745     if pipeline_class is None:
    746         pipeline_class = targeted_task["impl"]

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/__init__.py:487, in check_task(task)
    445 def check_task(task: str) -> Tuple[str, Dict, Any]:
    446     """
    447     Checks an incoming task string, to validate it's correct and return the default Pipeline and Model classes, and
    448     default models if they exist.
   (...)
    485
    486     """
--> 487     return PIPELINE_REGISTRY.check_task(task)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/pipelines/base.py:1197, in PipelineRegistry.check_task(self, task)
   1194         return task, targeted_task, (tokens[1], tokens[3])
   1195     raise KeyError(f"Invalid translation task {task}, use 'translation_XX_to_YY' format")
-> 1197 raise KeyError(
   1198     f"Unknown task {task}, available tasks are {self.get_supported_tasks() + ['translation_XX_to_YY']}"
   1199 )

KeyError: "Unknown task graph-ml, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

Since graph-ml is apparently not a registered pipeline task in this version of transformers, I then tried the tokenizer,

tokenizer = AutoTokenizer.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")
model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

which throws the following error,

KeyError                                  Traceback (most recent call last)
Cell In[187], line 6
      2 from transformers import AutoTokenizer, GraphormerForGraphClassification
      4 model_checkpoint = "Mpro_pcqm4mv2_trnsformer"  # pre-trained model from which to fine-tune
----> 6 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
      7 model = GraphormerForGraphClassification.from_pretrained("clefourrier/graphormer-base-pcqm4mv2")

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:718, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    716 model_type = config_class_to_model_type(type(config).__name__)
    717 if model_type is not None:
--> 718     tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    719     if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
    720         return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

File ~/opt/anaconda3/envs/trans/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:674, in _LazyAutoMapping.__getitem__(self, key)
    672 model_name = self._model_mapping[mtype]
    673 return self._load_attr_from_module(mtype, model_name)
--> 674 raise KeyError(key)

KeyError: <class 'transformers.models.graphormer.configuration_graphormer.GraphormerConfig'>


From that KeyError I gather that no tokenizer class is registered for GraphormerConfig at all. In the training process, GraphormerDataCollator() is used as the Trainer's data collator, but I don't know how to pass it as an argument when predicting.
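One workaround I considered is reusing the Trainer purely for prediction, so that the collator pads and batches the test set exactly as it did during training. A sketch of what I mean (the output_dir is just a throwaway placeholder, and test is the preprocessed split shown further down); I don't know whether this is the intended route:

from transformers import Trainer, TrainingArguments
from transformers.models.graphormer.collating_graphormer import GraphormerDataCollator

# Reuse the Trainer so GraphormerDataCollator pads and batches the test set
# the same way it did during training; "tmp_predict" is a throwaway directory.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tmp_predict", per_device_eval_batch_size=1),
    data_collator=GraphormerDataCollator(),
)
result = trainer.predict(test)  # PredictionOutput: predictions, label_ids, metrics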

I also tried calling the model directly for prediction with the script below,

with torch.no_grad():
    outputs = model(
        input_nodes=input_nodes,
        input_edges=input_edges,
        n_node=num_nodes,
        attn_bias=attn_bias,
        in_degree=in_degree,
        out_degree=out_degree,
        spatial_pos=spatial_pos,
        attn_edge_type=attn_edge_type,
        ignore_mismatched_sizes=True,
    )

I still get dimension errors despite using the preprocessed test split of the training data; I suspect the raw dataset rows are unpadded lists without a batch dimension, which the collator would normally add. My training set was processed following the tutorial, like this:

from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator

train = train_test_valid_dataset["train"].map(preprocess_item, batched=False)
test = train_test_valid_dataset["test"].map(preprocess_item, batched=False)
val = train_test_valid_dataset["valid"].map(preprocess_item, batched=False)
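(If I read collating_graphormer correctly, the explicit map step might even be optional, since the collator seems to accept an on_the_fly_processing flag; I haven't verified this:)

# Assumption from reading collating_graphormer: with on_the_fly_processing=True
# the collator would run preprocess_item itself, so raw graphs could be passed directly.
collator = GraphormerDataCollator(on_the_fly_processing=True)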

How the test data looks before processing:

Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y'],
    num_rows: 6
})

After processing:

Dataset({
    features: ['edge_index', 'edge_attr', 'node_feat', 'num_nodes', 'y', 'input_nodes', 'attn_bias', 'attn_edge_type', 'spatial_pos', 'in_degree', 'out_degree', 'input_edges', 'labels'],
    num_rows: 6
})
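My current best guess for a single-instance forward pass is to call the collator directly on a one-element list, sketched below. This assumes GraphormerDataCollator can be invoked on a plain list of preprocessed dataset rows and that its output keys match the model's forward signature; I haven't been able to confirm either:

import torch
from transformers.models.graphormer.collating_graphormer import GraphormerDataCollator

collator = GraphormerDataCollator()
batch = collator([test[0]])  # pad one preprocessed graph and add the batch dimension

model.eval()
with torch.no_grad():
    # Expected keys: input_nodes, input_edges, attn_bias, in_degree, out_degree,
    # spatial_pos, attn_edge_type, labels; with "labels" present the output
    # should also contain a loss alongside the prediction.
    outputs = model(**batch)

print(outputs.logits)  # one value per graph for regression (num_classes=1)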

Could someone be kind enough to give me an example of how I can call the model for inference on one instance? @clefourrier, should I use a tokenizer for piping graphs? Does the data need to be processed differently for testing?

Specs:

  • transformers version: 4.31.0
  • Platform: macOS-10.16-x86_64-i386-64bit
  • Python version: 3.8.17
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.1
  • Accelerate version: 0.21.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script? No
  • Using distributed or parallel set-up in script? No