No dynamically sized input with huggingface-transformers ALBERT and TFjs

Goal

We are trying to use the TensorFlow ALBERT model from Hugging Face transformers for token classification, so that it can process text of any length in TensorFlow.js. The ALBERT code itself was not changed; the model was only fine-tuned and exported as a *.pb file (a TensorFlow SavedModel).

What we tried

We trained TFAlbertForTokenClassification on our dataset with this command:

!python3 transformers/examples/token-classification/run_tf_ner.py --data_dir transformers-dataset/ \
 --labels labels-hf.txt \
 --model_name_or_path albert-base-v2 \
 --output_dir models/hf \
 --max_seq_length  128 \
 --num_train_epochs 1 \
 --per_device_train_batch_size 32 \
 --save_steps 750 \
 --seed 1 \
 --do_train \
 --do_eval 

In Python we can load the trained model and process tokenized text of any length:

from transformers import AutoTokenizer, TFAlbertForTokenClassification

tokenizer = AutoTokenizer.from_pretrained('models/hf')
model = TFAlbertForTokenClassification.from_pretrained('models/hf')

input_text = "That is a test"
sent = tokenizer(input_text)
print(sent)

# Works regardless of sequence length, here 6 token ids
test = model.predict(sent['input_ids'])
print(test)

The output looks like this:

{'input_ids': [2, 30, 25, 21, 1289, 3], 'token_type_ids': [0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1]}
(array([[[ 0.47850922, -0.32002082, -0.30604097, -0.45886928,
          0.7326187 ]],

       [[-0.47561824, -0.2232911 , -1.1606266 , -0.76804316,
          1.5043088 ]],

       [[-0.08451878,  0.20940554, -0.53571814, -0.50926656,
          1.2012056 ]],

       [[-0.06268388, -0.91142046, -0.8368153 , -0.58241546,
          0.72836596]],

       [[-0.9490294 ,  0.1855019 , -0.34128478,  0.76724774,
         -0.28610477]],

       [[-0.8634669 ,  0.13790444,  0.23986098, -0.12315045,
          1.7485847 ]]], dtype=float32),)

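For reference, the saved_model directory passed to the converter below was exported from this trained model. We did not customize the export; a minimal sketch of such a default export (our paths, standard TensorFlow API) looks like this:

import tensorflow as tf
from transformers import TFAlbertForTokenClassification

model = TFAlbertForTokenClassification.from_pretrained('models/hf')

# Default export: TensorFlow traces a serving signature from the shapes
# the model was built with, so those concrete shapes end up in the SavedModel.
tf.saved_model.save(model, 'saved_model')
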
To use the model in TensorFlow.js, we converted the SavedModel with tfjs-converter, which gave us a model.json and a weight file:

!tensorflowjs_converter \
--input_format=tf_saved_model \
--weight_shard_size_bytes 128000000 \
saved_model \
tfjs

The Problem

We then tried to run the converted TFjs model with the same input [2, 30, 25, 21, 1289, 3]:

const tf = require('@tensorflow/tfjs')
const tfn = require('@tensorflow/tfjs-node')

// Load the converted model from the local file system
const handler = tfn.io.fileSystem('./model/model.json')

tf.ready()
  .then(() => console.log('tensorflow ready'))
  .catch((e) => console.log(e))

tf.loadGraphModel(handler)
  .then((gModel) => {
    // Same token ids as in the Python example: batch 1, sequence length 6
    let tensor = tf.tensor2d([[2, 30, 25, 21, 1289, 3]], [1, 6], 'int32')

    const result = gModel.predict(tensor)
    console.log('result', result)
  })
  .catch((e) => console.log(e))

We got an error:

Error: The shape of dict['input_ids'] provided in model.execute(dict) must be [-1,5], but was [1,6]

When we use an input of exactly 5 integers, e.g. let tensor = tf.tensor2d([[2, 30, 25, 1289, 3]], [1, 5], 'int32'), it works as expected. In the model.json file we found that, after conversion, the graph's input had been reduced to a fixed shape of [-1, 5]:

model.json (excerpt):
{"inputs": {"input_ids:0": {"name": "input_ids:0", "dtype": "DT_INT32", "tensorShape": {"dim": [{"size": "-1"}, {"size": "5"}]}}}}

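We suspect the fixed length of 5 is not introduced by the converter but is already baked into the SavedModel's serving signature, since transformers builds the model with fixed-length dummy inputs (sequence length 5) and that traced shape appears to end up in the exported signature. This can be checked in Python (a quick sanity check, not part of our pipeline):

import tensorflow as tf

# Print the input spec of the signature baked into the SavedModel;
# if our suspicion is right, input_ids will have a fixed second dimension of 5.
loaded = tf.saved_model.load('saved_model')
print(loaded.signatures['serving_default'].structured_input_signature)
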
Question

What do we need to do to get a dynamically sized input for “input_ids” in this model after conversion to a TensorFlow.js model? Our current (untested) idea is sketched below.

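Our guess is that the model has to be re-exported with an explicit serving signature whose sequence dimension is None, and then converted again with tfjs-converter. A sketch of what we have in mind (untested; the function name and the output key are our own choices):

import tensorflow as tf
from transformers import TFAlbertForTokenClassification

model = TFAlbertForTokenClassification.from_pretrained('models/hf')

# Explicit signature: both batch and sequence dimensions left dynamic
@tf.function(input_signature=[tf.TensorSpec([None, None], tf.int32, name='input_ids')])
def serving(input_ids):
    output = model(input_ids)
    return {'logits': output[0]}  # first element of the returned tuple is the logits

tf.saved_model.save(model, 'saved_model', signatures=serving)
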
We used TF 2.3, TFjs 2.4, and Hugging Face transformers version 3.3.1.

Thank you for your help!