Hi, I trained a VisionEncoderDecoderModel to recognize images of math expressions and transcribe them to LaTeX.
When I predict an image with a long math expression label (the label text does not exceed max_length), the generated output is incomplete, i.e. generation stops midway, like this:
label: "= \left( \frac { 9 } { 4 } \right) ^ { \frac { 1 } { 2 } } - \left( \frac { 3 } { 4 } \right) ^ { 2 } + \left( \frac { 1 } { 6 } \right) ^ { 2 } \times \left( \frac { 2 7 } { 8 } \right) ^ { \frac { 2 } { 3 } }" (210 characters, including spaces)
model pred: "= { { \left( \frac { 9 } { 4 } \right) } ^ { \frac { 1 } { 2 } } } - { { \left( \frac { 3 } { 4 } \right) } ^ { 2 } } + { { \left( \frac" (136 characters, including spaces)
This is part of the dataset label processing:
from transformers import ViTFeatureExtractor, AutoTokenizer, TrOCRProcessor

encoder_checkpoint = 'google/vit-base-patch16-224-in21k'
decoder_checkpoint = 'witiko/mathberta'
feature_extractor = ViTFeatureExtractor.from_pretrained(encoder_checkpoint)
tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
processor = TrOCRProcessor(image_processor=feature_extractor, tokenizer=tokenizer)
. . .
self.max_target_length = 256
. . .
labels = self.processor.tokenizer(text, padding="max_length", max_length=self.max_target_length, truncation=True).input_ids
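To rule out truncation, here is a quick sanity check on the long label above (a minimal sketch, not part of the training code; text is just the label string copied from the example):

# Tokenize the problematic label without padding/truncation and count the tokens;
# if this were >= 256, truncation in the dataset code would explain the cut-off.
text = r"= \left( \frac { 9 } { 4 } \right) ^ { \frac { 1 } { 2 } } - \left( \frac { 3 } { 4 } \right) ^ { 2 } + \left( \frac { 1 } { 6 } \right) ^ { 2 } \times \left( \frac { 2 7 } { 8 } \right) ^ { \frac { 2 } { 3 } }"
ids = processor.tokenizer(text).input_ids
print(len(ids))
print(processor.tokenizer.decode(ids, skip_special_tokens=True))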
These are the beam search parameters:
# set beam search parameters
model.config.eos_token_id = processor.tokenizer.sep_token_id
model.config.max_length = 256
model.config.early_stopping = True
model.config.no_repeat_ngram_size = 3
model.config.length_penalty = 2.0
model.config.num_beams = 4
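For context, prediction is currently produced roughly like this (a sketch of the inference call, not the exact script; the image variable, the preprocessing, and passing the generation arguments explicitly to generate() are assumptions):

# Sketch of the inference call (assumption): preprocess the image and generate with
# the same settings as the config above, passed explicitly to generate().
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(
    pixel_values,
    max_length=256,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=3,
    length_penalty=2.0,
    eos_token_id=processor.tokenizer.sep_token_id,
)
pred = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)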
How can I fix this?