TIA and MVLM in LayoutLMv2


I'm trying to reproduce LayoutLMv2's pretraining performance,

but I've run into a problem computing the TIA loss.

In the LayoutLMv2 paper, TIA (Text-Image Alignment) is explained like this:

"When MVLM and TIA are performed simultaneously, TIA losses of the tokens masked in MVLM are not taken into account. This prevents the model from learning the useless but straightforward correspondence from '[MASK]' to '[Covered]'."

I understand this to mean that when setting tia_label, any token replaced by MVLM's [MASK] should be ignored, so the model cannot learn the shortcut from [MASK] to [Covered].
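A minimal sketch of that rule as I read it, not the authors' code. The helper name `tia_labels`, its boolean inputs, and the use of -100 as the "ignore" value (PyTorch's default `ignore_index`) are my own assumptions:

```python
# Hypothetical helper: per-token TIA labels following the rule quoted above.
IGNORE = -100  # assumption: reuse PyTorch's default ignore_index value


def tia_labels(covered, mvlm_masked):
    """covered[i]: the image region of token i's line was covered.
    mvlm_masked[i]: token i was replaced by [MASK] for MVLM.
    Returns 1 for [Covered], 0 for [Not Covered], IGNORE for MVLM-masked tokens."""
    labels = []
    for is_covered, is_masked in zip(covered, mvlm_masked):
        if is_masked:
            # no TIA loss for this token, so [MASK] -> [Covered] can't be learned
            labels.append(IGNORE)
        else:
            labels.append(1 if is_covered else 0)
    return labels


# line1 = a b [MASK] d, line2 = e f [MASK], with line2 covered
print(tia_labels(
    covered=[False, False, False, False, True, True, True],
    mvlm_masked=[False, False, True, False, False, False, True],
))
# prints [0, 0, -100, 0, 1, 1, -100]
```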

So suppose the input is generated like this, shown line by line (['mask'] is a token masked by MVLM):

   #line1 : ['a']['b']['mask']['d']
   #line2 : ['e']['f']['mask']

TIA covers each text line with 15% probability. In this case, assume line2 is covered.
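A rough sketch of the line-level covering step, assuming each line is picked independently with probability 0.15 and that every token knows which line it belongs to (`token_line_ids` and both function names are hypothetical). Covering the line's region in the image itself is not shown here:

```python
import random


def sample_covered_lines(num_lines, cover_prob=0.15, rng=None):
    """Pick each text line independently with probability cover_prob.
    Assumption: independent Bernoulli per line; the paper may sample differently."""
    if rng is None:
        rng = random.Random()
    return {i for i in range(num_lines) if rng.random() < cover_prob}


def token_covered_flags(token_line_ids, covered_lines):
    """Expand the line-level choice to one covered flag per token."""
    return [line_id in covered_lines for line_id in token_line_ids]


# line1 = a b [MASK] d -> line id 0, line2 = e f [MASK] -> line id 1
token_line_ids = [0, 0, 0, 0, 1, 1, 1]
covered = token_covered_flags(token_line_ids, covered_lines={1})  # line2 covered
print(covered)  # [False, False, False, False, True, True, True]

# In training, covered_lines would come from random sampling instead
random_lines = sample_covered_lines(num_lines=2, rng=random.Random(0))
```

Sampling whole lines (rather than tokens) keeps every token of a covered line labeled [Covered], which matches the line2 example.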

the TIA labels for the whole input would be:

#[               line1                ][          line2           ]
 [notCover][notCover][ignore][notCover][Covered][Covered][ignore]
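One possible way to feed three-valued labels into a binary loss (an assumption on my part, not confirmed by the paper) is to split them into a 0/1 target and a 0/1 weight mask, where "ignore" gets weight 0:

```python
# Labels from the example above: "notCover", "Covered", or "ignore"
labels = ["notCover", "notCover", "ignore", "notCover", "Covered", "Covered", "ignore"]

# A binary loss only accepts 0/1 targets, so "ignore" is expressed as weight 0.
# Its target value is arbitrary because the loss term gets multiplied by 0.
targets = [1.0 if lab == "Covered" else 0.0 for lab in labels]
loss_mask = [0.0 if lab == "ignore" else 1.0 for lab in labels]

print(targets)    # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
print(loss_mask)  # [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```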

But the TIA loss is a binary cross-entropy loss, while these labels take three values ([Covered], [Not Covered], and ignore).

How do they compute the loss in that case?
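A minimal sketch of a masked binary cross-entropy, assuming per-token logits from a TIA classification head and the target/mask split described earlier. The paper does not state how the loss is reduced, so averaging over the non-ignored tokens is an assumption, and `masked_tia_loss` is a hypothetical name:

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy computed on a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def masked_tia_loss(logits, targets, loss_mask):
    """Mean BCE over tokens with mask 1; ignored tokens contribute nothing.
    Assumption: mean over non-ignored tokens (the reduction is not given in the paper)."""
    total = sum(m * bce_with_logits(x, y) for x, y, m in zip(logits, targets, loss_mask))
    count = sum(loss_mask)
    return total / count if count > 0 else 0.0


# Example from the post: line2 covered, MVLM-masked tokens ignored
targets = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]
loss_mask = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
logits = [-2.0, -1.5, 9.0, -3.0, 2.5, 1.0, 9.0]  # made-up classifier outputs
print(masked_tia_loss(logits, targets, loss_mask))
```

In PyTorch the same idea would be `F.binary_cross_entropy_with_logits(logits, targets, reduction="none")`, multiplied by the mask and divided by `mask.sum()`. Alternatively, a 2-way classifier trained with `nn.CrossEntropyLoss(ignore_index=-100)` drops the ignored tokens automatically.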