I am just getting started with coreference resolution and am interested in using the Hugging Face coval metric for evaluation. However, I'm not sure how to get the output of the model I'm using (NeuralCoref through spaCy) into the format the metric requires.
The function requires references and predictions in CoNLL line format, for example:
```python
words = [
    'bc/cctv/00/cctv_0005 0 0 Thank VBP (TOP(S(VP* thank 01 1 Xu_li * (V*) * -',
    'bc/cctv/00/cctv_0005 0 1 you PRP (NP*) - - - Xu_li * (ARG1*) (ARG0*) (116)',
    'bc/cctv/00/cctv_0005 0 2 everyone NN (NP*) - - - Xu_li * (ARGM-DIS*) * (116)',
    'bc/cctv/00/cctv_0005 0 3 for IN (PP* - - - Xu_li * (ARG2* * -',
    'bc/cctv/00/cctv_0005 0 4 watching VBG (S(VP*)))) watch 01 1 Xu_li * *) (V*) -',
    'bc/cctv/00/cctv_0005 0 5 . . *)) - - - Xu_li * * * -',
]
```
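As far as I can tell, the clusters live in the last column of each line: `(id` opens a mention, `id)` closes one, `(id)` is a one-token mention, `|` separates several labels on the same token, and `-` means none. To check my understanding, here is a rough sketch (not coval's actual reader; `parse_coref_column` is a hypothetical name) that reads that column back into inclusive token spans. It uses each line's position in the list as the token index, which matches the word-number column for a single-sentence document like the one above.

```python
import re

# One coref label: optional "(" + cluster id + optional ")".
LABEL = re.compile(r"(\()?(\d+)(\))?")


def parse_coref_column(lines):
    """Return {cluster_id: [(start, end), ...]} using inclusive token positions."""
    open_starts = {}  # cluster_id -> stack of start positions still waiting for a close
    clusters = {}
    for i, line in enumerate(lines):
        col = line.split()[-1]  # coreference labels are in the last column
        if col == "-":
            continue
        for part in col.split("|"):
            m = LABEL.fullmatch(part)
            if m is None:
                raise ValueError(f"unexpected coref label {part!r}")
            opens, cid, closes = m.group(1), int(m.group(2)), m.group(3)
            if opens and closes:  # "(id)": mention of exactly one token
                clusters.setdefault(cid, []).append((i, i))
            elif opens:  # "(id": mention starts here
                open_starts.setdefault(cid, []).append(i)
            elif closes:  # "id)": close the most recently opened mention of this cluster
                start = open_starts[cid].pop()
                clusters.setdefault(cid, []).append((start, i))
            else:
                raise ValueError(f"label without brackets: {part!r}")
    return clusters


# Only the last column matters here, so the other columns are abbreviated.
lines = ["0 Thank -", "1 you (116)", "2 everyone (116)", "3 for -", "4 watching -", "5 . -"]
print(parse_coref_column(lines))
```

For the example above this gives a single cluster, 116, containing "you" (token 1) and "everyone" (token 2).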
Is there a standard way to convert spaCy NeuralCoref's coreference output into this format, specifically the predicted coreference clusters into the parenthesis notation used in the last (coreference) column of the CoNLL lines?
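My rough guess at the conversion is below. It is a minimal sketch, assuming each cluster is given as a list of inclusive `(start, end)` token positions; `clusters_to_coref_column` and `to_conll_lines` are hypothetical helpers, not part of NeuralCoref or coval. With NeuralCoref, that input could presumably come from `[[(m.start, m.end - 1) for m in c.mentions] for c in doc._.coref_clusters]`, since spaCy's `Span.end` is exclusive. The `-` placeholder columns assume the metric's default settings only read the last column for mentions (worth checking; options that rely on the parse tree would need real values there). The predicted lines presumably also need the same tokenization as the reference lines, since mentions are compared by token position.

```python
def clusters_to_coref_column(n_tokens, clusters):
    """Map clusters of inclusive (start, end) token spans to one CoNLL coref label per token."""
    opens = [[] for _ in range(n_tokens)]
    singles = [[] for _ in range(n_tokens)]
    closes = [[] for _ in range(n_tokens)]
    for cid, mentions in enumerate(clusters):  # cluster ids are just 0, 1, 2, ...
        for start, end in mentions:
            if start == end:
                singles[start].append(cid)  # one-token mention: "(cid)"
            else:
                opens[start].append(cid)  # mention begins: "(cid"
                closes[end].append(cid)  # mention ends: "cid)"
    labels = []
    for i in range(n_tokens):
        # Closing labels come first so a stack-based reader can pair a mention that ends
        # on the same token where another mention of the same cluster begins.
        parts = (
            [f"{c})" for c in closes[i]]
            + [f"({c})" for c in singles[i]]
            + [f"({c}" for c in opens[i]]
        )
        labels.append("|".join(parts) if parts else "-")
    return labels


def to_conll_lines(doc_id, tokens, labels):
    """Hypothetical helper: doc id, part, word index, token, '-' placeholders, coref label."""
    return [
        f"{doc_id} 0 {i} {tok} - - - - - - - {lab}"
        for i, (tok, lab) in enumerate(zip(tokens, labels))
    ]


tokens = ["Thank", "you", "everyone", "for", "watching", "."]
clusters = [[(1, 1), (2, 2)]]  # "you" and "everyone" in one cluster
labels = clusters_to_coref_column(len(tokens), clusters)
print(labels)  # ['-', '(0)', '(0)', '-', '-', '-']
print(to_conll_lines("doc1", tokens, labels)[1])  # doc1 0 1 you - - - - - - - (0)
```

The cluster ids only need to be consistent within the predictions; they don't have to match the ids in the reference file (116 above), since the scorer compares clusters by their mentions.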
I haven’t worked with coreference resolution before, so any insight you can give is much appreciated!