Output of NER pipeline is in single quotes... difficult to transform it in JSON

SiderumSpectator · January 24, 2023, 8:59pm

Hi, I’m using a pretrained model on my data. I’m iterating over a bunch of XML files and storing the output in a list, and I want to save that whole list as a JSON file because I need to manipulate it later on. The problem is that the output of the model is in single quotes, and if I replace those with double quotes so I can reuse the file later, it breaks on words like “isn’t”, “wouldn’t”, etc. I can’t fix those manually because it’s incredibly time-consuming. I’m not sure what the best way forward is. What do you suggest?

My code is below:

import json
import os
import xml.etree.ElementTree as ET

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

label_list = ["literal", "metaphoric"]

label_dict_relations = {i: l for i, l in enumerate(label_list)}

tokenizer = AutoTokenizer.from_pretrained("lwachowiak/Metaphor-Detection-XLMR")

model = AutoModelForTokenClassification.from_pretrained("lwachowiak/Metaphor-Detection-XLMR", id2label=label_dict_relations)

# Build the pipeline once instead of once per paragraph.
nerpipeline = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy="simple")

words_input_dir = "MYDIRECTORY"

sents = []
for filename in os.listdir(words_input_dir):
    if filename.endswith(".xml"):
        tree = ET.parse(os.path.join(words_input_dir, filename))
        root = tree.getroot()
        node = root.findall("./body/sec/p")
        for x in node:
            # Skip paragraphs without text.
            if x.text:
                data = nerpipeline(x.text)
                sents.append(data)

# str() turns the list into its Python repr, which is where the single quotes come from.
json_string = json.dumps(str(sents))
with open(r'MYPATH/RESULTS.json', "w") as outfile:
    outfile.write(json_string)

TL;DR: how can I convert the output of this pipeline to a (correct) JSON?
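For reference, one possible direction is sketched here. It assumes the single quotes come from calling str() on the list, and that the reason json.dumps cannot take the list directly is that the pipeline's "score" values are NumPy scalars (which I believe is the usual case, but have not checked for every version). The helper name to_builtin and the sample data are made up for illustration; the real list would come from the pipeline.

```python
import json


def to_builtin(value):
    """Fallback for json.dumps: turn NumPy-style scalars into plain Python numbers."""
    # NumPy scalars such as float32 expose .item(), which returns a Python float or int.
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def save_results(results, path):
    """Write the results straight to a JSON file, without going through str()."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(results, outfile, default=to_builtin, ensure_ascii=False, indent=2)


# Hypothetical sample shaped like the output of an aggregated "ner" pipeline.
sents = [
    [{"entity_group": "metaphoric", "score": 0.98, "word": "isn't", "start": 0, "end": 5}]
]

# Passing the list itself (not str(sents)) lets json handle all quoting and escaping.
json_string = json.dumps(sents, default=to_builtin, ensure_ascii=False, indent=2)

# The file can later be read back into ordinary lists and dicts for manipulation.
restored = json.loads(json_string)
```

With this approach the apostrophe in a word like "isn't" sits inside a double-quoted JSON string, so no manual quote replacement should be needed.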
