Trainer Question Answering evaluation metrics

abdallah197 · January 19, 2021, 3:58pm

Hi HF community

I wanted to ask whether anyone has come across an example of evaluating QA models using built-in Trainer functions like compute_metrics.

sgugger · January 19, 2021, 9:21pm

You can check the new run_qa script. It does need a special subclass of Trainer because the post-processing is fairly complex, but it uses compute_metrics and the squad metrics from the datasets library.
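For reference, here is a minimal, self-contained sketch of the two numbers the squad metric reports: exact match and token-level F1, both on a 0 to 100 scale. They are re-implemented in plain Python so the sketch runs without downloading anything. The normalization (lowercase, strip punctuation and the articles a/an/the, collapse whitespace) follows the official SQuAD v1.1 evaluation script as I understand it. The input format mirrors what I believe the run_qa post-processing passes to compute_metrics: predictions as {"id", "prediction_text"} dicts and references as {"id", "answers": {"text": [...], "answer_start": [...]}} dicts. FakeEvalPrediction is a hypothetical stand-in for transformers.EvalPrediction, and squad_metrics is an illustrative name, not a library function.

```python
import collections
import re
import string
from typing import Dict, List, NamedTuple

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between one prediction and one gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_metrics(predictions: List[dict], references: List[dict]) -> Dict[str, float]:
    """Average best-over-gold-answers EM and F1 across all reference questions.

    SQuAD v1.1 style only: every reference must have at least one gold answer.
    """
    pred_by_id = {p["id"]: p["prediction_text"] for p in predictions}
    em_total = f1_total = 0.0
    for ref in references:
        answers = ref["answers"]["text"]
        text = pred_by_id.get(ref["id"])
        if text is None:  # a question with no prediction scores 0
            continue
        em_total += max(exact_match(text, a) for a in answers)
        f1_total += max(f1_score(text, a) for a in answers)
    n = len(references)
    return {"exact_match": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


class FakeEvalPrediction(NamedTuple):
    # hypothetical stand-in for transformers.EvalPrediction after QA post-processing
    predictions: List[dict]
    label_ids: List[dict]


def compute_metrics(p) -> Dict[str, float]:
    """Shape of a Trainer compute_metrics hook: take predictions/labels, return a dict."""
    return squad_metrics(predictions=p.predictions, references=p.label_ids)


if __name__ == "__main__":
    preds = [{"id": "q1", "prediction_text": "autosomal dominant inheritance"}]
    refs = [{"id": "q1", "answers": {"text": ["autosomal dominant"], "answer_start": [213]}}]
    print(compute_metrics(FakeEvalPrediction(preds, refs)))
```

In a real run you would load the metric from the datasets library rather than re-implementing it; the sketch is only meant to show what compute_metrics receives and returns once the QA-specific post-processing has turned logits into answer strings.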

abdallah197 · January 25, 2021, 9:06am

I keep running into an error whenever I try to use custom datasets. I found this recent issue on the repo, but it's still open. I tried the suggested fixes, but no luck so far. My data files are formatted like SQuAD; here is an example:

{
  "data": [
    {
      "paragraphs": [
        {
          "qas": [
            {
              "id": "52bf208003868f1b06000019_002",
              "question": "What is the inheritance pattern of Li\u2013Fraumeni syndrome?",
              "answers": [
                {
                  "text": "autosomal dominant",
                  "answer_start": 213
                }
              ]
            }
          ],
          "context": "Balanced t(11;15)(q23;q15) in a TP53+/+ breast cancer patient from a Li-Fraumeni syndrome family. Li-Fraumeni Syndrome (LFS) is characterized by early-onset carcinogenesis involving multiple tumor types and shows autosomal dominant inheritance. Approximately 70% of LFS cases are due to germline mutations in the TP53 gene on chromosome 17p13.1. Mutations have also been found in the CHEK2 gene on chromosome 22q11, and others have been mapped to chromosome 11q23. While characterizing an LFS family with a documented defect in TP53, we found one family member who developed bilateral breast cancer at age 37 yet was homozygous for wild-type TP53. Her mother also developed early-onset primary bilateral breast cancer, and a sister had unilateral breast cancer and a soft tissue sarcoma. Cytogenetic analysis using fluorescence in situ hybridization of a primary skin fibroblast cell line revealed that the patient had a novel balanced reciprocal translocation between the long arms of chromosomes 11 and 15: t(11;15)(q23;q15). This translocation was not present in a primary skin fibroblast cell line from a brother with neuroblastoma, who was heterozygous for the TP53 mutation. There was no evidence of acute lymphoblastic leukemia in either the patient or her mother, although a nephew did develop leukemia and died in childhood. These data may implicate the region at breakpoint 11q23 and/or 15q15 as playing a significant role in predisposition to breast cancer development."
        }
      ]
    }
  ]
}

The error is

Using custom data configuration default
Downloading and preparing dataset json/default-43dfe5d134316dba (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/abashir/.cache/huggingface/datasets/json/default-43dfe5d134316dba/0.0.0/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9...
Traceback (most recent call last):
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 434, in incomplete_dir
    yield tmp_dir
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 476, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 553, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 897, in _prepare_split
    for key, table in utils.tqdm(generator, unit=" tables", leave=False, disable=not_verbose):
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/home/abashir/.cache/huggingface/modules/datasets_modules/datasets/json/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9/json.py", line 75, in _generate_tables
    parse_options=self.config.pa_parse_options,
  File "pyarrow/_json.pyx", line 247, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GW/Health-Corpus/work/UMLS/transformers/examples/question-answering/run_qa.py", line 495, in <module>
    main()
  File "/GW/Health-Corpus/work/UMLS/transformers/examples/question-answering/run_qa.py", line 222, in main
    datasets = load_dataset(extension, data_files=data_files, field="data")
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/load.py", line 611, in load_dataset
    ignore_verifications=ignore_verifications,
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 483, in download_and_prepare
    self._save_info()
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 440, in incomplete_dir
    shutil.rmtree(tmp_dir)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/shutil.py", line 498, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/shutil.py", line 496, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/abashir/.cache/huggingface/datasets/json/default-43dfe5d134316dba/0.0.0/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.incomplete'
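My reading of the traceback: the second exception (OSError: Directory not empty) is raised only while cleaning up after the first one, so the ArrowInvalid error is the one to fix. That message means a single JSON object did not fit into one of pyarrow's read blocks. Line 222 of the traceback shows run_qa.py calling load_dataset(extension, data_files=data_files, field="data"). Assuming each entry under "data" is then parsed as one object, in nested SQuAD format that entry is a whole article with all of its paragraphs, which can get large. A minimal sketch of one workaround, under those assumptions: flatten the file to one small record per question, using the column layout of the Hub squad dataset (id, title, context, question, and answers as parallel text/answer_start lists). As far as I know, that layout is also what run_qa.py's preprocessing reads. flatten_squad and convert_file are hypothetical helper names; the output stays wrapped in {"data": [...]} so that field="data" still applies.

```python
import json
from typing import Dict, Iterator


def flatten_squad(squad: dict) -> Iterator[Dict]:
    """Yield one flat record per question from a nested SQuAD v1.1-style dict."""
    for article in squad["data"]:
        title = article.get("title", "")  # title is optional in the input
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                yield {
                    "id": qa["id"],
                    "title": title,
                    "context": context,
                    "question": qa["question"],
                    # answers as a dict of parallel lists, like the Hub "squad" dataset
                    "answers": {
                        "text": [a["text"] for a in qa["answers"]],
                        "answer_start": [a["answer_start"] for a in qa["answers"]],
                    },
                }


def convert_file(src_path: str, dst_path: str) -> int:
    """Rewrite a nested SQuAD file as {"data": [flat records]}; return the record count."""
    with open(src_path, encoding="utf-8") as f:
        records = list(flatten_squad(json.load(f)))
    with open(dst_path, "w", encoding="utf-8") as out:
        json.dump({"data": records}, out, ensure_ascii=False)
    return len(records)
```

For example, convert_file("train.json", "train_flat.json") and then pass train_flat.json as the training file. Before retrying, it may also help to delete the leftover .incomplete directory named in the OSError, since the failed cleanup left it in the cache.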

theudster · May 3, 2021, 10:03am

Hi! The link seems to be broken.

sgugger · May 3, 2021, 1:09pm

The example has moved here.
