Hi HF community,
I wanted to ask whether anyone has come across an example of evaluating QA models with built-in Trainer functionality such as compute_metrics?
You can check out the new run_qa script. It does need a special subclass of Trainer because the post-processing is fairly complex, but it uses compute_metrics and the squad metric from the datasets library.
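For reference, the scoring step itself is small enough to sketch in plain Python. As far as I can tell, run_qa.py turns the post-processed predictions into {"id", "prediction_text"} dicts and the references into {"id", "answers"} dicts before calling the squad metric's compute(). The functions below are a simplified re-implementation of SQuAD v1.1 exact match and F1 for illustration only, not the datasets library's code; the names normalize_answer, token_f1 and squad_scores are chosen here.

```python
import collections
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    # Lower-case, drop punctuation, drop the articles a/an/the, collapse whitespace
    # (the normalization steps used by SQuAD v1.1 scoring).
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, truth):
    # Token-level F1 between one predicted answer and one gold answer.
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    # predictions: [{"id": ..., "prediction_text": ...}]
    # references:  [{"id": ..., "answers": {"text": [...], "answer_start": [...]}}]
    gold = {ref["id"]: ref["answers"]["text"] for ref in references}
    exact_total = 0.0
    f1_total = 0.0
    for pred in predictions:
        truths = gold[pred["id"]]
        answer = pred["prediction_text"]
        # Score against every gold answer and keep the best match.
        exact_total += max(float(normalize_answer(answer) == normalize_answer(t)) for t in truths)
        f1_total += max(token_f1(answer, t) for t in truths)
    n = len(predictions)
    return {"exact_match": 100.0 * exact_total / n, "f1": 100.0 * f1_total / n}
```

In a Trainer setup, compute_metrics(p) would then return something like squad_scores(p.predictions, p.label_ids) once the post-processing has produced those two lists; that wiring is an assumption based on how run_qa.py is structured, and the real script calls the library metric instead.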
I keep running into an error whenever I try to use custom datasets. I found this recent issue on the repo, but it's still open. I tried the suggested fixes, but no luck so far. My data files are formatted like SQuAD; here is an example:
{
  "data": [
    {
      "paragraphs": [
        {
          "qas": [
            {
              "id": "52bf208003868f1b06000019_002",
              "question": "What is the inheritance pattern of Li\u2013Fraumeni syndrome?",
              "answers": [
                {
                  "text": "autosomal dominant",
                  "answer_start": 213
                }
              ]
            }
          ],
          "context": "Balanced t(11;15)(q23;q15) in a TP53+/+ breast cancer patient from a Li-Fraumeni syndrome family. Li-Fraumeni Syndrome (LFS) is characterized by early-onset carcinogenesis involving multiple tumor types and shows autosomal dominant inheritance. Approximately 70% of LFS cases are due to germline mutations in the TP53 gene on chromosome 17p13.1. Mutations have also been found in the CHEK2 gene on chromosome 22q11, and others have been mapped to chromosome 11q23. While characterizing an LFS family with a documented defect in TP53, we found one family member who developed bilateral breast cancer at age 37 yet was homozygous for wild-type TP53. Her mother also developed early-onset primary bilateral breast cancer, and a sister had unilateral breast cancer and a soft tissue sarcoma. Cytogenetic analysis using fluorescence in situ hybridization of a primary skin fibroblast cell line revealed that the patient had a novel balanced reciprocal translocation between the long arms of chromosomes 11 and 15: t(11;15)(q23;q15). This translocation was not present in a primary skin fibroblast cell line from a brother with neuroblastoma, who was heterozygous for the TP53 mutation. There was no evidence of acute lymphoblastic leukemia in either the patient or her mother, although a nephew did develop leukemia and died in childhood. These data may implicate the region at breakpoint 11q23 and/or 15q15 as playing a significant role in predisposition to breast cancer development."
        },
The error is:
Using custom data configuration default
Downloading and preparing dataset json/default-43dfe5d134316dba (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /home/abashir/.cache/huggingface/datasets/json/default-43dfe5d134316dba/0.0.0/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9...
Traceback (most recent call last):
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 434, in incomplete_dir
    yield tmp_dir
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 476, in download_and_prepare
    dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 553, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 897, in _prepare_split
    for key, table in utils.tqdm(generator, unit=" tables", leave=False, disable=not_verbose):
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/tqdm/std.py", line 1130, in __iter__
    for obj in iterable:
  File "/home/abashir/.cache/huggingface/modules/datasets_modules/datasets/json/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9/json.py", line 75, in _generate_tables
    parse_options=self.config.pa_parse_options,
  File "pyarrow/_json.pyx", line 247, in pyarrow._json.read_json
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: straddling object straddles two block boundaries (try to increase block size?)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/GW/Health-Corpus/work/UMLS/transformers/examples/question-answering/run_qa.py", line 495, in <module>
    main()
  File "/GW/Health-Corpus/work/UMLS/transformers/examples/question-answering/run_qa.py", line 222, in main
    datasets = load_dataset(extension, data_files=data_files, field="data")
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/load.py", line 611, in load_dataset
    ignore_verifications=ignore_verifications,
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 483, in download_and_prepare
    self._save_info()
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/site-packages/datasets/builder.py", line 440, in incomplete_dir
    shutil.rmtree(tmp_dir)
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/shutil.py", line 498, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/home/abashir/anaconda3/envs/mpi/lib/python3.7/shutil.py", line 496, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '/home/abashir/.cache/huggingface/datasets/json/default-43dfe5d134316dba/0.0.0/fb88b12bd94767cb0cc7eedcd82ea1f402d2162addc03a37e81d4f8dc7313ad9.incomplete'
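One workaround worth trying, offered as an assumption rather than a confirmed fix: the traceback shows run_qa.py loading the file with load_dataset(extension, data_files=data_files, field="data"), so each entry of "data" (here, a whole article with all of its paragraphs and questions) apparently ends up as a single row, and one large nested row may be what trips pyarrow's block-size limit. Flattening the file to one question per record, with question, context and answers as top-level fields, keeps each row small and, as far as I can tell, matches the column names run_qa.py looks for. flatten_squad and convert_file are hypothetical helper names.

```python
import json


def flatten_squad(squad):
    # Convert nested SQuAD-style JSON ({"data": [{"paragraphs": [{"context", "qas"}]}]})
    # into flat records: one question per record, with its context repeated.
    records = []
    for article in squad["data"]:
        title = article.get("title", "")  # the example above has no "title" key
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                records.append({
                    "id": qa["id"],
                    "title": title,
                    "context": context,
                    "question": qa["question"],
                    # Column-style answers (lists of texts and start offsets),
                    # mirroring how the "squad" dataset stores them.
                    "answers": {
                        "text": [a["text"] for a in qa["answers"]],
                        "answer_start": [a["answer_start"] for a in qa["answers"]],
                    },
                })
    return records


def convert_file(in_path, out_path):
    # Hypothetical helper: read a nested SQuAD-style file and write the flat
    # records back under a top-level "data" key, so a loading call that uses
    # field="data" (as in the traceback) still finds them.
    with open(in_path, encoding="utf-8") as f:
        squad = json.load(f)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"data": flatten_squad(squad)}, f)
```

For example, convert_file("train.json", "train_flat.json") (hypothetical file names) and then pass the flattened file as the training file. Repeating the context for every question makes the file larger, but each record stays self-contained and small.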
Hi! The link seems to be broken.
The example has moved here.