Call for Participation: SemEval 2022 Task 2 Multilingual Idiomaticity Detection and Sentence Embedding

harish · October 4, 2021, 7:47pm

Dear all,

We invite you to participate in the Multilingual Idiomaticity Detection and Sentence Embedding shared task, which is being held as part of SemEval 2022.

Subtask B is a novel task that is likely to be of interest to those working on language models.

All participants are invited to submit a system description paper. We are looking not only for top-performing models but also for interesting ideas and methods for addressing this problem.

Please do not hesitate to get in touch with any questions.

[Apologies for cross-posting.]

================================================

FIRST CALL FOR PARTICIPATION

SemEval 2022 Task 2 Multilingual Idiomaticity Detection and Sentence Embedding

We are excited to announce a SemEval 2022 task that seeks to encourage the development of methods for better identification and representation of idiomatic multiword expressions (MWEs).

Motivation

================================================

By and large, composing word representations has been successful in capturing the meaning of sentences. However, there is an important set of phrases, those which are idiomatic, that are inherently not compositional. Early attempts to represent idiomatic phrases in non-contextual embeddings involved extracting frequently occurring n-grams from text (such as “big fish”) and then learning a representation of each phrase from its context. However, because of data sparsity, the effectiveness of this method drops off significantly as the length of the idiomatic phrase increases. More recent studies show that even state-of-the-art pre-trained contextual models (e.g. BERT) cannot accurately represent idiomatic expressions.

Task Overview

================================================

Given this shortcoming in existing state-of-the-art models, this task (part of SemEval 2022) aims to detect and represent potentially idiomatic multiword expressions (MWEs) in English, Portuguese and Galician. The task consists of two subtasks, each available in two “settings”.

Participants are free to choose which subtasks and settings they take part in (see the section on each subtask for details). However, you cannot pick a subset of languages.

The two subtasks are:

Subtask A

A binary classification task aimed at determining whether a sentence contains an idiomatic expression.

Subtask B

This novel subtask requires models to output the correct Semantic Text Similarity (STS) scores between sentence pairs, whether or not either sentence contains an idiomatic expression. Participants must submit STS scores ranging from 0 (least similar) to 1 (most similar). This requires models to encode the meaning of idiomatic phrases correctly, so that a sentence containing an idiomatic phrase (e.g. “Who will he start a program with and will it lead to his own swan song?”) and the same sentence with the idiomatic phrase replaced by a literal paraphrase (e.g. “Who will he start a program with and will it lead to his own final performance?”) are semantically similar to each other and equally similar to any other sentence.
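To make the consistency requirement concrete, here is a minimal sketch in plain Python. It assumes that STS is computed as the cosine similarity between sentence embeddings and that negative values are clipped to 0 so scores stay in the 0 to 1 range; both are illustrative assumptions, not the task's official scoring. The function names and the toy vectors are hypothetical stand-ins for the output of a real sentence encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def sts_score(u, v):
    """Assumed STS score: cosine similarity clipped into [0, 1]."""
    return max(0.0, min(1.0, cosine_similarity(u, v)))


def consistency_gap(idiomatic, paraphrase, other):
    """How differently the idiomatic sentence and its literal paraphrase
    score against a third sentence; ideally this gap is 0."""
    return abs(sts_score(idiomatic, other) - sts_score(paraphrase, other))


# Hypothetical embeddings; a real system would obtain these from its model.
idiom_vec = [0.82, 0.10, 0.55]       # "... his own swan song?"
paraphrase_vec = [0.80, 0.12, 0.57]  # "... his own final performance?"
other_vec = [0.10, 0.90, 0.20]       # any other sentence

# Ideally close to 1: the two sentences mean the same thing.
print("idiom vs paraphrase:", sts_score(idiom_vec, paraphrase_vec))
# Ideally close to 0: both should be equally similar to the third sentence.
print("consistency gap:", consistency_gap(idiom_vec, paraphrase_vec, other_vec))
```

A model that represents the idiom well should push the first number towards 1 and the second towards 0 for any choice of third sentence.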

Important Dates

================================================

[NOW AVAILABLE] Training data available: September 3, 2021

Evaluation start: January 10, 2022

Evaluation end: (TBC) January 31, 2022

Paper submissions due: (TBC) February 23, 2022

Notification to authors: March 31, 2022

Organisation

================================================

Harish Tayyar Madabushi, University of Sheffield, UK.

Edward Gow-Smith, University of Sheffield, UK.

Marcos Garcia, Universidade de Santiago de Compostela, Spain.

Carolina Scarton, University of Sheffield, UK.

Marco Idiart, Federal University of Rio Grande do Sul, Brazil.

Aline Villavicencio, University of Sheffield, UK.

For more information, see: SemEval 2022 Task 2

elenajones · July 14, 2024, 4:37am

The next call for participation, titled “Call for Participation: SemEval 2022 Task 2 Multilingual Idioms Detection and Sentence Embedding”, is approaching in September 2024. This task focuses on developing models capable of detecting idiomatic expressions in multiple languages and creating effective sentence embeddings. Researchers and practitioners are invited to contribute their expertise and innovations to advance the understanding and processing of idiomatic language across different linguistic contexts. This is a valuable opportunity to engage with the global NLP community, share findings, and collaborate on cutting-edge solutions for multilingual idiomaticity detection.
