How to determine if a sentence is correct?

Is there any way to calculate whether a sentence is correct? I have tried calculating sentence perplexity with GPT-2, as described here: GPT-2 Perplexity Score Normalized on Sentence Lenght?

With that approach I get quite close scores, even though the second sentence is obviously wrong in every way (perplexity after each sentence; lower means more likely):
I am a man. 50.63967
I is an man. 230.10565
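For reference, a GPT-2 perplexity score like the ones above can be sketched as follows. This assumes the Hugging Face transformers `GPT2LMHeadModel`/`GPT2TokenizerFast` classes and the default `gpt2` checkpoint; the helper names `perplexity_from_nlls` and `gpt2_perplexity` are hypothetical, and this is not necessarily how the numbers above were produced (for example, prepending `<|endoftext|>` changes the score). Perplexity is exp of the mean negative log-likelihood per predicted token, so it is already normalized for sentence length.

```python
import math
from typing import Sequence


def perplexity_from_nlls(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per predicted token)."""
    if len(token_nlls) == 0:
        raise ValueError("need at least one scored token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def gpt2_perplexity(sentence: str, model_name: str = "gpt2") -> float:
    """Score one sentence with GPT-2 (downloads the checkpoint on first use)."""
    # Imported lazily so perplexity_from_nlls works without torch/transformers.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    # Assumption: prepend <|endoftext|> so the first real token also gets a prediction.
    ids = tokenizer(tokenizer.bos_token + sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

    # The logits at position t predict token t + 1, so shift by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nlls = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    return perplexity_from_nlls(nlls.tolist())
```

Calling `gpt2_perplexity("I am a man.")` and `gpt2_perplexity("I is an man.")` should give the grammatical sentence the lower score. Because perplexity measures how likely a string is rather than whether it follows grammar rules, any cut-off between "correct" and "incorrect" is a heuristic that would have to be tuned.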

Is there any other way to determine whether a sentence is correct? These two scores seem too close to rely on.

Maybe I could fine-tune T5 on labeled examples, if such a training set exists?

I have also built some huge 3-gram and 4-gram models from around 800 GB of text, but they seem useless: I still can't tell whether a sentence is good or not.
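To illustrate how an n-gram model scores a sentence, here is a toy trigram model that reports per-token perplexity rather than raw sentence probability (which shrinks with every extra word). It assumes lowercased, whitespace-tokenized input and uses simple add-k smoothing; the class name, `k`, and the three-sentence corpus are illustrative only, and large-scale n-gram toolkits usually use better smoothing such as Kneser-Ney.

```python
import math
from collections import Counter
from typing import Iterable, List


class TrigramModel:
    """Toy add-k smoothed trigram language model (illustrative only)."""

    def __init__(self, sentences: Iterable[List[str]], k: float = 0.1):
        self.k = k
        self.trigrams = Counter()
        self.contexts = Counter()  # counts of (a, b) used as a trigram context
        self.vocab = set()
        for tokens in sentences:
            padded = ["<s>", "<s>"] + tokens + ["</s>"]
            self.vocab.update(padded)
            for a, b, c in zip(padded, padded[1:], padded[2:]):
                self.trigrams[(a, b, c)] += 1
                self.contexts[(a, b)] += 1

    def logprob(self, a: str, b: str, c: str) -> float:
        # log P(c | a, b) with add-k smoothing; +1 in v reserves mass for unknown words.
        v = len(self.vocab) + 1
        num = self.trigrams[(a, b, c)] + self.k
        den = self.contexts[(a, b)] + self.k * v
        return math.log(num / den)

    def perplexity(self, tokens: List[str]) -> float:
        # Average over every predicted token, including the end marker.
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        lps = [self.logprob(a, b, c) for a, b, c in zip(padded, padded[1:], padded[2:])]
        return math.exp(-sum(lps) / len(lps))


corpus = [s.split() for s in ["i am a man", "you are a woman", "i am a woman"]]
lm = TrigramModel(corpus)
print(lm.perplexity("i am a man".split()))
print(lm.perplexity("i is an man".split()))
```

With a corpus this small, almost every trigram in "i is an man" is unseen, so its score comes almost entirely from the smoothing constant. In a large model the same happens to rare but perfectly grammatical word combinations, which is one plausible reason raw n-gram scores look uninformative.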

Although I cannot vouch for their quality, there are a number of grammar correction models on the Model Hub: Models - Hugging Face

They seem to be fine-tuned T5 or GPT models, as you suggested. However, there is no guarantee that a model's output is 100% grammatically correct. I think a rule-based approach suits grammar best, since grammar mostly follows well-defined rules.
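As a toy illustration of what a rule-based check looks like (not any particular library's API), the two hypothetical rules below catch exactly the errors in the example sentences from the question: pronoun/"be" agreement and an a/an article rule. Real rule-based checkers encode far more rules and usually rely on part-of-speech tagging rather than word lists.

```python
import re
from typing import List

# Hypothetical hand-written rules for illustration only.
AGREEMENT = {
    "i": {"am", "was"},
    "he": {"is", "was"},
    "she": {"is", "was"},
    "it": {"is", "was"},
    "we": {"are", "were"},
    "you": {"are", "were"},
    "they": {"are", "were"},
}
BE_FORMS = {"am", "is", "are", "was", "were"}


def check(sentence: str) -> List[str]:
    """Return a list of rule violations found in the sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    lower = [w.lower() for w in words]
    errors = []
    for i, (w, nxt) in enumerate(zip(lower, lower[1:])):
        pair = f"'{words[i]} {words[i + 1]}'"
        # Rule 1: a pronoun followed by a form of "be" must agree with it.
        if w in AGREEMENT and nxt in BE_FORMS and nxt not in AGREEMENT[w]:
            errors.append(f"agreement: {pair}")
        # Rule 2: crude article rule based on the next word's first letter.
        starts_with_vowel = nxt[0] in "aeiou"
        if w == "a" and starts_with_vowel:
            errors.append(f"article: {pair} should use 'an'")
        elif w == "an" and not starts_with_vowel:
            errors.append(f"article: {pair} should use 'a'")
    return errors


print(check("I am a man."))    # no violations
print(check("I is an man."))   # agreement and article violations
```

The article rule also shows the weakness of hand-written rules: it works on spelling, not pronunciation, so it would wrongly flag "an hour" and "a university".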

Hi,

The task you are referring to is one of the subtasks of the GLUE benchmark (an important benchmark in NLP): the CoLA dataset (CoLA is short for Corpus of Linguistic Acceptability). This is a simple binary classification task: given a sentence, the model needs to determine whether the sentence is grammatically correct or not.

Hence, you can use a BERT model (or one of its variants, such as RoBERTa or DistilBERT) fine-tuned on this dataset. Such fine-tuned models are already available on the hub, for example this one.
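Using such a classifier is a standard sequence-classification forward pass: tokenize, run the model, and take the softmax probability of the "acceptable" class. A minimal sketch with transformers' `AutoTokenizer` and `AutoModelForSequenceClassification` follows; `MODEL_ID` is a hypothetical placeholder for whichever CoLA checkpoint you pick, and the label order assumes the GLUE convention (0 = unacceptable, 1 = acceptable), which is worth confirming in the checkpoint's `config.id2label`.

```python
import math
from typing import List, Sequence

# Hypothetical placeholder: substitute the ID of a CoLA-fine-tuned checkpoint.
MODEL_ID = "your-username/bert-base-finetuned-cola"


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def acceptability(logits: Sequence[float], acceptable_index: int = 1) -> float:
    """Probability of the 'acceptable' class (GLUE CoLA: 1 = acceptable)."""
    return softmax(logits)[acceptable_index]


def score_sentences(sentences: Sequence[str], model_id: str = MODEL_ID) -> List[float]:
    # Lazy imports: torch/transformers are only needed for real inference.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    batch = tokenizer(list(sentences), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (batch_size, 2)
    return [acceptability(row) for row in logits.tolist()]
```

For example, `score_sentences(["I am a man.", "I is an man."])` returns one probability per sentence; treating anything above 0.5 as "acceptable" is the usual default, though a different threshold may suit your data better.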

Rule-based grammar: which library would you suggest? But I really doubt such tools can determine whether a sentence is good or not. I found that CoLA models can sometimes determine it well.

It is good sometimes; I mean, it worked for shorter sentences. But there is no information on how to use it, only on how to load the model.