How to calculate statistical significance for the accuracy of two models?

neo-benjamin · November 17, 2022, 5:27am

I have two models (model1, model2), both of which are run on the same test dataset of 5,800 instances.

So I show each data point to each model, and they generate a response.

If the response from the model matches the expected outcome, I count it as “1”; otherwise it is “0”. So it's a binary outcome.

But I was told that the improvement of model2 over model1 is not statistically significant.

I was further asked to report a confidence interval from a two-sample test for a difference in proportions.
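For reference, here is a minimal sketch of what such an interval can look like, assuming the standard Wald (normal-approximation) formula for the difference between two proportions. The function name `diff_proportion_ci` and the counts are hypothetical and not taken from my results. Note that this formula treats the two samples as independent, which is only an approximation when both models are scored on the same instances.

```python
import math

def diff_proportion_ci(correct1, correct2, n1, n2, z=1.96):
    """Wald confidence interval for p2 - p1, treating the samples as independent.

    z=1.96 corresponds to a 95% interval under the normal approximation.
    """
    p1 = correct1 / n1  # accuracy of model1
    p2 = correct2 / n2  # accuracy of model2
    diff = p2 - p1
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# Hypothetical counts for illustration only: 4,640 and 4,756 correct out of 5,800.
diff, low, high = diff_proportion_ci(4640, 4756, 5800, 5800)
print(f"difference = {diff:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```

If the interval excludes 0, the difference would be called significant at the corresponding level under these assumptions.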

Also what is the statistical significance of my result?

I don’t understand this query.

I ran inference with all the data points, and it is not like I only tested for a proportion of the test dataset.

Please note that I have read about confidence intervals and the paired samples t-test on different blogs and Khan Academy, and I have watched many YouTube videos.

But I can't figure this out. All the videos apply these techniques to numeric values, not to a binary metric like the one above.

As an example, in this tutorial, Paired Samples T-Test (How to calculate and interpret) - YouTube, they applied it to weight loss. As a result, they were able to get a mean and a standard deviation. But in my case it's a binary outcome, i.e. 0 or 1.

So how do I calculate statistical significance from the accuracy of two models?

Can anyone please help me?

savasy · November 22, 2022, 1:11pm

You can use a t-test
look at this
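
A rough sketch of how a paired t-test could be applied to 0/1 outcomes: take the per-instance difference (model2 minus model1), which is -1, 0, or 1, and test whether its mean differs from zero. The mean of those differences is exactly the accuracy difference. The function name `paired_t_test` and the toy data are hypothetical, and the p-value uses a normal approximation to the t distribution, which assumes a large sample such as the 5,800 instances here.

```python
import math

def paired_t_test(outcomes1, outcomes2):
    """Paired t-test on per-instance 0/1 correctness scores of two models."""
    if len(outcomes1) != len(outcomes2):
        raise ValueError("both models must be scored on the same instances")
    # Per-instance difference: -1, 0, or 1
    diffs = [b - a for a, b in zip(outcomes1, outcomes2)]
    n = len(diffs)
    mean_diff = sum(diffs) / n  # equals accuracy2 - accuracy1
    # Sample variance of the differences (n - 1 in the denominator)
    var = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var / n)
    if se == 0:
        raise ValueError("all differences are identical; the test is undefined")
    t_stat = mean_diff / se
    # Two-sided p-value via the normal approximation (assumes large n)
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return mean_diff, t_stat, p_value

# Hypothetical toy data for illustration only (5,800 instances):
# 4,500 both correct, 256 only model2 correct, 140 only model1 correct, 904 both wrong.
outcomes1 = [1] * 4500 + [0] * 256 + [1] * 140 + [0] * 904
outcomes2 = [1] * 4500 + [1] * 256 + [0] * 140 + [0] * 904

mean_diff, t_stat, p_value = paired_t_test(outcomes1, outcomes2)
print(f"accuracy difference = {mean_diff:.4f}, t = {t_stat:.2f}, p = {p_value:.2e}")
```

Only the instances where the two models disagree contribute to the statistic. McNemar's test is a common alternative for paired binary outcomes like these.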
