How to calculate statistical significance for the accuracy of two models?

I have two models (model1, model2), which are run on a test dataset of 5,800 instances.

So I show each data point to each model, and they generate a response.

If the response from the model matches the expected outcome, I count it as “1”; otherwise it is “0”. So it’s a binary outcome.

  • Accuracy from model1 is 68%.
  • Accuracy from model2 is 75%.

But I was told that the improvement of model2 over model1 is not statistically significant.

I was further asked to report a confidence interval from a two-sample difference-in-proportions test.
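For reference, here is a minimal sketch of what a two-sample difference-in-proportions test and its confidence interval could look like with the numbers above. It assumes the two accuracies come from independent samples of 5,800 instances each; since both models saw the same test set, that assumption ignores the pairing and is only an approximation. The function name `two_proportion_summary` is made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_summary(p1, p2, n1, n2, confidence=0.95):
    """z-test and confidence interval for p2 - p1, treating the samples as independent."""
    diff = p2 - p1

    # Unpooled standard error, used for the confidence interval.
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    # Pooled standard error, used for the test of H0: p1 == p2.
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled

    # Two-sided p-value; erfc keeps precision that 1 - cdf(z) loses for large |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, ci, z, p_value


# Accuracies and test-set size taken from the question.
diff, ci, z, p_value = two_proportion_summary(0.68, 0.75, 5800, 5800)
print(f"difference = {diff:.3f}")
print(f"95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
```

Under these assumptions the 95% interval for the difference comes out at roughly 0.054 to 0.086, which does not include zero.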

Also, what is the statistical significance of my result?

I don’t understand this query.

  • I ran inference on all the data points; it is not as if I tested on only a proportion of the test dataset.

Please note that I have read about confidence intervals and the paired samples t-test on different blogs and Khan Academy, and watched many YouTube videos.

But I can’t figure this out. All the videos apply these techniques to numeric values, not to a binary metric like the one above.

As an example, in this tutorial, Paired Samples T-Test (How to calculate and interpret) - YouTube, they applied it to weight loss. As a result, they were able to get a mean and standard deviation. But in my case it’s a binary outcome, i.e. 0 or 1.
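To illustrate the point about the mean and standard deviation: a list of 0/1 outcomes is still numeric, so its mean is the accuracy and its standard deviation is sqrt(p(1 - p)). Below is a rough sketch of a paired t statistic computed on per-instance differences. The two lists are tiny made-up examples, not my real results, and `paired_t_statistic` is a hypothetical helper name.

```python
import math
import statistics

# Hypothetical per-instance results (1 = correct, 0 = wrong).
# Toy data for illustration only, not the real 5,800 rows.
model1_correct = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
model2_correct = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]

# The mean of a 0/1 list is the accuracy; its population SD is sqrt(p * (1 - p)).
acc1 = statistics.mean(model1_correct)
sd1 = statistics.pstdev(model1_correct)


def paired_t_statistic(a, b):
    """t statistic for the mean per-instance difference b - a.

    Raises ZeroDivisionError if every difference is identical (zero variance).
    """
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


t_stat = paired_t_statistic(model1_correct, model2_correct)
print(f"accuracy of model1 = {acc1:.2f}, SD = {sd1:.3f}")
print(f"paired t statistic = {t_stat:.2f}")
```

With the real data, the two lists would each hold 5,800 entries in the same instance order. For paired binary outcomes like these, McNemar's test is a commonly used alternative to the paired t-test.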

So how do I calculate statistical significance for the accuracy of two models?

Can anyone please help me?

You can use a t-test.
look at this