I have two models (model1, model2) which are run on a test dataset of 5,800 instances.
So I show each data point to each model, and they generate a response.
If the response from the model matches the expected outcome, I count it as “1”; otherwise it is “0”. So it's a binary outcome.
- Accuracy from model1 is 68%.
- Accuracy from model2 is 75%.
But I was told that the improvement with model2 compared to model1 is not statistically significant.
I was further asked to report a confidence interval for a two-sample difference-in-proportions test.
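For context, here is a minimal sketch of what such an interval might look like for the numbers above. It assumes the simple Wald (normal-approximation) interval and treats the two sets of predictions as independent samples, which is only an approximation here because both models saw the same 5,800 instances. The function name `diff_proportion_ci` and the rounded counts of correct answers are my own illustration, not something given in the request.

```python
import math

def diff_proportion_ci(correct1, correct2, n1, n2, z=1.96):
    """Wald confidence interval for p2 - p1 (assumes independent samples).

    z=1.96 corresponds to an approximate 95% interval.
    """
    p1 = correct1 / n1
    p2 = correct2 / n2
    diff = p2 - p1
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

n = 5800
correct1 = round(0.68 * n)  # assumed count of correct answers for model1
correct2 = round(0.75 * n)  # assumed count of correct answers for model2

diff, low, high = diff_proportion_ci(correct1, correct2, n, n)
print(f"difference = {diff:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

If the interval excludes 0, the difference would be significant at roughly the 5% level under these assumptions. Because the same instances are used for both models, a paired method is probably the more appropriate choice.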
Also what is the statistical significance of my result?
I don’t understand this query.
- I ran inference on all the data points; it is not as if I only tested on a proportion of the test dataset.
Please note that I have read about confidence intervals and the Paired Samples T-Test on different blogs and Khan Academy, and watched many YouTube videos.
But I can't figure this out. All the videos apply these techniques to numeric values, not to a binary metric like the one above.
As an example, in this tutorial, Paired Samples T-Test (How to calculate and interpret) - YouTube, they applied it to weight loss. As a result, they were able to get a mean and a standard deviation. But in my case it's a binary outcome, i.e. 0 or 1.
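For what it's worth, a column of 0/1 outcomes still has a mean and a standard deviation: the mean is just the accuracy p, and the population standard deviation works out to sqrt(p(1 - p)). A small sketch with a made-up list of ten outcomes (hypothetical values, not my real data):

```python
import math
import statistics

# Hypothetical 0/1 correctness flags for ten test instances
outcomes = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

p = statistics.mean(outcomes)      # mean of 0/1 values = accuracy
sd = statistics.pstdev(outcomes)   # population standard deviation

print(f"accuracy (mean) = {p:.2f}")
print(f"std dev = {sd:.4f}, sqrt(p * (1 - p)) = {math.sqrt(p * (1 - p)):.4f}")
```

So the numeric formulas from the tutorials are not ruled out by a binary metric; the mean and spread simply take this special form.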
So how do I calculate statistical significance from the accuracies of the two models?
Can anyone please help me?
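One approach that seems to fit this setup (the same instances scored 0/1 by both models) is McNemar's test, which only looks at the instances where the two models disagree. Below is a minimal sketch using the chi-square approximation without continuity correction. The function name `mcnemar_test` and the two short lists are hypothetical, purely for illustration; on the real data the inputs would be the 5,800 per-instance 0/1 results of each model, in the same order.

```python
import math

def mcnemar_test(y1, y2):
    """McNemar's test on paired 0/1 outcomes (chi-square approximation).

    y1, y2: per-instance correctness (1 = correct, 0 = wrong), same order.
    Returns (b, c, chi2, p_value).
    """
    # b: model1 correct, model2 wrong; c: model1 wrong, model2 correct
    b = sum(1 for r1, r2 in zip(y1, y2) if r1 == 1 and r2 == 0)
    c = sum(1 for r1, r2 in zip(y1, y2) if r1 == 0 and r2 == 1)
    if b + c == 0:
        return b, c, 0.0, 1.0  # the models never disagree
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p_value

# Hypothetical toy data, NOT the real results
model1_correct = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
model2_correct = [1, 1, 1, 1, 1, 1, 1, 0, 0, 1]

b, c, chi2, p_value = mcnemar_test(model1_correct, model2_correct)
print(f"b = {b}, c = {c}, chi2 = {chi2:.2f}, p-value = {p_value:.3f}")
```

With so few disagreements as in this toy example, an exact binomial version of the test would be the safer choice; the chi-square approximation is more reasonable when the number of disagreements is large.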