Difference between dev and test in production environment

inhyeok · July 22, 2023, 7:43am

Hi. I’m currently developing a neural IR service. While developing it, I have become confused about the right way to use the dev and test sets.

We are at the beginning of the project, so we don’t have any model yet. I collected data and trained five models. During training, I evaluated each model on the dev set at the end of every epoch. After training, I tested the models on the test set. I got the results below. Which model should I choose for our service?

Model	Train	Dev	Test
A	1	2	5
A’	2	4	4
A’’	3	3	3
B	4	1	2
C	5	5	1

(Note that A’ and A’’ are hyper-parameter-tuned versions of model A, and each number is the model’s rank on a metric such as accuracy, with 1 being the best. For example, model A ranks higher than A’ on the dev set.)
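To make the question concrete, here is a minimal sketch of one common convention: pick the model by its dev rank and look at the test rank only afterwards, as a final check. The ranks are copied from the table above; the dictionary layout and the helper name `pick_by_dev` are assumptions made for illustration, and whether this convention is the right one for the service is exactly what is being asked.

```python
# Ranks copied from the table above (1 = best).
# The dict layout and `pick_by_dev` are illustrative, not from any library.
ranks = {
    "A":   {"train": 1, "dev": 2, "test": 5},
    "A'":  {"train": 2, "dev": 4, "test": 4},
    "A''": {"train": 3, "dev": 3, "test": 3},
    "B":   {"train": 4, "dev": 1, "test": 2},
    "C":   {"train": 5, "dev": 5, "test": 1},
}


def pick_by_dev(ranks):
    """Return the name of the model with the best (lowest) dev rank."""
    return min(ranks, key=lambda name: ranks[name]["dev"])


# Selection uses only the dev column; the test column is read afterwards.
chosen = pick_by_dev(ranks)
print(chosen, "dev rank:", ranks[chosen]["dev"], "test rank:", ranks[chosen]["test"])
```

Under this convention the choice is B, even though C ranks first on the test set, because the test column plays no part in the selection.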

I think the dev set and the test set do exactly the same thing here, so I don’t need both of them (the dev and test sets have the same distribution, and the models saw neither the dev set nor the test set during training). Am I correct?

We take a base model (B) from the above and want to improve it. One year later, I decide to upgrade the service.

Model	Train	Dev	Test
B (Base)	-	-	5
C	2	4	4
D	3	3	3
E	4	1	2
F	5	5	1

I train the new models on the train set and compute metrics on the dev set. Once I have the winner on the test set (which is F), I compare the performance of the base model (B) and the new winner on the test set. Am I correct?
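The two possible procedures can be written out as a short sketch, using the ranks from the second table (1 = best). The variable names are assumptions made for illustration. The sketch contrasts picking the new winner by dev rank with picking it by test rank, and then compares the pick against the base model B on the test set. It does not settle which procedure is correct.

```python
# Ranks copied from the second table (1 = best); B has no train/dev entry.
# All names below are illustrative.
ranks = {
    "B (Base)": {"dev": None, "test": 5},
    "C": {"dev": 4, "test": 4},
    "D": {"dev": 3, "test": 3},
    "E": {"dev": 1, "test": 2},
    "F": {"dev": 5, "test": 1},
}

# Only the newly trained models are candidates for selection.
candidates = {name: r for name, r in ranks.items() if r["dev"] is not None}

# Procedure 1: select on dev, then use test only for the final comparison.
winner_by_dev = min(candidates, key=lambda n: candidates[n]["dev"])

# Procedure 2: select directly on test (what the post describes).
winner_by_test = min(candidates, key=lambda n: candidates[n]["test"])

# Final comparison against the base model on the test set.
base_test = ranks["B (Base)"]["test"]
dev_pick_beats_base = ranks[winner_by_dev]["test"] < base_test
test_pick_beats_base = ranks[winner_by_test]["test"] < base_test

print("picked on dev:", winner_by_dev, "beats base:", dev_pick_beats_base)
print("picked on test:", winner_by_test, "beats base:", test_pick_beats_base)
```

With these ranks the two procedures disagree: selecting on dev gives E, selecting on test gives F, and both rank above B on the test set.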
