You could have a look at the implementations of the existing metrics available here in the datasets repo. You can even take one of the simpler ones, like accuracy or f1, as a base and then modify it for your case.
Instructions to load a custom metric are available on the documentation page.
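
To give a rough idea of what "using accuracy as a base" could look like, here is a minimal sketch of just the computation part. In the existing metric scripts, logic like this sits inside the `_compute(predictions, references)` method of a `datasets.Metric` subclass and returns a dict. The function name `compute_tolerant_accuracy` and the `tolerance` parameter are hypothetical, made up purely for illustration, so adapt them to whatever your case needs.

```python
from typing import Dict, Sequence


def compute_tolerant_accuracy(
    predictions: Sequence[float],
    references: Sequence[float],
    tolerance: float = 0.0,
) -> Dict[str, float]:
    """Fraction of predictions within `tolerance` of their reference.

    Hypothetical variant of accuracy: with tolerance=0.0 it reduces to
    plain accuracy (exact matches only).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if len(predictions) == 0:
        raise ValueError("cannot compute a metric on empty inputs")

    # Count predictions that land close enough to their reference value.
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predictions, references))

    # Return a dict mapping the metric name to its value.
    return {"tolerant_accuracy": hits / len(predictions)}
```

Once the computation behaves as expected on a few hand-checked inputs, you can move it into a copy of the accuracy script and load that script as described in the documentation.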