B. Error rates

The interaction with a biometric system starts with enrolment, where the quality of the enrolment data is very important and significantly influences the system performance. Often several input samples (e.g. 3 or 5) are combined to create one biometric reference (or to verify the usability of the newly created biometric reference). The probability of a person not being able to enrol in a biometric system is called the Failure To Enrol rate (FTE). It is computed as the fraction of people who could not enrol in the system out of the complete group of people. The FTE rate includes people without fingers (for fingerprint systems), visually impaired people (for iris-based systems), etc.

For verification/identification attempts, the biometric input sample is obtained and its quality is checked. If the quality does not satisfy certain minimal quality requirements, the acquisition process must be repeated. If none of the repeated acquisitions yields a sufficiently good sample, the person cannot be identified/verified and such an attempt increases the Failure To Acquire (FTA) rate. Sometimes the minimal quality can be configured, and then it is clear that the stricter we are with the quality check, the better results we get during the biometric comparison and vice versa. The FTA rate can therefore be traded off against the biometric matching error rates.

Input samples of sufficient quality are processed by the biometric matching algorithm. The matching algorithm compares the input sample with a biometric reference (in the case of verification) or with a number of references (in the case of identification). The result of the matching algorithm is either correct or incorrect. If an error occurs, the resulting decision can either incorrectly refuse an authentic person (the so-called false non-match, FNM) or match an impostor with another person's biometric reference (the so-called false match, FM). What happens next depends on the system policy. In the case of a single-attempt scenario, the verification/identification ends. In the case of, for example, a three-attempt scenario, re-acquisition is possible if the person is not recognised (either a false non-match or a correct refusal of an impostor). The final result of an authentication/verification attempt is either a correct acceptance, a correct refusal, a false acceptance or a false rejection. In the case of a single-attempt scenario the FRR and FAR can be computed as:

FRR = FTE + (1 − FTE) · FTA + (1 − FTE) · (1 − FTA) · FNMR

FAR = (1 − FTE)² · (1 − FTA) · FMR

For the purpose of FAR computations the so-called zero-effort (also called random forgery) unauthorized authentication attempts are taken. In this case attackers do not actively change their biometric characteristics (for example, in the case of dynamic signature systems they sign as usual).

Sometimes the minimal quality required for a successful enrolment can be configured. It is, however, clear that the stricter we are with the quality control at the time of enrolment (i.e. the better the quality of the biometric reference), the better results we achieve later in verification/identification attempts and vice versa. Therefore the matching error rates can be traded off against the enrolment quality requirements.
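As a quick illustration, the sketch below (Python, illustrative only) simply evaluates the two single-attempt formulas above. The iris figures used are those reported for the Quota group in Table I below; since no FMR is listed there, the FMR value on the last line is a made-up placeholder, not a measured result.

```python
def frr_single_attempt(fte, fta, fnmr):
    """Single-attempt FRR = FTE + (1-FTE)*FTA + (1-FTE)*(1-FTA)*FNMR."""
    return fte + (1 - fte) * fta + (1 - fte) * (1 - fta) * fnmr

def far_single_attempt(fte, fta, fmr):
    """Single-attempt FAR = (1-FTE)^2 * (1-FTA) * FMR (zero-effort impostors)."""
    return (1 - fte) ** 2 * (1 - fta) * fmr

# Quota-group iris figures from Table I below (the table reports percentages).
fte, fta, fnmr = 0.1230, 0.0044, 0.0175
print(f"FRR = {frr_single_attempt(fte, fta, fnmr):.2%}")  # ~14.21 %, matching the FRR column

# FMR is not reported in Table I; 0.0001 (0.01 %) is a purely illustrative value.
print(f"FAR = {far_single_attempt(fte, fta, 0.0001):.4%}")
```

Note how the FRR column of Table I can be reproduced from the FTE, FTA and FNMR columns in this way, which is useful as a consistency check when reading test reports.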
In 2004 Atos Origin (commissioned by the UK Passport Service) ran a biometric trial. Facial, iris and fingerprint systems were tested in real conditions with 3 groups of participants: Quota (a representative sample of the population), Opportunistic (volunteers) and Disabled (several types of disabilities). The Quota and Disabled results are briefly summarised in Table I. For details (explanation of some of the results, shortcomings of the trial, etc.) see the final report of the trial [5].

TABLE I
THE ERROR RATES OF FACIAL, IRIS AND FINGERPRINT SYSTEMS IN A UK 2004 TRIAL [5]. ALL VALUES ARE EXPRESSED IN %.

              Face                         Iris                         Fingerprint
              FTE    FTA   FNMR   FRR      FTE    FTA   FNMR   FRR      FTE   FTA   FNMR   FRR
Quota         0.15   0.00  30.82  30.92    12.30  0.44   1.75  14.21    0.69  6.98  11.70  18.43
Disabled      2.27   0.00  51.57  52.67    39.00  0.68   8.22  44.39    3.91  3.14  16.35  22.14

The correct way to calculate error rates is to compute the error rates for each person who contributes to the tests and then to average the rates over the group of all the people (a weighted average corresponding to the target population can also be used). Otherwise the results can be biased by an unbalanced number of verification/identification attempts made by different people.

As we have seen, the accuracy/usability of biometric systems can be measured in terms of FTE, FTA, FMR, FNMR, FAR and FRR. When comparing different systems, typically only the resulting FR and FA rates are used. The FAR and FRR can be graphically expressed in a FAR-FRR graph, where both error rates are a function of the threshold value, or can be plotted in a ROC graph where the FAR is a function of the FRR or vice versa (thus eliminating the threshold value from the graph). Figures 1 and 2 give a simplified example of such graphs. The point where FAR and FRR have the same value is called the equal error rate (EER) or the crossover accuracy. Such a threshold does not have a particular importance, but the resulting EER can be used as a (rather simplified) performance value of a biometric system in evaluations.

Figure 1: FAR-FRR graph (idealized).

Figure 2: ROC graph (idealized).

Now let us review some real numbers. There are several types of tests [7] and not all the results are necessarily comparable. The American NIST regularly tests the accuracy of fingerprint and facial biometric systems. As an example of the results of their test effort we include here the ROC graph of facial biometric systems from 2006. The details of the NIST tests can be found at fingerprint.nist.gov and face.nist.gov.

Figure 3: The ROC graph of several facial recognition algorithms and human ability to recognise faces (FRVT 2006 run by NIST [35] for facial images with illumination changes).

C. Large scale systems

Designing a biometric system for a few data subjects (as users are called according to [23]) is relatively easy. Tuning a system for millions of data subjects is significantly more challenging. While the verification speed and accuracy are essentially the same for a system with 10 data subjects and for a system with 10 million data subjects, the identification mode makes the difference. In identification mode the biometric system can incorrectly reject the data subject (and this affects the false-negative identification-error rate, FNIR) or incorrectly accept an impostor (and this is measured by the false-positive identification-error rate, FPIR). In the case of a single-attempt scenario the values of FNIR and FPIR can be estimated from
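To illustrate why the number of enrolled data subjects matters, the following sketch uses the widely quoted zero-effort approximations FNIR ≈ FTA + (1 − FTA)·FNMR and FPIR ≈ (1 − FTA)·(1 − (1 − FMR)^N) for open-set identification against N independent references. These are standard rule-of-thumb estimates assumed here for illustration, not formulas taken from the trial data or from this paper's own derivation, and the verification-level error rates in the example are purely illustrative.

```python
def fnir_estimate(fta, fnmr):
    # Rule-of-thumb estimate: the enrolled data subject is missed if acquisition
    # fails or the comparison against his/her own reference is a false non-match.
    return fta + (1 - fta) * fnmr

def fpir_estimate(fta, fmr, n_references):
    # Rule-of-thumb estimate: a zero-effort impostor is falsely identified if the
    # sample is acquired and at least one of the N independent comparisons
    # produces a false match.
    return (1 - fta) * (1 - (1 - fmr) ** n_references)

fta, fnmr, fmr = 0.01, 0.02, 0.0001   # illustrative verification-level error rates
for n in (10, 10_000, 10_000_000):
    print(f"N = {n:>10}: FNIR ~ {fnir_estimate(fta, fnmr):.2%}, "
          f"FPIR ~ {fpir_estimate(fta, fmr, n):.2%}")
```

Under these assumptions the false-negative identification-error rate stays essentially constant, while the false-positive identification-error rate grows rapidly with the number of enrolled references, which is exactly what makes large-scale identification systems so much harder to tune than small ones.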