How to Measure Quality of Credit Scoring Models

Martin Řezáč, Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University, Kotlářská 2, 611 37 Brno, Czech Republic, mrezac@math.muni.cz
František Řezáč, Dept. of Finance, Faculty of Economics and Administration, Masaryk University, Lipová 41a, 602 00 Brno, Czech Republic, rezac@econ.muni.cz

Abstract
Credit scoring models are widely used to predict the probability of a client's default. The quality of a scoring model can be measured by quantitative indexes such as the Gini index, the K-S statistic, Lift, the Mahalanobis distance and the Information statistic. These indexes are used to compare several candidate models at the time of development as well as to monitor the quality of models after their deployment into real business. The paper first deals with the definition of a good/bad client, which is crucial for all further computations, and discusses the parameters affecting this definition. The main part is devoted to quality indexes based on distribution functions (Gini, K-S and Lift) and on density functions (Mahalanobis distance, Information statistic). It brings some interesting results connected to the Lift, especially the expression of the Lift by the cumulative distribution functions of scores of bad and all clients, which allows the Lift to be computed for any level of score. Kernel density estimates are used for the density-based indexes, namely for the Information statistic. We extend some known results for normally distributed scores, especially to the general case of unequal variances of scores. All proposed expressions are discussed and illustrated in figures. Overall, the application of all listed quality indexes, including the related computational issues, is illustrated in a case study based on real financial data.

JEL Classification: C10, C53, D81, G32
Keywords: Credit scoring, Quality indexes, Distribution function, Density function, Normally distributed scores

1. Introduction
Banks and other financial institutions receive thousands of credit applications every day (in the case of consumer credit it can be tens or hundreds of thousands every day). Since it is impossible to process them manually, these institutions widely use automatic systems for evaluating the creditworthiness of individuals who ask for credit. The assessment of the risk associated with the granting of credit has been underpinned by one of the most successful applications of statistics and operations research: credit scoring.

Credit scoring is the set of predictive models and their underlying techniques that aid financial institutions in the granting of credit. These techniques decide who will get credit, how much credit they should get, and what further strategies will enhance the profitability of the borrowers to the lenders. Credit scoring techniques assess the risk in lending to a particular client. They do not identify “good” or “bad” applications (where “bad” means that negative behaviour, e.g. default, is expected) on an individual basis; instead they forecast the probability that an applicant with any given score will be “good” or “bad”. These probabilities or scores, along with other business considerations such as expected approval rates, profit, churn, and losses, are then used as a basis for decision making.

Several modelling methods for credit scoring have been introduced during the last six decades. The best known and most widely used are logistic regression, classification trees, the linear programming approach and neural networks.

It is impossible to use a scoring model effectively without knowing how good it is. First, one needs to select the best model with regard to some measure of quality at the time of development. Second, one needs to monitor the quality after deployment into real business. The methodology of credit scoring models and some measures of their quality were discussed in surveys such as Hand and Henley (1997), Thomas (2000) or Crook et al. (2007).
Even ten years ago the list of good books devoted to credit scoring was not large. The situation has improved in the last decade: for instance, Anderson (2007), Crook et al. (2007), Siddiqi (2006), Thomas et al. (2002) and Thomas (2009) were published. Further remarks connected to credit scoring issues can be found there as well.

Although there are some books and articles in scientific journals, there is no comprehensive work devoted to the assessment of the quality of credit scoring models in its full complexity. For this reason we decided to summarize and extend the known results on this topic, from the definition of a good/bad client through the list of the most popular indexes to their expressions for normally distributed scores, generally with unequal variances of scores.

The indexes most used in practice are the Gini index, number one in Europe, and the KS, number one in North America, even though their use may not be optimal. A scoring model clearly needs to perform best near the expected cutoff value, so quality indexes should be judged from this point of view. The Gini index is a global measure, hence it cannot be used to assess local quality. The same holds for the mean difference D. The KS is ideal if the expected cutoff value is near the point where the KS is realized. Although the information statistic is a global measure of a model's quality, we propose to use graphs of $f_{diff}$, $f_{LR}$ and of their product to examine the local properties of a given model; in particular, we can focus on the region of scores where the cutoff is expected. Overall, the Lift seems to be the best choice for our purpose. Since we propose an expression of the Lift by the cumulative distribution functions of scores of bad and all clients, it is possible to compute the value of the Lift for any level of score.
The contribution of this paper to practice is a comprehensive overview of widely used techniques for assessing the quality of credit scoring models, including appropriate discussion. First, we discuss the definition of a good/bad client, which is crucial for further computation; the result of the quality assessment depends heavily on this definition. In the next section we review widely used quality indexes, their properties and mutual relationships, and extend some results connected to them. Furthermore, we extend known results for normally distributed scores. Finally, the application of all listed quality indexes, including the related computational issues, is illustrated in a case study based on real financial data.

The main contribution to the theory is the expression of the Lift by the cumulative distribution functions of scores of bad and all clients, and the expressions of selected indexes for normally distributed data: namely, the Gini index and Lift in the case of a common variance of scores, and the mean difference D, KS, Gini index, Lift and Information statistic in the general case, i.e. without the assumption of equal variances.

2. Definition of good/bad client
The most important step in building a predictive model is the correct definition of the dependent variable. In credit scoring it is necessary to define precisely what a good and a bad client is. Usually this definition is based on the client's number of days after the due date (days past due, DPD) and on the amount past due. For the amount past due we need to set a tolerance level, i.e. to decide what is considered a debt and what is not. A client may get into payment delay innocently (because of technical imperfections of the system), and it also makes no sense to regard a small amount past due (e.g. less than 3€) as a debt. Furthermore, it is necessary to determine the time horizon over which the previous two characteristics are tracked.
For example, a client is marked as good if he or she:
• has less than 60 DPD (with tolerance 3€) in the 6 months from the first due date,
• has less than 90 DPD (with tolerance 1€) ever.

The choice of these parameters depends greatly on the type of financial product (the parameters will certainly differ between small consumer loans with original maturities around one year and mortgages, which typically involve very large amounts with maturities of up to several decades) and on the further usage of this definition (credit scoring, fraud prevention, marketing, ...).

Another practical issue in the definition of a good client is the accumulation of several agreements. For example, a customer may be overdue on several contracts, with different days past due and different amounts. In this case, all amounts past due connected to the client at one particular point in time are usually added together and the maximum of the days past due is taken. This approach can be applied only in some cases, especially when complete accounting data are available. The situation is considerably more complex in the case of aggregated data.

In connection with the definition of a good client we can generally talk about the following types of clients:
• Good
• Bad
• Indeterminate
• Insufficient
• Excluded
• Rejected

The first two types have been discussed. The third type of client is on the border between good and bad clients and directly affects their definition. If we consider only DPD, clients with a high DPD (e.g. 90+) are typically identified as bad, and clients who are not delinquent (e.g. their DPD is less than 30 or equal to zero) are identified as good. Delinquent customers who have not exceeded the given DPD threshold are then considered indeterminate. When we use this type of client, we model very good clients against very bad ones. The consequence is a model with seemingly amazing predictive power.
However, this power drops immediately once the model is assessed on the whole population, where indeterminates are considered to be good. Thus the usage of this type of client is very disputable and usually does not lead to any improvement of the model's quality.

The next type, insufficient, typically covers clients with a very short history, which makes a correct definition of the dependent variable (good/bad client) impossible. The excluded clients are typically clients whose data are so wrong as to be misleading (e.g. frauds). They are also marked as “hard bad”. The second group of excluded clients consists of applicants who belong to a category that will not be assessed by a model (scorecard), e.g. VIPs. The meaning of a rejected client is obvious. See Anderson (2007), Thomas et al. (2002) or Thomas (2009) for more details.

Only good and bad clients are used for further model building. If we do not use the indeterminate category, set some tolerance level for the amount past due and resolve the issue of simultaneous contracts, two parameters affecting the good/bad definition remain: DPD and the time horizon. It is usually useful to build a set of models with varying levels of these parameters. Furthermore, it can be useful to develop a model with one good/bad definition and measure the model's quality with another. Scoring models developed on a harder definition (higher DPD, longer time horizon or measuring DPD on the first payment) should perform better than those developed on softer definitions (Witzany 2009). Likewise, a given scoring model should show higher performance if it is measured by a harder good/bad definition. If not, it usually means that something is wrong. Overall, developing and assessing credit scoring models on a definition that is as hard as is possible and reasonable should lead to the best performance.

3. Measuring the quality
Once the definition of a good/bad client and the client's score are available, it is possible to evaluate the quality of this score. If the score is the output of a predictive model (scoring function), then we evaluate the quality of this model. We can consider two basic types of quality indexes: first, indexes based on the cumulative distribution function, such as the Kolmogorov-Smirnov statistic, the Gini index and Lift; second, indexes based on the probability density function, such as the mean difference (Mahalanobis distance) and the Information statistic. For further available measures and appropriate remarks see Wilkie (2004), Giudici (2003) or Siddiqi (2006).

3.1. Indexes based on distribution function
Assume that a score $s$ is available for each client and introduce the following notation:

$$D_K = \begin{cases} 1, & \text{client is good,} \\ 0, & \text{otherwise.} \end{cases}$$

The empirical cumulative distribution functions (CDF) of scores of good (bad) clients are given by

$$F_{n.GOOD}(a) = \frac{1}{n} \sum_{i=1}^{N} I(s_i \le a \wedge D_K = 1), \tag{1}$$

$$F_{m.BAD}(a) = \frac{1}{m} \sum_{i=1}^{N} I(s_i \le a \wedge D_K = 0), \quad a \in [L, H], \tag{2}$$

where $s_i$ is the score of the $i$th client, $n$ is the number of good clients, $m$ is the number of bad clients and $I$ is the indicator function with $I(\text{true}) = 1$ and $I(\text{false}) = 0$. $L$ is the minimum value of the given score and $H$ is the maximum value. We denote the proportion of bad clients by $p_B = \frac{m}{n+m}$ and the proportion of good clients by $p_G = \frac{n}{n+m}$. Furthermore, the empirical distribution function of scores of all clients is given by

$$F_{N.ALL}(a) = \frac{1}{N} \sum_{i=1}^{N} I(s_i \le a), \quad a \in [L, H], \tag{3}$$

where $N = n + m$ is the number of all clients.

An often-used characteristic describing the quality of the model (scoring function) is the Kolmogorov-Smirnov statistic (K-S or KS). It is defined as

$$KS = \max_{a \in [L, H]} \left| F_{m.BAD}(a) - F_{n.GOOD}(a) \right|. \tag{4}$$

Figure 1 gives an example of estimated distribution functions of good and bad clients, including an estimate of the KS statistic.
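Definitions (1), (2) and (4) can be sketched in a few lines of code. The following is only an illustration under stated assumptions, not the computation used for Figure 1: the function name `ks_statistic` and its arguments are hypothetical, scores are passed as a plain list, the flag plays the role of $D_K$, and the maximum in (4) is searched over the observed score values only (the empirical CDFs are step functions, so they change nowhere else).

```python
from bisect import bisect_right


def ks_statistic(scores, is_good):
    """Return (KS, score at which it is attained) for labelled scores.

    scores  -- iterable of numeric scores s_i
    is_good -- iterable of flags D_K (truthy = good client, falsy = bad)
    """
    scores = list(scores)
    is_good = list(is_good)
    good = sorted(s for s, g in zip(scores, is_good) if g)
    bad = sorted(s for s, g in zip(scores, is_good) if not g)
    if not good or not bad:
        raise ValueError("need at least one good and one bad client")

    best_gap, best_score = 0.0, None
    # Empirical CDFs are step functions, so checking observed scores suffices.
    for a in sorted(set(scores)):
        f_good = bisect_right(good, a) / len(good)  # F_n.GOOD(a), eq. (1)
        f_bad = bisect_right(bad, a) / len(bad)     # F_m.BAD(a),  eq. (2)
        gap = abs(f_bad - f_good)                   # term maximised in eq. (4)
        if gap > best_gap:
            best_gap, best_score = gap, a
    return best_gap, best_score


# Toy data with perfect separation: all bads score below all goods.
ks, at_score = ks_statistic([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(ks, at_score)  # KS = 1.0, attained at score 3
```

Returning the score at which the maximum is attained is a design choice of this sketch; it makes it easy to compare that point with an expected cutoff.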
It can be seen in Figure 1, for example, that scores of around 2.5 and smaller cover approximately 30% of good clients and 70% of bad clients.

Figure 1: Distribution Functions, KS.

The Lorenz curve (LC), sometimes confused with the ROC curve (Receiver Operating Characteristic curve), can also be successfully used to show the discriminatory power of a scoring function, i.e. the ability to distinguish good and bad clients. The curve is given parametrically by

$$x = F_{m.BAD}(a), \quad y = F_{n.GOOD}(a), \quad a \in [L, H].$$

The definition and name (LC) are consistent with Müller and Rönz (2000). The same definition of the curve, but under the name ROC, can be found in Thomas et al. (2002). Siddiqi (2006) uses the name ROC for the curve with reversed axes and LC for the curve with the CDF of bad clients on the vertical axis and the CDF of all clients on the horizontal axis. Each point of the curve represents some value of the given score. If we take this value as the cutoff value, we can read off the proportions of rejected bad and good clients. An example of a Lorenz curve is given in Figure 2. We can see that by rejecting 20% of good clients we reject almost 60% of bad clients at the same time.

Figure 2: Lorenz Curve, Gini index.

In connection with the LC we consider the next quality measure, the Gini index. This index describes the global quality of a scoring function. It takes values between -1 and 1. The ideal model, i.e. a scoring function that perfectly separates good and bad clients, has a Gini index equal to 1. On the other hand, a model that assigns a random score to the client has a Gini index equal to 0. Negative values correspond to a model with reversed meaning of scores. Using the areas A and B marked in Figure 2, it can be defined as

$$Gini = \frac{A}{A + B} = 2A.$$

Given the previous notation, the actual calculation of the Gini index can be made using

$$Gini = 1 - \sum_{k=2}^{n+m} \left( F_{m.BAD_k} - F_{m.BAD_{k-1}} \right) \cdot \left( F_{n.GOOD_k} + F_{n.GOOD_{k-1}} \right), \tag{5}$$

where $F_{m.BAD_k}$ ($F_{n.GOOD_k}$) is the $k$th vector value of the empirical distribution function of bad (good) clients. For further details see Thomas et al. (2002), Siddiqi (2006) or Xu (2003).

The Gini index is a special case of Somers' D (Somers (1962)), which is an ordinal association measure defined in general as

$$D_{YX} = \frac{\tau_{XY}}{\tau_{XX}},$$

where $\tau_{XY}$ is Kendall's $\tau_a$, defined as

$$\tau_{XY} = E\left[ \mathrm{sign}(X_1 - X_2)\, \mathrm{sign}(Y_1 - Y_2) \right],$$

where $(X_1, Y_1)$, $(X_2, Y_2)$ are bivariate random variables sampled independently from the same population and $E[\cdot]$ denotes expectation. In our case, $X = 1$ if a client is good and $X = 0$ if the client is bad. The variable $Y$ represents scores. It can be found in Thomas (2009) that the Somers' D assessing the performance of a given credit scoring model, denoted $D_S$, can be calculated as

$$D_S = \frac{\sum_i g_i \sum_{j<i} b_j - \sum_i g_i \sum_{j>i} b_j}{n \cdot m}, \tag{6}$$

where $g_i$ ($b_j$) is the number of goods (bads) in the $i$th ($j$th) interval of scores. Furthermore, $D_S$ can be expressed by the Mann-Whitney U-statistic in the following way. Order the sample in increasing order of score and sum the ranks of the goods in the sequence. Let this sum be $R_G$. Then $D_S$ is given by

$$D_S = \frac{2U}{n \cdot m} - 1, \tag{7}$$

where $U$ is given by

$$U = R_G - \frac{1}{2} n (n + 1). \tag{8}$$

Some further details can be found in Nelsen (1998).

Another available type of quality assessment figure is the CAP (Cumulative Accuracy Profile). Other names used for this concept are Lift chart, Lift curve, Power curve or Dubbed curve. See Sobehart et al. (2000) or Thomas (2009) for more details. In this case we have the proportion of all clients ($F_{ALL}$) on the horizontal axis and the proportion of bad clients ($F_{BAD}$) on the vertical axis. An example of a Lift chart is displayed in Figure 3. The ideal model is now represented by the polyline from $[0, 0]$ through $[p_B, 1]$ to $[1, 1]$. The advantage of this figure is that one can easily read off the proportion of rejected bads against the proportion of all rejected clients. For example, in Figure 3 we can see that if we want to reject 70% of bads, we have to reject about 40% of all applicants.

Figure 3: CAP.

The CAP is called a Gains chart in marketing usage, see Berry and Linoff (2004).
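The rank-based route to $D_S$ in (7) and (8) is easy to turn into code. The sketch below is an illustration only: the function name `somers_d_from_ranks` is hypothetical, and tied scores are given average ranks (midranks), which is an assumption of this sketch because the text does not say how ties should be ranked.

```python
def somers_d_from_ranks(scores, is_good):
    """Somers' D (Gini index) via the Mann-Whitney U-statistic, eqs. (7)-(8).

    scores  -- sequence of numeric scores
    is_good -- sequence of flags (truthy = good client, falsy = bad)
    Ties are handled with average ranks (an assumption of this sketch).
    """
    scores = list(scores)
    is_good = list(is_good)
    total = len(scores)
    order = sorted(range(total), key=lambda i: scores[i])

    # Assign 1-based ranks in increasing order of score, averaging over ties.
    ranks = [0.0] * total
    i = 0
    while i < total:
        j = i
        while j + 1 < total and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    n = sum(1 for g in is_good if g)  # number of goods
    m = total - n                     # number of bads
    if n == 0 or m == 0:
        raise ValueError("need at least one good and one bad client")

    r_g = sum(r for r, g in zip(ranks, is_good) if g)  # rank sum of goods, R_G
    u = r_g - n * (n + 1) / 2.0                        # eq. (8)
    return 2.0 * u / (n * m) - 1.0                     # eq. (7)


# Goods at ranks 2 and 4: three concordant pairs and one discordant pair.
print(somers_d_from_ranks([1, 2, 3, 4], [0, 1, 0, 1]))  # 0.5
```

With midranks, a pair of one good and one bad client with equal scores contributes one half to $U$, so a model that gives every client the same score obtains a value of 0.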
In the marketing case, the horizontal axis represents the proportion of clients who can be addressed by some marketing offer and the vertical axis represents the proportion of clients who will accept the offer.

When we use the CAP instead of the LC, we can define the Accuracy Ratio (AR), see Thomas (2009). Again, it is defined by a ratio of areas. We have

$$AR = \frac{\text{area between the CAP curve and the diagonal}}{\text{area between the ideal model's CAP and the diagonal}}.$$

Although the ROC and CAP are not equivalent, the Gini index and AR are equal for any scoring model. A proof for discrete scores is given in Engelmann et al. (2003); for continuous scores it can be found in Thomas (2009).

In connection with the Gini index, the c-statistic (Siddiqi 2006) is defined as

$$c\text{-}stat = \frac{1 + Gini}{2}. \tag{9}$$

It represents the probability that a randomly selected good client has a higher score than a randomly selected bad client, i.e.

$$c\text{-}stat = P\left( s_1 \ge s_2 \mid D_{K_1} = 1 \wedge D_{K_2} = 0 \right).$$

It takes values from 0.5, for a random model, to 1, for the ideal model. Other names for the c-statistic can be found in the literature. One is Harrell's c, which is a reparameterization of Somers' D and is recommended in Harrell et al. (1996) as a general measure of the predictive power of a prognostic score arising from a medical test. Further details can be found in Newson (2006). It is also called AUROC, e.g. in Thomas (2009), or AUC, e.g. in Engelmann et al. (2003).

Another possible indicator of the quality of a scoring model is the cumulative Lift, which says how many times, at a given level of rejection, the scoring model is better than random selection (a random model). More precisely, it is the ratio of the proportion of bad clients among clients with a score not exceeding $a$, $a \in [L, H]$, to the proportion of bad clients in the general population. Formally, it can be expressed by

$$Lift(a) = \frac{BadRate(a)}{BadRate} = \frac{\dfrac{\sum_{i=1}^{N} I(s_i \le a \wedge D_K = 0)}{\sum_{i=1}^{N} I(s_i \le a)}}{\dfrac{\sum_{i=1}^{N} I(D_K = 0)}{\sum_{i=1}^{N} I(D_K = 0 \vee D_K = 1)}} = \frac{\dfrac{\sum_{i=1}^{N} I(s_i \le a \wedge D_K = 0)}{\sum_{i=1}^{N} I(s_i \le a)}}{\dfrac{m}{N}}.$$

It can be easily verified that the Lift can be equivalently expressed as

$$Lift(a) = \frac{F_{m.BAD}(a)}{F_{N.ALL}(a)}, \quad a \in [L, H]. \tag{10}$$

In practice, the Lift is calculated for 10%, 20%, ..., 100% of clients with the worst score. Let us demonstrate this procedure by the following example, taken from Coppock (2002). Assume that we have scores of 1000 clients, of which 50 are bad. The proportion of bad clients is 5%. Sort the clients according to score and split them into ten groups, i.e. divide them by deciles of score. In each group, in our case around 100 clients, count the bad clients. This gives their share in the group (Bad Rate). The absolute Lift in each group is then given by the ratio of the share of bad clients in the group to the proportion of bad clients in total. The cumulative Lift is given by the ratio of the share of bad clients in the groups up to the given group to the proportion of bad clients in total. See Table 1.

Table 1: Absolute and Cumulative Lift

| decile | # clients | # bad clients (abs.) | Bad rate (abs.) | abs. Lift | # bad clients (cum.) | Bad rate (cum.) | cum. Lift |
|---|---|---|---|---|---|---|---|
| 1 | 100 | 16 | 16.0% | 3.20 | 16 | 16.0% | 3.20 |
| 2 | 100 | 12 | 12.0% | 2.40 | 28 | 14.0% | 2.80 |
| 3 | 100 | 8 | 8.0% | 1.60 | 36 | 12.0% | 2.40 |
| 4 | 100 | 5 | 5.0% | 1.00 | 41 | 10.3% | 2.05 |
| 5 | 100 | 3 | 3.0% | 0.60 | 44 | 8.8% | 1.76 |
| 6 | 100 | 2 | 2.0% | 0.40 | 46 | 7.7% | 1.53 |
| 7 | 100 | 1 | 1.0% | 0.20 | 47 | 6.7% | 1.34 |
| 8 | 100 | 1 | 1.0% | 0.20 | 48 | 6.0% | 1.20 |
| 9 | 100 | 1 | 1.0% | 0.20 | 49 | 5.4% | 1.09 |
| 10 | 100 | 1 | 1.0% | 0.20 | 50 | 5.0% | 1.00 |
| All | 1000 | 50 | 5.0% | | | | |

In connection with the previous example we define

$$Lift_q = \frac{F_{m.BAD}\left( F_{N.ALL}^{-1}(q) \right)}{F_{N.ALL}\left( F_{N.ALL}^{-1}(q) \right)} = \frac{1}{q} F_{m.BAD}\left( F_{N.ALL}^{-1}(q) \right), \tag{11}$$

where $q$ represents the score level of the $100q\%$ worst scores and $F_{N.ALL}^{-1}(q)$ can be computed as

$$F_{N.ALL}^{-1}(q) = \min\left\{ a \in [L, H] : F_{N.ALL}(a) \ge q \right\}.$$

Since the expected reject rate is usually between 5% and 20%, $q$ is typically taken to be 0.1 (10%), i.e. we are interested in the discriminatory power of the scoring model at the point of the 10% worst scores. In this case we have

$$Lift_{10\%} = 10 \cdot F_{m.BAD}\left( F_{N.ALL}^{-1}(0.1) \right).$$

3.2. Indexes based on density function
Let $M_g$ and $M_b$ be the means of scores of good and bad clients, and $S_g$ and $S_b$ the standard deviations of scores of good and bad clients. Let $S$ be the pooled standard deviation of the good and bad clients, given by

$$S = \left( \frac{n S_g^2 + m S_b^2}{n + m} \right)^{\frac{1}{2}}.$$

Estimates of the mean and standard deviation of scores for all clients ($\mu_{ALL}$, $\sigma_{ALL}$) are given by

$$M = M_{ALL} = \frac{n M_g + m M_b}{n + m},$$

$$S_{ALL} = \left( \frac{n S_g^2 + m S_b^2 + n (M_g - M)^2 + m (M_b - M)^2}{n + m} \right)^{\frac{1}{2}}.$$

The first quality index based on the density function is the standardized difference between the means of the two groups of scores, i.e. scores of bad and good clients. Denote this mean difference by $D$, calculated as

$$D = \frac{M_g - M_b}{S}.$$

Generally, good clients are supposed to get high scores and bad clients low scores, so we expect that $M_g > M_b$ and hence that $D$ is positive. Another name for this concept is the Mahalanobis distance; see Thomas et al. (2002).

The second index based on densities is the information statistic (value) $I_{val}$, defined in Hand and Henley (1997) as

$$I_{val} = \int_{-\infty}^{\infty} \left( f_{GOOD}(x) - f_{BAD}(x) \right) \ln\left( \frac{f_{GOOD}(x)}{f_{BAD}(x)} \right) dx. \tag{12}$$

We propose to examine the decomposed form of the right-hand side expression. For this purpose we denote

$$f_{diff} = f_{GOOD}(x) - f_{BAD}(x), \quad f_{LR} = \ln\left( \frac{f_{GOOD}(x)}{f_{BAD}(x)} \right).$$

Although the information statistic is a global measure of a model's quality, one can use graphs of $f_{diff}$, $f_{LR}$ and of their product to examine the local properties of a given model; see section 4 for more details.

There are two basic ways to compute the value of this index. The first way is to create bins of scores and compute the index empirically from a table with counts of good and bad clients in those bins. The second way is to estimate the unknown densities using kernel smoothing theory and then compute the integral by a suitable numerical method.
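As a rough illustration of the first, binned way, the integral in (12) can be replaced by a sum over score intervals: in each interval, take the difference between the relative frequencies of good and bad clients and multiply it by the logarithm of their ratio. The sketch below is only one possible implementation; the function name `information_value` is hypothetical, and refusing empty bins is a choice of this sketch (the logarithm is undefined there), not something the text prescribes. The usage line applies it to the bad and good counts listed in Table 2.

```python
import math


def information_value(bad_counts, good_counts):
    """Empirical information value from per-interval counts.

    bad_counts[k]  -- number of bad clients in the k-th score interval
    good_counts[k] -- number of good clients in the k-th score interval
    """
    if len(bad_counts) != len(good_counts):
        raise ValueError("both count lists must cover the same intervals")
    m = sum(bad_counts)   # total number of bad clients
    n = sum(good_counts)  # total number of good clients

    total = 0.0
    for n0k, n1k in zip(bad_counts, good_counts):
        if n0k == 0 or n1k == 0:
            # ln(0) or division by zero: merge sparse intervals beforehand.
            raise ValueError("every interval needs both good and bad clients")
        share_bad = n0k / m
        share_good = n1k / n
        total += (share_good - share_bad) * math.log(share_good / share_bad)
    return total


# Counts of bad and good clients per score interval, as listed in Table 2.
bad = [1, 2, 8, 14, 10, 6, 4, 3, 1, 1]
good = [10, 15, 52, 93, 146, 247, 137, 105, 97, 48]
print(round(information_value(bad, good), 2))  # 0.68
```

Each term of the sum is non-negative, because the difference and the logarithm always have the same sign, so the result does not depend on which end of the score range is considered "bad".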
Let us have $m$ score values $s_{0,i}$, $i = 1, \ldots, m$, for bad clients and $n$ score values $s_{1,j}$, $j = 1, \ldots, n$, for good clients, and recall that $L$ denotes the minimum of all values and $H$ the maximum. Divide the interval $[L, H]$ into $r$ equal subintervals $[q_0, q_1], (q_1, q_2], \ldots, (q_{r-1}, q_r]$, where $q_0 = L$, $q_r = H$. Set

$$n_{0,k} = \sum_{i=1}^{m} I\left( s_{0,i} \in (q_{k-1}, q_k] \right), \quad n_{1,k} = \sum_{j=1}^{n} I\left( s_{1,j} \in (q_{k-1}, q_k] \right), \quad k = 1, \ldots, r,$$

the observed counts of bad and good clients in each interval. Then the empirical information value is calculated by

$$I_{val} = \sum_{k=1}^{r} \left( \frac{n_{1,k}}{n} - \frac{n_{0,k}}{m} \right) \ln\left( \frac{n_{1,k} \cdot m}{n_{0,k} \cdot n} \right). \tag{13}$$

Table 2 gives an example of the computational scheme for the information statistic in the case of discretized data.

Table 2: Informational Statistics

| score int. | # bad clients | # good clients | % bad [1] | % good [2] | [3] = [2] - [1] | [4] = [2] / [1] | [5] = ln[4] | [6] = [3] * [5] |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 10 | 2.0% | 1.1% | -0.01 | 0.53 | -0.64 | 0.01 |
| 2 | 2 | 15 | 4.0% | 1.6% | -0.02 | 0.39 | -0.93 | 0.02 |
| 3 | 8 | 52 | 16.0% | 5.5% | -0.11 | 0.34 | -1.07 | 0.11 |
| 4 | 14 | 93 | 28.0% | 9.8% | -0.18 | 0.35 | -1.05 | 0.19 |
| 5 | 10 | 146 | 20.0% | 15.4% | -0.05 | 0.77 | -0.26 | 0.01 |
| 6 | 6 | 247 | 12.0% | 26.0% | 0.14 | 2.17 | 0.77 | 0.11 |
| 7 | 4 | 137 | 8.0% | 14.4% | 0.06 | 1.80 | 0.59 | 0.04 |
| 8 | 3 | 105 | 6.0% | 11.1% | 0.05 | 1.84 | 0.61 | 0.03 |
| 9 | 1 | 97 | 2.0% | 10.2% | 0.08 | 5.11 | 1.63 | 0.13 |
| 10 | 1 | 48 | 2.0% | 5.1% | 0.03 | 2.53 | 0.93 | 0.03 |
| All | 50 | 950 | | | | | Info. Value | 0.68 |

The second and third columns contain the counts of bad and good clients. The next two columns, [1] and [2], contain the relative frequencies of bad and good clients in each score interval. The last four columns, [3] to [6], represent the mathematical operations employed in (13). Summing the last column [6] we get the information value.

Another way to compute this index is to estimate the relevant densities by kernel estimates. Consider $f_{GOOD}(x)$ and $f_{BAD}(x)$ to be the probability density functions of scores of good and bad clients, respectively.
The kernel density estimates are defined by

$$\tilde{f}_{GOOD}(x, h_1) = \frac{1}{n} \sum_{j=1}^{n} K_{h_1}(x - s_{1,j}), \quad \tilde{f}_{BAD}(x, h_0) = \frac{1}{m} \sum_{i=1}^{m} K_{h_0}(x - s_{0,i}),$$

where $K_{h_i}(x) = \frac{1}{h_i} K\left( \frac{x}{h_i} \right)$, $i = 0, 1$, and $K$ is some kernel function, e.g. the Epanechnikov kernel. For further details see Wand and Jones (1995). The bandwidth $h_i$ can be estimated by the maximal smoothing principle, see Terrell (1990) or Řezáč (2003), i.e.

$$h_{OS,k} = b_k \cdot \tilde{\sigma} \cdot n^{-\frac{1}{2k+1}},$$

where $k$ is the order of the kernel $K$, $b_k$ is a constant depending only on $k$, $\tilde{\sigma}$ is an appropriate estimate of the standard deviation and $n$ is the number of observations.

As the next step we need to estimate the final integral. We use the composite trapezoidal rule. Set

$$\tilde{f}_{IV}(x) = \left( \tilde{f}_{GOOD}(x, h_1) - \tilde{f}_{BAD}(x, h_0) \right) \ln\left( \frac{\tilde{f}_{GOOD}(x, h_1)}{\tilde{f}_{BAD}(x, h_0)} \right).$$

Then, for given $M + 1$ equidistant points $L = x_0, \ldots, x_M = H$ we obtain

$$I_{val} = \frac{H - L}{2M} \left( \tilde{f}_{IV}(L) + 2 \sum_{i=1}^{M-1} \tilde{f}_{IV}(x_i) + \tilde{f}_{IV}(H) \right). \tag{14}$$

For further details see Koláček and Řezáč (2010).

3.3 Some results for normally distributed scores
Assume that the scores of good and bad clients are each approximately normally distributed, i.e. we can write their densities as

$$f_{GOOD}(x) = \frac{1}{\sigma_g \sqrt{2\pi}}\, e^{-\frac{(x - \mu_g)^2}{2\sigma_g^2}}, \quad f_{BAD}(x) = \frac{1}{\sigma_b \sqrt{2\pi}}\, e^{-\frac{(x - \mu_b)^2}{2\sigma_b^2}}.$$

The values of $M_g$, $M_b$ and $S_g$, $S_b$ can be taken as estimates of $\mu_g$, $\mu_b$ and $\sigma_g$, $\sigma_b$. Finally, we assume that the standard deviations are equal to a common value $\sigma$. In practice, this assumption should be tested by an F-test.

The mean difference D (see Wilkie (2004)) is now defined as $D = \frac{\mu_g - \mu_b}{\sigma}$ and is calculated by

$$D = \frac{M_g - M_b}{S}. \tag{15}$$

The maximum difference between the cumulative distributions, denoted KS before, is calculated, as proposed in Wilkie (2004), at the point where the densities cross, halfway between the means.
The value of KS is therefore given by

$$KS = \Phi\left( \frac{D}{2} \right) - \Phi\left( -\frac{D}{2} \right) = 2 \cdot \Phi\left( \frac{D}{2} \right) - 1, \tag{16}$$

where $\Phi(\cdot)$ is the standard normal distribution function. We derived a formula for the Gini index. It can be expressed by

$$G = 2 \cdot \Phi\left( \frac{D}{\sqrt{2}} \right) - 1. \tag{17}$$

For the Lift statistic the computation is quite easy. Denoting by $\Phi^{-1}(\cdot)$ the standard normal quantile function and by $\Phi_{\mu, \sigma^2}(\cdot)$ the normal distribution function with expected value $\mu$ and variance $\sigma^2$, we have

$$Lift_q = \frac{1}{q} \cdot \Phi\left( \frac{\sigma_{ALL}}{\sigma} \cdot \Phi^{-1}(q) + p_G \cdot D \right).$$

The computational form is then

$$Lift_q = \frac{1}{q} \Phi\left( \frac{S_{ALL}}{S} \cdot \Phi^{-1}(q) + p_G \cdot D \right). \tag{18}$$

A couple of further interesting results are given in Wilkie (2004). One of them is that, under our assumptions of normality and equality of standard deviations, it holds that

$$I_{val} = D^2. \tag{19}$$

We derived expressions for all the mentioned indexes in the general case, i.e. without the assumption of equal variances. The mean difference now takes the form

$$D = \sqrt{2}\, D^*, \tag{20}$$

where

$$D^* = \frac{\mu_g - \mu_b}{\sqrt{\sigma_g^2 + \sigma_b^2}}.$$

The KS is given by

$$KS = \Phi\left( \frac{a}{b} \cdot D^* \sigma_b - \frac{1}{b} \cdot \sigma_g \sqrt{a^2 D^{*2} + 2 b \cdot c} \right) - \Phi\left( \frac{a}{b} \cdot D^* \sigma_g - \frac{1}{b} \cdot \sigma_b \sqrt{a^2 D^{*2} + 2 b \cdot c} \right),$$

where $a = \sqrt{\sigma_b^2 + \sigma_g^2}$, $b = \sigma_b^2 - \sigma_g^2$, $c = \ln\left( \frac{\sigma_b}{\sigma_g} \right)$. The empirical form can be expressed by

$$KS = \Phi\left( \frac{\sqrt{S_b^2 + S_g^2}}{S_b^2 - S_g^2} \cdot S_b D^* - \frac{1}{S_b^2 - S_g^2} \cdot S_g \sqrt{(S_b^2 + S_g^2) D^{*2} + 2 (S_b^2 - S_g^2) \ln\left( \frac{S_b}{S_g} \right)} \right) - \Phi\left( \frac{\sqrt{S_b^2 + S_g^2}}{S_b^2 - S_g^2} \cdot S_g D^* - \frac{1}{S_b^2 - S_g^2} \cdot S_b \sqrt{(S_b^2 + S_g^2) D^{*2} + 2 (S_b^2 - S_g^2) \ln\left( \frac{S_b}{S_g} \right)} \right). \tag{21}$$

The Gini coefficient can be expressed as

$$G = 2 \cdot \Phi\left( D^* \right) - 1. \tag{22}$$

The Lift is given by the formula

$$Lift_q = \frac{1}{q}\, \Phi_{\mu_b, \sigma_b^2}\left( \sigma_{ALL} \cdot \Phi^{-1}(q) + \mu_{ALL} \right) = \frac{1}{q}\, \Phi\left( \frac{\sigma_{ALL} \cdot \Phi^{-1}(q) + \mu_{ALL} - \mu_b}{\sigma_b} \right).$$

When we replace the theoretical means and standard deviations by their estimates we obtain

$$Lift_q = \frac{1}{q}\, \Phi\left( \frac{S_{ALL} \cdot \Phi^{-1}(q) + M - M_b}{S_b} \right). \tag{23}$$

Finally, the information statistic is given by

$$I_{val} = (A + 1) D^{*2} + A - 1, \tag{24}$$

where

$$A = \frac{1}{2} \left( \frac{\sigma_g^2}{\sigma_b^2} + \frac{\sigma_b^2}{\sigma_g^2} \right),$$

and in computational form

$$A = \frac{1}{2} \left( \frac{S_g^2}{S_b^2} + \frac{S_b^2}{S_g^2} \right).$$

A similar formula for this index can be found in Thomas (2009).

Some of these results are shown graphically in relation to $\mu_b$, $\mu_g$ and $\sigma_b^2$, $\sigma_g^2$ in the following figures. For Figures 4 to 7 we set $\mu_b = 0$, $\sigma_b^2 = 1$ and display the dependence of the examined characteristics on $\mu_g$ and $\sigma_g^2$. For Figure 8 we set $\mu_b = 0$, $\mu_g = 1$ and display the value of $I_{val}$ depending on $\sigma_b^2$, $\sigma_g^2$. The right-hand side of each figure is the contour graph of the surface shown on its left-hand side.

Figure 4: KS, $\mu_b = 0$, $\sigma_b^2 = 1$
Figure 5: Gini coefficient, $\mu_b = 0$, $\sigma_b^2 = 1$
Figure 6: Lift10%, $\mu_b = 0$, $\sigma_b^2 = 1$

It is evident from the figures that the KS statistic and the Gini react much more to a change of $\mu_g$ and are almost unchanged in the direction of $\sigma_g^2$. Their theoretical maximum, i.e. 1, is approximately reached at $\mu_g = 4$, which is the value where the probability densities of good and bad clients almost do not overlap and hence perfect separation of these two groups is reached. In the case of Lift10%, see Figure 6, a strong dependence on $\mu_g$ is evident. This time, however, the value of the index is also significantly affected by $\sigma_g^2$.

Figure 7: Ival, $\mu_b = 0$, $\sigma_b^2 = 1$
Figure 8: Ival, $\mu_b = 0$, $\mu_g = 1$

The information statistic is again much more responsive to a change of $\mu_g$ than to a change of $\sigma_g^2$, see Figure 7. But there is one significant difference. If $\sigma_g^2 < \sigma_b^2$, i.e. 12