Logistic regression
Session 3
Oleg Deev & Štefan Lyócsa
Masaryk University
FINTECH RISK MANAGEMENT
www.fintech-ho2020.eu

Outline: Events, OLS approach, Logistic regression approach, Case study

Assume that you are taking a 10-question test. Each question has 5 possible answers, exactly one of which is correct, and you must choose one. You do not know the answer to any of the questions, and you know nothing about the topic of the test (say, quantum field theory). How probable is it that you get at least 6 correct answers (and pass the test)? Make a guess...

• The probability of a correct answer is p = 0.20.
• Answering a question is a Bernoulli trial: 1 if correct, 0 if not.
• You have n = 10 trials.
• We are interested in x ≥ 6 correct answers.

The probability of at least 6 correct answers follows from the binomial probability distribution:

$P(X \geq 6) = \sum_{x=6}^{10} \binom{n}{x} p^{x} (1-p)^{n-x} = 0.00637$

If you study and raise the probability of a correct answer to p = 0.5, the probability of passing increases to 0.37695.

We would like to model a binary variable: success (1) or failure (0).
• What determines loan defaults?
• What influences consumers to buy a product?

Define a variable $DEF_i$ to be 1 if the interest earned is negative (a partially or fully defaulted loan), and 0 otherwise.
• Is it true that the higher the annualized interest rate on the loan, the higher the probability that it defaults?

Why not use simple linear regression? Let's see...

$DEF_i = \beta_0 + \beta_1 Int_i + u_i$
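Going back to the 10-question test from the opening slides, the two binomial probabilities quoted there can be checked in a few lines of R. This is only a sketch: the variable names (`n`, `k`, `p_pass_guess`, `p_pass_half`) are chosen here for illustration and do not come from the slides.

```r
# Probability of at least k correct answers out of n questions.
n <- 10   # number of questions
k <- 6    # minimum number of correct answers needed to pass

# Pure guessing (p = 0.20): sum the binomial pmf over x = k, ..., n.
p_pass_guess <- sum(dbinom(k:n, size = n, prob = 0.20))

# Same quantity from the upper tail of the binomial CDF, P(X > k - 1).
p_pass_guess_tail <- pbinom(k - 1, size = n, prob = 0.20, lower.tail = FALSE)

# After studying (p = 0.50).
p_pass_half <- sum(dbinom(k:n, size = n, prob = 0.50))

print(round(c(guess = p_pass_guess,
              guess_tail = p_pass_guess_tail,
              half = p_pass_half), 5))
```

Both routes for p = 0.20 should agree with the 0.00637 on the slide, and the p = 0.5 case with 0.37695.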
Estimating $DEF_i = \beta_0 + \beta_1 Int_i + u_i$ by OLS gives the following coefficients:

$DEF_i = -0.074 + 0.0078 \times Int_i + \hat{u}_i$

Instead of modeling the probability $p_i$, we could model the odds:

$ODDS_i = \frac{p_i}{1 - p_i}$

This removes the ceiling, because the odds have no upper bound, but they are still bounded below at 0. We could take the natural logarithm:

$\log(ODDS_i) = \log\left(\frac{p_i}{1 - p_i}\right) = logit(p_i)$

$logit(p_i) = \beta_0 + \beta_1 \times Int_i$

We can get back to the probability as:

$p_i = \frac{e^{\beta_0 + \beta_1 \times Int_i}}{1 + e^{\beta_0 + \beta_1 \times Int_i}}$

Coefficients are estimated using maximum likelihood. Each loan is a Bernoulli trial, with $y_i = 1$ if the loan defaulted and $y_i = 0$ if not. The coefficients maximize the log-likelihood of the observed outcomes (equivalently, they minimize $-LL$):

$\max_{\beta_0, \beta_1} LL(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$

Estimating the model leads to the following coefficients:

$\log\left(\frac{p_i}{1 - p_i}\right) = -4.35 + 0.085 \times Int_i$

A one-unit increase in the annualized interest rate increases the log odds of default by 0.085. It may be easier to transform the result to probabilities:

$p_i = \frac{e^{-4.35 + 0.085 \times Int_i}}{1 + e^{-4.35 + 0.085 \times Int_i}}$

The effect of the interest rate on the probability therefore depends on the level of the interest rate. It is non-linear.
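To make the contrast between the two fitted models concrete, the sketch below evaluates both sets of reported coefficients on a small grid of interest rates. The grid `int_rate` is a hypothetical choice for illustration and is not taken from the data set; only the four coefficients come from the slides.

```r
# Coefficients as reported on the slides.
ols_b0   <- -0.074; ols_b1   <- 0.0078   # linear (OLS) model
logit_b0 <- -4.35;  logit_b1 <- 0.085    # logistic regression

# Hypothetical annualized interest rates in percent (illustration only).
int_rate <- c(5, 10, 20, 40, 80)

# Linear model: fitted values are not restricted to the interval [0, 1].
p_ols <- ols_b0 + ols_b1 * int_rate

# Logistic model: the log odds are linear in the interest rate,
# the probability is recovered with the inverse logit.
log_odds <- logit_b0 + logit_b1 * int_rate
odds     <- exp(log_odds)
p_logit  <- odds / (1 + odds)            # same as plogis(log_odds)

print(data.frame(int_rate, p_ols, log_odds, odds, p_logit))
```

At a 5% interest rate the linear fit returns a negative "probability", while the logistic probabilities stay strictly between 0 and 1 for any interest rate.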
[Figure: predicted probability of default, $p_i = \frac{e^{-4.35 + 0.085 \times Int_i}}{1 + e^{-4.35 + 0.085 \times Int_i}}$, plotted against the annualized interest rate (%).]

How does the effect of the interest rate change with the level of the interest rate?
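One way to answer this question is to differentiate the fitted probability with respect to the interest rate, which gives $\frac{\partial p_i}{\partial Int_i} = \beta_1 \, p_i (1 - p_i)$. The sketch below evaluates this slope for the reported coefficients; the function names and the grid of interest rates are hypothetical and only serve the illustration.

```r
# Reported logistic regression coefficients.
b0 <- -4.35
b1 <- 0.085

# Fitted probability of default at a given annualized interest rate (in %).
p_default <- function(int_rate) plogis(b0 + b1 * int_rate)

# Slope of the probability curve: dp/dInt = b1 * p * (1 - p).
marginal_effect <- function(int_rate) {
  p <- p_default(int_rate)
  b1 * p * (1 - p)
}

# Hypothetical interest rates (illustration only).
int_rate <- c(10, 30, 50, 70)

effect_slope <- marginal_effect(int_rate)                    # analytic slope
effect_unit  <- p_default(int_rate + 1) - p_default(int_rate) # change for +1 unit

# The slope is largest where p = 0.5, i.e. where the log odds equal zero.
int_rate_peak <- -b0 / b1

print(data.frame(int_rate, p = p_default(int_rate), effect_slope, effect_unit))
print(c(int_rate_peak = int_rate_peak, max_slope = marginal_effect(int_rate_peak)))
```

The slope is small at low interest rates, grows as the fitted probability approaches 0.5, and never exceeds $\beta_1 / 4$.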
0.5)*1)
• TBL = table(predicted,true)
• TBL

• What is the accuracy of our predictions?
• sum(diag(TBL))/100
• What is the sensitivity of our predictions?
• TBL[1,1]/sum(TBL[,1])
• What is the specificity of our predictions?
• TBL[2,2]/sum(TBL[,2])

Both rates are computed column-wise, that is, conditional on the true class: sensitivity is the share of non-defaulted loans (class 0) that are classified correctly, and specificity is the share of defaulted loans (class 1) that are classified correctly.

We have high overall accuracy, but low specificity. Note that we classify a loan as defaulted if p > 0.5. What if 0.5 is not the right threshold?

Let's take a look at how our evaluation changes if we change the threshold.

[Figure: accuracy, sensitivity and specificity plotted against the classification threshold.]
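The evaluation steps above can be wrapped in a small function and repeated for several thresholds. This is a minimal sketch on made-up toy data: the vectors `true` and `p_hat`, the helper `classification_metrics` and the three thresholds are all hypothetical, and it assumes that a loan is classified as defaulted when its fitted default probability exceeds the threshold. Sensitivity and specificity are defined with class 0 as the reference class.

```r
# Hypothetical toy data: true outcomes (1 = default) and fitted default probabilities.
true  <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
p_hat <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.35, 0.55, 0.70, 0.90)

classification_metrics <- function(p_hat, true, threshold = 0.5) {
  # Fix both factor levels so that the table is always 2 x 2,
  # even if one class is never predicted at a given threshold.
  predicted <- factor((p_hat > threshold) * 1, levels = c(0, 1))
  actual    <- factor(true, levels = c(0, 1))
  TBL <- table(predicted, actual)   # rows: predicted, columns: true class

  c(threshold   = threshold,
    accuracy    = sum(diag(TBL)) / sum(TBL),
    # Share of true non-defaults (class 0) classified correctly.
    sensitivity = TBL["0", "0"] / sum(TBL[, "0"]),
    # Share of true defaults (class 1) classified correctly.
    specificity = TBL["1", "1"] / sum(TBL[, "1"]))
}

# Metrics at the default threshold of 0.5.
metrics_050 <- classification_metrics(p_hat, true, threshold = 0.5)
print(metrics_050)

# Repeat the evaluation for a few hypothetical thresholds.
thresholds <- c(0.25, 0.50, 0.75)
threshold_sweep <- t(sapply(thresholds, function(th) {
  classification_metrics(p_hat, true, th)
}))
print(threshold_sweep)
```

Raising the threshold flags fewer loans as defaults, so more non-defaults and fewer defaults are classified correctly; lowering it does the opposite. Which trade-off is acceptable depends on the relative cost of the two kinds of error.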