Use Case 1
Oleg Deev & Štefan Lyócsa
Masaryk University, FinTech Management

Review | Dissimilarity matrix | Similarity network - MST | Vertex-level attributes | Forecasting

Predict profitability of loans - 'building credit scoring model'
• Better credit models -> fewer defaults -> higher financial stability.
• Innovative financial services use (and require) new approaches.

'Starting September 8, 2017, we are rolling out the fifth-generation credit model, which further leverages machine learning and 10 years of LendingClub data to better assess and price credit risk.'
https://blog.lendingclub.com/lendingclubs-next-generation-credit-model/

• We have data on 4157 loans: loan amount, loan duration, ...
• We model profitability using different models:
  o OLS (benchmark model)
  o RIDGE model
  o LASSO model
  o Elastic Net model
• Using data from 4057 loans we build a model and forecast profitability of the last 100 loans.
• We evaluate models based on the mean squared error.
• We have data on 4157 loans: loan amount, loan duration, ...

[Figure: Distribution of returns (histogram; x-axis: Return)]

• We model profitability using different models:
  o OLS (benchmark model), $\alpha = 0$, $\lambda = 0$
  o RIDGE model, $\alpha = 0.0$
  o LASSO model, $\alpha = 1.0$
  o Elastic Net model, $0.0 < \alpha < 1.0$

$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \left( \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j| \right)$$

• Using data from 4057 loans we build a model and forecast profitability of the last 100 loans.

[Figure: Loans over time, split into 'Loans in the past' and 'Loans to predict']

• We evaluate models based on the mean squared error.

$$MSE = n^{-1} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• The idea is to extract new information from the data using network analysis and to include it in credit models.
• Network analysis meets econometric modeling.

We hope that models augmented with network variables will improve credit models, i.e. the prediction of a loan's profitability. To be able to compare models, we need to re-estimate the models from the previous session.

Opening the file UseCase1 in RStudio. Importing the data (DT).

We split the sample:
• NF = 100
• N = dim(DT)[1]
• S1 = DT[1:(N-NF),]
• S2 = DT[(N-NF+1):N,]

Now we estimate models: ...
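The split and the evaluation metric can be tried out on toy data before turning to the loan models. The following is only a minimal sketch in base R: the data frame `toy`, its columns and the helper `mse` are hypothetical and are not part of UseCase1.

```r
# Toy illustration of the out-of-sample design: fit on the first N - NF
# observations, forecast the last NF, and score with the mean squared error.
# The data frame `toy` is simulated; it is NOT the loan data set DT.
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$y <- 2 * toy$x + rnorm(50)

NF <- 10                        # observations held out for forecasting
N  <- nrow(toy)
train <- toy[1:(N - NF), ]      # plays the role of S1 ("loans in the past")
test  <- toy[(N - NF + 1):N, ]  # plays the role of S2 ("loans to predict")

fit  <- lm(y ~ x, data = train)
yhat <- predict(fit, newdata = test)

# Mean squared error of a forecast
mse <- function(yhat, ytrue) mean((yhat - ytrue)^2)

mse_lm   <- mse(yhat, test$y)
mse_mean <- mse(rep(mean(train$y), NF), test$y)  # naive benchmark: training mean
```

Comparing `mse_lm` with `mse_mean` mirrors the comparison of the competing models against the OLS benchmark.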
OLS model

Estimate model:
• m1 = lm(RR2 ~ new+ver3+ver4+lfi+lee+luk+lrs+lsk+age+undG+
    female+lamt+int+durm+educprim+educbasic+
    educvocat+educsec+msmar+msco+mssi+msdi+nrodep+
    espem+esfue+essem+esent+esret+dures+exper+
    linctot+noliab+lliatot+norli+noplo+lamountplo+
    lamntplr+lamteprl+nopearlyrep, data=S1)
Forecast loan returns:
• yhat = predict(m1, newdata=S2)
Calculate mean squared error:
• ytrue = S2$RR2
• OLS = mean((yhat-ytrue)^2)

LASSO model - estimation

Load program library:
• library(glmnet)
Define the matrix of input and output variables:
• vars = c('new','ver3','ver4','lfi','lee','luk','lrs','lsk','age',
    'undG','female','lamt','int','durm','educprim','educbasic',
    'educvocat','educsec','msmar','msco','mssi','msdi','nrodep',
    'espem','esfue','essem','esent','esret','dures','exper',
    'linctot','noliab','lliatot','norli','noplo','lamountplo',
    'lamntplr','lamteprl','nopearlyrep')
• indep = as.matrix(S1[,vars])
• dep = S1$RR2
Estimate model:
• m2 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=1)
Define the matrix of input variables from predicted loans:
• pred = as.matrix(S2[,vars])
Forecast loan returns:
• yhat = predict(m2,newx=pred,s=m2$lambda.1se)
Calculate mean squared error:
• ytrue = S2$RR2
• LASSO = mean((yhat-ytrue)^2)

RIDGE model

Estimate model:
• m3 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0)
Forecast loan returns:
• yhat = predict(m3,newx=pred,s=m3$lambda.1se)
Calculate mean squared error:
• ytrue = S2$RR2
• RIDGE = mean((yhat-ytrue)^2)

Elastic Net model

Estimate model:
• m4_25 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.25)
• m4_50 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.50)
• m4_75 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.75)
Forecast loan returns and calculate mean squared error, one model at a time:
• yhat = predict(m4_25,newx=pred,s=m4_25$lambda.1se)
• EN25 = mean((yhat-ytrue)^2)
• yhat = predict(m4_50,newx=pred,s=m4_50$lambda.1se)
• EN50 = mean((yhat-ytrue)^2)
• yhat = predict(m4_75,newx=pred,s=m4_75$lambda.1se)
• EN75 = mean((yhat-ytrue)^2)

Comparing forecast accuracy

          MSEs
EN50  870.3685
LASSO 871.5219
EN75  874.5317
EN25  874.8249
RIDGE 929.6943
OLS   995.6180

How similar are loans?
• Perhaps more risky loans share multiple features (e.g. higher loan amount, longer duration, past loans, ...).
• How to compare loans across many characteristics?

We use a distance metric:
• Let $x_i$ be a column vector of the attributes of the ith loan.
• Let A be a diagonal matrix with the standard deviations of the variables on the diagonal.
Distance metric:

$$d_{ij} = \sqrt{(x_i - x_j)^\top A^{-2} (x_i - x_j)}$$

Dissimilarity matrix is defined as: $d_{ij} \in D$

• Dissimilarity matrix is symmetric.

We (expert opinion) select variables that we think should be indicative of a bad loan:
• DMV = DT[,c('int','durm','linctot','noliab')]
Using the Euclidean distance metric on the standardized variables, we create the dissimilarity matrix D:
• DM = as.matrix(dist(scale(DMV)))

         1        2        3        4        5
1 0.000000 2.676694 1.074275 3.190169 1.236050
2 2.676694 0.000000 2.605000 5.334466 3.636566
3 1.074275 2.605000 0.000000 3.322493 1.264062
4 3.190169 5.334466 3.322493 0.000000 2.590660
5 1.236050 3.636566 1.264062 2.590660 0.000000

• Dissimilarity matrix is an adjacency matrix with weights. The larger the distance between loan i and loan j, the less similar the two loans are, i.e. the edge between the two loans is longer.
• Dissimilarity matrix leads to a complete graph. Everybody is connected to everybody. Way too complex.

A common strategy is to select a 'suitable' sub-graph. From the complete graph we can extract the Minimum Spanning Tree.
A minimum spanning tree is a subset of the edges of a connected graph which satisfies the following:
• connects all vertices,
• there are no cycles,
• with a minimum possible total distance.
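The three conditions can be checked by hand on the five loans whose distances are printed above. The sketch below is a base-R illustration only: it uses Prim's algorithm (not Kruskal's, and not igraph's `mst()`), and the function name `prim_mst` is hypothetical.

```r
# Dissimilarities between the first five loans, copied from the slide above
D <- matrix(c(0.000000, 2.676694, 1.074275, 3.190169, 1.236050,
              2.676694, 0.000000, 2.605000, 5.334466, 3.636566,
              1.074275, 2.605000, 0.000000, 3.322493, 1.264062,
              3.190169, 5.334466, 3.322493, 0.000000, 2.590660,
              1.236050, 3.636566, 1.264062, 2.590660, 0.000000),
            nrow = 5, byrow = TRUE)

# Prim's algorithm: grow the tree from vertex 1, always adding the shortest
# edge that links a vertex inside the tree to a vertex outside of it.
prim_mst <- function(D) {
  n <- nrow(D)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(NA_real_, nrow = n - 1, ncol = 3,
                  dimnames = list(NULL, c("from", "to", "dist")))
  for (k in seq_len(n - 1)) {
    # rows: vertices already in the tree, columns: vertices not yet in it
    cand <- D[in_tree, !in_tree, drop = FALSE]
    pos  <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    from <- which(in_tree)[pos[1]]
    to   <- which(!in_tree)[pos[2]]
    edges[k, ] <- c(from, to, D[from, to])
    in_tree[to] <- TRUE
  }
  edges
}

mst_edges  <- prim_mst(D)
mst_length <- sum(mst_edges[, "dist"])  # total distance of the tree
```

With n vertices the tree always has n - 1 edges, which is why it connects all loans without containing a cycle.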
For an animated illustration of Kruskal's algorithm, see en.wikipedia.org (Kruskal's algorithm).

• library(igraph)
We create a graph object in R:
• g = graph_from_adjacency_matrix(DM, mode = "undirected", weighted = TRUE)
We create a minimum spanning tree:
• g_mst = mst(g)
We visualize the MST:
• status = (DT$RR2<0)*1
• V(g_mst)$status = status
• V(g_mst)[status == 1]$color = "firebrick1"
• V(g_mst)[status == 0]$color = "lightgreen"
• plot(g_mst, graph = "MST", vertex.label=NA, vertex.size = 3, main = "MST of the P2P applicants networks")

[Figure: MST of the P2P applicants networks]

Vertex-level attributes:
• vertex degree
• vertex strength
• closeness, $c(x) = 1 / \sum_{y} d(x,y)$
• betweenness, $b(x) = \sum_{s \neq x \neq t} \sigma_{s,t}(x) / \sigma_{s,t}$
• Community detection - Louvain method

Louvain method creates communities while increasing the modularity of the network:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ w_{ij} - \frac{s_i s_j}{2m} \right] \delta(c_i, c_j)$$

Here, $w_{ij}$ is the weight of the edge between vertices i and j, m is the sum of all edge weights in the graph, $s_i$ is the sum of the weights of the edges attached to vertex i, and $\delta(c_i, c_j)$ is a delta function that returns 1 if $c_i = c_j$ (i.e. the two vertices belong to the same community) and 0 otherwise.
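To see what modularity rewards, it can be computed directly from a weighted adjacency matrix and a vector of community labels. This is a minimal sketch in base R on a hypothetical toy graph (two triangles joined by one edge); the function `modularity_q` and the object names are assumptions for illustration and are not igraph's implementation.

```r
# Modularity of a partition, computed from a symmetric weighted adjacency
# matrix W and a vector of community labels (one label per vertex).
modularity_q <- function(W, membership) {
  two_m <- sum(W)       # every undirected edge appears twice in W
  s     <- rowSums(W)   # vertex strength: sum of weights of attached edges
  same  <- outer(membership, membership, "==")  # TRUE if same community
  sum((W - outer(s, s) / two_m) * same) / two_m
}

# Toy graph: triangle 1-2-3 and triangle 4-5-6, linked by the edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
W <- matrix(0, 6, 6)
W[edges] <- 1
W[edges[, 2:1]] <- 1    # make the matrix symmetric

q_two <- modularity_q(W, c(1, 1, 1, 2, 2, 2))  # one community per triangle
q_one <- modularity_q(W, rep(1, 6))            # everything in one community
```

Splitting the graph along the single bridging edge gives a clearly positive modularity, while putting all vertices into one community gives zero; the Louvain method searches greedily for partitions of the first kind.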
Estimating vertex-level attributes in R

Degree centrality:
• DT$Deg = degree(g_mst)
Vertex strength:
• DT$Str = strength(g_mst)
Closeness centrality:
• DT$Clos = closeness(g_mst)*10^4
Betweenness centrality:
• DT$Bet = betweenness(g_mst)
Community detection via Louvain method:
• com = cluster_louvain(g_mst)
• length(unique(com$membership))
• CD = dummy(com$membership)
• DT = data.frame(DT,CD)

Split the sample again so that S1 and S2 contain the network variables:
• S1 = DT[1:(N-NF),]
• S2 = DT[(N-NF+1):N,]

Define the matrix of input and output variables:
• vars_n = c('new','ver3','ver4','lfi','lee','luk','lrs','lsk','age',
    'undG','female','lamt','int','durm','educprim','educbasic',
    'educvocat','educsec','msmar','msco','mssi','msdi','nrodep',
    'espem','esfue','essem','esent','esret','dures','exper',
    'linctot','noliab','lliatot','norli','noplo','lamountplo',
    'lamntplr','lamteprl','nopearlyrep','Deg','Str','Clos','Bet',
    paste('membership',1:124,sep=''))
• indep = as.matrix(S1[,vars_n])
• dep = S1$RR2

We have additional 124 communities - clusters! Suitable for shrinkage methods.

Define the matrix of input variables from predicted loans:
• pred = as.matrix(S2[,vars_n])
• ytrue = S2$RR2

Model estimation (LASSO):
• m5_L = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=1)
• coef(m5_L,s='lambda.1se')
Forecast loan returns:
• yhat = predict(m5_L,newx=pred,s=m5_L$lambda.1se)
Calculate mean squared error:
• LASSO_N = mean((yhat-ytrue)^2)

Model estimation (RIDGE):
• m5_R = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0)
• coef(m5_R,s='lambda.1se')
Forecast loan returns:
• yhat = predict(m5_R,newx=pred,s=m5_R$lambda.1se)
Calculate mean squared error:
• RIDGE_N = mean((yhat-ytrue)^2)

Model estimation (Elastic Net):
• m5_E25 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.25)
• m5_E50 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.50)
• m5_E75 = cv.glmnet(x=indep,y=dep,nfolds=30,alpha=0.75)
Forecast loan returns and calculate mean squared error, one model at a time:
• yhat = predict(m5_E25,newx=pred,s=m5_E25$lambda.1se)
• EN25N = mean((yhat-ytrue)^2)
• yhat = predict(m5_E50,newx=pred,s=m5_E50$lambda.1se)
• EN50N = mean((yhat-ytrue)^2)
• yhat = predict(m5_E75,newx=pred,s=m5_E75$lambda.1se)
• EN75N = mean((yhat-ytrue)^2)

Comparing forecast accuracy

Is the network approach worth the struggle?
• MSEs = c(OLS, LASSO, RIDGE, EN25, EN50, EN75, LASSO_N, RIDGE_N, EN25N, EN50N, EN75N)
• names(MSEs) = c('OLS', 'LASSO', 'RIDGE', 'EN25', 'EN50', 'EN75', 'LASSO_N', 'RIDGE_N', 'EN25N', 'EN50N', 'EN75N')
• MSEs = sort(MSEs)
• cbind(MSEs)
            MSEs
LASSO_N 856.3854
EN25N   858.0379
RIDGE_N 860.3076
EN50N   862.0838
EN75N   864.3380
EN50    870.3685
LASSO   871.5219
EN75    874.5317
EN25    874.8249
RIDGE   929.6943
OLS     995.6180
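The ranking can be condensed into the relative MSE reduction that each shrinkage model gains from the network variables. A minimal sketch in base R follows; the vector names `base`, `net` and `gain` are hypothetical, and the numbers are the MSEs reported in the table above.

```r
# MSEs without network variables (copied from the table above)
base <- c(OLS = 995.6180, LASSO = 871.5219, RIDGE = 929.6943,
          EN25 = 874.8249, EN50 = 870.3685, EN75 = 874.5317)

# MSEs of the same shrinkage models with network variables added
net <- c(LASSO = 856.3854, RIDGE = 860.3076,
         EN25 = 858.0379, EN50 = 862.0838, EN75 = 864.3380)

# Percentage reduction in MSE when network variables are added
gain <- 100 * (1 - net / base[names(net)])
round(gain, 2)

# Percentage reduction of the best network model relative to the OLS benchmark
gain_vs_ols <- 100 * (1 - min(net) / base["OLS"])
```

All five shrinkage models forecast better with the network variables, with the largest relative gain for RIDGE. The comparison rests on a single hold-out sample of 100 loans.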