Use Case 2
Stefan Lyócsa
Masaryk University
FinTech

Contents: Decomposing Data Matrix into Factors; Factor Network based segmentation; Vertex-level attributes

Factor Network based segmentation

An edge between vertices $i$ and $j$ is defined as

$$g_{ij} = 1 \quad \text{if} \quad \Phi\left(\theta + (FF^{T})_{ij}\right) > \gamma,$$

and $g_{ij} = 0$ otherwise, where $\Phi$ is the cumulative distribution function of the normal distribution. We follow the previous literature and choose $\gamma = 0.10$ and $\theta = \Phi^{-1}(\ldots)$.

Create network in R

Import the data again (but do not delete anything from the previous Use Case). We arbitrarily select variables that we think might identify a bad loan:
• X = DT[, c('int', 'durm', 'linctot', 'noliab')]
Run the following function:
• AM = FN_SVD(X, p = 0.75, gam = 0.10)
• g = graph_from_adjacency_matrix(AM, mode = 'undirected', weighted = TRUE)
We can visualize the Network Factor Model:
• plot(g, graph = 'NFM', vertex.label = NA, vertex.size = 3, main = 'Network factor model of the P2P applicants networks')

[Figure: Network factor model of the P2P applicants networks]

Vertex-level attributes

• vertex degree,
• harmonic centrality,
• community detection (Louvain method).

To address the issue of isolated vertices, one can assume that the shortest distance between vertex $i$ and an isolated vertex $j$ is $\infty$, while conveniently assuming that $1/\infty = 0$. Harmonic centrality is therefore:

$$H_i = \sum_{j \neq i} \frac{1}{d(i,j)},$$

where $d(i,j)$ is the length of the shortest path from vertex $i$ to vertex $j$ in the network.
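The three vertex-level attributes can be illustrated on a toy graph. The sketch below is not the course function BVC; it is a minimal example under stated assumptions: the igraph package is installed, and the graph g_toy (a path 1-2-3 plus an isolated vertex 4) and all object names are hypothetical. It relies on the fact that R evaluates 1/Inf as 0, which matches the convention for isolated vertices described above.

```r
library(igraph)

# Hypothetical toy graph: path 1-2-3 plus an isolated vertex 4
g_toy <- make_empty_graph(n = 4, directed = FALSE)
g_toy <- add_edges(g_toy, c(1, 2, 2, 3))

# Vertex degree
deg_toy <- degree(g_toy)

# Shortest-path lengths; unreachable pairs are returned as Inf
d <- distances(g_toy)

# Harmonic centrality: sum over j != i of 1 / d(i, j)
inv_d <- 1 / d        # 1/Inf is 0, so isolated vertices contribute nothing
diag(inv_d) <- 0      # drop the j = i terms (1/0 would be Inf)
hac_toy <- rowSums(inv_d)

# Louvain community membership, one label per vertex
comm_toy <- membership(cluster_louvain(g_toy))

print(data.frame(Deg = deg_toy, Hac = hac_toy, Community = as.integer(comm_toy)))
```

The isolated vertex gets degree 0 and harmonic centrality 0, so it can stay in the data set instead of being dropped before modelling.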
Estimating vertex level attributes in R

The following function calculates centrality and community:
• NetDscr = BVC(g)
Now add these variables to the model:
• DT$Deg = NetDscr$VCentrality[, 1]
• DT$Hac = NetDscr$VCentrality[, 2]
• DT = data.frame(DT, NetDscr$Community)

Preparing data

Define the matrix of input and output variables:
• indep = as.matrix(DT[1:(N-NF), c('new', 'ver3', 'ver4', 'lfi', 'undG', 'female', 'lamt', 'int', 'durm', 'educprim', 'educbasic', 'educvocat', 'educsec', 'msmar', 'msco', 'mssi', 'msdi', 'nrodep', 'espem', 'esfue', 'essem', 'esent', 'esret', 'dures', 'exper', 'linctot', 'noliab', 'lliatot', 'norli', 'noplo', 'lamountplo', 'lamntplr', 'lamteprl', 'nopearlyrep', 'Deg', 'Hac', paste('g', …))])
• dep = DT[1:(N-NF), 'RR2']
• pred = as.matrix(DT[(N-NF+1):N, c('new', 'ver3', 'ver4', 'lfi', 'undG', 'female', 'lamt', 'int', 'durm', 'educprim', 'educbasic', 'educvocat', 'educsec', 'msmar', 'msco', 'mssi', 'msdi', 'nrodep', 'espem', 'esfue', 'essem', 'esent', 'esret', 'dures', 'exper', 'linctot', 'noliab', 'lliatot', 'norli', 'noplo', 'lamountplo', 'lamntplr', 'lamteprl', 'nopearlyrep', 'Deg', 'Hac', paste('g', …))])
• ytrue = DT[(N-NF+1):N, 'RR2']

LASSO model

Model estimation:
• m3_L = cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 1)
• coef(m3_L, s = 'lambda.1se')
Forecast loan returns:
• yhat = predict(m3_L, newx = pred, s = m3_L$lambda.1se)
Calculate mean squared error:
• LASSO_FN = mean((yhat - ytrue)^2)

RIDGE model

Model estimation:
• m3_R = cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 0)
• coef(m3_R, s = 'lambda.1se')
Forecast loan returns:
• yhat = predict(m3_R, newx = pred, s = m3_R$lambda.1se)
Calculate mean squared error:
• RIDGE_FN = mean((yhat - ytrue)^2)

Elastic net model

Model estimation:
• m3_E25 = cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 0.25)
• m3_E50 = cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 0.50)
• m3_E75 = cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 0.75)
Forecast loan returns and calculate the mean squared error of each model before computing the next forecast, because yhat is overwritten every time:
• yhat = predict(m3_E25, newx = pred, s = m3_E25$lambda.1se)
• EN25FN = mean((yhat - ytrue)^2)
• yhat = predict(m3_E50, newx = pred, s = m3_E50$lambda.1se)
• EN50FN = mean((yhat - ytrue)^2)
• yhat = predict(m3_E75, newx = pred, s = m3_E75$lambda.1se)
• EN75FN = mean((yhat - ytrue)^2)
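The LASSO, RIDGE and elastic net fits differ only in the value of alpha, so the same steps can be written once and applied to each value. The sketch below assumes the glmnet package is installed and uses simulated data in place of indep, dep, pred and ytrue; the objects x_all, y_all, train, test, alphas and mse are hypothetical names, and nfolds = 10 is chosen only to keep the toy example fast. Each forecast is evaluated inside the function, so no yhat is overwritten before its error is computed.

```r
library(glmnet)

set.seed(1)

# Simulated stand-ins for the loan data (hypothetical)
n <- 200
k <- 10
x_all <- matrix(rnorm(n * k), nrow = n, ncol = k)
y_all <- drop(x_all[, 1:3] %*% c(2, -1, 0.5)) + rnorm(n)

train <- 1:150      # plays the role of 1:(N-NF)
test  <- 151:n      # plays the role of (N-NF+1):N

# One cross-validated fit and one out-of-sample MSE per mixing parameter
alphas <- c(EN25 = 0.25, EN50 = 0.50, EN75 = 0.75)
mse <- sapply(alphas, function(a) {
  fit  <- cv.glmnet(x = x_all[train, ], y = y_all[train], nfolds = 10, alpha = a)
  yhat <- predict(fit, newx = x_all[test, ], s = "lambda.1se")
  mean((yhat - y_all[test])^2)
})

print(mse)
```

Adding alpha = 1 and alpha = 0 to alphas would cover the LASSO and RIDGE cases with the same function.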
Comparing forecast accuracy

Is the factor network approach worth the struggle?
• MSEs = c(OLS, LASSO, RIDGE, EN25, EN50, EN75, LASSO_N, RIDGE_N, EN25N, EN50N, EN75N, LASSO_FN, RIDGE_FN, EN25FN, EN50FN, EN75FN)
• names(MSEs) = c('OLS', 'LASSO', 'RIDGE', 'EN25', 'EN50', 'EN75', 'LASSO_N', 'RIDGE_N', 'EN25N', 'EN50N', 'EN75N', 'LASSO_FN', 'RIDGE_FN', 'EN25FN', 'EN50FN', 'EN75FN')
• MSEs = sort(MSEs)
• cbind(MSEs)

             MSEs
EN75FN   848.6278
EN50FN   849.5642
RIDGE_FN 850.8609
LASSO_N  856.3854
EN25N    858.0379
LASSO_FN 859.8096
RIDGE_N  860.3076
EN50N    862.0838
EN75N    864.3380
EN25FN   864.3384
EN50     870.3685
LASSO    871.5219
EN75     874.5317
EN25     874.8249
RIDGE    929.6943
OLS      995.6180
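To read the ranking by model family, the reported MSEs can be grouped by name suffix. The sketch below re-enters the values from the table above. It assumes that names ending in FN are the factor network models of this Use Case, names ending in N are network models from an earlier Use Case, and the remaining names are models without network variables. The objects family, best and gain are hypothetical names.

```r
# MSEs as reported in the table above
MSEs <- c(EN75FN = 848.6278, EN50FN = 849.5642, RIDGE_FN = 850.8609,
          LASSO_N = 856.3854, EN25N = 858.0379, LASSO_FN = 859.8096,
          RIDGE_N = 860.3076, EN50N = 862.0838, EN75N = 864.3380,
          EN25FN = 864.3384, EN50 = 870.3685, LASSO = 871.5219,
          EN75 = 874.5317, EN25 = 874.8249, RIDGE = 929.6943, OLS = 995.6180)

# Assign each model to a family by its name suffix (assumed naming convention)
family <- ifelse(grepl("FN$", names(MSEs)), "factor network",
          ifelse(grepl("N$", names(MSEs)), "network", "no network"))

# Lowest MSE within each family
best <- tapply(MSEs, family, min)

# Percentage reduction in MSE of each family's best model relative to OLS
gain <- round(100 * (1 - best / MSEs[["OLS"]]), 2)

print(cbind(best, gain))
```

The lowest MSE in each family is 848.6278 (factor network), 856.3854 (network) and 870.3685 (no network), so the best factor network model beats the best model in both other families, although EN25FN and LASSO_FN rank below some of the network models.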