Introduction to Machine Learning: Ridge regression, Lasso regression, Elastic-net regression
Session 2
Oleg Deev & Štefan Lyócsa
Masaryk University
FINTECH RISK MANAGEMENT, www.fintech-ho2020.eu

The principle

Assume that we have the following observations available:

[Figure: scatter plot of the observations, with x ranging from 1.0 to 3.0.]

Assume that we know the true values (not contaminated by noise) are at the red line:

[Figure: the same scatter plot with the true values drawn as a red line.]

You want to estimate the relationship between $x$ and $y$. Using the estimated model, you would like to make predictions into the future. A common strategy is to first split the sample into two parts:
• Testing sample - allows the model to learn.
• Validation sample - tests the out-of-sample performance.
Different splitting strategies are possible; this is a basic one.

Both samples visualized:

[Figure: scatter plot with the testing and validation samples distinguished.]

Using data from the testing sample, let's fit a linear model:
$$Y_{i,test} = \beta_0 + \beta_1 X_{i,test} + u_{i,test}$$
The estimated coefficients are:
$$Y_{i,test} = 1.37 + 0.65 X_{i,test} + \hat{u}_{i,test}$$
We know that the model is ill-specified; there is no way a line is going to fit these data very well. But for prediction purposes, it might be a good-enough approximation to reality.

This is how the line looks:

[Figure: testing sample with the fitted regression line.]

Using data from the testing sample, let's fit a polynomial model:
$$Y_{i,test} = \beta_0 + \sum_{p=1}^{5} \beta_p X^p_{i,test} + u_{i,test} = \beta_0 + \beta_1 X_{i,test} + \beta_2 X^2_{i,test} + \beta_3 X^3_{i,test} + \beta_4 X^4_{i,test} + \beta_5 X^5_{i,test} + u_{i,test}$$
The estimated coefficients are:
$$Y_{i,test} = -24.33 + 75.59 X_{i,test} - 85.93 X^2_{i,test} + 48.28 X^3_{i,test} - 13.28 X^4_{i,test} - 1.41 X^5_{i,test} + \hat{u}_{i,test}$$
This polynomial is going to fit the data much better.

This is how the curve looks:

[Figure: testing sample with the fitted polynomial curve.]

Model comparison

We could compare which model fits the data better with, e.g., $R^2$. Instead, we calculate a related measure, the mean squared error, for the first model:
$$MSE_1 = N_{test}^{-1} \sum_i (Y_{i,test} - \hat{Y}_{i,1})^2 = 0.06676$$
The smaller the value, the better the fit. Now for the second model:
$$MSE_2 = N_{test}^{-1} \sum_i (Y_{i,test} - \hat{Y}_{i,2})^2 = 0.05401$$
The second model has the better fit, by approx. 19%!

Model comparison

The first model fits the data poorly: it is linear, while the data are curved. It is a biased model. The second model fits the data better. The higher the order of the polynomial, the better the fit and the lower the bias (in-sample). Is the model with the better fit on the testing sample also going to be better in the validation sample?
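Before answering, the comparison is easy to replicate in code. A minimal R sketch on simulated data (the data-generating curve, sample sizes and seed are assumptions, not the slide's data; whether the in-sample winner also wins out-of-sample depends on the draw):

# Simulate a curved relationship contaminated by noise
set.seed(1)
x <- runif(60, 1, 3)
y <- sin(2 * x) + 0.5 * x + rnorm(60, sd = 0.25)
tst <- data.frame(x = x[1:40],  y = y[1:40])   # testing sample
val <- data.frame(x = x[41:60], y = y[41:60])  # validation sample

m1 <- lm(y ~ x, data = tst)                       # linear model
m2 <- lm(y ~ poly(x, 5, raw = TRUE), data = tst)  # 5th-order polynomial

# In-sample fit on the testing sample
mse1 <- mean((tst$y - fitted(m1))^2)
mse2 <- mean((tst$y - fitted(m2))^2)

# Out-of-sample accuracy on the validation sample
msfe1 <- mean((val$y - predict(m1, newdata = val))^2)
msfe2 <- mean((val$y - predict(m2, newdata = val))^2)
round(c(mse1, mse2, msfe1, msfe2), 4)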
Model comparison

Using the coefficients from model 1 and model 2, and given new $x$ from the validation sample, we can predict $y$. Next, we compare which model forecasts better using the MSE, but now computed from the predicted values. This is called the mean squared forecast error (MSFE):
$$MSFE_1 = N_{validation}^{-1} \sum_i (Y_{i,validation} - \hat{Y}_{i,1})^2 = 0.0802$$
Now for the second model:
$$MSFE_2 = N_{validation}^{-1} \sum_i (Y_{i,validation} - \hat{Y}_{i,2})^2 = 0.2357$$

• The forecasts are less accurate on the validation sample.
• The linear model (although biased) performs much better.

Why? The polynomial model is over-fitting the data, i.e. it fits too well at the expense of extra parameters. Parameters are not estimated with certainty; they suffer from variance. This leads to an increase in the variance of the predictions.

[Figure: validation sample showing the true values, the fitted line and the fitted curve.]

The goal of machine learning is to find an optimum between model bias and the variance of predictions. There are many strategies; two standard ones are:
• Regularization (Ridge regression, LASSO, Elastic net).
• Boosting (regression trees, random forests, ...).
One strategy is to allow a small bias (e.g. fewer parameters in the model) while lowering the variance. The accuracy of predictions might improve.

The common theme is to sacrifice in-sample fit in the hope of a better out-of-sample prediction. Recall the multiple linear regression model:
$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \dots + \beta_p X_{i,p} + u_i$$
Using OLS, the parameters of interest are estimated by minimizing the sum of squared residuals:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{i,1} - \dots - \beta_p X_{i,p})^2$$
In short:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

OLS approach:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Ridge regression:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
where:
• $\lambda > 0$,
• $X$ are standardized (0 mean, 1 variance),
• $Y$ is centered around 0.

The higher the $\lambda$, the lower the $\beta$ coefficients, i.e. the stronger the penalty. Why might Ridge regression actually work? The higher the $\lambda$, the less sensitive the dependent variable $Y$ is to changes in the explanatory variable(s) $X_j$. The Ridge regression model is more 'robust' to changes in the explanatory variables. How to find $\lambda$? The standard approach is the 10-fold cross-validation technique; see the next Case study.

Case study: What factors drive the rate of return on a loan? We use the same model as in Case study 3. Now, instead of OLS, we estimate it via the penalized 'Ridge' estimator.
• Can Ridge out-perform (out-of-sample) the OLS model?
$$RR2_i = \beta_0 + \beta_1 new_i + \beta_2 ver3_i + \dots + \beta_p nrodep_i + u_i$$
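The glmnet code on the following slides assumes a numeric predictor matrix indep, a response vector dep (testing sample) and a matrix pred (validation sample). A minimal sketch of one way to build them, assuming the tst and val data frames created in the sample split on the next slide (this construction is an assumption; the course codes hold the actual version):

library(glmnet)

# Same regressors as in the OLS model m7 (see the OLS slide below)
f <- RR2 ~ new+ver3+ver4+lfi+lee+luk+lrs+lsk+age+undG+female+lamt+int+
  durm+educprim+educbasic+educvocat+educsec+msmar+msco+mssi+msdi+
  nrodep+espem+esfue+essem+esent+esret+dures+exper+linctot+noliab+
  lliatot+norl+lamteprl+nopearlyrep

indep <- model.matrix(f, data = tst)[, -1]  # drop the intercept column
dep   <- tst[, "RR2"]                       # response on the testing sample
pred  <- model.matrix(f, data = val)[, -1]  # validation-sample predictors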
The workflow:
1. Split the sample into two; leave the last 100 observations for out-of-sample validation.
2. Estimate OLS and calculate the MSFE using the out-of-sample data.
3. Perform k-fold cross-validation to estimate $\lambda$ for the Ridge regression models.
4. Calculate the MSFE using the out-of-sample data.

Sample split

• NF = 100
• N = dim(DT)[1]
• tst = DT[1:(N-NF),]
• val = DT[((N-NF)+1):N,]

k-fold cross-validation

We need to prepare the data for the glmnet functions (see the course codes; one possible sketch is given above).
• CV = cv.glmnet(x=indep, y=dep, nfolds=30, alpha=0)
• plot(CV)
• CV$lambda.min
• CV$lambda.1se
• round(cbind(coefficients(m7), coef(CV, s="lambda.min"), coef(CV, s="lambda.1se")), 4)

OLS estimation and prediction

• m7 = lm(RR2 ~ new+ver3+ver4+lfi+lee+luk+lrs+lsk+age+undG+female+lamt+int+durm+educprim+educbasic+educvocat+educsec+msmar+msco+mssi+msdi+nrodep+espem+esfue+essem+esent+esret+dures+exper+linctot+noliab+lliatot+norl+lamteprl+nopearlyrep, data=tst)
• yOLS = predict(m7, newdata=val)
• ytrue = val[,"RR2"]
• MSEOLS = mean((yOLS-ytrue)^2)

How $\lambda$ (actually $\log(\lambda)$) and the MSE are related: increasing the penalization is very expensive, as it increases the MSE considerably.

[Figure: cross-validated MSE against log(Lambda) for the Ridge model; the counts along the top of the plot (39 throughout) give the number of nonzero coefficients, which Ridge never shrinks to zero.]

Prediction

We need to prepare the data for the glmnet functions (see the course codes).
• yRIDGEmin = predict(CV, newx=pred, s=CV$lambda.min)
• MSER1 = mean((ytrue-yRIDGEmin)^2)
• yRIDGE1se = predict(CV, newx=pred, s=CV$lambda.1se)
• MSER2 = mean((ytrue-yRIDGE1se)^2)
• cbind(MSEOLS, MSER1, MSER2)

OLS approach:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Ridge regression:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
Least Absolute Shrinkage and Selection Operator (LASSO):
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
As before:
• $\lambda > 0$,
• $X$ are standardized (0 mean, 1 variance),
• $Y$ is centered around 0.

As with Ridge, the higher the $\lambda$, the lower the $\beta$ coefficients, i.e. the stronger the penalty. With LASSO, coefficients might be reduced exactly to 0. This is useful, as LASSO reduces model complexity, which in turn is known to be helpful for forecasting purposes.

Which to use, LASSO or Ridge?
• Ridge is useful when many variables are supposed to be useful (and they might be highly correlated as well).
• LASSO is useful when only a few variables are useful.
Why not select only the useful variables and run OLS? A sketch of LASSO's selection in action follows.
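LASSO in fact performs that selection itself. A minimal sketch of inspecting which coefficients are shrunk exactly to zero, assuming the indep and dep objects from the Ridge case study:

library(glmnet)

# alpha = 1 selects the pure L1 (LASSO) penalty
CVlasso <- cv.glmnet(x = indep, y = dep, nfolds = 30, alpha = 1)

# Coefficients at the 1-standard-error lambda; zeros are dropped variables
b <- as.matrix(coef(CVlasso, s = "lambda.1se"))
kept <- rownames(b)[b != 0]
setdiff(kept, "(Intercept)")  # variables LASSO keeps in the model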
Case study: What factors drive the rate of return on a loan? We use the same model as in Case studies 3 and 4. Now, instead of OLS and Ridge, we estimate it via the penalized 'LASSO' estimator.
• Can LASSO out-perform (out-of-sample) the OLS and Ridge models?
$$RR2_i = \beta_0 + \beta_1 new_i + \beta_2 ver3_i + \dots + \beta_p nrodep_i + u_i$$

The workflow:
1. Split the sample into two; leave the last 100 observations for out-of-sample validation.
2. Estimate OLS and calculate the MSFE using the out-of-sample data.
3. Perform k-fold cross-validation to estimate $\lambda$ for the Ridge regression models.
4. Calculate the MSFE using the out-of-sample data.
5. Perform k-fold cross-validation to estimate $\lambda$ for the LASSO regression models.
6. Calculate the MSFE using the out-of-sample data.

k-fold cross-validation

We need to prepare the data for the glmnet functions (see the course codes).
• CV = cv.glmnet(x=indep, y=dep, nfolds=30, alpha=1)
• plot(CV)
• CV$lambda.min
• CV$lambda.1se
• round(cbind(coefficients(m7), coef(CV, s="lambda.min"), coef(CV, s="lambda.1se")), 4)

How $\lambda$ (actually $\log(\lambda)$) and the MSE are related:

[Figure: cross-validated MSE against log(Lambda) for the LASSO model; the counts along the top (38, 37, 36, 34, 31, 29, 26, 21, 15, 6, 4, 2, 2, 2) show the number of nonzero coefficients falling as the penalty grows.]

Prediction

We need to prepare the data for the glmnet functions. See the code in Case study 3. Next, we can run the predictions:
• yLASSOmin = predict(CV, newx=pred, s=CV$lambda.min)
• MSEL1 = mean((ytrue-yLASSOmin)^2)
• yLASSO1se = predict(CV, newx=pred, s=CV$lambda.1se)
• MSEL2 = mean((ytrue-yLASSO1se)^2)
• cbind(MSEOLS, MSER1, MSER2, MSEL1, MSEL2)

OLS approach:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
Ridge regression:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$
Least Absolute Shrinkage and Selection Operator (LASSO):
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
Elastic net:
$$\min_{\beta_0,\dots,\beta_p} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \lambda \left( (1-\alpha) \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} |\beta_j| \right)$$

The Elastic net gives a combined penalization of Ridge and LASSO. The new parameter $\alpha$ shows which of the two penalization forms gets the higher weight:
• If $\alpha = 1$, it is a LASSO model.
• If $\alpha = 0$, it is a Ridge model.
• With $0 < \alpha < 1$, we have the Elastic net.
As before, the optimal $\alpha$ and $\lambda$ are determined via cross-validation; a sketch follows after this slide.

Case study: What factors drive the rate of return on a loan? We use the same model as in Case studies 3, 4 and 5. Now, instead of OLS, Ridge and LASSO, we estimate it via the 'Elastic net' estimator.
• Can Elastic net out-perform (out-of-sample) the OLS, Ridge and LASSO models?
$$RR2_i = \beta_0 + \beta_1 new_i + \beta_2 ver3_i + \dots + \beta_p nrodep_i + u_i$$
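The next slides fix $\alpha$ at 0.25, 0.50 and 0.75 and cross-validate only $\lambda$. More generally, one could search a small grid of $\alpha$ values; a minimal sketch, assuming the indep and dep objects from before (the grid and seed are assumptions):

library(glmnet)

# Use a common fold assignment so every alpha is judged on the same folds
set.seed(1)
fid <- sample(rep(1:30, length.out = nrow(indep)))

alphas <- c(0, 0.25, 0.50, 0.75, 1)
cvs <- lapply(alphas, function(a)
  cv.glmnet(x = indep, y = dep, foldid = fid, alpha = a))

# Lowest cross-validated MSE achieved by each alpha
cvmse <- sapply(cvs, function(cv) min(cv$cvm))
best  <- which.min(cvmse)
alphas[best]            # chosen mixing weight
cvs[[best]]$lambda.min  # chosen penalty strength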
Elastic net with $\alpha = 0.25$

• CV = cv.glmnet(x=indep, y=dep, nfolds=30, alpha=0.25)
• yNET025min = predict(CV, newx=pred, s=CV$lambda.min)
• MSEEN1.1 = mean((ytrue-yNET025min)^2)
• yNET0251se = predict(CV, newx=pred, s=CV$lambda.1se)
• MSEEN1.2 = mean((ytrue-yNET0251se)^2)

Elastic net with $\alpha = 0.50$

• CV = cv.glmnet(x=indep, y=dep, nfolds=30, alpha=0.50)
• yNET050min = predict(CV, newx=pred, s=CV$lambda.min)
• MSEEN2.1 = mean((ytrue-yNET050min)^2)
• yNET0501se = predict(CV, newx=pred, s=CV$lambda.1se)
• MSEEN2.2 = mean((ytrue-yNET0501se)^2)

Elastic net with $\alpha = 0.75$

• CV = cv.glmnet(x=indep, y=dep, nfolds=30, alpha=0.75)
• yNET075min = predict(CV, newx=pred, s=CV$lambda.min)
• MSEEN3.1 = mean((ytrue-yNET075min)^2)
• yNET0751se = predict(CV, newx=pred, s=CV$lambda.1se)
• MSEEN3.2 = mean((ytrue-yNET0751se)^2)

We can compare the results. Out-of-sample MSEs, best first (suffix _1: lambda.1se; suffix _M: lambda.min):

Model     MSE
EN75_1    868.86
LASSO_1   871.52
EN25_1    874.82
EN50_1    876.54
Ridge_1   929.69
Ridge_M   968.78
EN75_M    976.86
EN25_M    976.97
EN50_M    977.07
LASSO_M   977.57
OLS       995.62
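A sketch of how such a ranking could be assembled from the MSE objects computed on the previous slides (the object names follow those slides; the collection step itself is an assumption):

# Gather the out-of-sample MSEs and sort from best to worst
mses <- c(OLS     = MSEOLS,
          Ridge_M = MSER1,    Ridge_1 = MSER2,
          LASSO_M = MSEL1,    LASSO_1 = MSEL2,
          EN25_M  = MSEEN1.1, EN25_1  = MSEEN1.2,
          EN50_M  = MSEEN2.1, EN50_1  = MSEEN2.2,
          EN75_M  = MSEEN3.1, EN75_1  = MSEEN3.2)
round(sort(mses), 2)  # smallest MSFE first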