Chapter 1 Hedging

1.1 Introduction to Hedging

Human life, as well as all human activities, is inextricably associated with an element of uncertainty. The assumption of uncertainty implies the concept of risk, which is part of our social system in various forms of understanding. Risk is generally associated with some negative meaning. In the field of finance it is most often associated with a diversion from an expected state; more precisely, the risk is embodied in a deviation from the expected return. The divergence from the expected state need not be solely a financial loss: the uncertainty could equally manifest itself as a positive deviation, with a higher profit than expected, so it is not limited to financial suffering. Nevertheless, it is the negative deviation that poses the main threat to market participants. In the financial system, risk is commonly divided into two components. One part of the overall financial risk is unique risk, sometimes called unsystematic, undiversified, residual or idiosyncratic risk [Beja1972]. The concept of financial risk applies whenever financial resources are placed in the financial market, no matter whether in the form of investment or speculation. In theory, as well as in practical application, it is common to work with the multiple-asset concept. A reduction of residual risk is feasible by applying an appropriate algorithm for asset allocation (diversification). In principle, a complete elimination of this risk can be achieved [Elton1997]. The second part of the overall financial risk is referred to as systematic risk. This category of risk is also called non-diversifiable risk. All actors in the financial market face systematic risk, because every asset is exposed to market risk [Frenkel2005]. Its existence can only be accepted, not removed. What applies to financial markets applies without reservation to the commodity market as well. As noted by [Garner2010]: "Producers and users of commodities are constantly faced with price and production risk due to an unlimited number of unpredictable factors including weather, currency exchange rates, and economic cycles." The embodiment of risk is in fact the uncertainty arising from the nature of the markets; it is inherent in the entire financial market. However, that does not mean that subjects trading actively in the market cannot influence the impact of systematic risk. Hedging is a financial operation that aims to reduce the impact of non-diversifiable risk [Collins1999]. On the other hand, some authors have pointed out that only part of the systematic risk can be eliminated [Kolb2014]. Hedging is in fact a closure of positions held in assets: growth or decline in the price of one asset is offset by the opposite movement of the hedging asset's price. A long and a short hedge position can be distinguished from the perspective of the trade. The difference depends on whether the asset exposed to systematic risk is intended to be bought, or whether the asset is already owned and will be sold later on. In the latter case, an opposite operation, i.e. the sale of the hedging asset on the financial market, must be carried out.
Such an operation is therefore called short hedging [Rutledge1977]. Another kind of situation in trade relations arises in activities where a short sale is realized. ^1 A (covered) short sale means that an asset is borrowed and sold on the market at present, and will be returned at some point in the future [Linnertova2012]. The short sale is executed on the basis of a securities loan. A similar strategy is applied when a certain asset has to be bought in the future, for instance an asset essential for an entrepreneurial activity. Thus, in order to guard against an eventual rise in the price of the intended asset, a purchase of an asset highly correlated with the price of the considered asset is required. The rising price of the asset bought at present will offset an undesirable price growth of the asset to be purchased in the future. The above strategy is obviously long hedging [Bessembinder1992]. Derivatives are widely used for hedging purposes [Bingham2013]; the underlying of the derivative is then the object of hedging [Poitras2002]. The class of standard derivatives, such as forwards, futures and swaps, has commonly been used [Hull2006]. The development of financial engineering has led to the use of more sophisticated and complex instruments in recent years [Avellaneda1995]. ^2 Exotic options, synthetic derivatives etc. [Chance2015]. However, a diversion from their primary function could be observed in the last two decades [Bartram2009]. A growing interest in these instruments can paradoxically lead to an increase of market risk, since a large part of the transaction volume is driven by speculative intentions [Stiglitz2000]. ^3 Their share in market distortions is indisputable, e.g. the global financial crisis in 2008 [Crotty2009].

The practical part of the thesis investigates exclusively the problematics of short hedging. The analysis is focused on mitigating the price risk represented by a prospective price loss on the asset held long (energy commodities). The examination tracks the risk on the side of supply (producers/sellers); however, the results of the research should be applicable to the opposite position in financial assets as well. With regard to the scope of the investigation period and the markets, only financial hedging is considered, i.e. no physical delivery. The research is premised on spot and futures prices. Futures were chosen as the appropriate hedging securities because they are traded for all three examined commodities and, in addition, they show efficiency, sufficient liquidity and a strong dependency on the spot.

1.2 Literature review

An interest in hedging can be traced in scientific circles to the first half of the twentieth century. The pioneers who contributed to research in this field predominantly focused on the use of futures contracts for agricultural products [Howell1938]. Scientists noted the potential that a standardized derivative instrument can offer in the face of price fluctuations. [Yamey1951] states: "The practice of hedging, by buying and selling futures contracts on organized produce exchanges, enables manufacturers and merchants to cover themselves against adverse movements of prices of raw materials in which they deal." Likewise, he points out that such protection against risk may not be sufficient, since hedging may not be perfect. In principle, hedging was regarded as the use of futures that should guarantee protection against undesirable price movements. The methodology of protection was based on the closure of opposite trade positions in a ratio of 1:1, namely the amount of spot volume should be protected with the same amount of futures. The argument for such an approach was the very similar price behavior of the two considered assets [Graf1953]. Thus, a potential loss on the spot price could be eliminated by the gain from futures. Considering that the prices of the underlying and the futures were determined by identical factors, hedging with the same weight appeared to be appropriate [Howell1948]. The fifties were revelatory for finance. The key achievement was the birth of modern portfolio theory [Markowitz1952]. Thanks to the contribution of Harry Markowitz, a new optimization technique could be employed to find an appropriate hedge ratio [Telser1955]. From this point on, the attention of scientists was no longer restricted merely to the statistical characteristics of separate assets, but extended to the mutual interaction among them.
This treatment enabled an improvement of the benefit arising from the interplay of return and risk. This circumstance motivated further evolution in hedging research: the trivial procedure of equal weights could be put aside owing to the application of the utility function made known in modern portfolio theory. One of the cardinal contributions in the scientific literature was the article Hedging Reconsidered [Working1953]. The author pointed out three major economic effects of hedging. First, risk reduction causes fewer bankruptcies of companies, with positive effects on society and the whole economy. Second, the forthcoming level of spot prices could be estimated more accurately. He also mentioned the positive impact on commodity stocks. A radical thought for its time was his revolutionary look at the role of hedging. [Working1953a] highlights an incorrect understanding of hedging: he states that protection against potential financial loss is a secondary function of hedging, while in his opinion the primary function of hedging is its use for arbitrage purposes [Working1953a]. This finding is referred to in the scientific literature in later years, too [Ederington1979], [Cicchetti1981], [Garbade1983], [Tomek1987]. [Graf1953] also discussed, in the fifties, the perception of the insufficient ability of hedging to protect against price risk. The author examined the ability of futures to provide a reduction of potential losses. He considered the concept of hedging effectiveness, analyzed the degree of efficiency, and expressed his opinion about the changeability of hedging effectiveness. According to his empirical research, hedging effectiveness shows dynamic development. The data of his study came from the Chicago Mercantile Exchange; futures with the nearest or second-nearest delivery month were used for hedging, and the researched commodities were corn, wheat and oats. Subsequently, [Johnson1960] brought a scientific improvement to the hedging issue. His contribution lies particularly in the area of quantification. In his view, the price, and subsequently the return and the price risk, is represented as a random-variable process [Johnson1960]. Thus, the price risk was identified as the variance of the price change over time. The author defined the optimization process to determine the weights of the hedge instrument represented by futures. The knowledge elaborated in modern portfolio theory was adopted in his considerations: the weights of futures were calculated by solving an optimization problem on a utility function, with the portfolio variance as the objective function. The breakthrough was that he could clearly identify the portfolio risk depending on the linear relation between spot and futures. Furthermore, Johnson developed a methodology for measuring the effectiveness of the implemented hedging. The inputs are the variance of the unhedged asset (spot) and of the hedged portfolio (the combination of spot and futures); the hedging effectiveness refers to the percentage decrease in the variance. What is certainly revealing is the importance of a strong correlation between both assets for effective hedging [Johnson1960]. Stein adapted the concept of expected return to the hedging problem. The idea follows the difference between the current spot and futures prices and the expected state of both assets; in his reasoning, the carrying costs are reflected as well. Like Johnson, he considered the portfolio variance as the risk of the hedged position. In addition, he discussed the graphical interpretation of the dependence between expected return and risk.
Furthermore, Stein worked with the theory of convex indifference curves [Stein1961]. He justifies their shape by the declining marginal utility of income, referring to [Tobin1958]. Scientific papers in the following years drew on the findings of modern portfolio theory while using futures. In some studies, the subject of interest is the measurement of risk reduction on ex post data. Among the known authors in this area is [Ederington1979]. He estimates the futures weights by the ordinary least squares model on empirical data; the regression is run on the percentage changes of prices. ^4 The application of regression for setting the weight of futures was realized before [Ederington1979], for instance in [Heifner1966]. [Ederington1979] introduced the term basis risk and claims: "A hedge is viewed as perfect if the change in the basis is zero." The measure of hedging effectiveness is described by the coefficient of determination. ^5 Ederington is sometimes presented as the author of the measure of hedging effectiveness, for instance: [Herbst1989], [Pennings1997], [Alexander1999], [Lee2001], [Bailey2005], [Lien2005], [Bhaduri2008], [Cotter2012], [Go2015], [Lien2015]. In fact, the percentage reduction in portfolio variance over the unhedged asset (spot) was already established in the work of [Johnson1960]. Moreover, that author demonstrated an ability for deeper reflection on the matter: he did not limit himself to a trivial statement of the percentage reduction of variance, but argued the importance of the linear tightness of the prices, or price changes respectively. The fact is illustrated by the following deduction: $\sigma_p^2 = w_s^2\sigma_s^2 + w_s^2\frac{\sigma_{s,f}^2}{\sigma_f^2} - 2w_s^2\frac{\sigma_{s,f}^2}{\sigma_f^2}$, i.e. $\sigma_p^2 = w_s^2\left(\sigma_s^2 - \frac{\sigma_{s,f}^2}{\sigma_f^2}\right)$; if $w_s = 1$ then $\sigma_p^2 = \sigma_s^2\left(1 - \rho^2\right)$. Returning to the hedging effectiveness, $HE = 1 - \frac{\sigma_p^2}{\sigma_s^2}$, hence $HE = \rho^2$. It is obvious that the parameter $\rho^2$ is identical with the coefficient of determination, so it is not an innovative measurement. Hedging effectiveness was also examined by [Heifner1966]. [Cicchetti1981] examined hedging effectiveness on the money market. His study focused on treasury bills traded on the Chicago Mercantile Exchange. He actively referred to Ederington in his paper. [Dale1981] investigated hedging effectiveness on the foreign currency market. He referred to the work of Working, and he also researched market demand and supply. Similarly oriented work was presented by Hill and Schneeweis [Hill1981]. An examination of the same underlying asset was provided by Hsin, Kuo and Lee [Hsin1994]; moreover, an option is considered as a hedge instrument there too. The paper provided by [Wilson1982] reverted to agricultural commodities; the subject of his measurement was wheat. [Cotter2006] introduced a modern view of hedging effectiveness. He pointed out the lack of a standard measurement and simultaneously showed that different measures can provide different results. He suggested using the concept of Value at Risk for measuring hedging effectiveness. Another scientific area in hedging research is testing the stability of the hedge ratio. This field was investigated by [Grammatikos1983]. The authors referred to the characteristics of previous research, whose data processing could be a shortcoming, since the analyses were based on long data periods, which can be a pitfall. In their analysis they focused on the international money market. Specifically, the research analyzed the Swiss franc, the Canadian dollar, the British pound, the German mark and the Japanese yen. The results indicated the unsuitability of the stable hedge ratio hypothesis.
Similar results were confirmed by other studies [Grammatikos1986], [Eaker1987]. [Malliaris1991] asked whether more input data for the analysis could provide better results, because such a dataset would include more information. In contrast, he introduced a hypothesis of instability of the beta coefficient, and thereby the first assumption was rejected. The reason was that not all of the information incorporated in the processed dataset is significantly relevant for hedging purposes; in other words, data from the remote past do not provide much information about current data. His research confirmed the hypothesis that the hedge ratio shows instability over time, while he also added that the beta coefficient was not significantly different. Finally, he concluded that foreign currency futures are convenient tools for hedging. Another hedging area popular among scientists was comparing the performance of the optimum hedge ratio with the payoff produced by a naive portfolio. Among such studies is the work of Grant and Eaker [Eaker1987], [Grant1989], who compared multiple methods of hedging with the naive portfolio. Similarly oriented work is presented by [Hammer1988]. Three methods of hedge optimization are compared with a naive portfolio in [Park1995]; data from the S&P 500 and the TSE 35 were examined in their analysis. A further paper comparing different forms of hedging with a naive portfolio was written by [Bystrom2003]. He analyzed the electricity market Nord Pool. However, as noted by [Collins2000], it is not always possible to confirm a benefit of "sophisticated" and complex econometric models over the performance of a naive portfolio. These three areas of research developed more or less separately; nevertheless, they are closely related. [Marmer1986] therefore decided to examine all three of the presented areas together. The object of his investigation was the Canadian dollar and exchange rate futures. The results of Marmer's analysis were in favor of the optimized approach over the naive portfolio. He also rejected the hypothesis of a stable hedge ratio. Further, he declared that with rising duration the hedging effectiveness rises as well. Similarly, the variability of the hedge ratio and of the risk reduction over time was confirmed by [Benet1992]. The fundamental shortcoming of the previous models was the unsustainable assumption about the stationarity of the data. It was only a matter of time before scientists began to deal with this circumstance. One way to solve the problem of non-stationarity is the use of ARCH and GARCH models or co-integration. A modern perspective on this issue was presented by [Cecchetti1988]. He used the autoregressive conditional heteroscedasticity (ARCH) model for solving the hedge ratio. After the successful application of ARCH in the area of financial asset valuation, and once the generalized autoregressive conditional heteroscedasticity (GARCH) model was introduced, it also became utilized in hedging [Engle1986], [Bollerslev1987]. Among the scientists working with GARCH in the field of hedging was [Myers1991], who focused on commodity hedging; six commodities were investigated in his analysis under conditional variances and covariances. [Ghosh1993] estimated the optimal futures hedge ratio for non-stationary data and incorporated the long-run equilibrium together with the short-run dynamics. The underlying asset was the S&P 500. He applied the Error Correction Model (ECM) in his research, and the results of the ECM were better than those of the traditional approach. A similar procedure was also chosen by [Chou1997].
He dealt with hedging on the Japanese Nikkei Stock Average. Again the results were in favor of the ECM when compared with the conventional models. A further similar study is provided by Ghosh and Clayton [Ghosh1996]. This time the indexes CAC 40, FTSE 100, DAX and Nikkei were explored. Co-integration was used once again, and the results confirmed the hedging effectiveness of the ECM over the standard approach. [Alexander1999] stated: "If spread of spot and futures are mean reverting, prices are co-integrated." In addition, the paper demonstrated the use of co-integration for different purposes such as arbitrage, yield-curve modeling and hedging, and verified the hedging on European, Asian and Far East markets. One such paper was provided by Baillie and Myers [Baillie1991]. They examined hedging in six commodities and emphasized how important it is to take the non-stationarity of the examined data into consideration. [Moschini2002] introduced a new multivariate GARCH parametrization; the authors tested hedging effectiveness with models allowing time-varying volatility. Better hedging performance of multivariate GARCH over the classical OLS was also confirmed by [Yang2005]; the examined data came from the Australian financial market. Lee and Yoder [Lee2007] used a Markov regime-switching GARCH for estimating the minimum-variance hedge ratio under time-varying variance. The authors used a BEKK-GARCH for the analysis. They decided to use such a complex algorithm because of the changing joint distribution of spot and futures over time. The analyzed series were the prices of corn and nickel. They confirmed a better hedging effectiveness for the surveyed commodities after using the GARCH model; however, they also added that the difference in comparison to other models is not significant. Another optimization technique likewise draws on the knowledge of portfolio theory. Although the utility function is adopted from portfolio theory, it differs from the minimum variance: in contrast to the previous optimization, the extreme value sought is now the maximum of the examined function. The portfolio variance is employed as in the minimum-risk optimization, but the expected excess return of the hedge instrument enters as well. ^6 An excess return is the return of an individual asset or portfolio exceeding the return of the risk-free asset; in hedging, it is the excess return of the futures. [Howard1984] recommended using the Sharpe ratio for estimating the hedge ratio. ^7 The expression of the Sharpe ratio used for finding an optimal hedge ratio is $s = \frac{E(r_f - r_{free})}{\sigma_f}$. Sharpe compared return with the risk undergone in the so-called "reward-to-variability" ratio [Sharpe1966]. Later on it was called the Sharpe ratio, although the earlier concept introduced by [Roy1952] reflected a similar measurement, the so-called minimum accepted return. If the expected return of futures is assumed to be zero, then even this optimization generates a hedge ratio identical to the minimum-variance one. ^8 $h^* = \rho\frac{\sigma_s}{\sigma_f}$, which is identical to the minimum-variance hedge ratio. Nevertheless, [Chen2013] warned that the Sharpe ratio is not a linear function, which could be problematic: solving for the optimum could lead to finding the minimum instead of the maximum value. Another optimization technique for finding the maximum of an objective function is the optimum mean-variance hedge ratio [Hsin1994]. In addition to the portfolio expected return and variance, the objective function also includes an attitude toward risk, i.e. risk aversion. This concept was already introduced by [Heifner1966].
^9 The utility function was expressed as $\psi = \sum_k x_k\mu_k - \lambda\sum_k\sum_h x_k x_h\sigma_{k,h}$. However, he pointed out the subjectivity of this parameter. Under certain conditions the hedge ratio derived from the optimum mean-variance utility function will be identical to the minimum-variance ratio. ^10 Following the first-order conditions, the term for the hedge ratio gives $h^* = -\left(\frac{E(r_f)}{\lambda\sigma_f^2} - \rho\frac{\sigma_s}{\sigma_f}\right)$. It is apparent that if the risk aversion goes to infinity, or if the expected return is equal to zero, then the ratio transforms into the minimum-variance hedge ratio. The concept of the optimum mean-variance hedge ratio was later applied by other authors [Hsin1994], [Moschini2002], [Casillo2004]. [Cotter2010] examined time-varying risk aversion inferred from the observed risk preferences of market participants. According to the risk aversion, long and short hedging was implemented. The authors argued that respecting the concept of time-varying risk aversion outperformed the standard OLS approach. Subsequently, the same authors focused on the application of different utility functions for risk preferences in hedging; namely, they employed logarithmic, exponential and quadratic forms of the utility function for determining risk aversion. The results of their study confirmed significant differences in the optimal hedge ratio under distinct utility functions [Cotter2012]. The above-mentioned authors also emphasized the relevance of asymmetry in the return distribution, which can have an impact on hedging effectiveness, and they accented this shortcoming of the minimum-variance hedge ratio [Cotter2012b]. In relation to the application of the Gini coefficient in hedging, [Kolb1992] is frequently referred to. A modified version of the Gini ratio, the so-called extended mean Gini coefficient, was introduced in their article; the improvement of the Gini measure consisted in incorporating an element of risk aversion into the ratio. Additionally, [Lien1993] emphasized that the parameter of risk aversion plays a crucial role in the estimation. [Shalit1995] noted a particular problem when comparing the hedging effectiveness of the mean-variance and the mean-extended Gini approaches. [Lien2002] then discussed the economic implementation of the conventional approach to hedging, derived from the expected utility maximization paradigm, in connection with the new econometric procedures. Similarly, [Chen2013] compared the different methodologies for calculating an optimal hedge ratio; nonetheless, they asserted that a "modern" approach cannot always outperform the classical OLS. Studies emerging in recent years have used ever more sophisticated and complex mathematical tools. The methodology dealing with the joint probability distribution function is one example, so new papers appeared using models from the copula family to find a proper hedge ratio [Cherubini2004]. Alternatively, a combination of copula calculus with other econometric methods was applied [Hsu2008], [Lai2009], [Lee2009]. The wavelet transformation can be added to the innovative forms of solving the hedging problem as well. [In2006] tried to solve the problem of time-varying covariance between spot and futures using wavelets. Similarly, [Fernandez2008] estimates the hedge ratio after a wavelet analysis. Wavelets and the ECM were also applied by [Lien2007], with distinctive results: in his article he came to the conclusion that the time horizon of hedging is important, and with a growing time horizon of the hedged asset the wavelet approach delivered better performance.
Hedging of energy commodities is mostly applied to oil, natural gas and electricity. A high level of volatility is typical for all three markets. [Chen1987] examined oil hedging. He explained the instability of oil prices in the eighties by the restructuring of the oil industry; as the later development of oil prices has shown, however, high instability is inherent in the price of oil. Nevertheless, he noted that the risk exposure primarily affected producers and users, and he advocated the application of futures contracts as an appropriate tool to help protect against price risk. [Duffie1999] provided guidance on how to handle the empirical behavior of volatility in the energy sector. They analyzed the problem of stochastic volatility and described various forms of Markovian stochastic volatility models. [Haigh2002] examined the markets of crude oil, heating oil and gasoline with the aim of reducing price volatility. The applied model was constructed as a crack spread, and they also focused attention on the time-to-maturity effect in futures. The result of their investigation confirmed the multivariate GARCH methodology as beneficial. [Dahlgren2003] specialized in protection against price risk in the power market; their efforts were focused on the promotion of risk assessment and the application of hedging in the power sector. [Woo2006] examined the degree of co-integration of the natural gas market in the USA. The data for the analysis came from the Californian market. They recommended using futures contracts traded on the New York Mercantile Exchange, with Henry Hub as the underlying spot, to reduce price risk. [Alizadeh2008] focused on the same market in New York, but the examined commodity was oil. The authors applied the concept of dynamic hedging and operated with a high- and a low-volatility regime. According to their results, dynamic hedging provides a significant reduction in portfolio risk. [Chang2011a] examined the two global oil markets Brent and WTI. Multivariate volatility models were used in the analysis. The results implied different hedge ratios depending on the methodology used, and similarly variable levels of hedging effectiveness. They also showed differences in hedging between the two markets [Chang2011a].

1.3 Hedge ratio

The first prerequisite for the application of any hedging strategy is the selection of a convenient asset for the asset that should be hedged. A convenient asset must exist for the hedging purpose; convenient assets for hedging are those with a high price affinity. The existence of such affinity can be simply confirmed by the value of the Pearson product-moment correlation coefficient [Calmorin2004]. An absolute value of the correlation coefficient close to one is a prerequisite for successful hedging. This means that the prices of the surveyed assets may show identical or reverse movements; which of the two is irrelevant. What is crucial is that the extent of the price movement in both assets is identical. If the prices have a high negative correlation, then the hedging is executed by taking the same position in both assets, i.e. both assets are bought or sold. If, however, the assets show a high positive correlation, a situation which is closer to real data, then the hedging must be executed by opposite transactions in the assets: the long asset has to be hedged by selling the other asset, and vice versa. However, in the real world it is rather rare to observe perfectly correlated prices over the long term. As noted by [Working1953], a loosening of the price tightness between spot and futures prices results in ineffective hedging.
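The following minimal sketch illustrates this prerequisite check on return data. It is a sketch under stated assumptions, not part of the empirical analysis of the thesis: the file name, the column labels ("spot", "futures") and the 0.8 screening threshold are purely illustrative.

import numpy as np
import pandas as pd

# Daily closing prices of the hedged asset (spot) and the hedging instrument (futures);
# the file and column names are hypothetical placeholders.
prices = pd.read_csv("commodity_prices.csv", parse_dates=["date"], index_col="date")

# Logarithmic returns of both series
log_returns = np.log(prices[["spot", "futures"]]).diff().dropna()

# Pearson product-moment correlation between spot and futures returns
rho = log_returns["spot"].corr(log_returns["futures"])
print(f"Pearson correlation of spot and futures returns: {rho:.3f}")

# An absolute value close to one indicates a suitable hedging instrument;
# the 0.8 cut-off below is only an illustrative rule of thumb.
if abs(rho) < 0.8:
    print("Warning: weak price affinity, hedging is likely to be ineffective.")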
In order to apply hedging, it is required to determine the weights of the assets, commonly named the hedge ratio. The hedge ratio refers to how much of the hedging asset should be held against the exposure. At the very beginning, hedging was based on the assumption of identical price movements in the considered assets. Protection against an unintentional price loss was to be provided by closing the open position with the hedge asset; in other words, the volume of the owned asset was matched by the same quantity of the hedge asset. This hedging strategy is also called the naive portfolio [DeMiguel2009]. Nonetheless, such protection has frequently been associated with imperfections [Brooks2002]. The inadequate protection was caused by the lack of a perfect correlation. Subsequently, an unequal hedge ratio began to be applied. Handling the new hedging methodology was enabled by the use of the findings of modern portfolio theory. The hedge position has since been determined by using the optimization of minimum risk, where the risk is given in the form of variance. The objective function for solving the extreme value is the variance of the two assets (the variance of the portfolio), where, in addition to the individual variances, the statistical relation in the form of the covariance is also considered. Referring to the work of [Markowitz1952], the diversification effect between two assets is applied in order to minimize risk. The portfolio variance is: ^11 The portfolio consists of two assets (spot and futures). $\sigma_p^2 = w^T\Sigma w = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j \sigma_{i,j}$. Then the quadratic utility function for solving the optimal hedge ratio represented by futures is: $\sigma_p^2 = w_s^2\sigma_s^2 + w_f^2\sigma_f^2 + 2 w_s w_f \sigma_{s,f}$, where $w_s$ is the weight of the spot, $\sigma_s^2$ is the variance of the spot, $w_f$ is the weight of the futures, $\sigma_f^2$ is the variance of the futures, and $\sigma_{s,f}$ is the covariance between spot and futures. The weight restriction is distinct from the classical optimization in [Markowitz1952]: one unit of spot is assumed, and the object of hedging here is to find an appropriate weight proportion of futures to spot. Solving for the extreme of the objective function provides the following expression: $\frac{\partial\sigma_p^2}{\partial w_f} = 2 w_f\sigma_f^2 + 2\sigma_{s,f} = 0. \quad (1)$ With $\frac{w_f}{w_s} = h^*$, the optimal hedge ratio is derived as: $h^* = -\frac{\sigma_{s,f}}{\sigma_f^2}. \quad (2)$ The negative sign of the hedge ratio in (2) indicates a short position in the futures. Testing the second-order condition (3), we can confirm that the extreme value of the objective function is a minimum: ^12 The assumption could theoretically be violated only by a risk-free asset; but this is out of consideration, since a risk-free asset does not contribute to the portfolio variance. $\frac{\partial^2\sigma_p^2}{\partial w_f^2} = 2\sigma_f^2. \quad (3)$ The ratio is called the minimum-variance hedge ratio (MVHR) precisely because of this objective function.
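A minimal numerical sketch of the computation in (2) follows. The spot and futures returns are simulated purely for illustration, and the sketch works with the magnitude of the hedge ratio, realizing the short futures position as spot minus h times futures; it is not the estimation procedure of the empirical part.

import numpy as np

rng = np.random.default_rng(42)
n = 1000
futures = rng.normal(0.0, 0.02, n)                 # simulated futures returns
spot = 0.9 * futures + rng.normal(0.0, 0.005, n)   # correlated spot returns

cov_sf = np.cov(spot, futures, ddof=1)[0, 1]
var_f = np.var(futures, ddof=1)
h_star = cov_sf / var_f                            # magnitude of the MVHR in (2)

def hedged_variance(h):
    # variance of a portfolio holding one unit of spot and h units of futures short
    return np.var(spot - h * futures, ddof=1)

print(f"h* = {h_star:.4f}")
print(f"variance at h*            : {hedged_variance(h_star):.3e}")
print(f"variance at 0.9 h*, 1.1 h*: {hedged_variance(0.9 * h_star):.3e}, "
      f"{hedged_variance(1.1 * h_star):.3e}")
# The variance at h* is the smallest of the three, in line with the second-order condition (3).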
In the practical part, the MVHR will be applied. Nevertheless, the statistical characteristics and their interaction are estimated according to distinct methodologies. Overall, there are seven distinct techniques used to provide the hedge ratio in the analytical part of the dissertation. It concerns the following methods:
· Ordinary least squares
· Naive portfolio
· Error Correction Model
· ARCH/GARCH model
· Wavelet
· Copula
· Extended mean Gini coefficient

1.4 Measure of hedge effectiveness

The futures weights produced by each of the seven models introduced above will be used for hedging the spot prices of the three examined commodities. The obtained hedge ratios are applied to the real data of the following twelve months. Thus, the ability of every method to reduce risk is measured and compared during the specified period.

Conventional measurement

As soon as the hedge ratio is calculated, its value is applied to the real data to calculate the variances and the covariance. The evaluation of hedging performance is based on the percentage reduction of the spot variance compared to the portfolio variance. The metric follows the methodology of [Johnson1960]: $HE = \frac{\sigma_U^2 - \sigma_H^2}{\sigma_U^2}$. HE stands for hedging effectiveness, $\sigma_H^2$ is the variance of the hedged portfolio, i.e. the spot together with the futures, and $\sigma_U^2$ is the variance of the unhedged portfolio, i.e. the variance of the spot. It is apparent that the better the futures match the spot, the lower the risk the portfolio will show. In other words, the risk reduction will be higher and the coefficient HE will be closer to 1, which would correspond to a 100 % reduction of risk. On the contrary, the closer the value of HE is to zero, the larger the imperfection of the hedging. In the calculation of the portfolio variance, the negative futures weight from (2) must be taken into account. The opposite weight leads to a reduction by the covariance risk, while both variance terms remain additive (4): $\sigma_p^2 = \underbrace{\sigma_s^2 + h^2\sigma_f^2}_{\text{variance risk}} - \underbrace{2h\sigma_{s,f}}_{\text{covariance risk}}. \quad (4)$ It is evident that the closer the price co-movement, the better the hedging will be. The hedging efficiency is thus determined by the correlation, because obviously $\rho = \frac{\sigma_{s,f}}{\sqrt{\sigma_s^2\sigma_f^2}}$. The effect of risk reduction decreases when the correlation loosens. In the case of a significant drop in the correlation, or if the correlation even changes to a negative value, the risk of the hedged portfolio paradoxically increases. [Cotter2006] highlights deficiencies arising with the use of this very popular measure of hedging effectiveness. That is why he proposes a measurement of hedging effectiveness based on the concept of Value at Risk [Cotter2012b]. The expression for the measurement takes the following form: $HE = 1 - \frac{VaR_{1\%,H}}{VaR_{1\%,U}}$, where VaR corresponds to the $(100 - x)$th percentile of the portfolio return over the next N days. He applied $x = 1$ and $N = 1$, and the subscripts H, U mark the hedged and the unhedged portfolio. ^13 He also uses other metrics for measuring hedging effectiveness; for more information see [Cotter2012b].
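The following minimal sketch computes both measures on simulated return data. The use of the empirical 1 % quantile as the one-day VaR estimate is an illustrative assumption of this sketch, not the estimator prescribed in the cited studies, and the simulated series do not represent the examined commodities.

import numpy as np

rng = np.random.default_rng(7)
n = 1000
futures = rng.normal(0.0, 0.02, n)
spot = 0.9 * futures + rng.normal(0.0, 0.005, n)

h = np.cov(spot, futures, ddof=1)[0, 1] / np.var(futures, ddof=1)
unhedged = spot
hedged = spot - h * futures                      # one unit of spot, h units of futures short

# Conventional measure: percentage reduction of the variance
he_variance = (np.var(unhedged, ddof=1) - np.var(hedged, ddof=1)) / np.var(unhedged, ddof=1)

# VaR-based measure with x = 1 %, N = 1 day, using the empirical 1 % quantile of returns
var_unhedged = -np.quantile(unhedged, 0.01)
var_hedged = -np.quantile(hedged, 0.01)
he_var = 1.0 - var_hedged / var_unhedged

rho = np.corrcoef(spot, futures)[0, 1]
print(f"HE (variance) = {he_variance:.3f}, rho^2 = {rho**2:.3f}")  # coincide for the in-sample MVHR
print(f"HE (1% VaR)   = {he_var:.3f}")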
Alternative metrics

The results of hedging effectiveness can be compared partially, by months. The question, however, is how to compare the aggregated results over the whole examined period, since it would be problematic to provide a test of statistical significance for the achieved HE values between all methods over the intended period. ^14 More on the problem of independent observations and testing statistical significance can be found in [Anderson2011]. Therefore, it was advisable to find another measurement to evaluate the appropriateness of each method for the examined commodity. A simple comparison is provided by tools of descriptive statistics such as the mean or the median. Another option for a comprehensive comparison is the sum of differences between a reference value and the achieved HE. If the reference value is set to one, then the difference refers to the residual variance risk in the portfolio. Thus, the relation can be expressed as follows: $R_{r,i} = 1 - \exp(\lambda_i), \quad \lambda_i = \ln(HE_i), \quad (5)$ where $R_r$ is the residual risk and $\lambda$ is the natural logarithm of the hedging effectiveness achieved by each model. In fact, the above relation indicates the remaining percentage from perfect hedging, expressed in variance. Obviously, it is also possible to derive the risk reduction in absolute value: $\Delta\sigma = \sigma_U\exp(\lambda), \quad \Delta\sigma = \sigma_U - \sigma_H$. In these equations the terms $\sigma_U$ and $\sigma_H$ are the standard deviations of the logarithmic returns of the unhedged spot and of the hedged portfolio, and here the risk reduction is calculated from standard deviations. The standard deviation of the hedged portfolio can then be written as: $\sigma_H = \sigma_U\left(1 - \exp(\lambda)\right). \quad (6)$ Thus, the cumulative residual risk may be used to compare the different methods among themselves. An alternative way of comparing the models is to establish a ranking according to their performance in the particular months. The achieved score over all months provides a view of the performance under a particular measurement. However, this evaluation does not take into account the overall effect of risk reduction over the whole period.

Chapter 2 Applied models for determining the MVHR

2.1 Ordinary least squares

Let X be a matrix of dimension n × 2. The matrix contains a constant term in the first column and one independent variable in the second. There are n observations (rows) in the system, and the matrix X is regarded as the matrix of independent variables. The vectors Y and $\epsilon$ also have n components, while $\beta$ has two. Here Y is the vector of the dependent variable, $\epsilon$ is the vector of errors, and $\beta$ is the vector of unknown population parameters. The statistical model for a linear regression between two variables then looks like the following system of equations [Gujarati2009]:

$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}_{n\times 1} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}_{n\times 2} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}_{2\times 1} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}_{n\times 1}.$

In simplified matrix notation, the equation consists of a systematic component and a stochastic component: $Y = \underbrace{X\beta}_{\text{syst. comp.}} + \underbrace{\epsilon}_{\text{stoch. comp.}}$. The object of a linear regression model is to estimate the parameter $\hat\beta$. The most commonly used projection technique for the estimation of the population parameter $\beta$ is to minimize the residuals in squared form [Rachev2007]. The vector of residuals can be expressed as $e = Y - X\hat\beta$. The objective function for the optimization can then be stated as follows [Goldberger1964]: ^1 The expressions e and $\epsilon$ are not equal, $e \neq \epsilon$ [Gujarati2009]. It is crucial to understand the distinction between the two: the vector e can be observed, unlike the stochastic term $\epsilon$. $e^Te \Rightarrow \min. \quad (7)$ It is obvious that the following relation is valid: $e^Te = (Y - X\hat\beta)^T(Y - X\hat\beta) = Y^TY - 2\hat\beta^TX^TY + \hat\beta^TX^TX\hat\beta$. Certainly, (7) needs to be differentiated with respect to $\hat\beta$ to find the minimum of the given function: ^2 The bivariate case of the OLS regression model can be expressed as $Y_i = \alpha + \beta X_i + e_i$. To find the parameters, the same approach of minimizing the sum of squared errors (SSE) is applied, $\sum_{i=1}^{n} e_i^2 \Rightarrow \min$, with $SSE = \sum_{i=1}^{n}\left(Y_i - \alpha - \beta X_i\right)^2$. Partial differentiation with respect to $\beta$ provides $\frac{\partial SSE}{\partial\beta} = -2\sum_{i=1}^{n} X_i\left(Y_i - \alpha - \beta X_i\right) \Rightarrow \dots \Rightarrow \beta\left[\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right] = \sum_{i=1}^{n} Y_iX_i - \frac{\sum_{i=1}^{n} Y_i\sum_{i=1}^{n} X_i}{n} \Rightarrow \beta = \frac{\sum_{i=1}^{n} Y_iX_i - \frac{\sum_{i=1}^{n} Y_i\sum_{i=1}^{n} X_i}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}$, which is of course identical in form to (2). $\frac{\partial e^Te}{\partial\hat\beta} = -2X^TY + 2X^TX\hat\beta = 0. \quad (8)$ To make sure it is really a minimum, the second-order condition must be checked: $\frac{\partial^2 e^Te}{\partial\hat\beta\,\partial\hat\beta^T} = 2X^TX.$ As long as X has full rank, this matrix is positive definite; therefore, the assumption of finding a minimum was correct.
Solving equation 8 yields the normal equations [Verbeek2008]: ^3 The matrix $X^{T}X$ is square and symmetric, so $(X^{T}X)^{-1}$ exists and indeed $(X^{T}X)^{-1}(X^{T}X) = I$, where $I$ is the $k \times k$ identity matrix. However, perfect multicollinearity, i.e. some columns of $X$ being linearly dependent, violates this assumption.
\[ (X^{T}X)\hat{\beta} = X^{T}Y, \tag{9} \]
and after all the desired parameter vector $\hat{\beta}$ is
\[ \hat{\beta} = (X^{T}X)^{-1}X^{T}Y. \]

Properties of ordinary least squares estimators
The properties of the estimators can be derived from the normal equations 9. If $Y = X\hat{\beta} + e$, then
\[ (X^{T}X)\hat{\beta} = X^{T}(X\hat{\beta} + e), \]
which provides the following relation:
\[ X^{T}e = 0. \]
In accordance with the notation above, the characteristics of OLS are:
· There is no correlation between the observed $X$ and the residuals.
· The predicted $Y$ is uncorrelated with the residuals.
· The sum of the residuals is equal to zero.
· The sample mean of the residuals is zero.
· The means of the predicted and observed $Y$ are equal, $\bar{\hat{Y}} = \bar{Y}$ [Wooldridge1995].
· The regression hyperplane passes through the means of the observed values, $(\bar{X}, \bar{Y})$ [Wooldridge1995].
More about OLS properties can be found in Greene:2003. The stated characteristics always hold true. Yet although the properties of the residuals are certain, they provide no information about the inquired parameter $\hat{\beta}$. To make any further suppositions about the real population parameter $\beta$, some assumptions are necessary. For the classical linear regression model the following assumptions are essential:
· Linearity in parameters. The dependent variable $Y$ is a linear combination of the regressors and the stochastic term $\epsilon$ [Andersen1998].
· The matrix $X$ has full rank. There are two essential conditions for this assumption. First, the number of observations $N$ is at least as large as the number of explanatory variables $K$, i.e. $N \ge K$ [Wooldridge1995]. ^4 In our case there is only one explanatory variable, the futures. Second, $X^{T}X$ is non-singular, i.e. there is no perfect multicollinearity.
· Exogenous explanatory variables. The regressors have no ability to explain the error terms $\epsilon$ [Berry1993].
· The error terms are independent and identically distributed. This assumption requires the expected value of the errors to be zero while the variance of the errors is constant, $\epsilon_i \sim iid(0, \sigma^2)$ [Berry1993]. This is in fact the assumption of homoscedasticity and no autocorrelation. ^5 Uncorrelated errors.
· The error terms in the population are normally distributed. The Central Limit Theorem can be invoked for Assumption 5, i.e. if $N$ is large enough, the estimated coefficients will be asymptotically normally distributed [Gujarati2009].
According to the Gauss-Markov theorem, if assumptions 1-4 are satisfied, the ordinary least squares estimator is the best linear unbiased estimator (BLUE) [Greene2003].
In order to find the optimal hedge ratio $h^{*}$, a regression of spot on futures was performed. Since the closing prices do not fulfil the required assumptions of classical linear regression, the data were transformed: instead of closing prices, the percentage (logarithmic) changes of spot and futures were used. The optimal hedge ratio according to the OLS methodology then corresponds to the estimated parameter $\hat{\beta}$. The transformed data are
\[ r_{st} = \ln\Bigl(\frac{P_t^{spot}}{P_{t-1}^{spot}}\Bigr), \tag{10} \]
and
\[ r_{ft} = \ln\Bigl(\frac{P_t^{futures}}{P_{t-1}^{futures}}\Bigr), \tag{11} \]
where $P^{spot}$ is the closing price of the spot and $P^{futures}$ is the closing price of the futures. The hedge ratio can then be estimated from the following model:
\[ r_{st} = \alpha + \beta r_{ft} + \epsilon_t. \]
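A minimal sketch of this estimation, assuming hypothetical price arrays for spot and futures: the log returns of 10 and 11 are formed and the slope of the regression is taken as the OLS hedge ratio.

```python
import numpy as np

def ols_hedge_ratio(spot, futures):
    """OLS minimum-variance hedge ratio from log returns (eqs. 10-11)."""
    r_s = np.diff(np.log(spot))                      # spot log returns
    r_f = np.diff(np.log(futures))                   # futures log returns
    X = np.column_stack([np.ones_like(r_f), r_f])    # constant + futures returns
    beta_hat = np.linalg.lstsq(X, r_s, rcond=None)[0]
    return beta_hat[1]                               # slope = hedge ratio h*

# hypothetical prices for illustration only
rng = np.random.default_rng(2)
futures = 50 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))
spot = futures * np.exp(rng.normal(0, 0.003, 500))
print(ols_hedge_ratio(spot, futures))
```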
2.2 Naive portfolio
Hedging based on a naive portfolio, in other words an equally weighted portfolio, is the easiest way to protect oneself against price risk. The formal notation for the weights of a naive portfolio can be expressed as
\[ w_s = w_f, \qquad \text{or} \qquad \frac{w_f}{w_s} = 1, \qquad \text{hence } h^{*} = 1. \]
While it may seem that a naive portfolio is a primitive technique with an insufficient hedging effect, this need not be true. Several facts speak in favour of this approach. The methodology is trivial and thus easily implementable. It is not too sensitive to small changes in the parameters in comparison with more complex models, and it is also more robust than other techniques. In some cases, the performance of a naive portfolio is nearly as good as that of more sophisticated models [DeMiguel2009]. From a different perspective, no model can consistently provide better performance than a naive portfolio [Tu2011]. Some authors even suggest that a naive portfolio can achieve the highest Sharpe ratio when compared with other models [Poitras2002]. Moreover, arbitrarily chosen weights can serve as a benchmark [Amenc2002]. ^6 In the context of the naive portfolio, the effect of diversification should be mentioned as well. In accordance with Markowitz's law of average covariance, for a sufficiently large number of assets with equal weights the portfolio variance converges to the average covariance [Markowitz1976]. The portfolio variance can be expressed as
\[ \sigma_p^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2 + \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \neq i}}^{n} w_i w_j \sigma_{ij}, \]
and for equal weights the role of the average covariance is evident from the following claim [Elton1997]:
\[ \sigma_p^2 = \frac{1}{n}\,\overline{\sigma_i^2} + \frac{n-1}{n}\,\overline{\sigma_{ij}}, \qquad \text{and then } \lim_{n \to \infty} \sigma_p^2 = \overline{\sigma_{ij}}. \]

2.3 Error correction model
Financial time series are characterised by the presence of non-stationarity. The classical linear regression model cannot be applied to such a data set; if a regression were performed, the results would not be correct. Although the model may show a seemingly significant statistical dependency because the value of $R^2$ is high, it does not display a real dependency. Instead, the result is a spurious regression [Greene2003]. The fundamental problem in financial time series is that the data are integrated but not related. [Granger1974] already pointed out the problem of spurious regression. Numerous analyses have shown that economic and financial data can evince short-run and long-run relationships. A short-run relation exists only for a short time period and then disappears, whereas a long-run relation does not disappear over time. This relationship is described as an equilibrium, and the time series show a tendency to oscillate around it [Pesaran1996]. As the system is exposed to continual shocks it is never in equilibrium, but it may be in long-run equilibrium, i.e. in a state that converges over time towards equilibrium. Such time series are then cointegrated [Engle1987]. The authors suggested a two-step procedure to deal with the problem of spurious regression. In the first step, the stationarity of the examined data is investigated; the unit root test is applied for this purpose. The unit root test validates the hypothesis of a random walk against an alternative hypothesis represented by a stationary AR(1) process [Dickey1979].
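Before turning to the test itself, the spurious-regression problem can be illustrated with a quick simulation: regressing two independent random walks on each other typically produces a deceptively high $R^2$, while the same regression in first differences collapses towards zero. The sketch below uses purely artificial data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# two independent random walks -- integrated but unrelated series
x = np.cumsum(rng.normal(size=n))
y = np.cumsum(rng.normal(size=n))

# OLS of y on x in levels
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
print(f"R^2 in levels: {1 - resid.var() / y.var():.2f}")      # often spuriously large

# the same regression in first differences
dX = np.column_stack([np.ones(n - 1), np.diff(x)])
beta_d = np.linalg.lstsq(dX, np.diff(y), rcond=None)[0]
resid_d = np.diff(y) - dX @ beta_d
print(f"R^2 in differences: {1 - resid_d.var() / np.diff(y).var():.3f}")
```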
The notation for the unit root test according to the given methodology is
\[ y_t = \theta y_{t-1} + \epsilon_t, \]
with the null hypothesis $H_0: \theta = 1$ and the alternative hypothesis $H_1: \theta < 1$. ^7 The model for testing the unit root can also be expanded by a constant or a time trend: $y_t = \alpha + \theta y_{t-1} + \epsilon_t$ or $y_t = \alpha + \gamma t + \theta y_{t-1} + \epsilon_t$. An alternative expression of the model is
\[ \Delta y_t = (\theta - 1) y_{t-1} + \epsilon_t = \delta y_{t-1} + \epsilon_t, \tag{12} \]
where $\delta = \theta - 1$, so that the unit root hypothesis becomes $H_0: \delta = 0$ against $H_1: \delta < 0$. However, the original Dickey-Fuller test carries certain disadvantages [Phillips1990]. Therefore a modified version, the so-called augmented Dickey-Fuller unit root test, was introduced [Said1984]. The Dickey-Fuller test for time series with stationary and invertible residuals can be modified to
\[ y_t = \theta y_{t-1} + \epsilon_t, \qquad \text{where} \qquad \epsilon_t + \sum_{i=1}^{p} \varphi_i \epsilon_{t-i} = e_t + \sum_{j=1}^{q} \upsilon_j e_{t-j}, \tag{13} \]
and $e_t \sim IID(0, \sigma_e^2)$. Although the lag orders $p$ and $q$ are unknown, the process can be approximated by an autoregressive process [Said1984]. Hence the unit root test can be based on the following model:
\[ \Delta y_t = \theta y_{t-1} + \sum_{i=1}^{n} \psi_i \Delta y_{t-i} + \eta_t. \]
The hypothesis $H_0: \theta = 0$ is then tested; it is evidence of a random walk in $y_1, y_2, \ldots, y_N$ and thus of a unit root in the process, while the alternative hypothesis $H_1: \theta < 0$ implies stationarity. Once the order of differencing needed to obtain stationary data is determined, a regression can be run to identify the long-term relationship in the series, or long memory, $\hat{\beta}$ [DeBoef2001]. The cointegrating regression corresponds to
\[ y_t = \alpha + \beta x_t + \mu_t. \tag{14} \]
The static cointegrating regression is set up like the OLS model: the closing prices of the spot as the dependent variable and the futures closing prices as the independent variable. Whenever a short-term relationship (short memory) within the residuals is demonstrated, this implies cointegration of the time series [DeBoef2001]. The changes in the dependent variable, $\Delta y_t = y_t - y_{t-1}$, can then be regressed on the changes in the independent variable, $\Delta x_t = x_t - x_{t-1}$, and the equilibrium error from the previous period. The model corresponds to
\[ \Delta y_t = \alpha + \beta \Delta x_t - \gamma \hat{\mu}_{t-1} + \eta_t. \]
The parameter $\hat{\beta}$ captures the short-run dynamics between the changes, while the parameter $\hat{\gamma}$ measures the rate of adjustment back towards the long-run equilibrium [DeBoef2008]. The model has to evince a permanent memory, in other words the unit root is confirmed, the errors from the cointegrating regression 14 are not serially correlated and there is no simultaneity [Enders1998]. The parameter $\hat{\beta}$ can then be used as the hedge ratio, $h^{*} = \hat{\beta}$ [Moosa2003].
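A minimal sketch of the two-step Engle-Granger procedure described above, assuming hypothetical spot and futures price arrays and using statsmodels for the unit root tests and the regressions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def ecm_hedge_ratio(spot, futures):
    """Two-step Engle-Granger procedure sketched for the hedge ratio."""
    # step 1: unit root tests on the levels (large p-values suggest a unit root)
    print("ADF p-value spot:   ", adfuller(spot)[1])
    print("ADF p-value futures:", adfuller(futures)[1])

    # cointegrating regression in levels, eq. (14)
    coint = sm.OLS(spot, sm.add_constant(futures)).fit()
    resid = coint.resid
    print("ADF p-value residuals:", adfuller(resid)[1])   # small p-value -> cointegration

    # step 2: error correction model on the differences
    dy, dx = np.diff(spot), np.diff(futures)
    X = sm.add_constant(np.column_stack([dx, resid[:-1]]))
    ecm = sm.OLS(dy, X).fit()
    return ecm.params[1]        # coefficient on the futures changes, used as h*
```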
2.4 ARCH/GARCH
Financial time series are characterised by the dynamic development of their characteristics. One of the most investigated fields in finance is risk. Since volatility represents a numerical measure of risk, it has attracted the attention of scientists. Variation in the volatility of financial data over time has been clearly demonstrated [Andersen1997]. In addition, volatility clustering can be identified in financial time series [Lux2000], and it has been confirmed that GARCH models can capture this clustering [Andersen1998]. There are valid objections to the application of the standard OLS model for estimating the hedge ratio, as the model is inconsistent. The inappropriateness emerges because the model does not respect the heteroscedasticity embraced in prices [Park1987]. Another disadvantage of the OLS model is its neglect of relevant information [Myers1989].
These deficiencies were removed by the introduction of an innovative approach in the form of the autoregressive conditional heteroscedasticity model (ARCH). The ARCH model introduced by [Engle1982] captures the characteristic heteroscedasticity of financial series in a proper way, especially the variation of volatility and volatility clustering. The ARCH model expresses the mean process of a financial time series in the following way:
\[ r_t = \mu + \epsilon_t, \]
where $t = 1, 2, \ldots, N$, $r_t$ is the analysed time series with $N$ observations, $\mu$ is the mean of the series and $\epsilon_t$ are the residuals. The residuals in the ARCH model are assumed to follow
\[ \epsilon_t = \sigma_t z_t, \qquad z_t \sim N(0,1), \]
while $\sigma_t^2$ corresponds to the following process:
\[ \sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \ldots + \alpha_q \epsilon_{t-q}^2, \tag{15} \]
assuming $\alpha_0 > 0$ and $\alpha_i \ge 0$ for $i \ge 1$. The ARCH model is able to describe the stochastic process of the analysed time series and to predict the residuals. However, the model exhibits some shortcomings as well. Namely, it works with a hypothesis of symmetrical shocks, whereas a different price impact of positive and negative shocks has been confirmed [Sadorsky1999]. Further, the model can capture the volatility dynamics only if a sufficient number of observations and parameters are included [Maddala1992]. In order to overcome the restraints of the ARCH model, a generalised form of ARCH (GARCH) was introduced by Bollerslev. The conditional variance in the univariate GARCH model can be denoted as
\[ \sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \ldots + \alpha_q \epsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \ldots + \beta_p \sigma_{t-p}^2, \tag{16} \]
under the conditions $\alpha_0 > 0$ and $\beta_i \ge 0$. Unlike 15, in formula 16 numerous residual lags can be replaced by a limited number of conditional variances, so that the number of estimated parameters can be reduced [Bollerslev1986]. In fact, there is a large set of GARCH models [Bollerslev2008]; however, the most frequently used is the basic form GARCH(1,1). The expression is based on the formula 16:
\[ \sigma_n^2 = \gamma V_L + \alpha \epsilon_{n-1}^2 + \beta \sigma_{n-1}^2, \]
where $V_L$ represents the long-run variance. The parameters $\gamma$, $\alpha$, $\beta$ are weights satisfying the condition $\gamma + \alpha + \beta = 1$. In other words, $\sigma_n^2$ is based on the most recent squared residual $\epsilon_{n-1}^2$ and the most recent variance $\sigma_{n-1}^2$. Substituting $\omega$ for $\gamma V_L$ gives the usual form of GARCH(1,1):
\[ \sigma_n^2 = \omega + \alpha \epsilon_{n-1}^2 + \beta \sigma_{n-1}^2. \tag{17} \]
As mentioned in the literature review, GARCH models have also been used for hedging purposes. The main motivation for applying GARCH to hedging is that the same information set affects both spot and futures prices [Baillie1991]. Hence, bivariate GARCH (BGARCH) models have been used on cash and futures prices to estimate the hedge ratio [Park1995]. The main benefit of the multivariate GARCH application is its ability to capture a time-varying hedge ratio. The general formula of the BGARCH for spot and futures can be expressed as
\[ r_{st} = \mu_s + \epsilon_{st}, \qquad r_{ft} = \mu_f + \epsilon_{ft}, \]
with
\[ \begin{bmatrix} \epsilon_{st} \\ \epsilon_{ft} \end{bmatrix} \Big|\, \Omega_{t-1} \sim N(0, H_t), \qquad H_t = \begin{bmatrix} h_{ss,t}^2 & h_{sf,t}^2 \\ h_{sf,t}^2 & h_{ff,t}^2 \end{bmatrix}. \]
Here $r_{st}$ corresponds to 10, $r_{ft}$ corresponds to 11 and $H_t$ is a positive definite matrix of conditional time-varying covariances. Consequently the equation for $H_t$ is
\[ \mathrm{vech}(H_t) = \mathrm{vech}(C) + \sum_{i=1}^{q} \Gamma_i\, \mathrm{vech}\bigl(\epsilon_{t-i}\epsilon_{t-i}^{T}\bigr) + \sum_{i=1}^{p} \Delta_i\, \mathrm{vech}(H_{t-i}), \]
where $C$ is a $2 \times 2$ matrix and $\Gamma_i$ and $\Delta_i$ are $3 \times 3$ matrices. ^8 The term "vec" implies the vectorisation of a matrix, while "vech" is applied to its lower triangle; the matrices involved are symmetric positive definite.
Ensuring the positive definiteness of the conditional covariance matrix is not feasible unless non-linear restrictions are imposed [Lamoureux1990]. In addition, the model involves a large number of parameters, which makes it unwieldy. For this reason it is useful to make certain assumptions. One simplification of the model assumes that each element of $H_t$ depends only on its own lagged value and the corresponding lagged residuals [Bera1997]. The model then corresponds to
\[ h_{ss,t}^2 = C_s + \gamma_{ss}\,\epsilon_{s,t-1}^2 + \delta_{ss}\, h_{ss,t-1}^2, \quad h_{sf,t}^2 = C_{sf} + \gamma_{sf}\,\epsilon_{s,t-1}\epsilon_{f,t-1} + \delta_{sf}\, h_{sf,t-1}^2, \quad h_{ff,t}^2 = C_f + \gamma_{ff}\,\epsilon_{f,t-1}^2 + \delta_{ff}\, h_{ff,t-1}^2. \]
The required conditions for the model are $C_s > 0$, $C_f > 0$, $C_s C_f - C_{sf}^2 > 0$, $\gamma_{ss} > 0$, $\gamma_{ff} > 0$, $\gamma_{ss}\gamma_{ff} - \gamma_{sf}^2 > 0$. ^9 In the absence of these conditions the positive definiteness of $H_t$ is violated. Another simplification of $H_t$ assumes that the conditional correlation between the residuals is constant [Bollerslev1990]. Such a model can be expressed as follows:
\[ H_t = \begin{bmatrix} h_{ss,t}^2 & h_{sf,t}^2 \\ h_{sf,t}^2 & h_{ff,t}^2 \end{bmatrix} = \begin{bmatrix} h_{s,t} & 0 \\ 0 & h_{f,t} \end{bmatrix} \begin{bmatrix} 1 & \rho_{sf} \\ \rho_{sf} & 1 \end{bmatrix} \begin{bmatrix} h_{s,t} & 0 \\ 0 & h_{f,t} \end{bmatrix}. \tag{18} \]
The correlation coefficient thus does not depend on time, with $|\rho_{sf}| < 1$, while the components $h_{s,t}^2$ and $h_{f,t}^2$ follow standard univariate GARCH processes 17. This modification is quite convenient; however, it is questionable whether empirical data are consistent with the assumption of a constant correlation [Bera1997]. Another adaptation of the model was introduced by [Engle1995]:
\[ H_t = \begin{bmatrix} c_{ss} & c_{sf} \\ c_{sf} & c_{ff} \end{bmatrix} + \begin{bmatrix} \gamma_{ss} & \gamma_{sf} \\ \gamma_{sf} & \gamma_{ff} \end{bmatrix}^{T} \begin{bmatrix} \epsilon_{s,t-1}^2 & \epsilon_{s,t-1}\epsilon_{f,t-1} \\ \epsilon_{s,t-1}\epsilon_{f,t-1} & \epsilon_{f,t-1}^2 \end{bmatrix} \begin{bmatrix} \gamma_{ss} & \gamma_{sf} \\ \gamma_{sf} & \gamma_{ff} \end{bmatrix} + \begin{bmatrix} \delta_{ss} & \delta_{sf} \\ \delta_{sf} & \delta_{ff} \end{bmatrix}^{T} H_{t-1} \begin{bmatrix} \delta_{ss} & \delta_{sf} \\ \delta_{sf} & \delta_{ff} \end{bmatrix}. \]
This model guarantees the positive definiteness of the matrices. In the case that $\Gamma$ and $\Delta$ are zero, $H_t$ collapses to a constant conditional covariance matrix [Myers1989]:
\[ H_t = \begin{bmatrix} c_{ss} & c_{sf} \\ c_{sf} & c_{ff} \end{bmatrix}. \]
For the estimation of the hedge ratio the assumption in 18 will be applied. The hedge ratio generated by the BGARCH process is then
\[ h_t^{*} = \frac{h_{sf,t}^2}{h_{ff,t}^2}. \]
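A minimal sketch of a constant-conditional-correlation hedge ratio in the spirit of 18, assuming illustrative, pre-fixed GARCH(1,1) parameters rather than estimated ones (in practice $\omega$, $\alpha$, $\beta$ would be fitted by maximum likelihood).

```python
import numpy as np

def garch11_vol(resid, omega, alpha, beta):
    """Conditional volatility from the GARCH(1,1) recursion, eq. (17), with given parameters."""
    sigma2 = np.empty_like(resid)
    sigma2[0] = resid.var()                      # initialise at the sample variance
    for t in range(1, len(resid)):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return np.sqrt(sigma2)

def ccc_hedge_ratio(r_s, r_f, omega=1e-6, alpha=0.05, beta=0.90):
    """Time-varying hedge ratio h_t* = h_sf,t / h_ff,t under a constant correlation."""
    eps_s, eps_f = r_s - r_s.mean(), r_f - r_f.mean()
    vol_s = garch11_vol(eps_s, omega, alpha, beta)
    vol_f = garch11_vol(eps_f, omega, alpha, beta)
    rho = np.corrcoef(eps_s / vol_s, eps_f / vol_f)[0, 1]   # constant correlation estimate
    return rho * vol_s / vol_f                               # rho*sig_s,t*sig_f,t / sig_f,t^2
```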
2.5 Wavelet
For a long time, scientific attention in the processing of financial time series was focused only on mathematical or statistical methods. In recent years, however, some innovation in finance can be observed. One of the new areas in finance is the application of tools for signal analysis. Spectral analysis and the Fourier transform can be counted among the unconventional tools applied in processing financial data [Box2015]. From the perspective of spectral analysis it is possible to examine the frequency behaviour of a time series. One motivation for applying signal-processing methodology is its predictive ability. The results of spectral analysis can be used in various areas, among others in risk management [Acerbi:2002]. The utilisation was based on the findings of previous research: in physics and astronomy, spectral analysis examines a light signal and its decomposition, and it can be assumed that similar structural changes may occur in a financial time series at different frequencies [Ozaktas2001]. Usually modified data is used for spectral analysis, i.e. data that is stationary. In principle, this methodology decomposes the input data into a set of frequency bands [Bloomfield2004]. The interdependence of the data is examined at the level of the estimated spectrum. The principal disadvantage of the approach is that it cannot match structural changes to the point in time at which they occurred [Nason:1995]; in other words, the examined data is decomposed without the time component. ^10 There exists a modification of the Fourier transform that can incorporate time location, for instance the so-called piecewise (windowed) Fourier transform. Still, it can be problematic either in terms of multiresolution power or in capturing low frequencies if the pieces are too small. For the examination of financial time series, however, this represents a major shortcoming [Gencay2001]. Moreover, as mentioned, spectral analysis can only work with stationary data; if the data is not stationary, it has to be transformed, which may cause the loss of important information. Fortunately, there is a tool in the field of signal processing that can eliminate the mentioned shortcomings: the wavelet transform. Wavelets are functions with specific requirements [Vidakovic:2009]. ^11 For instance, the integral is zero, i.e. the waves above and below the x-axis sum to zero. Another requirement is that an easy calculation of the direct and the inverse wavelet transform exists. Similarly to the Fourier transform, in which any function can be depicted via sines and cosines, any function can be represented by wavelets. Unlike the Fourier transform, wavelet analysis maintains the time component, since the wavelet function has the time-frequency localisation property [Chan1999]. It can process non-stationary data and it is able to examine the interdependence in the data [Percival:2008]. These properties make the wavelet an appealing apparatus for application in hedging. Analysing data via the wavelet methodology is a relatively new discipline, especially in the field of finance. However, the methodology is not the result of a single recent scientific contribution. According to the documented literature, the concept of the wavelet dates back to the beginning of the previous century; the theoretical foundations can be found in the work of [Haar1910]. ^12 Chan, however, notes that the origin of the wavelet is linked with the work of Weierstrass (1873) [Chan1999]. In the following decades the concept was neither developed nor widely applied. The turn came in the 1980s thanks to the merit of Jean Morlet and other physicists and mathematicians. ^13 For example, Yves Meyer contributed his own wavelet function; later came Ingrid Daubechies and Stéphane Mallat. Here also arises the origin of the term wavelet [Mallat1989]. There are distinct types of wavelets. Wavelets are in fact functions which are applied to data or to other functions. The reason for using wavelets is that data can be analysed more efficiently after a linear decomposition. Although the wavelet transform guarantees a linear expansion of another function, the wavelets themselves are non-linear functions [Ghanem:2001]. Thus, a function can be composed as
\[ f(t) = \sum_{k} \alpha_k \psi_k(t), \]
where $k$ is an integer index of the finite, or possibly infinite, sum, $\alpha_k$ are the real-valued expansion coefficients and $\psi_k$ is an expansion set representing a group of real-valued functions [Percival:2008]. In the language of wavelets, a signal or function breaks down as follows:
\[ x_t = \sum_{k} s_{J,k}\,\varphi_{J,k}(t) + \sum_{k} d_{J,k}\,\psi_{J,k}(t) + \sum_{k} d_{J-1,k}\,\psi_{J-1,k}(t) + \ldots + \sum_{k} d_{1,k}\,\psi_{1,k}(t), \tag{19} \]
where the function $\varphi(\cdot)$ is the so-called father wavelet and the function $\psi(\cdot)$ the mother wavelet, with the coefficients $s_{J,k} = \int \varphi_{J,k}(t)\, x_t\, dt$ and $d_{j,k} = \int \psi_{j,k}(t)\, x_t\, dt$, $j = 1, 2, 3, \ldots, J$, where the number of scales is $J = \log_2 n$, $n$ is the number of data points and $k$ ranges from 1 to the number of coefficients in the given component. For more information see [Percival:2008].
The wavelet concept is thus based on multi-scale decomposition, multi-scale analysis or multi-resolution. The decomposition takes place in the Hilbert space $L^2(\mathbb{R})$ [Daubechies1992]. As indicated by Francis and Sangbae, a two-dimensional family of functions is generated from the basic scaling function using scaling and translation in the following manner [In2013]:
\[ \varphi_{j,k}(t) = 2^{-j/2}\,\varphi\bigl(2^{-j}t - k\bigr) = 2^{-j/2}\,\varphi\Bigl(\frac{t - 2^{j}k}{2^{j}}\Bigr). \tag{20} \]
The expression $2^{j}$ represents a sequence of scales, also called the scale factor, and $2^{j}k$ represents a translation or shift parameter. The scale factor partitions the frequency. The scaling function spans a vector space over $k$:
\[ S_j = \mathrm{Span}\bigl\{\varphi_k(2^{j}t)\bigr\}. \]
The following nesting condition must then be met:
\[ \ldots \subset S_{-2} \subset S_{-1} \subset S_0 \subset S_1 \subset \ldots \subset L^2. \tag{21} \]
Hence, multi-resolution analysis allows the decomposition of a time series into each of the approximation subspaces $S_j$. The multi-resolution equation represents the following relation:
\[ \varphi(t) = \sum_{k} g(k)\,\sqrt{2}\,\varphi(2t - k), \qquad k \in \mathbb{Z}, \tag{22} \]
where $g(k)$ is the low-pass filter, or the scaling-function coefficients, and represents a sequence of real or complex numbers [Burrus1997]. Determining the scaling function is the first step in establishing the wavelets, since they can be considered a weighted sum of shifted scaling functions. The relation is then declared as
\[ \psi(t) = \sum_{k} h(k)\,\sqrt{2}\,\varphi(2t - k), \tag{23} \]
where $h(k)$ is a high-pass filter. Subsequently, the mother function corresponds to
\[ \psi_{j,k}(t) = 2^{-j/2}\,\psi\bigl(2^{-j}t - k\bigr) = 2^{-j/2}\,\psi\Bigl(\frac{t - 2^{j}k}{2^{j}}\Bigr). \tag{24} \]
It must then inevitably hold that any time series $x_t \in L^2$ can be expressed as a series expansion in terms of the scaling function and the wavelets 19:
\[ f(t) = \sum_{k=-\infty}^{\infty} s(k)\,\varphi_k(t) + \sum_{j=0}^{\infty}\sum_{k=-\infty}^{\infty} d(j,k)\,\psi_{j,k}(t), \]
confirming the property that any function can be expressed as a linear combination of wavelets and the scaling function, respectively. However, some conditions must be satisfied in order to apply the wavelet concept. According to [Mallat1989], multi-resolution analysis can be applied to all square-integrable functions in the space $L^2$. Further, the assumptions include admissibility, vanishing moments and orthogonality [Kim:2005]. The first prerequisite is required mostly in theoretical applications. ^14 Admissibility is satisfied if $C_\psi = \int_{-\infty}^{\infty} \frac{|H(w)|}{|w|}\, dw < \infty$, where $H(w)$ represents the Fourier transform, with frequency $w$, of $\psi(t)$ in the continuous wavelet transform. The vanishing moments follow from 24. Orthogonality is fundamental for the wavelet transform [Grossmann1984]. From the relation of the given father 20 and mother 24 functions, orthogonality requires
\[ \langle \varphi(\cdot - k), \psi(\cdot - l) \rangle = 0, \qquad k, l \in \mathbb{Z}. \]
For the continuous wavelet transform the assumption of orthogonality would be expressed according to [Vidakovic2009] as
\[ \int \psi_{j,k}\,\psi_{\tilde{j},\tilde{k}} = 0, \]
whenever $j = \tilde{j}$ and $k = \tilde{k}$ do not hold simultaneously. Finally, with regard to the assumption in 21, orthogonality must satisfy [Tang2010]
\[ L^2 = S_0 \oplus D_1 \oplus D_2 \oplus D_3 \oplus \ldots, \]
and correspondingly the affinity of $S_0$ to the wavelet spaces is
\[ S_0 = D_{-\infty} \oplus \ldots \oplus D_{-1}. \]
Alternatively, the wavelet functions can be used to determine the low-pass filter from 22 and the high-pass filter from 23. A continuous function may thus be broken down into
\[ g(k) = \sqrt{2}\int \varphi(t)\,\varphi(2t - k)\, dt, \qquad h(k) = \sqrt{2}\int \psi(t)\,\varphi(2t - k)\, dt, \]
and eventually
\[ h(k) = (-1)^{k} g(k). \]
Even with the use of a low-pass and a high-pass filter, it is essential to ensure the main assumptions, which are the zero-mean condition
\[ \sum_{k=0}^{L-1} h_k = 0, \]
the prerequisite of unit energy
\[ \sum_{k=0}^{L-1} h_k^2 = 1, \]
and the already discussed orthogonality to even shifts
\[ \sum_{k=0}^{L-1} h_k h_{k+2n} = 0, \qquad n \in \mathbb{Z},\; n \neq 0. \]
For the purpose of the thesis, 1 will be applied to find the optimal hedge ratio. With respect to the sample-size restriction of the discrete transform, which requires the sample size to be an integer multiple of $2^{J}$, the maximal overlap discrete wavelet transform (MODWT) will be considered. For the hedging purpose it is necessary to find the variance of the decomposed spot prices and the covariance of the decomposed spot and futures prices. If there is a stochastic process $\{X_t\}$ and the sample size is divisible by $2^{J}$, then by applying the discrete wavelet transform the wavelet coefficients can be obtained from the high-pass filter in the pyramid algorithm [Mallat1989]:
\[ d_{j,t} = \sum_{k=0}^{L_j - 1} h_{j,k}\, X_{t-k}, \]
and the scaling coefficients analogously from the low-pass filter:
\[ s_{j,t} = \sum_{k=0}^{L_j - 1} g_{j,k}\, X_{t-k}. \]
There will be $N/2^{j}$ scaling and wavelet coefficients at scale $j$. However, compliance with the conditions of the discrete wavelet transform (DWT) is rather demanding [Serroukh2000]. The MODWT is therefore more convenient, thanks to its relaxed orthogonality assumption. The MODWT wavelet and scaling coefficients are [Percival:2006]
\[ \tilde{d}_{j,t} = \sum_{k=0}^{L_j - 1} \tilde{h}_{j,k}\, X_{t-k}, \qquad \tilde{s}_{j,t} = \sum_{k=0}^{L_j - 1} \tilde{g}_{j,k}\, X_{t-k}, \]
where the wavelet and scaling filters are obtained as [Percival:2006]
\[ \tilde{h}_{j,k} = \frac{h_{j,k}}{2^{j/2}}, \qquad \tilde{g}_{j,k} = \frac{g_{j,k}}{2^{j/2}}. \]
Consequently, it is possible to write down the variance. The wavelet variance at scale $j$ has the following relation to the stochastic process $\{X_t\}$:
\[ \sum_{j=1}^{\infty} \sigma_{X,j}^2 = \sigma_X^2. \]
Obviously, $\sigma_{X,j}^2$ represents the contribution of scale $j$ to the overall variance. This property enables the decomposition of the variance into components associated with certain time scales [Gallegati:2008]. ^15 A spectral density function $S(\cdot)$ can thus be defined, since it holds that $\sigma_X^2 = \int_{-1/2}^{1/2} S_X(f)\, df$. Hence, under the MODWT assumptions, an unbiased estimator of the wavelet variance may be obtained:
\[ \tilde{\sigma}_{X,j}^2 = \frac{1}{\tilde{N}_j} \sum_{t=L_j - 1}^{N-1} \tilde{d}_{j,t}^2, \]
where $\tilde{N}_j = N - L_j + 1$ represents the number of maximal-overlap coefficients unaffected by the boundary at scale $j$, and the length of the wavelet filter at scale $j$ is $L_j = (2^{j} - 1)(L - 1) + 1$ [Craigmile2005]. The covariance is needed to determine the futures/spot ratio. The formula for the wavelet covariance of two random variables $X$ and $Y$ at scale $j$ can be expressed according to [Vannucci1999] as
\[ \tilde{\sigma}_{XY,j} = \frac{1}{\tilde{N}_j} \sum_{t=L_j - 1}^{N-1} \tilde{d}_{j,t}^{X}\, \tilde{d}_{j,t}^{Y}. \]
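A minimal sketch of a scale-by-scale hedge ratio in this spirit, restricted to the Haar filter and circular filtering so that it stays self-contained; at each scale the ratio of wavelet covariance to futures wavelet variance plays the role of 1.

```python
import numpy as np

def haar_modwt_details(x, levels):
    """MODWT detail coefficients with the Haar filter (a-trous pyramid, circular boundary)."""
    g = np.array([0.5, 0.5])      # MODWT Haar scaling filter  g~ = g / sqrt(2)
    h = np.array([0.5, -0.5])     # MODWT Haar wavelet filter  h~ = h / sqrt(2)
    details, approx = [], np.asarray(x, dtype=float)
    for j in range(levels):
        shift = 2 ** j            # effective lag grows by a factor of two per level
        details.append(h[0] * approx + h[1] * np.roll(approx, shift))
        approx = g[0] * approx + g[1] * np.roll(approx, shift)
    return details

def wavelet_hedge_ratios(spot_ret, fut_ret, levels=4):
    """Scale-wise hedge ratios: wavelet covariance over futures wavelet variance."""
    d_s = haar_modwt_details(spot_ret, levels)
    d_f = haar_modwt_details(fut_ret, levels)
    return [np.mean(ds * df) / np.mean(df * df) for ds, df in zip(d_s, d_f)]
```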
2.6 Copula
One of the most important topics in finance and in risk management is the quantification of overall risk, in other words the aggregation of individual risks. A simple summation is appropriate only if the assets are independent; moreover, the relation between them need not be strictly linear. In such a case the expression of the joint risk becomes more complex, because the joint distribution is mostly unknown. If a wider set of dependencies between multiple assets is to be expressed, Gaussian behaviour is commonly assumed, and the information about the dependences is then captured in a covariance matrix. The quantitative investigation of dependence goes deep into history: according to the preserved sources, data maps were already being applied in the seventeenth century to describe monsoon rains [Dorey2005]. The introduction of the correlation concept by Francis Galton in 1888 was crucial for the field of finance [Hauke2013]. Later, [Pearson1896] worked out the concept of bivariate normal correlation in 1896. ^16 More about the history of correlation can be found in [Hauke2013]. The Pearson product-moment correlation corresponds to
\[ \rho_{XY} = \frac{E\bigl[(X - \mu_X)(Y - \mu_Y)\bigr]}{\sigma_X \sigma_Y}. \tag{25} \]
Since the Pearson product-moment correlation can only be used if some strong requirements are met, another methodology that relaxes these assumptions was introduced, the so-called Spearman rank correlation [Spearman1904]. Thus, 25 can be applied to ranked data, or expressed by its own formula:
\[ r_s = 1 - \frac{6\sum_{i=1}^{n}(p_i - q_i)^2}{n(n^2 - 1)}, \]
where $p_i$ and $q_i$ are the ranks of the values of the two random variables. Spearman's model is a non-parametric measure of dependence; it reflects rather the strength of a monotonic relation. A similar concept based on ordered data is the rank correlation introduced by [Kendall1948]. The formula corresponds to
\[ \tau = \frac{C_p - D_p}{n(n-1)/2}, \]
where $C_p$ stands for the number of concordant pairs and $D_p$ for the number of discordant pairs of the random variables $X$ and $Y$ with $n$ pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. A pair is concordant if $x_i > x_j$ and $y_i > y_j$ (or $x_i < x_j$ and $y_i < y_j$), and discordant if $x_i > x_j$ and $y_i < y_j$, or $x_i < x_j$ and $y_i > y_j$, with $i \neq j$.
Nevertheless, until the fifties the Pearson correlation prevailed in finance, and the relation was mostly examined in a cross-sectional data analysis. Over time, however, the problem of temporal correlation became gradually more significant [Longin1995]. Many financial and econometric models are based on strong distributional assumptions; the normal distribution is commonly assumed. Financial tools like the Capital Asset Pricing Model and the Arbitrage Pricing Theory are also based on these assumptions. However, real data mostly does not fulfil such requirements; for instance, asset returns are fat-tailed [Rachev:2005]. Additionally, if the dependence of two random variables is examined, it is assumed that they have identical distributions, but empirical observations often disprove this assumption. The above-mentioned shortcomings initiated a new development of risk models, among others the copula concept, which was introduced to the field of finance at the beginning of the 21st century. The aim of scientists was to highlight the deficiencies in the application of Pearson's correlation [Bouye2000]. Apart from the reliance on the normal distribution, these included the inability to work with time-varying volatility [Longin1995] and, eventually, the inability to deal correctly with the problem of heteroskedasticity in the data [Loretan2000]. Furthermore, some articles pointed out the relationship in extreme values [Embrechts2001]. Primarily, the ability of the copula to describe the relation between assets in extreme events, such as periods of crises, advocated its application [Rockinger2001], [Hartmann2004]. ^17 Economists and financial market participants had begun to notice that financial markets were becoming more interdependent during financial crises. Attention was focused, for instance, on the Mexican Tequila crisis (1994-1995), the Asian flu crisis (1997) and the Russian default crisis (1998) [Calvo1999], [Corsetti1999], [Scholes2000], especially the crises caused by the balance of payments [Costinot2000]. The copula gained considerable popularity in the application to contagion [Rodriguez2007].
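The three dependence measures defined earlier in this section are available in scipy; the sketch below compares them on a hypothetical pair of monotonically related but non-linear series, where the rank-based measures stay close to one while the Pearson coefficient understates the dependence.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

rng = np.random.default_rng(4)
x = rng.normal(size=500)
y = np.exp(x) + rng.normal(scale=0.01, size=500)   # monotone but non-linear link

print("Pearson :", pearsonr(x, y)[0])    # captures linear dependence only
print("Spearman:", spearmanr(x, y)[0])   # rank-based, close to 1 here
print("Kendall :", kendalltau(x, y)[0])  # concordant vs discordant pairs
```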
Paradoxically, one of the models from the copula family has been blamed for being one of the causes of the recent global financial crisis [Salmon2012]. ^18 Salmon highlighted the mathematician David X. Li, who presented to the financial world the Gaussian copula function, "the formula that killed Wall Street". In connection with US mortgages, however, the saying about fire is appropriate: it can be a good servant but a bad master if it is not in the right hands. This was especially true for the financial alchemy that produced the "AAA-rated" products. More about moral hazard and mortgage-backed securities can be found in [Brunnermeier2009], [Crotty2009], [Jacobs2009].
The birth of the statistical tool called copula can be traced to the fifties of the previous century. The basics of the copula were presumably laid in the work on bivariate and trivariate distributions with given univariate margins [Frechet1951], [Dall1956]. Nevertheless, the real breakthrough was the work of [Sklar1959]. He introduced a new function, named copula, that joins one-dimensional marginal distribution functions into a multivariate joint distribution function [Nelsen1991]. ^19 Sklar used the Latin word copulare, which can be translated as to join together. The initial impetus for finding the copula was a study of probabilistic metric spaces on a theoretical level [Sklar1959]. The first statistical applications were realised only in the eighties [Schweizer1981]. Some scientific articles from this period dealt with the question of whether there exists a linkage between the copula and the above-mentioned measures of dependence [Frees1998]. ^20 [Schweizer1981] showed that the copula can be used to express Kendall's $\tau$:
\[ \tau = 4\int\!\!\int_{[0,1]^2} C(u_1, u_2)\, dC(u_1, u_2) - 1, \]
with $C$ the copula associated with the joint distribution. Nelsen then managed to express the relation between the copula and Spearman's rank-order correlation coefficient [Nelsen2007]:
\[ r_s = 12\int\!\!\int_{[0,1]^2} u_1 u_2\, dC(u_1, u_2) - 3. \]
Finally, a linkage between the copula and the Pearson product-moment correlation was also proved [Nelsen1991]:
\[ \rho_{XY} = \frac{1}{\sigma_X \sigma_Y}\int\!\!\int_{[0,1]^2} \bigl[C(u_1, u_2) - u_1 u_2\bigr]\, d\Phi_1^{-1}(u_1)\, d\Phi_2^{-1}(u_2). \]
In fact, there are many copula functions. Copulas represent instruments which can describe the dependence properties of multivariate random vectors and can also identify the relation between the joint distribution and the marginal distributions. Consider a pair of continuous random variables $(X, Y)$ with their marginal cumulative distribution functions
\[ F(x) = P(X \le x) \qquad \text{and} \qquad G(y) = P(Y \le y), \]
for all $x, y \in \mathbb{R}$, and a joint cumulative distribution function
\[ H(x,y) = P(X \le x, Y \le y), \]
where $F(x)$ and $G(y)$ are marginal cumulative distribution functions, both taking values in the interval $I = [0,1]$, and $H(x,y)$ is the joint cumulative distribution function with values in $I$. According to Sklar's theorem, if there is a joint distribution function $H(x,y)$, then there exists a copula $C(u,v)$ with bivariate uniform margins [Sklar1959]:
\[ C(u,v) = P(U \le u, V \le v), \qquad u, v \in [0,1]. \]
Furthermore, the joint distribution function can be expressed through the copula:
\[ H(x,y) = C\bigl(F(x), G(y)\bigr) = C(u,v). \]
Thus, there are transformed variables $U = F(X)$ and $V = G(Y)$, both in the interval $I$. Conversely, the copula can be defined by inverting the distribution functions:
\[ C(u,v) = H\bigl(F^{-1}(u), G^{-1}(v)\bigr), \tag{26} \]
where $F^{-1}$ is the pseudo-inverse of $F$ and $G^{-1}$ the pseudo-inverse of $G$, i.e. the quantile functions of the margins.
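A small illustration of the transformation behind Sklar's theorem: mapping each margin through its (empirical) distribution function produces approximately uniform pseudo-observations whose joint behaviour is described by the copula alone. The helper below is only a sketch that uses ranks as the empirical distribution function.

```python
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(x, y):
    """Empirical probability integral transform: (X, Y) -> (U, V) with uniform margins."""
    n = len(x)
    u = rankdata(x) / (n + 1)      # empirical F(X), scaled into the open interval (0, 1)
    v = rankdata(y) / (n + 1)      # empirical G(Y)
    return u, v

# hypothetical dependent sample with heavy-tailed margins
rng = np.random.default_rng(5)
x = rng.standard_t(df=4, size=1000)
y = 0.7 * x + rng.standard_t(df=4, size=1000)
u, v = pseudo_observations(x, y)
print(u.min(), u.max(), np.corrcoef(u, v)[0, 1])   # margins in (0,1); dependence preserved
```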
From this point of view, the copula transforms the random variables $X$ and $Y$ into other random variables $(U,V) = (F(X), G(Y))$ which have margins in $I$, while the dependence is preserved. In other words, the copula allows the dependence between random variables with distinct marginal distributions to be analysed. It is apparent that the copula is a function from the domain $I^2$ to $I$. ^21 This stands for the bivariate copula, but a d-dimensional copula can be considered as well. The copula function has to fulfil the following properties [Nelsen2013]:
· For every $u, v$ in $I$, ^22 Once these conditions are met, the function is called grounded [Nelsen2013].
\[ C(u,0) = 0 = C(0,v), \qquad \text{and} \qquad C(u,1) = u, \quad C(1,v) = v. \]
· For every $u_1, u_2, v_1, v_2$ in $I$ such that $u_1 < u_2$ and $v_1 < v_2$,
\[ C(u_2, v_2) - C(u_2, v_1) - C(u_1, v_2) + C(u_1, v_1) \ge 0. \]
This property guarantees that $C(\cdot)$ is 2-increasing, or quasi-monotone. ^23 Analogously, a d-increasing function can be considered. The 2-increasing property is the analogue of a non-decreasing one-dimensional function.
· The copula is a Lipschitz function:
\[ |C(u_1, v_1) - C(u_2, v_2)| \le |u_1 - u_2| + |v_1 - v_2|. \]
This condition ensures that the copula function is continuous on its domain. ^24 More on Lipschitz functions in [Mao2003].
For the purposes of the thesis, the random variables are the closing spot and futures prices. Since there is a large number of copula functions, it is crucial to first select the appropriate one. The R package VineCopula was applied to determine the corresponding copula for the three analysed commodities [Brechmann2013]. The logarithmic returns were examined for all three commodities, and the results selected the t-copula as the appropriate member of the copula family in all three cases. ^25 A different copula is applied in the case when the sample data are the closing prices themselves: the Clayton copula was identified for WTI, the t-copula for HH and the BB1 copula for CAPP. More about BB1 can be found in [Joe2014].

Bivariate Student's t copula
It is advisable to begin with the canonical univariate Student's t distribution. The t distribution appears more appropriate for real data than the normal distribution, the reason being that, unlike the normal distribution, the t distribution has heavier tails [Ruppert2004]. The Student's t distribution is given by the probability density function $f_\nu^t(x)$:
\[ f_\nu^t(x) = \frac{\Gamma\bigl(\frac{\nu+1}{2}\bigr)}{\sqrt{\pi\nu}\,\Gamma\bigl(\frac{\nu}{2}\bigr)} \Bigl(1 + \frac{x^2}{\nu}\Bigr)^{-\frac{\nu+1}{2}}, \qquad -\infty < x < \infty, \]
where $\nu > 0$ is the number of degrees of freedom and $\Gamma(\cdot)$ is the Euler gamma function [Cherubini2011]. ^27 The Euler gamma function represents a function $\Gamma: \mathbb{R}^{+} \to \mathbb{R}^{+}$ defined as $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx$ [Cherubini2004]. The bivariate correlated t distribution then corresponds to
\[ f_{2,\nu}^t(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \Bigl(1 + \frac{x^2 + y^2 - 2\rho xy}{\nu(1-\rho^2)}\Bigr)^{-\frac{\nu+2}{2}}, \]
where $\rho$ is the correlation coefficient 25 [Winer1971]. Respecting 26, the bivariate Student's t copula corresponds to the following equation:
\[ C_{\rho,\nu}^t(u,v) = \int_{-\infty}^{t_\nu^{-1}(v)}\int_{-\infty}^{t_\nu^{-1}(u)} f(t_1, t_2)\, dt_1\, dt_2, \]
where $f(t_1, t_2)$ represents the density function of the bivariate Student's t distribution and $t_\nu^{-1}$ denotes the quantile function of the standard univariate Student's t distribution [Demarta2005]. Hence, the copula density function according to [Embrechts2001] is
\[ c_{\rho,\nu}^t(u,v) = \frac{f_{\rho,\nu}^t\bigl(F_\nu^{-1}(u), F_\nu^{-1}(v)\bigr)}{f_\nu^t\bigl(F_\nu^{-1}(u)\bigr)\, f_\nu^t\bigl(F_\nu^{-1}(v)\bigr)}, \qquad u, v \in I, \]
where $F_\nu^{-1}(\cdot)$ denotes the quantile function of the marginal t distribution with $\nu$ degrees of freedom and $f_{\rho,\nu}^t$ is the joint density function.
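A minimal sketch evaluating this t-copula density with scipy, taken directly as the ratio of the bivariate density to the product of the marginal densities; the values of $\rho$ and $\nu$ are illustrative only.

```python
import numpy as np
from scipy.stats import t as student_t

def t_copula_density(u, v, rho=0.8, nu=4.0):
    """Bivariate t-copula density: joint t density over the product of the margins."""
    x, y = student_t.ppf(u, nu), student_t.ppf(v, nu)       # quantile transforms
    joint = (1.0 / (2.0 * np.pi * np.sqrt(1.0 - rho ** 2))
             * (1.0 + (x ** 2 + y ** 2 - 2.0 * rho * x * y)
                / (nu * (1.0 - rho ** 2))) ** (-(nu + 2.0) / 2.0))
    return joint / (student_t.pdf(x, nu) * student_t.pdf(y, nu))

print(t_copula_density(0.95, 0.95))   # a large value in the joint tail reflects tail dependence
```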
The properties of the t copula are determined by the underlying Student's t distribution:
· the copula is symmetric,
· it belongs to the elliptical copula family, ^28 For more information see [Nelsen2007].
· the t copula exhibits tail dependence.
The hedge ratio is calculated from the copula covariance matrix generated from simulated data. After the copula has been selected, the procedure for obtaining the covariance matrix can be initiated. The algorithm consists of sampling the multivariate Student's t distribution with an appropriate correlation matrix $R$ [Fantazzini2004]; each margin is then converted using the probability integral transform with the t distribution function. According to [Embrechts2001] the algorithm involves the following steps:
· find the Cholesky decomposition $L$ of the correlation matrix $R$, ^29 The Cholesky decomposition of $R$ is the unique lower-triangular matrix $L$ such that $LL^{T} = R$ [Higham1988].
· generate a vector $Z = (z_1, z_2, \ldots, z_p)$ of $p$ independent random variables $z_i \sim N(0,1)$,
· simulate a random variate $s$ from $\chi_\nu^2$ independently of $Z$,
· to obtain a p-variate normal random variable with correlation matrix $R$, set $y = LZ$,
· set $x = \sqrt{\nu / s}\; y$,
· set $u_i = t_\nu(x_i)$, $i = 1, 2, \ldots, p$, where $t_\nu$ represents the univariate cumulative t distribution with $\nu$ degrees of freedom,
· then a sample from the t copula with $\nu$ degrees of freedom and correlation structure $R$ can be denoted as $(u_1, u_2, \ldots, u_p)^{T} \sim C_{R,\nu}^t$.
In the analysis of the thesis, the number of iterations corresponded to the number of observations in the examined data. Afterwards, the covariance matrix of the simulated data was used to calculate the optimal hedge ratio in accordance with 1.
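The listed steps translate almost line by line into the sketch below; the correlation matrix, the degrees of freedom and the number of draws are hypothetical inputs, and the sample covariance of the simulated pair is what would feed the hedge-ratio formula.

```python
import numpy as np
from scipy.stats import t as student_t

def sample_t_copula(R, nu, n, seed=0):
    """Simulate n draws from a t copula with correlation matrix R and nu degrees of freedom."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(R)                 # step 1: Cholesky factor, L @ L.T == R
    Z = rng.standard_normal((n, R.shape[0]))  # step 2: independent N(0,1) variables
    s = rng.chisquare(nu, size=n)             # step 3: chi-square variates
    y = Z @ L.T                               # step 4: correlated normals
    x = np.sqrt(nu / s)[:, None] * y          # step 5: multivariate t variates
    return student_t.cdf(x, nu)               # step 6: probability integral transform

R = np.array([[1.0, 0.9], [0.9, 1.0]])
u = sample_t_copula(R, nu=5, n=1000)
print(np.cov(u, rowvar=False))                # covariance of the simulated pair
```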
2.7 Mean extended Gini coefficient
The concept of mean-variance portfolio theory is based on normality of returns [Markowitz2000]. ^30 Another possibility is that the utility function of the decision makers is quadratic [Lien2002]. Empirical data, however, do not fulfil this assumption. If the risk attitude of the decision makers is disregarded, the stochastic-dominance rules cannot be adhered to, and providing the conditions required by the stochastic-dominance theories is arduous even when the assumptions of expected-utility maximisation are fulfilled [Chen2013a]. The Gini mean approach solves some shortcomings which other models cannot treat, and ultimately provides a framework respecting stochastic-dominance theory [Yitzhaki1983]. The Gini coefficient was initially used for the analysis of the distribution of wealth in society [Gini1921], but this mathematical apparatus was later applied to risk evaluation as well [Yitzhaki1982], since the Gini methodology measures the variability of a random variable. From the perspective of portfolio theory, the Gini mean difference can be applied to a random return. Thus, let $R$ be the random return of a portfolio falling into the interval $\langle a, b\rangle$, and let $F(\cdot)$ and $f(\cdot)$ be the distribution function and the density function of $R$, respectively, with $F(a) = 0$ and $F(b) = 1$. Then, if $\Gamma$ denotes the Gini mean difference, the equation corresponds to
\[ \Gamma = \tfrac{1}{2} E\bigl\{|R_1 - R_2|\bigr\}, \]
or alternatively
\[ \Gamma = \tfrac{1}{2}\int_a^b\!\!\int_a^b |r_1 - r_2|\, f(r_1) f(r_2)\, dr_1\, dr_2, \tag{27} \]
under the condition that both arguments $R_1$ and $R_2$ are independent and have the same distribution as $R$ [Lien1993]. Essentially, $\Gamma$ can also be written as
\[ \Gamma = \int_a^b \bigl[1 - F(r)\bigr]\, dr - \int_a^b \bigl[1 - F(r)\bigr]^2\, dr. \]
By analogy, the variance of $R$ is
\[ \sigma_R^2 = \tfrac{1}{2} E\bigl[(R_1 - R_2)^2\bigr], \]
or alternatively
\[ \sigma_R^2 = \tfrac{1}{2}\int_a^b\!\!\int_a^b (r_1 - r_2)^2 f(r_1) f(r_2)\, dr_1\, dr_2. \]
[Shalit1995] refined the original model and enhanced it by a measure of risk aversion. The newly defined model, the mean extended Gini (MEG), with the risk-aversion parameter $v$, $1 \le v < \infty$, has the following form:
\[ \Gamma(v) = \int_a^b \bigl[1 - F(r)\bigr]\, dr - \int_a^b \bigl[1 - F(r)\bigr]^{v}\, dr, \]
and, after the modification according to [Lerman1984],
\[ \Gamma(v) = \mu - a - \int_a^b \bigl[1 - F(r)\bigr]^{v}\, dr. \]
In the case of a risk-neutral investor the parameter is $v = 1$, and then $\Gamma(1) = 0$. Equally, $v = 2$ represents the special case of the mean extended Gini that corresponds to the Gini mean difference 27, so $\Gamma(2) = \Gamma$. Obviously, if $v$ increases indefinitely, the term reduces to $\lim_{v \to \infty} \Gamma(v) = \mu - a$. Computing $\Gamma(v)$ directly from equation 27 would be onerous; therefore a viable solution for $\Gamma(v)$ was proposed by [Shalit1984] in the following form:
\[ \Gamma(v) = -v\,\mathrm{Cov}\bigl(R, [1 - F(R)]^{v-1}\bigr). \]
This modification allows for a convenient calculation. The application of the mean extended Gini coefficient allows the satisfaction of the first and second degree of stochastic dominance [Hey1980]; hence it is a proper tool for hedging purposes. A proof of satisfying the conditions can be introduced with the following statement:
\[ \lambda_n = \int_a^b \bigl[1 - F(r)\bigr]^{n}\, dr - \int_a^b \bigl[1 - G(r)\bigr]^{n}\, dr, \qquad n = 1, 2, 3, \ldots, \tag{28} \]
where $F(\cdot)$ and $G(\cdot)$ are the distribution functions of the returns of a portfolio $A$ and a portfolio $B$. If $B$ is dominated by $A$, then the necessary conditions for first- and second-degree stochastic dominance are $\lambda_n > 0$, $n = 1, 2, 3, \ldots$ [Yitzhaki1982]. After integrating 28 by parts, for $n = 1$ the equation is
\[ \lambda_1 = \mu_A - \mu_B, \]
where $\mu_A$ is the mean of portfolio $A$ and $\mu_B$ the mean of portfolio $B$. Simultaneously, let $\Gamma_A(v)$ and $\Gamma_B(v)$ be the extended Gini coefficients; for $n = 2$ the expression then takes the form
\[ \lambda_2 = \bigl(\mu_A - \Gamma_A(2)\bigr) - \bigl(\mu_B - \Gamma_B(2)\bigr). \tag{29} \]
Thus $\mu_A > \mu_B$ is the necessary condition for first-degree stochastic dominance and $\mu_A - \Gamma_A(2) > \mu_B - \Gamma_B(2)$ the necessary condition for second-degree stochastic dominance [Levy1992]. If $\Gamma(v)$ is used as the measure of risk, the optimal hedge ratio can be found by minimising $\Gamma(v)$ [Kolb1992], although, as evidenced by Kolb and Okunev, the optimal hedge ratio can also be obtained by maximisation [Kolb1993]. ^31 In the maximisation approach the utility function is given by $E\{U(R)\} = \mu - \Gamma(v)$, and the hedge ratio is obtained from the derivative with respect to $h$. Thus, let $r_i = r_{si} - h\, r_{fi}$ be the return of a portfolio consisting of the spot return $r_{si}$, the futures return $r_{fi}$ and the hedge ratio $h$. Then, with the empirical distribution function $\hat{F}(\cdot)$, the extended Gini coefficient is
\[ \Gamma(v) = -\frac{v}{N}\Bigl\{ \sum_{i=1}^{N} r_i \bigl[1 - \hat{F}(r_i)\bigr]^{v-1} - \Bigl(\sum_{i=1}^{N} \frac{r_i}{N}\Bigr)\Bigl(\sum_{i=1}^{N} \bigl[1 - \hat{F}(r_i)\bigr]^{v-1}\Bigr) \Bigr\} \]
[Lien2002]. [Shalit1995] examined the value of a portfolio: a rational investor prefers a larger portfolio value over a smaller one. The value of a portfolio $V_{pt}$ at a given time $t$ can be expressed as
\[ V_{pt} = P_{st} + h\,(P_{ft-1} - P_{ft}), \]
where $P_s$ and $P_f$ are the prices of the spot and the futures, respectively.
[Shalit1995] determined the optimal hedge ratio with the following formula:
\[ h^{*} = \frac{\mathrm{Cov}\bigl(P_s, [1 - G(V)]^{v-1}\bigr)}{\mathrm{Cov}\bigl(P_f, [1 - G(V)]^{v-1}\bigr)}, \]
taking $G(\cdot)$ as the distribution function of $V$, while $r_s$ and $r_f$ are the returns of the spot and the futures. Assuming that the distribution function of $r_f$ is similar to $G(V)$, since the empirical ranking of $V_p$ should be alike that of $P_f$, a reliable estimate $\hat{h}^{*}$ can be stated in the form
\[ \hat{h}^{*} = \frac{\sum_{i=1}^{n} (r_{si} - \bar{r}_s)(z_i - \bar{z})}{\sum_{i=1}^{n} (r_{fi} - \bar{r}_f)(z_i - \bar{z})}, \]
where $z_i = \bigl[1 - G(r_{fi})\bigr]^{v-1}$ [Lien2000]. Since $v = 2$ satisfies the conditions 29, the calculation in the practical part based on the MEG will only consider $v = 2$.
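A minimal sketch of this MEG hedge-ratio estimate with $v = 2$, where the empirical distribution function $G$ is approximated by the ranks of the futures returns; the return arrays are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata

def meg_hedge_ratio(r_s, r_f, v=2):
    """Mean extended Gini hedge ratio with ranks as the empirical distribution function G."""
    G = rankdata(r_f) / len(r_f)              # empirical CDF of the futures returns
    z = (1.0 - G) ** (v - 1)
    num = np.sum((r_s - r_s.mean()) * (z - z.mean()))
    den = np.sum((r_f - r_f.mean()) * (z - z.mean()))
    return num / den

# hypothetical return series for illustration only
rng = np.random.default_rng(6)
r_f = rng.normal(0, 0.02, 300)
r_s = 0.85 * r_f + rng.normal(0, 0.004, 300)
print(meg_hedge_ratio(r_s, r_f))
```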