“Hierarchical Structure in Financial Market” with R

Francisco Javier Parra Rodríguez

Universidad de Cantabria (UNICAN), España


In R-Pub:

Hierarchical Structure in Financial Market

##Hierarchical Structure in Financial Market

Log-returns of different assets display a high cross-dependence, even across industries and asset classes. This is generally explained in terms of synchronization among market participants, due to common flows of information and overlapping investment strategies. Since the seminal work by Mantegna [1] on correlation-based networks, it has been observed that the structure of such networks contains significant economic information related to the classification of industrial sectors, but that it also conveys important independent information [2].

According to Mantegna [1], financial markets are well-defined complex systems. The paradigm of mathematical finance is that the time series of stock returns are unpredictable. Within this paradigm, the time evolution of stock returns is well described by random processes. A key question is whether the random processes of the return time series of different stocks are uncorrelated or, conversely, whether common economic factors are present in financial markets and drive several stocks at the same time. Investigating the portfolio of stocks used to compute the Dow Jones Industrial Average index and the portfolio of stocks used to compute the Standard and Poor’s 500 index in the period from July 1989 to October 1995, Mantegna finds a topological arrangement of the stocks traded in a financial market that has an associated meaningful economic taxonomy. The topological space is a graph connecting the stocks of the portfolio analyzed. The graph is obtained from the matrix of correlation coefficients computed between all pairs of stocks of the portfolio, considering the synchronous time evolution of the difference of the logarithm of the daily stock price. The hierarchical tree of the subdominant ultrametric space associated with the graph provides information useful for investigating the number and nature of the common economic factors affecting the time evolution of the logarithm of the price of well-defined groups of stocks [1].

For both portfolios, the correlation coefficient $\rho_{i,j}$ of the daily logarithmic price differences ($Y_i = \ln P_i(t) - \ln P_i(t-1)$) can vary from -1 (completely anti-correlated pair of stocks) to 1 (completely correlated pair of stocks); when it is 0, the two stocks are uncorrelated. The $n \times n$ matrix of correlation coefficients is a symmetric matrix with ones on the main diagonal. Mantegna [3] investigates the statistical properties of the set of correlation coefficients.
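As a minimal sketch of this step, the following R code computes log-returns and their correlation matrix. The prices are simulated and the stock names `A`, `B` and `C` are hypothetical; nothing here comes from the Dow Jones or S&P 500 data used by Mantegna.

```r
# Hypothetical example: simulated daily prices for three made-up stocks.
set.seed(1)
n_days <- 250
common <- rnorm(n_days, sd = 0.01)                 # shared "market" shock
prices <- cbind(
  A = 100 * exp(cumsum(common + rnorm(n_days, sd = 0.005))),
  B = 50 * exp(cumsum(common + rnorm(n_days, sd = 0.005))),
  C = 20 * exp(cumsum(rnorm(n_days, sd = 0.01)))   # independent of the others
)

# Y_i(t) = ln P_i(t) - ln P_i(t-1), computed column by column
log_returns <- diff(log(prices))

# n x n matrix of Pearson correlation coefficients
rho <- cor(log_returns)
print(round(rho, 2))
```

Because `A` and `B` share the simulated common shock, their correlation should come out clearly higher than that of either stock with `C`.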

The correlation coefficient of a pair of stocks cannot be used as a distance between the two stocks because it does not fulfill the three axioms that define a Euclidean metric. Mantegna [1] instead uses as distance an appropriate function of the correlation coefficient: $d(i,j) = 1-\rho_{i,j}^2$.
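The conversion from correlations to dissimilarities can be sketched as follows. The correlation matrix is hypothetical, and the transform shown is the $1-\rho^2$ form applied by the script later in this post.

```r
# Hypothetical 3 x 3 correlation matrix for stocks A, B and C
rho <- matrix(c( 1.0,  0.8, -0.2,
                 0.8,  1.0,  0.1,
                -0.2,  0.1,  1.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Same transform as in the script later in this post: d = 1 - rho^2
d_mat <- 1 - rho^2

# A dissimilarity should be zero on the diagonal, symmetric and non-negative
stopifnot(all(diag(d_mat) == 0), isSymmetric(d_mat), all(d_mat >= 0))

d <- as.dist(d_mat)   # "dist" object accepted by hclust() and vegan::spantree()
print(d)
```

Note that with this transform a strongly anti-correlated pair is as close as a strongly correlated one, since only the square of the coefficient enters the distance.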

The distance matrix is used to determine the minimal spanning tree connecting the n stocks of the portfolio. The minimal spanning tree (MST) is attractive because it provides an arrangement of stocks that selects the most relevant connections of each point of the set (Mantegna, 1998).
##Hierarchical clustering in R

The function “hclust” performs a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered. Initially, each object is assigned to its own cluster; the algorithm then proceeds iteratively, joining the two most similar clusters at each stage, until there is just a single cluster. At each stage, distances between clusters are recomputed by the Lance-Williams dissimilarity update formula according to the particular clustering method being used.

A number of different clustering methods are provided in this function: Ward’s minimum variance method, the complete linkage method and the single linkage method, among others. The hclust function in R uses the complete linkage method for hierarchical clustering by default.
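To illustrate the effect of the linkage method, here is a small sketch on a hypothetical dissimilarity matrix for four objects (the values are invented for the example):

```r
# Hypothetical dissimilarities between four objects
m <- matrix(c(0.0, 0.2, 0.7, 0.9,
              0.2, 0.0, 0.6, 0.8,
              0.7, 0.6, 0.0, 0.3,
              0.9, 0.8, 0.3, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

hc_complete <- hclust(d)                     # default: method = "complete"
hc_single   <- hclust(d, method = "single")  # nearest-neighbour linkage

print(hc_complete$method)   # "complete"
print(hc_single$height)     # 0.2 0.3 0.6
print(hc_complete$height)   # 0.2 0.3 0.9
```

Both methods first join A with B and C with D; they differ only in the height of the last merge, which is the smallest between-cluster dissimilarity (0.6) for single linkage and the largest (0.9) for complete linkage.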

The distance measure to be used is computed with the function “dist”. Its method must be one of “euclidean”, “maximum”, “manhattan”, “canberra”, “binary” or “minkowski”. The dist function in R uses the Euclidean distance measure by default.
The “dist” method of as.matrix() and as.dist() can be used for conversion between objects of class “dist” and conventional distance matrices.
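A short sketch of these conversions, using four hypothetical points in the plane:

```r
# Four hypothetical points in the plane
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              0, 1),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("p1", "p2", "p3", "p4"), c("x", "y")))

d_euc <- dist(x)                        # default: method = "euclidean"
d_man <- dist(x, method = "manhattan")

m <- as.matrix(d_euc)                   # full symmetric 4 x 4 matrix
print(m["p1", "p2"])                    # 5
d_back <- as.dist(m)                    # back to a compact "dist" object
```

The “dist” object stores only the lower triangle, so `as.matrix()` is convenient when individual entries need to be looked up by name.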

as.hclust() converts objects from other hierarchical clustering functions to class “hclust”.
The package “vegan” includes a function, spantree, that finds a minimum spanning tree (MST) connecting all points, but disregarding dissimilarities that are at or above the threshold or NA [4].

The minimum spanning tree is closely related to single linkage clustering, a.k.a. nearest neighbour clustering, known in genetics as the neighbour joining tree and available in the hclust and agnes functions. The most important practical difference is that the minimum spanning tree has no concept of cluster membership, but always joins individual points to each other. The function as.hclust can change the spantree result into a corresponding hclust object.
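The relationship with single linkage can be checked on a toy example. The sketch below builds a minimum spanning tree with Prim's algorithm in base R and compares its edge lengths with the single linkage merge heights. `prim_mst` is a hypothetical helper written only for this illustration and the dissimilarities are invented; it is not how vegan's spantree is implemented.

```r
# Minimum spanning tree by Prim's algorithm (base R only).
# `prim_mst` is a hypothetical helper written for this example.
prim_mst <- function(d) {
  m <- as.matrix(d)
  n <- nrow(m)
  in_tree <- c(TRUE, rep(FALSE, n - 1))   # start from the first point
  from <- integer(n - 1)
  to <- integer(n - 1)
  len <- numeric(n - 1)
  for (step in seq_len(n - 1)) {
    # dissimilarities from tree nodes (rows) to nodes not yet in the tree (columns)
    cand <- m[in_tree, !in_tree, drop = FALSE]
    k <- which(cand == min(cand), arr.ind = TRUE)[1, ]
    from[step] <- which(in_tree)[k[1]]
    to[step] <- which(!in_tree)[k[2]]
    len[step] <- m[from[step], to[step]]
    in_tree[to[step]] <- TRUE             # grow the tree by one point
  }
  data.frame(from = from, to = to, dist = len)
}

# Hypothetical dissimilarities between four objects
m <- matrix(c(0.0, 0.2, 0.7, 0.9,
              0.2, 0.0, 0.6, 0.8,
              0.7, 0.6, 0.0, 0.3,
              0.9, 0.8, 0.3, 0.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
d <- as.dist(m)

mst <- prim_mst(d)
print(mst)

# The sorted MST edge lengths match the single linkage merge heights
print(sort(mst$dist))
print(hclust(d, method = "single")$height)
```

The tree always has n - 1 edges between individual points, while the dendrogram reports the same lengths as heights at which whole clusters are merged.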

The historical monthly return data from December 1977 through December 1987 can be downloaded from the data for Berndt’s The Practice of Econometrics [5]. The csv file of the returns is available here: http://web.stanford.edu/~clint/berndt/

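library(zoo)    # zooreg(), index(), as.yearmon(), coredata()
library(vegan)  # spantree(), spandepth()

setwd("~/Word Press/Econometria aplicada/Hierarchical Structure")

# requires data in file berndtc1.csv

# read returns from csv file (semicolon-separated, decimal comma)
berndt.df = read.csv(file="berndtc1.csv", sep=";", dec=",", stringsAsFactors=FALSE)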


# create zooreg object – regularly spaced zoo object
berndt.z = zooreg(berndt.df[,-1], start=c(1978, 1), end=c(1987,12),frequency=12)
index(berndt.z) = as.yearmon(index(berndt.z))

# note: coredata() function extracts data from zoo object
returns.mat = as.matrix(coredata(berndt.z))

# compute the correlation coefficients

coef.corr <- cor(returns.mat)
coef.d <- (1-coef.corr^2) # compute distance (Mantegna, 1998)

# hierarchical clustering with hclust

d <- as.dist(coef.d) # convert the distance matrix to a "dist" object
# Function spantree finds a minimum spanning tree (MST) connecting all points, but disregarding dissimilarities that are at or above the threshold or NA.

tr <- spantree(d)
## Add tree to a metric scaling
plot(tr, cmdscale(d), type = "t")
## Find a configuration to display the tree neatly
plot(tr, type = "t")
## Depths of nodes
depths <- spandepth(tr)
plot(tr, type = "t", label = depths)
## Plot as a dendrogram
cl <- as.hclust(tr)
plot(cl)



[1] Mantegna, R. N., “Hierarchical Structure in Financial Markets”, arXiv:cond-mat/9802256 [cond-mat.stat-mech] (1998). https://arxiv.org/pdf/cond-mat/9802256.pdf

[2] Buonocore, R. J., Musmeci, N., Aste, T., and Di Matteo, T., “Two different flavours of complexity in financial data”, Eur. Phys. J. Special Topics 225, 3105-3113 (2016).

[3] Mantegna, R. N., “Degree of Correlation Inside a Financial Market”, in Proc. of the ANDM 97 International Conference, edited by J. Kadtke, AIP Press (1997).

[4] vegan: Community Ecology Package. URL: https://cran.r-project.org, https://github.com/vegandevs/vegan

[5] Berndt, Ernest R., The Practice of Econometrics, Addison Wesley, 1991, ISBN 0-201-51489-3.


