6. Visualization of multivariate data
http://www.statistics4u.com/fundstat_eng/wrapnt3EE177_basic_knowledge.html
jessicasmaps.blogspot.com
www.mathworks.com
www.spatialdatamining.org
Multivariate Data
• Consist of multiple types of attributes
– E.g., weight w, height h, shoe size s of randomly
selected sample of people
– The triples (w1, h1, s1), (w2, h2, s2) then form a set
of multivariate data
• Techniques for visualization of lists and tables
of data that generally do not contain explicit
spatial attributes
Point-Based Techniques
• Scatterplots - projection of data records from
n-dimensional data space to an arbitrary
k-dimensional space of the output device
• Data records are mapped onto k‐dimensional
points
• Each record is associated with a certain
graphical representation
Scatterplots
• One of the earliest and most widely used
visualization techniques for data analysis
• Data analysis consists of:
1. Search for a subset of input data dimensions
2. Dimension reduction (PCA, multidimensional
scaling)
3. Dimension embedding – mapping dimensions
onto additional graphical attributes (color, size,
shape)
4. Multiple displays – displaying multiple plots
together at once (superimposition, juxtaposition)
Multiple Displays
• Scatterplot matrix
– Grid containing scatterplots
– N² cells, where N is the number of dimensions
– Each dimension pair is displayed twice – just
rotated by 90°
– Usually symmetric about the main diagonal
– Main diagonal displays
• Description of corresponding dimension or
• Histogram of the given dimension
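A minimal sketch of how the N² cells of a scatterplot matrix can be enumerated, in plain Python. The function name `scatterplot_matrix_cells` and the sample records are illustrative assumptions, not part of the slides; actual drawing is left to a plotting library.

```python
def scatterplot_matrix_cells(records, names):
    """Return a dict mapping (row, col) to the content of that grid cell.

    Off-diagonal cells hold the (x, y) point list for one dimension pair;
    diagonal cells hold only the dimension name (a histogram could go there).
    """
    n = len(names)
    cells = {}
    for row in range(n):
        for col in range(n):
            if row == col:
                cells[(row, col)] = names[row]
            else:
                # x comes from the column dimension, y from the row dimension
                cells[(row, col)] = [(r[col], r[row]) for r in records]
    return cells


# Hypothetical sample: (weight, height, shoe size) triples
people = [(70, 175, 42), (82, 181, 44), (55, 160, 37)]
cells = scatterplot_matrix_cells(people, ["weight", "height", "shoe size"])
print(len(cells))       # number of cells for N = 3
print(cells[(0, 1)])    # height on x, weight on y
print(cells[(1, 0)])    # the same pair with the axes swapped
```

Cells (0, 1) and (1, 0) contain the same points with x and y exchanged, which is why the matrix is symmetric about the main diagonal.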
Multiple Displays
Force-Based Methods
• Projection of points from high-dimensional
space into 2D or 3D space
• Aims to preserve the properties of
N-dimensional data while projecting to
a different dimension
• Projection can introduce unwanted artefacts
into the resulting visualization
Multidimensional Scaling (MDS)
1. Given dataset consisting of M records and N dimensions,
create MxM matrix Ds containing results of similarity
measurement between individual record pairs
2. Supposing that we want to project the input data to K
dimensions, create MxK matrix L, which contains
placement of projected points
3. Compute MxM matrix Ls containing similarity between all
record pairs from L
4. Compute the stress value S by measuring the differences
between Ds and Ls
5. If S is sufficiently small, terminate the algorithm.
6. Else shift the positions of records in L in the direction
which will reduce the stress value
7. Return to step 3
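The seven steps above can be sketched in plain Python. This is only one possible variant, under stated assumptions: Euclidean distance is used as the (dis)similarity measure, stress is the sum of squared differences, and points are moved by plain gradient descent. The names `mds`, `stress`, `pairwise_distances` and the parameters `step`, `tol`, `max_iter`, `seed` are illustrative choices.

```python
import math
import random


def pairwise_distances(points):
    """M x M matrix of Euclidean distances between all record pairs."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(ds, ls):
    """Sum of squared differences between two distance matrices."""
    m = len(ds)
    return sum((ds[i][j] - ls[i][j]) ** 2
               for i in range(m) for j in range(i + 1, m))


def mds(data, k=2, step=0.01, tol=1e-6, max_iter=5000, seed=0):
    rng = random.Random(seed)
    m = len(data)
    ds = pairwise_distances(data)                        # step 1
    layout = [[rng.uniform(-1, 1) for _ in range(k)]     # step 2 (random start)
              for _ in range(m)]
    for _ in range(max_iter):
        ls = pairwise_distances(layout)                  # step 3
        if stress(ds, ls) < tol:                         # steps 4 and 5
            break
        moved = []
        for i in range(m):                               # step 6
            grad = [0.0] * k
            for j in range(m):
                if i == j or ls[i][j] == 0.0:
                    continue
                # derivative of (ds - ls)^2 with respect to point i
                coeff = 2.0 * (ls[i][j] - ds[i][j]) / ls[i][j]
                for c in range(k):
                    grad[c] += coeff * (layout[i][c] - layout[j][c])
            moved.append([layout[i][c] - step * grad[c] for c in range(k)])
        layout = moved                                   # step 7: loop again
    return layout, stress(ds, pairwise_distances(layout))


# Hypothetical input: four 3D records that lie on a unit square
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
layout, final_stress = mds(square)
print(layout, final_stress)
```

Changing `seed` changes the random start layout and therefore, in general, the resulting positions.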
Multidimensional Scaling (MDS)
• Numerous variants of the algorithm exist. The
main differences are in:
– Method for similarity and stress computation
– Definition of start
and end conditions
– Strategy for updating
the position of points
Problems
• Results are not unique – small changes in start
conditions can lead to different results
• Coordinate system after the projection may
not be easily understandable to the user –
with respect to the dimensions of the original
data
– The most significant are the relative positions of
individual points, rather than their absolute
positions, which may differ from algorithm to
algorithm
RadViz
• Based on Hooke’s law of elasticity for finding
equilibrium position of the point.
• For an N-dimensional dataset, N so-called
“anchor” points are placed on the
circumference of a circle (for simplicity we
consider a unit circle placed at the origin of
the coordinate system) – these represent fixed
ends of N strings assigned to each data point.
bioinformatics.oxfordjournals.org
RadViz
• For a given normalized vector of data record
$D_i = (d_{i,0}, d_{i,1}, \ldots, d_{i,N-1})$ and a set of vectors A,
where $A_j$ is the j-th anchor point, we get the
equilibrium equation (with $d_j$ short for $d_{i,j}$):
$$\sum_{j=0}^{N-1} (A_j - p)\, d_j = 0$$
where p is the vector for the point in
equilibrium position and can be found as:
$$p = \frac{\sum_{j=0}^{N-1} A_j\, d_j}{\sum_{j=0}^{N-1} d_j}$$
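A short sketch of the equilibrium position as a weighted average of the anchor points, in plain Python. Placing the anchors at equal angles on the unit circle and drawing an all-zero record at the centre are assumptions made here for illustration; `radviz_anchors` and `radviz_point` are hypothetical names.

```python
import math


def radviz_anchors(n):
    """Place n anchor points at equal angles on the unit circle (assumption)."""
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n))
            for j in range(n)]


def radviz_point(record, anchors):
    """Equilibrium position p = sum(A_j * d_j) / sum(d_j) for one record."""
    total = sum(record)
    if total == 0:
        return (0.0, 0.0)  # assumption: an all-zero record sits at the centre
    x = sum(a[0] * d for a, d in zip(anchors, record)) / total
    y = sum(a[1] * d for a, d in zip(anchors, record)) / total
    return (x, y)


anchors = radviz_anchors(4)
print(radviz_point((1, 0, 0, 0), anchors))          # pulled fully to anchor 0
print(radviz_point((1, 1, 1, 1), anchors))          # balanced, near the centre
print(radviz_point((0.5, 0.5, 0.5, 0.5), anchors))  # different record, same place
```

The last two calls show two different records landing on the same 2D position, since only the ratios between the values matter.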
RadViz
• Different placement and order of anchor
points lead to different results
• Points with different positions in the N-dimensional
space can be mapped to the
same position in 2D space
• These problems concern all the techniques for
projection and dimension reduction
• A simple solution for RadViz is letting the
user interactively manipulate the anchor
points
RadViz
RadViz – analogous definition
• Point in N-dimensional space [y1, y2, …, yn]
• To each anchor point Sj there is attached a
virtual spring of stiffness yj – changing
according to the value of
the given parameter
• All springs are connected
at one point u
• We search for the equilibrium
of the spring system
https://cyber.felk.cvut.cz/research/theses/papers/216.pdf
RadViz
• Algorithm searching for the arrangement of
dimensions on the circumference of the circle
leading to maximal dispersal of the data
Vectorized RadViz (VRV)
• Constructs multiple dimensions for individual input
dimensions
• Similar to sorting data into bins
• Each original dimension is represented with vector
of new dimensions – each new coordinate in this
vector is then either 0 or 1, depending on whether
the given data record contains the value
corresponding to this dimension or not
• For one record, each new vector contains exactly
one dimension with value 1, all the others contain
value 0
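A minimal sketch of this vectorization in plain Python, assuming numeric dimensions that are split into equal-width bins. The bin layout and the names `vectorize_dimension` and `vectorize_record` are illustrative assumptions.

```python
def vectorize_dimension(value, low, high, bins):
    """Replace one numeric value by a 0/1 vector with a single 1 in its bin."""
    if not low <= value <= high:
        raise ValueError("value outside [low, high]")
    width = (high - low) / bins
    index = min(int((value - low) / width), bins - 1)  # top edge joins last bin
    return [1 if b == index else 0 for b in range(bins)]


def vectorize_record(record, ranges, bins):
    """Concatenate the per-dimension 0/1 vectors into one longer record."""
    out = []
    for value, (low, high) in zip(record, ranges):
        out.extend(vectorize_dimension(value, low, high, bins))
    return out


print(vectorize_dimension(0.5, 0, 1, 4))
# Hypothetical (weight, height) record with two bins per dimension
print(vectorize_record((70, 160), [(50, 90), (150, 200)], 2))
```

The resulting 0/1 record can then be passed to RadViz with one anchor per new dimension.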
Line-Based Techniques
• Records are displayed so that the
corresponding points are connected by
either straight or curved lines
• Using additional properties, such as curvature,
crossings, etc.,
lines can display
relationships
between data
www.frontiersin.org
Line Charts
• Visualization technique for a single variable,
where vertical axis represents possible range
of variable values and horizontal axis
represents certain ordering of records in a
given dataset
• Extension for
multivariate data
– superimposition,
juxtaposition
Superimposition vs. juxtaposition
www.craniofacial-id.com
www.usenix.org
Line Charts
• Classic line chart for 8-dimensional dataset vs.
stacked line chart (for each added dimension
the chart of previous dimension serves as the
base)
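The stacking rule can be sketched as a running sum over the dimensions, in plain Python; `stack_series` and the sample values are illustrative assumptions.

```python
def stack_series(series):
    """Turn per-dimension series into cumulative (stacked) series.

    Each output series is the previous stacked series plus the next
    dimension, so the previous chart serves as the base of the next one.
    """
    stacked = []
    base = [0] * len(series[0])
    for values in series:
        base = [b + v for b, v in zip(base, values)]
        stacked.append(base)
    return stacked


# Three hypothetical dimensions measured for three records
print(stack_series([[1, 2, 3], [2, 2, 2], [0, 1, 0]]))
```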
Line Charts
• Sorting of records by single dimension
Line Charts
• If the dimensions have the same units, it is
possible to use one of the previous techniques
• However, if the individual variables have
different units, it is necessary to use a different
approach, e.g.:
– Using multiple vertical axes
– Vertical stacking of charts for individual
dimensions
Parallel Coordinates
• Introduced in 1985 (Inselberg) as a mechanism
for studying the geometry of higher dimensions
• Extending methods for analysis of multivariate
data
• Instead of orthogonal placement, axes are placed
parallel to each other
• Data record is depicted as a polyline, which
crosses each axis at the position corresponding to
its value in the given dimension
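A minimal sketch of how one record becomes a polyline, in plain Python. It assumes vertical axes at equal horizontal spacing and values rescaled to [0, 1] per dimension; `polyline`, `axis_spacing` and the sample ranges are illustrative.

```python
def polyline(record, ranges, axis_spacing=1.0):
    """Return the (x, y) vertices of the polyline for one data record.

    Axis i is a vertical line at x = i * axis_spacing; y is the value
    rescaled to [0, 1] using the (min, max) range of that dimension.
    """
    vertices = []
    for i, (value, (low, high)) in enumerate(zip(record, ranges)):
        y = 0.5 if high == low else (value - low) / (high - low)
        vertices.append((i * axis_spacing, y))
    return vertices


# Hypothetical (weight, height, shoe size) record and per-dimension ranges
print(polyline((70, 175, 42), [(50, 90), (150, 200), (36, 48)]))
```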
Parallel Coordinates
Parallel Coordinates
• Interpretation of the chart – we look for:
– Similar lines
– Similar intersections and lines, that are either
isolated or have significantly different tilt than their
neighbours
• Problem: parallel coordinates can display only
relationships between pairs of neighbouring
dimensions
• The user can observe the relationships across
all dimensions with the help of interactive
selection and highlighting of records
Parallel Coordinates – Interactive Selection
Parallel Coordinates – Median
• Parallel coordinates become too cluttered with
large amounts of data
Parallel Coordinates – Enhancements
• Hierarchical parallel coordinates
• Using semi-transparent lines
• Clustering, regrouping
• Grouping data to cluster bands
• Including histogram
• Fitting curves to intersections
• …
Andrews Curves
• Developed in 1972 by David F. Andrews
• Each multivariate data point
$D = (d_1, d_2, \ldots, d_N)$ is
used for generating a curve with formula
$$f(t) = \frac{d_1}{\sqrt{2}} + d_2 \sin(t) + d_3 \cos(t) + d_4 \sin(2t) + d_5 \cos(2t) + \ldots$$
– if the number of dimensions is odd, then
the last term is: $d_N \cos\left(\frac{N-1}{2}\, t\right)$
– if it is even: $d_N \sin\left(\frac{N}{2}\, t\right)$
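A short sketch that evaluates an Andrews curve in plain Python. It follows the standard definition, with the first term scaled by 1/√2 and sine and cosine terms alternating with increasing frequency; `andrews_curve` is an illustrative name.

```python
import math


def andrews_curve(record, t):
    """Evaluate f(t) = d1/sqrt(2) + d2 sin(t) + d3 cos(t) + d4 sin(2t) + ..."""
    value = record[0] / math.sqrt(2)
    for i, d in enumerate(record[1:], start=2):  # i is the 1-based index of d
        freq = i // 2
        if i % 2 == 0:
            value += d * math.sin(freq * t)      # even index: sine term
        else:
            value += d * math.cos(freq * t)      # odd index: cosine term
    return value


# Sample one hypothetical 5-dimensional record over [-pi, pi]
record = [1.0, 0.5, -0.3, 0.8, 0.1]
samples = [andrews_curve(record, -math.pi + k * math.pi / 4) for k in range(9)]
print(samples)
```

Plotting `samples` against `t` for every record gives one curve per data point; records with similar values produce curves of similar shape.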
Andrews Curves
• The order of dimensions influences the
resulting shape of the curve
Andrews Curves
• Smoothing
http://www.mathworks.com/products/statistics/examples.html?file=/products/demos/shipping/stats/mvplotdemo.html#7
Radial Axis Techniques
• For each technique with a horizontal and/or
vertical orientation of the coordinate system there
exists an equivalent technique using a radial
orientation
• Radial line chart
publib.boulder.ibm.com
Radial Techniques
• Radar
• Star chart
• Polar chart
– Displaying
polar coordinates
www.prlog.org
commons.wikimedia.org
www.alteryx.com
Radial Techniques
• Radial column charts
• Radial bar charts
• Radial area charts
debaakies.nl
datavizproject.com
Types of Techniques for Radial Axes
• Concentric circles
• Continuous spiral – does not exhibit discontinuity
at the end of each cycle
• Compared to traditional
bar representation enables
observation of patterns
between elements at the
same position in different
cycles
Techniques for Area Data
• Usage of filled polygons of given size, shape,
color, …
• The aim of some of these techniques is not
showing individual data records, but their
clusters and distribution
• Originally designed for univariate data (single
variable) – pie charts and bar charts.
Subsequently extended to multiple
dimensions
red.helios.eu
bidwcz.blogspot.com
Bar Charts/Histograms
• Rectangular columns used for displaying
numerical values
• Effective because human perception
distinguishes length and other linear
properties well
• Textual labels are
assigned to describe
the bars
Bar Charts/Histograms
• Determining the number of necessary bars for
the best data representation is essential
• Given N variables, if N is not too big, we can
use 1:1 mapping
• For displaying a summary or distribution of
a dataset we can use a histogram
• Nominal values – the number of bars is equal
to the number of different values
• Ordinal values – creating intervals of values,
each interval corresponds to one bar
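Both ways of choosing the bars can be sketched in plain Python: one bar per distinct value for nominal data, and one bar per interval otherwise. Equal-width intervals and the names `nominal_bars` and `interval_bars` are assumptions made for illustration.

```python
from collections import Counter


def nominal_bars(values):
    """One bar per distinct value; the bar height is the number of records."""
    return Counter(values)


def interval_bars(values, low, high, bars):
    """Count how many values fall into each of `bars` equal-width intervals."""
    counts = [0] * bars
    width = (high - low) / bars
    for v in values:
        index = min(int((v - low) / width), bars - 1)  # top edge joins last bar
        counts[index] += 1
    return counts


print(nominal_bars(["a", "b", "a"]))
print(interval_bars([1, 2, 2, 3, 9, 10], 0, 10, 5))
```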
Bar Charts/Histograms
• Multivariate data – stacked bar chart
Cityscapes
• Using 3D blocks instead of 2D rectangles
• Bars placed on a grid, 2 dimensions define the
position, next dimensions the height and color
• Name derived from the appearance –
resembles the buildings in the city
• All cells of the grid
filled = 3D histogram
Problems of 3D Bar Charts
• Partial occlusions
• Possible solutions:
– Enabling the user to rotate the scene
– Decreasing the thickness of the bars
– Changing the opacity of the individual bars
todaycreate.com
Tabular Visualizations
• Multivariate data often in tables
• Heatmaps
– displaying records using color instead of text
– each value is rendered as a colored rectangle
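A minimal sketch of the value-to-color step in plain Python. The linear blue-to-red scale is an arbitrary choice for illustration, and `value_to_rgb` and `heatmap` are hypothetical names; real tools usually offer perceptually tuned color maps.

```python
def value_to_rgb(value, low, high):
    """Linearly map a value to a colour between blue (low) and red (high)."""
    t = 0.0 if high == low else (value - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    return (round(255 * t), 0, round(255 * (1 - t)))


def heatmap(table):
    """Replace every table value by the RGB colour of its rectangle."""
    flat = [v for row in table for v in row]
    low, high = min(flat), max(flat)
    return [[value_to_rgb(v, low, high) for v in row] for row in table]


for row in heatmap([[0, 10], [10, 0]]):
    print(row)
```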
akweebeta.com
Example of Application
www.caver.cz
Tabular Visualizations
• Survey plot
– Instead of color, the size of
the cell depicts the value
– Centres of the cells are
aligned to individual
attributes
– Measurement of area is
more prone to errors than
measurement of length
Tabular Visualizations
• Combination of aforementioned methods into
level-of-detail technique
http://ds.cc.yamaguchi-u.ac.jp/~ichikay/pfp7/iv/pics/SeeSoft-line.jpg
Dimensional Stacking
• Mapping of data from discrete N-dimensional
space to a 2D image in such a way that data
occlusions are minimized, while the majority
of the spatial information is preserved
Dimensional Stacking
• Data of 2N+1 dimensions
• Select final cardinality for each dimension
• Select one dimension as the dependent variable; the rest of
the dimensions are independent
• Create ordered pairs of independent variables (N pairs)
and assign unique value (speed) to each pair – from 1 to N
• Pair corresponding to speed 1 creates virtual image with
size corresponding to the cardinality of its dimensions
• In each position of this virtual image, new virtual image
corresponding to the dimensions of pair with the speed 2
is created
• The process is repeated until all dimensions are
included
Dimensional Stacking
• Begins with discretisation of the range of each
dimension. An orientation and an order are then
assigned to each dimension. The two dimensions
with the lowest orders are used to split the
virtual screen into sections; the cardinality of these
dimensions determines how many sections are
generated along the horizontal and vertical axes. Each
generated section is then split recursively
by the next two dimensions
in the same way. This process is repeated until all
the dimensions are processed and the data
are placed at their corresponding positions
on the screen.
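The recursive splitting can be sketched as a mixed-radix index computation in plain Python. The sketch assumes the independent dimensions are already discretised into bin indices and listed from the outermost pair to the innermost, with the first dimension of each pair laid out horizontally; `stacked_position` is an illustrative name.

```python
def stacked_position(indices, cardinalities):
    """Map discrete N-dimensional bin indices to a 2D cell (x, y).

    Dimensions are consumed in pairs, outermost pair first: dimensions at
    even positions (0, 2, 4, ...) subdivide the horizontal axis, dimensions
    at odd positions subdivide the vertical axis.
    """
    x = y = 0
    for pos, (index, card) in enumerate(zip(indices, cardinalities)):
        if not 0 <= index < card:
            raise ValueError("bin index out of range")
        if pos % 2 == 0:
            x = x * card + index
        else:
            y = y * card + index
    return x, y


# Four hypothetical independent dimensions with cardinalities 2, 2, 3, 3:
# the outer pair forms a 2 x 2 grid, each section holds a 3 x 3 grid.
print(stacked_position((1, 0, 2, 1), (2, 2, 3, 3)))
```

The value of the dependent variable would then determine the color of the cell at the returned position.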
Dimensional Stacking
Treemap
Combinations of Techniques
• Hybrid techniques based on combinations of
aforementioned techniques for points, lines
and areas
• Best known:
– Glyphs (pictograms)
– Dense pixel displays
Glyphs and Icons
• Visual representation of parts of data or
information, where a graphical entity and its
attributes are driven by one or more
attributes of input data
• Graphical attributes, to which the data values
can be mapped:
– position, size, shape, orientation, material, line
style, dynamics
Glyphs and Icons
• Types of mapping:
– 1:1 – each data attribute is mapped to unique
graphical attribute
– 1:N – set of redundant mappings (e.g., mapping
data attribute simultaneously to size and color)
– M:N – multiple or all data attributes mapped to a
common type of graphical attribute
Glyphs and Icons
• We must be aware of inaccuracies and
restrictions of these techniques:
– Inaccuracy of perception – depends on the type of
used graphical attributes
– Distance between graphical attributes influences
the accuracy of their comparison – the closer, the
more precise comparison
– Number of dimensions and data records which
can be effectively displayed using glyphs is limited
Glyphs and Icons
• After selection of the type of glyph there are N!
possible orderings of the dimensions, which can
be used when mapping
• Several strategies for selection of suitable order
exist:
– Sorting of dimensions based on their correlation
– Increasing influence of glyph with symmetrical shape
– Sorting by the values of dimensions in a single record
– Manual sorting based on knowledge of the domain
Placement of Glyphs
• Three basic types of strategies for placement
of glyphs on the screen:
1. Uniform
2. Data-driven
3. Structure-driven
Uniform Placement
• Uniform placement on screen
• Elimination of overlaps, effective usage of
screen space
Data-Driven Placement
• Two approaches:
– Select two dimensions to direct the placement (left)
– Positions derived using PCA, MDS (right)
Structure-Driven Placement
• Using structure of the data – cyclic, hierarchical
Dense Pixel Displays
• Hybrid method between point-based and
regional (area-based) methods
• Maps each value to individual pixel and for
each dimension creates filled polygon
• Displaying millions of values within one screen
• Number of data points determines the
number of individual items in the image
• The technique relies on application of color
Dense Pixel Displays
• Simplest form:
– Each dimension of dataset generates independent
separated “sub-image” on the screen
– Each dimension can be considered as an independent
set of numbers, each set determines the color of the
corresponding pixels
– The placement of the items within the set
(highlighting relationships between close points):
alternating passes from right to left and from left to
right; if the edge of the image is reached, move to the
next line
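The alternating-pass placement can be sketched in plain Python as follows. Starting the first line from the left is an assumption, and `snake_positions` is an illustrative name.

```python
def snake_positions(count, width):
    """Pixel (x, y) for items 0..count-1, alternating direction on each line."""
    positions = []
    for i in range(count):
        row, offset = divmod(i, width)
        # even lines run left to right, odd lines right to left
        x = offset if row % 2 == 0 else width - 1 - offset
        positions.append((x, row))
    return positions


print(snake_positions(7, 3))
```

With this ordering, consecutive items always end up in neighbouring pixels, including at the line breaks.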
Dense Pixel Displays
screen filling recursive patterns
Recursive Patterns, Circular Segments
• Placement of sub-images using different
approaches:
Dense Pixel Displays
• Last important aspect is ordering of the data
• Time-series data have fixed ordering
• In other types of
data the change
of order can
reveal interesting
properties
More Approaches
• Enable overlaps of sub-images:
– "Value and Relation" technique using
multidimensional scaling
Pixel Bar Charts
• Overloading of classical bar chart – including
more information about individual items
Pixel Bar Charts
• Each pixel of the bar represents a data point
belonging to the group represented by this
bar
Pixel Bar Charts
• Internet shopping – relationship between
the type of product and the price. Color is
mapped onto:
amount spent, number of visits, size of sales
Pixel Bar Charts
• Placement of dense pixels to bar chart
Pixel Bar Charts
• We can derive, e.g.:
– The largest number of customers came in December,
while February, March, and May had the
fewest customers.
– From February to May there were the largest amounts of
purchases.
– The number of purchases in December is average.
– From March to June the customers returned more
frequently than in other months. December customers
were mostly one-time customers.
– The customers who shop the most return more
often and buy more.