10 Movement Measures
The chapters in the first two parts of this book provide detailed information about the technology and skills necessary to conduct eye-tracking research (Part I), and how to process eye-tracking data after recording (Part II). In Part III, we cover the vast range of measures which can be calculated on the basis of the events and representations described in Part I. Measures of eye movement are many and diverse, and this is where we begin.
Eye-movement measures, as defined in this chapter, refer to different properties of movement events during a finite period of time. The properties of movement are direction, amplitude, duration, velocity, and acceleration.
Movement measure group              Uses                                         Page
Movement direction measures         In what direction did the eye move?          301
Movement amplitude measures         How far did the eye move?                    311
Movement duration measures          For how long did the eye move?               321
Movement velocity measures          How fast did the eye move?                   326
Movement acceleration measures      How fast did the eye accelerate?             332
Movement shape measures             What is the shape of the eye movement?       336
AOI order and transition measures   How similar are movements in AOIs?           339
Scanpath comparison measures        How similar are two or more scanpaths?       346
Also, all movements have the more ill-defined property shape. These six general properties of movements have generated the measures that are listed in Sections 10.1-10.6.
In Sections 10.7 and 10.8, we classify measures that quantify the order of movement through space: AOI visits and transition sequences between AOIs, and methods to calculate the similarity between pairs of eye-movement sequences (i.e. scanpaths).
Many of the movement measures have a ratio value type, which makes their usage statistically straightforward. However, some measures in the later sections require the use of more advanced statistics.
10.1 Movement direction measures
Movement direction measures pertain to single instances of movement events such as saccades, glissades, drifts, microsaccades, smooth pursuits, and scanpaths. Some of these events move along a straight line, but by no means all of them, and not always. The movement of saccades, glissades, and smooth pursuits can be curved, i.e. change direction during the event. When we refer to a scanpath's direction we refer to its overall direction, if it has any directionality at all.
The resulting values tat direction of movement (
Fig. 10.17 To the left a plot of a blink recording, with the lid closure as the vertical amplitude dimension. To the right, blink duration and other blink measures defined on the background of a blink amplitude curve. With kind permission from Springer Science+Business Media: European Journal of Applied Physiology, Experimental evaluation of eye-blink parameters as a drowsiness measure, 89(3), 2003, Phillipp P. Caffier.
[Plot: histogram; x-axis: Blink duration in ms (100 to 1000); y-axis ticks from 500 to 3500.]
Fig. 10.18 The histogram of blink durations during mathematical problem solving, as measured with a tower-mounted 1250 Hz system, and analysed in BeGaze 2.3. Bin size 10 ms. Total number of blinks in this histogram is 32700, of which 203 are above 1000 ms in duration. The peak below 80 ms shows that this particular blink detection algorithm accepts various measurement noise as blinks.
and Schroeder (1994). Morris and Miller (1996) found that blink duration increases as a function of time on task. However, Veltman and Gaillard (1996), in a study of pilots, conclude that blink duration is affected by the visual demands of the task rather than by the cognitive workload in general.
Alcohol and anaesthetic sedation Blink duration appears to be sensitive also to low levels of sedation (Jandziol, Prabhu, Carpenter, & Jones, 2006) and alcohol (Biederman et al., 1974).
Like saccadic suppression, where visual intake is reduced before the physiological movement of the eye begins, there is a similar phenomenon for blinks known as blink suppression. Ridder III and Tomlinson (1995) propose that these mechanisms are produced by similar underlying systems. According to Volkmann, Riggs, and Moore (1980), blink suppression starts around 50-100 ms before the blink onset, and lasts until 100-150 ms after offset. However, others have found that full acuity is not recovered until 200-500 ms after a blink (Ehrmann, Ho, & Papas, 2005). In addition, the amount of blink suppression seems to be directly influenced by factors such as blink amplitude and task (Stevenson, Volkmann, Kelly, & Riggs, 1986).
10.4 Movement velocity measures
Average velocity ($v$) over a movement is related to the amplitude ($\theta$) and duration ($t$) by the classical equation
$$v = \frac{\theta}{t} \qquad (10.9)$$
Velocity levels can change along a movement, and that is why average velocity for one eye movement event (e.g. a saccade) is only of marginal use. Instead the instantaneous, tangential velocity ($\dot{\theta}$) is approximated by the distance between consecutive raw data samples ($\Delta\theta$) multiplied by the sampling frequency of the eye-tracker, $f_s = 1/\Delta t$. Mathematically, this calculation of $\dot{\theta}$ corresponds to a differentiation of position data. The instantaneous velocity is typically lowpass filtered before being processed further, and the exact velocity values are therefore intimately related to the properties of the lowpass filter (Inchingolo & Spanio, 1985). Note that filters may introduce latencies in the velocity data.
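The sample-to-sample calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular eye-tracker software: the function names, the example gaze positions, and the use of a centred three-point moving average as a stand-in for the lowpass filter are all assumptions made for the example.

```python
import math


def instantaneous_velocity(x_deg, y_deg, sampling_hz):
    """Approximate tangential velocity (deg/s) between consecutive samples.

    The distance between two consecutive samples (in degrees) is multiplied
    by the sampling frequency, which is the same as dividing by the
    inter-sample interval.
    """
    velocities = []
    for i in range(1, len(x_deg)):
        step_deg = math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1])
        velocities.append(step_deg * sampling_hz)
    return velocities


def moving_average(values, window=3):
    """Centred moving average, used here as a simple stand-in lowpass filter."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical horizontal gaze positions (degrees) sampled at 1000 Hz.
x = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]
y = [0.0] * len(x)
raw_velocity = instantaneous_velocity(x, y, sampling_hz=1000)
filtered_velocity = moving_average(raw_velocity)
```

Note how the filter lowers the peak of the velocity curve; this is one reason why reported velocity values depend on the filter that was used.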
10.4.1 Saccadic velocity
Target question What was the peak/average velocity of the saccade?
Input representation A saccade
Output Saccade velocity (degrees/s)
Saccadic velocity is the first derivative of position data with respect to time. Saccadic velocity is typically calculated as part of event calculation (Chapter 5). An example of a velocity plot can be seen in Figure 10.20. Velocity plots are one of the most important data inspection tools in eye-tracking research. Precision problems, optic artefacts, and the successes and failures of event detection algorithms can be readily inspected in them.
Träisk et al. (2005) showed that search coils reduce the velocity of saccades, in particular for longer amplitudes. Saccades are therefore probably most accurately recorded with video-based pupil and corneal reflection eye-trackers. Furthermore, McGregor and Stern (1996) found that saccades occurring during a blink were significantly slower than those occurring independently of a blink. An example of a blink-accompanying saccade is given in Figure 10.19.
[Plot: velocity over time; x-axis: Time [ms], 35,140 to 35,340.]
Fig. 10.19 180 ms saccade (black velocity) coinciding with a blink (dashed velocity). The saccadic velocity peaks at 35,180 ms, and the blink takes over at 35,190 ms (dashed artefactual velocity line). At 35,340 ms, note how the now prolonged saccade ends, around 180 ms after starting. Recorded during the mathematical problem solving task, number 4 on page 5, using a tower-mounted system at 1250 Hz. The data quality was high.
[Plot: velocity over time for three saccades; x-axis: Time [ms]; y-axis: velocity, 0 to 700.]
Fig. 10.20 Three saccades by one participant shown in a velocity over time diagram. Data taken from the mathematical problem solving task, number 4 on page 5, recorded with a tower-mounted 1250 Hz eye-tracker. The data quality was high.
In Figure 10.20, we can see the velocity curves for three saccades that develop very differently in time. The first saccade is a fairly common type. The velocity peak appears slightly before the middle of the saccade, so that the acceleration phase is faster than the deceleration phase. Two small glissadic movements are appended to this saccade. The middle saccade in Figure 10.20 is what has often been considered the ideal velocity curve. It has a clear velocity peak with only a slightly faster acceleration than deceleration. The glissade is minimal. The third saccade has multiple velocity peaks, of which the first two are saccadic and the last one possibly a glissade. The eye moves in spurts, but is never completely still between each burst. Zivotofsky, Siman-Tov, Gadoth, and Gordon (2006) report very similar data from a participant with Stiff-Person Syndrome, where the multiple peaks are most likely a result of incomplete muscle control. Rucker et al. (2004) link these multi-step saccades to Tay-Sachs disease. The data in Figure 10.20 are from a presumably healthy participant (a young student), but he was the only person from over 300 participants in this reading study who exhibited this pattern. The saccade is curved, and different velocities occur along the curved trajectory, possibly as the three pairs of eye muscles alternate controlling the eye.
Three common values are derived from velocity data such as in Figure 10.20:
1. Average saccadic velocity is an average of velocities over the entire duration of a saccade. Averages are however poor representations of the Gaussian-shaped saccadic forms.
2. Peak saccadic velocity is the highest velocity reached during the saccade. Average peak saccadic velocity reported by your software may be heavily affected by the peak velocity threshold in the event detection algorithm, as Figure 10.21 illustrates. Not only are saccades with a peak velocity lower than the threshold excluded; a large number of the saccades with velocities from the threshold up until around 300°/s are also eliminated by the higher settings. Similar results were found by Bahill et al. (1981), but this also depends on the exact implementation of the detection algorithm.
3. Time to peak is the duration from the onset of a saccade until the peak velocity is reached.
[Plot: histogram; x-axis: Saccadic peak velocity in °/s; y-axis up to 2000.]
Fig. 10.21 The distribution of saccades at different peak velocities, as a function of peak velocity threshold. Note that a higher threshold reduces the number of saccades also at peak velocities above the threshold, due to the additional conditions on saccades in this implementation (p. 173). Bin size 10°/s. 10 participants reading for 10-15 minutes each, recorded with a tower-mounted system at 1250 Hz with high data quality. Saccades detected with the velocity algorithm.
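The three values listed above (average velocity, peak velocity, and time to peak) can be derived from the velocity samples of a single detected saccade. The sketch below is only an illustration: it assumes that the first sample marks saccade onset, that samples are equally spaced, and it uses a hypothetical function name and made-up velocity values.

```python
def saccade_velocity_summary(velocities, sampling_hz):
    """Summarize the velocity samples (deg/s) of one saccade.

    Assumes velocities[0] is the sample at saccade onset and that the
    samples are equally spaced at 1 / sampling_hz seconds.
    """
    peak = max(velocities)
    peak_index = velocities.index(peak)  # first occurrence of the peak
    return {
        "average": sum(velocities) / len(velocities),
        "peak": peak,
        "time_to_peak_ms": 1000.0 * peak_index / sampling_hz,
    }


# Hypothetical velocity samples of one saccade recorded at 1000 Hz.
samples = [40, 150, 320, 410, 300, 180, 90, 35]
summary = saccade_velocity_summary(samples, sampling_hz=1000)
print(summary)
```

For a saccade with several velocity peaks, such as the third one in Figure 10.20, a single time-to-peak value is of limited use, and the choice of which peak to report would have to be made explicit.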
The highest recorded peak saccadic velocity tends to be around 1000°/s, as Figure 10.12 on page 317 shows for a large saccade population during mathematical problem solving. Such fast saccades are very rare, however. In the peak velocity histogram in Figure 10.21, even the fastest reading return sweeps across the entire monitor have velocities no larger than approximately 700°/s.
The lower end of peak saccadic velocities is less investigated, although of great importance in event detection algorithms. On pages 171-172, we concluded that a proper setting is about 30-40°/s for a high-speed system with good precision. If there are long fixations in the data, there is a certain risk that occasional microsaccades will be detected, as a few microsaccades have peak velocities above 50°/s (Engbert, 2006).
Saccadic velocity has been used as a measure of cognitive activation level, or what is often called arousal level. Circumstances and factors that influence saccadic velocity include:
Arousal levels and sleepiness Low vigilance decreases saccadic velocity (Galley, 1989), and so does tiredness (McGregor & Stern, 1996; Becker & Fuchs, 1969) and sleep deprivation (Russo et al., 2003; Bocca & Denise, 2006). However, McGregor and Stern (1996) present results that suggest caution in interpreting saccadic velocity change as an index of 'fatigue', since the reduction in average saccadic velocity may be secondary to increases in blink rate.
Anticipation Anticipatory saccades, made to targets that are so predictable that the saccade can be launched before target onset, have lower velocities than reactive saccades (Smit & Van Gisbergen, 1989; Bronstein & Kennard, 1987).
Task Saccadic velocity increases as the difficulty of the task increases (Galley, 1993) and decreases with an increasing time on task (McGregor & Stern, 1996). When the task requires a higher saccadic rate (greater frequency of saccades), the saccadic peak velocity increases (Lueck, Crawford, Hansen, & Kennard, 1991).
Age Saccadic velocities are of the same magnitude in children as in adults (Salman, Sharpe, Eizenman, et al., 2006), and Abrams, Pratt, and Chasteen (1998) found velocities not to differ between younger and older adults. However, Moschner and Baloh (1994) found velocities to be 20% slower for participants older than 75 years compared to participants younger than 43.
REM sleep During sleep, REM saccades (i.e. rapid eye movements) are about half the velocity of equal amplitude saccades made when awake (Aserinsky, Joan, Mack, Tzankoff, & Hurn, 1985). However, a more recent study could not confirm this finding (Sprenger et al., 2010).
Melancholia This is a disorder of low mood and lack of enthusiasm, and is associated with a difficulty in increasing peak velocities as target amplitudes increase (Winograd-Gurvich, Georgiou-Karistianis, Fitzgerald, Millist, & White, 2006).
Neurological disorders Slow saccades can be an indication of lesions in the pons, the midbrain, or the basal ganglia. Low saccade velocities also occur with Alzheimer's disease, AIDS, certain drugs, and a few other specific diseases. See the excellent summary in Wong (2008) for details.
Drugs and alcohol Peak saccadic velocity has for a long time been one of the prime oculomotor measures when studying the neurological and behavioural effects of drugs and alcohol (Abel & Hertle, 1988; Griffiths, Marshall, & Richens, 1984; Jürgens, Becker, & Kornhuber, 1981; Lehtinen et al., 1979; Franck & Kuhlo, 1970).
10.4.2 Smooth pursuit velocity
Target question What was the peak or average velocity of the smooth pursuit event?
Input representation A smooth pursuit event
Output Velocity (degrees/s)
Smooth pursuit velocity is calculated just like saccadic velocity. Velocity plots on pages 169 and 179 show smooth pursuit velocity from a participant watching a pendular movement. We differentiate between average and peak smooth pursuit velocity.
It is generally thought that smooth pursuit velocity, peaking at 25-40°/s, is much slower than saccadic velocity (Boff & Lincoln, 1988; Young, 1971). However, Meyer et al. (1985) recorded 100°/s smooth pursuit on ordinary participants. Measuring professional baseball players who simulated hits on a baseball, Bahill and LaRitz (1984) recorded smooth pursuit velocities of up to 130°/s. According to Bahill and LaRitz, p. 235, "The success of good players is due to faster smooth pursuit eye movements, a good ability to suppress the vestibulo-ocular reflex, and the occasional use of an anticipatory saccade". Due to their large overlap in velocity range, smooth pursuit and saccades may be hard to separate by considering velocity alone. When the smooth pursuit follows targets moving with velocities of greater than about 30°/s, catch-up saccades tend to appear in the data.
Smooth pursuit velocity has been argued to reflect the following:
Path curvature When the target moves in curved paths, smooth pursuit velocity decreases with decreasing radii of the curves (De'Sperati & Viviani, 1997).
Age Smooth pursuit velocity is slower for older participants (mean 67 years) than for younger (mean 42) (Sharpe & Sylvester, 1978). Newborn children have an undeveloped smooth pursuit, and have difficulty tracking targets even at low speeds such as 25°/s (Kremenitzer, Vaughan Jr, Kurtzberg, & Dowling, 1979).
Drugs Few studies have been conducted on the effects of drugs, and often with negative results. For instance, Tedeschi, Bittencourt, Smith, and Richens (1983), contrary to expectations, found no effect of amphetamine on smooth pursuit velocity.
Disorders Children with autism (Takarae, Minshew, Luna, Krisky, & Sweeney, 2004), adults with a childhood history of physical and emotional abuse (Irwin, Green, & Marsh, 1999), as well as patients with schizophrenia and post traumatic stress disorder (Cerbone et al., 2003) all show a decreased ability to smoothly track targets at higher velocities.
10.4.3 Scanpath velocity and reading speed
Target question What was the velocity of the scanpath?
Input representation A scanpath, consisting of a sequence of saccades
Output Velocity (degrees/s)
Scanpath velocity (which you may also see referred to as 'average saccadic velocity', 'eye movement speed', or 'eye velocity') is defined as the product of the average saccadic amplitude (in °) of the saccades that make up the scanpath and the saccadic rate (in 1/s). Apart from being a measure of scanpath velocity, this measure allows for a crude approximation of average saccadic velocity for data collected with such low sampling frequency and poor precision that actual saccadic velocities cannot be calculated.
The measure was defined by Saito (1992), who compared monitor work (23°/s) with similar work without a monitor (9°/s).
In reading studies, an alternative calculation of the same scanpath velocity has been made as the product of reading speed (in characters per second) and letter size (in visual degrees per character). The originators of this calculation, Krischer and Zangemeister (2007), investigated optimal conditions for reading, and conclude that the best skill- and acuity-matched letter size gives an eye movement speed of 8°/s during reading. Small letters, in particular, slow down the eye movement speed.
Beymer, Russell, and Orton (2005) instead divided text distances read by the time it took to cover them. This velocity measure was used to compare paragraph widths, and the authors found that a 4.5 inch monitor text is read only slightly faster than a 9.0 inch text.
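The two product definitions of scanpath velocity described above can be written out directly. The following is a minimal sketch with hypothetical function names and invented input values; it is not taken from any of the cited studies.

```python
def scanpath_velocity(saccade_amplitudes_deg, duration_s):
    """Average saccadic amplitude (deg) multiplied by saccadic rate (1/s)."""
    n_saccades = len(saccade_amplitudes_deg)
    mean_amplitude = sum(saccade_amplitudes_deg) / n_saccades
    saccadic_rate = n_saccades / duration_s
    return mean_amplitude * saccadic_rate


def reading_eye_speed(chars_per_second, degrees_per_char):
    """Reading speed (characters/s) multiplied by letter size (deg/character)."""
    return chars_per_second * degrees_per_char


# Hypothetical scanpath: four saccades within a 2-second period.
velocity = scanpath_velocity([2.0, 4.0, 3.0, 3.0], duration_s=2.0)

# Hypothetical reader: 20 characters/s with letters 0.4 degrees wide.
eye_speed = reading_eye_speed(20.0, 0.4)
```

Because the number of saccades cancels out in the first product, scanpath velocity is the same as the summed saccadic amplitude divided by the duration of the period, which is why it can be estimated even from data that are too coarse to yield true saccadic velocities.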
An ambitious operational definition of reading speed (RS) was provided by Bullimore and Bailey (1995), who defined it as
$$\mathrm{RS} = \mathrm{FR} \cdot \frac{\#\,\text{forward saccades}}{\#\,\text{total saccades}} \cdot (\text{average saccadic amplitude in letters}) \qquad (10.10)$$
where FR is fixation rate (p. 416). The measure was used to study participants with macular degeneration and the effect of luminance, both of which affect reading speed.
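As a worked illustration of this definition, the sketch below multiplies fixation rate by the proportion of forward saccades and by the average saccadic amplitude in letters. The function name and the input values are hypothetical, and the multiplicative form of the expression is assumed from the description above.

```python
def reading_speed(fixation_rate, n_forward_saccades, n_total_saccades,
                  mean_amplitude_letters):
    """Reading speed in letters per second.

    fixation_rate is in fixations per second; the proportion of forward
    saccades discounts regressions; mean_amplitude_letters is the average
    saccadic amplitude expressed in letters.
    """
    forward_proportion = n_forward_saccades / n_total_saccades
    return fixation_rate * forward_proportion * mean_amplitude_letters


# Hypothetical reader: 4 fixations/s, 80 of 100 saccades are forward,
# and the average saccade spans 7.5 letters.
rs = reading_speed(4.0, 80, 100, 7.5)
```

With these invented values, the reader advances about 24 letters per second; a larger share of regressions lowers the estimate proportionally.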
[Plot: pupil area over time; x-axis: Frames (1 to 197); y-axis ticks from 0 to 80000.]
Fig. 10.22 Broken line, left eye; unbroken line, right eye. Max, maximum pupil area; Min, minimum pupil area. PCV, pupil constriction velocity, is calculated from the middle portion of the curve between Max and Min; PDV, pupil dilation velocity, is calculated from the middle portion of the curve between Min and Max. Data from a healthy participant about 60 years of age watching a variable light source. Recorded using a custom-built 60 Hz pupillometer. With kind permission from John Wiley and Sons: Acta Ophthalmologica, Relative afferent pupillary defect in glaucoma: a pupillometric study, 85(5), 2007, Lada Kalaboukhova, Vanja Fridhammar, and Bertil Lindblom, pp. 519-525.
10.4.4 Pupil constriction and dilation velocity
Target question What was the velocity of the pupil closure or opening movement?
Input representation Raw sample data
Output Velocity (mm/s or mm²/s)
In pupillometry, pupil velocity is an established dependent variable. Pupil constriction velocity is approximately three times faster than dilation velocity (Ellis, 1981), so these should be measured separately. Figure 10.22 shows a pupillometric recording, and demonstrates what portions of data should be selected for pupil velocity calculations.
Average pupil velocity is calculated by selecting a constriction or dilation period with constant velocity and dividing the change in pupil diameter or area by the duration of the period (Figure 10.22). Tangential pupil velocity is calculated either by differentiating the horizontal diameter by time (Bitsios, Prettyman, & Szabadi, 1996), or by differentiating the pupil area by time (Figure 10.22). After tangential velocity has been calculated, maximum pupil velocity can easily be calculated.
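The average and tangential calculations described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the pupil diameters are invented, and the start and end of the constant-velocity period are picked by hand rather than detected automatically.

```python
def average_pupil_velocity(sizes, sampling_hz, start, end):
    """Change in pupil size between two sample indices divided by the elapsed time."""
    duration_s = (end - start) / sampling_hz
    return (sizes[end] - sizes[start]) / duration_s


def tangential_pupil_velocity(sizes, sampling_hz):
    """Sample-to-sample differentiation of pupil size with respect to time."""
    return [(b - a) * sampling_hz for a, b in zip(sizes, sizes[1:])]


# Hypothetical pupil diameters (mm) from a 60 Hz recording of a constriction.
diameters = [6.0, 6.0, 5.8, 5.5, 5.2, 4.9, 4.7, 4.6, 4.6]

# Average constriction velocity over the hand-picked linear period (samples 2 to 5).
average_velocity = average_pupil_velocity(diameters, 60, start=2, end=5)

# Maximum constriction velocity is the most negative tangential value.
tangential = tangential_pupil_velocity(diameters, 60)
max_constriction_velocity = min(tangential)
```

Negative values indicate constriction and positive values dilation, so the two directions can be analysed separately by sign.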
Pupil velocity measurements have mainly been used clinically, for instance to assess pupil parasympathetic function, effects of medication, or the effect of glaucoma on pupil dilation. Age differences are known. For instance, Bitsios et al. (1996) found a smaller maximum dilatation velocity in an elderly compared to a younger group.
[Plot: acceleration and velocity curves over time, with the acceleration phase and the deceleration phase marked; x-axis: Time.]
Fig. 10.23 Acceleration (thick line) and velocity (thin line) over time for an ideal saccade. Note that acceleration rises before velocity, and that acceleration is 0 when velocity peaks. Also note that during the deceleration phase, acceleration is negative.
10.5 Movement acceleration measures
The instantaneous, tangential acceleration $\ddot{\theta}$ is calculated by differentiating velocity $\dot{\theta}$ with respect to time:
$$\ddot{\theta} = \frac{d\dot{\theta}}{dt} \qquad (10.11)$$
Mathematically, acceleration is therefore the second derivative of position data. Peak and average acceleration can be retrieved from the continuous acceleration data. Deceleration (slowing down) can be seen as negative acceleration (Figure 10.23). Jerk ($\dddot{\theta}$) is the third derivative, measuring the change in acceleration over time.
Movement acceleration is necessary to start a movement and to increase the velocity of it. Acceleration and jerk drive velocity, and therefore a change in acceleration is accompanied by a change in velocity. Unfortunately, numerical differentiation magnifies noise in the signal, and may require additional filtering.
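Repeated numerical differentiation of position data gives velocity, acceleration, and jerk in turn. The sketch below uses a plain sample-to-sample difference and invented position values; the function name is hypothetical, and no filtering is applied, although real data would usually need it.

```python
def differentiate(values, sampling_hz):
    """Sample-to-sample difference multiplied by the sampling frequency."""
    return [(b - a) * sampling_hz for a, b in zip(values, values[1:])]


# Hypothetical gaze positions (degrees) along one axis, sampled at 1000 Hz.
position = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9]

velocity = differentiate(position, 1000)          # degrees/s
acceleration = differentiate(velocity, 1000)      # degrees/s^2
jerk = differentiate(acceleration, 1000)          # degrees/s^3

peak_acceleration = max(acceleration)
peak_deceleration = min(acceleration)
```

Each differentiation shortens the series by one sample and multiplies small measurement errors by the sampling frequency, which illustrates why acceleration and jerk are far noisier than velocity.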
10.5.1 Saccadic acceleration/deceleration
Target question What was the peak/average acceleration of the saccade?
Input representation A saccade
Output Acceleration (degrees/s²)
Saccadic acceleration is the derivative of saccadic velocity with respect to time. Acceleration thresholds are used in some of the saccade detection algorithms (most prominently in the EyeLink parser from SR Research), and are valuable to separate smooth pursuit movements (low acceleration due to small variation in velocity) from saccades (high acceleration/deceleration at onset/offset).
Saccadic acceleration rises very fast, and reaches a maximum peak of up to 100000°/s², with typical peak values ranging between 6000 and 12000°/s². Extremely high acceleration values may be an indication of optic artefacts in the data. As Figure 10.24 shows, the distribution has a rightward skew.
[Plot: normalized histogram; x-axis: Acceleration (°/s²), scaled by 10^5.]
Fig. 10.24 A normalized distribution of saccades at different peak accelerations. Acceleration is calculated using the algorithm by Nyström and Holmqvist (2010). Ten participants reading for 10-15 minutes each, recorded with a tower-mounted 1250 Hz system. Data quality is high.
Acceleration and deceleration were found by Collewijn et al. (1988) to increase as a function of saccade amplitude; however, data were recorded with coils, not the type of commercial video-based eye-trackers in common use in research today.
Saccadic acceleration is a very uncommon measure that appears to have attracted mainly neurologists with an interest in the cerebellum and superior colliculus (important saccade programming areas of the brain). All participants in Straube and Deubel (1995) showed an idiosyncratic pattern of saccadic acceleration and deceleration.
The existence of glissades and saccades with multiple velocity peaks means that some saccades have more than one acceleration phase. Whether this also means that the eye muscles are actually pulling the eye during all acceleration phases, or whether some acceleration phases are the consequence of eye lens inertia during deceleration, still remains an open question.
10.5.2 Skewness of the saccadic velocity profile
Target question How much of the saccadic duration is taken up by acceleration and deceleration phases, respectively?
Input representation A saccade
Output Skewness
The skewness of the saccadic velocity ('saccadic skewness') is defined as the degree of skewness of the velocity plots of saccades. It attempts to measure the duration of the acceleration versus deceleration phases in saccades, as shown in Figure 10.23.
The measure has been operationalized in at least three different ways. First, Collewijn et al. (1988) define the skewness value as the acceleration phase (time to peak velocity) divided by the total saccade duration. A symmetrical saccade therefore has a skewness of 50%. Figure 10.25 shows a histogram of saccadic skewness using this calculation. A similar operational definition used by Straube and Deubel (1995) and others is to calculate the acceleration phase divided by the deceleration phase. Symmetrical saccades in this case have the value 1; right-skewed are lower, and left-skewed (the few that exist) have a value above 1. Using this, the distribution of skewness values in Figure 10.25 will be even more skewed than as depicted. A third operational definition was provided by Van Opstal and Van Gisbergen (1987), who approximated a gamma function to the velocity profile of the saccades. By finding values for $\alpha$, $\beta$, and $\gamma$ that optimize the fit of
$$v(t) = \alpha \left(\frac{t}{\beta}\right)^{\gamma - 1} e^{-t/\beta} \qquad (10.12)$$
to the velocity curve $v(t)$, the skewness can be calculated as
$$\mathrm{Skew} = \frac{2}{\sqrt{\gamma}} \qquad (10.13)$$
[Plot: histogram; x-axis: Skewness (%), 15 to 80; y-axis ticks from 0 to 2500.]
Fig. 10.25 Histogram of saccadic skewness from more than 30,000 reading saccades from project 1 on page 5. Skew is defined as the duration of the acceleration phase divided by the total duration of the saccade. Bin size 2%. Recorded with a tower-mounted eye-tracker at 1250 Hz. Saccades detected using the velocity algorithm at a threshold setting of 30°/s. Glissades are largely included in saccades.
[Plot: saccadic velocity profiles over time; x-axis: Time (ms), -50 to 300.]
Fig. 10.26 The skewness of a saccade has previously been shown to be dependent on the saccadic amplitude. Data recorded with a coil system. Reprinted with kind permission from John Wiley and Sons: Journal of Physiology, Binocular co-ordination of human horizontal saccadic eye movements, 404(1), 1988, Collewijn, H., Erkelens, C.J., & Steinman, R.M., pp. 157-182.
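The first two operational definitions of skewness can be computed directly from the velocity samples of one saccade. The sketch below is a simplified illustration: the function names and velocity values are hypothetical, the samples are assumed to run from saccade onset to offset at equal intervals, and the saccade is assumed to have a single velocity peak that is not the last sample.

```python
def skewness_percent(velocities):
    """Acceleration phase (time to peak velocity) as a percentage of total duration."""
    peak_index = velocities.index(max(velocities))
    total_intervals = len(velocities) - 1
    return 100.0 * peak_index / total_intervals


def skewness_ratio(velocities):
    """Duration of the acceleration phase divided by that of the deceleration phase."""
    peak_index = velocities.index(max(velocities))
    deceleration_intervals = len(velocities) - 1 - peak_index
    return peak_index / deceleration_intervals


# Hypothetical velocity samples (deg/s) of one saccade, onset to offset.
profile = [30, 200, 400, 300, 200, 100, 30]
percent = skewness_percent(profile)   # about 33%: peak reached after a third of the saccade
ratio = skewness_ratio(profile)       # 0.5: acceleration phase half as long as deceleration
```

Both values describe the same saccade; a symmetrical profile would give 50% with the first definition and 1 with the second.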
Skewness is a little used measure. Anticipatory (or predictive) saccades were found to be slightly more skewed than visually guided saccades (Smit & Van Gisbergen, 1989). Liao et al. (2006) found that head saccades (turning the head) have a constant skew of 0.5, while the skew of eye saccades varies. Soetedjo, Kaneko, and Fuchs (2002) found that injection of the substance muscimol in the superior colliculus of monkeys increased the skewness since the duration of the deceleration phase increased more than the duration of the acceleration phase.
In previous research based on scleral coil recordings, the acceleration phase has been shown to be of approximately the same duration across all amplitudes, while the deceleration phase increases rapidly with increasing amplitudes up to 90° (Figure 10.26). However, using the data described on page 5, from video-based eye-trackers, we have not been able to find any correlation between amplitude and the skewness of the saccadic velocity profile for saccades up to 40°.
The many glissades that exist in video-based eye-trackers (but are suppressed with coil systems) are categorized with the saccades by some of the detection algorithms. In effect, this means that glissadic saccades of whatever amplitude will be heavily skewed. In fact, the skewness measure will itself have a skewed distribution, as shown in Figure 10.25. The 0
where $R$ is a normalized transition matrix and $r_i$ are the cell values of that matrix with probabilities $p(r_i)$.
Entropy can be calculated for any dwell map or transition matrix. The lowest possible value is zero (0), which is only reached if there is just one non-zero cell in the matrix, i.e. when there is no uncertainty about what type of transition will occur. The maximum value for entropy is when all cells have the same value. In our example with six cells, the maximum value would be $-6 \cdot \frac{1}{6} \log_2\left(\frac{1}{6}\right)$, which equals 2.59 bits. 'Bits' is not a very intuitive unit, however, but by dividing the $H(R)$ value by the theoretically maximal value for the system (in our example 2.59), we arrive at a normalized entropy that allows for comparisons of results across groups and stimuli.

        A        B        C
A       -        0.0625   0.0625
B       0.0625   -        0.5
C       0.25     0.0625   -

Fig. 10.29 Fictitious transition matrix with normalized values.

Table 10.2 Step-by-step entropy calculations for the transition matrix in Figure 10.29.

Transition   p(r_i)    log2 p(r_i)   p(r_i) log2 p(r_i)
A -> B       0.0625    -4.0          -0.25
A -> C       0.0625    -4.0          -0.25
B -> A       0.0625    -4.0          -0.25
B -> C       0.5000    -1.0          -0.50
C -> A       0.2500    -2.0          -0.50
C -> B       0.0625    -4.0          -0.25
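The step-by-step calculation for the transition matrix in Figure 10.29 can be reproduced with a short script. This is a minimal sketch with a hypothetical function name; it takes the six normalized transition probabilities from the figure, sums the products of each probability and its base-2 logarithm, and negates the result.

```python
import math


def entropy_bits(probabilities):
    """Entropy in bits of a set of normalized cell values (zero cells are skipped)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Normalized transition probabilities from Figure 10.29.
transitions = {
    ("A", "B"): 0.0625,
    ("A", "C"): 0.0625,
    ("B", "A"): 0.0625,
    ("B", "C"): 0.5,
    ("C", "A"): 0.25,
    ("C", "B"): 0.0625,
}

entropy = entropy_bits(transitions.values())        # 2.0 bits
max_entropy = math.log2(len(transitions))           # about 2.59 bits for six cells
normalized_entropy = entropy / max_entropy          # about 0.77
```

For this fictitious matrix the entropy is 2.0 bits, or roughly 77% of the maximum that six equally probable transitions would give.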
Calculating the entropy of a transition matrix, Shic, Chawarska, and Scassellati (2008) argued that a high resulting value is aligned with a preference for exploration, while low values indicate data with transitions mainly between a few of the AOIs. Jordan and Slater (2009) interpreted a drop in scanpath entropy over time as an indication that the virtual environment they tested "had cohered into a meaningful perception" (p. 185). They calculated entropy from transition matrices created from 1-second intervals, and collapsed data from all participants to estimate the total change in entropy over time.
10.7.4 Number and proportion of specific subscans
Target question How common is a specific subscan?
Input representation AOI-processed data
Output The number or proportion of each subscan, commonly determined in a histogram
This measure counts the number of unique subscans in string representations of scanpaths. Groner et al. (1984) decided "in a compromise between theoretical considerations and statistical arguments" (p. 529) to analyse subscans with a length of three AOIs. As their face stimuli had seven AOIs (Figure 10.30(b)), they found the total number of possible subscans of length three to be 210 (see Equation 6.1 on page 195). Interestingly, the 20 most common subscans subsumed 57% of the total number of subscans. The two most common subscans moved between the eyes, but there was considerable individual variation (Figure 10.30(a)). Each subscan corresponds to a cell in a 3D transition matrix such as the one on page 194.
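Counting subscans amounts to sliding a window of fixed length over the AOI sequence and tallying each windowed sequence. The sketch below shows one way to do this; the function name and the example AOI sequence are hypothetical, and the AOI numbers merely follow the numbering style of Figure 10.30(b).

```python
from collections import Counter


def subscan_counts(aoi_sequence, length=3):
    """Count every subscan of the given length in an AOI visit sequence."""
    windows = (
        tuple(aoi_sequence[i:i + length])
        for i in range(len(aoi_sequence) - length + 1)
    )
    return Counter(windows)


# Hypothetical sequence of AOI visits (e.g. 2 = left eye, 3 = right eye, 4 = nose).
sequence = [2, 3, 2, 3, 2, 4, 3, 2, 3]
counts = subscan_counts(sequence, length=3)

total = sum(counts.values())
proportions = {subscan: n / total for subscan, n in counts.items()}
most_common = counts.most_common(2)
```

The counts or proportions can then be plotted as a histogram, and the same function works for longer subscans by changing the `length` argument.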
TRIPLETS   SUBJECTS
           1. RU   2. ME   3. BR   4. BK   5. WI   6. TR   Total
2-3-2      7       ?       13      35      0       68      133
3-2-3      20      ?       10      25      1       48      111
7-2-3      13      2       ?       4       0       31      55
2-3-?      3       ?       3       19      0       17      53
4-2-3      10      9       ?       10      3       17      52
5-2-3      14      13      3       ?       3       ?       45
2-3-5      21      1       2       8       2       8       43
3-2-4      ?       7       0       12      3       13      40
?-4-3      3       2       0       0       15      8       28
3-2-5      5       4       3       7       ?       8       28
4-3-2      6       1       0       5       7       8       27
7-?-4      9       2       0       0       8       8       27
4-5-4      1       3       0       4       ?       2       21
2-4-3      4       3       1       4       3       4       19
[A question mark stands for a character that is illegible in the reprinted table.]
Table 1: The 14 most frequent triplets. Numbers indicate frequencies of observation. Underlinings symbolize significantly higher frequency than expected by independence (chi-square test, .01).
(a) Subscan "triplet" frequency table for six participants looking at faces.
[Image: face stimulus with numbered AOIs: 1. Forehead, 2. Left eye, 3. Right eye, 4. Nose, 5. Mouth, 6. Chin, 7. Ears and sides.]
(b) Stimulus and the seven AOIs.
Fig. 10.30 Subscan frequency analysis. Reprinted from Advances in Psychology, Volume 22, Rudolf Groner, Franziska Walder, and Marina Groner, Looking at Faces: Local and Global Aspects of Scanpaths, pp. 523-533, Copyright (1984), with permission from Elsevier.
Again using subscans of length three, Koga and Groner (1989) compared non-native learners of Japanese Kanji sign sequences before and after training using two different presentation modes. Based on the 20 most common subscans, they conclude that subscan frequency is not influenced by the presentation mode.
When analysing eye-movement data from participants picking up and dropping block objects, Ballard el al. (1992) found that the most common subscan sequence consisted of four AOIs. Tasks and participant strategies definitely influence the prevalence of longer subscans.
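Counting subscans amounts to sliding a window over the AOI string and tallying the resulting tuples. The following is a minimal sketch of that idea; the function name `subscans` and the example dwell sequence are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter

def subscans(aoi_sequence, length=3):
    """Return all consecutive subscans (n-grams) of the given length."""
    return [tuple(aoi_sequence[i:i + length])
            for i in range(len(aoi_sequence) - length + 1)]

# Hypothetical dwell sequence over AOIs numbered as in a seven-AOI face stimulus.
sequence = [2, 3, 2, 3, 2, 4, 3, 2, 3, 5]

counts = Counter(subscans(sequence, length=3))
n_unique = len(counts)            # number of unique subscans
n_total = sum(counts.values())    # total number of subscans

for triplet, frequency in counts.most_common():
    print("-".join(map(str, triplet)), frequency)
```

Changing `length` to 4 gives the longer subscans of the kind reported for the block-copying task.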
10.7.5 Unique AOIs
Target question Does the scanpath over AOIs represent a focused or an overview scan?
Input representation AOI strings
Output Proportion of scanpaths in each category
The unique AOI measure counts how many unique AOIs there are in a substring, but is not concerned with the exact order of the AOI visits. For example, the strings ABB and BBA of length three both have two unique AOIs (A and B) but different AOI order. Since the unique AOI measure does not have to consider all possible subscans in a transition matrix, substantially fewer data need to be processed, and longer subscans become easier to investigate. However, the price is that sequential information within the subscan is completely disregarded. The measure resembles the local versus global and ambient versus focal measures on pages 265-267, but differs by using AOIs rather than amplitudes.
(a) Typical stimulus with AOIs.
(b) Histogram of uniqueness values, all participants and the last five AOIs before clicking to answer.
Fig. 10.31 Unique-AOI data. Sequence length l = 5, number of AOIs n = 5. High-ability participants clearly made more pairwise AOI inspections compared to low-ability participants.
The unique AOI measure is calculated by letting a window of length l slide over the recorded AOI sequence, and counting the number of unique AOIs in the window. Suppose for instance that for a hypothetical participant and trial, we record an AOI sequence IAIBACIAIABCDIAIAI. Repetitions are not allowed, and are collapsed to include only a single letter (as in compressed string edit representation). Unique-AOI sequences are then defined by letting a window of size 5 travel along the AOI sequence. First the window will encounter IAIBA, and see that there are three unique AOIs (i.e. I, A, B). Next, we move the window one step, and find AIBAC, which has four unique AOIs. We continue like this, until we have reached IAIAI at the end, which has only two unique AOIs. In total we will have 14 AOI-uniqueness numbers from the recorded sequence of 18 AOIs, namely 3-4-4-4-3-3-3-4-5-5-5-4-3-2. We now count how many unique-2, unique-3, unique-4 and unique-5 there are in this sequence and we find: 1 unique-2, 5 unique-3, 5 unique-4 and 3 unique-5.
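The sliding-window count can be sketched in a few lines. The function names below are our own, and the code assumes that each AOI is coded as a single character; it reproduces the 14 uniqueness values of the worked example.

```python
from collections import Counter

def collapse_repetitions(aoi_string):
    """Collapse immediate repetitions, e.g. 'AAB' -> 'AB'."""
    collapsed = []
    for aoi in aoi_string:
        if not collapsed or collapsed[-1] != aoi:
            collapsed.append(aoi)
    return "".join(collapsed)

def uniqueness_values(aoi_string, window=5):
    """Number of unique AOIs in each window position along the string."""
    s = collapse_repetitions(aoi_string)
    return [len(set(s[i:i + window])) for i in range(len(s) - window + 1)]

values = uniqueness_values("IAIBACIAIABCDIAIAI", window=5)
histogram = Counter(values)

print(values)                     # [3, 4, 4, 4, 3, 3, 3, 4, 5, 5, 5, 4, 3, 2]
print(sorted(histogram.items()))  # [(2, 1), (3, 5), (4, 5), (5, 3)]
```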
Figure 10.31 shows a mathematical problem and the unique-AOIs values resulting from the last five AOIs looked at before participants clicked on the alternative they thought was correct (experiment described on p. 5). During this decision phase, high-ability students (defined as those solving a larger proportion of problems compared to the whole student group) made 25% pairwise comparisons (that is, cases with five consecutive dwells in which only two unique AOIs were visited) and another 43% unique-3 sequences. Low-ability students have a higher tendency to make overview scans.
When the uniqueness values calculated from each window are not collapsed over time, they can be used to study the relation between focus and overview looking over time. Calculations of the chance levels for each uniqueness group give chance probabilities of 0.0156 for unique-2, 0.328 for unique-3, 0.562 for unique-4, and 0.0938 that participants will sequence all five AOIs consecutively. Holmqvist, Andrà, et al. (2011) present calculations of chance probabilities for a whole variety of numbers of AOIs and window sizes, and describe how significance calculations should be carried out.
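The chance levels quoted above can be reproduced by brute-force enumeration. The sketch below rests on an assumption of ours, namely that every window without immediate repetitions is equally likely; under that assumption the enumeration yields the four probabilities given in the text for five AOIs and a window of five. The function name `chance_levels` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def chance_levels(n_aois=5, window=5):
    """Probability of each uniqueness value, assuming all windows
    without immediate repetitions are equally likely."""
    counts = Counter()
    for seq in product(range(n_aois), repeat=window):
        # Keep only sequences in which no AOI directly follows itself.
        if all(a != b for a, b in zip(seq, seq[1:])):
            counts[len(set(seq))] += 1
    total = sum(counts.values())
    return {k: Fraction(c, total) for k, c in sorted(counts.items())}

levels = chance_levels()
for k, p in levels.items():
    print(f"unique-{k}: {p} = {float(p):.4f}")
```

Exhaustive enumeration grows as the number of AOIs raised to the window size, so it is only practical for small designs such as this one.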
10.7.6 Statistical analysis of a transition matrix
Target question Are some transitions significantly more common than others?
Input representation A transition matrix
Output Significant transitions and p-values
In this section, we illustrate the use of log-linear statistical analysis for the analysis of the transition matrix given in Table 6.1 on page 194. The principles behind log-linear analysis are briefly explained on page 90.
Table 10.3 Adjusted residuals for the data in Table 6.1 on page 194. Diagonal cells, marked *, contain no value.

                      LS      LF      RF      RS      I       E       O
Left Side (LS)        *       8.58   -1.93   -2.04   -1.41   -1.90    4.45
Left Front (LF)       5.50    *       0.42   -1.93    3.35   -1.57   -4.39
Right Front (RF)     -1.34    1.67    *       5.92   -1.58   -1.44   -1.88
Right Side (RS)      -1.84   -5.21    9.58    *      -2.07    5.90    1.65
Instruments (I)      -0.60    0.58   -2.53   -1.63    *      -0.28    1.74
Engine (E)           -1.84   -4.89   -2.82   -0.29    0.85    *       8.21
Other (O)            -1.06    1.57   -2.49    0.31   -0.22   -0.25    *
In a two-dimensional table, as in Table 6.1, the first step in the analysis is to exclude the two-way interaction from the model, and to calculate the expected number of transitions based on only the two main effects. Note that the calculation of expected number of transitions in tables that contain structural zeros is realized by means of an iterative procedure, and should be left to a computer. The resulting value of chi-square indicates whether or not the expected numbers of the more parsimonious model are still reasonably close to the observed frequencies. In the case of Table 6.1, they are clearly not, since the chi-square value of 473.411 with 29 degrees of freedom (see footnote 28) is highly significant.
The fact that chi-square is significant indicates that the two factors are dependent on one another. Translated to transition matrices, this means that transitions between certain AOIs were either significantly more or less likely than expected.
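To give a feeling for the iterative procedure, the sketch below fits the main-effects model to a table with structural zeros on the diagonal using iterative proportional fitting. This is one common way of obtaining such expected counts; we do not know which algorithm or software produced the values reported here. The transition counts are hypothetical, the function names are our own, and the Pearson form of chi-square is an assumption.

```python
def quasi_independence_fit(observed, iterations=200):
    """Expected counts under the main-effects model, with diagonal cells
    treated as structural zeros (iterative proportional fitting)."""
    n = len(observed)
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    # Start from 1 in every permitted cell and 0 in every structural zero.
    fitted = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        for i in range(n):  # rescale rows to the observed row totals
            s = sum(fitted[i])
            fitted[i] = [e * row_tot[i] / s for e in fitted[i]]
        for j in range(n):  # rescale columns to the observed column totals
            s = sum(fitted[i][j] for i in range(n))
            for i in range(n):
                fitted[i][j] *= col_tot[j] / s
    return fitted

def pearson_chi_square(observed, expected):
    """Pearson chi-square over all cells that are not structural zeros."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow) if e > 0)

# Hypothetical 4 x 4 transition counts; the diagonal holds structural zeros.
observed = [[0, 30, 5, 10],
            [25, 0, 12, 8],
            [6, 14, 0, 20],
            [9, 7, 22, 0]]

expected = quasi_independence_fit(observed)
chi2 = pearson_chi_square(observed, expected)
df = (4 - 1) * (4 - 1) - 4   # columns-1 times rows-1, minus structural zeros
print(round(chi2, 2), df)
```

The sketch stops at the overall chi-square; adjusted residuals in tables with structural zeros need additional variance terms and are best taken from a statistics package.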
The natural question that follows is, of course, which transition numbers in the matrix deviate significantly from expected. Cells for which the expected number deviates significantly from the observed number may be identified through examination of the adjusted residuals in Table 10.3. Adjusted residuals are indications of the distance between an observed and an expected value, similar to standard scores. Positive values indicate that the observed number of transitions is higher than the expected number, whereas negative values indicate that the observed number is lower than the expected number. Absolute values that are larger than 1.96 are significant with p < 0.05. These are marked in bold in Table 10.3. The table shows, for instance, that the number of transitions from left side to left front was significantly higher than expected, whereas the transitions from left side to other were significantly fewer than expected.
If you have an a-priori interest in one or more transitions, the analysis of adjusted residuals may be pursued further (Bakeman & Gottman, 1997). Suppose that the transitions from left side to left front in cell (1,2) were of particular interest for the study. The effect of these transitions on the overall chi-square can be evaluated by declaring cell (1,2) structurally zero, and then recalculating the expected numbers. In the above example, the exclusion of this cell results in a new value of chi-square of 395.666 with 28 degrees of freedom. This is a reduction compared to the earlier value, but obviously, the new value is still highly significant.
Footnote 28: (Number of columns - 1) × (Number of rows - 1) - Number of structural zeros = degrees of freedom for a two-dimensional transition matrix.
In the case where transitions are being examined within the context of an experimental design, the situation becomes more complicated. A direct comparison of transition numbers across groups is not allowed. Instead, the researcher may choose to use a measure of the strength of association within the table as the dependent variable within the design. One such measure that is suitable for the analysis is the log odds ratio (Bakeman & Gottman, 1997). A restriction of this measure is that it is based on a two-by-two table. Therefore, the researcher either needs to incorporate this in the design of the experiment, or collapse across multiple rows and columns in a larger table. If the cells of a 2 × 2 table are designated as $F_{11}$, $F_{12}$, $F_{21}$, and $F_{22}$ respectively, the log odds ratio can be calculated as

$$\ln\left(\frac{F_{11}\,F_{22}}{F_{12}\,F_{21}}\right)$$
Log odds ratios may be calculated for the individual items and participants in the experiment, and the resulting values are used as the input to a statistical test.
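As a small illustration, the log odds ratio of a collapsed 2 × 2 table can be computed as follows. The cell counts are hypothetical, and the function uses the standard definition of the log odds ratio (natural logarithm of the cross-product ratio) without any correction for empty cells.

```python
import math

def log_odds_ratio(f11, f12, f21, f22):
    """Natural log of the cross-product ratio of a 2 x 2 table.
    Fails if any cell is zero; no continuity correction is applied."""
    return math.log((f11 * f22) / (f12 * f21))

# Hypothetical collapsed table for one participant:
#                  to target   to other
# from target         20           5
# from other          10          10
lor = log_odds_ratio(20, 5, 10, 10)
print(round(lor, 3))  # 1.386
```

A value of 0 means no association between rows and columns; one such value per participant or item can then enter an ordinary statistical test.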
Two other methods have been used for the analysis of transition matrices: Markov modelling and correspondence analysis. Markov models were introduced on page 196. This method has been used within eye-tracking research by Liu (1998) and others. Correspondence analysis has not been applied often for the analysis of eye-tracking data. An exception is found in Loslever, Popieul, and Simon (2007). Correspondence analysis is an exploratory technique that may be used to see which rows or columns in the table are similar. Usually, the degree of similarity is visualized in a so-called biplot that shows which rows or columns are at close distance within a two-dimensional space. For an introduction to correspondence analysis, we refer the reader to Greenacre (2007).
10.8 Scanpath comparison measures
Scanpath comparison measures are used to estimate the similarity between two or more scanpaths. They draw on the methods for scanpath representation, simplification, and sequence alignment that we introduced in Chapter 8. In this section only measures that take into account the ordinal aspect of scanpaths are listed. Position-based measures like attention map difference, Kullback-Leibler distance, and Mannan distance can therefore be found in Chapter 11, which is devoted explicitly to gaze position.
Many of the measures described here perform mainly pairwise comparisons, but there are measures that allow for groupwise comparisons as well. For example Feusner and Lukoff (2008) describe a method for calculating statistical significance between groups of scanpath comparisons, even if the basic comparison measure is only pairwise. Moreover, several group-wise similarity measures have been devised using attention map sequences.
10.8.1 Correlation between sequences
Target question Do participants visit the AOIs of your stimulus in the same way, or in the particular order that you predict?
Input representation AOI strings
Output Correlation value [-1, 1]
The correlation between sequences takes two numerical representations of AOI strings and correlates them. The simplest is to correlate the strings of two participants or conditions. An alternative is to have a predicted order and correlate each participant's string to the string representing the predicted order.
Table 10.4 shows fictitious data from a participant looking at two different designs, one after the other, each having nine AOIs (named 1 to 9). Data in the table only include the first entries into an AOI. Calculating the correlation between the observed and predicted order of the AOI entries then gives us an estimate of the accuracy of the prediction. In the example, Design 1 has a correlation value of 0.95 and Design 2 the value -0.017. This indicates that Design 1 better triggered the predicted eye-movement behaviour from this participant.
Table 10.4 Order of predicted AOIs above, and below the fictitious AOI sequences of one participant looking at two different designs.

Predicted   1  2  3  4  5  6  7  8  9
Design 1    2  1  3  4  6  5  8  7  9
Design 2    8  7  2  3  6  1  5  4  9
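The two correlation values can be checked with a plain Pearson correlation over the rows of Table 10.4. The helper `pearson` below is our own minimal implementation; any statistics package will give the same result.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9]
design_1 = [2, 1, 3, 4, 6, 5, 8, 7, 9]
design_2 = [8, 7, 2, 3, 6, 1, 5, 4, 9]

print(round(pearson(predicted, design_1), 3))  # 0.95
print(round(pearson(predicted, design_2), 3))  # -0.017
```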
Suppes (1990) used this measure to compare a normative model of arithmetic problem solving against eye-tracking data, finding average correlation values ranging from 0.583 to 0.874 for the different participant groups. Holsanova et al. (2008) used the measure to compare a serial versus a radial design of information graphics, finding an average correlation of 0.95 between predicted and actual reading behaviour for the serial information graphics, and no correlation for the radial design.
10.8.2 Attention map sequence similarity
Target question Do individual participants, or groups of participants, sequence
the AOIs of your stimulus in a particular order?
Input representation Raw data samples
Output Similarity value
Attention maps are typically static entities. However, attention maps can evolve over time and become map sequences (Wooding, 2002a). Such sequences can be seen as a collection of scanpaths from one or many participants that preserve ordinal information. As such, an attention map sequence is a three-dimensional scanpath function f(s), as defined on page 253.
If we have a still image as a stimulus, we can generate different attention maps, each for a separate time interval along the trial duration. The resulting map sequence gives insight into how the attention map evolves over time, information which cannot be extracted from the final attention map collapsed over the entire trial. In fact, initial attention and overall attention can differ significantly in distribution, as Figure 10.32 shows (from Nyström, 2008).
Since map sequences can be generated from single and multiple scanpaths, both pairwise and groupwise scanpath comparisons are possible. Nyström et al. (2004) used the harmonic Kullback-Leibler distance (p. 376) to compare map sequences generated from two videos: an original video and a foveated, compressed version of it. Comparing the eye movements predicted by their well-known computational model of saliency to the real eye movements of human observers, Itti (2004) devised a metric that evaluated each recorded data sample on the saliency map, and then calculated the resulting average value over all data samples and frames. Itti found the saliency model to predict gaze locations better than chance. Grindinger, Duchowski, and Sawyer (2010) proposed a similar groupwise measure where fixation samples from all viewers in one group were compared to an attention map built from all viewers in another group. To validate their approach, a classifier was used to test whether experts' scanpaths were more similar to other experts' scanpaths than they were to novices', and vice versa. Grindinger et al. (2010) used still images as stimuli, and reported that their method achieved higher classification accuracy than did the string edit distance (adapted for groupwise comparison).
Given two map sequences, any of the attention map based measures in Chapter 11 can be used to compare the maps at a certain time instant (or ordinal position). Then all similarity values can be added to find the overall scanpath similarity. This approach can be generalized to all point-based measures that compare two or more sets of data samples over time.
Fig. 10.32 The heat maps visualize how recorded data sample positions from seven viewers are distributed over different time intervals when viewing three different versions of an image. The second column shows where attention is located between 300-350 ms, the third column where attention is located between 600-650 ms, and the fourth column shows the overall attention. The photo is reproduced here with permission from the Signal Compression Lab, University of California Santa Barbara.
(a) Scanpaths represented by strings A6 C5 F0 I1 J1 K2 I3 and A6 A0 F0 I1 J1 L6 I3 have a string edit distance of 2.
(b) Scanpaths represented by strings A6 C5 F0 I1 J1 K2 I3 and A6 C6 F1 I2 K1 J2 J3 have a string edit distance of 6.
Fig. 10.33 Examples showing a major weakness of the string edit measure: the sensitivity to AOI borders. Grid columns are labelled A to L in both panels.
10.8.3 The string edit distance
Target question What is the AOI sequence similarity between two scanpaths?
Input representation AOI strings of dwells
Output String edit distance (string symbols)
Comparing two scanpaths and giving a distance value for them, the string edit measure (known as the 'Levenshtein distance' in computer science after originator Levenshtein, 1966) assumes that both scanpaths are represented with an AOI string of dwells. Both gridded and semantic AOIs are being used with the string edit distance measure (see p. 206 for the distinction between these AOI types).
Distance is calculated as the minimum number of insert, delete, and substitute operations needed to transform one string into the other. A smaller distance means that fewer transformations have to be made, and therefore the scanpaths should be more similar. For instance, if we have two scanpaths
S1: A6 C5 F0 I1 J1 K2 I3
S2: A6 A0 F0 I1 J1 L6 I3
we need to substitute A0 with C5 and L6 with K2. This gives us a total editing distance of two (2) for the comparison of S1 against S2. If instead we compare S1 to a very different string S3,
S1: A6 C5 F0 I1 J1 K2 I3
S3: A6 C6 F1 I2 K1 J2 J3
we need to substitute the last six AOIs in one string, giving an edit distance of six (6). This pair of scanpaths is thus three times more dissimilar than the first pair.
In order to compare two scanpaths of lengths m and n, the string edit distance d is often normalized against the maximum string length, which equals the largest possible edit distance:

$$\bar{d} = 1 - \frac{d}{\max(m, n)} \qquad (10.20)$$

Thus, $\bar{d}$ varies from 0 to 1, where 1 signifies two identical strings. In our example, we would have a $\bar{d}$ of 0.71 for the more similar pair (1 - 2/7), and 0.14 for the less similar pair (1 - 6/7).
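The distance and its normalization can be sketched with the textbook dynamic-programming formulation of the Levenshtein distance, applied here to lists of AOI labels rather than to single characters. The function names are our own; the three strings are those of the worked example.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence a into sequence b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]

def normalized_similarity(a, b):
    """One minus the edit distance divided by the longer sequence length.
    Assumes at least one of the sequences is non-empty."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))

s1 = "A6 C5 F0 I1 J1 K2 I3".split()
s2 = "A6 A0 F0 I1 J1 L6 I3".split()
s3 = "A6 C6 F1 I2 K1 J2 J3".split()

print(edit_distance(s1, s2), round(normalized_similarity(s1, s2), 2))  # 2 0.71
print(edit_distance(s1, s3), round(normalized_similarity(s1, s3), 2))  # 6 0.14
```

Splitting the strings into label lists matters: comparing the raw character strings would count the letter and the digit of each AOI label separately.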
The string edit measure is undoubtedly the most employed measure for pairwise scanpath similarity. In the literature, it has been used to study:
• Scene perception versus subsequent imagery: For instance, Brandt and Stark (1997) and Gbadamosi and Zangemeister (2001) both compare imagery theories.
• First viewing versus a second viewing: Holm (2007) studied the role of expectations on the perception of an upcoming stimulus.
• Website scanning: Josephson and Holmes (2002) tested the scanpath theory with repeated viewings of the same web content. Also see Pan et al. (2004), who investigated behaviour when viewing webpages using the string edit method.
• The validity of different theories about scanpaths: Privitera and Stark (2000) and Foulsham and Underwood (2008) compared the fixation sequences of human viewers of a scene versus the scanpaths predicted by saliency-based models of eye movements. Hacisalihzade, Stark, and Allen (2002) investigated the relative role of deterministic and probabilistic influences on scanpaths, and the originators of this research, Choi et al. (1995), directly questioned the "usefulness of using string editing algorithms as a tool to quantify the similarity of fixation sequences for human participants searching a quasi-natural stereoscopic three-dimensional environment".
Moreover, the string edit algorithm is implemented in the publicly available software programmes of West et al. (2006) and Myers and Schoelles (2005).
Privitera and Stark (2000) denoted the edit distance 'sequential similarity' and complemented it with a 'locus similarity' measuring the proportion of letters in one string that were present in the other string, regardless of order. To represent the large number of similarity scores within and between participants and stimuli, they proposed what they call parsing diagrams, where only the average score for each combination was given. Comparing computer generated and human fixations, Privitera and Stark (2000) conclude that the algorithms they test do a poor job at predicting human fixation sequences.
(a) Pairs of vectors from scanpath (a) in Figure 10.33.
(b) Pairs of vectors from scanpath (b) in Figure 10.33.
Fig. 10.34 The length of the differential vector in each pair equals the distance between the two arrow tips.
String editing is not limited to data representations based on two-dimensional AOIs, but can be extended to handle vectors. The vector string is calculated based on the amplitude and direction of the scanpath saccades, as described on page 269. In short, each vector is represented by two values: one for amplitude and one for direction. Using the same example as in Figure 10.33, the three vector strings will be:
• Left side thin scanpath: 23 1B 46 34 83 92
• Dashed thick scanpath: 24 1B 47 22 83 A3
• Right side thin scanpath: 0D 3A 46 52 7A E9
Since the example uses hexadecimal values, only 16 possible amplitudes/directions can be represented.
As in standard AOI-string editing, two strings are compared using editing operations, and the result is an editing distance. Since the strings represent real vectors, Gbadamosi (2000) added geometrical costs to the comparisons:
• Insertion or deletion equalled the amplitude of the inserted or deleted vector.
• Substitution of vector u by vector v added a cost equal to the length of the differential vector ||u - v||.
In our example, there are no insertions or deletions but only substitution in the minimal edit distance, so the cost consists of a sum of the differential vectors. Figure 10.34 shows the pairs of vectors. For each pair, the differential vector equals the Euclidean distance between the tips of the arrows. For the pairs in (a), the tip distances are 7.95, 5.83, 12.14, 20.37, 20.6, 21.09, which gives a total editing cost (i.e. similarity) of 87.98. In the (b) pairs, the tip distances are 116.05, 106.16, 11.09, 26.08, 74.10, 97.33, with a total editing cost of 430.81.
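A geometrically weighted edit cost of this kind can be sketched by swapping the unit costs of ordinary string editing for vector lengths. The code below is our own illustration of the two cost rules and is not Gbadamosi's implementation. It assumes that each saccade is given as a Cartesian (dx, dy) pair rather than as quantized amplitude and direction codes, and the example vectors are hypothetical.

```python
from math import hypot

def length(v):
    """Euclidean length (amplitude) of a 2D vector."""
    return hypot(v[0], v[1])

def tip_distance(u, v):
    """Length of the differential vector u - v."""
    return hypot(u[0] - v[0], u[1] - v[1])

def vector_edit_cost(p, q):
    """Minimal total cost of editing vector sequence p into q, where
    inserting or deleting a vector costs its amplitude and substituting
    u by v costs the length of the differential vector."""
    prev = [0.0]
    for v in q:
        prev.append(prev[-1] + length(v))         # insert all of q
    for u in p:
        curr = [prev[0] + length(u)]              # delete all of p so far
        for j, v in enumerate(q, start=1):
            curr.append(min(prev[j] + length(u),              # delete u
                            curr[j - 1] + length(v),          # insert v
                            prev[j - 1] + tip_distance(u, v)))  # substitute
        prev = curr
    return prev[-1]

# Hypothetical saccade vectors (dx, dy) for two short scanpaths.
p = [(3, 4), (10, 0)]
q = [(3, 4), (10, 3)]
print(vector_edit_cost(p, q))   # 3.0
print(vector_edit_cost(p, []))  # 15.0
```

When both sequences have the same number of vectors and are similar in shape, the minimal cost reduces to the sum of tip distances, as in the worked example.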
Gbadamosi (2000) suggests normalizing all distances (d) against the maximum d_max (in our case 430.81) to yield a value span of between -1 and 1, giving the normalized edit cost
Vector string editing handles scanpath shape much better than the classical AOI string editing. The major weakness of the vector string editing method is that it completely ignores spatial position. For example, two scanpaths which produce the same vectors (or angles) when connecting fixations will give identical similarity scores when using vector string editing, even if the overall scanpath is focused in a different area, if it is spatially scaled, or if the amplitudes between saccades in the sequence are completely different (see Figure 10.35). Vector string editing per se therefore needs to be complemented with a position comparison method.
(10.21)
Fig. 10.35 Pairs of scanpaths where the first fixation in each begins in the top left. Vector-based string editing omits any comparison of position: each scanpath (thick line) and its pair (dashed line) are identical in terms of vectors, but completely different in terms of overall spatial position (left panel), scaling (middle panel), and amplitude between fixations (right panel).
Design issues with the string edit measure
Originally, the string edit distance algorithm by Levenshtein (1966) was developed as a string comparison method, and it is extensively used on one-dimensional strings in computer communication theory and genetic research. The method was imported into the eye-tracking world around 1990 by Lawrence Stark and colleagues (compare Choi et al., 1995). Was the import successful? Does the string edit measure correspond to our subjective feeling of similarity? Figure 10.33 shows the scanpaths represented by the strings we used earlier. As we can see, the pair of scanpaths found to have distance 2 appear much less similar than the pair with distance 6.
It is easy to create other examples showing non-intuitive results of this measure. One fundamental weakness of the measure is that it was originally designed for single-dimensional strings and not for a two-dimensional space with its built-in Euclidean distance. Therefore two AOIs at a far distance are considered equally similar to two AOIs in immediate proximity. That is the reason why we constructed the example in Figure 10.33(a) with the A0 and L6 fixations in it. In Figure 10.33(b), we placed fixations close to each other, but on either side of a border, to make two very similar scanpaths appear very dissimilar in the string edit measure. In Figure 10.33(a), we placed fixations as far away as possible from each other, yet within the same AOI, to make two dissimilar scanpaths appear similar to the measure.
The weaknesses illustrated in Figure 10.33 are not uncommon, as the probability for two fixations to be located on either side of a border is not very low. Noise and poor precision increase the danger that two fixations that should be at the same position are in different AOIs. However, over many scanpath comparisons, the effect can be expected to be somewhat milder. Also, some users of the string edit measure, for instance Foulsham (2008), ran the same similarity calculations over several different grid sizes (Figure 10.36), to reduce the arbitrariness of the division of space into gridded AOIs.
If semantic AOIs with very varying sizes are used, the intuitiveness of the string edit measure is even lower. Figure 10.37(a) shows two widely different scanpaths that would be classified as identical, even though this is clearly not the case.
Apart from the size of AOIs, the semantic content of the AOI matters. If each AOI is semantically homogeneous, as in Figure 10.37(b), where every AOI has only one semantic object and it does not matter where we look, then semantic AOIs give more intuitive results with the string edit measure than if AOIs contain many different semantic elements. Also, the distance between the AOIs should not be of the same kind as the distance within the AOIs. In Figure 10.37(b), there is no meaningful semantic or spatial distance in the stimulus between the bottle and the glasses that can be expressed in terms of the height of the bottle or the diameter of the glasses. The lack of such between-AOI distances eliminates the possibility that AOIs which seem distant to us are treated by the string edit measure as equally close together as two AOIs which are in fact separated by only a small distance.
Fig. 10.36 The normalized Levenshtein string-edit distance for two random scanpaths as a function of size of the gridded AOIs. Grid dimensions vary from 1 × 1 (the stimulus is a single region where all fixations are evaluated as equal) to 10 × 10 (100 regions). The distance expected by chance increases as a finer grid is used. With kind permission from Springer Science+Business Media: HCI and usability for education and work, Knowledge-Based Patterns of Remembering: Eye Movement Scanpaths Reflect Domain Experience, Lecture Notes in Computer Science 5298, 2008, Andreas Holzinger, Figure 3.
(a) Semantic AOIs and two scanpaths (big arrowheads and broken line, versus small arrowheads and full line) that are considered identical by the string edit measure.
(b) The string edit measure best suits clearly delimited AOIs with a good distance between them, a generous margin and a single semantic unit inside each AOI.
Fig. 10.37 Poorer and better AOIs when using the string edit measure for scanpath similarity.
The third important AOI design issue is the proximity of the AOIs. Ideally, when the string edit measure is employed, AOIs should be clearly spatially separated, and have large enough safety margins, as in Figure 10.37(b), to reduce the danger that small variations in gaze position cause large differences in string edit distance.
10.8.4 Refined AOI sequence alignment measures
Target question How similarly have two scanpaths sequenced the AOIs?
Input representation AOI strings
Output Similarity score
The limitations of the string edit algorithm call for more flexibility in adapting semantic and spatial relationships between AOIs, as well as cost parameters to match the specific question under investigation. To some extent the string edit algorithm can be extended by assigning different costs to operations. For example, Hacisalihzade et al. (2002) empirically found that 1 for substitution, 2 for insertion, and 3 for deletion were relevant costs with regard to their experimental questions.
In recent years, more advanced implementations of sequence alignment algorithms have been used for scanpath analysis; in particular the Needleman-Wunsch algorithm (Needleman & Wunsch, 1970) and the Clustal family of algorithms (Chenna et al., 2003). These algorithms use a comparison matrix combined with gap penalties to dynamically find the optimal alignment between two sequences (p. 274).
The Needleman-Wunsch algorithm is at the heart of the publicly available software packages ScanMatch (Cristino et al., 2010) and eyePatterns (West et al., 2006), and has been used to compare scanpaths when studying decision strategies (Day, 2010). Two implementations of the Clustal software, ClustalG (Wilson, Harvey, & Thompson, 1999) and ClustalW (Thompson, Higgins, & Gibson, 1994), have been used by Fabrikant, Rebich-Hespanha, Andrienko, Andrienko, and Montello (2008) "to systematically compare and summarise individual inference making histories collected through eye-movement analysis", and by Turano, Geruschat, and Baker (2002) and Turano, Geruschat, and Baker (2003) to quantify scanpath similarity during walking.
'ScanMatch' by Cristino et al. (2010) compares AOI strings in which duration has been taken into account by repeating letters in proportion to their fixation durations. It also allows AOIs to be labelled with two letters, allowing more AOIs than the 26 of the English alphabet. AOI strings are aligned with the Needleman-Wunsch algorithm, which appears earlier to have been used for scanpath alignment by West et al. (2006) in their 'eyePatterns' software. Given a comparison matrix M and a gap penalty, the ScanMatch algorithm computes a similarity score (d) by summing values along the optimal path through the matrix. Each element in the comparison matrix represents a relationship between two AOIs, for example how far apart they are in space, or their semantic relationship.
To normalize for sequence length ($l_s$), the final similarity score is computed as

$$\bar{d} = \frac{d}{\max(M)\,\max(l_{s_1}, l_{s_2})} \qquad (10.22)$$

giving the best possible match the value 1.
Evaluating the algorithm in three experiments, Cristino et al. (2010) found ScanMatch to be more robust than the string edit distance in correctly classifying similar scanpaths, as well as identifying scanpaths recorded during a specific task.
Also using the Needleman-Wunsch algorithm, Day (2010) offers a slightly different way to compute similarity by dividing the number of identical letters $N_i$ in the aligned sequences by the sequence length $l_s$:

$$d' = \frac{N_i}{l_s} \qquad (10.23)$$
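Both normalizations build on the same global alignment. The sketch below is a textbook Needleman-Wunsch implementation and is not the code of ScanMatch or eyePatterns. The scoring function (2 for a match, -1 for a mismatch), the gap penalty of -1, and the two AOI strings are hypothetical, and taking the aligned length as the sequence length in the second measure is our assumption.

```python
def needleman_wunsch(a, b, score, gap):
    """Global alignment; returns (optimal score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    al_a, al_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            al_a.append(a[i - 1]); al_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            al_a.append(a[i - 1]); al_b.append("-"); i -= 1
        else:
            al_a.append("-"); al_b.append(b[j - 1]); j -= 1
    return F[n][m], al_a[::-1], al_b[::-1]

def simple_score(x, y):
    """Hypothetical comparison matrix: 2 for identical AOIs, -1 otherwise."""
    return 2 if x == y else -1

s1, s2 = "AABCD", "AACD"   # repeated letters stand for longer dwells
d, al1, al2 = needleman_wunsch(s1, s2, simple_score, gap=-1)

# Score divided by the largest matrix value times the longer sequence length.
max_m = 2
d_norm = d / (max_m * max(len(s1), len(s2)))

# Share of identical letters in the aligned sequences.
n_identical = sum(x == y for x, y in zip(al1, al2))
d_day = n_identical / len(al1)

print("".join(al1), "".join(al2), d, d_norm, d_day)
```

In practice the comparison matrix would encode spatial or semantic relations between AOIs instead of a plain match or mismatch score.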
10.8.5 Vector sequence alignment
Target question How similarly have two scanpaths sequenced through space?
Input representation Saccade vectors, fixations, and input parameters
Output A suite of similarity scores [0, 1]
Regardless of the method used to calculate similarity between AOI strings, string edit principles are inevitably limited by the fact that stimulus space is carved up, which only crudely represents the fullness of scanpaths. An alternative to AOI strings is to represent scanpaths with Euclidean vectors.
Fig. 10.38 Two example scanpaths where one is spatially shifted in relation to the other. Circle diameter represents fixation duration.
A scanpath can be seen as a collection of vectors, and hence as comprising a vector space. This way of representing scanpaths was adopted by Jarodzka, Nyström, and Holmqvist (2010). Prior to comparison, the authors simplify scanpaths using thresholds for saccade direction and amplitude such that subsequent saccade vectors that go in a similar direction or have very short amplitudes are merged into larger vectors.
On the simplified representations, they propose the following sequential steps to compare two scanpaths:
1. Create a comparison matrix where each value corresponds to the similarity between two vectors in the scanpaths.
2. Create a graph (as in graph theory) from a set of rules defining how the matrix elements are connected. Assign each node in the graph a weight proportional to the similarity value in the comparison matrix.
3. Calculate the shortest path from the top left element in the matrix to the bottom right element using Dijkstra's algorithm.
4. Align scanpaths along the shortest path such that each vector in one scanpath is matched with a vector in the other scanpath.
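The four steps can be sketched as follows. This is an illustration under our own assumptions and is not the authors' implementation: saccades are given as Cartesian (dx, dy) vectors, the comparison matrix holds the length of the vector difference, and the connection rule allows moves to the right, down, and diagonally. The example scanpaths are hypothetical.

```python
import heapq
from math import hypot

def align_vectors(p, q):
    """Align two saccade-vector sequences along the cheapest path through
    their comparison matrix; returns (total cost, list of (i, j) pairs)."""
    n, m = len(p), len(q)
    # Step 1: comparison matrix of vector-difference lengths.
    cost = [[hypot(u[0] - v[0], u[1] - v[1]) for v in q] for u in p]
    # Steps 2-3: Dijkstra's algorithm from the top left to the bottom right
    # element; node weights are the matrix values (assumed rule set).
    dist = {(0, 0): cost[0][0]}
    parent = {}
    heap = [(cost[0][0], (0, 0))]
    while heap:
        d, (i, j) = heapq.heappop(heap)
        if (i, j) == (n - 1, m - 1):
            break
        if d > dist[(i, j)]:
            continue  # stale heap entry
        for di, dj in ((0, 1), (1, 0), (1, 1)):
            ni, nj = i + di, j + dj
            if ni < n and nj < m:
                nd = d + cost[ni][nj]
                if nd < dist.get((ni, nj), float("inf")):
                    dist[(ni, nj)] = nd
                    parent[(ni, nj)] = (i, j)
                    heapq.heappush(heap, (nd, (ni, nj)))
    # Step 4: walk back along the path to read off the matched vector pairs.
    path, node = [], (n - 1, m - 1)
    while node != (0, 0):
        path.append(node)
        node = parent[node]
    path.append((0, 0))
    return dist[(n - 1, m - 1)], path[::-1]

# Hypothetical scanpaths: the second splits the middle saccade in two.
p = [(10, 0), (0, 10), (-10, 0)]
q = [(10, 0), (0, 5), (0, 5), (-10, 0)]
total, pairs = align_vectors(p, q)
print(total, pairs)  # 10.0 [(0, 0), (1, 1), (1, 2), (2, 3)]
```

Each (i, j) pair matches vector i of the first scanpath with vector j of the second, so one vector may be matched with several vectors of the other scanpath.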
Then similarity is calculated on the aligned scanpaths with respect to five different aspects of the scanpaths: shape (length of vector difference), difference in amplitude between vectors, distance between fixation positions associated with saccades (equals the starting position for the saccade vector), difference in direction between vectors, difference in duration between fixations. Each measure is normalized with its largest possible value, to obtain values on the interval [0, 1]; 0 represents the best possible match and 1 the opposite.
Calculating similarity in several ways, the vector based algorithm has the potential to capture aspects of similarity that would otherwise be disregarded or hidden in the overall score. Figure 10.38 shows an example of two otherwise identical scanpaths, where one has been shifted in relation to the other. Given a 5 x 5 gridded AOI division of space as indicated in the figure, the string edit distance is 1 since no fixations share the same AOI. However, calculating similarity according to the vector-based algorithm gives no difference in amplitude, direction, and shape, but clearly points out the difference in fixation position (0.15) and duration (0.32) (Jarodzka, Nyström, & Holmqvist, 2010).