Statistical Analysis of Shape and Image
Multivariate Statistical Methods
Distance-based PCA / Classical PCA

Stanislav Katina
Institute of Mathematics and Statistics, Faculty of Science, Masaryk University, Brno
Lectures on shape analysis [block 2], winter semester 2011
[Funding logos: European Social Fund in the Czech Republic; European Union; Ministry of Education, Youth and Sports; OP Education for Competitiveness. Investment in the development of education.]

Distance-based PCA / Classical PCA
Definition (Distance-based PCA)
Principal Component Analysis (PCA) finds a set of standardized linear combinations, called principal components (PCs), which are orthogonal and, taken together, explain all the variance of the random vector $X_{k \times 1} = (X_1, \ldots, X_k)^T$ with $E(X) = \mu_X$ and $Var(X) = \Sigma_X$, where $\mu_X$ is a $k$-vector and $\Sigma_X$ is a $k \times k$ matrix. Let $X_i$, $i = 1, 2, \ldots, n$, be a random sample of $k$-vectors (the rows of $X_{n \times k}$), where $k < n - 1$. Then the principal component transformation is defined as
$X_{n \times k} \to Y_{n \times k} = (X_{n \times k} - 1_n \mu_X^T)\,\Gamma_{k \times k}$,
where $\Gamma$ is orthogonal, $\Gamma^T \Sigma_X \Gamma = \Lambda = diag(\lambda_1, \ldots, \lambda_k)$, $\lambda_1 \geq \ldots \geq \lambda_k \geq 0$; $\lambda_j$, $j = 1, 2, \ldots, k$, are the eigenvalues of $\Sigma_X$ and $\gamma_j$ (the $j$th column of $\Gamma$) are the eigenvectors of $\Sigma_X$.
The $j$th PC of $X_{n \times k}$ is defined as the $j$th column of $Y_{n \times k}$ by the equation $Y_j = (X_{n \times k} - 1_n \mu_X^T)\,\gamma_j$, where $\gamma_j$ is the $j$th column of $\Gamma$ and is called the $j$th vector of PC loadings, and $R_{ij} = Y_{ij}$, $i = 1, 2, \ldots, n$, are the PC scores of the $i$th individual ($Y_{ij}$ is the $i$th element of the $n$-vector $Y_j$).

Definition (Distance-based PCA; cont.)
The SVD of the covariance matrix $\Sigma_X$ is defined as
$\Sigma_X = \Gamma \Lambda \Gamma^T = \sum_{j=1}^{k} \lambda_j \gamma_j \gamma_j^T$.
Let $H = I - \frac{1}{n} 1 1^T$ be the centring matrix; then $\Sigma_Y$ can be written as
$\Sigma_Y = \frac{1}{n} Y^T H Y = \frac{1}{n} \Gamma^T (X - 1\mu_X^T)^T H (X - 1\mu_X^T) \Gamma = \frac{1}{n} \Gamma^T X^T H X \Gamma = \Gamma^T \Sigma_X \Gamma$.
If $X = (X_1, \ldots, X_k)^T \sim N_k(\mu_X, \Sigma_X)$, then
1. $E(Y_j) = 0$ and $Var(Y_j) = \gamma_j^T \Sigma_X \gamma_j = \lambda_j$
2. the covariance of the transformed variables is $Cov(Y_i, Y_j) = \gamma_i^T \Sigma_X \gamma_j = \lambda_j \gamma_i^T \gamma_j = 0$, $i \neq j$; $Var(Y_1) \geq Var(Y_2) \geq \ldots \geq Var(Y_k)$; $\Sigma_X \gamma_j = \lambda_j \gamma_j$
3. the covariance of the original and transformed variables is $Cov(X_i, Y_j) = \gamma_{ij} \lambda_j$
4. the correlation coefficient is $\rho(X_i, Y_j) = \gamma_{ij} \sqrt{\lambda_j / (\Sigma_X)_{ii}}$; $i, j = 1, 2, \ldots, k$

Definition (Distance-based PCA; cont.)
The total variance is equal to
$tr(\Sigma_X) = tr(\Gamma \Lambda \Gamma^T) = tr(\Lambda) = \sum_{j=1}^{k} \lambda_j$
and the generalized variance is
$det(\Sigma_X) = \prod_{j=1}^{k} \lambda_j$.

Definition (Distance-based PCA; cont.)
If $\hat{\mu}_X = \bar{x}$ and $\hat{\Sigma}_X = S_X = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T = \frac{1}{n} X^T H X$, then the sample PCs are defined as
$Y_{n \times k} = (X_{n \times k} - 1_n \bar{x}^T)\,\hat{\Gamma}$, $S_X = \hat{\Gamma} \hat{\Lambda} \hat{\Gamma}^T = \sum_{j=1}^{k} \hat{\lambda}_j \hat{\gamma}_j \hat{\gamma}_j^T$, $\hat{\Sigma}_Y = \hat{\Gamma}^T S_X \hat{\Gamma}$.
If $x = (x_1, \ldots, x_k)^T \sim N_k(\mu_X, \Sigma_X)$, then
1. $\bar{y}_j = 0$ and $\widehat{Var}(y_j) = \hat{\gamma}_j^T S_X \hat{\gamma}_j = \hat{\lambda}_j$
2. the sample covariance of the transformed variables is $\widehat{Cov}(y_i, y_j) = \hat{\gamma}_i^T S_X \hat{\gamma}_j = \hat{\lambda}_j \hat{\gamma}_i^T \hat{\gamma}_j = 0$, $i \neq j$; $\widehat{Var}(y_1) \geq \widehat{Var}(y_2) \geq \ldots \geq \widehat{Var}(y_k)$; $S_X \hat{\gamma}_j = \hat{\lambda}_j \hat{\gamma}_j$
3. the sample covariance of the original and transformed variables is $\widehat{Cov}(x_i, y_j) = \hat{\gamma}_{ij} \hat{\lambda}_j$
4. the sample correlation coefficient is $\hat{\rho}(x_i, y_j) = r(x_i, y_j) = \hat{\gamma}_{ij} \sqrt{\hat{\lambda}_j / (S_X)_{ii}}$; $i, j = 1, 2, \ldots, k$

Spatial PCA / PCA for EEG data
Definition (Spatial PCA, cont.)
• let $\hat{\Sigma}_s = (B_e^-)^{\alpha/2}\, \hat{\Sigma}\, (B_e^-)^{\alpha/2}$ be the sample covariance matrix of $(B_e^-)^{\alpha/2} y_{i0}$, i.e. the generalized sample covariance matrix of $y_{i0}$
• the non-zero eigenvalues of $\hat{\Sigma}_s$ are $l_j$ with corresponding eigenvectors $g_j$ (PC loadings)
• the Moore-Penrose generalized inverse of $B_e^{\alpha/2}$ is $(B_e^-)^{\alpha/2} = \sum_j \lambda_j^{-\alpha/2} \gamma_j \gamma_j^T$
• the PC scores are $r_{ij} = g_j^T (B_e^-)^{\alpha/2} y_{i0}$; $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, k$

Spatial PCA / PCA for EEG data (Katina 2011)
Definition (Spatial PCA)
• let $y_i$ represent a $k$-vector of EEG responses for individual $i$, $i = 1, 2, \ldots, n$, measured at $k$ sensor locations on the human head in $R^3$ (projected to $R^2$, in our case)
• in general, these sensor locations might be different for each individual, but here we consider their $x^{(1)}$- and $x^{(2)}$-coordinates to be the same, forming a $k \times 2$ matrix $X$
• with respect to $X$, $y_{ij}$ are the $y$-coordinates of the surface at $(x_j^{(1)}, x_j^{(2)})$, $j = 1, 2, \ldots, k$.
• let $\bar{y}$ be the mean response
• spatial PCA is a generalized PCA, where PCs are calculated with respect to the bending energy matrix $B_e$ or its inverse
• consider a random sample of $n$ surface values (here EEG/ERP values) $y_i = (y_{i1}, y_{i2}, \ldots, y_{ik})^T$, $i = 1, 2, \ldots, n$
• the bending energy matrix $B_e$ is calculated for the mean position of the electrodes $\bar{X}$ (here the fixed position $X$ of the electrodes on the head)
• let $\hat{\Sigma} = \frac{1}{n} Y_0^T Y_0$ be the $k \times k$ sample covariance matrix, where the $i$th row of $Y_0$ is equal to $y_{i0} = y_i - \bar{y}$

Spatial PCA / PCA for EEG data
Definition (Spatial PCA, cont.)
• PCs and PC scores are useful tools for describing the non-affine surface variation
• in particular, the effect of the $j$th PC can be viewed by plotting
$y(c_j, \alpha) = \bar{y} \pm c_j B_e^{\alpha/2} g_j l_j^{1/2}$, $r_j = c_j l_j^{1/2}$,
for various values of $r_j \in (0, \max(|r_j|))$ (or a reasonable magnification of $\max(|r_j|)$; alternatively, fixing $c_j = 1$, a magnification of $l_j^{1/2}$, the standard deviation of the $PC_j$ scores), where $B_e^{\alpha/2} = \sum_j \lambda_j^{\alpha/2} \gamma_j \gamma_j^T$
• to emphasize large scale variability (global bending), $\alpha = 1$
• for small scale variability (local bending), $\alpha = -1$
• if $\alpha = 0$, then we take $B_e^0 = I$ as the $k \times k$ identity matrix and the procedure is exactly the same as classical PCA
• visualization of the effect of each PC: a grid of gray-scale rectangles with colors corresponding to the surface values, with superimposed contours built from a TPS, where the fixed positions of the electrodes were re-sampled in the convex hull of the data space

Definition (Spatial PCA, cont.)
• a PC summary of the surface data, $y_i = \bar{y} + B_e^{\alpha/2} \sum_j r_{ij} g_j$, $i = 1, \ldots, n$, can be written for a subset of PCs as
$y_i^{PC(j_1, \ldots, j_q)} = \bar{y} + B_e^{\alpha/2} \sum_{j \in \{j_1, \ldots, j_q\}} r_{ij} g_j$, $i = 1, \ldots, n$,
and then $Y_{PC(j_1, \ldots, j_q)}$ is the matrix of the $y_i^{PC(j_1, \ldots, j_q)}$

Definition (Spatial PCA, cont.)
Consider the null model $y = \mu + \varepsilon$, where $y = (y_1, y_2, \ldots, y_k)^T$, $\varepsilon \sim N_k(0, \Sigma_s)$. A special case of this model can be written as
$y = \mu + B_e^{\alpha/2} \sum_{j=1}^{q} c_j g_j l_j^{1/2} + \varepsilon$,
where $c_j \sim N(0, 1)$ and $\varepsilon \sim N_k(0, \sigma^2 I_{k \times k})$ independently, $g_j^T g_j = 1$ and $g_i^T g_j = 0$ $(i \neq j)$, together with one further orthogonality constraint on $g_j$ whose left factor is illegible in the source. Then [...]. Finally, $\Sigma_s$ can be estimated by $\hat{\Sigma}_s$ and $\hat{\sigma}^2 = \sum_{j = q+1} l_j$.

Definition (Spatial PCA, cont.)
• affine contribution to the variability: an affine subspace PCA on the $n \times k$ matrix $Y_A$ with the rows $y_{A,i}$, $i = 1, 2, \ldots, n$
• non-affine contribution to the variability: a non-affine subspace PCA on the $n \times k$ matrix $Y_{NA}$ with the rows $y_{NA,i}$, $i = 1, 2, \ldots, n$
• in the affine subspace, $\hat{\Sigma}_A$ stands for the sample covariance matrix of $y_{A,i}$, and spatial PCA is calculated with respect to the bending energy matrix $B_e^0 = I$
• in the non-affine subspace, we have $(B_e^-)^{\alpha/2}\, \hat{\Sigma}_{NA}\, (B_e^-)^{\alpha/2}$, because pure bending is independent of the affine component
• to find the affine component we use the linear regression model (LRM) $y_i = \bar{y} \beta_i + \varepsilon_i$, where $\hat{\beta}_i = (\bar{y}^T \bar{y})^{-1} \bar{y}^T y_i$ and $y_i$, $i = 1, 2, \ldots, n$, are the rows of the $n \times k$ matrix $Y$; then $y_{A,i} = \bar{y} \hat{\beta}_i$ is the affine component; finally, the non-affine component (residuals of the LRM) is $y_{NA,i} = y_i - y_{A,i}$

Spatial PCA / PCA for EEG data
[Figure: electrode layout with labels FP1, FP2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2 and their numeric indices 1-19]
Figure: The 10-20 system of EEG electrode positions with k = 19 electrodes

Figure: TPS grid of coloured rectangles with colours corresponding to the smoothed surface values, with superimposed contours (using the optimal $\lambda$ computed by GCV)

Figure: Spatial PCA. PCA of local bending patterns (outlier Nr. 5; upper left), classical PCA (outliers Nr.
12 and 14; bottom left), global bending patterns (upper right), and PCA in the affine subspace (outliers Nr. 5, 12, and 14; bottom right)

[Figure axis: bending energies (penalties)]
Figure: Histogram, boxplot, and quantile plot of penalties (bending energies; outliers Nr. 14 and 5)

Figure: Iterative process of outlier detection and relaxation in the subspace of the first two PCs of local bending patterns with 'curves décolletage': first PCA (outlier Nr. 5; upper left), second PCA (outliers Nr. 12 and 14; upper right), third PCA (outlier Nr. 11; bottom left), final PCA (without outliers; bottom right)

Figure: Iterative process of outlier detection and relaxation in the subspace of the first two affine PCs with 'curves décolletage': initial PCA with incorrect relaxation direction (outliers Nr. 5, 12, and 14; upper left), initial PCA with correct relaxation direction (outliers Nr. 5, 12, and 14; upper right), final PCA (without outliers; bottom)

GMM / Simulations: quint examples
[Figure panels: with added Gaussian noise; with added directional noise; with added noise]
Figure: 250 quints generated from a normally distributed sequence of 1000 random numbers: $X = (x_1, x_2, x_3, x_4)$, $X_i \sim N_2(\mu_i, \sigma^2 I_{2 \times 2})$, $\mu_1 = (-1, 0)$, $\mu_2 = (0, 1)$, $\mu_3 = (1, 0)$, $\mu_4 = (0, -1)$, and $\mu_5 = (0, 0)$, $\sigma^2 = 0.001$ (left); 9 quints with different random noise (middle, right)

Euler angles $\psi$, $\theta$, and $\phi$ in degrees (clockwise around the $x^{(1)}$-, $x^{(2)}$-, and $y$-axis) of the original and affine-relaxed surfaces (OLS planes) were calculated from a 3D rotation matrix. Additionally, the translation in absolute and relative scale (relative to the range of $y$ values over the whole sample) was calculated as the difference between the centres of the original and affine-relaxed surfaces.
Table: Affine outliers. Angles of rotation about particular axes (clockwise, in degrees): $\psi$ about the $x^{(1)}$-axis, $\theta$ about the $x^{(2)}$-axis, $\phi$ about the $y$-axis; translation of surface centres in absolute (t.abs) and relative (t.relat; in % of the range of $y$ over the whole sample) scale

outlier   psi      theta   phi     t.abs   t.relat
Nr. 5     -0.65    0.15    -0.07   1.00    13%
Nr. 12    -12.22   -3.24   4.72    -1.37   -17%
Nr. 14    2.00     1.26    -1.38   1.81    23%

GMM / Simulations: quint examples
[Figure: with added Gaussian noise; PC1 scores (11.7%)]
Figure: Procrustes form space with $k \times CS$ (first row) and $\ln(k \times CS)$ (second row), $k = 1, 2, 5$
[Figure: with added directional noise; shape-space PC1 plots; PC1 scores (74.2%) and PC1 scores (99.8%)]
[Figure: with added noise; PC1 scores (74.2%)]
[Figure: PC1 scores (83.8%) and PC1 scores (97.3%)]
[Figure: PC1 scores (81.7%) and PC1 scores (60.1%)]

GMM / Key knowledge
1. form: information about object geometry that remains after translation and rotation effects are removed
2. shape: information about object geometry that remains after translation, rotation, and size effects are removed
3. object geometry: 2D/3D Cartesian coordinates in a $k \times d$ configuration matrix $X$
4. shape components: affine (uniform) $X_A$ and non-affine (non-uniform) $X_{NA}$ [local bending and global bending]
5. biological homology: biologically correspondent parts of an organism, rather than point locations defined with respect to a TPS deformation model; landmarks
6. geometrical homology: correspondence with respect to some minimization criterion (bending energy of the TPS model) between source and target configuration; semilandmarks on curves and surfaces
7. vectorization: the vectorized form of $X = (x^{(1)} : x^{(2)} : \ldots : x^{(d)})$ is defined as $Vec(X) = (x^{(1)T}, x^{(2)T}, \ldots, x^{(d)T})^T$; then $X_S$ is the $n \times dk$ matrix of vectorized Procrustes shape coordinates with $Vec(X_{P,i})$ as its rows, and its covariance matrix is written as $S$

Geometric Morphometrics / Generalized Procrustes Analysis: Procrustes k-point registration
Definition (Generalized Procrustes Analysis, GPA)
Procrustes form coordinates are $x_{F,ij} = \Gamma_i (x_{ij} - t_i)$, where $\Gamma_i$ is a rotation matrix and $t_i$ is a translation; $x_{F,ij}$ are the rows of $X_{F,i}$, $i = 1, \ldots, n$. Then we say that $X_{F,i}$, $i = 1, 2, \ldots, n$, are in optimal position, or have the best Procrustes fit in the sense of 'form', if they attain $\arg\inf \sum_{i=1}^{n} \| X_{F,i} - \bar{X}_F \|^2$ [...]
[...] $[\phi_1(x), \ldots, \phi_k(x)]^T$, with the continuous radial (nodal) basis function $\phi(x) = \|x\|_2^2 \log(\|x\|_2^2)$, $\forall\, \|x\|_2 > 0$, and $\phi_j(x_i) = \phi(x_i - x_j)$, $i, j = 1, 2, \ldots, k$.
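The radial basis function above is the ingredient from which a bending energy matrix is built. The following is a minimal sketch in Python with NumPy, not part of the original slides. It assumes the standard 2D thin-plate spline construction: $L$ is the block matrix with $S$, a column of ones, and the landmark coordinates, and the bending energy matrix is the upper-left $k \times k$ block of $L^{-1}$. The function name `bending_energy_matrix` and the example landmarks are hypothetical.

```python
import numpy as np

def bending_energy_matrix(X):
    """Bending energy matrix of a 2D landmark configuration X (k x 2).

    Assumption: standard TPS construction with kernel
    phi(x) = ||x||^2 * log(||x||^2) and phi(0) = 0.
    """
    X = np.asarray(X, dtype=float)
    k, d = X.shape
    # Squared Euclidean distances between all pairs of landmarks.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Replace zeros by 1 so that 1 * log(1) = 0 on the diagonal.
    safe = np.where(d2 > 0, d2, 1.0)
    S = safe * np.log(safe)
    # Affine part: a column of ones and the coordinates.
    Q = np.hstack([np.ones((k, 1)), X])
    L = np.zeros((k + d + 1, k + d + 1))
    L[:k, :k] = S
    L[:k, k:] = Q
    L[k:, :k] = Q.T
    # Upper-left k x k block of the inverse of L.
    return np.linalg.inv(L)[:k, :k]

# Hypothetical example: nine landmarks on a slightly perturbed grid.
X_example = np.array([[0, 0], [1, 0], [2, 0.2],
                      [0, 1], [1.1, 1], [2, 1],
                      [0, 2.1], [1, 2], [2, 2]], dtype=float)
Be_example = bending_energy_matrix(X_example)
```

Under these assumptions the result annihilates constants and the landmark coordinates, so its rank is $k - 3$, which is a quick sanity check on any implementation.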
Geometric Morphometrics / Bending Energy
Figure: TPS deformation grid, bending, and bending energy $J(f)$

Geometric Morphometrics / Affine and non-affine coordinates
Relative Warp Analysis / Generalized PCA: from shape space to affine and non-affine subspaces 1
Definition (Affine and non-affine coordinates)
Regressing each $k \times d$ matrix $X_{P,i}$ ($d = 2, 3$) onto $\bar{X}_P$ can be defined by the MMLRM (Multivariate Multiple Linear Regression Model)
$X_{P,i} = \bar{X}_P \beta_i + \varepsilon_i$; $\hat{\beta}_i = (\bar{X}_P^T \bar{X}_P)^{-1} \bar{X}_P^T X_{P,i}$, $i = 1, 2, \ldots, n$.
Let $\hat{\beta}_i = (\hat{\beta}_{i1}, \hat{\beta}_{i2})$ for 2D and $\hat{\beta}_i = (\hat{\beta}_{i1}, \hat{\beta}_{i2}, \hat{\beta}_{i3})$ for 3D; then
1. affine Procrustes coordinates: $X_{A,i} = \bar{X}_P \hat{\beta}_i$
2. non-affine Procrustes coordinates (residuals of the MMLRM): $X_{NA,i} = \bar{X}_P + (X_{P,i} - X_{A,i})$

Relative Warp Analysis / Generalized PCA: from shape space to affine and non-affine subspaces 2
Definition (Relative Warp Analysis (RWA), cont.)
The effect of the $j$th RW can be viewed by plotting
$Vec(X_P(c_j, \alpha)) = Vec(\bar{X}_P) \pm c_j B^{\alpha/2} g_j l_j^{1/2}$, $r_j = c_j l_j^{1/2}$,
for various values of $r_j \in (0, \max(|r_j|))$ (or a reasonable magnification of $\max(|r_j|)$; alternatively, either $c_j \sim N(0, 1)$ or, fixing $c_j = 1$, a magnification of $l_j^{1/2}$, the standard deviation of the $RW_j$ scores), where $B^{\alpha/2} = \sum_j \lambda_j^{\alpha/2} \gamma_j \gamma_j^T$. To emphasize
1. large scale variability (global bending), $\alpha = 1$
2. small scale variability (local bending), $\alpha = -1$
3. if $\alpha = 0$, then we take $B^0 = I$ as the $dk \times dk$ identity matrix and the procedure is equivalent to PCA of Procrustes shape coordinates

Definition (Relative Warp Analysis (RWA))
If the bending energy matrix $B_e$ is calculated for the mean shape $\bar{X}_P$, then the $dk \times dk$ matrix is $B = I_{d \times d} \otimes B_e$. The generalized covariance matrix with respect to bending energy is equal to $S^{(\alpha)} = (B^-)^{\alpha/2} S (B^-)^{\alpha/2}$, where $(B^-)^{\alpha/2} = \sum_j \lambda_j^{-\alpha/2} \gamma_j \gamma_j^T$ is the Moore-Penrose generalized inverse of $B^{\alpha/2}$. The non-zero eigenvalues of $S^{(\alpha)}$ calculated by SVD are $l_j$, with corresponding eigenvectors $g_j$ (relative warps, RW).
Then the RW scores are $r_{ij} = g_j^T (B^-)^{\alpha/2}\, Vec(X_{S,i})$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, J_d$, where $J_d$ is the number of non-zero eigenvalues ($d = 2, 3$).

Relative Warp Analysis / Generalized PCA: from shape space to affine and non-affine subspaces 3
Definition (Relative Warp Analysis (RWA), cont.)
1. Affine contribution to the variability: affine subspace PCA on the covariance matrix $S_A$ of the $n \times dk$ matrix $X_A$ with the rows $Vec(X_{A,i})$, $i = 1, 2, \ldots, n$ (which is equivalent to RWA with $\alpha = 0$)
2. Non-affine contribution to the variability: non-affine subspace PCA on the covariance matrix $S_{NA}$ of the $n \times dk$ matrix $X_{NA}$ with the rows $Vec(X_{NA,i})$, $i = 1, 2, \ldots, n$
3. Contribution of (a)symmetry: augment the vectorized matrix of Procrustes shape coordinates with the relabeled and reflected Procrustes configurations and perform SVD of the covariance matrix of the augmented data
4. Size contribution: augment the vectorized matrix of Procrustes shape coordinates by the column of log centroid sizes $x_{size} = (\ln(CS_1), \ldots, \ln(CS_n))^T$, where $CS_i = \|X_i\| = \sqrt{tr(X_i X_i^T)}$; this gives the $n \times (dk + 1)$ matrix of vectorized form coordinates $X_F = (X_S : x_{size})$; finally perform SVD of $S_F$

GM vs KM / GM of neurocrania of fish of the genus Rutilus (roaches)
• neurocrania of roaches Rutilus rutilus and Rutilus virgo (Actinopterygii: Cyprinidae)
• R. rutilus ($n_{rr} = 30$) and R.
pigus neurocrania ($n_{rp} = 50$), 27 measurements

Traditional vs Geometric Morphometrics / Fish Neurocrania: Rutilus rutilus and R. pigus (Cyprinidae)
Figure: PCA of inter-landmark distances

Traditional vs Geometric Morphometrics / Fish Neurocrania: Rutilus rutilus and R. pigus (Cyprinidae), Shape Space PCA
[Figure: RW1 scores (50.28%)]

Traditional vs Geometric Morphometrics / Fish Neurocrania: Rutilus rutilus and R. pigus (Cyprinidae), Form Space PCA
[Figure]

Relative Warp Analysis / Generalized PCA: Generalized PCA for paired data
Definition (RWA for paired data)
1. Let $x_{P,i} = Vec(X_{P,i})$, $i = 1, 2, \ldots, n$; $n = 48$, be a $2k$-vector of Procrustes shape coordinates, where $X_{P,i} = (x_{P,i}^{(1)} : x_{P,i}^{(2)})$ and $x_{P,i}^{(d)} = (x_{i1}^{(d)}, x_{i2}^{(d)}, \ldots, x_{ik}^{(d)})^T$, $d = 1, 2$
2. Let $x_{D,i}$ be $2k$-vectors ($k = 22$) of matched-pair differences of vectorized Procrustes shape coordinates, $x_{D,i} = x_{P,15,i} - x_{P,10,i}$, with $x_{P,15,i} = Vec(X_{P,15,i})$ and $x_{P,10,i} = Vec(X_{P,10,i})$
3. Let $S_D$ be the covariance matrix of the data $x_{D,i}$
4. Let $\bar{X}_{P,10} = (\bar{x}_{10}^{(1)} : \bar{x}_{10}^{(2)}) = (\bar{x}_1, \ldots, \bar{x}_k)^T$ be the $k \times 2$ matrix of mean Procrustes shape coordinates $\bar{x}_j$ of the 10-year group, $j = 1, 2, \ldots, k$; then
$L = \begin{pmatrix} S_{k \times k} & 1_k & \bar{X}_{P,10} \\ 1_k^T & 0 & 0 \\ \bar{X}_{P,10}^T & 0 & 0 \end{pmatrix}$

Definition (RWA for paired data, cont.)
• $L$ is symmetric; its inverse exists as long as the landmarks are at least four in number, not all on one straight line, and no two of them coincide; then the inverse of $L$ is
$L^{-1} = \begin{pmatrix} L^{11}_{k \times k} & L^{12} \\ L^{21}_{3 \times k} & L^{22} \end{pmatrix}$
• $(S)_{ij} = \phi(\bar{x}_i - \bar{x}_j)$, where $\phi(x) = \|x\|_2^2 \log(\|x\|_2^2)$, $\forall\, \|x\|_2 > 0$, and $\phi(x) = 0$ if $\|x\|_2 = 0$
• the $k \times k$ matrix $B_e = L^{11}$ is called the bending energy matrix of $\bar{X}_{P,10}$; the $2k \times 2k$ matrix is $B = I_{2 \times 2} \otimes B_e$; since $1_k^T B_e = 0$ and $\bar{X}_{P,10}^T B_e = 0$, the rank of the bending energy matrix is $k - 3$
• then $(B^-)^{\alpha/2} S_D (B^-)^{\alpha/2}$ is the generalized covariance matrix of the matched-pair differences of vectorized Procrustes shape coordinates, $x_{D,i}$
• its non-zero eigenvalues are $l_j$ with corresponding eigenvectors $g_j$ (PC loadings, RWs)

Definition (RWA for paired data, cont.)
• RW scores are defined as $r_{ij} = g_j^T (B^-)^{\alpha/2} x_{D,i}$
• the effect of the $j$th RW can be viewed by plotting
$Vec(X_P(c_j, \alpha)) = Vec(\bar{X}_{P,10}) \pm c_j B^{\alpha/2} g_j l_j^{1/2}$, $r_j = c_j l_j^{1/2}$, $c_j \in R^+$,
for various values of $r_j \in (0, \max(|r_j|))$ (or some magnification of $\max(|r_j|)$; alternatively, fixing $c_j = 1$, a magnification of $l_j^{1/2}$ as the standard deviation of the $PC_j$ scores)
• the effect of a linear combination of $RW_1$ and $RW_2$ can be viewed by plotting
$Vec(X_P(c_1, c_2, \alpha)) = Vec(\bar{X}_{P,10}) \pm c_1 B^{\alpha/2} g_1 l_1^{1/2} \pm c_2 B^{\alpha/2} g_2 l_2^{1/2}$
• a PC summary of the shape data:
$Vec(X_{P,15,i}(\alpha)) = Vec(\bar{X}_{P,10}) \pm B^{\alpha/2} \sum_{j=1}^{2} r_{ij} g_j = Vec(\bar{X}_{P,10}) + \sum_{j=1}^{2} g_j g_j^T x_{D,i}$

Definition (RWA for paired data, cont.)
Visualization of interpolated shape changes can be done
• via thin-plate spline (TPS) deformation grids,
• via a field of vectors (within the convex hull of the reference shape $\bar{X}_{P,10}$, where longer vectors show stronger deformation in the specific direction of the shape change) superimposed on a grid of gray-scale rectangles with colors corresponding to the Procrustes distances (regions showing milder deformation are lighter, regions with stronger deformation are darker; the surface shows only the size, not the direction, of a shape change)

Relative Warp Analysis / Generalized PCA: Generalized PCA for paired data
Definition (RWA for paired data, cont.)
• to find the affine component we use the linear regression model $x_{P,i}^{(d)} = \bar{X}_{P,10}\, \beta_i^{(d)} + \varepsilon_i^{(d)}$, $d = 1, 2$; $i = 1, 2, \ldots, n$
• then $x_{A,i}^{(d)} = \bar{X}_{P,10}\, \hat{\beta}_i^{(d)}$ and $X_{A,i} = (x_{A,i}^{(1)} : x_{A,i}^{(2)})$ is the affine component of $X_{P,i}$
• in the affine subspace, $S_{DA}$ stands for the sample covariance matrix of $x_{DA,i} = Vec(X_{A,i}) - Vec(\bar{X}_{P,10})$; then PCA of $S_{DA}$ is called affine-subspace PCA
• let $X_{DF} = (X_D : x_{size})$ be an $n \times (2k + 1)$ matrix with the rows equal to $x_{DF,i} = (x_{D,i}^T, \ln(CS_i))^T$, $i = 1, 2, \ldots, n$, and $x_{size} = (\ln(CS_1), \ldots, \ln(CS_n))^T$. Let $S_{DF}$ be the covariance matrix of the data $x_{DF,i}$; then PCA of $S_{DF}$ is called form-space PCA
• the first PC represents allometry: shape change during growth

Data / 2D lateral X-rays: growth after surgery (paired data)
• Velemínská, J., Katina, S., Šmahel, Z., Sedláčková, M., 2006: Analysis of facial skeleton shape in patients with complete unilateral cleft lip and palate: Geometric morphometrics. Acta Chirurgiae Plasticae, 48, 1: 26-32
• Velemínská, J., Šmahel, Z., Katina, S., 2006: Development prediction of sagittal intermaxillary relations in patients with complete unilateral cleft lip and palate during puberty. Acta Chirurgiae Plasticae, 49, 2: 41-46
• Katina, S., 2008: Detection of shape outliers with an application to complete unilateral cleft lip and palate in humans. In S. Barber, P.D. Baxter, A.
Gusnanto & K.V. Mardia (eds), The Art & Science of Statistical Bioinformatics, pp. 33-37. Leeds, Leeds University Press
• Katina, S., 2011: Detection of shape outliers for matched-pair shape data. Tatra Mountains Mathematical Publication (accepted)
• 48 boys with complete unilateral cleft of lip and palate (UCLP), without symptoms of other associated malformations, Clinic of Plastic Surgery in Prague
• homogeneously operated by the same team of surgeons (cheiloplasty according to Tennison, periosteoplasty without the nasal septum repositioning)
• patients monitored during puberty, at the ages of 10 and 15 (born between 1972 and 1978)
• 22 landmarks (X-rays of the patients' heads, under standard conditions, SigmaScan Pro 5 software)

Geometric Morphometrics / 2D lateral X-rays: growth after surgery (paired data)
48 boys, 10-15 yrs old, 22 landmarks
Figure: Cleft patients and design of lateral X-ray (semi)landmarks [Dpt. of Anthropology, Charles University, Prague, Czech Republic]

Data: 15 yrs old boys after operation / 2D lateral X-rays: growth after surgery (paired data)
[Figure]

Data: 10 yrs old boys before operation / 2D lateral X-rays: growth after surgery (paired data)
[Figure]

Geometric Morphometrics / 2D lateral X-rays: searching for biological signal in the data
Figure: All PCA/RWA models, ($RW_1$, $RW_2$) subspace

Results of RWA: form space
[Figure: 10 yrs and 15 yrs groups; RW1 scores (59.01%); form-space PCA]
Results of RWA: form space
[Figure: RW1 scores (36.74%); form-space PCA]
Figure: Form-space PCA, RWA of $S_F$ ($RW_1$, $RW_2$ subspace)

Results of RWA: bending patterns in small scale
[Figure]

Outlier relaxation using PRM3
Figure: Relaxation in Procrustes shape coordinates; TPS deformation grids and field of vectors superimposed on the surface of Procrustes distances of the mean shape $\bar{X}_{P,10}$ to the shape $X_{P,10,29}$ (left) and to the final relaxed shape $X_{P,10,29}$ (right); 'curve décolletage' of the shape $X_{P,10,29}$ (middle; x: mean shape $\bar{X}_{P,10}$, large symbol: shape $X_{P,10,29}$, small symbols: relaxed shapes $X_{P,10,29}$)

Data: human faces in 2D / 2D Facial Analysis: two group differences
20 young women, 19-31 yrs old, 46 + 26 (semi)landmarks
Figure: Design of facial (semi)landmarks [Dpt. of Anthropology, University of Vienna, Vienna, Austria]

Data: human faces in 2D
• Oberzaucher, E., Katina, S., Holzleitner, I.J., Schmehl, S.F., Mehu-Blantar, I., Grammer, K., 2011: The myth of hidden ovulation: Shape and texture changes in the face during the menstrual cycle. PNAS (submitted)
• Pflüger, L.S., Oberzaucher, E., Katina, S., Holzleitner, I.J., Mehu-Blantar The Signal of Fertility. Evidence from a Rural Sample.
Evolution and Human Behaviour (accepted)
• 20 young women (aged between 19 and 31) who reported having a regular menstrual cycle and did not take any hormonal contraceptives
• standardized facial photographs: one taken in the ovulatory and one in the luteal phase
• in a forced choice task, 50 male and 50 female subjects were presented with these photographs of each participant and asked to pick out the more attractive, healthy, sexy, and likeable of the two
• skin patches sized 150 x 150 pixels were taken from the cheek and subjected to the same forced choice task with slightly modified adjectives
• 46 landmarks and 26 semilandmarks

Geometric Morphometrics / 2D Facial Analysis: searching for biological signal in the data
[Figure: RW1 scores (28.47%)]
Figure: Shape space PCA, RWA of $S$ ($RW_1$, $RW_2$ subspace)
Figure: Nonaffine space PCA, RWA of $S_{NA}$ ($RW_1$, $RW_2$ subspace)
[Figure: RW1 scores (87.63%)]
Figure: Affine subspace PCA, RWA of $S_A$ ($RW_1$, $RW_2$ subspace)
Figure: Nonaffine space PCA, RWA of $S^{(1)}$ ($RW_1$, $RW_2$ subspace)
[Figure: RW1 scores (20.76%)]
Figure: Nonaffine space PCA, RWA of $S^{(-1)}$ ($RW_1$, $RW_2$ subspace)

Results of RWA: shape space
[Figure: RW1 scores (32.60%)]

Results of RWA: estimated shapes, RW1 and RW2
[Figure]
Results of RWA: bending patterns with large scale
[Figure]

Results of RWA: bending patterns with small scale
[Figure]

Data: human skulls / 3D (semi)landmarks
• the example re-uses part of a Vienna data set of 372 skulls from various collections
• 106 human crania (38 adult females, 54 males, 3 juvenile females, 11 juvenile males, 14 unknown sex; from newborns to adults)
• Dept. of Archaeological Biology and Anthropology, Natural History Museum, Vienna, Austria
• Dept. of Anthropology, University of Vienna, Vienna, Austria
• Weisbach collection: acquired and exhumed skeletons of soldiers of the Austro-Hungarian monarchy; sex and age of these crania are known from military records
• Hallstatt collection from the ossuary in Hallstatt; sex and age are known from the church books
• data: 347 landmarks and semilandmarks, i.e. 32 landmark points, 7 ridge curves totalling 161 semilandmarks, and 154 surface semilandmarks [5 base, 184 face, 158 neurocranium]
• landmark points on both sides of every cranium and semilandmarks (on curves and surface) on the left side of every cranium were digitized using a MicroScribe 3DX (Mitteroecker et al., 2004; Gunz, 2005)
• Katina, S., Bookstein, F.L., Gunz, P., Schaefer, K., 2007: Was it worth digitizing all those curves? A worked example from craniofacial primatology. American Journal of Physical Anthropology Suppl. 44: 140.
Geometric Morphometrics / 2D Facial Analysis: searching for biological signal in the data
[Figure panels: full shape space PC2+, global bending PC1+, local bending PC1+, local bending PC2+; ovulatory face: full shape space PC2, global bending PC1-, local bending PC1-, local bending PC2-]
Figure: Summary of RWA/PCA analyses in all subspaces of paired shape differences [statistically significant RWs/PCs]

Data: human skulls
6 norms: norma frontalis, lateralis dex. and sin., occipitalis, verticalis, basilaris

Data: human skulls / (Semi)landmarks of three skull regions
[Figure]

Design of the experiment
[Figure: Homo sapiens; x-coordinates; MEAN SHAPE: all specimens, from juveniles to adults]

3D Form Space PCA
[Figure: cor(ln(CS), PC1 scores) = 0.9894; cor(ln(CS), PC2 scores) = 0.1248; PC1 scores (18.246%); juvenile-to-adult growth: left-to-right (all 347 landmarks, 3D)]
Figure: PC1 and PC2 scores

Data: human skulls / (Semi)landmarks of three skull regions and euryon variability
[Figure]
3D Form Space PCA
Figure: PC1 and PC2

2D Skulls / Předmostí skulls
• professionally digitised glass plate negatives of fossil skulls (Předmostí 1 - P1, Předmostí 3 - P3, Předmostí 4 - P4, Předmostí 9 - P9, Předmostí 10 - P10)
• in the accessible norms: frontal, lateral sin., occipital, basal, and vertical views
• the skulls in question are those determined by Matiegka to have been females (P1, P4, P10) and males (P3, P9)
• 17 landmarks in the right lateral view
• the recent population collection: 103 skulls of known sex (51 males and 52 females) and age from the first third of the 20th century
• Katina, S., Šefčáková, A., Velemínská, J., Bružek, J., Velemínský, P., 2004: A Geometric approach to cranial sexual dimorphism in the upper palaeolithic skulls from Předmostí (Upper Palaeolithic, Czech Republic). Journal of the National Museum, Natural History Series 173, 1-4: 133-144
• Šefčáková, A., Katina, S., 2008: Geometrical analysis of adult skulls from Předmostí. In: Velemínská, J., Bružek, J. (eds), Fossil hominids from Předmostí nr. Přerov: Old documentation and new reading.
Academia, Praha, 87-101

2D Skulls / Norma lateralis
[Figure]

2D Skulls / Norma frontalis
[Figure]

2D Skulls
Example of a skull from the Pachner reference sample [Pachner collection at the Department of Anthropology and Human Genetics of Charles University in Prague (Czech Republic)]

2D Skulls / PCA Summary
1. TPS deformation grids and RW scores ($RW_1$ and $RW_2$) in shape space (identical to PCA in shape space)
2. TPS deformation grids and RW scores ($RW_1$ and $RW_2$) in shape space for local changes with large scale ($\alpha = 1$)
3. TPS deformation grids and PC scores (PC1 and PC2) in form space
4. TPS deformation grids and PC scores (PC1 and PC2) in form space with 95% tolerance ellipses for males and females
5. TPS deformation grids and RW scores ($RW_1$ and $RW_2$) in shape space for local changes with small scale ($\alpha = -1$)

RWA in Paleoanthropology: Shape Space
Legend: F: Pachner females (n = 52), M: Pachner males (n = 51), Předmostí crania P1, P3, P4, P9, P10 (distinct plotting symbols)
[Figure]

RWA in Paleoanthropology: Global Bending Patterns
[Figure: RW1 scores (17.154%)]

RWA in Paleoanthropology: Form Space
[Figure: RW1 scores]
3D laser-scan capture / 3D facial shape: VCFS data, differences between cases and controls (paired data)
42 pairs of laser-scanned faces, 23 landmarks, 1664 geometrically homologous semilandmarks on curves and surfaces, 59242 mesh-points triangulated with 117386 faces
Figure: VCFS face, laser-scan, and surface meshes [Royal College of Surgeons in Ireland, Dublin; Face 3D data]

RWA in Paleoanthropology: Local Bending Patterns
[Figure]

Geometric Morphometrics / 3D facial shape: VCFS data, differences between cases and controls (paired data)
Figure: Design of facial (semi)landmarks, symmetrized mean shape

3D face: first steps of the analysis / Data analysis 1
Data analysis:
• with respect to the analysis of object asymmetry (in our case, facial shape asymmetry), the original coordinates were relabelled and reflected (RR) with respect to the midsagittal plane (MP)
• the MP was estimated as an ordinary least squares plane of the unpaired midsagittal landmarks and rotated into the $(x, y)$-plane
• for paired (semi)landmarks, the sign and labels were reversed across the left-hand and right-hand side of the head shape
• the original PSC together with their RR counterparts were jointly submitted to GPA to register both into the same shape space
• both configurations were centered with respect to the original and RR Procrustes mean shape, respectively, resulting in original and RR centered PSC
• fluctuating asymmetry (FA) expresses how the difference between the original and RR shapes fluctuates in the sample; it is calculated as the sum of squares of individual asymmetry scores, i.e.
Procrustes distances between the original and RR centered PSC of each shape

3D face: first steps of the analysis / Standardized views
[Figure]

3D face: first steps of the analysis / Data analysis 2
Data analysis:
• the asymmetry of the means (AM) is calculated as the sum of squares of the Procrustes distances between the original and RR Procrustes mean shape; AM multiplied by the sample size is called directional asymmetry (DA)
• the PSC were adjusted for age and sex by a linear regression model of the form centered $PSC_{ij}$ = sex + age + sex:age + $\varepsilon_{ij}$, $i = 1, 2, \ldots, 1664$; $j = 1, 2, 3$; for further analysis, the residuals of this model were used
• the direction of the case-control difference was found from the projection of the "null shape" onto particular PC subspaces; if it falls on the negative side of a PC axis, the cases are on the negative part of that axis as well; if it falls on the positive side, the cases are on the positive part of the axis as well

3D face: first steps of the analysis / Data analysis 3
PCA for reversible 3D images:
• 21 RR centered case-control (semi)landmark differences at the same time
• PC scores for original and RR data are equal in absolute value
• in this setting, symmetric and asymmetric PCs are separated, which simplifies the interpretation
• the symmetric PCs are those where the PC scores of original and RR data do not have the same sign (they are equal only in absolute value)
• the asymmetric PCs are those where the PC scores of original and RR data have the same sign (they are equal)

Geometric Morphometrics / 3D facial shape: VCFS data, differences between cases and controls (paired data)
[Figure: cases and controls; linear measurements]
U_Lj_U_u_U_u_ wíť„«^.^oc,,™™„tc —1—■—1—■ WW II II II II II II ^ÍTJ^t-J^t-J linear measurements relative scale Ííf J I ÍŤftÍÍíŤÍÍttí-C\l—: : ■ ^CNCOCOr^eO^C\ICO^rincOr--eO—: : : ■ IU_U_U_U_U_W^C\l^cN^cNoOllllll|^ "-2:2:2:2:2:2: 0P0o0° linear measurements Obrázok: Procrustes shape distances Stanislav Katina TPS deformations in topo-colors PCA estimates—control-to-case 3D Euclidean distance, signed distance; x-, y-, and z-axis direction (shape space) Stanislav Katina Geometric Morphometries 3D facial shape—VCFS data, differences between cases and controls (paired data) PC6 PC5 - PC4- PC3 PC2 PCI shape space affine subspace nonaffine subspace Obrázok: PCA of reversible images (original, and relabeled and reflected faces) with projection of shape of zero difference—testing case-control mean difference in particular PC subspace Stanislav Katina TPS deformations in topo-colors and wireframes PCA estimates—control-to-case 3D signed Euclidean distance, wireframes, and transparent visualization (shape space) Stanislav Katina 2-block PLS asymmetric 2-block PLS—traditional focus of the PLS methods—find the directions in Xi that best describe X2 in some way—prediction of dependent variables X2 from independent variables X1 (Martens & Naes 1989, Joreskog & Wolf 1982) symmetric 2-block PLS—low-dimensional linear relationship between two high-dimensional measurement blocks by adapting one single SVD (Sampson et al. 1989, Bookstein 1994, Mcintosh etal. 1996) Stanislav Katina 2-block PLS 9 adapting the SVD to S12 we get S12 = UÄV7, where U is the estimate of dk-\ x dk-\ orthogonal matrix of left singular vectors with the columns jy (J = 1,2,... d/c-i) and V is the estimate of dk2 x dk2 orthogonal matrix of right singular vectors with the columns 72/ (J = 1,2,... dk2) and A is the estimate of dk^ x dk2 matrix of singular values Av on the diagonal (J = 1,2,... dk^). 
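A short worked step, offered as a sketch, shows why the singular values of $S_{12}$ can be read as covariances of paired scores. It assumes centered blocks $X_{S,1}$ and $X_{S,2}$ with $S_{12} = \frac{1}{n} X_{S,1}^T X_{S,2}$ (the divisor $1/n$ is an assumption and could equally be $1/(n-1)$), writes $U_{\cdot j}$ and $V_{\cdot j}$ for the $j$th columns of $U$ and $V$, and takes the scores to be the projections of each block on its own singular vector. The unit vectors $e_j^{(1)}$ and $e_j^{(2)}$ are notation introduced only for this step.

```latex
\begin{aligned}
\operatorname{Cov}\left(X_{S,1} U_{\cdot j},\; X_{S,2} V_{\cdot j}\right)
  &= \tfrac{1}{n}\, U_{\cdot j}^{T} X_{S,1}^{T} X_{S,2} V_{\cdot j}
   = U_{\cdot j}^{T} S_{12} V_{\cdot j} \\
  &= U_{\cdot j}^{T} U \Lambda V^{T} V_{\cdot j}
   = \left(e_j^{(1)}\right)^{T} \Lambda\, e_j^{(2)}
   = \lambda_j ,
\end{aligned}
\qquad
e_j^{(1)} = U^{T} U_{\cdot j} \in \mathbb{R}^{dk_1},\quad
e_j^{(2)} = V^{T} V_{\cdot j} \in \mathbb{R}^{dk_2}.
```

The same computation with two different column indices gives zero, because $\Lambda$ has non-zero entries only on its diagonal.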
• latent variables (scores) are defined as $L_1 = X_{S,1} U$ and $L_2 = X_{S,2} V$

2-block PLS

Let $S_X = \frac{1}{n} X_S^T X_S$ be the sample covariance matrix; then
\[ S_X = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \]
where $k_1 + k_2 = k$ ($k_1 < k_2$) is the number of landmarks, $k_1$ is the number of landmarks in the first block and $k_2$ in the second block, $S_{bb}$ is the $dk_b \times dk_b$ sample covariance matrix of the $b$th block, and $S_{12} = S_{21}^T$ is the $dk_1 \times dk_2$ sample cross-block covariance matrix, equal to
\[ S_{12} = \frac{1}{n} X_{S,1}^T X_{S,2}. \]

2-block PLS
• the covariance between the $j$th column $l_{1j}$ of $L_1$ and the $j$th column $l_{2j}$ of $L_2$ is $\mathrm{Cov}(l_{1j}, l_{2j}) = \lambda_j$, the maximum for any pair of such linear combinations
• each column of $U$ is proportional to the covariances of the block $X_{S,1}$ with the corresponding column of the matrix $L_2$
• each column of $V$ is proportional to the covariances of the block $X_{S,2}$ with the corresponding column of the matrix $L_1$

2-block PLS

Additional graphical structure becomes available beyond the singular vectors $U_{\cdot j}$ and $V_{\cdot j}$ (the $j$th columns of $U$ and $V$): scatter plots of the latent variable scores, TPS grids (2D), or arrows and TPS morphs (3D), where we visualise
\[ \mathrm{Vec}(\bar X_{P,1}) \pm c_{1j} U_{\cdot j}, \qquad \mathrm{Vec}(\bar X_{P,2}) \pm c_{2j} V_{\cdot j}, \]
for various values of $c_{1j}, c_{2j} \in \mathbb{R}^+$ (in the range of the particular singular warp (SW) scores or a reasonable magnification of this range). Here $\bar X_{P,b}$ denotes the Procrustes mean shape of the $b$th block.

2-block PLS

A SW summary of the data (from each block separately) in the shape space is
\[ \mathrm{Vec}(X_{P,1,i}) = \mathrm{Vec}(\bar X_{P,1}) + \sum_{j=1}^{dk_1} l_{1,ij} U_{\cdot j}, \qquad \mathrm{Vec}(X_{P,2,i}) = \mathrm{Vec}(\bar X_{P,2}) + \sum_{j=1}^{dk_1} l_{2,ij} V_{\cdot j}, \]
where $(L_b)_{ij} = l_{b,ij}$, $b = 1, 2$.

2-block PLS

The SW summary for any $q$-subset of SWs $\{SW_{j_1}, \dots, SW_{j_q}\}$, $q \ge 1$, can be written as
\[ \mathrm{Vec}(X_{P,1,i})^{SW(j_1,\dots,j_q)} = \mathrm{Vec}(\bar X_{P,1}) + \sum_{j \in \{j_1,\dots,j_q\}} l_{1,ij} U_{\cdot j}, \qquad \mathrm{Vec}(X_{P,2,i})^{SW(j_1,\dots,j_q)} = \mathrm{Vec}(\bar X_{P,2}) + \sum_{j \in \{j_1,\dots,j_q\}} l_{2,ij} V_{\cdot j}, \]
$i = 1, \dots, n$, and then $X_{P,b}^{SW(j_1,\dots,j_q)}$ are the matrices of all $\mathrm{Vec}(X_{P,b,i})^{SW(j_1,\dots,j_q)}$.

2-block PLS
• to visualize a composite shape (both blocks together) we have to scale the singular vectors properly (Mitteroecker & Bookstein 2007)
• block-wise matrix of common factor scores: $l_j = (l_{1j} \mid l_{2j})$
• necessary scaling factor: the eigenvectors from the SVD of the matrix $l_j^T l_j$ are $v_j = (v_{1j}, v_{2j})^T$
• composite singular vectors:
\[ f_j^{(sr)} = \begin{pmatrix} v_{1j} U_{\cdot j} \\ v_{2j} V_{\cdot j} \end{pmatrix} \]

2-block PLS
• composite shape: $\mathrm{Vec}(\bar X_P) \pm c_j f_j^{(sr)}$ for various values of $c_j \in \mathbb{R}^+$ (in the range of the particular SW scores or a reasonable magnification of this range)
• let the matrix of composite latent variables (composite scores) be $L^{(sr)} = X_S F^{(sr)}$, with $(L^{(sr)})_{ij} = l_{ij}$, and let the columns of $F^{(sr)}$ be $f_j^{(sr)}$ (Katina 2008)
• the SW summary of the data in the shape space is
\[ \mathrm{Vec}(X_{P,i}) = \mathrm{Vec}(\bar X_P) + \sum_{j} l_{ij} f_j^{(sr)}, \]
and the SW summary for any $q$-subset of SWs $\{SW_{j_1}, \dots, SW_{j_q}\}$, $q \ge 1$, can be written as
\[ \mathrm{Vec}(X_{P,i})^{SW(j_1,\dots,j_q)} = \mathrm{Vec}(\bar X_P) + \sum_{j \in \{j_1,\dots,j_q\}} l_{ij} f_j^{(sr)}, \]
and then $X_P^{SW(j_1,\dots,j_q)}$ is the matrix of all $\mathrm{Vec}(X_{P,i})^{SW(j_1,\dots,j_q)}$

2-block GPLS
• let $B_e$ be the bending energy matrix of $\bar X_P$,
\[ B_e = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \]
where $k_1 + k_2 = k$ ($k_1 < k_2$) is the number of landmarks, $k_1$ is the number of landmarks in the first block and $k_2$ in the second block, $B_{bb}$, $b = 1, 2$, is the $k_b \times k_b$ bending energy matrix of the $b$th block, and $B_{12} = B_{21}^T$ is the $k_1 \times k_2$ cross-block bending energy matrix
• the $dk \times dk$ matrix is $B = I_{d \times d} \otimes B_e$, $d = 2, 3$

2-block GPLS
• let $S_B = (B^-)^{\alpha/2} S_X (B^-)^{\alpha/2}$
• then
\[ S_B = \begin{pmatrix} S_{B,11} & S_{B,12} \\ S_{B,21} & S_{B,22} \end{pmatrix} \]
• let $S_{B,12}$ be the weighted cross-block covariance matrix, where $(B_e^-)^{\alpha/2} = \sum_j \lambda_j^{-\alpha/2} \gamma_j \gamma_j^T$ (Moore-Penrose generalized inverse of $B_e^{\alpha/2}$)

2-block GPLS
• large-scale variability: $\alpha = 1$
• small-scale variability: $\alpha = -1$
• if $\alpha = 0$, then $B^0 = I$, the $k \times k$ identity matrix
• then the SVD of $S_{B,12}$ is $S_{B,12} = U \Lambda V^T$; let $L_B^{(sr)} = X_S (B^-)^{\alpha/2} F_B^{(sr)}$ be the matrix of weighted composite latent variables (composite scores), with $(L_B^{(sr)})_{ij} = l_{B,ij}$, and let the columns of $F_B^{(sr)}$ be $f_{B,j}^{(sr)}$

2-block GPLS

Then a SW summary of the data in the shape space is
\[ \mathrm{Vec}(X_{P,i}) = \mathrm{Vec}(\bar X_P) + B^{\alpha/2} \sum_{j} l_{B,ij} f_{B,j}^{(sr)}, \]
and the SW summary for any $q$-subset of SWs $\{SW_{j_1}, \dots, SW_{j_q}\}$, $q \ge 1$, can be written as
\[ \mathrm{Vec}(X_{P,i})^{SW(j_1,\dots,j_q)} = \mathrm{Vec}(\bar X_P) + B^{\alpha/2} \sum_{j \in \{j_1,\dots,j_q\}} l_{B,ij} f_{B,j}^{(sr)}, \quad i = 1, \dots, n; \]
$X_P^{SW(j_1,\dots,j_q)}$ is the matrix of all $\mathrm{Vec}(X_{P,i})^{SW(j_1,\dots,j_q)}$.

Symmetric GPLS summary
• two shape blocks
• one shape block and one block of external variables
• shape space
• affine subspace
• non-affine subspace
• non-affine subspace with global and local bending
• one or more external variables

Two shape subspaces and two different 2-block GPLS

Following Katina (2008):
• affine contribution to the variability: affine subspace PLS on the $n \times k_b$ matrices $X_{A,b}$ with rows $x_{A,bi}$, $i = 1, 2, \dots, n$; $b = 1, 2$
• non-affine contribution to the variability: non-affine subspace PLS on the $n \times k_b$ matrices $X_{NA,b}$ with rows $x_{NA,bi}$, $i = 1, 2, \dots, n$

Two different 2-block GPLS:
• if we have two shape blocks, the Procrustes shape coordinates are pre-multiplied by $(B_e^-)^{\alpha/2}$ of $\bar X_P$ (the Procrustes mean of the composite shape)
• if we have one shape block and one block of external variables, the Procrustes shape coordinates of the shape block are pre-multiplied by $(B_e^-)^{\alpha/2}$ of $\bar X_{P,1}$ (the Procrustes mean shape)

Results of GPLS
TPS grids of the shape block vs the "attractiveness" block in all shape subspaces
[Figure panels: most attractive face, least attractive face]

Results of GPLS
3D warps of the shape block vs SOFA scores in three different shape subspaces

Statistical inference in shape analysis
Outline

For one-sample, two-sample, and paired hypotheses about shapes, there are the following tests:
1. one-sample Hotelling $T^2$ test, one-sample Goodall $F$ test
2. two-independent-sample Hotelling $T^2$ test, modification of the Nel-Van der Merwe test for the multivariate Behrens-Fisher problem, and two-independent-sample Goodall $F$ test
3. paired Hotelling $T^2$ test and paired Goodall $F$ test
4. Mardia test of object symmetry

The Moore-Penrose generalized inverse of a symmetric square matrix $A$, say $A^-$, is an inverse for which $A A^- A = A$ holds, so
\[ A^- = \sum_{j=1}^{s} \lambda_j^{-1} \gamma_j \gamma_j^T, \]
where $\gamma_j$ are the eigenvectors of $A$ corresponding to the eigenvalues $\lambda_j > 0$, $j = 1, 2, \dots, s \le kd$.

Statistical inference in shape analysis
One-sample Multivariate Inference

Definition (One-sample tests)
Let $\mathrm{Vec}(X_{P,i})$, $i = 1, 2, \dots, n$, be a random sample from a population with vectorized Procrustes mean shape $\mathrm{Vec}(\mu_P)$ estimated by $\bar x_P$ and covariance matrix $\Sigma_P$ estimated by $S_X$. Let $\mathrm{Vec}(X_{P,i}) \sim N_{dk}(\mathrm{Vec}(\mu_P), \Sigma_P)$, $i = 1, \dots, n$. The null hypothesis states that the Procrustes mean shape $\mu_P$ is equal to the Procrustes mean shape $\mu_0$, so $H_0: \mu_P = \mu_0$, $H_1: \mu_P \ne \mu_0$. If $H_0$ holds, the Hotelling $T^2$ test statistic satisfies
\[ \frac{n-s}{s}\, T_H^2 \sim F_{s,\,n-s}, \]
where $s = \min(dk, n-1)$, and
\[ T_H^2 = (\bar x_P - \mathrm{Vec}(\mu_0))^T S_X^- (\bar x_P - \mathrm{Vec}(\mu_0)) = \sum_{j=1}^{s} \frac{T_{j0}^2}{\lambda_j} \]
is the squared Mahalanobis distance between $\bar x_P$ and $\mathrm{Vec}(\mu_0)$, where $S_X^-$ is the Moore-Penrose generalized inverse of $S_X$.

Definition (One-sample tests; cont.)
$T_{j0} = \gamma_j^T (\bar x_P - \mathrm{Vec}(\mu_0))$ is the $j$th PC score for the difference $(\bar x_P - \mathrm{Vec}(\mu_0))$, $j = 1, 2, \dots, s$. High values of $T_{j0}^2 / \lambda_j$ indicate the direction of high shape variability associated with $\bar x_P$ in the $j$th PC. The test statistic $T_H^2$ can be modified with respect to any subset of PCs as
\[ T_H^2 = (\bar x_P - \mathrm{Vec}(\mu_0))^T \left(S_X^{pc(j_1,\dots,j_q)}\right)^- (\bar x_P - \mathrm{Vec}(\mu_0)) = \sum_{j \in \{j_1,\dots,j_q\}} \frac{T_{j0}^2}{\lambda_j}, \]
where $S_X^{pc(j_1,\dots,j_q)}$ is the covariance matrix estimated by any $q$-subset of PCs $\{PC_{j_1}, PC_{j_2}, \dots, PC_{j_q}\}$, $q \ge 1$. If the covariance matrix is $\Sigma_P = \sigma^2 I$ and $H_0$ holds, the Goodall test statistic is
\[ F_G = n(n-1)\, \frac{d_P^2(\bar x_P, \mu_0)}{\sum_{i=1}^{n} d_P^2(x_{P,i}, \bar x_P)} \sim F_{s,\,(n-1)s}, \]
which is the special case of Hotelling $T^2$ under isotropy.
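The one-sample Goodall statistic can be sketched in a few lines. This is a minimal Python illustration under loudly stated assumptions: the shapes are already Procrustes-registered and given as flat coordinate vectors, the squared Procrustes distance is approximated by the squared Euclidean distance between those vectors (a small-variation approximation that is not taken from the slides), and the function name `goodall_one_sample` and the toy data are hypothetical. The comparison with the $F_{s,(n-1)s}$ distribution is left out.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, used here as a stand-in for d_P^2 (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_shape(shapes: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of the registered shape vectors."""
    n = len(shapes)
    return [sum(col) / n for col in zip(*shapes)]


def goodall_one_sample(shapes: List[Sequence[float]], mu0: Sequence[float]) -> float:
    """F_G = n (n - 1) d^2(mean, mu0) / sum_i d^2(x_i, mean)."""
    n = len(shapes)
    xbar = mean_shape(shapes)
    between = sq_dist(xbar, mu0)                     # mean vs hypothesised shape
    within = sum(sq_dist(x, xbar) for x in shapes)   # spread around the mean
    return n * (n - 1) * between / within


# Hypothetical toy data: four "shapes" with two coordinates each.
shapes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
f_g = goodall_one_sample(shapes, mu0=[0.0, 1.0])
print(f_g)  # 1.5
```

With these numbers the mean is (1, 1), the squared distance to the hypothesised shape is 1, and the within-sample sum of squares is 8, so $F_G = 4 \cdot 3 \cdot 1 / 8 = 1.5$.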
Statistical inference in shape analysis
Two-sample Multivariate Inference

Definition (Two-sample tests)
Let $\mathrm{Vec}(X_{P,ji})$, $i = 1, 2, \dots, n_j$, be a random sample from population $j$, $j = 1, 2$, with vectorized Procrustes mean shape $\mathrm{Vec}(\mu_{P,j})$ estimated by $\bar x_{P,j}$ and covariance matrix $\Sigma_P$ estimated by the common covariance matrix $S_U = (n_1 S_{X,1} + n_2 S_{X,2}) / (n_1 + n_2 - 2)$, where the sample covariance matrices are $S_{X,j} = \frac{1}{n_j} X_{P,j}^T H X_{P,j}$ and $X_{P,j}$ is the $n_j \times dk$ matrix with $\mathrm{Vec}(X_{P,ji})$ as its rows. Let $\mathrm{Vec}(X_{P,ji}) \sim N_{dk}(\mathrm{Vec}(\mu_{P,j}), \Sigma_P)$; $j = 1, 2$; $i = 1, \dots, n_j$. The null hypothesis states that the Procrustes mean shape $\mu_{P,1}$ is equal to the Procrustes mean shape $\mu_{P,2}$, so $H_0: \mu_{P,1} = \mu_{P,2}$, $H_1: \mu_{P,1} \ne \mu_{P,2}$. If $H_0$ holds, the Hotelling $T^2$ test statistic satisfies
\[ \frac{n_1 n_2 (n_1 + n_2 - s - 1)}{(n_1 + n_2)(n_1 + n_2 - 2)\, s}\, T_H^2 \sim F_{s,\,n_1+n_2-s-1}, \]
where $s = \min(dk, n_1 + n_2 - 2)$, and
\[ T_H^2 = (\bar x_{P,1} - \bar x_{P,2})^T S_U^- (\bar x_{P,1} - \bar x_{P,2}) = \sum_{j=1}^{s} \frac{T_{j0}^2}{\lambda_j}. \]

Definition (Two-sample tests; cont.)
$T_{j0} = \gamma_j^T (\bar x_{P,1} - \bar x_{P,2})$ is the $j$th PC score for the difference $(\bar x_{P,1} - \bar x_{P,2})$, $j = 1, 2, \dots, s$. High values of $T_{j0}^2 / \lambda_j$ indicate the direction of high shape variability associated with the observed group difference $\bar x_{P,1} - \bar x_{P,2}$ in the $j$th PC. The test statistic $T_H^2$ can be modified with respect to any subset of PCs as
\[ T_H^2 = (\bar x_{P,1} - \bar x_{P,2})^T \left(S_U^{pc(j_1,\dots,j_q)}\right)^- (\bar x_{P,1} - \bar x_{P,2}) = \sum_{j \in \{j_1,\dots,j_q\}} \frac{T_{j0}^2}{\lambda_j}, \]
where $S_U^{pc(j_1,\dots,j_q)}$ is the covariance matrix estimated by any $q$-subset of PCs $\{PC_{j_1}, PC_{j_2}, \dots, PC_{j_q}\}$, $q \ge 1$. If the covariance matrices are $\Sigma_{P,j} = \sigma^2 I$ and $H_0$ holds, the Goodall test statistic is
\[ F_G = \frac{n_1 + n_2 - 2}{n_1^{-1} + n_2^{-1}}\, \frac{d_P^2(\bar x_{P,1}, \bar x_{P,2})}{\sum_{i=1}^{n_1} d_P^2(x_{P,1i}, \bar x_{P,1}) + \sum_{i=1}^{n_2} d_P^2(x_{P,2i}, \bar x_{P,2})} \sim F_{s,\,(n_1+n_2-2)s}, \]
which is the special case of Hotelling $T^2$ under isotropy.

Statistical inference in shape analysis
Paired Multivariate Inference

Definition (Paired tests)
Let $\mathrm{Vec}(X_{P,ji})$, $j = 1, 2$, $i = 1, 2, \dots, n$, be a random sample from a population with vectorized Procrustes mean shapes $\mathrm{Vec}(\mu_{P,j})$ estimated by $\bar x_{P,j}$ and covariance matrices $\Sigma_{P,j}$ estimated by $S_{X,j}$. Let $\mathrm{Vec}(X_{P,ji}) \sim N_{dk}(\mathrm{Vec}(\mu_{P,j}), \Sigma_{P,j})$, $j = 1, 2$, $i = 1, \dots, n$. Let $\mathrm{Vec}(X_{D,i}) = \mathrm{Vec}(X_{P,1i} - X_{P,2i})$, $i = 1, 2, \dots, n$, be a random sample of the coordinate differences of one object whose coordinates were measured twice; then $\mathrm{Vec}(X_{D,i}) \sim N_{dk}(\mathrm{Vec}(\mu_D), \Sigma_D)$. The estimates of the parameters are $\bar x_D$ and $S_D$. The null hypothesis states that the Procrustes mean shape $\mu_D$ is equal to the Procrustes mean shape $\mu_0$, so $H_0: \mu_D = \mu_0$, $H_1: \mu_D \ne \mu_0$. If $H_0$ holds, the Hotelling $T^2$ test statistic satisfies
\[ \frac{n-s}{s}\, T^2 \sim F_{s,\,n-s}, \]
where $s = \min(dk, n-1)$, and

Definition (Paired tests; cont.)
\[ T^2 = (\bar x_D - \mathrm{Vec}(\mu_0))^T S_D^- (\bar x_D - \mathrm{Vec}(\mu_0)) = \sum_{j=1}^{s} \frac{T_{j0}^2}{\lambda_j}. \]
$T_{j0} = \gamma_j^T (\bar x_D - \mathrm{Vec}(\mu_0))$ is the $j$th PC score for the difference $(\bar x_D - \mathrm{Vec}(\mu_0))$, $j = 1, 2, \dots, s$. High values of $T_{j0}^2 / \lambda_j$ indicate the direction of high shape variability associated with $\bar x_D$ in the $j$th PC. The test statistic $T^2$ can be modified with respect to any subset of PCs as
\[ T^2 = (\bar x_D - \mathrm{Vec}(\mu_0))^T \left(S_D^{pc(j_1,\dots,j_q)}\right)^- (\bar x_D - \mathrm{Vec}(\mu_0)) = \sum_{j \in \{j_1,\dots,j_q\}} \frac{T_{j0}^2}{\lambda_j}, \]
where $S_D^{pc(j_1,\dots,j_q)}$ is the covariance matrix estimated by any $q$-subset of PCs $\{PC_{j_1}, PC_{j_2}, \dots, PC_{j_q}\}$, $q \ge 1$. If the covariance matrix is $\Sigma_D = \sigma^2 I$ and $H_0$ holds, the Goodall test statistic is
\[ F_G = n(n-1)\, \frac{d_P^2(\bar x_D, \mu_0)}{\sum_{i=1}^{n} d_P^2(x_{D,i}, \bar x_D)} \sim F_{s,\,(n-1)s}, \]
which is the special case of Hotelling $T^2$ under isotropy.

Statistical inference in shape analysis
Confidence and Tolerance Ellipsoids

Definition (Confidence and Tolerance Ellipsoids)
If $k > 1$, the generalization of the $(1-\alpha)100\%$ confidence interval (CI) for $\mu$ is the $(1-\alpha)100\%$ confidence set (CS) for $\mu$:
\[ CS = \left\{ \mu_0 : (\bar X - \mu_0)^T S^{-1} (\bar X - \mu_0) \le \frac{k(n-1)}{(n-k)\,n}\, F_{k,\,n-k}(\alpha) \right\}, \]
where $F_{k,n-k}(\alpha)$ is the critical value, i.e. the $(1-\alpha)$ quantile, of the $F_{k,n-k}$ distribution. Then $\Pr[CS \cap \{\mu\} \ne \emptyset] = 1 - \alpha$. We can calculate a realization of the $(1-\alpha)100\%$ CS. It is a confidence ellipsoid (CE) centered at $\bar x$. The directions of the ellipsoid axes are parallel to the eigenvectors $\gamma_j$ of $S$ ($\lambda_j$ are the corresponding eigenvalues). The length of each ellipsoid axis, measured from the center $\bar x$, is equal to
\[ \pm \sqrt{\lambda_j\, \frac{k(n-1)}{(n-k)\,n}\, F_{k,\,n-k}(\alpha)}, \quad j = 1, 2, \dots, k. \]
These CEs (in the one-sample, two-sample, and paired case) can be applied to (semi)landmark coordinates and to PC scores. Multiplying $F_{k,n-k}(\alpha)$ by $n$ we get the tolerance ellipsoid (TE).

Example 27: PCA and hypothesis testing

Example (HW 9)
Consider the data gorf.dat and gorm.dat from the library shapes, which contain the coordinates of $k = 8$ landmarks on the skulls of $n = 30$ female and $n = 29$ male gorillas (Gorilla gorilla). Continuation of Example 7.

9.1) Register the landmark coordinates gorf.dat and gorm.dat into a common shape space using GPA and apply the algorithm for computing the rotation into the direction of largest variability from HW 7. Use the function procGPA(...)$rotated (GPA, where the output is an array of dimension 8 x 2 x 59 of Procrustes shape coordinates). Compute the mean Procrustes coordinates for females and males, deform the coordinates of females to males and vice versa, and extrapolate 3x.

9.2) Compute the eigenvalues and eigenvectors of the covariance matrix $S_X$ of the centered Procrustes shape coordinates. Use the function eigen(). Check that all eigenvectors have unit length. Scale the eigenvalues by their sum, multiply by 100 (round to two decimal places), and sum them cumulatively. Use the functions sum() and cumsum(). Display the scores $PC_i$ vs $PC_j$, $i, j = 1, 2, 3$; $i < j$, in scatter plots (scale the ranges of all plots equally) together with 95% tolerance ellipsoids. Compute the mean Procrustes coordinates for females and males in the PC1 subspace (by back-projecting the scores into the shape space, see the slides on classical or generalized PCA) and extrapolate 3x. Compare the figures with (9.1) and interpret them using the terminology of mathematical statistics.
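The scaling step in 9.2 (eigenvalues divided by their sum, multiplied by 100, rounded to two decimals, then cumulatively summed) is plain arithmetic. The exercise asks for R's `sum()` and `cumsum()`; the sketch below mirrors the same steps in Python, with `itertools.accumulate` playing the role of `cumsum()`. The function name `percent_variance` is hypothetical and the eigenvalues are made-up numbers for illustration, not results from the gorilla data.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def percent_variance(eigenvalues: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (percent of total variance per PC, cumulative percent)."""
    total = sum(eigenvalues)
    # Scale by the sum, express in percent, round to two decimal places.
    pct = [round(100 * lam / total, 2) for lam in eigenvalues]
    # Running total of the rounded percentages (analogue of cumsum()).
    cum = [round(c, 2) for c in accumulate(pct)]
    return pct, cum


# Hypothetical eigenvalues, sorted in decreasing order.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
pct, cum = percent_variance(eigenvalues)
print(pct)  # [40.0, 30.0, 20.0, 10.0]
print(cum)  # [40.0, 70.0, 90.0, 100.0]
```

Rounding before the cumulative sum follows the order of steps in the exercise; summing the unrounded percentages first and rounding afterwards would avoid accumulating rounding error.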
Example 27: PCA and hypothesis testing

Figure: TPS deformations of males to females and, conversely, of females to males; mean Procrustes shapes (top row), estimated mean Procrustes shapes in the PC1 subspace (bottom row); extrapolated 3x
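Back-projecting a PC1 score into the shape space, as needed for the bottom row of the figure, amounts to adding a multiple of the first loading vector to the mean shape. A minimal Python sketch follows. The function names, the toy numbers, and the reading of "extrapolated 3x" as tripling the difference between the source and the target shape are assumptions made for illustration, not details confirmed by the slides.

```python
from typing import List, Sequence


def backproject_pc1(mean_vec: Sequence[float], gamma1: Sequence[float],
                    score: float) -> List[float]:
    """Shape in the PC1 subspace: mean + score * gamma1 (all flat vectors)."""
    return [m + score * g for m, g in zip(mean_vec, gamma1)]


def extrapolate(source: Sequence[float], target: Sequence[float],
                factor: float = 3.0) -> List[float]:
    """Move from source toward target, magnifying the difference by `factor`."""
    return [s + factor * (t - s) for s, t in zip(source, target)]


# Hypothetical inputs: a flattened mean shape, a unit-length PC1 loading
# vector, and the mean PC1 scores of the two groups.
mean_vec = [0.0, 1.0, 1.0, 0.0]
gamma1 = [0.5, 0.5, -0.5, -0.5]
females = backproject_pc1(mean_vec, gamma1, score=-0.2)
males = backproject_pc1(mean_vec, gamma1, score=0.2)

# Female-to-male deformation, magnified three times.
f_to_m = extrapolate(females, males, factor=3.0)
print([round(v, 3) for v in f_to_m])
```

Because both group means lie on the PC1 axis, the extrapolated shape is itself a point on that axis: here it corresponds to the score $-0.2 + 3 \cdot 0.4 = 1.0$.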