M7777 Applied Functional Data Analysis
1. Introduction
Jan Koláček (kolacek@math.muni.cz)
Dept. of Mathematics and Statistics, Faculty of Science, Masaryk University, Brno
Jan Koláček (SCI MUNI) M7777 Applied FDA Fall 2019

Course Outline and Requirements

Outline
1. Introduction
2. Basis Systems
3. Basis Smoothing
4. Smoothing Penalties
5. Constrained Smoothing
6. Exploratory Data Analysis, FPCA
7. Scalar-on-function Regression
8. Functional Data Simulation
9. Function-on-scalar Regression
10. Function-on-function Regression
11. Registration
12. Sparse FDA

Assumed Knowledge
This class will focus on applying functional data analysis techniques to real-world problems and is not intended to be mathematically technical. However, we will make use of linear algebra, and I assume a background in applied statistics at the level of M5120.

Computing Software
The course will be taught using the fda library in R. I do not assume knowledge of R, but some programming experience will be helpful. R is freely available from www.r-project.org.

Assessment
• Attendance
• Homework
• Final Project
Students are expected to work individually on homework. The project may be undertaken in small groups.

What is Functional Data?
What are the most obvious features of these data?
(Figure: sample curves plotted against Time [ms].)
• quantity
• frequency (resolution)
• similar trends
• the same domain (not necessary)
• smoothness

6 replications, 200 observations within replications
FDA involves repeated measures of the same process:
• 1 observation = 1 function
• FDA = analysis of data that are functions
(Figure: the 6 sample curves plotted against Time [ms].)

6 replications, 1401 observations within replications
Functional data is often complicated:
• not easily described by mathematical formulae
• variation between replications is even harder to describe
(Figure: the 6 replications plotted against Time [ms].)
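The fda package ships a handwriting dataset, handwrit (cursive samples of "fda"), that resembles the 1401-point curves above. A minimal sketch of the usual layout, assuming handwrit is a 1401 × 20 × 2 array (time points × replications × coordinates): storing the discrete observations with one row per time point and one column per replication makes "1 observation = 1 function" literal, since each column is one curve. Whether these slides use exactly this dataset is an assumption.

```r
library(fda)
data(handwrit)  # assumed layout: 1401 time points x 20 replications x 2 coordinates

# Keep 6 replications, as on the slides; the result is still a 3-D array
pen <- handwrit[, 1:6, ]

# First coordinate only: a 1401 x 6 matrix with one column per replication,
# so each column holds one discretely observed function
first <- pen[, , 1]
print(dim(first))

# matplot draws one curve per column of the matrix
matplot(first, type = "l", lty = 1,
        xlab = "Observation index", ylab = "First coordinate")
```

Because matplot() treats each column as a separate curve, the column-per-replication layout plots all replications in one call.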
6 replications, 1401 observations within replications, 2 dimensions
Functional data is often complex:
• often a large number of related quantities
• viewing each replication as a single observation can make the data easier to think about
(Figure: both components of the 6 replications plotted against Time [ms].)
What are these data? Let us plot one component against the other!

Measurements of the position of the nib of a pen writing "fda": 6 replications, measured at 200 hertz.
(Figure: one position component plotted against the other, Position [mm].)

Data may be measured more noisily.
(Figure: St. Johns, Precipitation against Days.)

Data may be measured more sparsely.
(Figure: Berkeley Growth Data, height against Age for samples 1 to 6.)
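Sparse, noisy observations are where a smooth representation pays off. As a minimal sketch, base R's smooth.spline() is used here as a stand-in for the basis smoothing tools covered later in the course. It fits one girl's heights from the fda dataset growth (assumed: growth$hgtf holds girls' heights in cm, one column per child, at the ages in growth$age), then evaluates the fitted curve and its first derivative on a fine age grid.

```r
library(fda)
data(growth)

# One child: heights [cm] observed at the ages in growth$age
age    <- growth$age
height <- growth$hgtf[, 1]

# Smoothing spline; the smoothing parameter is chosen by generalized cross-validation
fit <- smooth.spline(age, height)

# Evaluate the fitted curve at any age, not only at the observed ones
grid <- seq(min(age), max(age), length.out = 200)
h  <- predict(fit, grid)$y             # smoothed height curve
dh <- predict(fit, grid, deriv = 1)$y  # rate of change: growth velocity [cm/yr]

plot(grid, dh, type = "l", xlab = "Age [yrs]", ylab = "Velocity [cm/yr]")
```

The same fitted object answers both questions raised by discrete data: values between the observed ages and the rate of change of the underlying curve.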
Longitudinal Data: 6 patients
(Figure: measurements for 6 patients against Time since seroconversion.)

We may not have repeated measurements.
(Figure: curves for the years 1964, 1965 and 1966 against Time, 1 to 12.)

From discrete to functional data: intuition
• The term functional, in reference to observed data, refers to the intrinsic structure of the data being functional, i.e. there is an underlying function that gives rise to the observed data.
• Advantages of representing the data as a smooth function:
  • allows evaluation at any time point
  • allows evaluation of rates of change of the underlying curve
  • allows registration to a common time scale
Main idea in FDA: treat the observed data functions as single entities, rather than as sequences of individual observations.

References
• Ferraty, F., Vieu, P., 2006. Nonparametric functional data analysis: theory and practice. Springer Science & Business Media.
• Ramsay, J. O., Silverman, B. W., 2005. Functional data analysis, 2nd Edition. Springer, New York.
• Ramsay, J. O., Silverman, B. W., 2007. Applied functional data analysis: methods and case studies. Springer.
• Ramsay, J. O., Wickham, H., Graves, S., Hooker, G., 2019. fda: Functional Data Analysis. R package version 2.4.8. https://CRAN.R-project.org/package=fda
• Giles Hooker's course BTRY 6150. http://faculty.bscb.cornell.edu/~hooker/
• Kokoszka, P., Reimherr, M., 2017. Introduction to functional data analysis. Taylor & Francis Group.

Problems to solve
1. Install the fda package.
2. Berkeley Growth Data
• load the variable growth from the fda package
• plot the first 6 samples for boys and girls separately (see Figure 1)
• plot the first 6 samples for boys and girls in one plot (see Figure 2)
3. Canadian Weather Data
• load the variable CanadianWeather from the fda package
• plot temperatures measured in Edmonton, Halifax, Montreal and Ottawa (see Figure 3)
• plot precipitation observed in Edmonton, Halifax, Montreal and Ottawa (see Figure 4)
• plot temperatures for all places by region, both in one plot and separately (see Figures 5 and 6)
4. (optional) Plot other data from this presentation.

Figure 1. Berkeley Growth Data: first 6 samples, boys and girls in separate panels, height against Age [yrs].
Figure 2. Berkeley Growth Data: boys and girls in one plot, colored by Sex, height against Age [yrs].
Figure 3.
Figure 4. Canadian Weather Data: one curve per Place (Edmonton, Halifax, Montreal, Ottawa) against Days.
Figure 6. Canadian Weather Data: separate panels for the regions Arctic, Atlantic, Continental and Pacific, against Days.
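A possible starting point for problems 1 to 3, assuming the documented structure of the fda datasets: growth$hgtm (boys) and growth$hgtf (girls) are height matrices with one row per age in growth$age and one column per child, and CanadianWeather$dailyAv is a 365 × 35 × 3 array (days × places × variables) whose dimnames include the place names and "Temperature.C". This is a sketch, not the reference solution; the remaining plots follow the same pattern.

```r
# install.packages("fda")   # problem 1: run once
library(fda)
data(growth)
data(CanadianWeather)

# Berkeley Growth Data: first 6 children of each sex, one column per child
boys  <- growth$hgtm[, 1:6]
girls <- growth$hgtf[, 1:6]

# Two panels side by side, one per sex
op <- par(mfrow = c(1, 2))
matplot(growth$age, boys, type = "b", pch = 20, lty = 1,
        xlab = "Age [yrs]", ylab = "Height [cm]", main = "boys")
matplot(growth$age, girls, type = "b", pch = 20, lty = 1,
        xlab = "Age [yrs]", ylab = "Height [cm]", main = "girls")
par(op)

# Canadian Weather Data: daily temperature for four places (assumed dimnames)
cities <- c("Edmonton", "Halifax", "Montreal", "Ottawa")
temp <- CanadianWeather$dailyAv[, cities, "Temperature.C"]
matplot(1:365, temp, type = "l", lty = 1,
        xlab = "Days", ylab = "Temperature [C]")
# matplot's default colors are 1:6, so the first four match the legend
legend("bottom", legend = cities, col = 1:4, lty = 1)
```

Swapping "Temperature.C" for the precipitation variable, or grouping columns by CanadianWeather$region, gives the other Canadian Weather plots.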