PV251 Visualization
Autumn 2024
Study material

Lecture 1: Introduction to visualization

This course presents the general rules of visualization and techniques for designing and implementing visualizations. The first lecture focuses on basic definitions and on understanding the complexity of the visualization field. It then presents a brief history of visualization, its relation to other research fields, and the visualization pipeline. The last part contains basic information about human perception and its relation to visualization.

Definition

There are several possible definitions of visualization. A general one is: displaying given information using a graphical representation.

Other possible definitions:
• “Transformation of symbolic into geometric” [McCormick et al., 1987]
• “… finding the artificial memory that best supports our natural means of perception.” [Bertin, 1967]
• “The use of computer-generated, interactive, visual representations of data to amplify cognition.” [Card, Mackinlay, Shneiderman, 1999]
• “The purpose of computing is insight, not numbers” [R. Hamming, 1962]
• “…to form a mental vision, image, or picture of something not visible or present to the sight, or of an abstraction; to make visible to the mind of imagination” [Oxford Engl. Dict., 1989]
• Tool to enable a User insight into Data

The goal is to convey the given information in the most informative and intuitive way. Visualization surrounds us every day, so we mostly perceive it as something natural. How we perceive a visualization depends largely on individual experience and knowledge. Nevertheless, visualization design should follow some basic rules, which form the content of this course. We need to understand what counts as visualization and how to use it efficiently.

Why create visualizations

There are many reasons. We will focus only on those closely related to our field.
Visualization enables us to:
• Enhance the decision process
• View data in a broader context
• Interpret the data
• Present ideas and results, attract attention
• Inspire others
• Entertain, educate, …

There are three main functions of visualization:
• Information storage – recording given data (e.g., photos, images, paintings, blueprints, …)
• Analysis of information – data processing and evaluation, interaction quality evaluation
• Conveying information – sharing data between communicating parties, their mutual cooperation, highlighting important aspects of the data

Visualization is important mainly because it uses sight, one of our main senses, to understand the information being conveyed. Visualization is everywhere: on streets, in public transportation, on television, in newspapers. We can be passive observers, or we can actively search for and interact with visualizations such as maps, weather forecasts, or stock market charts. Visualization helps to improve decision processes and to understand the content and context of data more precisely, correctly, and quickly.

Examples of importance of visualization

There are many examples. One of the most typical is the following: in a classical table of individual data items, it is hard to see the trends in the data. If the table is much larger, it becomes practically impossible. But when we plot the data as a graph, the interpretation is almost instant. Moreover, growth in data size has little influence on how easily the graph can be interpreted.

Another example is the management structure of a large corporation. This information can be conveyed by a textual description, but understanding it will be slow and complicated, with many opportunities for misinterpretation. When the same information is displayed as a connected graph, orientation in the management structure of the company is easy and straightforward.
Big data nowadays

One of the main reasons for studying visualization is that the amount of data grows every year. Such an amount of data has to be processed and analysed somehow, otherwise there is no reason to generate and keep it. For illustration, 5 exabytes (1 exabyte = 10^18 bytes) of new data were generated in 2002; in 2006 it was already 161 exabytes. According to a research study from the University of Southern California, published in Science in 2011, the amount of digital data exceeded the amount of analog data for the first time in 2002. By 2007, 94% of all data on the planet was in digital form.

Goals of visualization research

The main goals are:
• To understand how a person perceives a visualization and how this relates to his or her mindset.
• To design and create principles and techniques that correspond to this understanding.

This helps us to create “efficient” visualizations that target the processes in the human brain and so increase the speed of perceiving and understanding the conveyed information.

Wrong data interpretation

A badly chosen data representation can cause the information to be perceived wrongly. Here is an example:

All four graphs show the same information; only the scales of the x and y axes change. Graph (a) uses a uniform scale on both axes, but the scale is chosen badly because it does not correspond to the range of the displayed data. The data items therefore overlap heavily, and the user cannot interpret the content of the dataset correctly. If we adapt the scale to the data range on only one axis (graphs (b) and (c)), the interpretation will be completely wrong because the graph is misleading. Finally, graph (d) shows the correct representation of this dataset, with a reasonable scale with respect to the data range on both axes. This example shows that a simple change of display parameters for the same data can lead to completely different interpretations.
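The effect of axis scale described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the 5% padding, and the sample values are hypothetical and do not come from the lecture figures. The sketch derives axis limits from the data range and measures how much of an axis the data actually occupies.

```python
def axis_limits(values, padding=0.05):
    """Return (low, high) axis limits fitted to the data range.

    The 5% padding on each side is an assumed default, not a rule from the lecture.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        span = 1.0  # constant data: fall back to an arbitrary unit-wide span
    margin = span * padding
    return lo - margin, hi + margin


def occupied_fraction(values, limits):
    """Fraction of the axis extent that is actually covered by the data."""
    lo, hi = limits
    return (max(values) - min(values)) / (hi - lo)


# Hypothetical sample: values clustered in a narrow range.
xs = [100.0, 102.0, 104.0]

poor = (0.0, 1000.0)      # scale unrelated to the data range, as in graph (a)
fitted = axis_limits(xs)  # scale derived from the data range, as in graph (d)

print(occupied_fraction(xs, poor))    # data uses well under 1% of the axis
print(occupied_fraction(xs, fitted))  # data uses roughly 90% of the axis
```

With the poorly chosen limits the three points would be drawn almost on top of each other, which is the overlap problem of graph (a). Fitting both axes to the data range spreads the points over most of the plot area.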
History of visualization

Visualization is a very old discipline, although it was established as a research discipline only a little more than 30 years ago; the first visualization conferences appeared in 1990. The first traces of visualization (based on intuition) can be dated to approximately 15,000–13,000 B.C., when the first cave paintings were created in the Lascaux cave in France.

The advantage of an image representation is that it does not need any formalization, unlike a written representation, where we must first agree on a set of rules. Visualization builds on natural human perception. The participants in visual communication do not have to set the rules beforehand; the communication is completely intuitive. Of course, this does not hold for the abstract paintings of the modern art era. ☺

Images also found their way into the first writing systems. The oldest written document is considered to be the Kish limestone tablet from Mesopotamia (3500 B.C.). One of the most famous image-based writing systems is the Egyptian hieroglyphs (3000 B.C.).

The main reasons for creating visualizations were mostly practical: travel routes, religion, communication. One piece of evidence is the Peutinger map of the Roman empire:

In 1137, the first geographic map using a Cartesian coordinate grid was created in China. Its lines represent longitude and latitude.

One of the most famous examples of the successful use of visualization is the cholera epidemic in London in 1854. John Snow created the following map, where each rectangle stands for one victim in a given house on the street. The map helped to reveal that the highest number of victims was located close to the public water pump on Broad Street. Closing the pump helped to end the epidemic, which caused the death of more than 500 people.
Details can be found in John Snow's book On the Mode of Communication of Cholera (available online at http://books.google.cz/books?id=-N0_AAAAcAAJ&printsec=frontcover&hl=cs&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false). The appendix of this book also contains the names and addresses of all victims, including a description of the progress of their disease. This confirms how important the map representation was for interpreting the data.

The story was so compelling that a movie about it was made in 2011 (http://www.imdb.com/title/tt2061801/, http://www.snowthemovie.com/crew.html). There is also another book related to this story: http://en.wikipedia.org/wiki/The_Ghost_Map. (Information from Tomáš Marek)

One of the most typical uses of visualization has been in astronomy. Observers visualized the phases of the moon or the movements of planets:

Visualization was successfully used to convey the progress of Napoleon's troops during their march on Moscow. The map shows the advance of the army towards Moscow and its losses on the way. The color represents the direction of the march, and the bottom part of the visualization contains important information about the temperature at given stages of the march. Low temperatures were in fact the main cause of death among the French soldiers, who were not prepared for such cold.

Another interesting example is the graph produced by Florence Nightingale (1820–1910), an English social reformer and statistician and the founder of modern nursing. Her graph shows mortality in the army within one year (April 1854 – May 1855), along with the causes of death, marked by colors (blue = sickness, red = injury, black = other). Nightingale based her work on the graphs designed by Playfair. The blue parts represent deaths caused by diseases, which could have been prevented by improving healthcare. The graph was presented to Queen Victoria, and Nightingale became one of the first pioneers to use visualization successfully to convince others of the necessity of change.
Visualization today

Nowadays, visualization serves mainly as a practical tool for conveying desired information. This requires different levels of abstraction in the data representation, from both a qualitative and a quantitative point of view. A typical example is the Tokyo metro network map:

Even though this map is highly abstracted, it serves its purpose well, and any additional information (e.g., highlighting of streets) would be misleading.

On the other hand, if we are planning a walking route from site A to site B, a classical map will be more suitable. Such maps, showing individual streets and their names, crossings, rivers, parks, etc., help us to understand the terrain and to plan the route correctly.

Here we should be aware that maps are a special case of visual representation with a certain degree of inaccuracy depending on their scale. This is caused by the spherical shape of the planet and its projection onto a plane. The smaller the area the map covers, the smaller the distortion.

Data can also be visualized very precisely, as in the following example:

One can argue that this cannot be considered a visualization. On the other hand, text and numbers can be taken as visual representations as well, similarly to tables and graphs, because they too represent given data. This particular “image” shows the US national debt on January 22nd, 2006.

Another very useful example of visualization is the record of heartbeats (electrocardiogram). On the left is the record of a healthy adult, on the right the record of an 83-year-old man with high blood pressure.

Nowadays, visualization is used in a variety of areas. It can show different types of objects, such as datasets, algorithms, results of computations, processes, etc.
More and more often, visualizations are interactive: the user can react to the displayed information and navigate the scene individually. This interaction is most often performed directly on the graphical interface of the application instead of through a traditional menu.

Visualization plays a crucial role in the following fields:
• Medical data (VolVis)
• Flow data (FlowVis)
• Abstract data (InfoVis)
• GIS data
• Historical data (archeology)
• Microscopic data (molecular physics)
• Macroscopic data (astronomy)
• Big data

Relationship between visualization and computer graphics

Originally, visualization was considered a subfield of computer graphics (CG), because it uses CG principles to display information. Computer graphics here serves as the communication channel.

The relationship can be viewed from the other side as well. In all types of visualization, we can find basic graphical primitives, such as points, lines, polygons, or volumes. Computer graphics focuses solely on processing these primitives, but visualization goes further: it takes into account the content of the visualized data and their properties, such as spatial position, physical properties, etc.

This leads to a definition of visualization as the application of computer graphics to data representation, in which we map data to graphical primitives and render the resulting images. On top of that, visualization integrates many other research disciplines, such as human-computer interaction, perceptual psychology, databases, statistics, data mining, machine learning, etc.

To summarize, we can claim the following: Computer graphics focuses primarily on creating interactive images and 3D objects, and its primary goal is a realistic result. Typical CG fields are art and entertainment (games, movies, advertisement, etc.). Visualization focuses less on a realistic view of the data and more on the effective communication of information.
Computer graphics and visualization share a variety of concepts, tools, and techniques, but differ in the basic model (the information to be displayed) and in particular in the goal (what the user expects as the output).

The process of visualization

The basis of designing a new visualization is to analyze the available input data and the user's expectations of the resulting visualization (output requirements analysis). The goal may be to explore the data, to confirm a hypothesis, to present a result (e.g., at a conference), etc. Interesting results are usually anomalies occurring in the data, clusters of data (defined by their similarity), or trends (predictive models).

To display the data, it is necessary to define its mapping onto the screen. One important aspect is the possibility of interactive manipulation at all stages of the process. This is especially important because the perception of a visualization and of its "quality" is subjective. There is no definition that guarantees that a rendering is "effective". It is therefore important to allow the user to influence the outcome of the process whenever possible.

CG Pipeline

The classic pipeline in computer graphics consists of the following phases:

Modeling – in the first phase a 3D model consisting of graphic primitives is created and placed in the global coordinate system.

Viewing – defines the position, direction, and orientation of the virtual camera in the global coordinate system. All vertices of the 3D model are then converted into the coordinate system given by the camera parameters.

Clipping – here the boundaries of the intended image are specified, and objects beyond these boundaries can be removed. Objects that cross the boundary can be clipped. Additionally, objects can be converted to normalized view coordinates, which greatly simplifies the clipping process.
Hidden surface removal – removing the parts (polygons) that are not visible from the viewpoint of the camera (back faces, polygons hidden behind other ones).

Projection – in the projection phase, 3D polygons are projected onto the 2D projection plane using, for example, a perspective transformation. The result is expressed in the normalized 2D coordinate system of the screen.

Rendering – the rendering phase assigns the corresponding color to each pixel, depending on the color of the polygons, their transparency, luminosity, position, etc. This can be solved, for example, by raytracing.

Data entering the visualization process can be obtained in various ways, such as CT/NMR scanning, various types of simulation (e.g., flow simulation), modeling, and other methods. This data is then processed (filtering, resampling, selecting a specific part, derivation, interpolation, ...). The data is then mapped into a viewable form, such as a geometric model. In the last phase the principles of computer graphics are used, and the result is displayed on the screen.

Visualization pipeline

The visualization pipeline is similar to the graphics pipeline, at a higher level of abstraction, but it has its own specifics. The phases are as follows:

Data modeling – preparing data (from a file, database, ...) for visualization. This means, for example, preparing the data in a format that allows quick access to it.

Data selection – data selection is similar to the clipping phase of the CG pipeline: we select the subset of the data that should be visualized. This phase can be controlled automatically, can be left fully to the user, or the two approaches can be combined.

Data to visual mappings – the most important phase is the mapping of data to graphical entities or their attributes. For example, some parts of the data can control the size of an object, while others can define its position or color.
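The data-to-visual mapping phase can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a prescribed implementation: the record keys (lon, lat, population, temperature), the screen size, the radius range, and the blue-to-red color ramp are all hypothetical choices made for this example.

```python
def normalize(value, lo, hi):
    """Scale value from the range [lo, hi] to [0, 1]; a constant range maps to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def map_records_to_glyphs(records, width=800, height=600,
                          min_radius=2.0, max_radius=20.0):
    """Map data attributes to visual attributes of circle glyphs.

    Hypothetical mapping chosen for this sketch:
      lon, lat     -> position on the screen
      population   -> radius of the circle
      temperature  -> color on a blue (low) to red (high) ramp
    """
    keys = ("lon", "lat", "population", "temperature")
    ranges = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in keys}
    glyphs = []
    for r in records:
        t = {k: normalize(r[k], *ranges[k]) for k in keys}
        glyphs.append({
            "x": t["lon"] * width,
            "y": (1.0 - t["lat"]) * height,  # screen y grows downward
            "radius": lerp(min_radius, max_radius, t["population"]),
            "color": (round(255 * t["temperature"]), 0,
                      round(255 * (1.0 - t["temperature"]))),
        })
    return glyphs


# Hypothetical input records.
records = [
    {"lon": 0, "lat": 0, "population": 100, "temperature": -10},
    {"lon": 10, "lat": 5, "population": 300, "temperature": 30},
]
glyphs = map_records_to_glyphs(records)
for g in glyphs:
    print(g)
```

The output is a list of glyph descriptions that a rendering phase could draw. Changing which attribute drives which visual channel only requires editing the dictionary built inside the loop, which is one way to give the user control over the mapping.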
The mapping phase often integrates additional pre-processing of the data that precedes the mapping itself, such as scaling, shifting, filtering, interpolation, etc.

Scene parameter settings (view transformations) – here we can set scene parameters such as the color scheme, lighting, or sound. These parameters are relatively independent of the data.

Rendering or generation of the visualization – in the final stage, the visualization itself is created. The rendering method depends on the mapping performed and may include, for example, shading or texture mapping. Most visualization techniques need nothing more than drawing lines and uniformly shaded polygons. In addition to displaying the data itself, most visualizations provide a variety of additional information that enhances the interpretation of the data, such as graphs or general annotations.

Human perception

A proper understanding of human perception is the foundation of every good visualization design. The first studies of human perception focused on the visual system and its capabilities and limitations. Further research has focused on cognition and the ability to recognize (that is, on the involvement of psychology in the whole process of visualization).

One definition of human perception: the process of interpreting the surrounding world and shaping its internal representation.

It is because of this internal representation that many inaccuracies and misinterpretations arise. They can have two origins: unintentional misperception or deliberately induced misinterpretation. The second case leads to the popular optical illusions.

Optical illusions are basically incorrect or confusing perceptions of reality. They arise from a faulty interpretation by the brain, for example when one sees something that is not in the picture at all.

The rectangle in the middle has the same shade of grey across its entire width.

When you move your head towards and away from the picture, the circles appear to rotate.

In fact, boxes A and B have the same color.
Users interact with a visualization based on what they see and how they interpret it. Therefore, a proper understanding of the vision process helps to produce better visualizations.

About half of the brain is involved in visual perception, which is mostly processed in parallel and continuously (e.g., color, texture, movement).

Approximately 8 percent of men are color blind (daltonism) or have a similar visual defect, which suggests that high-quality visualization software should allow the user to change the colors of the displayed data.

The image of the monkey on the left is an example of red-green color blindness; on the right is normal color vision.

One of the major problems that visualization faces is the limited ability of the human eye, and this must be taken into account in the visualization process. A high-quality image can be stimulating, but if it contains ambiguities, it is almost useless.

The main finding is that it is not worth mapping data values to graphical attributes that the human eye cannot properly process and quantify (unless we are deliberately visualizing an optical illusion ☺).

Perception in the context of visualization

We will now focus on the influence of color, texture, and movement on the visualization process.

Color

Color is one of the most common parts of visualization design. More sophisticated visualization methods allow the user to control the difference between the individual colors according to their subjective perception. Such control involves:
o Color balance - uniform color distribution throughout the scale used.
o Distinction - in a discrete color collection, each color is equally well distinguishable from the others (no color is "easier" or "harder" to identify).
o Flexibility – colors can be selected from anywhere in the color space (i.e., the technique is not limited to selecting green or red shades only).

There are several basic color spaces that are widely known, e.g., RGB, RGBA, CMY, CMYK, HSV (hue, saturation, value), HLS.
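As a small illustration of working with these color spaces, the following sketch uses the colorsys module from the Python standard library to convert between RGB and HSV and to build a discrete palette with evenly spaced hues. The helper names, the default saturation and value, and the palette size are assumptions made for this example. Note that even spacing in HSV hue does not guarantee equal perceptual distinguishability.

```python
import colorsys


def evenly_spaced_hues(n, saturation=0.8, value=0.9):
    """Return n RGB triples (components in 0..1) with hues spaced evenly in HSV."""
    return [colorsys.hsv_to_rgb(i / n, saturation, value) for i in range(n)]


def to_hex(rgb):
    """Format an RGB triple with components in 0..1 as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


# Pure red expressed in HSV: hue 0, full saturation, full value.
red_hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(red_hsv)

# A hypothetical four-color palette at full saturation and value.
palette = [to_hex(c) for c in evenly_spaced_hues(4, 1.0, 1.0)]
print(palette)
```

A palette generated this way is only a starting point. Whether its colors are equally easy to tell apart should be checked in a space designed around perceived differences.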
In the CIE LUV space, distances correspond to the subjectively perceived differences between shades of color. In CIE Lab, a perceived color is likewise determined by its coordinates in a 3D color space.

Let us focus on these less familiar color spaces, where Healey and Enns have shown that it is important to control color distance, linear separation, and color category.

An example of use is shown in the figure of historical climate records over the eastern part of the United States, where color represents temperature (blue and green = cold, red and pink = hot). Luminosity represents wind speed (lighter = stronger wind), orientation is mapped to precipitation (greater tilt corresponds to heavier precipitation), size indicates cloudiness (larger = more cloud cover), and frost frequency is mapped to density (denser = more frequent frost).

Texture

Texture is often regarded as a single feature of visualized data. However, as with color, it can be broken into several parts that users perceive separately. Computer vision distinguishes texture properties such as regularity, directionality, contrast, size, and roughness.

Texture can be used in many interesting and unconventional ways. One technique is to map individual data attributes to the perceptual dimensions of the texture, so that the visual appearance of the texture depends on the input data.

Examples:

Grinstein et al. visualized multidimensional data using a simple stick figure whose limbs encode the values of the attributes stored in the data elements. When these figures are placed across the entire display, they create texture patterns whose spatial arrangement, clustering, and boundaries reflect the relationships between the attributes.

The so-called Chernoff faces share a similar concept (http://graphics8.nytimes.com/images/2008/04/01/science/0401-sci-PROFILE.lg.jpg).
Ware and Knight used Gabor filters whose orientation, size, and contrast change according to three independent data attributes.

Movement

Movement is the third visual feature that can be perceived very well. Motion is used in many areas of visualization, such as particle animation, color change animation, or pictograms that display the direction and magnitude of vector fields. As with color and texture, we are interested in identifying the perceptual dimensions of motion and in using them effectively.

The following four motion features have been studied extensively in psychophysics: vibration, flickering, direction, and speed of motion. From the visualization point of view, we are interested in flickering at a frequency F that the observer perceives as discrete flashes.

Many studies have been conducted on the usefulness of motion in visualization, for example by Nakayama and Silverman, Driver et al., and many others. In general, these studies have shown that various changes in the image attract attention and improve the perception process.

Of course, the use of motion in a visualization must follow certain rules to fulfil its function. For example, changes in shape, color, and speed are used to draw the observer's attention to notable facts. The position of the animated object in the scene is also important: we perceive an object in the centre of interest differently from an object seen by peripheral vision.

Some of the studies assessed how disturbing so-called "secondary" movements in the scene are. Blinking was found to be the least disturbing, followed by oscillating motion and then the divergence of objects, while the movement of objects over long distances was the most disturbing.