PV226: LaSArIS seminar 2012 © R. Ošlejšek, FI MU Brno

Communicative Images

Radek Ošlejšek
oslejsek@fi.muni.cz
Faculty of Informatics, Masaryk University, Brno, Czech Republic


Motivation

- Looking at a photo from a holiday ten years ago: it is apparently somewhere in the mountains, but which mountains and what place? What is that tower in the background? ...
- Communicative image = an "intelligent" image that is able to discuss its content with the user in natural language.

Fig. Photo from a holiday ten years ago


Example of dialogue

S: This is a photo of a group of your friends standing in front of the Pernstejn castle. Now you can open and view the photo or start communicating with the image.
U: What is the resolution of this picture?
S: The resolution of this picture is 8 MPix. Enter your next query, please.
U: Who is in the picture?
S: There is a group of your friends: George, John and Elisabeth. There are also four other groups of tourists.
U: Give me detailed information about this castle.
S: Pernstejn Castle is a castle located on a rock above the village of Nedvedice and the rivers Svratka …
U: Thank you, bye.
S: Good bye!


Key concepts behind this idea

Requirements:
- Well-defined semantics suitable for automated processing
- Integration of semantics with graphical data
  - i.e. a graphical format supporting semantics
  - EXIF, SVG, ...
- Interactive communication by means of natural language
  - Written questions/answers, i.e. no speech recognition, no voice synthesis
  - Information filtering
- Web environment, social networks, …
- Image recognition techniques


Ontology-based annotation

- Unstructured annotation
  - Textual description, keywords, etc.
  - Adequate for some tasks, e.g.
    full-text search for relevant images in a huge collection
  - Insufficient for dialogue-based image investigation
- Ontology-based structured annotation
  - The ontology defines the semantics of real objects
  - An image classifies concrete graphical elements in the ontology


OWL (Web Ontology Language)

- Classes, properties and individuals
- Shared knowledge stored in the ontology vs. annotation data stored in the image
- Problem of abstraction: dangerousness vs. species
- Problem of granularity and accuracy of semantic data:
  - an Object with description "Boeing 747 of Korean airlines that carried us to Seoul",
  - an Airplane with type set to "Boeing 747" and description "Airplane of Korean airlines that carried us to Seoul",
  - an Airplane with type set to "Boeing 747", airlines set to "Korean" and description "The airplane that carried us to Seoul"...


OWL Features

- OWL brings mathematical formalism with automatic inference
- Structured knowledge prevents chaos in terminology
- Shared multilingual knowledge
- Choice of a suitable abstraction of the ontology
- Building and extending the ontology
- Laborious annotation process


SVG and OWL Integration

SVG fragment:
  ... scene graph definition continues here ...
  ... classification continues here ...


Graphical Ontology

- Handles common visual characteristics.
- Prescribed properties are based on the principles of 3D image synthesis.


Navigational Ontology

- Integrated into the Graphical Ontology.
- Navigational backbone based on the Recursive Navigation Grid.
- Absolute and relative locations with inference.
- Location: fuzzy description, points, silhouettes.


Domain-specific Ontologies

- Family: handles family relationships; useful for family photo albums.
- Sights: handles important places of interest.
- GoF: handles "Gang of Four" design patterns; a pilot e-learning application (under construction).


Dialogue Systems

- Communication modes
  - Information retrieval mode
  - Image information supplementing mode
  - Free communication mode
- Communication analysis
  - Small, domain-specific fragment of natural language
  - Relatively simple grammars
  - Frames technology
  - Standard techniques for resolving misunderstandings
  - Example: WWL, the What-Where Language
      How far is it from this hotel to the nearest beach?
      How far is it from ... to ...?


GATE system: server side


Client: an idea

- The project focuses on web technologies and direct interaction with images on web pages.

Dialogue plug-in for web browsers:
- handles the initial interaction, e.g. clicking on a picture
- handles the dialogue window
- communicates with the server

Server side:
- JavaEE, EJB web services
- stores the knowledge base
- applies auto-detection and image recognition algorithms


Challenges

- Creating domain-specific ontologies
  - Manually, e.g. for e-learning: laborious and exhausting
  - Dynamically from dialogue: correctness, abstraction
- Dialogue definition
  - Manually create a grammar from the ontology and then create frames
  - Automatically generate dialogues from ontologies
    - User's behaviour formally modelled by ontology/logic
- Information gathering
  - Manually, i.e. semantic data are provided by the annotator and are fixed
  - Learning from dialogues
    - Direct: "I probably depict mountains. Confirm it, please."
    - Indirect: e.g.
      the user question "Who is the lady next to the car?" tells the image that there is a lady and a car in the image.
- Getting users involved in using communicative images
  - Specialized applications, e.g. e-learning
  - Integration into social networks, ...


Possible applications

- Personal photo albums
  - Organize, search
- E-learning study materials linked to a domain-specific knowledge base
- Applications for people with special needs
  - Visually impaired people
  - Older adults

Fig. Antioxidant resveratrol

S: In the picture there is the chemical structure of antioxidant resveratrol.
U: What is antioxidant resveratrol?
S: Antioxidant resveratrol consists of two benzene nuclei and three hydroxyl groups.
U: What is a benzene nucleus?
S: Benzene nucleus …
(The system uses the definition from the chemical ontology.)


Implemented services

- WWL investigation of annotated pictures
  - Web services for investigating graphical content by means of the What-Where Language
  - http://andromeda.fi.muni.cz/gate/picture-viewer


Implemented services (cont.)

- Painting by dialogue
  - Web services for requesting objects from a database and placing them at a desired position in the target picture
  - http://andromeda.fi.muni.cz/gate/picture-generator

U: Put a comet in the sector 9.
U: Put a snowman into the bottom left corner.
U: Write the text "Merry Christmas and Happy New Year" into the horizontal center, color yellow.
U: Write the text "PF 2010" into the bottom right corner, color blue.
U: Set background to snowflakes.
U: Generate.

Fig. The Christmas card generated by a blind user


Thank you for your attention!
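

Backup: painting-by-dialogue commands as frames (sketch)

The painting-by-dialogue commands shown earlier use a small, fixed fragment of language, so they can plausibly be handled by filling slots in a frame. The following is a minimal sketch of that idea only. The regular expressions, the function name `parse_command`, and the slot names (`action`, `object`, `location`, `background`) are hypothetical and chosen for illustration; they are not taken from the GATE system, and only three of the command shapes from the slide are covered.

```python
import re
from typing import Optional

# Hypothetical pattern for "Put <object> in/into <location>." commands.
PUT_RE = re.compile(
    r"^put (?:a |an |the )?(?P<object>.+?) (?:in|into) (?:the )?(?P<location>.+?)\.?$",
    re.IGNORECASE,
)

# Hypothetical pattern for "Set background to <background>." commands.
BACKGROUND_RE = re.compile(
    r"^set background to (?P<background>.+?)\.?$",
    re.IGNORECASE,
)


def parse_command(utterance: str) -> Optional[dict]:
    """Fill a small frame (a dict of slots) from one user utterance.

    Returns None when no pattern matches; a dialogue system would
    typically respond with a clarifying question in that case.
    """
    text = utterance.strip()

    match = PUT_RE.match(text)
    if match:
        return {
            "action": "put",
            "object": match.group("object"),
            "location": match.group("location"),
        }

    match = BACKGROUND_RE.match(text)
    if match:
        return {
            "action": "set_background",
            "background": match.group("background"),
        }

    if text.rstrip(".").lower() == "generate":
        return {"action": "generate"}

    return None
```

With these assumed patterns, "Put a snowman into the bottom left corner." fills the frame with object "snowman" and location "bottom left corner". The "Write the text ..." commands are deliberately left unhandled and return None, which marks the point where misunderstanding-resolution techniques would take over.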