PV226: LaSArIS seminar
2012 © R. Ošlejšek, FI MU Brno
Communicative Images
Radek Ošlejšek
oslejsek@fi.muni.cz
Faculty of Informatics
Masaryk University, Brno
Czech Republic
Motivation
• Motivation: Looking at a photo from a holiday ten years ago, you can tell it was taken somewhere in the mountains, but which mountains, and what place? What is that tower in the background? ...
• Communicative image = an “intelligent” image that is able to discuss its content with the user in natural language.
Fig. Photo from a holiday ten years ago
Example of dialogue
S: This is a photo of a group of your friends standing in front of Pernstejn Castle. Now you can open and view the photo, or start communicating with the image.
U: What is the resolution of this picture?
S: The resolution of this picture is 8 MPix. Enter your next query, please.
U: Who is in the picture?
S: There is a group of your friends: George, John and Elisabeth. There are also four other groups of tourists.
U: Give me detailed information about this castle.
S: Pernstejn Castle is a castle located on a rock above the village of Nedvedice and the rivers
Svratka …
U: Thank you, bye.
S: Good bye!
Key concepts behind this idea
Requirements:
• Well-defined semantics suitable for automated processing
• Integration of semantics with graphical data
  – i.e. a graphical format that supports semantics
  – EXIF, SVG, ...
• Interactive communication by means of natural language
  – written questions/answers, i.e. no speech recognition and no voice synthesis
  – information filtering
• Web environment, social networks, …
• Image recognition techniques
Ontology-based annotation
• Unstructured annotation
  – Textual description, keywords, etc.
  – Adequate for some tasks, e.g. full-text search for relevant images in a huge collection
  – Insufficient for dialogue-based image investigation
• Ontology-based structured annotation
  – The ontology defines the semantics of real-world objects
  – The image annotation classifies its concrete graphical elements according to the ontology
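To make the contrast concrete: a keyword list cannot answer “Who is in the picture?”, whereas a structured annotation can, because the question maps onto an ontology class and inference finds every element classified under it. The Python sketch below is an illustration only, not the project's OWL-based implementation: a hard-coded subclass table stands in for the ontology, and a dictionary from hypothetical element ids (`g1` ... `g4`) to class names stands in for the annotation stored in the image.

```python
# Illustrative sketch of ontology-based structured annotation.
# Class names and element ids are hypothetical.

# Shared knowledge (the "ontology"): each class mapped to its superclass.
SUBCLASS_OF = {
    "Friend": "Person",
    "Tourist": "Person",
    "Person": "Object",
    "Castle": "Building",
    "Building": "Object",
}

# Annotation data stored with the image: graphical element id -> class.
ANNOTATION = {
    "g1": "Friend",
    "g2": "Friend",
    "g3": "Tourist",
    "g4": "Castle",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a transitive subclass of it."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def elements_of(ancestor):
    """Ids of annotated elements whose class falls under `ancestor`."""
    return sorted(e for e, c in ANNOTATION.items() if is_a(c, ancestor))


# "Who is in the picture?" becomes a query for instances of Person.
print(elements_of("Person"))    # ['g1', 'g2', 'g3']
print(elements_of("Building"))  # ['g4']
```

In a real OWL setting the subclass walk would be done by a reasoner, which can also handle property-based and multiple inheritance that this dictionary cannot express.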
OWL – Web Ontology Language
• Classes, properties and individuals
• Shared knowledge stored in the ontology vs. annotation data stored in the image
• Problem of abstraction: e.g. should objects be classified by dangerousness or by species?
• Problem of granularity and accuracy of semantic data, e.g. the same airplane annotated as:
  – an Object with description "Boeing 747 of Korean airlines that carried us to Seoul",
  – an Airplane with type set to "Boeing 747" and description "Airplane of Korean airlines that carried us to Seoul",
  – an Airplane with type set to "Boeing 747", airlines set to "Korean" and description "The airplane that carried us to Seoul"...
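The practical consequence of granularity can be shown with the three airplane variants above. In this hedged Python sketch the annotations are plain dictionaries with hypothetical property names (`type`, `airlines`); a question about the airline can be answered reliably only when the fact is stored as a property rather than buried in free text.

```python
# Three annotations of the same airplane at increasing granularity.
# Property names ("type", "airlines") are hypothetical.
VARIANTS = [
    {"class": "Object",
     "description": "Boeing 747 of Korean airlines that carried us to Seoul"},
    {"class": "Airplane", "type": "Boeing 747",
     "description": "Airplane of Korean airlines that carried us to Seoul"},
    {"class": "Airplane", "type": "Boeing 747", "airlines": "Korean",
     "description": "The airplane that carried us to Seoul"},
]


def answer(annotation, prop):
    """Answer a question about `prop` only if it is stored as structured data."""
    if prop in annotation:
        return annotation[prop]
    # The fact may be hidden in the free-text description, but it cannot
    # be queried reliably, so the system can only echo the description.
    return "unknown (description: " + annotation["description"] + ")"


for variant in VARIANTS:
    print(answer(variant, "airlines"))
# Only the third variant yields "Korean"; the first two print "unknown (...)".
```

Finer granularity makes more questions answerable, at the price of the laborious annotation mentioned on the next slide.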
OWL Features
• OWL brings a mathematical formalism with automatic inference
• Structured knowledge prevents chaos in terminology
• Shared multilingual knowledge
• Choice of a suitable abstraction for the ontology
• Building and extending the ontology
• Laborious annotation process
SVG and OWL Integration
Fig. SVG fragment: scene graph definition, followed by the classification of its elements (listing not reproduced)
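The slide's SVG fragment is not reproduced here. As a hedged illustration of the general idea, the Python sketch below assumes the common convention of embedding RDF in the SVG `<metadata>` element and linking each RDF description to a drawable element through `rdf:about="#id"`. The `ex:` namespace URI, the class names and the element ids are hypothetical, and the project's actual SVG/OWL encoding may differ.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/ontology#"  # hypothetical ontology namespace

# Scene graph (path, ellipse) plus classification (RDF in <metadata>).
DOC = f"""<svg xmlns="{SVG}" xmlns:rdf="{RDF}" xmlns:ex="{EX}">
  <metadata>
    <rdf:RDF>
      <ex:Castle rdf:about="#castle1"/>
      <ex:Person rdf:about="#person1"><ex:name>George</ex:name></ex:Person>
    </rdf:RDF>
  </metadata>
  <path id="castle1" d="M10 10 L50 10 L50 50 Z"/>
  <ellipse id="person1" cx="80" cy="60" rx="5" ry="15"/>
</svg>"""


def classify(svg_text):
    """Map element id -> (ontology class, SVG element type)."""
    root = ET.fromstring(svg_text)
    # Index drawable elements by their plain SVG id attribute.
    shapes = {el.get("id"): el for el in root.iter() if el.get("id")}
    result = {}
    for rdf_root in root.iter(f"{{{RDF}}}RDF"):
        for desc in rdf_root:  # each child describes one classified element
            elem_id = desc.get(f"{{{RDF}}}about", "").lstrip("#")
            cls = desc.tag.split("}")[-1]  # local name of the ontology class
            shape = shapes.get(elem_id)
            shape_tag = shape.tag.split("}")[-1] if shape is not None else None
            result[elem_id] = (cls, shape_tag)
    return result


print(classify(DOC))
# {'castle1': ('Castle', 'path'), 'person1': ('Person', 'ellipse')}
```

Keeping the classification inside the image file means the semantics travel with the picture, while the shared class definitions stay in the external ontology.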
Graphical Ontology
• Handles common visual characteristics.
• Prescribed properties are based on the principles of 3D image synthesis.
Navigational Ontology
• Integrated into the Graphical Ontology.
• Navigational backbone based on the Recursive Navigation Grid.
• Absolute and relative locations with inference.
• Location: fuzzy description, points, silhouettes.
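The Recursive Navigation Grid is only named on this slide. The Python sketch below assumes one plausible reading: the picture is split into 3×3 sectors numbered 1 to 9 row by row from the top-left corner (as on a telephone keypad), and each sector can be split again in the same way. The numbering, the function name `sector_path` and the normalised coordinates are assumptions made for illustration.

```python
def sector_path(x, y, depth):
    """Nested sector numbers of point (x, y) in a recursive 3x3 grid.

    Assumed conventions: (0, 0) is the top-left corner of the picture,
    (1, 1) the bottom-right one, and sectors are numbered 1-9 row by row.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("coordinates must be normalised to [0, 1]")
    path = []
    for _ in range(depth):
        col = min(int(x * 3), 2)  # 0, 1, 2 from left to right
        row = min(int(y * 3), 2)  # 0, 1, 2 from top to bottom
        path.append(row * 3 + col + 1)
        # Zoom into the chosen sector and repeat at the next level.
        x = x * 3 - col
        y = y * 3 - row
    return path


print(sector_path(0.95, 0.95, 1))  # [9]
print(sector_path(0.05, 0.95, 1))  # [7]
print(sector_path(0.5, 0.5, 2))    # [5, 5]
```

Under this assumed numbering, `[9]` would correspond to the “sector 9” used in the Painting by dialogue example, and a fuzzy description such as “bottom left corner” could be mapped onto sector 7.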
Domain-specific Ontologies
• Family: handles family relationships – useful for family photo albums.
• Sights: handles important places of interest.
• GoF: handles “Gang of Four” design patterns – a pilot e-learning application (under construction).
Dialogue Systems
• Communication modes
  – Information retrieval mode
  – Image information supplementing mode
  – Free communication mode
• Communication analysis
  – A small, domain-specific fragment of natural language
  – Relatively simple grammars
  – Frames technology
  – Standard techniques for resolving misunderstandings
  – Example: WWL, What-Where Language
    Question: How far is it from this hotel to the nearest beach?
    Frame:    How far is it from ___ to ___?
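A frame can be approximated by a question template whose slots are filled from the user's sentence. The Python sketch below uses a regular expression for the WWL distance example; the frame name and slot names (`origin`, `target`) are hypothetical, and a real system would also need a grammar covering paraphrases and a step that resolves referents such as “this hotel” against the image annotation.

```python
import re

# Hypothetical WWL-style frame: a question template with two slots.
DISTANCE_FRAME = re.compile(
    r"^How far is it from (?P<origin>.+?) to (?P<target>.+?)\?$",
    re.IGNORECASE,
)


def fill_frame(question):
    """Return the filled slots of the distance frame, or None if it does not match."""
    match = DISTANCE_FRAME.match(question.strip())
    if match is None:
        return None
    return {"frame": "distance", **match.groupdict()}


print(fill_frame("How far is it from this hotel to the nearest beach?"))
# {'frame': 'distance', 'origin': 'this hotel', 'target': 'the nearest beach'}
```

A question that matches no frame returns `None`, which is where the standard misunderstanding-handling techniques (asking the user to rephrase, offering alternatives) would take over.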
GATE system – server side
Client – an idea
• The project focuses on web technologies and on direct interaction with images on web pages.
Dialogue plug-in for web browsers:
  – handles the initial interaction, e.g. clicking on a picture
  – handles the dialogue window
  – communicates with the server
Server side:
  – JavaEE, EJB web services
  – stores the knowledge base
  – applies auto-detection and image recognition algorithms
Challenges
• Creating domain-specific ontologies
  – Manually, e.g. for e-learning – laborious and exhausting
  – Dynamically from dialogue – correctness, abstraction
• Dialogue definition
  – Manually create a grammar from the ontology and then create frames
  – Automatically generate dialogues from ontologies
    • User's behaviour formally modelled by ontology/logics
• Information gathering
  – Manually, i.e. semantic data are provided by an annotator and are fixed
  – Learning from dialogues
    • Direct: “I probably depict mountains. Confirm it, please.”
    • Indirect: e.g. the user's question “Who is the lady next to the car?” tells the image that there is a lady and a car in it.
• Getting users involved in using communicative images
  – Specialized applications, e.g. e-learning
  – Integration into social networks, ...
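Indirect learning can be sketched as follows: objects presupposed by a user's question are recorded as unconfirmed candidates, and direct learning later confirms them. In this Python illustration, simple word matching against a hypothetical vocabulary (`VOCABULARY`) stands in for real language analysis, and the class `ImageKnowledge` is invented for the example.

```python
import re

# Hypothetical vocabulary derived from the ontology: word -> class.
VOCABULARY = {"lady": "Woman", "car": "Car", "castle": "Castle", "tower": "Tower"}


def implied_objects(question):
    """Ontology classes presupposed by the words of a user's question."""
    words = re.findall(r"[a-z]+", question.lower())
    return {VOCABULARY[w] for w in words if w in VOCABULARY}


class ImageKnowledge:
    """Semantic data of one image, split into confirmed and candidate facts."""

    def __init__(self):
        self.confirmed = set()
        self.candidates = set()

    def learn_from_question(self, question):
        # Indirect learning: presupposed objects become unconfirmed candidates.
        self.candidates |= implied_objects(question) - self.confirmed

    def confirm(self, cls):
        # Direct learning: the user confirms a candidate ("Confirm it, please.").
        self.candidates.discard(cls)
        self.confirmed.add(cls)


kb = ImageKnowledge()
kb.learn_from_question("Who is the lady next to the car")
print(sorted(kb.candidates))  # ['Car', 'Woman']
kb.confirm("Car")
print(sorted(kb.candidates), sorted(kb.confirmed))  # ['Woman'] ['Car']
```

Keeping learned facts separate until confirmation addresses the correctness concern listed above: a misleading question cannot silently corrupt the image's knowledge.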
Possible applications
• Personal photo albums
  – Organize, search
• E-learning study materials linked to a domain-specific knowledge base
• Applications for people with special needs
  – Visually impaired people
  – Older adults
Fig. Antioxidant resveratrol
S: The picture shows the chemical structure of the antioxidant resveratrol.
U: What is antioxidant resveratrol?
S: Antioxidant resveratrol consists of two benzene nuclei and three hydroxyl groups.
U: What is a benzene nucleus?
S: Benzene nucleus …
/The system uses definitions from the chemical ontology/
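The answer about resveratrol can be generated from part-of knowledge in the ontology. The Python sketch below hard-codes a hypothetical fragment (`PARTS`) holding the counts given in the dialogue; the slide elides the definition of benzene nucleus, so the sketch does not supply one and falls back to an apology.

```python
# Hypothetical fragment of a chemical ontology:
# concept -> list of (count, singular part name, plural part name).
PARTS = {
    "antioxidant resveratrol": [
        (2, "benzene nucleus", "benzene nuclei"),
        (3, "hydroxyl group", "hydroxyl groups"),
    ],
}
NUMBER_WORDS = {1: "one", 2: "two", 3: "three"}


def what_is(concept):
    """Answer "What is <concept>?" from the part-of relations above."""
    parts = PARTS.get(concept.lower())
    if not parts:
        return "Sorry, I have no definition of " + concept + "."
    phrases = [
        NUMBER_WORDS.get(n, str(n)) + " " + (singular if n == 1 else plural)
        for n, singular, plural in parts
    ]
    if len(phrases) == 1:
        listed = phrases[0]
    else:
        listed = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return concept[0].upper() + concept[1:] + " consists of " + listed + "."


print(what_is("antioxidant resveratrol"))
# Antioxidant resveratrol consists of two benzene nuclei and three hydroxyl groups.
```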
Implemented services
• WWL investigation of annotated pictures
  – Web services for investigating graphical content by means of the What-Where Language
  – http://andromeda.fi.muni.cz/gate/picture-viewer
Implemented services (cont.)
• Painting by dialogue
  – Web services for requesting objects from a database and placing them at the desired position in the target picture
  – http://andromeda.fi.muni.cz/gate/picture-generator
U: Put a comet in the sector 9.
U: Put a snowman into the bottom left corner.
U: Write the text “Merry Christmas and Happy New Year” into the horizontal center, color yellow.
U: Write the text “PF 2010” into the bottom right corner, color blue.
U: Set background to snowflakes.
U: Generate.
Fig. The Christmas card generated by a blind user
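The “Put …” commands above combine an object name with a location. A minimal Python sketch of parsing them into (object, sector) pairs follows, assuming a keypad-style 3×3 numbering of the picture (1 = top left, 9 = bottom right); the regular expression and the `NAMED_LOCATIONS` table are hypothetical and cover only the phrasings shown.

```python
import re

# Assumed keypad-style numbering of a 3x3 grid (1 = top left, 9 = bottom right).
NAMED_LOCATIONS = {
    "top left corner": 1, "top right corner": 3, "center": 5,
    "bottom left corner": 7, "bottom right corner": 9,
}

# Hypothetical pattern for commands such as "Put a snowman into the ...".
PUT = re.compile(r"^put an? (?P<obj>\w+) (?:in|into) the (?P<where>.+?)\.?$",
                 re.IGNORECASE)


def parse_put(command):
    """Turn a 'Put ...' command into (object, sector), or None if not understood."""
    m = PUT.match(command.strip())
    if m is None:
        return None
    obj = m.group("obj").lower()
    where = m.group("where").lower()
    numbered = re.fullmatch(r"sector ([1-9])", where)
    if numbered:
        return obj, int(numbered.group(1))
    if where in NAMED_LOCATIONS:
        return obj, NAMED_LOCATIONS[where]
    return None  # unknown location: the dialogue should ask the user again


print(parse_put("Put a comet in the sector 9."))                # ('comet', 9)
print(parse_put("Put a snowman into the bottom left corner."))  # ('snowman', 7)
```

Commands that fix a location in words only, such as “bottom left corner”, let a user who cannot see the canvas compose the picture, which is the point of the blind-user example.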
Thank you for your attention!