PV182 Human Computer Interaction
Lecture 5: Evaluating Interfaces
Fotis Liarokapis
liarokap@fi.muni.cz
15th October 2018

Evaluation Methods

Importance
• Tied to the usability engineering lifecycle
  [Diagram: design → implementation → evaluation cycle]
• Pre-design
  – Investing in a new, expensive system requires proof of viability
• Initial design stages
  – Develop and evaluate initial design ideas with the user

Importance .
• Iterative design
  – Does system behavior match the user's task requirements?
  – Are there specific problems with the design?
  – What solutions work?
• Acceptance testing
  – Verify that the system meets expected user performance criteria
  – Example criterion: 80% of first-time customers will take 1-3 minutes to withdraw $50 from the automatic teller

Overview
• Evaluation tests the usability, functionality and acceptability of an interactive system
• Evaluation may take place
  – In the laboratory
  – In the field
• Some approaches are based on expert evaluation
  – Analytic methods
  – Review methods
  – Model-based methods
• Some approaches involve users
  – Experimental methods
  – Observational methods
  – Query methods
• An evaluation method must be chosen carefully and must be suitable for the job

Naturalistic Approach
• Observation occurs in a realistic setting
  – Real life
• Problems
  – Hard to arrange and carry out
  – Time consuming
  – May not generalize

Experimental Approach
• Experimenter controls all environmental factors
  – Study relations by manipulating independent variables
  – Observe the effect on one or more dependent variables
  – Nothing else changes
• Example hypothesis: there is no difference in user performance (time and error rate) when selecting an item from a pull-down or a pull-right menu of 4 items
  [Figure: a pull-down and a pull-right menu, each showing File, Edit, View, Insert with items New, Open, Close, Save]

Validity
• External validity
  – Confidence that results apply to real situations
  – Usually good in natural settings
• Internal validity
  – Confidence in our explanation of experimental results
  – Usually good in experimental settings
• Trade-off: natural vs experimental
  – Precision and direct control over experimental design versus
  – Desire for maximum generalizability in real-life situations

Usability Engineering Approach
• Observe people using systems in simulated settings
  – People brought into an artificial setting that simulates aspects of the real-world setting
  – People given specific tasks to do
  – Observations / measures made as people do their tasks
  – Look for problem areas / successes
  – Good for uncovering 'big effects'

Usability Engineering Approach .
• Is the test result relevant to the usability of real products in real use outside the lab?
• Problems
  – Non-typical users tested
  – Non-typical tasks
  – Different physical environment
  – Different social context
    • Motivation towards the experimenter vs motivation towards the boss
• Partial solution
  – Use real users
  – Use task-centered system design tasks
  – Make the environment similar to the real situation

Usability Engineering Approach ..
• How many users should you observe?
  – Observing many users is expensive
  – But individual differences matter
    • Best user 10x faster than slowest
    • Best 25% of users ~2x faster than slowest 25%
• Partial solution
  – Reasonable number of users tested
  – Reasonable range of users
  – Big problems usually detected with a handful of users (see the sketch below)
  – Small problems / fine measures need many users
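The claim that a handful of users detects the big problems can be made concrete. As a hedged illustration (not part of the original slides), the widely cited Nielsen–Landauer model estimates the proportion of usability problems found by n independent users as 1 − (1 − λ)^n, where λ is the probability that a single user hits a given problem (≈0.31 is the commonly quoted average). A minimal Python sketch:

```python
def proportion_found(n_users: int, discovery_rate: float = 0.31) -> float:
    """Nielsen-Landauer estimate of the share of usability problems
    found by n independent users, assuming each user uncovers a given
    problem with probability `discovery_rate` (0.31 is the commonly
    quoted average; real values vary by system and task)."""
    return 1.0 - (1.0 - discovery_rate) ** n_users

if __name__ == "__main__":
    for n in (1, 3, 5, 10, 15):
        print(f"{n:2d} users -> ~{proportion_found(n):.0%} of problems found")
```

Under this model five users already expose roughly 84% of the problems, matching the slide's point: big problems surface with a handful of users, while fine measures and rare problems need many more.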
Discount Usability Evaluation
• Low-cost methods to gather usability problems
  – Approximate: capture most large and many minor problems
• Qualitative
  – Observe user interactions
  – Gather user explanations and opinions
  – Produces a description, usually in non-numeric terms
  – Anecdotes, transcripts, problem areas, critical incidents…
• Quantitative
  – Count, log, measure something of interest in user actions (see the logging sketch below)
  – Speed, error rate, counts of activities, etc.

Discount Usability Evaluation .
• Methods
  – Inspection / cognitive walkthrough
  – Extracting the conceptual model
  – Direct observation
    • Think-aloud
    • Constructive interaction
  – Query techniques
    • Interviews and questionnaires
  – Continuous evaluation
    • User feedback and field studies

Inspection
• Designer tries the system (or prototype)
  – Does the system "feel right"?
• Benefits
  – Can catch some major problems in early versions
• Problems
  – Not reliable, as it is completely subjective
  – Not valid, as the introspector is a non-typical user
  – Intuitions and introspection are often wrong
• Inspection methods help
  – Task-centered walkthroughs
  – Heuristic evaluation

Cognitive Walkthrough
• Given:
  – A specification of the system (not necessarily complete, but fairly detailed)
  – A description of the task the user is to perform on the system (representative of most users…)
  – A complete, written list of the actions needed to complete the task
  – An indication of who the users are and what kind of experience and knowledge the evaluators can assume about them

Cognitive Walkthrough .
• Step through the action sequence and critique the system using questions:
  – Is the effect of the action the same as the user's goal at that point?
  – Will users see that the action is available?
  – Once users have found the correct action, will they know it is the one they need?
  – After the action is taken, will users understand the feedback they get?

Conceptual Model Extraction
• How?
  – Show the user static images of
    • The prototype or screens during use
  – Ask the user to explain
    • The function of each screen element
    • How they would perform a particular task
• What?
  – Initial conceptual model
    • How a person perceives a screen the very first time it is viewed
  – Formative conceptual model
    • How a person perceives a screen after it has been used for a while
• Value?
  – Good for eliciting people's understanding before & after use
  – Poor for examining system exploration and learning
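Before moving on to observation methods, the quantitative side of discount usability evaluation ("count, log, measure") is easy to illustrate. The sketch below is a hypothetical logger, not a tool from the lecture: it timestamps coded events so that task time and error rate can be derived afterwards.

```python
import time

class SessionLog:
    """Minimal usability-session logger (hypothetical API): timestamps
    each coded event so speed and error rate fall out afterwards."""

    def __init__(self):
        self.events = []                     # (timestamp, kind, detail)

    def record(self, kind, detail=""):
        self.events.append((time.time(), kind, detail))

    def task_time(self):
        """Seconds elapsed from the first to the last recorded event."""
        return self.events[-1][0] - self.events[0][0]

    def error_rate(self):
        """Share of recorded events coded as errors."""
        errors = sum(1 for _, kind, _ in self.events if kind == "error")
        return errors / len(self.events) if self.events else 0.0

# Usage: the observer (or an instrumented UI) records events as they happen.
log = SessionLog()
log.record("start", "open menu")
log.record("error", "missed menu item")
log.record("action", "selected Save")
print(f"task time: {log.task_time():.1f}s, error rate: {log.error_rate():.0%}")
```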
Direct Observations
• Evaluator observes users interacting with the system
  – In the lab:
    • User asked to complete a set of pre-determined tasks
  – In the field:
    • User goes through normal duties
• Value
  – Excellent at identifying gross design/interface problems
  – Validity depends on how controlled/contrived the situation is

Simple Observation Method
• User is given the task
• Evaluator just watches the user
• Problem
  – Does not give insight into the user's decision process or attitude

Think Aloud Method
• Users speak their thoughts while doing the task
  – What they are trying to do
  – Why they took an action
  – How they interpret what the system did
• Benefits
  – Gives insight into what the user is thinking
  – Most widely used evaluation method in industry
• Problems
  – May alter the way users do the task
  – Unnatural (awkward and uncomfortable)
  – Hard to talk if they are concentrating
  User: "Hmm, what does this do? I'll try it… Oops, now what happened?"

Constructive Interaction Method
• Two people work together on a task
  – Monitor their normal conversations
  – Removes the awkwardness of think-aloud
• Co-discovery learning
  – Use a semi-knowledgeable "coach" and a novice
  – Only the novice uses the interface
    • The novice asks questions
    • The coach responds
  – Gives insights into two user groups
  Novice: "Now, why did it do that?"  Coach: "Oh, I think you clicked on the wrong icon."

Recording Observations
• How do we record user actions for later analysis?
  – Otherwise we risk forgetting, missing, or misinterpreting events
• Paper and pencil
  – Primitive but cheap
  – Observer records events, comments, and interpretations
  – Hard to get detail (writing is slow)
  – A second observer helps…
• Audio recording
  – Good for recording think-aloud talk
  – Hard to tie into on-screen user actions
• Video recording
  – Can see and hear what a user is doing
  – One camera for the screen; a rear-view mirror is useful…
  – Initially intrusive

Coding Sheet Example
• Tracking a person's use of an editor (a digital version is sketched below)
  [Table: columns are times (09:00, 09:02, 09:05, 09:10, 09:13); rows are grouped into General actions (text editing, scrolling, image editing), Graph editing (new node, delete node, modify node) and Errors (correct error, miss error); an "x" marks each action observed at a given time]

Interviews
• Good for pursuing specific issues
  – Vary questions to suit the context
  – Probe more deeply on interesting issues as they arise
  – Good for exploratory studies via open-ended questioning
  – Often leads to specific constructive suggestions
• Problems:
  – Accounts are subjective
  – Time consuming
  – Evaluator can easily bias the interview
  – Prone to rationalization of events/thoughts by the user
  – User's reconstruction may be wrong
  – Sometimes difficult to find people!

How to Interview
• Plan a set of central questions
  – A few good questions get things started
  – Focuses the interview
  – Could be based on results of user observations
• Avoid leading questions
• Let user responses lead follow-up questions
  – Follow interesting leads vs bulldozing through the question list

Retrospective Testing Interviews
• Post-observation interview:
  – Perform an observational test
  – Create a video record of it
  – Have users view the video and comment on what they did
• Clarifies events that occurred during system use
• Excellent for grounding a post-test interview
• Avoids erroneous reconstruction
• Users often offer concrete suggestions
  Interviewer: "Do you know why you never tried that option?"  User: "I didn't see it. Why don't you make it look like a button?"
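The coding sheet above translates naturally into a small data structure. The sketch below reuses the slide's row and group labels; the marks it records are illustrative, not the slide's actual data.

```python
from collections import defaultdict

# Row and group labels taken from the coding-sheet slide; the marks
# recorded below are illustrative, not the slide's actual data.
CATEGORIES = [
    ("General actions", ["text editing", "scrolling", "image editing"]),
    ("Graph editing",   ["new node", "delete node", "modify node"]),
    ("Errors",          ["correct error", "miss error"]),
]
TIMES = ["09:00", "09:02", "09:05", "09:10", "09:13"]

marks = defaultdict(set)            # action -> times at which it was seen

def mark(time_str, action):
    """Code one observed event at the given time."""
    marks[action].add(time_str)

# The observer codes events as the session unfolds:
mark("09:00", "text editing")
mark("09:02", "scrolling")
mark("09:05", "new node")
mark("09:10", "miss error")

# Print the sheet: one row per action, one column per time slot.
for group, actions in CATEGORIES:
    print(group)
    for action in actions:
        cells = ["x" if t in marks[action] else "." for t in TIMES]
        print(f"  {action:15s} " + "  ".join(cells))
```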
Critical Incident Interviews
• People talk about incidents that stood out
  – Usually discuss extremely annoying problems with passionate feeling
  – Not representative, but important to them
  – Often raise issues not seen in lab tests
  Interviewer: "Tell me about the last big problem you had with Word."  User: "I can never get my figures in the right place. It's really annoying. I spent hours on it and I had to…"

Questionnaires and Surveys
• Questionnaires / surveys
  – Preparation is "expensive," but administration is cheap
    • Can reach a wide subject group (e.g. by mail)
  – Do not require the presence of the evaluator
  – Results can be quantified
• But
  – Only as good as the questions asked

Questionnaires and Surveys .
• How
  – Establish the purpose of the questionnaire
    • What information is sought?
    • How would you analyze the results?
    • What would you do with your analysis?
  – Do not ask questions whose answers you will not use!
  – Determine the audience you want to reach
  – Determine how you will deliver / collect the questionnaire
    • On-line for computer users
    • Web site with forms
    • Surface mail (a pre-addressed reply envelope gives a far better response)

Styles of Questions
• Open-ended questions
  – Ask for unprompted opinions
  – Good for general subjective information
    • But difficult to analyze rigorously
  Example: "Can you suggest any improvements to the interfaces?"

Styles of Questions .
• Closed questions
  – Restrict the respondent's responses by supplying alternative answers
  – Make questionnaires a chore for the respondent to fill in
  – Can be easily analyzed
  – Watch out for hard-to-interpret responses!
    • Alternative answers should be very specific
  Example: "Do you use computers at work:  O often  O sometimes  O rarely"
  vs the more specific: "In your typical work day, do you use computers:  O over 4 hrs a day  O between 2 and 4 hrs daily  O between 1 and 2 hrs daily  O less than 1 hr a day"

Styles of Questions ..
• Scalar
  – Ask the user to judge a specific statement on a numeric scale
  – Scale usually corresponds with agreement or disagreement with a statement (see the analysis sketch below)
  Example: "Characters on the computer screen are:  hard to read 1 2 3 4 5 easy to read"

Styles of Questions ...
• Multi-choice
  – Respondent is offered a choice of explicit responses
  Example: "How do you most often get help with the system? (tick one)  O on-line manual  O paper manual  O ask a colleague"
  Example: "Which types of software have you used? (tick all that apply)  O word processor  O data base  O spreadsheet  O compiler"

Styles of Questions ....
• Ranked
  – Respondent places an ordering on items in a list
  – Useful to indicate a user's preferences
  – Forced choice
  Example: "Rank the usefulness of these methods of issuing a command (1 most useful, 2 next most useful…, 0 if not used):  __2__ command line  __1__ menu selection  __3__ control key accelerator"

Styles of Questions .....
• Combining open-ended and closed questions
  – Gets a specific response, but allows room for the user's opinion
  Example: "It is easy to recover from mistakes:  disagree 1 2 3 4 5 agree  comment: undo facility is really helpful"
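Scalar and ranked questions are what make questionnaire results quantifiable. A minimal sketch, using hypothetical responses (not real data) to the "easy to read" statement from the scalar example above:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical responses to "Characters on the computer screen are:
# hard to read (1) .. easy to read (5)".
responses = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

counts = Counter(responses)
print("distribution:", {k: counts.get(k, 0) for k in range(1, 6)})
print(f"mean = {mean(responses):.2f}, median = {median(responses)}")

# Scale responses are ordinal, so the median and the full distribution
# are usually safer summaries than the mean: the steps between scale
# points may not be psychologically equal.
```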
Continuous Evaluation
• Monitor systems in actual use
  – Usually in late stages of development
    • i.e. beta releases, delivered system
  – Fix problems in the next release
• User feedback via gripe lines
  – Users can provide feedback to designers while using the system
    • Help desks
    • Bulletin boards
    • Email
    • Built-in gripe facility
  – Best combined with a trouble-shooting facility
    • Users always get a response (a solution?) to their gripes

Continuous Evaluation .
• Case/field studies
  – Careful study of "system usage" at the site
  – Good for seeing "real life" use
  – External observer monitors behavior
  – Site visits

Ethics
• Testing can be a distressing experience
  – Pressure to perform, errors inevitable
  – Feelings of inadequacy
  – Competition with other subjects
• Golden rule
  – Subjects should always be treated with respect

Ethics - Before the Test
• Don't waste the user's time
  – Use pilot tests to debug experiments, questionnaires, etc.
  – Have everything ready before the user shows up
• Make users feel comfortable
  – Emphasize that it is the system that is being tested, not the user
  – Acknowledge that the software may have problems
  – Let users know they can stop at any time
• Maintain privacy
  – Tell the user that individual test results will be completely confidential
• Inform the user
  – Explain any monitoring that is being used
  – Answer all the user's questions (but avoid bias)
• Only use volunteers
  – The user must sign an informed consent form

Ethics - During the Test
• Don't waste the user's time
  – Never have the user perform unnecessary tasks
• Make users comfortable
  – Try to give the user an early success experience
  – Keep a relaxed atmosphere in the room (coffee, breaks, etc.)
  – Hand out test tasks one at a time
  – Never indicate displeasure with the user's performance
  – Avoid disruptions
  – Stop the test if it becomes too unpleasant
• Maintain privacy
  – Do not allow the user's management to observe the test

Ethics - After the Test
• Make the users feel comfortable
  – State that the user has helped you find areas of improvement
• Inform the user
  – Answer particular questions about the experiment that could have biased the results before
• Maintain privacy
  – Never report results in a way that individual users can be identified
  – Only show videotapes outside the research group with the user's permission

What You Now Know
• Debug designs by observing how people use them
  – Quickly exposes successes and problems
  – Specific methods reveal what a person is thinking
  – But naturalistic vs laboratory evaluation is a trade-off
• Methods:
  – Conceptual model extraction
  – Direct observation
    • Think-aloud
    • Constructive interaction
  – Query via interviews, retrospective testing and questionnaires
  – Continuous evaluation via user feedback and field studies
• Ethics are important

Questions

Acknowledgements
• Prof. Ing. Jiří Sochor