Paper Example Undergraduate 5,036 words

Usability evaluation methods and applications

Last reviewed: October 9, 2011

Usability Evaluation

Concept of Usability Evaluation

Heuristic Method

Issues in Usability Evaluation

Heuristic Evaluation Dimensions

The Evaluator

User Interfaces

Usability Problem Formats

Heuristic Evaluation Process

Inspection Phase

Identifying Usability Problems

Usability Problem Preparation Phase

Aggregation Phase

Procedure of Evaluation

Participants

The Static Web Interface

Observing and Quickly Visiting the Interface

Elaborating (Problems) and Revisiting (Interface and Materials)

Navigating the Interface

Annotating the Interface

Usability Evaluation

As part of the Web development process, Web developers must evaluate the usability of Web interfaces (i.e. Web sites and applications). Typically, a combination of manual methods and automatic tools is used for an effective Web site evaluation; for example, manual inspection is needed to supplement the results of automatic validation tools (Rowan 2000). However, Web projects are strongly affected by their fast-paced life cycles, which leave little room for full evaluations. Other major factors contributing to this situation are the low budgets assigned to testing and the limited availability of usability experts.

Web developers need effective and inexpensive approaches to Web usability evaluation. Available automatic Web usability evaluation tools, such as LIFT online and LIFT onsite (UsableNet 2002) and WebXACT (WatchFire 2007), have proven useful in finding syntactic problems. These include consistency problems, broken links, whether pages contain links to the home page, and alternative descriptions of images (provided through the ALT attribute in HTML), among others (Brajnik 2000). Problems of a semantic and pragmatic nature, however, are left out by automatic evaluation tools and must be handled by other means (Farenc et al. 1996). Farenc and collaborators (Farenc et al. 1996) explored the limitations of automatic usability evaluation tools. In analyzing 230 rules for their ERGOVAL automatic usability evaluation tool for Windows systems, they found that at most 78% of the rules could be automated "whatever the implemented methods are." The remaining 22% require human input to provide information and resolve semantic and pragmatic conflicts.
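Checks like these are mechanical, which is why automatic tools handle them well. The sketch below illustrates two of them using Python's standard `html.parser` module: it flags `<img>` elements that have no `alt` attribute, and it flags pages that contain no link to the home page. It is a minimal sketch under stated assumptions, not how LIFT or WebXACT work internally. The class and function names, and the assumption that the home page is linked as `"/"`, are hypothetical. An empty `alt=""` is deliberately not flagged, because that is the usual way to mark purely decorative images.

```python
from html.parser import HTMLParser


class SyntacticChecker(HTMLParser):
    """Collects two illustrative syntactic problems while parsing one HTML page."""

    def __init__(self, home_href="/"):
        super().__init__()
        self.home_href = home_href      # hypothetical: how the home page is linked
        self.images_missing_alt = []    # src values of <img> tags with no alt attribute
        self.links_home = False         # becomes True once an <a> points to home_href

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.images_missing_alt.append(attributes.get("src", "?"))
        elif tag == "a" and attributes.get("href") == self.home_href:
            self.links_home = True


def check_page(html, home_href="/"):
    """Return human-readable descriptions of the problems found on one page."""
    checker = SyntacticChecker(home_href)
    checker.feed(html)
    checker.close()
    problems = [f"image without alternative text: {src}"
                for src in checker.images_missing_alt]
    if not checker.links_home:
        problems.append("page has no link to the home page")
    return problems


page = '<a href="/">Home</a><img src="logo.png" alt="Zen Cart"><img src="banner.png">'
print(check_page(page))  # ['image without alternative text: banner.png']
```

What such a check cannot decide is whether an existing alternative text is actually meaningful. That is exactly the semantic judgment that, according to Farenc et al., stays with humans.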

Usability problems that are not handled by automatic evaluation tools can be handled with semi-automatic and manual approaches. In semi-automatic approaches, the identification of usability problems starts with an analysis of source files and is completed through human intervention to provide information, make decisions, or confirm problems. Three manual methods are typically used to find usability problems in user interfaces (Preece, 2002): a) usability testing, in which testers observe users performing tasks and report usability problems based on their observations; b) questionnaires and interviews, in which users are asked about their experience in using a system, missing features, and overall satisfaction, among other matters; and c) inspection methods, in which experts examine user interfaces and report usability problems based on their judgment and expertise. This paper reports a usability evaluation conducted by the author.
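A semi-automatic approach can be sketched in two stages. First, an automatic pass over the source proposes candidate problems. Then a human step confirms or rejects each candidate. In this minimal sketch, the rule set (generic link text such as "click here"), the `Candidate` type, and the `confirm` callback are all hypothetical. The point is only that the tool narrows the search while the evaluator supplies the judgment, here about whether the surrounding context makes a link text meaningful.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical rule: link texts that say nothing about their destination.
GENERIC_LINK_TEXT = {"click here", "here", "more", "read more"}


@dataclass
class Candidate:
    page: str
    description: str


def automatic_pass(links: List[Tuple[str, str]]) -> List[Candidate]:
    """Automatic stage: scan (page, link text) pairs extracted from source files."""
    return [Candidate(page, f'generic link text "{text}"')
            for page, text in links
            if text.strip().lower() in GENERIC_LINK_TEXT]


def semi_automatic_review(links: List[Tuple[str, str]],
                          confirm: Callable[[Candidate], bool]) -> List[Candidate]:
    """Human stage: keep only the candidates an evaluator confirms as real problems."""
    return [c for c in automatic_pass(links) if confirm(c)]


links = [("index", "Click here"), ("cart", "Checkout"), ("faq", "more")]
# Stand-in for a human decision: the evaluator judges the "more" link on the
# FAQ page to be clear in context and rejects that candidate.
confirmed = semi_automatic_review(links, confirm=lambda c: c.page != "faq")
print([c.page for c in confirmed])  # ['index']
```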

Methodology

The first step was to characterize the inspection process in Heuristic Evaluation, in order to understand it better and to find different ways of supporting it. A laboratory user study was conducted to understand how evaluators apply Heuristic Evaluation to Web interfaces. The output of this step was a rough characterization of the process and a set of tool requirements.

Tool requirements were identified from the literature, the study findings, and experience. Evaluators in the study were found to spend time observing, annotating, and navigating the interface, as well as elaborating usability problems. Inspection tools are proposed based on these activities.

Literature Review

Concept of Usability Evaluation

The concept of usability was defined in the field of human-computer interaction (HCI), which studies the relationship between humans and computers. The International Organization for Standardization (ISO) proposed two definitions of usability, in ISO 9241 and ISO 9126. ISO 9241 defines usability as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" (ISO 9241-11, 1998). In ISO 9126, usability compliance is one of five usability subcharacteristics, together with understandability, learnability, operability, and attractiveness (ISO/IEC 9126, 2001). Usability depends on the interaction between user and task in a defined environment (Abran, Khelifi, Suryn, & Saffah, 2003; Bennett, 1984). Therefore, ISO 9126 defines usability as "the capability of the software product to be understood, learned, used and attractive to the user, when used under specified conditions" (ISO/IEC 9126, 2001). While this definition focuses on ease of use, ISO 9241 uses the term "quality in use" to describe usability more broadly (Abran, et al., 2003; Bevan, 2001) (Figure 1-1). "Quality in use" is defined as "the capability of the software product to enable specified users to achieve specified goals with effectiveness, productivity, safety, and satisfaction in specified contexts of use" (ISO/IEC 9126, 2001).

The term "quality in use" was then adopted because of weaknesses in ISO 9126, such as an unclear architecture at the detailed level of the measures, overlapping concepts, the lack of a quality requirement standard, the lack of guidance in assessing measurement results, and an ambiguous choice of measures (Abran, et al., 2003).

The usability of a technology is determined not only by its user-computer interactions, but also by the degree to which it can be successfully integrated to perform tasks in the intended work environment. Thus, usability is evaluated through the interaction of user, system, and task in a specified setting (Bennett, 1984). The socio-technical perspective also indicates that the technical features of health IT interact with the social features of a healthcare work environment (Ash, J.S., et al., 2007; Reddy, Pratt, Dourish, & Shabot, 2003). The meaning of usability should therefore comprise four major components: user, tool, task, and environment (Figure 1-2) (Bennett, 1984).

Heuristic Method

This section discusses Heuristic Evaluation in detail. It supplements other major Heuristic Evaluation surveys (Cox 1998; Dykstra 1993; Woolrych, 2001), but its focus is on the Heuristic Evaluation process and tool support. The reader will find software requirements highlighted throughout the section.

Heuristic Evaluation is an inspection method proposed by Nielsen and Molich (1990). It follows the "discount" philosophy, in which simplified versions of traditional methods are employed (e.g. discount usability testing, which does not require elaborate laboratory setups). It consists of having a small number of evaluators independently examine a user interface in search of usability problems. The evaluators then collaborate to aggregate all usability problems. During interface inspection, evaluators use a set of usability principles known as "heuristics" as a guide to focus on common problem areas in user interfaces. An example of such a heuristic is "Help users recognize, diagnose, and recover from errors" (Nielsen 2005b). Interface features that violate the heuristics are reported as usability problems.

Only a few tools have been developed to assist evaluators in Heuristic Evaluation. Cox (1998) supported problem aggregation; the intent was not to automate the aggregation process but to support evaluators in its manual activities. These include identifying unique problems, discarding duplicates, and merging descriptions using affinity diagrams (Snyder 2003). There has also been some effort to semi-automate problem identification in Heuristic Evaluation, but it is a formal, application-dependent approach: Loer and Harrison (2000) developed a system for querying a model checker to search for potential usability problems in user interfaces.

In summary, Heuristic Evaluation is a simple method for discovering usability problems in user interfaces. A small set of evaluators individually examine a user interface and judge its compliance with recognized usability principles called "heuristics." Their lists of potential usability problems are then aggregated into a single usability report. That report is presented to members of the development team so that they can agree on fixes and priorities. Figure 1 depicts the overall Heuristic Evaluation process.

Figure 1-Heuristic Evaluation Overview

Nielsen makes recommendations for conducting a Heuristic Evaluation (Nielsen 2005a, 1994a). A typical Heuristic Evaluation session lasts 2 hours. The evaluation can start with 2 passes through the user interface. In the first pass, evaluators get a general idea of the user interface design and overall interaction. In the second pass, they focus on particular parts. Heuristics are meant to help identify usability problems: with the heuristics in mind, evaluators carefully examine the interface and report the interface features they notice violating them.

The output of a Heuristic Evaluation is a list of potential usability problems. Lists generated by all evaluators are aggregated. Evaluators meet and identify duplicates, combine problem descriptions, suggest solutions to problems and possibly rate their severity so they can be prioritized. Nielsen recommends using a 0-4 severity rating scale (Nielsen 1995b).

Table 1-Nielsen's Severity Rating Scale Borrowed from [Nielsen 1995b]

"0 = I don't agree that this is a usability problem at all"

"1 = Cosmetic problem only: need not be fixed unless extra time is available on project"

"2 = Minor usability problem: fixing this should be given low priority"

"3 = Major usability problem: important to fix, so should be given high priority"

"4 = Usability catastrophe: imperative to fix this before product can be released"
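Nielsen's scale is ordinal, so it maps naturally onto an integer enumeration. The sketch below assumes that several evaluators rate each problem independently and that problems are then prioritized by their mean rating. The `Severity` and `prioritize` names are hypothetical, and averaging is one simple way of combining ratings rather than a prescribed procedure.

```python
from enum import IntEnum
from statistics import mean


class Severity(IntEnum):
    """Nielsen's 0-4 severity rating scale (Nielsen 1995b)."""
    NOT_A_PROBLEM = 0
    COSMETIC = 1
    MINOR = 2
    MAJOR = 3
    CATASTROPHE = 4


def prioritize(ratings):
    """ratings maps a problem id to the Severity values given by each evaluator.
    Returns the problem ids ordered from highest to lowest mean rating;
    ties keep their original order because sorted() is stable."""
    return sorted(ratings,
                  key=lambda pid: mean(int(r) for r in ratings[pid]),
                  reverse=True)


ratings = {
    "P1": [Severity.MINOR, Severity.COSMETIC, Severity.MINOR],
    "P2": [Severity.MAJOR, Severity.CATASTROPHE, Severity.MAJOR],
    "P3": [Severity.NOT_A_PROBLEM, Severity.COSMETIC, Severity.NOT_A_PROBLEM],
}
print(prioritize(ratings))  # ['P2', 'P1', 'P3']
```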

Several Heuristic Evaluation dimensions can be identified from the description above: the heuristics that are used to guide the inspection, evaluators performing the inspection, the user interface that is being evaluated, and the process that is followed. These are discussed immediately below.

Issues in Usability Evaluation

The first idea for a Heuristic Evaluation tool combined two parts: a logging tool to keep track of usability problems, and a system that guides evaluators through the entire process, from entering usability problems to generating problem reports. However, this was not enough: other ways of supporting the inspection phase of Heuristic Evaluation needed to be proposed, and this was the challenge.

Cox (1998) studied the usability problem aggregation process in Heuristic Evaluation in depth and developed groupware based on his findings. Similarly, the Heuristic Evaluation inspection process was studied in depth, and a tool was developed based on the findings. Once the inspection process was better understood, it was characterized, software tool requirements were identified, and an inspection tool based on those requirements was developed.

Heuristic Evaluation Dimensions

Heuristics are general usability principles that "seem to describe common properties of usable interfaces" (Nielsen 2005a). Nielsen and Molich (1990) initially proposed nine heuristics, defined based on their experience of common problem areas in interfaces and on a consideration of guidelines. A factor analysis of 249 usability problems (Nielsen 1994b) led to 10 heuristics (Table 2), which are commonly used to evaluate interfaces in general. Instone (1997), for example, explained Nielsen's 10 heuristics for the Web, with more emphasis on navigational aspects.

Table 2-Nielsen's Ten Usability Heuristics [Nielsen 1994b, 2005b]

1. Visibility of system status

2. Match between system and the real world

3. User control and freedom

4. Consistency and standards

5. Error prevention

6. Recognition rather than recall

7. Flexibility and efficiency of use

8. Aesthetic and minimalist design

9. Help users recognize, diagnose, and recover from errors

10. Help and documentation

Some alternatives have been proposed for specific domains to provide evaluators with domain knowledge they can use in evaluations. For instance, Dykstra (1993) developed calendar-specific heuristics based on the results of user testing of different commercial calendar systems. Evaluators performed better when using the calendar-specific heuristics: they found more usability problems, and more severe ones, than evaluators performing a standard Heuristic Evaluation. Notice, however, that Dykstra's proposed heuristics had sub-headings. Dykstra's 9 heuristics had an average of 6.6 sub-headings each, including one heuristic with 19 sub-headings. This makes the approach look more like a Guideline Review with about 60 guidelines (9 heuristics × 6.6 sub-headings ≈ 59) than a Heuristic Evaluation with 9 high-level heuristics.

Nielsen recommends keeping the list short (about 10) so that it is easy to remember (Nielsen and Molich 1990, p. 249), although some heuristics may be added if they are domain specific (Nielsen 2005a). Muller et al. (1998) reformatted the list and added four more heuristics for their participatory approach to Heuristic Evaluation. Their approach calls for the participation of "work-domain experts" (users) in evaluating the targeted interface, and the added heuristics address human goals and experience.

The role of heuristics is not firmly established. Heuristics are meant to help evaluators identify usability problems (Nielsen 2005a). However, it is not clear that heuristics support the discovery and analysis of usability problems (Cockton and Woolrych 2001; Cockton et al. 2003). In usability problem analysis, heuristics used as an analysis resource have not proven effective in eliminating false alarms and confirming actual usability problems (Cockton and Woolrych 2001).

Evaluators should not only report likes and dislikes, but should also explain problems with reference to the violated heuristics or to other usability principles or guidelines (Nielsen 2005a). Cockton and Woolrych's extended usability problem format (Cockton and Woolrych 2001; introduced in Woolrych 2001), for example, requires evaluators to "hypothesize likely difficulties in context, rather than to just focus on problem features." The extended format encouraged evaluators to be more "reflective and less likely to propose problems with little justification" (Cockton and Woolrych 2001, p. 175). In fact, an updated version of the form (Cockton et al. 2003) added an entry for evidence of heuristic non-conformance, encouraging evaluators to reflect on their choice of violated heuristics.
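A structured record can make the demands of an extended problem format explicit. The sketch below is loosely inspired by the idea described in this paragraph and does not reproduce Cockton and Woolrych's actual form. Each report must name one of Nielsen's ten heuristics (Table 2), hypothesize a likely difficulty in context, and give evidence of non-conformance. The class, its field names, and the example problem are hypothetical.

```python
from dataclasses import dataclass

NIELSEN_HEURISTICS = (
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
)


@dataclass
class ProblemReport:
    """Hypothetical record: field names are illustrative, not a published form."""
    description: str
    location: str
    heuristic: int           # 1-10, position in NIELSEN_HEURISTICS
    likely_difficulty: str   # hypothesized difficulty for users in context
    evidence: str            # why the feature does not conform to the heuristic

    def validate(self):
        """Return the reasons this report is incomplete (empty list if complete)."""
        issues = []
        if not 1 <= self.heuristic <= len(NIELSEN_HEURISTICS):
            issues.append("heuristic must be a number from 1 to 10")
        if not self.likely_difficulty.strip():
            issues.append("no hypothesized difficulty in context")
        if not self.evidence.strip():
            issues.append("no evidence of heuristic non-conformance")
        return issues

    def heuristic_name(self):
        """Name of the violated heuristic; call validate() first."""
        return NIELSEN_HEURISTICS[self.heuristic - 1]


# Hypothetical example problem
report = ProblemReport(
    description="No feedback after adding an item to the cart",
    location="product page",
    heuristic=1,
    likely_difficulty="Users may add the same item twice",
    evidence="The page does not change after clicking the add button",
)
print(report.heuristic_name(), report.validate())
```

A report that only states a dislike ("I dislike the colors") fails validation on both justification fields. That mirrors the format's aim of discouraging problems proposed with little justification.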

Solutions can be suggested based on the violated heuristics (Nielsen 2005a) or on some other taxonomy for classifying usability problems, such as the User Action Framework (Andre 2000), which is based on Norman's seven-stage theory of action (Norman 2002, pp. 45-53).

The Evaluator

Typically, between 5 (Nielsen 1992; Bevan et al. 2003) and 8 (Nielsen and Landauer 1993) evaluators are used in Heuristic Evaluation, although the number is still debated (Bevan et al. 2003).
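The debate over the number of evaluators is often framed with the model from Nielsen and Landauer (1993). In that model, each of i independent evaluators finds a given problem with probability λ, so the expected share of problems found is 1 - (1 - λ)^i. The sketch below evaluates that formula. The detection rate passed in is an illustrative assumption. The model itself is a simplification, since real problems differ in how easy they are to detect.

```python
def proportion_found(evaluators: int, detection_rate: float) -> float:
    """Expected share of all problems found by `evaluators` independent evaluators,
    each detecting any given problem with probability `detection_rate` (lambda)."""
    return 1 - (1 - detection_rate) ** evaluators


def evaluators_needed(target: float, detection_rate: float) -> int:
    """Smallest number of evaluators whose expected share reaches `target`.
    Requires 0 < detection_rate <= 1 and target < 1, otherwise the loop never ends."""
    n = 0
    while proportion_found(n, detection_rate) < target:
        n += 1
    return n


# Illustrative detection rate only; real values vary between projects and evaluators.
for i in (1, 3, 5, 8):
    print(i, round(proportion_found(i, 0.3), 2))
print(evaluators_needed(0.75, 0.3))
```

The diminishing returns visible in the loop output are the usual argument for using a small group of evaluators rather than a single one or a very large panel.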

Novice evaluators seem to perform poorly in Heuristic Evaluation (Nielsen 1992; Jeffries et al. 1991; Desurvire et al. 1992). Their performance is attributed in part to inexperience in usability and in the application domain. Nielsen (1992) classifies evaluators as "novices," "regular specialists" (those with usability expertise), and "double specialists" (those with both usability and application domain expertise). In his study, the aggregated problem lists of regular specialists covered 75% of the problems, and fourteen novice evaluators were required to achieve the same success rate.

Users can also become part of the evaluation team. Muller et al. (1998) incorporated users into evaluations to take advantage of their work-domain expertise.

User Interfaces

The user interface format (paper or computer based) and its interactivity (simulated or supported) may influence the way user interfaces are evaluated. Nielsen (1990) found that evaluating paper and computer mockups may influence the types of usability problems that are found. The author of this report argues that the "physical" characteristics of user interfaces affect how they can be used and evaluated. When evaluating interactive interfaces, for example, evaluators interact with the interface: entering information, going from one screen to another, trying functionality, and so on. This also enables evaluators to experience problems directly, which provides a way of identifying them.

Another aspect of user interfaces that may affect how they are evaluated is their complexity. Slavkovic and Cross (1999) performed some initial studies on more elaborate and complex interfaces than those in the initial work on Heuristic Evaluation (Nielsen and Molich 1990). Their results indicated that novice evaluators tend to focus on certain parts of the user interface (in their case, a Palm Pilot interface).

Usability Problem Formats

Evaluator performance may be affected by the usability problem formats used to capture problem details in evaluation sessions. Cockton and collaborators (Cockton et al. 2003) designed an extended form and found an unexpected improvement in evaluator performance compared with a previous study (Woolrych 2001; Cockton and Woolrych 2001). Results showed a 19% reduction in the number of false alarms and a 26% increase in the appropriateness of heuristic application when the extended form was used.

Heuristic Evaluation is known to produce not only a large number of problems (Jeffries, 1991; Bailey, 1992; Tan, 2009), but also a large number of false alarms (Bailey, 1992). False alarms are identified problems that are not actual problems in the interface. A major risk of having many false alarms is that changes may be made to an interface design based on them. Hence, false alarms should be kept to a minimum.
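Given a set of reported problems and a set of problems confirmed as real, the false alarm rate is simply the share of reported problems that are not confirmed. How confirmation happens (for example, through later usability testing) is an assumption here, as are the function name and problem ids in this minimal sketch.

```python
def false_alarm_rate(reported: set, confirmed: set) -> float:
    """Share of reported problems that are not confirmed as actual problems.
    `confirmed` is assumed to come from some independent check, such as user testing."""
    if not reported:
        return 0.0
    return len(reported - confirmed) / len(reported)


reported = {"P1", "P2", "P3", "P4"}
confirmed = {"P1", "P3", "P9"}   # P9 is a real problem the evaluator missed
print(false_alarm_rate(reported, confirmed))  # 0.5
```

Missed problems such as P9 do not affect this rate. They measure a different quality, the thoroughness of the evaluation.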

Heuristic Evaluation Process

The Heuristic Evaluation process can be separated into three major phases:

- an inspection phase, in which evaluators independently evaluate the user interface;
- a preparation phase, in which evaluators independently prepare their lists of identified problems for aggregation;
- an aggregation phase, in which evaluators collaborate to generate a single report of usability problems.

Figure 2 shows the Heuristic Evaluation phases and activities.

Figure 2-Heuristic Evaluation Phases
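The three phases can be sketched as a small pipeline. In this minimal sketch, each finding is a hypothetical `(location, heuristic, description)` tuple. Preparation normalizes each evaluator's list independently, and aggregation treats findings with the same location and violated heuristic as duplicates. That matching rule is a deliberate simplification: in the process described here, evaluators decide together which problems are duplicates and how to merge their descriptions.

```python
from collections import defaultdict

# A finding is a hypothetical (location, heuristic number, description) tuple.


def prepare(raw_findings):
    """Preparation phase: one evaluator tidies their own list before aggregation."""
    return [(location.strip().lower(), heuristic, text.strip())
            for location, heuristic, text in raw_findings]


def aggregate(prepared_lists):
    """Aggregation phase: merge every evaluator's list into a single report.
    Findings with the same (location, heuristic) key are treated as duplicates;
    their distinct descriptions and the evaluators who reported them are combined."""
    merged = defaultdict(lambda: {"descriptions": [], "evaluators": set()})
    for evaluator, findings in prepared_lists.items():
        for location, heuristic, text in findings:
            entry = merged[(location, heuristic)]
            if text not in entry["descriptions"]:
                entry["descriptions"].append(text)
            entry["evaluators"].add(evaluator)
    return dict(merged)


def heuristic_evaluation(inspection_results):
    """inspection_results maps each evaluator to the raw findings of their
    independent inspection; preparation is per evaluator, aggregation is joint."""
    prepared = {name: prepare(raw) for name, raw in inspection_results.items()}
    return aggregate(prepared)


results = {
    "E1": [(" Cart page", 1, "No feedback after adding an item")],
    "E2": [("cart page ", 1, "Adding an item gives no feedback"),
           ("home", 4, "Inconsistent link styles")],
}
for key, entry in heuristic_evaluation(results).items():
    print(key, sorted(entry["evaluators"]), len(entry["descriptions"]))
```

The merged entry for the cart page keeps both wordings of the same problem. Combining them into one description is left to the evaluators, as in the aggregation phase described above.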

Inspection Phase

Several activities can be identified in this phase: evaluators explore the interface, identify usability problems, and elaborate problems. Nielsen (2005a) recommends exploring the interface at least twice. The first pass gives a general idea of the interface, and the second pass analyzes individual interface elements in context.

Exploration depends on the interface format. The format defines affordances (i.e. the characteristics of objects that determine how they can be used (Norman 2002)), which allow particular ways of exploration. For example, several paper screenshots can be compared at once by positioning them side by side. Computer mockups (Nielsen 1990), on the other hand, allow the interface to be explored through interaction and through experiencing situations (e.g. feeling trapped and unable to exit to the "main system" (Nielsen 1990)).

The problem search strategy also influences how interfaces are explored. Cockton et al. (2003) introduced four (4) discovery methods:

a) System Scanning, which consists of examining the interface without following any particular approach;
b) System Searching, which involves some kind of strategy, such as focusing on certain interface elements;
c) Goal Playing, which consists of setting up a goal and trying to achieve it;
d) Method Following, which is similar to Goal Playing, except that a step-by-step procedure is established and executed.

These methods can guide how to approach problem search, and they also illustrate different ways of exploring an interface. Further work is needed to examine exploration patterns in terms of discovery methods.

Identifying Usability Problems

Factors other than interface format and search strategy may also lead evaluators to notice potential problems. Inspection guidelines (Mack and Montaniz 1994; Zhang et al. 1999), for example, are intended to "stimulate inspectors to notice things about the software that might lead, on further reflection, to identifying a potential problem" (Mack and Montaniz 1994). Inspection guidelines give details such as how to proceed, what to focus on, and whether post-meetings are needed (Zhang et al. 1999). Heuristic Evaluation, by contrast, gives evaluators few such guidelines.

Elaborating Usability Problems: "Once a potential problem is suspected, the inspector must develop the specifics of the problem description" (Mack and Montaniz 1994). Evaluators can draw on different sources to elaborate problems (Mack and Montaniz 1994): a) experiencing a problem directly, b) remembering having had a similar problem, c) remembering others having a similar problem, and d) simulating usage scenarios or exploring the interface further.

Usability Problem Preparation Phase

Evaluators may need to format, edit, and reevaluate usability problems before aggregation. Cox (1998) talks about formatting problems to facilitate aggregation of problem lists.

Aggregation Phase

Cox elaborated further on what problem aggregation is. It involves not only arranging, selecting, or categorizing the identified raw problems (Cox 1998, p. 3), but also other activities. He renames problem aggregation "results synthesis":

"Results synthesis is the process of transforming the entire collection of raw problem descriptions into a coherent, complete, and concise statement of the problems in the evaluated interface along as well as recommended actions to address the problems identified." (Cox 1998, p. 139)

Procedure of Evaluation

The study consisted of a survey in which 4 participants were asked to download the tool, try it, and answer a questionnaire about their background and their experience using the tool. Training was provided to participants who were not familiar with Heuristic Evaluation. All procedures were carried out online: a Web site for the study was selected, and participants were guided through the process of downloading the tool and the training materials.

User testing was then conducted, with evaluators meeting in the laboratory. They were asked to perform a Heuristic Evaluation of the static Web interface for 20 minutes. After the evaluation session, they were interviewed for 35 minutes about their approaches to finding problems, and the interviews were recorded.

Participants

A group of 7 people participated in the study: 4 Computer Science graduate students, 2 people with a Human-Computer Interaction background (specifically, people who had taken 3 related courses), and a Web developer with 2 years of experience.

The Static Web Interface

The term "static" is used to emphasize that the interface is a paper-based Web interface with no simulated interactivity. This is different from paper prototyping (Snyder 2003), where a person plays "computer" and simulates interactivity by presenting screens based on the user's actions.

The static Web interface consists of a set of printed screenshots and a storyboard created for navigational purposes. Six (6) Web pages from the Zen Cart (2006) Web site were selected and printed in full (i.e. from top to bottom), in color, and at a width comparable to how they would appear on screen. Zen Cart is a customizable, open source shopping cart package for e-commerce Web sites; it comes set up as an online store of hardware, software, and DVD movies. Zen Cart release 1.3.6 was used.

The static Web interface is considered to be of low fidelity. Virzi (1989) defines the fidelity of a prototype as "a measure of how authentic or realistic a prototype appears to the user when it is compared to the actual service." The static Web interface is far from resembling the actual system: it consists of only six (6) paper Web pages and a storyboard with no interactivity.

Observing and Quickly Visiting the Interface

Cite This Paper
PaperDue. (2011). Usability evaluation methods and applications. PaperDue. https://www.paperdue.com/essay/usability-evaluation-116916
