Module 4: User Testing and Think-aloud Protocols
Overview of Module 4/Week 4 of Class
Overview of Topics
Ideally, all items are designed so the members of the intended audience (i.e., users) can successfully use the item to accomplish a task. Usability involves assessing how well users can engage in such activities. Usability testing is the process associated with assessing such factors, and it focuses on evaluating how effectively the intended audience can use an item to achieve certain objectives.
Participant Observation
At its most basic, usability testing involves a process called participant observation that works as follows: One person (a “participant” or a “subject”) is asked to use an item to perform a specific task (e.g., “Using MOODLE, find the technical writing course offerings at Louisiana Tech University for Winter Quarter 2020.”).
A second person (the “observer” or “tester”) watches as the participant tries to perform this task. This observer records what s/he observes as the participant performs the activity. The observer continues to record participant activity until either the participant completes the task or s/he gives up (is unable to complete the assigned activity).
The observer reviews her or his observation notes (taken while watching the participant perform the related task) and tries to determine when the design of a feature might have caused a problem in performing the assigned task. The objective is to use these observations to identify problematic features of an item’s design and then revise these problematic items to avoid similar usability problems in the future.
Follow-Up Questions
Ideally, the observer will confer with the subject to determine what aspects of an item’s design might have caused problems or confusion. To do so, the observer will often ask the participant to answer a series of questions once the participant has completed the related task (or given up on it). This interaction often takes place after the end of the activity and as the observer reviews her or his notes. Such questions can be:
- General questions to assess the participant’s perception of the usability of the item to perform a task (e.g., “Now that you have completed the task, what are your impressions of item X?”)
- Clarifying questions to better understand a behavior observed during the process (e.g., “When you arrived at the page titled ‘Log In,’ you clicked on the ‘Information’ button. Why did you do that?”)
- Overall assessment questions to identify general perceptions of the effective and ineffective aspects of the item (e.g., “Now that you’ve completed this process, what do you think was effective about the design of the interface? What do you think was ineffective or problematic about the design of the interface? What suggestions would you have for how to improve these problematic features?”)
- Final assessment questions (e.g., “Do you have anything you’d like to add or suggest at this time?”)
The purpose of this questioning is to understand factors affecting the use of an item and then revise problematic features to make an item more user-friendly (or easier for the audience to use to achieve a task).
Timing Tasks
In many cases, the observer will also record the time at which subjects started and completed (or abandoned) a process to determine how efficient task completion is (e.g., “Started the task of logging into MOODLE to find syllabus at 10:00am CT – Completed task at 10:15am CT.”). If possible, observers might record how long it takes subjects to perform specific activities associated with completing an overall task (e.g., “Started log in to MOODLE at 10:00am CT – Completed log in at 10:02am CT.") to determine if certain activities within a process require revision to be more effective.
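The timing bookkeeping described above can be sketched as a short script. The activities, timestamps, and durations below are hypothetical examples (not data from an actual test), but the calculation mirrors what an observer would do with start/end times recorded for each activity.

```python
from datetime import datetime

# Illustrative observer log: (activity, start time, end time).
# All entries are hypothetical examples, not real test data.
observations = [
    ("Log in to MOODLE", "10:00", "10:02"),
    ("Locate course syllabus", "10:02", "10:15"),
]

fmt = "%H:%M"
for activity, start, end in observations:
    # Elapsed time for this activity, in minutes.
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    minutes = elapsed.total_seconds() / 60
    print(f"{activity}: {minutes:.0f} min")
```

Breaking the overall task into timed sub-activities this way makes it easier to spot which step of a process (here, the 13-minute search vs. the 2-minute login) is the likely candidate for revision.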
Think-Aloud Protocols
It is one thing to observe what individuals are doing; it is another to determine why they are doing it. Ideally, the participant observation process can provide insights on the “what” and the “why” of how audiences use materials. Unfortunately, observation – by itself – can only note what individuals are doing when they perform a task. To understand why they are performing an activity in a certain way, the observer needs to have participants provide commentary on their activities. One method for achieving this objective is called a think-aloud protocol.
Think-aloud protocols work as follows: Subjects are instructed to use an item to achieve an objective and to verbally (out loud) explain what they are doing and why as they perform an activity. So, as a subject logs into email, that subject would need to state:
- What she/he is doing (e.g., “I’m now clicking my cursor in the prompt titled ‘username’.”)
- Why he/she is doing it (e.g., “I’m placing the cursor here so I can type my username into this prompt.”)
This combination of noting what participants do (i.e., observing their actions) and why they do it (i.e., recording what they say is motivating those actions) can help determine how a particular design – or an aspect of a design – might be causing problems. Observers can then use this combined observational and talk-aloud information to ask more specific follow-up questions once an activity is completed (e.g., “You noted you kept clicking on the ‘Exit’ button to get out of the system, why did you click on it? What could we change – in the design of the interface – to make it clearer that you should not click on that button at that time in the process?”).
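One simple way to combine the two streams of data described above is to merge the observer’s timestamped action log with the participant’s timestamped think-aloud comments into a single timeline. The sketch below is a minimal illustration; all of the log entries are hypothetical.

```python
# Hypothetical observer notes: (time, what the participant did).
actions = [
    ("10:00:05", "Clicked 'Information' button"),
    ("10:00:30", "Clicked 'Exit' button"),
]

# Hypothetical think-aloud comments: (time, what the participant said).
comments = [
    ("10:00:06", "I'm looking for where to log in."),
    ("10:00:31", "Maybe this button gets me out of this screen."),
]

# Merge both streams and sort chronologically so each observed action
# sits next to the verbal explanation that motivated it.
timeline = sorted(
    [(t, "ACTION", note) for t, note in actions]
    + [(t, "SAID", note) for t, note in comments]
)

for time, kind, note in timeline:
    print(f"{time} [{kind}] {note}")
```

Reading the merged timeline, the observer can see not just that the participant clicked “Exit,” but the reasoning voiced at that moment, which in turn suggests specific follow-up questions to ask once the session ends.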
Numbers and Trends
How many times should one do observational testing to determine what aspects of an item need revision? The answer can vary (and seems to be “more is better”), with observers needing to balance the ideal (i.e., user testing a design with as many persons as possible) against the practical (i.e., the number of persons one can actually do user tests with given available time and money). The key, however, is to use different persons each time a test is done to make sure the behaviors indicating a problematic design stem from the item rather than the individual. (If one person does the same thing over and over, that’s an aspect of the individual. If multiple persons do the same thing independently, then something outside of the individual is probably causing that parallel behavior.)
In terms of the ideal number of different persons to observe, many individuals default to five (5) based on Jakob Nielsen’s foundational essay "Why You Only Need to Test with 5 Users" (a reading for this Module). In his later essay "How Many Test Users in a Usability Study?" (also a reading for this Module), Nielsen modified his initial claim, noting that different usability tests can require different numbers of subjects depending on the nature of the test. Essentially, the ideal number of test users depends on how much time and how many resources (persons and money) one has for such testing.
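Nielsen’s five-user argument rests on a simple model: if each test user independently reveals a given usability problem with probability L (he estimates L at roughly 31% across the projects he studied), then testing with n users uncovers about 1 − (1 − L)^n of the problems. The sketch below computes that curve under his published estimate; it illustrates why returns diminish after the first few users.

```python
# Nielsen's model: share of usability problems found with n test users
# is 1 - (1 - L) ** n, where L is the probability that a single user
# reveals a given problem. L = 0.31 is his published estimate; actual
# values vary by project and type of test.
L = 0.31

for n in (1, 3, 5, 15):
    found = 1 - (1 - L) ** n
    print(f"{n:>2} users -> ~{found:.0%} of problems found")
```

With L = 0.31, five users already uncover roughly 85% of the problems in the model, which is why the number five became a common default; Nielsen’s later essay cautions that different L values (and different testing goals) shift the ideal n.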
For Module 4, we will examine ideas, practices, and approaches for doing usability testing. As you review the entries for this Module (Module 4), consider
- If they represent a commonly accepted approach to what usability is and how usability testing is done
- How the nature of the item determines the testing (i.e., methods) one uses
- How time and resources (e.g., persons and funding available) affect the testing approaches one uses
- If the technologies one uses affect the efficacy of the testing one does (if so, how; if not, why not)
Other Module 4/Week 4 Materials
To access other materials for Module 4/Week 4, click on the related link below