Special Edition:
SAP User Experience


The PeP Project: Evaluating the Responsiveness of SAP Applications from a User-Centered Perspective

By Gerd Waloszek, SAP User Experience, SAP AG, and Ulrich Kreichgauer, oCTO, SAP AG – October 13, 2009

[Photo: Ulrich Kreichgauer and Gerd Waloszek]

On this Website, we have published quite a few articles discussing performance and responsiveness issues in software applications over the past two years (see the Human Performance at the Computer highlight topic for a compilation of these articles). These more general articles are the by-products of a project that was initiated by SAP User Experience at the beginning of 2008 – the Perceived Performance project, or "PeP" project for short. This article is devoted to the PeP project itself; it reports briefly on the project's goals and methodological approach, work and cooperation with other groups within SAP, and possible future directions.

 

Background to the Perceived Performance (PeP) Project

In his editorial What Matters Most?, one of the authors expressed his belief that performance issues are the number one usability issue. Even if you find this statement too strong, there is general agreement that solving performance issues, or more precisely, responsiveness issues, is of utmost importance for software companies (see the appendix or Human Performance at the Computer – Part 1: Introduction for the difference between performance and responsiveness): Poor performance degrades user efficiency and thus the efficiency of the business processes that depend on the software. The usual step is, of course, to approach these issues from a technical perspective. At SAP AG, for example, there are dedicated technical teams that measure the responsiveness of SAP applications in clearly defined test environments. For this purpose, they have created standardized step-by-step scenarios that allow them to compare different software versions and thus evaluate the effects of technical fine-tuning to improve the system's responsiveness.

The problem with a purely technical approach, however, is that these measurements tell us little about how users experience an application's responsiveness, which areas require a greater investment of effort from a user's perspective, and where the system is already responsive enough. In order to gain a better overall understanding of these issues, SAP's User Experience team initiated the Perceived Performance (PeP) project at the beginning of 2008. The primary goal was to devise a user-centered evaluation method that could be applied to the scenario-based measurements made by the technical teams. Further goals were to apply this methodology to dedicated SAP applications, roll out the insights gained within the company to increase awareness of responsiveness issues, and to publish them externally via channels such as the SAP Design Guild Website and conference appearances.

 

Toward an Evaluation Methodology: Finding User-Related Criteria for Evaluating Responsiveness

When the technical performance teams at SAP evaluate the responsiveness of applications, they monitor a large number of parameters – one of which is the overall response time for user-initiated user interface (UI) events. For this parameter, the teams apply a one-second threshold as a criterion for whether an application achieves SAP's performance goals. However, this rule does not reflect the full variety of user expectations and behaviors: Some actions should take much less than a second, while others may take longer without annoying users. Thus, the PeP team's challenge was to develop an evaluation method that provides better insight into the actual user experience and helps identify areas that need improvement. The concept of human time ranges, which originates from Allen Newell's time scales of human action, looked promising as a starting point for developing such a methodology, because these time scales refer to the psychological dimensions of perception, operations, and cognition (thinking, attention, motivation) (see Table 1 below; for details see Human Performance at the Computer – Part 2: Making Applications More Responsive and Waloszek and Kreichgauer, 2009). In their most basic and most frequently cited form, the time ranges are defined as follows:

  • 0.1 sec.: Perception – cause-and-effect, animation > direct manipulation tasks
  • 1 sec.: Operation – focused man-machine dialog > simple tasks
  • 10 sec.: Cognition – focus on task lost > complex or compound tasks

PeP Application of Time Ranges

The PeP team integrated two further categories into its adaptation of the time ranges:

  • Shneiderman and Plaisant (2004) mention an additional category of "common tasks" of about three seconds, which marks the onset of two effects that waiting has on users: After three seconds, (1) users start to feel that the system is slow and (2) their task focus starts to wander (they can maintain a degree of focus for up to 10 seconds).
  • The authors also report that after waiting 15 seconds, users become annoyed.

This leads to the following table of time ranges (see Table 1; a more extended version of the time range table can be found in the appendix):

Time Range | Human Aspect | Application / User Interface (UI): Acceptable Response | User: Response When Feedback Does Not Meet Time Range
--- | --- | --- | ---
0.1 sec. (0.0-0.2) | Perception | Acknowledges user input | Perception of smooth animations and cause-and-effect relationship breaks down
1.0 sec. (0.2-2.0) | Dialog, action | Presents result of simple task | Engaged user-system dialog breaks down
3 sec. (2.0-5.0) | Cognition, attention, motivation | Presents result of common task | User has time to think – the system is perceived as slow, the user's focus starts to wander, and the user may turn to other tasks
10 sec. (5.0-15) | Cognition, attention, motivation | Presents result of complex task | User loses focus on task and may turn to other tasks
>15 sec. | Cognition, attention, motivation | Presents result of very complex task | User becomes annoyed – the system is detrimental to productivity and motivation

Table 1: PeP adaptation of human time ranges table, including variations in parentheses

The next question was how the time ranges could be utilized for a user-centered evaluation of response times. The PeP team's answer to this question was to classify observed response times according to the time ranges, and thus according to the psychological effects that waiting has on users. This required switching from discrete times to ranges by extending and connecting the time ranges from 0 to beyond 15 seconds, without leaving any gaps (see the graphic in the appendix). To define the ranges, the PeP team adopted Shneiderman and Plaisant's (2004) values for the variation of the time ranges wherever possible, but a few decisions could not be backed up with data from the literature. We therefore initially set fairly conservative upper limits for the time ranges (see the first column in Table 1 or the graphic in the appendix).
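As an illustration of this classification step, the following minimal sketch (in Python; not the PeP team's actual tooling) maps a measured response time to the gap-free time ranges described above, using the conservative upper limits of 0.2, 2.0, 5.0, and 15 seconds:

    # Illustrative sketch: classify a measured response time into the
    # gap-free PeP time ranges (upper limits as described in the text).
    PEP_RANGES = [
        (0.2,  "Level 0: perception (0.0-0.2 s)"),
        (2.0,  "Level 1: dialog/action (0.2-2.0 s)"),
        (5.0,  "Level 2: cognition/attention (2.0-5.0 s)"),
        (15.0, "Level 3: complex task (5.0-15 s)"),
    ]

    def classify_response_time(seconds):
        """Return the PeP time range that a measured response time falls into."""
        for upper_limit, label in PEP_RANGES:
            if seconds <= upper_limit:
                return label
        return "Level 4: very complex task (> 15 s)"

    print(classify_response_time(1.3))  # Level 1: dialog/action (0.2-2.0 s)
    print(classify_response_time(7.5))  # Level 3: complex task (5.0-15 s)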

PeP Assignment of UI Events to Time Ranges

Measuring response times and classifying them according to the time ranges does not, however, provide the complete picture. As already mentioned, some UI events need to be blazingly fast, while others may take longer without annoying users. Thus, to derive guidance from the evaluations, it is also necessary to know which response (or waiting) times users expect (and tolerate) for certain types of UI events. Assigning UI events to time ranges makes it possible to compare and evaluate observed and expected response times and to identify which events conform to users' expectations and which do not (and thus require improvement). As there was very little guidance in the literature, the PeP team drew up the following list for practical use in its evaluations (see also the sketch after the list):

  • Level 0: 0.1 (0-0.2) seconds – Perceptual Level: Feedback after UI input involving direct manipulation/hand-eye coordination, such as mouse-click, mouse/pointer movement, key-press, button-press, menu open/close.
  • Level 1: 1 (0.2-2) seconds – Dialog Level: Finishing simple tasks, that is, most user-requested operations and ordinary user commands, finishing unrequested and system-initiated operations, opening a window (navigation) or dialog box, closing a window, completing a simple search.
  • Level 2: 3 (2-5) seconds – Cognitive Level: Finishing common tasks, such as logging on to a system.
  • Level 3: 10 (5-15) seconds, Level 4: >15 seconds – Cognitive Level: Completing complex tasks, that is, one task or one step of a multi-step task, completing one step in a wizard, completing a complex search or calculation.
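The sketch below shows the kind of lookup this list implies; the event names and level assignments are illustrative examples drawn from the list above, not an official or exhaustive catalog.

    # Illustrative assignment of UI event types to tolerable PeP levels
    # (0 = perceptual, 1 = dialog, 2 = common task, 3 = complex task).
    # Event names and assignments are examples only.
    EXPECTED_LEVEL = {
        "mouse_click_feedback": 0,   # direct manipulation, hand-eye coordination
        "menu_open":            0,
        "open_dialog_box":      1,   # simple, user-requested operation
        "simple_search":        1,
        "system_logon":         2,   # common task
        "wizard_step":          3,   # one step of a complex, multi-step task
        "complex_calculation":  3,
    }

    def expected_level(event_type):
        """Return the tolerable PeP level for a UI event type.

        Defaults to Level 1, since most ordinary user commands fall there.
        """
        return EXPECTED_LEVEL.get(event_type, 1)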

 

The PeP Methodology in Short

Finally, we put together the ingredients of an evaluation method for response time. In short, the PeP methodology is based on three steps:

  1. Preparation: We break the use scenarios into task steps, or technically, UI events. We then categorize them according to what response time would be tolerable for users. This (preliminary) assignment is based on the complexity of the interactions, that is, on the computing workload that experienced users would expect a step to incur.
  2. Measurement: We time the UI events and assign them to the time ranges. This assignment is based on the events' actual duration, and thus on the users' perception, not their expectations.
  3. Evaluation: This data leads to a frequency matrix of tolerable versus observed time ranges (see Table 2), which can be interpreted from a user’s perspective.

The time ranges have distinct implications (directness, appropriateness, slowness, waning or lost focus, annoyance) for users' perceptions and reactions. Therefore, the PeP evaluation matrix provides a more refined picture of how users perceive the performance of a software application than checking response times against one fixed time limit. The PeP evaluation is particularly valuable if an application is considerably slower than expected or exhibits wide response-time variations.

The PeP team measured many standardized scenarios, the data for which was provided by the technical performance teams. The (fictional) example in Table 2 below shows a scenario with a fulfillment rate of 30.1% for simple tasks; this is assumed to have a strong negative impact on user satisfaction.

Cell values give the number of times a response was measured in each observed range.

Tolerable Range (Type of Interaction) | Observed 0.2-2.0 s | Observed 2.0-5.0 s | Observed 5.0-15 s | Observed > 15 s | Total | Fulfillment Rate (%)
--- | --- | --- | --- | --- | --- | ---
Simple Tasks (0.2-2.0 s) | 22 | 26 | 20 | 5 | 73 | 30.1
Common Tasks (2.0-5.0 s) | 3 | 13 | 9 | 9 | 34 | 47.1
Complex Tasks (5.0-15.0 s) | 0 | 1 | 2 | 1 | 4 | 75.0
Overall | 25 | 40 | 31 | 15 | 111 | 36.9

Table 2: Example of a PeP evaluation matrix (fictional data)
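To make the evaluation step concrete, here is a minimal sketch of how a matrix like Table 2 could be tallied from raw measurements. It assumes that each measurement pairs a UI event's tolerable level with its observed response time; the function names are illustrative and not taken from the PeP tooling.

    from collections import defaultdict

    UPPER_LIMITS = [0.2, 2.0, 5.0, 15.0]  # upper bounds of Levels 0-3 in seconds

    def observed_level(seconds):
        """Map an observed response time to a PeP level (0-4)."""
        for level, limit in enumerate(UPPER_LIMITS):
            if seconds <= limit:
                return level
        return 4

    def evaluation_matrix(measurements):
        """Tally (tolerable_level, observed_seconds) pairs into a PeP matrix.

        Returns, per tolerable level: counts per observed level, the total
        number of events, and the fulfillment rate, i.e. the percentage of
        events observed at or below their tolerable level.
        """
        matrix = defaultdict(lambda: {"counts": defaultdict(int), "total": 0})
        for tolerable, seconds in measurements:
            row = matrix[tolerable]
            row["counts"][observed_level(seconds)] += 1
            row["total"] += 1
        for tolerable, row in matrix.items():
            fulfilled = sum(n for lvl, n in row["counts"].items() if lvl <= tolerable)
            row["fulfillment_rate"] = 100.0 * fulfilled / row["total"]
        return dict(matrix)

    # Example: three simple-task events (tolerable Level 1), one observed too slow.
    sample = [(1, 0.8), (1, 1.6), (1, 3.4)]
    print(round(evaluation_matrix(sample)[1]["fulfillment_rate"], 1))  # 66.7

Applied to the fictional data in Table 2, this definition of the fulfillment rate reproduces the reported values; for example, 22 of the 73 simple-task events (30.1%) were observed within their tolerable range.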

 

Short Overview of the PeP Team's Work

One of the PeP project's major tasks was, of course, to learn about and gain an understanding of responsiveness issues from a user's perspective. As shown, this was essential for developing an evaluation method, and it was also the basis for consulting other teams at SAP. But the PeP team also had to gain a basic understanding of the technical constraints underlying application responsiveness issues. For this purpose, the team attended the internal SAP Performance Focus Days, for example. In addition, cooperating closely with the technical performance teams at SAP was mandatory for the PeP team; we already mentioned that the technical teams provided the data for most of the PeP evaluations.

After the PeP team had developed a user-centered evaluation approach and performed a number of evaluations based on data provided by the technical teams – about 10 evaluation reports were delivered by the PeP team in 2008 – the team was also able to roll out information within SAP: PeP members gave a number of presentations to other teams that were interested in the topic, prepared a presentation for an internal SAP Developers conference, and took part in numerous discussions and several work groups, consulting the teams from a user-centered perspective. Issues that arose during these discussions were, for example: When should feedback be given and what form should it take? Should pages load incrementally or completely? Further topics included application startup time, speed of autocomplete, and the influence of server roundtrips and WANs. It turned out that the PeP team's time ranges provided a good heuristic for answering such questions, thus extending their usefulness beyond the PeP evaluation method itself.

At the beginning of this article, we mentioned that the PeP team also had the goal of making information available outside of SAP. This was accomplished by publishing articles on the SAP Design Guild Website. These are additionally compiled in the Human Performance at the Computer highlight topic for easier access. They contain general information and are largely independent of the PeP project. However, Ulrich Kreichgauer and Gerd Waloszek presented the PeP team's user-centered evaluation method at the INTERACT 2009 conference in Uppsala, Sweden. This method will also form part of a keynote that Dan Rosenberg will give at the 20th FQS-Forschungstagung (the research congress of the German Quality Research Community) in October 2009 in Frankfurt am Main, Germany. Last but not least, this article delivers some details of the PeP team's work to the public.

 

Future Directions

Because it was initiated as a project, PeP has a limited time scope. Many questions were addressed and answered during the project's time span, while others need further research and clarification and may be beyond the current project scope. First of all, the assumptions underlying the PeP evaluations need to be validated further. UI events are currently assigned to time ranges on a heuristic basis and call for more thorough investigation. In addition, the transition points between the time ranges rely on data from the literature and on heuristic assumptions. Systematic experiments involving users who rate the timeliness of selected UI events could help to define these points more reliably.

The PeP team's research could also be the starting point for addressing responsiveness issues through the UI design itself: (1) Performance-oriented guidelines, that is, high-level rules on top of UI guidelines for specific applications, could make UI designers aware of human performance issues and provide guidance (see also Have You Ever Heard of Performance-Oriented (UI) Design? and Human Performance at the Computer – Part 4: On the Way to Performance-Oriented UI Guidelines); (2) Measuring the time costs of UI controls and suggesting alternative designs would make it possible to reduce screen rendering times at the design stage.

 

References

  1. Card, S. K., Robertson, G. G., & Mackinlay, J. D. (1991). The information visualizer: An information workspace. Proceedings of ACM CHI '91, 181-188.
  2. Cooper, A., Reimann, R. M., & Cronin, D. (2007). About Face 3: The Essentials of Interaction Design. John Wiley & Sons (chapter: Optimizing for Responsiveness, Accommodating Latency; pp. 220-221).
  3. Johnson, J. (2007). GUI Bloopers 2.0: Common User Interface Design Don'ts and Do's. Morgan Kaufmann Publishers (Chapter 1: First Principles; Basic Principle 8: Design for Responsiveness; Chapter 7: Responsiveness Bloopers).
  4. Newell, A. (1994). Unified Theories of Cognition. Harvard University Press.
  5. Shneiderman, B., & Plaisant, C. (2004). Designing the User Interface (4th edition). Pearson Addison-Wesley (Chapter 11: Quality of Service, p. 453ff).
  6. Nielsen, J. (1993). Usability Engineering. San Diego, CA: Academic Press (Chapter 5: Usability Heuristics).
  7. Robertson, G., Card, S., & Mackinlay, J. (1989). The Cognitive Co-Processor Architecture for Interactive User Interfaces. Proceedings of the ACM Conference on User Interface Software and Technology (UIST '89), ACM Press, pp. 10-18.
  8. Robertson, G., Card, S., & Mackinlay, J. (1993). Information Visualization Using 3D Interactive Animation. Communications of the ACM, 36(4), 56-71.
  9. Waloszek, G., & Kreichgauer, U. (2009). User-Centered Evaluation of the Responsiveness of Applications. In: T. Gross et al. (Eds.), INTERACT 2009, Part I, LNCS 5726, pp. 239-242.

 
