Recently, I visited my brother's family. During a walk through the colorful autumn forests, we discussed a variety of topics. Naturally, one of the many subjects that came up was my job, and my brother asked me which projects I was currently involved in. I told him that I was co-writing accessibility guidelines because SAP is trying to make its software accessible to everyone. I added that one of the challenges we are facing is to provide blind or visually impaired users with information equivalent to that available to those without visual impairments. "Oh, you will never achieve that. That is impossible!" he exclaimed. Since then, I have heard this opinion over and over. Of course, we cannot provide disabled people with the same experience as people without disabilities. But we can, and must, take measures to "ensure that users with disabilities have access to and use of information and data that is comparable to the access of those without disabilities." (Cited from Section 508 with slight modifications) Information equivalence is one of the cornerstones for achieving comparable access and use.
To the citation from Section 508, I would also like to add: "this includes comparable performance". Nielsen and Norman recently published data from a study comparing users of screen readers, users of magnifier software, and a control group without visual impairments. This data (taken from the press release; the full report has to be bought from the Nielsen Norman Group) clearly demonstrates that there is currently a performance gap between users with disabilities and those without. Therefore, I decided to broaden the scope of my article – which was originally intended to cover only information equivalence – to include performance aspects as well.
But there is more to equivalence: it can be discussed on an even higher level, namely with respect to social equivalence. In my opinion, the real issue is the social emancipation of people with disabilities. Computer technology offers the chance to improve the social status of these people – currently, the opposite seems to be true. I am confident that due to Section 508 and similar legislation in other countries, this situation will change for the better in the near future.
When I started to write this article, I searched for information about "information equivalence" and was astonished to find that little has been written on the subject. For example, a Google search for "information equivalence" did not provide any useful results. A second search for "equivalent information" was slightly more successful and led me to the Web sites of the W3C and Section 508, or to sites that cited those guidelines. Thus, I was effectively "back to square one" because these guidelines just demand that equivalent information be provided. They offer little help regarding what "information equivalence" actually means and how it can be achieved – I will elaborate further on this issue below. I was also able to find a presentation by a doctoral student on human cognitive abilities, which had a slide comparing equivalent information – using both a floor plan and a tree structure that described the same building. This gave me a pointer, but nothing more...
This pointer directed me towards cognitive psychology. As I had once worked in this field, I thought it might be worth taking a look at my old textbooks. I remembered that the "text versus image" issue had been the subject of heated debates and much research. However, I again ended up empty-handed: although the textbooks discussed the advantages and disadvantages of images versus text, or stated that it is useful to offer information in different modalities, such as visual and auditory, I found very little information on how to actually create equivalent information. Moreover, our focus here is on people who lack one or more modalities of information processing, not on how the different modalities can complement each other.
Others may have been more successful than I was in searching for "information equivalence"; my own results did not provide much enlightenment, although I was at least able to collect a few statements on the "picture versus text" issue and on the use of pictures.
There is the famous saying that "a picture is worth a thousand words." We also know that it is often hard to transform a picture into words. One of the reasons for this difficulty is that images can have an infinite number of interpretations and inferences. No text description could possibly cover all these interpretations, and nobody knows which inferences people will draw. Another reason is that images often lead to immediate comprehension, something which is much more difficult to gain from a text – images have analog properties that "reveal" certain relations automatically. This is why charts are often used instead of tables, or maps instead of long text-based route descriptions.
However, you could also argue that "a word is worth a thousand pictures", which may be of some consolation for those who are unable to see. Also, many pictures, figures, or diagrams have little information content or even conceal what they should clarify – see below for some examples. Of course, texts can also be interpreted in different ways. We all know that certain texts have to be read "between the lines" but in this respect there is no difference between visually impaired and unimpaired users.
When writing text descriptions for images, it is important to know the reason or purpose for using them. This knowledge helps us to identify the primary goal of an image. For example, an image may be primarily used for making a Web page look nicer, or for explaining a complex relationship. The text description can then focus on this primary aspect and either neglect the secondary aspects or cover them in short.
According to Levin, Anglin & Carney (1987), images serve the following purposes (see also Recommendations for Charts and Graphics on the SAP Design Guild):
- Decoration: making a text or page more attractive
- Representation: depicting objects, persons, or scenes mentioned in the text
- Organization: providing a structural framework, such as maps or flow charts
- Interpretation: making abstract or complex content more comprehensible
- Transformation: supporting memorization through mnemonic imagery
This list is incomplete with respect to the use of images on Web pages and application screens. Images, especially icons, often combine the "image" aspect with a "function" aspect, meaning that they act as links to certain pages or page elements (the functional aspect) as well as indicating the nature of the function, the status of objects or processes, and so on. Thus, both function and information are combined in one image.
Now, let's return to "square one" and take a look at the W3C's Web Content Accessibility Guidelines 2.0 (WCAG 2.0), which are, however, still in working draft state. My original article referred to WCAG 1.0, where I found at least a few hints:
Guideline 1. Provide equivalent alternatives to auditory and visual content.
Provide content that, when presented to the user, conveys essentially the same function or purpose as auditory or visual content.
The equivalent information must serve the same purpose as the visual or auditory content. Thus, a text equivalent for an image of an upward arrow that links to a table of contents could be "Go to table of contents". In some cases, an equivalent should also describe the appearance of visual content (e.g., for complex charts, billboards, or diagrams) or the sound of auditory content (e.g., for audio samples used in education).
From WCAG 1.0, highlighting by the author
Regrettably, WCAG 1.0 provides authors with no information on how such descriptions have to be written, what they should contain, and what can be left out. (See also Information Equivalence – References for more information about WCAG 1.0 and information equivalence.)
The new guidelines are even shorter with respect to information equivalence. Below is the only reference to "equivalent information" (or "information equivalence") that I could find:
Principle 1: Perceivable - Information and user interface components must be perceivable by users
Guideline 1.1 Provide text alternatives for any non-text content so that it can be changed into other forms people need such as large print, braille, speech, symbols or simpler language Understanding Guideline 1.1
1.1.1 Non-text Content: All non-text content has a text alternative that presents equivalent information, except for the situations listed below. (Level A)
Admittedly, some details can be found in the Techniques and Failures for Web Content Accessibility Guidelines 2.0, but not a concise definition of what "equivalent information" means (apart from the fact that it typically has to be provided in text form so that it can easily be converted into other formats). Such recommendations may exist, but I – probably along with many others – am not aware of them. Therefore, I cannot offer "approved" recommendations here. Instead, I would like to provide some examples that may help to clarify certain issues.
In the following examples, I have focused on images, that is, on vision. I have looked at each example in terms of the following aspects:
- What is the purpose of the image on the page?
- Does the page already provide a text description of the image, and is it complete?
- How extensive does the text description have to be, considering the page text and figure captions, if any?
The first item refers to the "purpose" aspect of WCAG 1.0, while the other items refer to the "description" aspect. For the latter, it is important to check whether a page already provides descriptions for its images, and whether these descriptions are missing or incomplete. It is also important to check how extensive the descriptions have to be, considering the page text and figure captions, if any.
Let me start with a few simple examples, similar to the one given by the WCAG 1.0. On my private Web site, visitors will find the following navigation icons:
| Icon | Purpose | Text Description According to Purpose | Text Description According to Icon "Look" |
|------|---------|----------------------------------------|-------------------------------------------|
| (image) | Go to the same page in German | German version | A German national flag containing three horizontal stripes of equal height; the stripes are black, red, and yellow from top to bottom. |
| (image) | Go to the same page in English | English version | A Union Jack flag containing a red cross with a thin white border stretching over the whole flag; there are also two thin red diagonal lines behind the red cross. The background color of the flag is blue. |
| (image) | Go to the homepage | Homepage, Go to homepage | A small gray house seen from the front. It has a door and, on the right side of the roof, a chimney. |
| (image) | Go to the top of the page | Top of page, Go to top of page | A gray arrow pointing upwards, with a light shade on the left side and top and a darker shade on the right side and bottom. |
While these descriptions may not be "perfect", they should make the difference between descriptions based on purpose and those based on the look of the images evident.
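In HTML, purpose-based descriptions translate directly into alt attributes on the image elements. Here is a minimal sketch of how the icons above might be marked up; the file names and link targets are hypothetical:

```html
<!-- Purpose-based text alternatives for navigation icons -->
<!-- (file names and link targets are hypothetical examples) -->
<a href="index_de.html">
  <img src="flag_de.gif" alt="German version">
</a>
<a href="index.html">
  <img src="icon_home.gif" alt="Go to homepage">
</a>
<a href="#top">
  <img src="icon_up.gif" alt="Go to top of page">
</a>
```

A screen reader announces the alt text in place of the image, so "Go to homepage" tells the user what the link does, whereas a look-based description ("a small gray house...") would only slow the user down.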
The next example refers to charts. Charts are visual, analog representations of data tables. Thus, there should already be a text description behind the chart – although it may not be shown to the user. Even if it is not shown, this text description can be utilized and offered to blind users. Of course, there is a reason for using charts: For users with no visual impairments, charts offer an immediate insight that may be obscured within a data table.
Figure 1: This pie chart clearly shows that parties 1 and 5 together have the majority in parliament and can form a government – a table requires calculations to provide the same information
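Such a "table behind the chart" might look as follows in HTML. The percentages are invented for illustration and are not taken from the actual figure:

```html
<!-- Hypothetical data table equivalent to the pie chart in Figure 1 -->
<table summary="Distribution of parliament seats by party">
  <caption>Distribution of Parliament Seats (illustrative figures)</caption>
  <tr><th scope="col">Party</th><th scope="col">Share of Seats</th></tr>
  <tr><td>Party 1</td><td>38%</td></tr>
  <tr><td>Party 2</td><td>18%</td></tr>
  <tr><td>Party 3</td><td>15%</td></tr>
  <tr><td>Party 4</td><td>13%</td></tr>
  <tr><td>Party 5</td><td>16%</td></tr>
</table>
```

A screen reader can read this table row by row. Note, however, that the user still has to add 38% and 16% to discover that parties 1 and 5 hold the majority – the very insight that the pie chart conveys at a glance.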
On the other hand, charts are inherently imprecise and often distorted because authors use three-dimensional (3D) representations. These may look fancy, but often hide or distort the relations between the data.
Figure 2: 3D Column Chart – the 3D representation makes it hard to compare data columns; for example, which column is higher, column 3 or column 6?
Figure 3: 3D pie chart – the perspective view and the cylinder effect make it hard to compare sections; for example, note that the gray and blue sections both represent the same fraction
Many images are shown in order to explain or clarify complex processes, structures, or relationships. Typically, they are provided in addition to text and as an aid for better understanding of the text.
Figure 4: Diagram explaining a complex structure – a Web page based on frames consisting of an index area to the left, which allows to change the content of the frame to the right
When writing text descriptions for images that explain something, there is little to describe because a description is already given in the body text. As the image is shown to clarify things that cannot be sufficiently explained textually, an additional text description for the image cannot achieve that either. So, just stick to the basic facts and describe what the image is supposed to explain.
I personally have the problem – and I know that other people also experience this – that many diagrams or pictures are of little value to me. Either I do not understand the picture, or it does not offer any additional insights. In this case, it is even harder to formulate an appropriate description for the image.
There are many reasons why people use images on Web pages. One purpose is to convey information with an image. For example, images may show a certain landscape, house, or room. Let's take the room example first: The room may be in a motel or hotel where people want to stay during their vacation. Therefore, they want to know what it looks like and, for example, what furniture is in it. Another example may be a house that people want to buy. They would definitely want an impression of what the house looks like before buying it and to know, for example, where it is situated. The realtor, on the other hand, wants the house to look as favorable as possible.
Figure 5: Photo of an autumn landscape in Southern France
Figure 6: Photo of a French house – would you like to live there?
Figure 7: Interior of a house where you might have a vacation
Images that convey information can be described textually, but the question is where to start and where to stop. Ask three different people and you will get three different descriptions or "interpretations" of the scene. This phenomenon is well known to cognitive psychologists and, for example, poses problems when witnesses are interviewed in court.
And what about information that you add yourself, effectively by reading "between the lines"? Some people will pick out certain details in a photo and draw their personal conclusions, such as that there is a highway nearby or that the house is in bad shape.
Figure 8: Reading an image "between the lines" leads to a very personal interpretation...
For writers who have to provide text descriptions, this is a tough issue. I already mentioned that it is possible to draw an unlimited number of inferences from an image, and nobody can predict which inferences another person will draw. Therefore, I would recommend that you stick to the facts, and if you want to offer inferences, stick to the ones most people will draw.
At school, I learned how to describe works of art, such as paintings. These descriptions are more than simple descriptions of the "facts": they are personal interpretations of a work of art and of the feelings it creates in a viewer. Typically, artworks are described in the body text of a Web page. Here, the text description can be kept short and should offer facts only.
Figure 9: A drawing by the author – not a work of art, but appropriate for demonstration purposes...
Websites often use images for decoration purposes and to attract visitors. This is similar to magazine front pages presenting beautiful women or men to attract readers. Nowadays, it seems as though the Web has become a huge, worldwide magazine... Decorative images are a mixed blessing that many sighted users would prefer to turn off. Some browsers make that easy, others make it hard, and some older ones simply cannot display images at all. Often, decorative images clutter the appearance of pages and also increase page loading times. But as we live in an "advertising" world, we are unable to alter the trend of adding decoration to pages. Even private homepages, such as my own, follow this pattern...
Figure 10: Decoration on a Web page (surrounding blurred)
Why should this be a problem for blind users? You might argue that they are better off without all the superfluous decorations. On the other hand, it is often hard to tell whether an image adds information to a page or serves decorative purposes only. Usually both of these aspects are intermingled.
The general recommendation is not to add descriptions to decorative images because these descriptions would – like the images themselves – make the content harder to understand. Some conformity checkers, such as Bobby 3.2, require that decorative images be identified by alt="" instead of no alt attribute at all. In my opinion, this requirement unnecessarily bulks out the HTML code of a page.
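For illustration, here are the two ways of marking up a purely decorative image that checkers such as Bobby distinguish; the file name is a hypothetical example:

```html
<!-- Decorative image with an empty alt attribute, as required by
     checkers such as Bobby; screen readers skip it silently -->
<img src="ornament.gif" alt="">

<!-- The same image without any alt attribute; many screen readers
     fall back to reading the file name aloud, which is worse -->
<img src="ornament.gif">
```

The empty alt attribute does add a few bytes to the page, but it gives screen readers an explicit signal that the image carries no information, instead of leaving them to guess.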
If an image is used for both decoration and information purposes, try to extract the core information content when describing the image.
As already mentioned in the introduction, Nielsen and Norman published data from a recent study demonstrating that there is a considerable performance gap between sighted users and users who rely on screen readers or magnifier software in their daily work. This data stirred up controversial discussions on CHI-WEB, which I shall not go into here, but I will say that these results came as no surprise to me. The differences in performance are due to typical characteristics of vision and hearing. People using magnifier software, even though they are still able to see, may miss certain visual details, which inevitably slows them down. To add some background to my statements, let's first take a closer look at the characteristics of vision and hearing.
Vision is primarily a "spatial" or "parallel" sense. Each eye perceives a two-dimensional image, which our brain combines into a three-dimensional impression. In addition to this, we are also able to perceive a pseudo three-dimensional space (sometimes called 2 1/2D): The three-dimensional space is "projected" onto a two-dimensional plane, such as a piece of paper (a photo or book page, for example) or a computer screen. Monocular visual cues, such as perspective, occlusion, or motion perspective, enable us to partially reconstruct the three-dimensional world from two dimensions. In this reduced three-dimensional space, we partly lose the ability to estimate distances and the size of objects. Finally, there is the time dimension, which allows us to view and record scenes over time – the basis for films and videos.
The spatial nature of vision enables us to recognize objects in parallel. We are able to decipher the global features of a scene almost instantaneously. In the case of a computer screen, this might be the absence or presence of certain screen areas or elements. We are also very sensitive to movement, especially in the periphery. This explains why animation on a Web page distracts us from our work.
We can see only a small area of our field of vision in full sharpness. Visual acuity is restricted to an area about the size of a stamp on a computer screen. We compensate for this by moving our eyes quickly around a scene, often without even noticing that we do so. If we are forced to look at a screen more thoroughly, we have to switch to a serial scanning mode. In Western cultures, the movement is typically from left to right and top to bottom, as when reading. People with full sight are able to scan very quickly when searching only for specific features – a plus symbol ("+") among minus symbols ("-"), for example, or certain key words in a text. Web designers can support this behavior by displaying important words in bold font, as has been suggested for Web pages. On the other hand, scanning can be very slow if people have to read text carefully or examine objects. While reading, we are in a similar state as while listening to speech: we often need prior information to be able to interpret and understand the current piece of information – the interpretation process stretches across a certain period of time.
Another characteristic of vision is that we are only able to see one thing per location. If an object is in front of other objects and is not transparent, it obscures all other objects. If these objects were people and all of them were speaking, we would still hear all of them (although the farther away they were, the less clearly we would hear them). This shows that there is no such "spatial" masking in hearing.
Hearing is primarily a "time-based" or "sequential" sense. Speech, sound, or any noise is a "flow in time" and takes a certain amount of time to be interpreted. When listening to other people or a screen reader, our own comprehension speed depends on the speed of the speaker. For example, we might get nervous if people are speaking too slowly for us. Sometimes, we already know what they are about to say and we would like to stop them. The same is true with screen readers.
While hearing is primarily sequential, it also has a "spatial" dimension. Because most people listen with both ears, we can tell the direction of a sound and distinguish between front, rear, or above. Often, we are unaware of these capabilities and need further support through vision because vision is so dominant. For example, people have problems distinguishing whether a sound originates in front of them or behind them (although there are noticeable differences in the frequency spectrum). Hearing also includes a 2 1/2D phenomenon, that is, a spatial effect that can be recognized with one ear. This effect is comparable to projection in vision: People can distinguish the distance of similar sound sources because loudness decreases with distance (let's call it "sound perspective"...).
While vision is "spatially exclusive" – that is, there can only be one piece of information in one place (one colored pixel, for example) – hearing is not. Sounds can overlay each other, as happens when an orchestra plays, a choir sings, or a lot of people talk at the same time. However, there are limits to our ability to distinguish different sound sources. Orchestras and choirs even build on these limitations and blend the single sources to form a new "sound experience". The cocktail party phenomenon, as psychologists call it, also tells us that our ability to listen to different conversation threads is limited: We can only listen to one conversation at a time, but we are able to catch certain "key" words, such as our name, from other conversations. These limitations are probably caused by the general limitations of our consciousness – we can usually only do one thing at a time. This limitation can only be relaxed if activities become automatic through extensive training. For example, whereas driving a car takes all of our consciousness in the beginning, we are able to perform it more or less automatically after some training.
In the preceding paragraphs, I have detailed the differences between vision and hearing. Basically, hearing is a primarily sequential modality, while vision is parallel. If we shift the focus of our discussion from modalities to operation modes, we can easily include magnifier software. Magnifier software forces its users into a sequential and local mode of operation because they have to move a virtual magnifying glass across the screen that only allows them to view a small section of it. According to the Nielsen & Norman figures, users of magnifier software have only a slight speed advantage over screen reader users; on the minus side, they commit twice as many errors as screen reader users.
The basic problem with the sequential mode is that it takes time and places a heavy burden on short-term memory. The parallel mode, on the other hand, utilizes the spatial dimension, which enables users to save time and relieve short-term memory – the screen can act as an external memory that can be easily accessed by seeing users.
Both the narrow scope of the magnifier window and the sequential reading of the screen reader are reasons why users can only focus on a small piece of information at a time. All the other information on the screen has to be kept in memory. Screen reader and magnifier users may utilize the screen as an external memory by having it read over and over again or by moving the magnifying glass repeatedly over the screen. However, this is cumbersome and time-consuming (the magnifier may be a little better in this respect).
Blind users, in particular, have well-trained memories, often much better than those of fully sighted users. But even though the typical short-term memory span of 7 +/- 2 items can be extended by training and mnemonic techniques, short-term memory is very fragile. It can be wiped out by any interruption – a phone call or a colleague asking a question can force you to start over.
In my opinion, these characteristics account for most of the performance differences found by Nielsen and Norman. Why users of magnifiers committed more errors than users of screen readers, however, remains unresolved. Perhaps these users had problems with moving the magnifier to the exact target on the screen.
According to Nielsen & Norman, users of screen readers and magnifiers took more than twice as long as the control group to complete a task. Therefore, one important issue is to speed up work for these users. The second, even more important result showed that both user groups had immense problems completing tasks at all. While the control group completed more than three quarters of all tasks, users of magnifiers completed little more than a fifth of all tasks, and users of screen readers only about an eighth (about half of the magnifier group's rate). Were the users of magnifiers and especially screen readers less intelligent than the other users? Definitely not, but their overall task – that is, the task itself, which was identical for all groups, plus the way they had to master it – was much more difficult.
I do not know the details of the study, such as, whether the screen reader was implemented well. But I conclude from this data that – all other things being equal – the way the tasks had to be performed placed a heavy burden on the users of magnifiers and, even more so, on the users of screen readers. This burden not only originated from speed and memory issues but also was due to cognitive aspects, such as the organization and structure of information and functions. If we consider this from a more "local perspective", we could surmise that these users generally have much more difficulty discovering the purpose of a screen and its high-level structure than other users. In addition, navigation between high-level elements is cumbersome because this sort of navigation is either not supported, or if it is, users are confused because there are no well-established standards.
I know that I am on shaky ground if I make proposals without the resources to check whether they work or not. Nevertheless, I would like to offer some ideas for improving the work conditions of blind and visually-impaired users. My proposals refer to three areas: (1) Speeding work up, (2) revealing the high-level structure of Web pages and screens, and (3) accessing the high-level structure. Many of the proposals originate from discussions with colleagues. Perhaps these ideas will inspire further research.
As stated above, the basic problem with the sequential mode is that it takes time and places a heavy burden on short-term memory. I propose several approaches for tackling these problems.
As there is some overlap between this section and the following ones, let me focus on the "depth versus width" issue. Research on menu structures has shown that users are faster with shallow menus offering many options than with deep menu structures presenting the famous 7 +/- 2 items on each level. This approach, promoted by Ben Shneiderman, among others, builds on the ability of fully sighted users to quickly scan a page for certain key words. Yahoo! is an example of this approach – it presents pages loaded with links. For blind users, however, it may be very tedious to have hundreds of links read aloud, and remembering all those links is not an easy task either. A faster approach might be to use categories that help users find their way quickly through a taxonomy, and not to exceed 7 +/- 2 items on each category level so that users can remember the categories. Technically, a binary search would be fastest, but it would require too many intermediate categories.
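The "too many intermediate categories" argument can be quantified: to reach one of N items through menus with a constant branching factor b, a user has to pass through roughly log_b(N) levels, rounded up. Taking N = 2000 items as a hypothetical example:

```latex
\lceil \log_{7} 2000 \rceil = 4  \qquad (7^{4} = 2401 \ge 2000)
\lceil \log_{2} 2000 \rceil = 11 \qquad (2^{11} = 2048 \ge 2000)
```

A binary taxonomy thus nearly triples the number of intermediate decisions a blind user would have to listen through, whereas a branching factor around 7 +/- 2 keeps both the number of levels and the number of items to remember per level manageable.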
One of the evergreen usability issues is that of context. When users are working with an application or visiting a Web page, they need to know the context. They want to know where they are, what the purpose of the current location is (what they can do there), where they can go, what they can do in those places, and, maybe, where they came from.
Often this information is unavailable; in other cases it is incomplete. In addition, the information may be distributed all over the page or screen and therefore be hard to collect and present using a screen reader. Therefore, it is important to establish standards that help identify the high-level structure of a screen or Web page. For Web pages, a possible approach would be to utilize the header styles plus an additional description of the page and its function or purpose. The next issue is to enable screen readers to extract and use this information. For example, this information could be collected using a script and presented in a secondary window that can be read by the screen reader. An even better solution would be for screen readers to offer functions for presenting high-level information based on certain structural tags.
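As an illustration of the "header styles plus description" proposal, a page might expose its high-level structure like this (the page and its section names are hypothetical):

```html
<!-- High-level structure exposed through heading tags; a script or a
     screen reader function could collect these into a page overview -->
<h1>Order Tracking</h1>
<p>Purpose: check the status of your orders and change delivery options.</p>

<h2>Search for an Order</h2>
<!-- ... search form ... -->

<h2>Order Details</h2>
<!-- ... details table ... -->

<h2>Delivery Options</h2>
<!-- ... options form ... -->
```

Screen readers such as JAWS already offer keys for jumping from heading to heading, so the consistent use of such structural tags is an inexpensive way to support high-level navigation.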
Currently, keyboard-based navigation (also called tab chain or accessibility hierarchy) does not recognize hierarchical structures on a Web page. This means that, if the user reaches a container, such as an iFrame, the next press of the tab key takes the user into the container, instead of taking him or her to the next container on the same level. A more efficient navigation would move a user around the elements on the same level. A certain key combination would then allow the user to go into a container element, if needed.
In such a scenario, the user would first identify the high-level information, or information areas, on a screen or page. Then the user would have options to quickly access these high-level elements. Following the principle of progressive disclosure, once the user has accessed a certain element, he or she would be provided with detailed information about it. If this element is itself structured hierarchically, the user could again "dive" into it and move around.
As already mentioned in "Speeding Up the Sequential Mode", standardized key combinations and dedicated hardware keys can speed up hierarchical navigation considerably.
Finally, I would like to add a few remarks on the social or "emancipating" aspects of equivalence. As the situation currently stands, information technologies have changed the situation for the worse for people with disabilities because most software applications and Web pages do not conform to the Section 508 or W3C standards. On the other hand, information technology offers new opportunities for these individuals and could help them to play an equal part in society.
One of these promising technologies is digitized text. Virtually any text information, such as literature (see Project Gutenberg, for example), fact books, textbooks, magazines and newspapers, or scientific publications, can be digitized and presented on computers. This is a huge step forward. In the past, for example, visually impaired students had to wait until a volunteer read scientific publications or textbooks onto a cassette so that they could use them. If the text is digitized and the student has a computer with the appropriate hardware and software installed, he or she can use it immediately. Clearly, people without visual impairments would also profit from digitized texts that are read out loud – they could listen to the texts even in environments where using a computer is impossible, for example, while driving a car. We should, perhaps, invent small digital devices with built-in speech software and hardware that utilize memory cards – the same cards that are used in digital cameras. Such an option could also be added as a feature to portable CD and MP3 players...
Another promising technology is the Web. It fulfills – or at least, could fulfill – a prominent role with respect to accessibility: It is a vast, worldwide, hypertext-based information network that comes close to the original ideas once proposed by hypertext pioneers, such as Vannevar Bush and Ted Nelson. Once a piece of information is put into electronic format, it can be made universally accessible through the Web. This includes international and local news, entertainment, shopping, message boards, mailing lists and electronic mail, as well as all the "traditional" information sources. In this respect, the Web could have an emancipating influence for those with disabilities: It offers them new and easier ways of accessing information, and helps them to catch up with people that do not have disabilities. Application and Web designers have the responsibility to remove as many access barriers as possible and help to make this dream come true.
Equivalence has been the thread running through this article. Stumbling over the term "information equivalence" in the W3C's guidelines, I soon learned that equivalence comprises more than just information – it encompasses performance and also the status of those that have disabilities within our society. Information technology offers a chance to improve the status of individuals with disabilities but there is a long way to go. Achieving information and performance equivalence would be an important step in the right direction.