Fighting (with) Hierarchies – Part I: Basics
Fighting (with) Hierarchies – Part II: Presentation
Fighting (with) Hierarchies – Presentation Techniques I
Fighting (with) Hierarchies – Presentation Techniques II
Review of "Information Visualization" (Robert Spence)
Review of "The Craft of Information Visualization" (Bederson & Shneiderman)
By Michael Hatscher, SAP AG – Updated: January 20, 2004
Human beings are limited in many ways. One of these limitations is the amount of information they can process or hold in working memory at any given time. That’s where Miller’s famous "seven plus/minus two" rule comes from – you can only keep up to about nine chunks of unrelated information readily accessible in short-term memory. (You’ve probably heard people say, "Don’t put more than seven items on a PowerPoint slide", or "Don’t build your menu structures more than seven items wide". While it’s certainly a valid recommendation not to overload your slides, Miller’s rule is often misunderstood: Miller states that people have great difficulty keeping more than seven to nine unrelated elements in mind. If they try to concentrate on more, access performance decreases rapidly. This doesn’t mean you cannot work with more than nine elements – if you find a way to organize those elements or to find a relation between them, it’s easy to keep lots of items readily accessible. See Don Norman’s excellent books on this subject.)
To overcome this shortcoming, people tend to reduce complexity by constructing groups, or categories, of related and similar elements, excluding elements that are less similar. Categories can be nested and are mutually independent, and that’s how hierarchies come into play: basically, they are just more or less deeply nested systems of categories that get more specific with every layer you descend, while making it harder to keep your orientation the deeper you dive into them.
Hierarchies have several advantages. Most of all, their order is very clear – every item within a category is more specific and less high-level (or abstract) than the category itself, nearer to the action or the subject matter. (You can see this in the organization of the military: a captain is given high-level orders by his major and gives more concrete commands to the corporal, who hands very specific tasks on to the private. Everyone knows who has to follow whom. That’s very efficient if things must be processed fast, top-down and without discussion.)
Categories and hierarchies by definition have one major drawback: every categorization is arbitrary, making sense to the person who designed the system of categories, but not necessarily to others. There are ways to check whether a categorization is sound: usually you design your categories inductively, deriving the category names from clusters of items, then ask experts from the knowledge domain to sort the items into your categories – afterwards you compute how well the experts’ assignments of items to categories agree. This tells you about the validity of your categories. (It’s rather exhausting.) In any case, it is very important to always check category descriptions with others – preferably with users (i.e. domain experts). The wording of the category label must give them an idea of what is to be found inside.
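The agreement check described above can be sketched in a few lines of Python. The expert names, items, and category labels are invented for illustration, and simple pairwise percent agreement is an assumption: the article does not name a specific statistic, and a chance-corrected measure such as Cohen’s kappa would be a common refinement.

```python
from itertools import combinations

# Hypothetical sorting results: each expert assigns every item to one category.
assignments = {
    "expert_a": {"whale": "mammals", "ostrich": "birds", "shark": "fish", "duckbill": "mammals"},
    "expert_b": {"whale": "mammals", "ostrich": "birds", "shark": "fish", "duckbill": "birds"},
    "expert_c": {"whale": "fish", "ostrich": "birds", "shark": "fish", "duckbill": "mammals"},
}


def pairwise_agreement(assignments):
    """Mean share of items that two experts put into the same category."""
    scores = []
    for first, second in combinations(assignments.values(), 2):
        items = first.keys() & second.keys()
        same = sum(1 for item in items if first[item] == second[item])
        scores.append(same / len(items))
    return sum(scores) / len(scores)


def contested_items(assignments):
    """Items on which the experts do not all agree."""
    items = set.intersection(*(set(a) for a in assignments.values()))
    return sorted(
        item for item in items
        if len({a[item] for a in assignments.values()}) > 1
    )


print(round(pairwise_agreement(assignments), 2))
print(contested_items(assignments))
```

A low agreement score, or a long list of contested items, would suggest that the category labels do not mean the same thing to the experts as they do to the designer.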
Moreover, people differ in how they organize their knowledge. For example, if you want to build a system of categories for animal species, you can sort them according to certain physical characteristics or behaviors, e.g. into classes such as birds, fish, and mammals (let’s forget about insects and reptiles for a moment), following the usual zoological classification. You could also classify them according to their habitat – i.e. sea, air, or land. If you try to match those two classification schemes, you realize there is no one-to-one correspondence; you will have to live with a one-to-many connection: there are mammals that live in the sea (e.g. whales, dolphins) whereas most of them live on land, and there are birds that never fly (e.g. the ostrich).
Even more importantly, reducing complexity sometimes means losing valuable information. The above example only allows you to assign each element to exactly one group within a one-to-many matrix. The following figure illustrates this point.
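The mismatch between the two schemes can be made concrete with a small Python sketch. The animal list and the single habitat assigned to each animal are illustrative assumptions, not data from the article.

```python
from collections import defaultdict

# Illustrative data only: each animal gets one zoological class and one habitat.
animals = {
    "whale": ("mammal", "sea"),
    "dolphin": ("mammal", "sea"),
    "horse": ("mammal", "land"),
    "eagle": ("bird", "air"),
    "ostrich": ("bird", "land"),
    "tuna": ("fish", "sea"),
}


def habitats_per_class(animals):
    """Map each zoological class to the set of habitats its members occupy."""
    mapping = defaultdict(set)
    for zoo_class, habitat in animals.values():
        mapping[zoo_class].add(habitat)
    return dict(mapping)


def is_one_to_one(mapping):
    """True only if every class maps to a single habitat and no habitat is shared."""
    targets = [frozenset(habitats) for habitats in mapping.values()]
    single = all(len(target) == 1 for target in targets)
    return single and len(set(targets)) == len(targets)


mapping = habitats_per_class(animals)
print(mapping)
print(is_one_to_one(mapping))
```

Here mammals spread over sea and land and birds over air and land, so neither scheme can simply be renamed into the other.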
Figure 1: Hierarchical network with some additional nonhierarchical connections
Now how about sea turtles, which spend nearly all their lives in the sea, yet come ashore to lay their eggs? Or what about the duckbill (platypus) – it doesn’t even fit properly into the category of mammals! Look at it from a behavioral point of view: it lays eggs, but it also suckles its young. Okay, you can say: let’s classify them according to what they do or where they live most of the time – whichever criterion is prevalent. By saying this, you select one piece of information as more important than the others, and by assigning the item to one group, you lose all the other information you have on it. Actually, you’d like to allocate this item to two groups at a time, which is not possible because the categories are independent of one another. You could open up another category with a mixture of the characteristics. Unfortunately, this way you’d lose the advantage of complexity reduction which led you to start all this.
The dilemma of losing information through item allocation is known in psychology from Factor Analysis and questionnaire scale construction. If you look at a knowledge domain, you find several groups (or clusters) of items which are rather close to one another as far as their content is concerned. (E.g. if you want to know what things contribute to intelligence, you’d take a look at the literature on intelligence, ask experts from the domain, collect lots of items, statements and indices, and thus construct the knowledge space. Unfortunately, no one can handle this vast amount of data; you’ll need some means to reduce the complexity: Factor Analysis. Factor Analysis helps you to reduce the information within a domain to a small number of highly condensed entities, the so-called factors. In the example, possible factors could be: processing capability, mathematical intelligence, musical intelligence, verbal intelligence, and social intelligence, just to name a few.)
Factors are used to structure a domain; they are laid through those clusters of items in order to map as much of the items’ information onto as few factors as possible. You start by laying one factor through the biggest cluster, then apply a second factor orthogonally to the first, aimed through the second biggest cluster of items. Then comes the third factor, orthogonal to both the first and the second and passing through the next biggest cluster. When it comes to the fourth factor, our imagination collapses.
Factors are mutually independent, building an n-dimensional orthogonal grid through the domain space. Items live near the factors; if the factor solution is good, most items are very near to just one factor. (In our example, you’d find items like “knows most of a word’s different meanings,” “takes very little time to solve anagrams,” or “thinks puns are fun” within the “verbal intelligence” factor’s neighborhood.) Nevertheless, they also “load” on the other factors just a little bit.
When you want to build a questionnaire from the results of a factor analysis, this small load on the other factors is lost (and the factors’ mutual orthogonality or independence with it), because you have to decide where – on which scale – to place an item: you act as though the item belonged to the main factor alone.
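How much is lost by this decision can be illustrated with a small Python sketch. The loading values and factor names are invented for illustration; the sketch also assumes orthogonal factors, so that the variance an item shares with the factors is the sum of its squared loadings.

```python
# Hypothetical loadings of three items on three factors (values invented).
FACTORS = ("verbal", "mathematical", "social")
loadings = {
    "solves anagrams quickly": (0.80, 0.30, 0.10),
    "thinks puns are fun": (0.70, 0.10, 0.40),
    "knows many word meanings": (0.90, 0.20, 0.10),
}


def assign_to_scale(item_loadings):
    """Pick the factor with the largest absolute loading, as a scale assignment would."""
    index = max(range(len(item_loadings)), key=lambda i: abs(item_loadings[i]))
    return FACTORS[index]


def discarded_share(item_loadings):
    """Share of the item's explained variance that sits on the other factors."""
    squared = [value ** 2 for value in item_loadings]
    return 1 - max(squared) / sum(squared)


for item, values in loadings.items():
    print(item, assign_to_scale(values), round(discarded_share(values), 2))
```

All three items end up on the “verbal” scale, but for an item like “thinks puns are fun” roughly a quarter of what the factors explain about it belongs to other factors and is ignored once the item is placed.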
Using a traditional filing system (with sheets of paper in metal file cabinets), if an item belongs to several categories, you have to put one copy of the item in each corresponding folder. This is rather cumbersome, because you have to update all the copies whenever anything changes, you have to remember to place new copies whenever a new item of the same class is added, and so forth. In the most prominent and ubiquitous example of hierarchies in the computer industry, the file system, this problem can be worked around, somewhat inelegantly, by placing “aliases” (Mac OS; “shortcuts” in Windows, “symlinks” in Unix) of documents (or folders, or volumes...) into several folders: you could place the element “duckbill” in the folder “mammals” of your “classical zoology categorization system” and put aliases to the file in your “lays eggs” folder and your “suckles young” folder of your “behavioral categorization system”. The alias gives you the up-to-date information wherever you encounter the document (or its alias), thus seemingly resolving this problem.
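The alias workaround can be tried out with Unix-style symlinks. The following Python sketch builds a throwaway folder tree; the folder and file names are made up for the example, and on Windows creating symlinks may require additional privileges.

```python
import os
import tempfile
from pathlib import Path


def build_demo_tree(root):
    """Create one original file and two symlink aliases in other folders."""
    root = Path(root)
    original = root / "classical_zoology" / "mammals" / "duckbill.txt"
    original.parent.mkdir(parents=True)
    original.write_text("lays eggs")

    aliases = []
    for folder in ("lays_eggs", "suckles_young"):
        alias = root / "behavioral" / folder / "duckbill.txt"
        alias.parent.mkdir(parents=True)
        os.symlink(original, alias)  # the alias points at the single original
        aliases.append(alias)
    return original, aliases


with tempfile.TemporaryDirectory() as tmp:
    original, aliases = build_demo_tree(tmp)
    # Update the original once; no copies need to be touched.
    original.write_text("lays eggs, suckles its young")
    contents = [alias.read_text() for alias in aliases]
    print(contents)
```

Reading through either alias returns the updated text, which is the one thing this workaround does well.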
What is not yet taken care of: when you have a new document of the same class, you again have to put aliases all over the place; and there is no way (except a search, which is a nuisance) to tell where the aliases are and how the different categories relate to each other. The inter-category information on the item is hidden and can only be extracted by looking at where the aliases and the original document have been placed. Due to the file system’s design (and limited screen real estate), you can’t open up all the folders at once; you only get a small excerpt of the file system’s content, which makes a manual search for something a real pain (see the Windows Explorer or Mac OS Finder list view – or, even worse, the DOS “dir” or Unix “ls” command). Folders tend to crowd one another’s content out of view; you just focus on what is in one specific folder. This might not be so bad when you browse through your file system opening one window for every folder, filling your screen with windows as you go and thereby concealing the other windows – but you are still not given any information on the relationship between the folders, except maybe the system path (which is very abstract). Systems that offer a hierarchical browser (NeXTstep, Mac OS X) help a little, though even this doesn’t solve the underlying problem of the file system.
For a person searching someone else’s file system (and sometimes even their own), trying to grasp the rationale behind the documents’ organization can be a very hard task. In fact, search functionality as well as the Windows Start Menu (or the Mac OS Apple Menu) are just crutches: they allow for easy and fast access to information (files, programs, folders) deeply hidden within the file system, offering horizontal access to a vertical, tree-like organization of files and folders.
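The search mentioned above can be sketched as follows: the only way to learn which other categories a document belongs to is to walk the whole tree and collect every symlink that resolves to it. The helper name `find_aliases` and the demo folders are hypothetical, and the sketch covers Unix-style symlinks only, not Mac OS aliases or Windows shortcuts.

```python
import os
import tempfile
from pathlib import Path


def find_aliases(root, target):
    """Walk the whole tree and return every symlink that resolves to target."""
    target = Path(target).resolve()
    hits = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = Path(folder) / name
            if path.is_symlink() and path.resolve() == target:
                hits.append(path)
    return sorted(hits)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "mammals" / "duckbill.txt"
    original.parent.mkdir()
    original.write_text("duckbill")
    for folder in ("lays_eggs", "suckles_young"):
        (root / folder).mkdir()
        os.symlink(original, root / folder / "duckbill.txt")

    # The category information is only recoverable from where the aliases sit.
    found = [path.parent.name for path in find_aliases(root, original)]
    print(found)
```

The cost grows with the size of the whole file system, not with the number of categories the document belongs to, which is why this is a nuisance in practice.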
Several research groups have come up with possible solutions to these problems; some of them are already fairly mature. Nearly all of them focus on displaying information "leaves" (the actual documents) spatially within a space spanned by knowledge vectors (the former "nodes"), thus allowing items to be placed between two categories – sometimes even showing that an item belongs to Category1 with 70% and could be assigned to Category2 with 30%. One of the most important changes is that the categories shift from being mutually exclusive classes (containing objects) to becoming content vectors through the domain space, allowing you to organize your content in relation to them and in relation to other objects / leaves / documents.
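The idea of graded membership can be sketched in Python as follows. This is not how any of the systems discussed here work internally; the anchor coordinates, the raw affinities, and the share-weighted placement rule are all assumptions chosen to keep the example small.

```python
def membership_shares(affinities):
    """Normalize raw affinities so that the shares of one item sum to 1."""
    total = sum(affinities.values())
    return {category: value / total for category, value in affinities.items()}


def position(shares, anchors):
    """Place an item at the share-weighted mean of the category anchor points."""
    x = sum(shares[c] * anchors[c][0] for c in shares)
    y = sum(shares[c] * anchors[c][1] for c in shares)
    return (x, y)


# Hypothetical anchor points for two category vectors in a flat display.
anchors = {"sea": (0.0, 0.0), "land": (10.0, 0.0)}

# Hypothetical raw affinities, e.g. relative time spent in each habitat.
sea_turtle = membership_shares({"sea": 7, "land": 3})
sea_turtle_position = position(sea_turtle, anchors)

print(sea_turtle)
print(sea_turtle_position)
```

The item is no longer inside one category; it sits between the two anchors, closer to the one it shares more with, and its shares stay available instead of being discarded.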
My favorite implementation is Project X (or HotSauce, an abandoned project by Apple’s unfortunately equally discontinued Advanced Technology Group). It allows you to fly through a 2 1/2D space of nodes (categories) and leaves (HTML pages); the spatial information on each node and leaf is held in a Meta Content Format (MCF) file describing the knowledge space. Take a look at the Yahoo! picture showing how you can fly through their categories, diving deeper and deeper into their knowledge space as you go, thereby seeing more and more specific nodes and leaves while you dive down. More information on MCF can be found at http://www.w3.org/TR/NOTE-MCF-XML/MCF-tutorial.html. You can install the browser plug-in from this site (www.xspace.net/download/index.html) and take a trip through one of the X-Spaces on this site: www.alvitec.ch/hotsauce/welcome.html.
Figure 2: Project X
Another similar example is the USU KnowledgeMiner (see picture); here you can dive into the classes as well. Clicking on the lines (nodes) between the leaves shows you the relationship between the items ("is child of", "belongs to"). It’s rather impressive!
Figure 3: The USU KnowledgeMiner
There was an IBM project as well, but I don’t remember its name. I think it went into Lotus kStation; it offered a 3D (or 2 1/2D) view into a knowledge domain, but switched to a hyperbolic tree (see also the article "Fighting (with) Hierarchies – Part II: Presentation") as soon as you decided you were deep enough.
Even these solutions and prototypes have severe shortcomings; they are better than the traditional approach, but because they are at most 2 1/2-dimensional, they have great difficulty visualizing situations in which an object belongs to more than two categories. Real progress would be made by an n-dimensional solution (where n equals the number of vectors, or categories). How to visualize that still remains the question...
To wrap up the article, I’d like to point out the shortcomings of the present approaches to hierarchies and to stress the advantages of the new, spatial ones. Hierarchies with folders and sub-folders are great as long as your documents can be organized in a strictly hierarchical, mutually exclusive way, with no overlap of content. As soon as you have to think about where to place a document, you run into trouble. The best option today is to place aliases in every appropriate place. Due to the file system’s nature, most of the information (where a file is, what other files there are nearby) is hidden from the user – unless he or she opens the specific folder, which mostly causes all the other information to disappear from view. The wording of the folders’ names adds to the hierarchies’ complexity – if a name doesn’t make sense to you, you have to open the folder to see what’s inside, or you don’t open it at all.
If we had a way to organize our documents spatially, in relation to categories rather than within them, we could use all the information contained in a document, sometimes choosing five or more categories the item has something to contribute to. Moreover, all of the document’s context could be visible at a glance; you could dive into the knowledge space as you (supposedly) dive into your domain memory, in a chaotic and associative way rather than an artificially clean and clearly defined one.