Abstract: This paper describes advances in the software developed in the context of the TeDUB project ("Technical Drawings Understanding for the Blind"), which aims to provide blind computer users with an accessible representation of technical diagrams. The TeDUB system consists of two separate parts: one for the (semi-)automatic analysis of images containing diagrams from a number of formally defined domains, and one for presenting the previously analysed material to blind people.
One problem faced by blind and visually impaired people is making use of the information contained in graphics. While a number of mature techniques (e.g., optical character recognition, screen readers or braille devices) allow them to access the text of documents, even when it originates from printed material, the content of informational graphics such as technical diagrams typically remains inaccessible. The usual approaches to this difficulty are tactile diagrams and manually created meta-data such as textual descriptions (see Kurze 1995). However, both approaches require active human intervention: for tactile diagrams, the existing material has to be carefully redesigned (see Levi and Amick 1982). Other solutions rely on specialised (and often expensive) hardware such as touch tablets combined with sound (e.g., the TACIS system, Gallagher and Frasch 1998).
The TeDUB system makes graphical information accessible through semi-automatic and automatic analysis of graphical content and through the import of file formats that contain semantic information, and it presents this information to the blind user via a specialised navigation interface. It is intended to handle technical drawings (diagrams that conform to certain standards) from arbitrary domains, and this is demonstrated for three domains: analogue and digital electronic circuits, certain UML (Unified Modelling Language) diagrams and architectural floor plans. The system consists of several parts for the interpretation and presentation of such diagrams. Since the first results were published by Petrie et al. (2002) and Födisch et al. (2002), several prototypes have been developed. The system's architecture has been fundamentally revised and can now process three types of diagram input: bitmap graphics, vector graphics and file formats containing semantic information. The navigation interface has matured considerably, based on the results of intensive user evaluations of the first prototypes.
The TeDUB system consists of two main parts, DiagramInterpreter and DiagramNavigator. DiagramInterpreter analyses existing diagrams and converts them into a representation that can be used by DiagramNavigator, which provides blind users with an interface to navigate and annotate these diagrams through a number of input and output devices.
The TeDUB system is able to handle diagrams at different levels of abstraction: bitmap graphics, such as those acquired through standard scanner hardware or found on web pages; vector graphics, as typically produced with graphics programs like Corel Draw; and file formats with semantic content, a category that includes the XMI format (XML Metadata Interchange) as well as formats specific to CAD and other modelling software.

Figure 1: Architecture of DiagramInterpreter
DiagramInterpreter's core is the knowledge processing unit. It operates on a network of hypotheses and processes them incrementally until a semantic description of the whole diagram is found. The image processing unit analyses bitmap images and generates a first set of hypotheses based on the geometric information they contain. Vector graphics files, which already contain explicit information about geometric primitives, can be used via DiagramInterpreter's SVG (Scalable Vector Graphics) import functionality. The Annotator allows a sighted user to interact with the interpretation process by inserting hypotheses manually, thus improving the quality of the interpretation and adding useful information not contained in the original diagram. All domain-dependent aspects of DiagramInterpreter are externalised as formalised knowledge, so the system is designed to minimise the effort needed to incorporate a new type of diagram.
DiagramNavigator is the user interface component of the system. It presents the diagram content obtained by DiagramInterpreter to the user. It also applies an XSL transformation to UML diagrams exported in XMI format from UML design tools such as Rational Rose or ArgoUML, converting them into the same TeDUB format so that they are presented through the same user interface (Figure 2). The great advantage of this latter approach is that the information contained in the diagram is converted into the TeDUB format without loss: the variable quality of bitmap image analysis is avoided.
In both cases the information is modelled as a set of nodes which can be navigated either hierarchically or as a collection of connected graphs. Output is screen-reader independent and utilises 2D and 3D sound. Input is via the keyboard or an optional tactile tablet. An inexpensive commercial force-feedback games joystick is used as a simple tactile input and output device. With the exception of the tactile tablet, the interface is designed to use inexpensive, commercially available devices: this is important if the system is to have any real application in future.
Analysed data is exchanged between DiagramInterpreter and DiagramNavigator in an XML-based format that contains the semantic information from the original image. As described in section 3, DiagramNavigator is able to communicate hierarchical and spatial information, which is represented by two types of edges: part-whole relationships and relative positions for defined pairs of objects. Figure 3(a) and Figure 3(b) show parts of the representation of an example diagram from the architectural domain.
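The exact TeDUB exchange schema is not given here, so the following is only a minimal sketch of how the two edge types could be serialised with Python's standard `xml.etree.ElementTree`. The element and attribute names (`diagram`, `node`, `part`, `position`, `child`, `parent`, `relation`) and the helper functions are hypothetical, chosen purely for illustration.

```python
import xml.etree.ElementTree as ET


def build_diagram_xml(nodes, part_of, positions):
    """Serialise a diagram into a hypothetical TeDUB-like XML layout.

    nodes:     dict mapping node id -> human-readable name
    part_of:   list of (part_id, whole_id) pairs, i.e. hierarchy edges
    positions: list of (source_id, relation, target_id) triples, i.e. spatial edges
    """
    root = ET.Element("diagram")
    for node_id, name in nodes.items():
        ET.SubElement(root, "node", id=node_id, name=name)
    for part, whole in part_of:
        ET.SubElement(root, "part", child=part, parent=whole)
    for source, relation, target in positions:
        ET.SubElement(root, "position", source=source, relation=relation, target=target)
    return ET.tostring(root, encoding="unicode")


def children_of(xml_text, whole_id):
    """Follow part-whole edges downwards, as a hierarchical navigator would."""
    root = ET.fromstring(xml_text)
    return [e.get("child") for e in root.findall("part") if e.get("parent") == whole_id]
```

For a floor plan with nodes `floor`, `kitchen` and `hall`, where both rooms are parts of `floor` and `kitchen` is `west-of` `hall`, `children_of(xml_text, "floor")` returns `['kitchen', 'hall']`. Keeping hierarchy and spatial edges as separate element types mirrors the observation that the two kinds of relation are orthogonal.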

Figure 3(a): Representation of an architectural diagram: a section of the hierarchy

Figure 3(b): Representation of an architectural diagram: a section of the connectivity presentation.
The network of connected nodes does not necessarily respect the hierarchy levels.
One advantage of this representation is that information is only included if it is relevant to the user: in a UML class diagram, e.g., the exact geometric path of an association between class nodes does not help in understanding the diagram. On the other hand, the combination of the hierarchical representation of floor plans (see section 3) and the spatial layout of the rooms makes it easier for a blind user to find their way through the depicted building. Because the semantic hierarchical structures of the diagram are conveyed directly, users do not have to build up these structures themselves through painstaking synthesis of the simple nodes that make up the diagram.
The main goal of DiagramInterpreter is to build an interpreted representation of a given diagram. As noted above, the TeDUB system can process three different types of input data. Of these, bitmap and vector graphics have to be interpreted before they can be presented to blind users in a meaningful way.
Automatic interpretation is performed by processing components of the diagram in a partonomic hierarchy of different abstraction levels, from the lowest (geometric primitives like "straight line", "curve" or "rectangle") to the highest (functional units like "room" in the architectural domain or "full adder" in an electronic circuit). In a partonomic hierarchy, two elements are related if one is a part of the other. In the architectural domain, e.g., several hypothesised lines or arcs may be parts of a door or a window, while several windows, doors and surrounding walls may be parts of a room. The knowledge management unit uses a data- and model-driven aggregation process which creates a complete interpretation of a diagram by inferring new parts from existing ones, step by step.
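To make the part-of relation concrete, here is a minimal sketch of a partonomic hierarchy as a tree of parts, using the door and room example from the text. The `Part` class, the numeric `level` field and both helper functions are hypothetical illustrations, not TeDUB's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    concept: str   # e.g. "straight line", "door", "room"
    level: int     # 0 = geometric primitive; higher values = more abstract
    parts: list = field(default_factory=list)  # direct sub-parts (part-of edges)


def primitives(node):
    """Return the lowest-level parts a node is ultimately built from."""
    if not node.parts:
        return [node]
    found = []
    for p in node.parts:
        found.extend(primitives(p))
    return found


def is_part_of(candidate, whole):
    """True if candidate is a direct or indirect part of whole."""
    return any(p is candidate or is_part_of(candidate, p) for p in whole.parts)
```

A door built from two straight lines and an arc, placed in a room together with a wall line graph, gives a room whose `primitives` are those four level-0 elements. The relation is transitive (the arc is a part of the room) but not symmetric (the room is not a part of the door).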
Diagrams in bitmap or vector graphics format may be of varying quality. This is especially true for graphics from scanned documents where noise and other distortions can lead to missing or ambiguous information. But information may also be ambiguous in vector graphics formats, e.g., if a text annotation must be assigned to one of two nearby objects. The inference mechanism deals with this uncertainty by treating elements from the diagram as hypotheses about its parts. Each hypothesis is assigned a value that represents the confidence in its correctness.
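The ambiguous text annotation mentioned above can be illustrated with competing hypotheses, each carrying a confidence value. The sketch below is hedged: the `Hypothesis` class and `label_hypotheses` are hypothetical names, and the inverse-distance scoring is an assumption chosen for simplicity, not the confidence measure used by TeDUB.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    claim: str
    confidence: float  # in [0, 1]; higher means more likely to be correct


def label_hypotheses(label_pos, candidates):
    """Create one competing hypothesis per candidate object for a text label.

    candidates maps object name -> (x, y) centre. Confidence is inverse
    distance, normalised to sum to 1 (an assumed scoring, for illustration).
    """
    weights = {}
    for name, (x, y) in candidates.items():
        d = math.hypot(x - label_pos[0], y - label_pos[1])
        weights[name] = 1.0 / (d + 1e-9)  # small epsilon avoids division by zero
    total = sum(weights.values())
    hyps = [Hypothesis(f"label belongs to {n}", w / total) for n, w in weights.items()]
    return sorted(hyps, key=lambda h: h.confidence, reverse=True)
```

For a label at (0, 0) between an object at distance 1 and another at distance 3, the nearer object receives confidence 0.75 and the farther one 0.25. Both hypotheses are kept, so later aggregation steps can still overturn the initial preference.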
The TeDUB system aims to be domain independent. Therefore, all domain-specific aspects are externalised as formalised knowledge (ontologies), and new types of diagrams are made accessible to the system by specifying the corresponding ontologies. The core of the formal language consists of aggregation rules for the definition of concepts. Concepts on lower levels of abstraction are the least domain-dependent and are therefore suited to being modelled in a reusable way. Concepts on the lowest level must also be pre-defined so that the various modules of DiagramInterpreter can communicate with each other.
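The separation between a generic inference engine and externalised domain rules can be sketched as follows. This is not TeDUB's formal rule language: `ARCHITECTURE_RULES` and `aggregate` are hypothetical, each rule merely lists the part concepts (with counts) that make up a higher-level concept, and the loop is greedy, ignoring confidences, geometric constraints and the model-driven (top-down) side of the process.

```python
from collections import Counter

# Hypothetical, heavily simplified "ontology": concept -> required parts and counts.
ARCHITECTURE_RULES = {
    "door": Counter({"straight line": 1, "arc": 1}),
    "room": Counter({"door": 1, "line graph": 1}),
}


def aggregate(hypotheses, rules):
    """Stepwise, data-driven aggregation: keep applying any rule whose parts
    are available, consuming those parts, until no rule fires any more."""
    pool = Counter(hypotheses)
    derived = []
    changed = True
    while changed:
        changed = False
        for concept, needed in rules.items():
            if all(pool[part] >= n for part, n in needed.items()):
                pool -= needed          # the parts are now owned by the new whole
                pool[concept] += 1
                derived.append(concept)
                changed = True
    return derived, pool
```

Starting from the primitives `straight line`, `arc` and `line graph`, the engine first derives a `door` and then a `room`, leaving only the room. Supporting a new domain would mean supplying a different rule table while the engine stays unchanged, which is the point of externalising the knowledge.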
In the case of bitmap graphics (the lowest level of input to the TeDUB system), images are first analysed by the image processing module, which extracts image features. The module follows the usual image processing pipeline of pre-processing, segmentation and feature extraction (see Abmeyr 1994), with an emphasis on the extraction of lines. The goal is to obtain an initial set of simple hypotheses describing geometric properties of the image, which serve as input to the knowledge management module.
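The three pipeline stages can be illustrated with a toy example in pure Python. This is a sketch, not the TeDUB image processing module: the fixed threshold of 128 is an assumption, segmentation is plain 8-connected component labelling, and the only "feature" extracted is a bounding box per component.

```python
def binarise(grey, threshold=128):
    """Pre-processing: dark pixels (ink) become 1, light background becomes 0."""
    return [[1 if v < threshold else 0 for v in row] for row in grey]


def label_components(binary):
    """Segmentation: label 8-connected ink regions with an iterative flood fill."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                stack.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Feature extraction: (min_x, min_y, max_x, max_y) for each component label."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                x0, y0, x1, y1 = boxes.get(lab, (x, y, x, y))
                boxes[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes
```

In a real system the components found here would then be thinned to centre lines before any geometric hypotheses are formed.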
The positions of lines in the image are determined using a skeletonisation approach, which computes the approximate centre lines of all components (an approach also used by Dosch et al. 2000). In the next step, these lines are converted into graphs of connected components. The nodes of these graphs represent crossings, end points and corners of lines, while the edges carry information about the line segments (their thickness, curvature and other properties needed for the subsequent classification).
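One common way to find graph nodes on a one-pixel-wide skeleton is the crossing number: walk once around a pixel's eight neighbours and count the 0/1 changes, halved. A value of 1 marks an end point and a value of 3 or more marks a crossing. The sketch below assumes the skeleton has already been computed; it is a standard technique offered as an illustration, not necessarily the method TeDUB uses, and it does not detect corners.

```python
# Clockwise 8-neighbourhood offsets (dy, dx), starting north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def parse(rows):
    """Turn strings such as "..#.." into a 0/1 skeleton grid."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]


def crossing_number(skel, y, x):
    """Half the number of 0/1 changes when walking once around the pixel."""
    h, w = len(skel), len(skel[0])
    vals = []
    for dy, dx in RING:
        ny, nx = y + dy, x + dx
        vals.append(skel[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0)
    return sum(vals[i] != vals[(i + 1) % 8] for i in range(8)) // 2


def graph_nodes(skel):
    """Classify skeleton pixels: crossing number 1 = end point, >= 3 = crossing."""
    ends, crossings = [], []
    for y, row in enumerate(skel):
        for x, v in enumerate(row):
            if v:
                cn = crossing_number(skel, y, x)
                if cn == 1:
                    ends.append((y, x))
                elif cn >= 3:
                    crossings.append((y, x))
    return ends, crossings
```

For a plus-shaped skeleton, only the centre pixel is reported as a crossing and the four arm tips as end points. A plain neighbour count would also flag the pixels next to the centre, which is why the ring-based count is used here. The pixel runs between these nodes would become the graph's edges.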
Currently, elements from the input graphics are classified by the image processing module as one of four concepts, which also constitute the pre-defined set of hypotheses necessary for the communication between modules: straight line, arc, line graph and textbox. A line graph contains connected line segments such as adjoining walls in a floor plan. A textbox describes the position and content of text lines in the diagram. The actual recognition of the contained text is done by an external OCR engine.
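A very simple classifier for three of the four concepts is sketched below. Textboxes are left out because they come from text regions handled by OCR. The heuristic is an assumption swapped in for illustration: a traced path counts as a straight line if no point deviates from its chord by more than a tolerance, otherwise as an arc, and a component made of several connected segments counts as a line graph. A real classifier would use thickness, curvature and proper circle fitting.

```python
import math


def max_chord_deviation(points):
    """Largest perpendicular distance of any point from the first-to-last chord."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:  # closed path: fall back to distance from the start point
        return max(math.hypot(x - x0, y - y0) for x, y in points)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
               for x, y in points)


def classify_segment(points, tolerance=1.0):
    """'straight line' if the path stays within tolerance of its chord, else 'arc'."""
    return "straight line" if max_chord_deviation(points) <= tolerance else "arc"


def classify_component(segments, tolerance=1.0):
    """Several connected segments form a 'line graph'; a single one is classified."""
    if len(segments) > 1:
        return "line graph"
    return classify_segment(segments[0], tolerance)
```

A slightly noisy horizontal path such as `[(0, 0), (5, 0.2), (10, 0)]` stays within the assumed tolerance of 1 pixel and is classified as a straight line, while a quarter circle of radius 10 deviates by almost 3 pixels from its chord and is classified as an arc.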
The TeDUB system is designed to communicate semantic information to the user, rather than precise component orientation and spatial position. The diagram content is formed into a connected network of nodes. There is also a compositional hierarchy, so a node may be either a high-level aggregation of basic components or a low-level component. Figure 3(a) shows an example for an architectural diagram. The user navigates starting from the root (top) node of the diagram and so encounters the semantic structures before the simple components. This is intended to give blind users access to the important high-level information as immediately and quickly as possible. The hierarchical navigation is modelled on Microsoft Windows Explorer: the user can move around all the diagram contents using the cursor keys. This design was prompted by early evaluation studies, which showed that users were familiar and comfortable with such an interface. Simple earcons (non-speech sounds) in the style of Brewster (1998) are used as context and feedback sounds to supplement the text-based user interface, such as a tone indicating the end of a list or that the current node has no child nodes. A range of functions supports common navigation and communication tasks: annotations can be applied to any node, a back function like that of a web browser lets users retrace their steps, a search function finds nodes by content or type, different types of nodes can be hidden or shown, and simple editing is possible.
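The Explorer-style navigation with earcon feedback and a back function can be sketched as a small state machine over a node tree. `Node`, `add_child`, `Navigator` and the `"earcon:..."` return strings are hypothetical. In a real interface the strings would trigger sounds or be passed to the screen reader, and the key bindings are only a rough analogy to Explorer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so sibling lookup never recurses via parents
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def add_child(parent, name):
    child = Node(name, parent=parent)
    parent.children.append(child)
    return child


class Navigator:
    def __init__(self, root):
        self.current = root
        self.history = []  # visited nodes, for the browser-like back function

    def _go(self, node):
        self.history.append(self.current)
        self.current = node
        return node.name  # text that would be sent to the screen reader

    def down(self):  # roughly Explorer's Right arrow: enter the first child
        if not self.current.children:
            return "earcon:no-children"
        return self._go(self.current.children[0])

    def up(self):  # roughly Explorer's Left arrow: go to the parent
        if self.current.parent is None:
            return "earcon:top-level"
        return self._go(self.current.parent)

    def next_sibling(self):  # roughly Explorer's Down arrow
        siblings = self.current.parent.children if self.current.parent else [self.current]
        i = siblings.index(self.current)
        if i + 1 >= len(siblings):
            return "earcon:end-of-list"
        return self._go(siblings[i + 1])

    def back(self):
        if not self.history:
            return "earcon:no-history"
        self.current = self.history.pop()
        return self.current.name
```

Because navigation always starts at the root, a user exploring a floor plan hears the rooms before any of their doors or walls, which matches the top-down presentation described above.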
Spatial and connection information (which may or may not be important, depending on the diagram domain and the task undertaken by the user) is orthogonal to the hierarchical information, connecting nodes within levels (and possibly between them). An example is given in Figure 3(b).
Presenting this connectivity and spatial information is more difficult than presenting the text-based hierarchical information. The interface therefore includes a number of functions driven by an inexpensive commercial games joystick used as an unsophisticated tactile device. A map function allows the user to locate nodes within the diagram space by directly associating a joystick position with the corresponding point on the diagram: as the user moves the joystick, they hear the names of the nodes at that location. The user can also use the joystick to explore the connections of one particular node: when the joystick is pointed in the direction of a neighbour, that neighbour's name is given. In both functions, spatialised 3D audio reinforces and confirms the tactile and text output. Standard computer sound cards are now capable of complex 3D effects, so no specialised equipment is required. Further user interface functions have been developed for communicating architectural diagrams, including an attempt to use the force feedback of the joystick to delineate the shape of a room (spatial rather than connectivity information); these will be evaluated with users in the next round of evaluation.
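The two joystick functions can be sketched geometrically, leaving out the actual joystick input and audio output. Everything here is assumed for illustration: the linear mapping, node bounding boxes, the 0.2 dead zone, the 30-degree tolerance and the function names. The sketch also assumes that joystick y and diagram y point the same way.

```python
import math


def joystick_to_diagram(jx, jy, width, height):
    """Map function: a deflection in [-1, 1] x [-1, 1] maps linearly onto the diagram."""
    return (jx + 1) / 2 * width, (jy + 1) / 2 * height


def node_at(point, nodes):
    """Name of the node whose bounding box (x0, y0, x1, y1) contains point, or None."""
    x, y = point
    for name, (x0, y0, x1, y1) in nodes.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None


def neighbour_in_direction(origin, neighbours, jx, jy, max_angle=math.radians(30)):
    """Connection exploration: the neighbour whose bearing from origin best matches
    the stick direction, within max_angle; None if the stick is centred."""
    if math.hypot(jx, jy) < 0.2:  # assumed dead zone around the centre position
        return None
    stick = math.atan2(jy, jx)
    best, best_diff = None, max_angle
    for name, (x, y) in neighbours.items():
        bearing = math.atan2(y - origin[1], x - origin[0])
        # Smallest absolute angle between the two directions, in [0, pi].
        diff = abs((bearing - stick + math.pi) % (2 * math.pi) - math.pi)
        if diff <= best_diff:
            best, best_diff = name, diff
    return best
```

In a 100 x 50 plan with the kitchen on the left and the hall on the right, pushing the stick halfway left lands on the kitchen. Pushing it fully right from the kitchen's centre selects the hall as the neighbour in that direction. In the real interface, the returned name would be spoken and positioned with 3D audio.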
The user interface is designed to be screen-reader independent: it is built with standard Microsoft Windows controls (such as text boxes and buttons) and complies with Microsoft accessibility and user interface design guidelines. This allows users to access the diagram information with their familiar and reliable screen reader, although it restricts the interface design (for example, it makes it impossible to vary speech location or voice to communicate spatial information). Expert diagram users, such as blind software engineers, can be expected to be screen reader experts and quite possibly Braille users: it is sensible to let them use their strong screen reader skills rather than presenting them with a dedicated but perhaps less useful self-voicing speech interface.
To support navigation through UML diagrams, a generic tactile overlay and touch tablet will be used to provide connection information between nodes, in a similar way to that suggested by Blenkhorn and Evans (1998).
This section presents the results of the evaluation of the first TeDUB prototype software. The evaluation focused on the different user interfaces and on how participants navigated through the diagrams. The test group consisted of eleven visually impaired participants. Each participant first familiarised themselves with the software remotely; this was followed by a face-to-face interview using a semi-structured question schedule. Three participants had previous knowledge of the domain being evaluated (electronic circuits) and eight had none.
During this evaluation, participants were asked to explore diagrams, to search for information and, in the domain of digital circuits, to calculate the output of a diagram for a given input. Following these tasks, the interviewer administered the question schedule, examining the participants' experience of the system and looking in detail at specific aspects: the different interfaces and the ease with which participants could solve the initial tasks and find and understand the appropriate information.
The first part of the study concerned the use of the different interfaces, namely computer keyboard, screen reader, 2D sound, 3D sound and joystick.
In general, users experienced no difficulties in working with the keyboard, the screen reader and the joystick. It was important that, as far as possible, the keyboard functions followed the keyboard commands with which users were already familiar. No problems were encountered with screen readers, and the system performed well in this respect. The use of 2D sounds such as warning and error signals was highly appreciated. 3D sound support for spatialising the diagrams seemed less effective: where users could read the same information without 3D sounds, they preferred to do so. In addition, the 3D sound system is not very easy to use in office or educational environments. Participants had to get used to the joystick, but after some practice it was felt to be quite a natural way to explore the diagrams.
Using a combination of these interfaces, all the participants were able to build up a coherent representation of the diagrams. Participants demonstrated an ability to find the relevant information efficiently and indicated that any redundant information should be skipped. One new function was recommended: the system should give more feedback about where the user is located within the system (i.e., current item and current level). In this respect, a function that warns when all the components on a level have been visited could help to reduce uncertainty on the part of the user.
For the second part of the evaluation, we investigated whether or not users were able to understand the way information is structured in the TeDUB diagrams created by the prototype software.
The key result of the evaluation concerns the manner in which participants build up their mental representation of the diagrams being examined. Existing research (see Miller 1956) reports that human short-term memory can hold seven plus or minus two items simultaneously (although later research suggests the number is smaller). These items can be solitary objects (such as numbers, words or pictures) but also composite objects (i.e., related pieces of information that together form larger, meaningful units), which increases the total amount of information that can be held. This was found to be particularly important for the way in which information is presented in the TeDUB system when dealing with more complex diagrams.
The results from participants who performed the tasks with the more complex hierarchical diagrams confirmed the hypothesis that clustering information allows users to exceed the amount of information that could otherwise be processed. In these diagrams, the information was clustered into a smaller number of composite items as the user navigated up to a higher level. Where related items could be grouped into meaningful composite objects, users found it much easier to gain an overview of larger and more complex diagrams. This strategy will be taken into account when presenting information in the other domains being examined.
This paper provides an overview of the TeDUB project, its technical modules and the evaluation of its current state. The evaluations of the user interface are promising: the user interface features and functions are effective and well accepted by users, and the concept of providing information at different levels of abstraction has proved very useful. First results of the automatic interpretation of diagrams from the investigated domains are also encouraging. The current TeDUB system is a good basis for further development in the project.
The work presented here is funded by the Information Society Technologies Programme of the European Commission under the project "TeDUB: Technical Drawings Understanding for the Blind" and the contract number IST-2001-32366.
Abmeyr, W. (1994), "Einführung in die digitale Bildverarbeitung", B. G. Teubner, Stuttgart