4.8.2. An axis‑based representation
A related alternative to “representation by parts” is “representation by axes”. An axis‑based representation, in which both major and minor axes of an object are represented by vectors (Marr & Nishihara, 1978), has been hypothesized as being used for many common 3‑D shapes, including the kinds of block‑like figures used by Shepard and Metzler (1971).
An alternative to the hypothesis that encoding times and rotation rates will increase monotonically is that performance will not be monotonic, because not all parts of an object have equal status in its representation. For example, the representation might be hierarchical (Hinton, 1979; Marr & Nishihara, 1978; Palmer, 1989), so that an object’s principal axis of elongation is represented by a generalized cylinder enveloping several vectors (Marr & Nishihara, 1978). It has been shown that the orientation of an object’s principal axis influences shape discrimination (Friedman & Hall, 1992; Humphrey & Jolicoeur, 1988). Thus, in addition to orientation, the structure of an object along its principal axis might be more important for shape discriminations than the structure of the object along its minor axes.
If objects are represented in an object‑centred coordinate system, whose primary axes are determined from, say, the axes of elongation of the pattern, then why should it take longer to recognize a rotated photograph of an object than a normal view of that object? Although one might argue that the normal position of an object is more familiar, and hence responded to quickly due to over‑learning, this in itself does not explain why the degree to which an object is rotated away from the familiar orientation should have such a systematic effect on reaction time. At the very least, mental rotation phenomena provide important evidence about the establishment of reference frames in object perception.
4.8.3. Reference frames in mental rotation
There is an increasing body of evidence which suggests that, at least under some conditions, objects are not represented by 3D object‑centred models.
Rock (1973) investigated the effects of a change in the orientation of a shape on subsequent recognition. He was concerned with the difficulty people have in recognizing an outline of the African continent when it has been rotated 90 degrees within the plane (see Rock, 1983, Figure 3‑5). Moreover, a distorted version of the upright shape is judged to be more similar to the original than the rotated version.
In a subsequent study with the same objects, it was shown that merely displacing three‑dimensional novel wire objects laterally from one location relative to the viewer to another, for example, from upper left to lower right, produces a sharp decline in recognition (Rock & DiVita, 1987). These results suggest that a three‑dimensional object of this kind yields a different retinal image as a function of its orientation or location relative to the viewer: the shape of the image becomes very different. Therefore, there are cases where the representation achieved is primarily viewer‑centred, despite the fact that all the information necessary for the achievement of an object‑centred representation, that is, the three‑dimensional relationships of the parts of the object to one another, is available. Nevertheless, an object‑centred representation does not seem to be achieved.
It is true that we generally do recognize things when they remain in the same orientation in the environment and we view them from an altered orientation (Rock, 1973). Thus one might say that in such cases maintenance of viewer‑centred orientation is not required for recognition. Even so, we would have to say that an object‑centred representation must include reference to the directional coordinates of the environment. Only with objects that have clear intrinsic axes, such as the human body, can representations ignore environmental orientation.
4.8.4. The role of frame of reference in shape recognition
The term frame rotation designates a strategy in which the perceiver’s system of coordinates (or frame of reference) is rotated until it matches the orientation of the stimulus. For example, if subjects rotate their frame of reference to match that of the disoriented stimulus, then RT should vary with the angular deviation between the current stimulus and the preceding stimulus.
It has been suggested that shape perception is driven by the presence of local shape features, such as lines of particular orientations, edges and angles, in particular locations in the external world (Lindsay & Norman, 1972; Neisser, 1967; Sutherland, 1968). However, representations specified in terms of retinally coded features could differ for the same shape when it varies in size and orientation. Such representations would be of little benefit for recognition and identification, which demand equivalent responses across size and orientation transformations. The type of shape representation necessary for recognition and identification purposes must be defined in terms of a characteristic of the shape that remains constant over changes in viewing angle and distance.
The concept of perceptual frames is important in understanding the perception of shape and orientation (Palmer, 1975; Marr and Nishihara, 1978). The reference frame account proposes to solve the problem of how we recognize the shape of an object in a way that is independent of its orientation (i.e. object constancy). If one encodes the object’s shape relative to a reference orientation that coincides with an orientation‑invariant property of the object itself, such as its axis of elongation or axis of symmetry, then its shape can be described in an orientation‑invariant way (Palmer, 1975).
Marr and Nishihara (1978) suggested that a representation of the three‑dimensional shape of the object must be constructed in a view‑independent coordinate system. Such descriptions are termed the 3D model representations. They proposed that 3D objects can be represented in “object‑centered coordinates” by a hierarchical description in terms of primitive shapes (called “generalized cylinders”) at various levels of resolution (see figure 2.2, page 42). Because each generalized cylinder has its own intrinsic axis, this axis can be used to define the reference orientation relative to which the object’s shape can be represented, thus providing an orientation‑invariant description that can be used to identify the object.
Accordingly, if local features of a shape are coded relative to the shape’s principal axis, then this shape description should remain constant over changes in the shape’s orientation and size.
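The axis‑relative coding idea can be made concrete with a small sketch. The code below is a hypothetical illustration, not any published model: it estimates a 2‑D point set’s principal axis of elongation from the second moments of the points, then re‑expresses the points in the frame of that axis, so that the resulting description is unchanged when the whole shape is rotated.

```python
import math

def axis_frame(points):
    """Describe a 2-D point set relative to its own principal axis of
    elongation, yielding an orientation-invariant description."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    c = [(x - mx, y - my) for x, y in points]      # object-centred origin
    sxx = sum(x * x for x, _ in c)
    syy = sum(y * y for _, y in c)
    sxy = sum(x * y for x, y in c)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)   # principal-axis angle
    ct, st = math.cos(theta), math.sin(theta)
    u = [x * ct + y * st for x, y in c]            # coordinate along major axis
    v = [y * ct - x * st for x, y in c]            # coordinate along minor axis
    # The axis direction is ambiguous by 180 degrees; fix the sign of each
    # axis from the skew of the coordinates so the frame is canonical.
    if sum(t ** 3 for t in u) < 0:
        u = [-t for t in u]
    if sum(t ** 3 for t in v) < 0:
        v = [-t for t in v]
    return list(zip(u, v))

def rotated(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

shape = [(0, 0), (4, 0), (4, 1), (1, 2)]
a = axis_frame(shape)
b = axis_frame(rotated(shape, math.radians(50)))
# Same description despite the 50-degree rotation of the shape itself.
assert all(abs(p - q) < 1e-9 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
```

The point of the sketch is only that a frame derived from the shape itself rotates with the shape, so the coordinates of local features within that frame stay fixed, which is the invariance the reference‑frame account requires.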
Evidence for this hypothesis has come from investigations of the time taken by subjects to classify two‑dimensional (2‑D) shapes (e.g., Humphreys, 1983; Palmer, 1980, 1985).
Humphreys (1983) showed that the time taken to match 2‑D shapes across different orientations is affected by the congruence of their descriptions relative to particular reference frames.
However, there is conflicting evidence on whether subjects can use orientation information specified in advance by a reference frame to facilitate shape processing. For instance, Cooper and Shepard (1973) gave advance information about the orientation of a letter in a task requiring subjects to discriminate between normal versions of letters and their reflections. They found that the advance orientation information did not affect discrimination. Also, in studies of shape matching, Humphreys (1983) failed to find any effect of orientation on the time taken by subjects to respond that two shapes were structurally different: RTs were equally fast when the shapes were in the same orientation and when they were in different orientations. These data suggest that subjects cannot represent shape and orientation information independently.
There is also evidence from studies of mental rotation which is increasingly difficult to explain by object‑centred orientation‑independent descriptions (e.g., Jolicoeur and Milliken, 1989; Tarr and Pinker, 1989). A more plausible interpretation is that, at least under some conditions, multiple views of objects in different orientations are stored.
4.8.5. The alignment approach to object recognition
Ullman (1989) rejected the need for structural descriptions of objects entirely. Instead, he proposed a pictorially based alignment method of object recognition using storage of multiple views. The alignment method involves two stages. In the first, the transformation required to align the perceived object with stored object models is determined. This transformation, which includes changes in scale and orientation, could be based on minimal information such as the principal axis of the object or three corresponding points in the perceived object and model. These points can be located at salient points on the object boundary, such as curvature maxima. The second stage in the recognition process involves matching across all possible object models. As a unique transformation has already been computed for each model, different views do not need to be taken into account at this stage. Changes in view which give very different images can be handled by storage of multiple views of an object rather than three‑dimensional object‑centred structural descriptions. Ullman does not exclude the use of abstract descriptions, but suggests they can be used pictorially rather than in structural descriptions. Thus an abstract label can be applied to a specific location in an object model which is subjected to the alignment process. This is different from the use of a set of specific relational labels in a symbolic structural description.
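The two stages can be sketched in code. The sketch below is a simplified illustration of the alignment idea, not Ullman’s actual implementation: it assumes 2‑D point features and recovers an affine transform from three hypothetical anchor correspondences (stage one), then scores each candidate model by its residual misfit after alignment (stage two).

```python
import math

def affine_from_3(src, dst):
    """Stage 1: solve the 2-D affine map taking three src points onto
    three dst points (six unknowns, solved by Cramer's rule)."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    def row(t1, t2, t3):
        a = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        b = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        c = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c
    return row(*[p[0] for p in dst]), row(*[p[1] for p in dst])

def apply(T, p):
    (a1, b1, c1), (a2, b2, c2) = T
    return (a1 * p[0] + b1 * p[1] + c1, a2 * p[0] + b2 * p[1] + c2)

def match_score(model, image, anchors):
    """Stage 2: align the model from the anchors alone, then sum the
    residual distances over all features (lower = better match)."""
    T = affine_from_3([model[i] for i in anchors],
                      [image[i] for i in anchors])
    return sum(math.dist(apply(T, m), q) for m, q in zip(model, image))

def viewed(p, ang, scale, tx, ty):
    """Simulate a rotated, scaled, translated view of a point."""
    c, s = math.cos(ang), math.sin(ang)
    return (scale * (p[0] * c - p[1] * s) + tx,
            scale * (p[0] * s + p[1] * c) + ty)

model = [(0, 0), (2, 0), (1, 2), (0.5, 0.5)]
image = [viewed(p, math.radians(30), 1.3, 1.0, 2.0) for p in model]
distractor = [(0, 0), (2, 0), (1, 2), (1.5, 1.5)]
anchors = (0, 1, 2)
assert match_score(model, image, anchors) < 1e-9      # correct model aligns
assert match_score(distractor, image, anchors) > 0.5  # wrong model does not
```

Because a unique transformation is computed per model before matching, the comparison itself never has to consider alternative views, which is the economy the alignment scheme is meant to buy.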
From the preceding discussion of structural descriptions it should be clear that the central idea is to have a representation that makes explicit the type and spatial arrangement of salient parts of a given object.
The distinction between image rotation and frame rotation hypotheses has received some attention in connection with several tasks involving the processing of spatial information.
Huttenlocher and Presson (1973) asked children to anticipate how an array of objects would look if it were rotated around its axis. This “array rotation” problem was much easier than a “viewer rotation” problem of anticipating how the array would look from a different perspective. Subsequent work (Huttenlocher and Presson, 1979) indicated that viewer rotation problems are actually easier than array rotation problems when the task requires the identification of objects in specified positions rather than the determination of the appearance of the entire scene. Presson (1982) reported similar findings with adult subjects. However, there has been little work directly contrasting the image rotation and frame rotation strategies in mental rotation tasks like those of Shepard and Cooper (1982).
Several findings, however, are pertinent. Cooper and Shepard (1973) found that advance identity and orientation information cancelled the effects of rotation, whereas advance orientation information alone did not. This suggested that subjects are unable to mentally rotate an abstract frame of reference. Hinton and Parsons (1981), on the other hand, found advance orientation information to be effective when the stimulus set consisted only of characters possessing common structural features. But in all cases where subjects are found to benefit from advance information, it is not clear whether this is achieved by rotating one’s system of coordinates or by rotating a complete or partial representation of the stimulus. These findings led Koriat and Norman (1984) to investigate what is rotated in the mental rotation task. In four experiments focused on sequential effects in mental rotation, one involving normal and reflected letters and the other three involving lexical decisions on Hebrew letter strings, much stronger evidence for the image rotation hypothesis was found, though weak but systematic effects of frame rotation were also obtained. Increasing the likelihood that the same orientation would be repeated did not yield any stronger frame rotation effects. Also, there was no indication of consistent individual differences in the preference for the frame rotation strategy. Koriat and Norman’s results are consistent with Cooper and Shepard’s (1973) interpretation of the lack of effects of advance orientation information as indicating that subjects cannot rotate an abstract frame of reference.
Hinton and Parsons (1981) reported that advance orientation information did reduce RTs in the Cooper and Shepard task, but only when there was a consistent relation between the prespecified frame and the to‑be‑discriminated stimulus. The absence of a consistent relation between the frame and the direction of the stimuli in Cooper and Shepard’s (1973) study may have precluded any frame effects.
In some of the studies discussed in the preceding review, considerable debate has arisen over the existence of analogue representations in memory, particularly as they refer to visual memory.
It has been claimed that imagery is an instance of an analogue representation. Analogue representations directly mirror the world: analogues contain a point‑for‑point correspondence with the object they represent. For example, a photograph is an analogue.
Most of the evidence points to images being analogue representations. The most famous demonstration is the rotation experiments of Shepard and Metzler (1971). Subjects were asked to judge whether two block structures were the same or different when their relative orientation was varied. It was found that decision time was a simple linear function of the orientation difference between the two shapes, suggesting an analogical representation. This and other results (Shepard & Cooper, 1982) provide strong evidence that subjects perform such tasks by forming a mental image of one object and rotating it until it is at the same orientation as the other. This result is impressive because images are not actual, rigid objects, and hence are not constrained by physics to pass through intermediate positions when the orientation of an imaged object is changed.
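The linearity claim can be made concrete with a small worked example: if decision time grows linearly with angular disparity, an ordinary least‑squares fit recovers a slope whose reciprocal is the implied rotation rate. The data below are hypothetical, chosen only to show the shape of the relation, and are not Shepard and Metzler’s actual measurements.

```python
# Hypothetical, idealized data shaped like the classic linear trend.
angles = [0, 40, 80, 120, 160]        # angular disparity in degrees
rts = [1.1, 1.9, 2.7, 3.5, 4.3]       # decision times in seconds (made up)

n = len(angles)
ma, mr = sum(angles) / n, sum(rts) / n
# Ordinary least-squares fit of RT = intercept + slope * angle.
slope = (sum((a - ma) * (r - mr) for a, r in zip(angles, rts))
         / sum((a - ma) ** 2 for a in angles))
intercept = mr - slope * ma
rate = 1.0 / slope                    # implied rotation rate, degrees/second
assert abs(slope - 0.02) < 1e-9       # 0.02 s per degree for these data
assert abs(intercept - 1.1) < 1e-9    # baseline encode-and-respond time
assert abs(rate - 50.0) < 1e-6        # 50 degrees per second for these data
```

On the mental rotation account, the intercept reflects encoding and response processes common to all trials, while the slope reflects the rotation process itself.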
Furthermore, Kosslyn (1975) reports that if an object is imagined at a small size, more time is required to “see” its parts than if the object is imagined at a larger size (see Kosslyn, 1980, 1983). This result is interesting because it suggests that objects in images are subject to spatial summation (summing over a region of adjacent locations), a well‑known property of neural mechanisms used in vision.
Some evidence, however, suggests that mental imagery is related to higher‑order cognitive processes, and that it differs from visual perception. For example, Kosslyn et al. (1978, 1983) have found that people construct mental images by parts, and congenitally blind individuals perform on imagery tasks in a fashion similar to sighted individuals (Kerr, 1983). Thus, the relation between visual and imaginal experience continues to be a puzzle.
Thus, in the following chapter I will provide theoretical and empirical evidence that imagery shares processing mechanisms with like‑modality perception.