4.7 The mental rotation and recognition of oriented objects
4.7.1. Definition of mental rotation
Mental rotation is the process of transforming the mental image of a three‑dimensional object to represent the same object as if seen from a different point of view. The shape is imagined as it would look in its canonical orientation (Rock, 1973), that is, with the viewer‑centered coordinate system replaced by a canonical, object‑centered coordinate system (Marr & Nishihara, 1978; Pinker & Finke, 1980); only when such normalization is complete can recognition occur.
4.7.2. Empirical studies of the mental rotation hypothesis
Mental rotation was first studied in a psychometric context, that is as an aspect of spatial ability. For example, it is the principal ingredient in the Space Test of Thurstone’s (1938) Primary Mental Abilities and is critically involved in two of the tests (Block Design and Visualization Memory) developed by Guilford and his colleagues within the “structure‑of‑intellect” model of intelligence (Guilford & Hoepfner, 1971).
Shepard (1971) and his co‑workers have developed the technique of studying mental rotation by measuring the reaction times of subjects making decisions in tasks where these decisions apparently require mental rotation of patterns.
In 1971 Shepard reported an experiment in which subjects were shown pairs of shapes of the kind displayed in Fig. 4.3.
FIGURE 4.3. Examples of the 3D stimulus pairs used by Shepard and Metzler (1971).
The members of each pair were either identical or were mirror images of one another. The two shapes were shown at different orientations, one being rotated with respect to the other through an angle that varied from 0 to 180 degrees. The task was to decide whether the two shapes were identical. Shepard found that the time taken to reach a decision increased linearly with the angular difference in orientation between the members of the pair being viewed, and that the increase was the same whether one shape was rotated with respect to the other in the frontoparallel plane or around a vertical axis, that is to say, rotated in depth.
Shepard and Metzler (1971) interpreted these findings as suggesting that subjects would “imagine one object rotating in congruence with the other”, and that, given sufficient attention, they could do this at a certain rate without losing the essential structure of the rotated image. Shepard concluded that visual images can be rotated in the head and that it takes a constant time to rotate an image through a constant angle.
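The linear relationship can be sketched numerically. The figures below are hypothetical (a 1 s intercept and a slope of about 16.7 ms per degree, in the region of the values Shepard and Metzler reported), used only to show how a mental rotation rate is inferred from the slope of the RT–angle line:

```python
# Illustrative sketch, not Shepard and Metzler's actual data: decision time
# grows linearly with angular disparity, RT = intercept + slope * angle,
# and the reciprocal of the slope is the implied "rotation rate".

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

angles = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]  # degrees of disparity
# Hypothetical mean RTs (seconds), exactly linear in angle:
rts = [1.0 + 0.0167 * a for a in angles]

intercept, slope = fit_line(angles, rts)
rate_deg_per_sec = 1.0 / slope  # implied mental rotation rate
print(f"intercept = {intercept:.2f} s, slope = {slope:.4f} s/deg")
print(f"implied rotation rate approx. {rate_deg_per_sec:.0f} deg/s")
```

On this reading, a steeper slope means a slower rotation rate; the intercept absorbs encoding and response processes that do not depend on orientation.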
4.7.3. Mental transformations and visual comparison processes: Effect of complexity
Shepard’s finding of increasing RT with angle of rotation has been replicated several times. For example, Suzuki and Nakata (1988) found that mean RTs for identical pairs increased linearly with the angular difference between the figures, and Corballis and McLaren (1984) found that RT increased sharply with the angular departure of each image from the designated normal upright orientation.
Shepard allows that the time may depend on the familiarity of the shape. He found that when two identical three‑dimensional shapes in different orientations are projected in succession on the eye, the observer involuntarily sees a continuous rotation of the shape from one orientation to the other, and that the time for rotation through a unit angle is then much less than when a mental image of a shape is deliberately rotated. Shepard also found that complexity does not affect rotation time, although other experiments indicate that it does. For instance, Hochberg and Brooks (1960) defined complexity according to the number and size of the angles and lines in a drawing. However, they did not use these measures of complexity to characterize perceived shape directly; rather, the measures were used to predict judgements of the tridimensionality of 2‑D line drawings of reversible‑perspective figures.
Although Shepard and Cooper (1982) argued that evidence for the effects of stimulus complexity on transformation times was inconclusive, such evidence has been found more recently (Bethell‑Fox & Shepard, 1988; Yuille & Steiger, 1982). For example, Bethell‑Fox and Shepard found that encoding times as well as transformation times increased as the number of pieces in 2‑D patterns increased, although this effect diminished with practice. Stimuli were 2‑D patterns of filled‑in squares within a 3 × 3 matrix, where separated groups of squares were considered to be separate pieces.
Yuille and Steiger (1982) added blocks to 3‑D figures that were similar to those used by Shepard and Metzler (1971), and made some featural information redundant and therefore unnecessary for the discrimination task. They found that increased figural complexity slowed the rate of mental rotation, but unlike Bethell‑Fox and Shepard, they found virtually no effect of figural complexity on encoding time.
In general, the complexity of an object’s structure should influence both the time to encode it and the time to transform its representation. Most researchers who have explored the effects of stimulus structure on encoding and transformation times in a mental rotation task have used two‑dimensional (2‑D) stimuli (Cooper & Podgorny, 1976; Bethell‑Fox & Shepard, 1988; Hochberg & Gellman, 1977). Thus, data relevant to the effects of three‑dimensional (3‑D) structure are sparse.
Empirical and theoretical considerations suggest that representations of three‑dimensional objects should be more difficult to rotate mentally than representations of two‑dimensional objects. From a theoretical standpoint it seems reasonable that representations of two‑dimensional shapes would be rotated faster than representations of three‑dimensional shapes. In general, the rotation of a three‑dimensional object entails the possibility that portions of the object may become occluded, which leads to complex transformations. In addition, three‑dimensional representations have one more dimension than two‑dimensional representations, which adds to the amount of information in the representation; this added information should lead to a greater processing load. Perhaps the visual system responds to the fact that two‑dimensional shapes are not usually rotated in depth (out of the image plane), and thus are not subject to the complex occlusion transformations. In this case, a different and simpler type of mental transformation may be used to compare two‑dimensional shapes than that used to compare three‑dimensional ones. Such demonstrations imply that the kinds of structural manipulation that influence performance on mental rotation may favour one or another scheme for representing 3‑D structure (e.g., Biederman, 1987; Hinton, 1979; Kosslyn, 1981; Marr & Nishihara, 1978; Rock, 1973; Tarr & Pinker, 1990; Lawrence & Friedman, 1994).
Kosslyn and Shwartz’s (1977, 1978) simulation model of mental imagery includes two levels of representation: (a) a surface representation and (b) a deep representation (Kosslyn & Shwartz, 1977, 1978; Kosslyn, 1980). The surface representation is a functional space in which two‑dimensional images are maintained in active memory. The deep representation is a long‑term memory representation in which shape information, stored in list format, is catalogued and interfaced with a verbal/propositional network. According to this model, the rotation of a two‑dimensional shape in the image plane is achieved entirely at the level of the surface representation by a single operator called ROTATE. The rotation of a three‑dimensional shape could be achieved in one of two ways. One approach is to postulate a third dimension in the surface representation and a new ROTATE operator capable of rotating three‑dimensional representations in an arbitrary direction (Kubovy & Wearing, 1982; Shepard & Metzler, 1971). In the other approach, the surface representation is maintained as a strictly two‑dimensional space; the two‑dimensional surface image is then periodically regenerated from a new vantage point during the rotation of a three‑dimensional image, using a more complex set of processes than the single ROTATE operation available for the rotation of a two‑dimensional surface image. Thus, according to the model of Kosslyn and Shwartz (1977, 1978), representations of three‑dimensional shapes should be more difficult to rotate than representations of two‑dimensional shapes.
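As a minimal sketch of the flavour of an image‑plane ROTATE operator (my own illustration, not Kosslyn and Shwartz’s implementation; representing the surface image as a set of 2‑D points, and the 5‑degree step size, are assumptions made for the example), the operator can be applied repeatedly so that the image passes through intermediate orientations, as an analogue rotation would:

```python
import math

def rotate(points, degrees):
    """ROTATE-style operator: rotate every 2-D point about the origin."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def rotate_in_steps(points, total_degrees, step=5.0):
    """Apply the operator repeatedly, passing through intermediate
    orientations rather than jumping straight to the final one."""
    for _ in range(int(total_degrees / step)):
        points = rotate(points, step)
    return points

# A toy "surface image" of two points:
surface_image = [(1.0, 0.0), (0.0, 1.0)]
result = rotate_in_steps(surface_image, 90.0)
# After a 90-degree rotation, (1, 0) lands near (0, 1).
```

The contrast drawn in the text is that this single in‑plane operator suffices for 2‑D shapes, whereas rotation in depth would require either a third coordinate or repeated regeneration of the 2‑D image from a changing vantage point.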
Marr and Nishihara (1978) also postulate two distinct levels of representation, similar to those postulated by Kosslyn and Shwartz (1977, 1978). One level, called the 2.5‑D sketch, represents visual information as seen from a particular vantage point. This representation contains information about the relative depth and orientation of surface elements in the depicted scene with respect to the viewer; it does not, however, contain any information about parts of the object that are not in view. The other level of representation, called the 3‑D sketch, represents the full three‑dimensional structure of objects without reference to any particular vantage point. Kosslyn and Shwartz’s “deep” representation resembles a 3‑D or “spatial” representation, with coordinate systems centered on the represented object, while their “surface display” resembles a 2.5‑D or “perspective” representation, with a single viewer‑centered frame. Thus, to perform a mental rotation of a three‑dimensional shape, the 2.5‑D sketch might be updated from information stored in a 3‑D sketch, along with a changing specification of vantage point. As in the Kosslyn‑Shwartz framework, then, the rotation of a two‑dimensional shape in the image plane should be computationally easier than the rotation of a three‑dimensional shape in depth.
4.7.4. Some Other Issues in Rotated Object Recognition
There are differences other than the dimensionality of the stimuli between the experiments reported in the literature that could affect the rate of mental rotation. For example, different subjects participated in the different experiments; also, in some experiments the shapes may have been represented by orientation‑independent descriptions (e.g., Jolicoeur & Milliken, 1989; Tarr & Pinker, 1989).
Despite all these findings, the work on mental rotation has given rise to two controversies, one experimental, the other theoretical.
The main experimental issue is whether the speed of mental rotation is constant regardless of the complexity of the shape.
4.7.5. The Rate of Mental Rotation of 3D vs 2D Representations
Models of visual information representation have tried to explain why the mental rotation rate for representations of three‑dimensional objects is slower than that for representations of two‑dimensional objects.
One approach in the literature assumes that if mental rotation is involved, the magnitude of the orientation effect on the identification of patterns seen for the first time should be equivalent to that found in a mental rotation task. There is empirical evidence that mental rotation is indeed invoked in certain judgements about disoriented shapes.
In a classic study, Cooper and Shepard (1973) timed subjects as they decided whether rotated alphanumeric characters were normal or backward (mirror reversed), and found that RTs increased significantly with the angular departure of the characters from the normal upright orientation. The authors concluded that subjects used a mental rotation process before deciding on the version of the characters. More recent work suggests that the immediate recognition of known patterns such as alphanumeric characters or line drawings of objects is also sensitive to pattern disorientation (e.g., Corballis, Zbrodoff, Shetzer, & Butler, 1978; Jolicoeur, 1985; Jolicoeur & Landau, 1984; Jolicoeur, Snow, & Murray, 1987). However, not all decisions about rotated shapes require mental rotation. For example, White (1980) found that the time to name rotated alphanumeric characters did not show the significant dependence on angular orientation that is usually taken to imply mental rotation, although recognition may not be wholly independent of orientation (Jolicoeur & Landau, 1984).
The second, more important issue is the theoretical dispute, which has to do with the nature of the brain’s representation of the visual world. Shepard argues that this representation is an internal imagery space with analogical properties. However, Snsake Shi (1981) claimed that experiments such as Shepard’s support neither of the two central claims of the analogical view: first, the analogue or continuous nature of internal representations and their transformations; and second, the resemblance, or analogy, between imagery and perceptual processes. He bases his argument on the fact that “methodological problems arise concerning the relations among introspections, instructions and behaviour measurements”.
At this point it is difficult to reconcile the opposing sides in this dispute. First, it is difficult to build a theory of object recognition on an analogical representation, and both introspection and experiment suggest that vision breaks objects into parts. Second, it is also difficult to envisage how an analogue description could be stored, let alone rotated, in the nervous system. One reading of Shepard’s results is therefore that we can only perform those operations on propositional descriptions that correspond to the ways in which objects can be transformed in the real world. Since a real object cannot rotate through a given angle except by passing through intermediate angles, it may be that the transformation of a propositional description that corresponds to rotation can only be performed in small steps.
Despite the evidence reviewed above, which suggests the importance of mental rotation in certain judgements about disoriented shapes, it is hard to understand how one could mentally rotate an unrecognized shape to a canonical or upright orientation, because in the absence of recognition one could hardly know what the shape’s canonical orientation was. The problem is even more complicated in the case of three‑dimensional shapes, since parts of the shape would be hidden from view and would consequently have to be exposed if the shape were rotated in depth. An alternative form of perceptual representation of shape is the structural description.
The following section examines the idea that the perceptual representation of objects takes the form of a structural description.
4.8. Structural Descriptions of Objects
Structural descriptions are symbolic descriptions of the features of a pattern and their spatial arrangement, which have proved useful in more recent approaches to pattern and object recognition.
A psychological theory based round the idea of structural descriptions assumes that previously encountered objects are mentally represented as structural descriptions. New objects are themselves converted into structural descriptions in order to be recognized.
A structural description of any given object is then achieved by breaking the object down into parts; the structure of each part and the relations between the parts are specified in terms of lower‑level entities, like lines and edges, and relationships, like “above” and “to the right of”.
Descriptions of this sort are called propositional because they contain entities, relations and properties, though the representation of these elements is not of course verbal. Structural descriptions are easier to apply to object recognition than “templates” or “feature” representations. A photograph of an object (or a “retinal image” of an object)
can be described by a series of structural descriptions at increasing levels of abstraction from the original intensity distribution.
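As a toy illustration of such a propositional, part‑and‑relation encoding (my own sketch, not a published format; the part names, primitive labels and relation labels are invented for the example), an object might be described as a set of parts plus labelled spatial relations between them:

```python
# A toy structural description: named parts, each with a primitive label,
# plus (part, relation, part) triples encoding spatial arrangement.
object_description = {
    "parts": {
        "seat": {"primitive": "slab"},
        "back": {"primitive": "slab"},
        "leg1": {"primitive": "cylinder"},
    },
    "relations": [
        ("back", "above", "seat"),
        ("seat", "above", "leg1"),
    ],
}

def holds(description, part, relation, other):
    """Check whether a given spatial relation is asserted in the description."""
    return (part, relation, other) in description["relations"]

# On the structural-description view, recognition amounts to matching a
# description built from the image against stored descriptions like this one.
chair_like = holds(object_description, "back", "above", "seat")
```

Note that the description is symbolic but not verbal: the tokens stand for parts and relations, not words.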
4.8.1. Object decomposition into parts approach
A recent example of structural description applied to three‑dimensional objects is Biederman’s (1985) theory of recognition by components (RBC) [for review see p. 55]. According to this scheme, objects are described in terms of a small set of primitive parts called “geons”. These primitives are similar to the generalized cylinders used by Binford (1971) and Marr and Nishihara (1978); they include simple three‑dimensional shapes such as boxes, cylinders, and wedges. More complex objects are described by decomposing them into their constituent geons, together with a description of the spatial relations between components. Biederman proposes that a menu of 36 geons is sufficient to describe all objects and that geons can be identified by the use of “nonaccidental” properties such as curvature, symmetry, parallelism and cotermination.
Hoffman and Richards (1984) point out that many objects cannot be described in terms of generalized cones. The problem here is one of the choice of primitives rather than use of parts in object descriptions per se. Therefore, Hoffman and Richards advocate the use of a boundary‑based approach rather than a primitive‑based approach to identify parts for recognition. Hoffman and Richards outline rules for decomposing objects into parts which exploit extremes of curvature in surfaces. Although the rules proposed provide a natural heuristic for extracting parts, no means of describing the shape of the parts is provided. A primitive‑based approach might still be required to describe parts.
However, structural descriptions alone do not constitute a theory of pattern recognition: we would also need to specify more fully how the features and their arrangements are encoded from an image, and the nature of the matching algorithm that enables recognition to occur. For descriptions to be useful for recognition, they should be based on an external, object‑centred reference frame, not on an internal, viewer‑centred frame.