Meta-Analysis of Theory-of-Mind Development: The Truth about False Belief
Author(s): Henry M. Wellman, David Cross, Julanne Watson
Source: Child Development, Vol. 72, No. 3 (May - Jun., 2001), pp. 655-684
Published by: Blackwell Publishing on behalf of the Society for Research in Child Development
Stable URL: http://www.jstor.org/stable/1132444

Research on theory of mind increasingly encompasses apparently contradictory findings.
In particular, in initial studies, older preschoolers consistently passed false-belief tasks (a so-called "definitive" test of mental-state understanding), whereas younger children systematically erred. More recent studies, however, have found evidence of false-belief understanding in 3-year-olds or have demonstrated conditions that improve children's performance. A meta-analysis was conducted (N = 178 separate studies) to address the empirical inconsistencies and theoretical controversies. When organized into a systematic set of factors that vary across studies, false-belief results cluster systematically with the exception of only a few outliers. A combined model that included age, country of origin, and four task factors (e.g., whether the task objects were transformed in order to deceive the protagonist or not) yielded a multiple R of .74 and an R² of .55; thus, the model accounts for 55% of the variance in false-belief performance. Moreover, false-belief performance showed a consistent developmental pattern, even across various countries and various task manipulations: preschoolers went from below-chance performance to above-chance performance. The findings are inconsistent with early competence proposals that claim that developmental changes are due to task artifacts, and thus disappear in simpler, revised false-belief tasks; and are, instead, consistent with theoretical accounts that propose that understanding of belief, and, relatedly, understanding of mind, exhibit genuine conceptual change in the preschool years.

INTRODUCTION

"Theory of mind" has become an important theoretical construct and the topic of considerable research effort. Theory of mind describes one approach to a larger topic: everyday or folk psychology, that is, the construal of persons as psychological beings, interactors, and selves.
The phrase, theory of mind, emphasizes that everyday psychology involves seeing oneself and others in terms of mental states: the desires, emotions, beliefs, intentions, and other inner experiences that result in and are manifested in human action. Furthermore, everyday understanding of people in these terms is thought to have a notable coherence. Because actors have certain desires and relevant beliefs, they engage in intentional acts, the success and failure of which result in various emotional reactions. Whether or in what sense everyday psychology is theorylike is a matter of contention (see, e.g., Gopnik & Wellman, 1994; Nelson, Plesa, & Henseler, 1998). Regardless, the phrase, theory of mind, highlights two essential features of everyday psychology: its coherence and mentalism.

How, when, and in what manner does an everyday theory of mind arise? This question has generated much current research with children. The question has been investigated using a variety of tasks and studies that focus on various conceptions within the child's developing understanding, for example, conceptions of desires, emotions, beliefs, belief-desire reasoning, or psychological explanation, among others (see, e.g., Astington, 1993; Flavell & Miller, 1998; Wellman, 1990). From the earliest research, however, a central focus has been on children's understanding of belief, especially false belief. Why? Mental-state understanding requires realizing that such states may reflect reality and may be manifest in overt behavior, but are nonetheless internal and mental, and thus distinct from real-world events, situations, or behaviors. A child's understanding that a person has a false belief (one whose content contradicts reality) provides compelling evidence for appreciating this distinction between mind and world (see, e.g., Dennett, 1979).
© 2001 by the Society for Research in Child Development, Inc. All rights reserved. 0009-3920/2001/7203-0001

A now classic false-belief task presents a child with the following scenario (Wimmer & Perner, 1983): Maxi puts his chocolate in the kitchen cupboard and leaves the room to play. While he is away (and cannot see) his mother moves the chocolate from the cupboard to a drawer. Maxi returns. Where will he look for his chocolate, in the drawer or in the cupboard? Four- and 5-year-olds often pass such tasks, judging that Maxi will search in the cupboard although the chocolate really is in the drawer. These correct answers provide evidence that the child knows that Maxi's actions depend on his beliefs rather than simply the real situation itself, because belief and reality diverge. Many younger children, typically 3-year-olds, fail such tasks. Instead of answering randomly, younger children often make a specific false-belief error: they assert that Maxi will look for the chocolate in the drawer to which it was moved.

These findings are empirically intriguing; it is striking that young children would make such a provocative error. They are also theoretically important in that they support a developmental hypothesis that children's theory of mind undergoes a major conceptual change in early life. To be clear, the claim is not that young children know nothing of mental states, but that they fail to understand representational mental states. One way of depicting this claim is presented in Figure 1. A simple understanding of a state such as desire could involve construing the desirer as having an internal, subjective urging for an external state of affairs. But an everyday understanding of belief requires the notion that the person has a representation of the world, the contents of which could be, and in the case of false beliefs are, quite different from the contents of the world itself.
Thus, the change involved has been characterized as a shift (1) from a situation-based to a representation-based understanding of behavior (Perner, 1991), (2) from a connections to a representational understanding of mind (Flavell, 1988), or (3) from a simple desire to a belief-desire naive psychology (Wellman, 1990). In short, as revealed in part by successful false-belief reasoning, the claim is that older children understand that people live their lives in a mental world as much as in a world of real situations and occurrences.

Figure 1 Graphic depiction of how the person on the right construes the desire (top) or belief (bottom) of the person on the left. Panel labels: Desire (wants an apple); Belief (thinks that that is an apple).

False-belief performance has come to serve as a marker for mentalistic understanding of persons more generally. Thus, in research on individual differences in young children's social cognition, false-belief performances are used as a major outcome measure to assess the influence of early family conversations (Dunn, Brown, Slomkowski, Tesla, & Youngblade, 1991), engagement in pretend play (Youngblade & Dunn, 1995), or family structure (Hughes & Dunn, 1998; Perner, Ruffman, & Leekam, 1994) on development of mentalistic understandings. False-belief understanding has also become a major tool for research with developmentally delayed individuals. The theory-of-mind hypothesis for autism, in particular, claims that the severe social disconnectedness evident in even high-functioning individuals with autism is due to an impairment in their ability to construe persons in terms of their inner mental lives (see, e.g., Baron-Cohen, 1995).
False-belief performances provided an initial empirical test of this claim in that high-functioning children with autism who are able to reason competently about physical phenomena often fail false-belief tasks, whereas Down syndrome and other delayed populations of equivalent mental age often do not (e.g., Baron-Cohen, Leslie, & Frith, 1985; for more comprehensive findings and comparisons, see Happé, 1995; Yirmiya, Erel, Shaked, & Solomonica-Levi, 1998).

For various reasons, therefore, a considerable body of research has accumulated, employing an increasing variety of false-belief tasks and attempting to demonstrate and explain false-belief errors, as well as to relate performance on false-belief tasks to other conceptions, tasks, and competencies. Theory-of-mind research goes well beyond this task and these data; nonetheless false-belief tasks have a central place in current social-cognitive research (see Flavell & Miller, 1998), much as conservation tasks once were focal for understanding cognitive development and for testing Piaget's findings and theorizing.

For the case of false belief, just as in the conservation literature, the initial accounts, the initial tasks, and especially the claims of conceptual change have all been vigorously challenged. In particular, research on false belief instantiates a basic conundrum in the study of cognitive development. Performance on any cognitive task reflects at least two factors: conceptual understanding required to solve the problem ("competence") and other non-focal cognitive skills (e.g., ability to remember the key information, focus attention, comprehend, and answer various questions) required to access and express understanding ("performance").
The last 25 years of cognitive development research have produced a plethora of early competence studies and accounts essentially showing that on various tasks young children fail not because they lack the conceptual competence, but rather because the testing situation was too demanding or confusing. This research has had several desirable results: undeniable underestimations of young children's knowledge have been exposed; information-processing analyses of how children arrive at their answers and responses, and not just what answers or responses they make, have flourished; and domain-general accounts of cognitive competence have yielded to more precise domain-specific understandings of children's conceptions and skills. At the same time, however, accepted demonstrations of conceptual change have largely disappeared. This is curious inasmuch as the interplay between cognitive change and stability is the cornerstone of all major theories of cognitive development. Yet each proposed developmental change (e.g., Piaget's conservation competence, Carey's proposed shift from naive psychology to naive biology, false-belief understandings) has seemingly evaporated in the mist of task variations showing enhanced performance in still younger children.

Conceivably, conceptual change may indeed be rare. The contemporary re-emergence of strongly nativist perspectives on cognitive development both contributes to and derives from this possibility (e.g., Spelke, 1994). On the other hand, genuine conceptual changes may be obscured by current emphasis on early competence and task simplifications, making it difficult to comprehend the bigger picture amidst the haze of accumulating results from numerous task variations. A comprehensive analysis of the voluminous and varied false-belief research provides an important contemporary opportunity to examine this basic issue.
Empirically, an increasing number of researchers now claim that the original false-belief tasks are unnecessarily difficult and that 3-year-olds can evidence improved, even above-chance, false-belief reasoning if the tasks are suitably revised (e.g., Siegal & Beattie, 1991; Sullivan & Winner, 1993). Not only the correct estimation of 3-year-olds is at issue, but more importantly, basic developmental trends. Some authors now claim that 3-year-olds, and much younger children as well, understand belief and false belief (e.g., Chandler, Fritz, & Hala, 1989; Fodor, 1992). False-belief competence, assessed correctly, is thus predicted to be high even in young children. Other authors (e.g., Robinson & Mitchell, 1995) claim that many 3-year-olds fail false-belief tasks but that many 4- and 5-year-olds do as well. "The most striking thing about the age trends was the lack of them . . . Quite simply, it has become fashionable to claim that there is a sharp age trend, but in fact there is not" (Mitchell, 1996, pp. 137-138).

These issues, controversies, and theories, along with the increasing amount of empirical findings, mandate a careful review. Indeed, several qualitative reviews have been presented, both several years ago when the number of focal studies was modest (Flavell, 1988; Perner, 1991; Wellman, 1990) and more recently as the studies and discrepancies have grown (Leslie, 2000; Mitchell, 1996; Taylor, 1996). But these reviews have come to different conclusions, and have failed to provide a compelling synthesis of all the data. We provide a quantitative review and integration of the findings, a meta-analysis of the data.

A meta-analysis seems feasible and useful. The number of studies is now large; certainly sufficient for informative meta-analysis and, indeed, so large that considering all the studies one by one is difficult or impossible.
In addition, although different qualitative reviews depict this database in contradictory fashions, many of the contradictions could be addressed by meta-analysis. Consider the essential dependent variable, false-belief performance as reported across studies. When studies show a mixture of above-, at-, and below-chance responding, as is the case for research on false belief, then a comprehensive pooling of the data across studies is needed to clarify the nature of children's performance.

Of equal theoretical importance are possible independent variables, such as the ages of the children tested. In particular, investigations of false-belief reasoning have increasingly differed according to a variety of task variables. The focal false-belief question has been asked in terms of action (where will Maxi look for his chocolate), in terms of thoughts (where does Maxi think his chocolate is), and in terms of speech (where will Maxi say his chocolate is). Sometimes the target object has been moved from its original location inadvertently, whereas at other times, the movement has been presented, emphatically, as a deliberate deception designed to fool or trick the protagonist. At various times the protagonist, Maxi, has been a puppet, a doll, a real person, or a video portrayal. Moreover, the change-of-locations task discussed earlier represents only one "standard" false-belief task among several. Another often-used task involves unexpected contents: Children see a crayon box, state it will have crayons inside, then open the box to find it is filled with candies. Then they are asked about someone else, Mary, who has never looked inside the box, "What will Mary think is in here, crayons or candies?" A variation on unexpected-contents tasks is the unexpected-identity task, in which a false belief is engendered by an object with a deceptive identity (e.g., a sponge that looks just like a rock).
The variety of independent variables contributes to an increasingly confused picture. Again, meta-analysis could be especially informative; sufficient variation exists on a number of independent variables across studies to allow for the examination of factors that have not been varied, or not comprehensively varied, within studies.

Among the extended set of variables we will consider in this study, five deserve brief introduction because of their prominence in research and theory. The first variable is age. According to initial findings, false-belief performance changes dramatically from 3 to 5 years of age; this change may or may not remain when important task variations are accounted for. The second variable concerns deception: whether the child views the false-belief situation as happenstance or a deliberate ploy to trick the protagonist. Several authors claim that framing the task in terms of explicit trickery reduces or eliminates young children's errors (e.g., Chandler et al., 1989), although others claim it does not (e.g., Sodian, Taylor, Harris, & Perner, 1991). The third variable concerns salience. For example, framing the task in terms of trickery may be unimportant except that it serves to make mental state (being fooled or duped) more salient to the child in comparison with the alternative real state of affairs (where the chocolate really is). Salience issues have two separate aspects. Certain task manipulations arguably serve to enhance the salience of the protagonist's mental state; other manipulations serve to reduce the salience of the contrasting real state of affairs. The fourth variable of special import concerns whether the children are asked about someone else's false belief or their own. In a false-belief task for self (sometimes called a representational change task), a child may be shown the crayon box, and that it contains candies, and then asked, "Before you looked inside, did you think the box contained crayons or candies?"
Different theoretical accounts make different predictions about the difficulty of false-belief judgments for self versus others (as detailed later in the Discussion section). Moreover, an empirically intriguing question is, do children understand beliefs first for their own case, for others, or both similarly? Finally, the national or community identity of the children tested needs to be considered, for example, whether children are from the United States, Austria, Japan, or a hunter-gatherer traditional African society. Understanding people in terms of mental states in general, and in terms of beliefs, more specifically, can be argued to be a universal folk psychological stance (e.g., Wellman & Gelman, 1998) or a specific manifestation of a particular Anglo-European individualistic interpersonal stance (e.g., Lillard, 1998).

METHOD

In a meta-analysis, conditions within studies, rather than individual participants or entire studies, comprise the unit of analysis (Glass, McGraw, & Smith, 1981). In our analyses, the proportion of children in a condition who demonstrated the target behaviors (correct false-belief judgments versus errors) was used as the essential dependent variable. (More precisely, we use the proportion of false-belief questions answered correctly, because many studies asked each child two or more false-belief questions.) A meta-analysis of such data is especially straightforward; many of the statistical problems that arise for other types of meta-analyses, which require the transformation and collapsing of various derived inferential statistics (e.g., F tests, t tests) (see Glass et al., 1981), can be avoided given this type of comparable descriptive data across studies. In addition, we have been able to rule out aberrations due to sampling by supplementing our ordinary least squares estimates with bootstrap estimates of parameters and their standard errors. The details of these analytic methods are presented below in the Results section.
Here, we present information on the studies and conditions included in the analyses.

Studies

The total potentially relevant literature is large. Therefore, the first task was to search pertinent journals, review articles, databases, and chapters for studies on several related topics, including false belief, mental representations, theory of mind, understanding mental states, belief-desire reasoning, and folk or naive psychology. The references of each citation were searched for additional articles. The studies generated in this fashion were supplemented by a search through several conference abstracts and by soliciting information from a variety of known researchers in this field (e.g., P. L. Harris, A. Gopnik, S. Baron-Cohen, J. W. Astington, J. H. Flavell, M. Siegal, M. Chandler, K. Bartsch, J. D. Woolley, and C. Lewis). Our efforts to find relevant studies were concluded in January 1998; no studies published or sent after that time were included in the analyses. In retrospect, we have discovered that several studies published before 1998 were omitted; given the size of the literature and its publication in a great many different journals this was inevitable.

Many studies that were encountered initially were not considered further because they reported no false-belief tasks or conditions. Some studies or conditions were not included because they focused only on autistic or delayed samples; the focus in our study was on normally developing children. In total, 38 articles were examined and excluded for these reasons.

Not all studies or conditions purporting to include false-belief tasks with normally developing children could be included in the analyses. As already noted, different studies encompassed a wide variety of different task situations and questions. Most of these were reasonably straightforward, at least when sorted in relevant, measurable ways (e.g., deception versus none, puppets versus live protagonists).
Some, however, were so different as to be irrelevant (e.g., conditions in which the real state of affairs was completely unknown, and hence, whether the protagonist's belief was false or true was indeterminable). Moreover, for our procedures we needed comparably reported data on the dependent measure (proportion of correct false-belief judgments) and we needed the data to be reported separately for children of various ages (e.g., 3- versus 4-year-olds) and for separate tasks (e.g., locations versus contents tasks). Several studies collapsed their reported findings together across age or task features. For a number of these, more detailed data were sought and received from the authors, but this was not always possible or clarifications were not always forthcoming. In total, our analyses encompassed 77 reports or articles including 178 separate studies and 591 conditions. A list of the number of studies and conditions appears in Table 1. These same 77 reports yielded 58 conditions that were excluded for the reasons described earlier (less than 10% of the conditions that were included).

One other exclusion was made. False-belief tasks typically ask not only the false-belief question (e.g., Where does Maxi think his chocolate is?) but also one or more control questions (e.g., Where is the chocolate really? Where did Maxi put his chocolate?). Performance on these control questions varies, but typically even the youngest children do reasonably well, and often only children who answer the controls correctly are then included in the final data. Conditions in which fewer than 60% of the children answered the control questions correctly or in which 40% or more of the initial subjects were dropped were excluded from our analyses. A total of nine conditions were excluded for these reasons; we assumed that these conditions involved atypically difficult tasks or extremely confused children.
In total, a large majority of false-belief research was examined, representative of both published and unpublished false-belief studies in the field as of January 1998. The analyses of the data provided by these studies proceeded in several stages. In each, different subsets of the total data were used, as clarified in the Results section.

Table 1 Listing of the Studies and Conditions Included in the Meta-Analysis

Authors | Year | Studies Reported | Studies Used in Meta-Analysis | Primary Conditions Included in Meta-Analysis | Nonprimary Conditions Included in Meta-Analysis | Total Conditions Included in Meta-Analysis
Astington, Gopnik, and O'Neill | 1989 | 2 | 2 | 8 | 0 | 8
Avis and Harris | 1991 | 1 | 1 | 0 | 6 | 6
Baron-Cohen | 1991 | 1 | 1 | 0 | 1 | 1
Baron-Cohen, Leslie, and Frith | 1985 | 1 | 1 | 0 | 1 | 1
Bartsch | 1996 | 2 | 2 | 3 | 3 | 6
Bartsch, London, and Knowlton | 1997 | 2 | 1 | 1 | 0 | 1
Bartsch and Wellman | 1989 | 2 | 1 | 1 | 0 | 1
Berguno | 1997 | 1 | 1 | 3 | 0 | 3
Carlson, Moses, and Hix | 1998 | 3 | 2 | 3 | 0 | 3
Carpendale and Chandler | 1996 | 2 | 2 | 13 | 0 | 13
Chandler and Hala | 1994 | 4 | 4 | 14 | 0 | 14
Chen and Lin | 1994 | 1 | 1 | 2 | 2 | 4
Clements and Perner | 1994 | 1 | 1 | 4 | 0 | 4
Custer | 1996 | 1 | 1 | 0 | 2 | 2
Dalke | 1995 | 2 | 2 | 12 | 2 | 14
Davis | 1997 | 2 | 1 | 2 | 1 | 3
Flavell, Flavell, Green, and Moses | 1990 | 4 | 2 | 2 | 0 | 2
Flavell, Mumme, Green, and Flavell | 1992 | 4 | 2 | 2 | 2 | 4
Freeman and Lacohee | 1995 | 6 | 6 | 31 | 6 | 37
Freeman, Lewis, and Doherty | 1991 | 5 | 4 | 15 | 7 | 22
Fritz | 1992 | 2 | 2 | 10 | 2 | 12
Frye, Zelazo, and Palfai | 1995 | 3 | 2 | 12 | 0 | 12
Ghim | 1997 | 5 | 4 | 7 | 1 | 8
Gopnik and Astington | 1988 | 2 | 2 | 30 | 12 | 42
Hala and Chandler | 1996 | 1 | 1 | 8 | 0 | 8
Hala, Chandler, and Fritz | 1991 | 3 | 1 | 3 | 0 | 3
Happé | 1995 | 1 | 1 | 0 | 2 | 2
Harris, Johnson, Hutton, Andrews, and Cooke | 1989 | 3 | 2 | 0 | 5 | 5
Hickling, Wellman, and Gottfried | 1997 | 2 | 2 | 4 | 0 | 4
Hogrefe, Wimmer, and Perner | 1986 | 6 | 4 | 16 | 0 | 16
Johnson and Maratsos | 1977 | 1 | 1 | 4 | 0 | 4
Kikuno | 1997 | 1 | 1 | 2 | 0 | 2
Koyasu | 1996 | 1 | 1 | 9 | 0 | 9
Lalonde and Chandler | 1995 | 1 | 1 | 6 | 0 | 6
Leekam and Perner | 1991 | 1 | 1 | 2 | 0 | 2
Leslie and Thaiss | 1992 | 2 | 2 | 4 | 0 | 4
Lewis, Freeman, Hagestadt, and Douglas | 1994 | 5 | 4 | 0 | 6 | 6
Lewis and Osborne | 1990 | 1 | 1 | 18 | 0 | 18
Lillard and Flavell | 1992 | 2 | 2 | 3 | 0 | 3
Mayes, Klin, and Cohen | 1994 | 1 | 1 | 6 | 0 | 6
Mazzoni | 1995 | 1 | 1 | 1 | 1 | 2
Mitchell and Lacohee | 1991 | 3 | 3 | 7 | 0 | 7
Moore, Pure, and Furrow | 1990 | 2 | 1 | 2 | 0 | 2
Moses | 1993 | 2 | 2 | 0 | 10 | 10
Moses and Flavell | 1990 | 2 | 2 | 4 | 0 | 4
Naito, Komatsu, and Fuke | 1994 | 1 | 1 | 4 | 2 | 6
Perner, Leekam, and Wimmer | 1987 | 2 | 2 | 9 | 5 | 14
Perner and Wimmer | 1987 | 2 | 2 | 8 | 0 | 8
Phillips | 1994 | 7 | 1 | 2 | 0 | 2
Riggs, Peterson, Robinson, and Mitchell | 1998 | 4 | 2 | 2 | 0 | 2
Robinson and Mitchell | 1992 | 5 | 2 | 3 | 0 | 3
Robinson and Mitchell | 1994 | 3 | 3 | 5 | 0 | 5
Robinson and Mitchell | 1995 | 6 | 4 | 6 | 2 | 8
Robinson, Riggs, and Peterson | 1997 | 4 | 2 | 2 | 0 | 2
Robinson, Riggs, and Samuel | 1996 | 3 | 3 | 8 | 0 | 8
Roth and Leslie | 1998 | 2 | 2 | 4 | 6 | 10
Ruffman, Olson, Ash, and Keenan | 1993 | 3 | 3 | 2 | 5 | 7
Russell and Jarrold | 1994 | 3 | 3 | 4 | 4 | 8
Saltmarsh, Mitchell, and Robinson | 1995 | 5 | 5 | 12 | 0 | 12
Seier | 1993 | 1 | 1 | 0 | 1 | 1
Sheffield, Sosa, and Hudson | 1993 | 1 | 1 | 4 | 0 | 4
Siegal and Beattie | 1991 | 2 | 2 | 8 | 0 | 8
Slaughter and Gopnik | 1996 | 2 | 2 | 3 | 1 | 4
Sullivan and Winner | 1991 | 1 | 1 | 18 | 0 | 18
Sullivan and Winner | 1993 | 1 | 1 | 12 | 0 | 12
Taylor and Carlson | 1997 | 1 | 1 | 4 | 0 | 4
Vinden | 1996 | 1 | 1 | 4 | 8 | 12
Wellman and Bartsch | 1988 | 3 | 1 | 3 | 0 | 3
Wellman, Hollander, and Schult | 1996 | 4 | 1 | 3 | 0 | 3
Wimmer and Hartl | 1991 | 3 | 3 | 6 | 5 | 11
Wimmer and Perner | 1983 | 4 | 3 | 16 | 0 | 16
Wimmer and Weichbold | 1994 | 1 | 1 | 8 | 0 | 8
Winner and Sullivan | 1993 | 1 | 1 | 6 | 0 | 6
Woolley | 1995 | 2 | 2 | 7 | 1 | 8
Yoon and Yoon | 1993 | 2 | 1 | 16 | 0 | 16
Zaitchik | 1990 | 5 | 1 | 4 | 0 | 4
Zaitchik | 1991 | 1 | 1 | 12 | 0 | 12
Totals | | 178 | 143 | 479 | 112 | 591

Coding

Each condition included in the analyses was coded for the dependent variable (proportion of correct responses to the false-belief question) and for a variety of features constituting the following independent variables:

1. Year of publication.
2. Mean age and number of participants in a condition.
3. Percentage of participants passing control questions, and percentage dropped from the research.
4. Country of participants: for example, the United States, United Kingdom, Austria, Japan, and so forth.
5. Type of task: Three levels of task type, which distinguished locations versus contents versus identity tasks (as described earlier).
6. Nature of the protagonist: Five levels that described the protagonist as a puppet or doll; a pictured character; a videotaped person; or a real person present in, or absent from, the current situation. Protagonists who were real, present persons were then also coded as either the self or another person.
7. Nature of the target object: Four levels that described the target object as a real object (e.g., chocolate), a toy, a pictured object, or a videotaped object.
8. Real presence of the target object: Two levels denoting whether, at the time the false-belief question was asked, the target object was real and present (e.g., chocolate was in the drawer, the crayon box contained candies) or not (e.g., Maxi's chocolate was used up and thus now absent, the crayon box was empty and thus contained no real object).
9. Motive for the transformation: Two levels capturing whether the key transformation (e.g., the change of location or the substitution of unexpected contents) was done to explicitly trick the protagonist (deception), or for some other reason including for no explicit reason at all.
10. Participation in the transformation: Three levels describing whether the child initially helped to set up the task props, engaged in actively making the key transformation, or only passively observed the events.
11. Salience of the protagonist's mental state: Four levels describing whether the mental state had to be inferred from the character's simple absence during the key transformation, whether the character's absence was emphasized and explicitly noted, whether the false-belief experience was demonstrated initially on the children themselves (e.g., the child initially discovered that the crayon box contained candies), or whether the character's mental state was explicitly stated (e.g., the child was told "Maxi thinks it is in the cupboard") or pictured in some fashion (e.g., via a thought bubble).
12. Type of question: Four levels denoting whether the false-belief question was phrased in terms of where the protagonist would look (or some other belief-dependent action the character might take), what he'd think or believe, what he'd say, or what he'd know.
13. Temporal marker: Two levels capturing whether the false-belief question explicitly mentioned the time frame involved ("When Maxi comes back, which place will he look in first?") or not.

Because the way in which an individual condition was to be coded on the above measures was not always clear-cut, two independent raters coded 94 conditions representing 20 studies.
Agreement (agreements divided by agreements plus disagreements) ranged from 92 to 100%. Over all codings, reliability averaged 97%. On some variables (year of publication, country, and motive for the transformation) agreement was perfect. The two lowest reliabilities were 92% (for percent passing control items and for salience of the protagonist's mental state). All disagreements were resolved through discussion.

RESULTS

Primary Conditions

As noted, the entire database for the meta-analysis contained information on 591 false-belief conditions. In the first wave of analyses, however, we included only what we termed primary conditions. These were conditions in which (1) subjects were within 14 months of each other in age, (2) less than 20% of the initially tested subjects were dropped from the reported data analyses (due to inattention, experimental error, or failing control tasks), and (3) more than 80% of the subjects passed memory and/or reality control questions (e.g., "Where did Maxi put the chocolate?" or "Where is the chocolate now?"). Our reasoning was that age trends are best interpretable if each condition's mean age represents a relatively narrow band of ages; interpretation of answers to the target false-belief question is unclear if a child cannot remember key information, does not know where the object really is, or cannot demonstrate the verbal facility needed to answer parallel control questions. In most of the studies, few subjects were dropped, very high proportions passed the control questions, and ages spanned a year or less, so primary conditions included 479 (81%) of the total 591 conditions available. The primary conditions are enumerated in Table 1; they were compiled from 68 articles that contained 128 separately reported studies. Of the 479 primary conditions, 362 asked the child to judge someone else's false belief; we began our analyses by concentrating on these conditions.
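The selection rules can be made concrete with a small sketch. The Python function below is an illustration only, not the authors' procedure: the field names (`age_span_months`, `prop_dropped`, `prop_passed_controls`) and the handling of values that fall exactly on a threshold are assumptions, and the exclusion rule is the one stated in the Method section (fewer than 60% passing controls, or 40% or more dropped).

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One false-belief condition; all field names are hypothetical."""
    age_span_months: float       # spread between youngest and oldest child
    prop_dropped: float          # proportion of initial subjects dropped
    prop_passed_controls: float  # proportion passing control questions


def classify_condition(c: Condition) -> str:
    """Return 'excluded', 'primary', or 'nonprimary' for a condition."""
    # Exclusion rule from the Method section: fewer than 60% passed the
    # control questions, or 40% or more of the initial subjects were dropped.
    if c.prop_passed_controls < 0.60 or c.prop_dropped >= 0.40:
        return "excluded"
    # Primary-condition rule: ages within 14 months of each other,
    # less than 20% dropped, and more than 80% passing controls.
    # Treating a span of exactly 14 months as "within" is an assumption.
    if (c.age_span_months <= 14
            and c.prop_dropped < 0.20
            and c.prop_passed_controls > 0.80):
        return "primary"
    return "nonprimary"


# A condition resembling the reported primary-condition averages.
typical = Condition(age_span_months=10, prop_dropped=0.03,
                    prop_passed_controls=0.98)
print(classify_condition(typical))
```

Under these assumptions, a condition with a wide age span but otherwise clean data is retained as nonprimary rather than excluded, which matches the distinction between the 479 primary and 112 nonprimary conditions in Table 1.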
On average in the primary conditions, 3% of children were dropped from a condition, children were 98% correct on control questions, and ages ranged 10 months around their mean values.

In an initial analysis only age was considered as a factor. As shown in Figure 2, false-belief performance dramatically improves with age. Figure 2A shows each primary condition and the curve that best fits the data. The curve plotted represents the probability of being correct at any age. At 30 months, the youngest age at which data were obtained, children are more than 80% incorrect. At 44 months, children are 50% correct, and after that, children become increasingly correct. Figure 2B shows the same data, but in this case the dependent variable, proportion correct, is transformed via a logit transformation. The formula for the logit is:

$$\mathrm{logit} = \ln\left(\frac{p}{1 - p}\right),$$

where "ln" is the natural logarithm, and "p" is the proportion correct. With this transformation, 0 represents random responding, or even odds of predicting the correct answer versus the incorrect answer. (When the odds are even, or 1, the log of 1 is 0, so the logit is 0.) Use of this transformation has three major benefits. First, as is evident in Figure 2B, the curvilinear relation between age and proportion correct is straightened, yielding a linear relation that allows systematic examination of the data via linear regression; second, the restricted range inherent to proportion data is eliminated, for logits can range from negative infinity to positive infinity; and third, the transformation yields a dependent variable and a measure of effect size that is easily interpretable in terms of odds and odds ratios (see, e.g., Hosmer & Lemeshow, 1989).

Figure 2 Scatterplot of conditions with increasing age showing best-fit line. (A) raw scatterplot with log fit; (B) proportion correct versus age with linear fit. In (A), each condition is represented by its mean proportion correct. In (B), those scores are transformed as indicated in the text.

Table 2 Summary of Meta-Analytic Results for the Primary Conditions

Variables                      Main Effect                    Interaction with Age        Effect Size [a]
Age                            F(1, 360) = 229.15, p < .001                               2.94 for 1 year
Nonsignificant
  Year of publication          F(1, 325) = 2.65, p > .10      F(1, 324) = .96, p > .32
  Type of task                 F(1, 359) = .63, p > .42       F(1, 358) = .64, p > .42
  Type of question             F(3, 357) = 1.96, p > .11      F(3, 354) = .81, p > .48
  Nature of the protagonist    F(4, 356) = 1.66, p > .44      F(4, 352) = 1.15, p > .33
  Nature of the target object  F(3, 357) = 2.49, p > .06      F(3, 354) = 2.35, p > .07
  Self versus other            F(1, 230) = 1.77, p > .18      F(1, 229) = 3.10, p > .07
Main effects
  Motive                       F(1, 359) = 14.27, p < .001    F(1, 358) = .30, p > .58    1.90
  Participation                F(2, 358) = 5.91, p < .003     F(2, 356) = 1.25, p > .28   1.96
  Real presence                F(1, 359) = 16.05, p < .001    F(1, 358) = .63, p > .42    2.17
  Salience                     F(3, 357) = 4.28, p < .006     F(3, 354) = .98, p > .40    1.92
  Country                      F(6, 345) = 10.42, p < .001    F(6, 359) = 1.04, p > .40   Australia versus United States = 2.27
                                                                                          United States versus Japan = 1.48
Interaction
  Temporal marker              F(1, 358) = 5.45, p < .02      F(1, 358) = 7.57, p < .006

[a] Effect sizes are presented only for significant variables. Effect sizes were computed in odds ratios and represent the increased odds of being correct given a facilitating value of a variable, as explained in the text.

The top line of Table 2 summarizes the initial analysis of age alone in relation to correct performance (transformed via the logit transformation).
The far right column presents the measure of effect size: the odds of being correct increase 2.94 times for every year that age increases. At about 44 months of age, children are about 50% correct, or at 0 on the transformed measure; hence, at that age the odds of being correct or incorrect are even or 1.0 (Figure 2B). The effect size measure in Table 2 indicates the increase in the odds of being correct for a more advantageous value of a variable. A 2.94-fold increase for age means that for children who are 1 year older (i.e., 56 rather than 44 months of age) percent correct would be 74.6%. In terms of months, the effect size for age is 1.09 per month, or an increase from 50% to 52% correct from 44 to 45 months of age.

A meta-analysis must be guided by an organized set of questions and hypotheses. Our analyses were originally directed toward evaluating several baseline models of children's performance in false-belief tasks. The simplest model is a null model predicting random performance on false-belief tasks. Random performance at some age might suggest that children are confused, have few relevant systematic conceptions, or that the tasks tap little of children's understandings. Predictions of random performance contrast clearly with predictions of significantly below-chance performance (systematic false-belief errors) and significantly above-chance performance (systematic correct understanding of false beliefs). Developmentally, a baseline model predicts no developmental change: for example, all ages evidence essentially random performance, or all evidence similar above-chance performance. A central question for evaluating these baseline models is whether and where children's performance significantly differs from chance.
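The conversion between odds ratios and percent correct used in this passage can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the monthly odds ratio is obtained on the assumption that the yearly ratio of 2.94 compounds evenly over 12 months (which is what a linear trend in logits implies).

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_odds_ratio(p_base: float, odds_ratio: float) -> float:
    """Proportion correct after multiplying the baseline odds by odds_ratio."""
    return inv_logit(logit(p_base) + math.log(odds_ratio))


yearly_or = 2.94                     # effect size for age reported in Table 2
monthly_or = yearly_or ** (1 / 12)   # assumed even compounding over 12 months

p_56_months = apply_odds_ratio(0.5, yearly_or)   # 44 -> 56 months
p_45_months = apply_odds_ratio(0.5, monthly_or)  # 44 -> 45 months

print(f"odds ratio per month: {monthly_or:.2f}")
print(f"percent correct at 56 months: {100 * p_56_months:.1f}")
print(f"percent correct at 45 months: {100 * p_45_months:.0f}")
```

Starting from even odds, an odds ratio of 2.94 gives odds of 2.94 to 1, that is, 2.94 / 3.94 of responses correct, which is where the 74.6% figure comes from. The same function applies to any other odds ratio in Table 2 when a 50% baseline is assumed.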
Ninety-eight percent of all false-belief conditions in the studies sampled used two locations (Maxi's chocolate might be in the drawer or the cupboard), or two limited identities (the crayon box usually contains crayons but now contains candies); thus chance responding represents 50% correct (or a score of 0 in the transformed data). We calculated a 95% confidence band around the best-fit line shown in Figure 2B and then determined where performances within this band fell below or above 0. These confidence bands are also depicted in Figure 2B [1]. At younger ages (essentially 41 months, or 3 years, 5 months, and younger) children performed below chance, making the classic false-belief error. At older ages (essentially 48 months, or 4 years, and older) they performed above chance, significantly correct. As shown in Figure 2, then, average performance changes rapidly during the period 3 to 4½ years from significantly wrong to significantly correct.

[1] The confidence bands, here and throughout the analyses, are natural extensions of confidence intervals for the population mean, with the exception that confidence limits are formed for the conditional mean of the dependent variable at different combinations of the independent variables. These are simultaneous confidence bands that contain the error rate at .05, no matter how many combinations of independent variables are investigated (Draper & Smith, 1981).

This age trend constitutes the foundation for analyses of the influence of various factors on performance in which we examined the effects of the 12 remaining independent variables (e.g., year of publication, type of task, nature of the target object, and so forth, as overviewed in the Method section). Again, consider in advance various patterns of results that might emerge.
As depicted in Figure 3A, one pattern of results is that some additional variable (e.g., type of task—a comparison between locations versus contents tasks) would have no effect on the foundational age trajectory. Alternatively, as shown in Figure 3B, that variable may be significant, but only as a main effect that does not interact with age. That is, differences on that variable affect performance similarly at all ages. Finally, as depicted in Figure 3C, a focal variable could interact significantly with age. A variety of patterns might yield significant interactions with age, but the one shown in Figure 3C is especially relevant for discriminating early competence accounts from claims of conceptual change. To confirm early competence claims, task modifications should influence younger children's performance (and thus account for their poor performance relative to older children), and, in particular, some task version(s) should increase young children's responding to above-chance levels. We tested for such patterns as follows. First, linear regression was used to screen for main effects and all two-way interactions with age. If interactions with age were absent, the interaction term was dropped from the regression and main effects were then tested. For those variables shown to have a significant effect, confidence bands were constructed around the obtained regression lines to determine where these lines differed from chance. Finally, these primary analyses were corroborated both by using more assumption-free bootstrapping methods and by examining the entire omnibus dataset of 591 conditions. Due to the large database, a conservative .01 level for significance was generally adopted. Because interaction effects are of particular theoretical importance, however, any interactions at the .05 level were considered significant as well. 
Figure 3 Three hypothetical patterns of results: (A) No effect, (B) main effect, and (C) interaction.

As shown in Table 2, six variables did not significantly affect performance, either by themselves or in interaction with age; five were significant but failed to interact with age; and one significantly interacted with age.

Nonsignificant variables. Given 362 conditions, a failure to find significance with such a powerful test is noteworthy. For example, although year of publication is of no theoretical interest, it provides a meta-analytic check for the possibility that an initial intriguing finding can shrink or disappear as later experiments are devised with better controls and procedures (Green & Hall, 1984). Some authors contend that early reports of young children's false-belief errors are plagued with just such problems. However, the meta-analysis shows that false-belief results from earlier studies are virtually identical to those from later studies.

More important, however, is that type of task, type of question, nature of the protagonist, and nature of the target object were all nonsignificant. To confirm graphically that some of these variables are not significant factors, Figure 4 shows the data for type of question and type of task. Type of question concerns whether the false-belief question is phrased in terms of what the character will think, know, or say, or where he will look. As can be seen in Figure 4A, these variations make little difference. As noted, false-belief tasks come in three essential forms: change-of-location tasks, unexpected-contents tasks, and unexpected-identity tasks. In fact, unexpected-identity tasks are rare (used in only 44 of 362 conditions), and preliminary analysis showed that they did not differ from unexpected-contents tasks, which they closely resemble.
Therefore, unexpected-identity tasks were collapsed with unexpected-contents tasks and Figure 4B thus plots the data for these two main task types, locations versus contents (including identity). As shown, these task types are remarkably equivalent.

In addition, the "medium" in which the false-belief task is presented has no significant effect. That is, it makes no difference if the protagonist is presented as a real person, a puppet, a doll, a pictured storybook character, or a videotaped person. Similarly, as long as a concrete target object is present at the time the false-belief question is asked, it makes no difference whether the object is a real item (a piece of edible chocolate), a toy (a small toy car), a picture of an object (a drawing of a piece of chocolate), and so forth. Most studies maintained concordances across protagonist and object. If a real person was used as the protagonist, then a concrete object or toy object was used as the object. Likewise, a doll or puppet protagonist was matched with a toy object and a drawing of a girl was matched with a drawing of a piece of chocolate. At least within these loose confines, children do not reason better or worse about real people versus dolls, or about real chocolate versus pictured chocolate.

Figure 4 (A) Proportion correct versus Age × Type of Question (know, look, say, think), and (B) proportion correct versus Age × Type of Task (contents, location).

Self versus other. The analysis for nature of the protagonist, described thus far, included only conditions in which the protagonist was someone else.
Conditions in which the false-belief question was asked about the child's own belief (self-conditions) were excluded for all the analyses summarized at the top of Table 2, with the exception of the line labeled self versus other. To compare whether false-belief judgments for self differ from those for other people, it was not appropriate to simply compare all self-conditions to all other-conditions, because self-questions were asked for only a limited number of tasks. All self-questions are about a person, namely the child (rather than, e.g., a puppet or doll); all self-tasks are content or identity tasks, not location tasks. By their nature, all self-tasks require the child to experience an initial demonstration (e.g., to say a crayon box will contain crayons and then see that it contains candies instead). To ensure comparability, therefore, the 117 self-conditions were compared with the 118 other-conditions in which the protagonist was a person, the task was a content or identity task, and the child viewed an initial demonstration of or participated actively in making the key transformation. As Figure 5 shows, in this analysis of 235 conditions, children's correct responses to false-belief questions for self versus other did not differ, and were virtually identical at the younger ages. Figure 5 also shows that the essential age trajectory for tasks requiring judgments of someone else's false belief is paralleled by an identical age trajectory for children's judgment of their own false beliefs. Young children, for example, are just as incorrect at attributing a false belief to themselves as they are at attributing it to others.

Figure 5 Proportion correct versus Age × Self/Other.

Main effects. Five variables were significant as main effects, but did not interact with age. In essence, these factors can enhance children's performance, but in doing so they leave the underlying developmental trajectory (from incorrect to correct performance) unchanged. Figure 6 graphically depicts the effects for motive for the transformation ("motive"), participation in the transformation ("participation"), salience of the protagonist's mental state ("salience"), and real presence of the target object ("real presence").

Figure 6 (A) Proportion correct versus Age × Motive; (B) proportion correct versus Age × Participation; (C) proportion correct versus Age × Salience; (D) proportion correct versus Age × Real Presence.

Motive concerned the motive presented to the child for the change of location or the unexpected contents. Sometimes a deceptive motive was explicitly stated (the chocolate was moved to trick the protagonist), sometimes it was not (either because no motive was stated, as when the crayon box just contained candies, or, rarely, because a nondeceptive motive was mentioned, such as putting them away). As is clear in Figure 6A, a deceptive motive enhances children's performance, and does so for children of all ages. Table 2 presents F values for all the variables as well as effect sizes for significant variables. To reiterate, just as for age, this odds ratio measure indicates the increase in the odds of being correct for a more advantageous value of the variable: for example, for motive, the increase in odds of being correct for tasks with deceptive motives over those without deceptive motives. As shown in Table 2, this value for motive is 1.90; that is, a deceptive motive increases the odds of being correct by a factor of 1.90. In percentage terms, if children are 50% correct at 44 months without deception, then they are 66% correct with deception.

Participation by the child in transforming the target object is also important. Often children are essentially passive onlookers; for example, they watch as someone transfers Maxi's chocolate from one place to another. However, in some tasks children help set up or manipulate the initial task situation or story props. Further, in some tasks the children themselves make the essential transformation, for example, moving Maxi's chocolate, or taking crayons out of a crayon box and putting in candies instead. As shown in Figure 6B, this last type of engagement (actively making or helping make the crucial transformation) positively influences children's performance at all the ages tested. As shown in Table 2, actively making the transformation increases the odds of being correct by a factor of 1.96 over a baseline of being a passive onlooker. In percentage terms, if at 44 months children who are passive onlookers are 50% correct, then children who are actively involved in transforming the task materials are 66% correct.

The variables real presence and salience both arguably influence the extent to which the task focuses on mental-state information. Again, consider false-belief tasks as providing information about two realms of content: real-world contents and mental-state contents. One might attempt to make mental state more focal either indirectly by diminishing the salience of the contrasting real-world contents, or directly by enhancing the salience of mental states.
The variable real presence describes the status of the target object at the time the false-belief question is asked— whether the true state of affairs is instantiated by a real and present object (e.g., Maxi's chocolate in the cupboard, or the candies in the crayon box) or not (e.g., Maxi's chocolate was removed from the drawer and eaten, and, thus, is not now real and present). As shown in Figure 6D, if the object is not real and present, children are more likely to answer correctly. With an effect size of 2.17, if children at 44 months are 50% correct with real and present objects, then they are 68% correct if the object is not real and present. Because this variable does not interact with age, however, both values of real presence leave the basic age trajectory unchanged. Moreover, even when there is no real and present object, the younger children do not perform at above-chance levels; 95% confidence bands around that line substantially overlap with zero. Thus, the youngest children may move from below-chance performance to at-chance performance, but only older children move to above-chance performance. Comparably, salience of the protagonist's mental state is also significant. In this case, as shown in Figure 6C, most task variations are equivalent. For example, it does not matter if the protagonist is merely absent when the transformation is made and the mental state must be inferred, whether the character's absence is emphasized, or if the false-belief situation and experience is initially demonstrated on the children themselves (e.g., they discover the crayon box contains candies). Yet, if the protagonist's belief itself is clearly stated or pictured, this significantly raises performance—for example, younger children move from below-chance to at-chance performance. Even if mental states are stated or pictured, however, young children do not achieve above-chance performance and the basic age trajectory remains the same. 
The 44 conditions included as stated or pictured are worth "unpacking" further. In one type of stated or pictured condition, the protagonist's belief is stated (e.g., "Maxi thinks his chocolate is in the drawer.") and then the false-belief question asks where he will look (e.g., "Where will Maxi look for his chocolate?"). To be correct the child must at least recognize the implication of Maxi's thought for his behavior in a situation in which behaving according to belief means Maxi will search for his chocolate where it isn't. In a second type of stated or pictured condition, the protagonist's belief is again stated (e.g., "Maxi thinks his chocolate is in the drawer.") and the false-belief question asks what Maxi thinks (e.g., "Does Maxi think his chocolate is in the drawer or the cupboard?"). Arguably, in this type of task children could respond correctly by simply repeating back the earlier belief statement ("Maxi thinks his chocolate is in the drawer."). Finally, a third type of stated or pictured condition shows the protagonist's belief in pictorial form. For example, at the time Maxi puts his chocolate in the drawer the child is asked to choose a picture showing where the chocolate is at this point (and the picture is either kept in view or removed from view). Or, Maxi may be shown with a thought bubble depicting his belief (e.g., a thought bubble of chocolate in the drawer and not the cupboard). Intriguingly, all these variations produce largely the same results: they all are helpful, but none provide evidence for the type of interaction noted in Figure 3C. Finally, the country of origin influences performance. Figure 7, which includes lines for the seven countries in which there were six or more total conditions, shows that at any one age, children from various countries can perform better or worse than each other. Nonetheless, children in all countries exhibit the same developmental trajectory. 
Conditions from the United States and the United Kingdom represent the largest sample (together contributing 48% of all conditions). Thus, using those countries as a baseline, children in Korea perform similarly; children in Australia and Canada perform somewhat better; and those in Austria and Japan perform somewhat worse. The effect size values shown in Table 2 present two extremes. If at 44 months of age children in the United States are 50% correct, then children in Australia are 69% correct and children in Japan are 40% correct.

Figure 7 Proportion correct versus Age × Country for the primary conditions.

Interaction. One variable, temporal marking, interacts with age. This variable refers to whether the false-belief question emphasizes the time frame involved (e.g., "When Maxi comes back, where will he look first for his chocolate?"). As shown in Figure 8, only at older ages does including temporal information in the target question significantly increase correct performance. Temporal markers increase the length and complexity of false-belief questions. Conceivably this complexity may hinder, or at least fail to enhance, young children's performance, although older children seem to benefit from the clarity this additional information provides.

Bootstrap Analyses

Data for meta-analyses often fail to meet certain standard statistical assumptions. For example, in our analyses some studies contributed 10 or more conditions to the dataset but others contributed only one or two conditions, making independence assumptions problematic. Bootstrap analyses are more assumption free. The basic idea of the bootstrap is to replace the theoretical normal distribution with the empirical distribution formed by the data themselves, N observations (Freedman & Peters, 1984).
One then repeatedly samples with replacement (B samples of size N) from this empirical distribution, each time computing the coefficients of the regression model and storing the results. When the procedure is completed, there are B estimates of each parameter, each computed on a randomly selected set of N observations. The mean of these B estimates is the bootstrap estimate of the parameter; the standard deviation of the B estimates is used to estimate the standard error of each parameter.

The key analyses in this study were confirmed by conducting parallel bootstrap analyses (where N = 362 and B = 1,000). For example, consider the regression model underlying Figure 2B. If only age is used to account for false-belief judgments, then R² = .39, and the following coefficients are obtained (with standard errors in parentheses):

proportion correct (transformed) = -3.96 (.305) [constant] + .09 (.006) [age].

Using bootstrap procedures to calculate the parameters of this model 1,000 times yielded mean values of -3.98 for the constant and .09 for age, with the standard deviation for these parameters being .303 and .006, respectively. In short, this bootstrap analysis closely replicates the earlier finding in all respects. Bootstrap analyses confirmed all the effects reported above. In particular, bootstrap analyses confirmed the absence of interactions between age and any other significant factor, except temporal marking.

Figure 8 Proportion correct versus Age × Temporal Marker.

Omnibus Analyses

The findings from primary conditions were also corroborated by conducting similar analyses on the entire omnibus dataset. For example, the analysis of age, depicted in Figure 2, includes the data from 362 primary conditions; the parallel omnibus analysis included 453 conditions.
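As a rough sketch of the bootstrap procedure described under Bootstrap Analyses (resample N conditions with replacement B times, refit the age regression each time, then summarize the B estimates), the following uses only the Python standard library. The data are synthetic and generated for illustration from the reported age-only coefficients with an assumed noise level; they are not the meta-analytic dataset, and the helper names `ols_fit` and `bootstrap_ols` are hypothetical.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_ols(xs, ys, b, rng):
    """Refit the line on b resamples (with replacement) of the n observations.

    Returns, for each parameter, (mean of the b estimates, their standard
    deviation); the standard deviation serves as the bootstrap standard error.
    """
    n = len(xs)
    intercepts, slopes = [], []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        a, s = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        intercepts.append(a)
        slopes.append(s)
    return {
        "intercept": (statistics.fmean(intercepts), statistics.stdev(intercepts)),
        "slope": (statistics.fmean(slopes), statistics.stdev(slopes)),
    }


rng = random.Random(0)
N, B = 362, 1000  # sample size and number of resamples given in the text

# Synthetic stand-in data: mean ages in months and logit-transformed scores.
# The noise standard deviation of 1.0 is an assumption made for illustration.
ages = [rng.uniform(30, 110) for _ in range(N)]
logits = [-3.96 + 0.09 * age + rng.gauss(0, 1.0) for age in ages]

fit_intercept, fit_slope = ols_fit(ages, logits)
boot = bootstrap_ols(ages, logits, B, rng)
print("least-squares fit:", fit_intercept, fit_slope)
print("bootstrap mean and SE per parameter:", boot)
```

This sketch resamples individual conditions, as the description in the text does. Resampling whole studies instead would be one way to respect the clustering of conditions within studies, but the text does not say that this was done.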
Age was significant in the omnibus analysis, replicating the primary analysis. The regression equations are closely similar:

DV (primary) = -3.96 [constant] + .090 [age], R² = .39;
DV (omnibus) = -3.51 [constant] + .090 [age], R² = .28.

The principal difference is that the omnibus analysis is noisier, yielding a lower R² and consequently accounting for less of the overall variance. Similarly, each variable that was significant in the primary analyses was re-analyzed, using the omnibus dataset. Results closely replicated the primary analyses, although in each case they proved noisier and thus accounted for less overall variance. Specifically, results for motive, real presence, participation, salience, and country again evidenced main effects and no interactions. Temporal marking evidenced the same interaction that was apparent in the primary analyses (shown in Figure 8). We prefer the primary analyses over the omnibus ones because, methodologically, the primary analyses focused on studies with better controls and more complete data, and, analytically, the primary analyses provided a better accounting of the variance. Nonetheless, the close similarity of the results from the primary and omnibus analyses further attests to the robustness of the findings.

The omnibus data allow for the most comprehensive look at the influence of children's country of origin on their false-belief reasoning. It is reasonable that during attempts to tailor tasks to different cultural materials, nonstandard tasks might result. Figure 9, in conjunction with Figure 7, therefore, presents more complete information about country. The line for the United States in Figure 9 closely replicates the U.S. line in Figure 7 for the primary analysis, and thus serves as a baseline for other comparisons. The other lines in Figure 7 depict performance by children in Western and non-Western literate, developed countries.
The omnibus data, however, include children from two nonliterate, more traditional communities: the line in Figure 9 for Africa depicts responses from hunter-gatherer Baka children (Avis & Harris, 1991) and the line for Peru depicts data from Quechua-speaking Peruvian Indians (Vinden, 1996). Just as in the primary analyses, country is clearly significant but does not interact with age. For these two cultural communities, just as for the others shown in Figure 7, children's false-belief performance increases across years in equivalent age trajectories, although at any one age children from different countries and cultures can perform differently.

Figure 9 Proportion correct versus Age × Country for three different countries within the omnibus conditions.

Multivariate Accounting

To summarize, four task variables help the performance of younger (as well as older) children: motive, participation, salience, and real presence. One non-task variable, country, also included values that enhanced performance. In all the analyses reported thus far, younger children (30-40 months of age) remain below or at-chance level. It remains to assess, however, if a combination of task enhancements might enable very young children to perform systematically above chance level. In addition, it is important to determine what combined model best predicts children's false-belief performances. Finally, testing multiple factors in a combined model is important because the significant influence of one factor may disappear if the other factors are controlled for as well. For example, the effects for country may conceivably be due to investigators in one country exclusively using more helpful task variations; or the apparent benefits of active participation may disappear if motive is also included in the analysis (if, for example, almost all active participation tasks also included deception).
We tested combined models on primary conditions because regression predictions were more precise for the primary conditions than for the omnibus set (which, to reiterate, consistently yielded similar, but statistically noisier, results). The first model tested included all variables that significantly enhanced young children's performance in the earlier two-way analyses: country, motive, participation, salience, real presence, and age. With these factors included, the multiple R = .74, and R² = .55. Hence these factors accounted for 55% of the variance in children's correct false-belief judgments. Again, bootstrap analyses closely replicated this regression model. Because country is a subject variable not under experimental control, it was excluded in the next model tested. In this model all factors except participation significantly and independently contributed to the overall prediction, multiple R = .68, R² = .47. The final model therefore dropped participation and included motive, salience, real presence, and age, multiple R = .68; R² = .46 [2].

Using this final model, three sets of predicted effects were then examined, as shown in Figure 10. The "best-effects" alternative shows predicted performance when values of all variables are maximally enhanced. Therefore, for motive, the best-effects model was based on framing the transformation in terms of explicit deception of the protagonist; for salience, this model was based on explicitly stating or picturing the protagonist's (false) mental contents; and for real presence, this model was based on the absence of a real target object at the time of the child's judgment. The "worst-effects" alternative examined predicted performances in the presence of detrimental values of these variables. For example, for motive the worst-effects model was based on the absence rather than the presence of explicit deception. The "no-effects" predictions were based on neutral or average values of each variable.
[2] This combined model arguably provides the most precise estimates of effect size for these key variables, because the influence of any one variable is controlled for the influence of other variables in this model. The effect size values generated from this model were: age, 3.32 (for each increase of 1 year); motive, 2.10; real presence, 2.63; and salience, 1.80. Comparing these values to those in Table 2 shows close agreement as well as some modest adjustments. Note that in this combined model the effect of age is not attenuated (as an early competence hypothesis would expect it to be).

As shown in Figure 10A, the no-effects prediction closely mimics the overall age changes revealed in the initial analysis of age alone (shown in Figure 2B). Most important, the best-effects prediction in Figure 10A is identical in slope to the no-effects line; even the best-effects combination of these enhancing variables does not differentially enhance younger children's performance. Moreover, a 95% confidence band that is fit around the best-effects line substantially overlaps with 0 or chance performance. These confidence bands are shown in Figure 10B. Specifically, the best-effects combination does not yield significant above-chance performance at the youngest ages; only at 40 months and older does that