A Synthesis of Read-Aloud Interventions on Early Reading Outcomes Among Preschool through Third Graders at Risk for Reading Difficulties

A synthesis and meta-analysis of the extant research on the effects of storybook read aloud interventions for children at-risk for reading difficulties ages 3–8 is provided. A total of 29 studies met criteria for the synthesis, with 18 studies providing sufficient data for inclusion in the meta-analysis. Read aloud instruction has been examined using dialogic reading, repeated reading of stories, story reading with limited questioning before, during, and/or after reading, computer assisted story reading, and story reading with extended vocabulary activities. Significant, positive effects on children’s language, phonological awareness, print concepts, comprehension, and vocabulary outcomes were found. Despite the positive effects for read aloud interventions, only a small amount of outcome variance was accounted for by intervention type.

Reading aloud to young children has been a commonplace practice in homes and schools for years. Parents, educators, policymakers, and politicians have promoted read alouds with the common belief that reading to children makes a difference in children’s literacy development. Indeed, adults reading to children under varying conditions may have social-emotional benefits for the children (Bus, 2001) as well as positive impacts on children’s language and literacy development (van Kleeck, 2004; Lonigan, 1994; NRC, 1998; Wasik & Hendrickson, 2004) and motivation to read (Gambrell & Marinak, 2009).

Many children learn to read effortlessly; however, an estimated 10% of children have difficulty acquiring reading skills and need additional support or specialized instruction (Catts & Hogan, 2003). Conditions that place children at risk for reading difficulties include poverty, cultural and linguistic differences, neurologically-based problems, inadequate instruction, limited development-enhancing opportunities, or familial history of reading disabilities (McCoach, O’Connell, Reis, & Levitt, 2006; Neuman & Dickinson, 2001). Knowledge of these early risk factors, as well as of the negative impact of reading difficulties on future success in school and life (National Center for Educational Statistics, 2004, 2006), has led to an increased focus on the early identification and prevention of reading difficulties in the early grades, including preschool (Catts & Hogan, 2003; Lyon, 1998).

The focus on early intervention for children at-risk for reading difficulties has led early childhood and elementary educators and researchers to examine early literacy instructional practices including the commonplace practice of storybook read alouds (National Early Literacy Panel, 2009). A corpus of research investigating the effects of read aloud practices in preschool and the early elementary grades has developed over the past several decades. Previous meta-analyses (e.g. Blok, 1999; Bus, van IJzendoorn, & Pellegrini, 1995; Mol, Bus, & de Jong, 2009; National Early Literacy Panel, 2009) and syntheses (Karweit & Wasik, 1996; Scarborough & Dobrich, 1994) have provided important information to inform the use of read alouds with young children, including a) support for reading aloud as an effective practice for improving child outcomes, b) initial work to identify study features associated with stronger effects, c) some support of read-alouds as a practice for children at risk for reading difficulties, and d) identification of methodological concerns for further study.

Scarborough and Dobrich (1994) synthesized 31 studies conducted over a period of thirty years that explored the influence of parent-child read aloud experiences on language and literacy development. Effect sizes for some studies were reported, and were moderately positive. Experimental groups outperformed control groups in interventions that involved parent training for shared book reading strategies. However, the strength of the association was inconclusive based on variability of results and methodology across the studies reviewed.

While the extensively cited Scarborough and Dobrich (1994) review was replete with evidence to support reading to children, the authors also drew attention to the potential overconfidence that educators, parents, and researchers place on the contributions of parent-child reading to the promotion of literacy development. They cited an ongoing debate about the bidirectional effect between child interest in reading and being read to. In other words, does reading to young children promote interest and skill development or do inherent interest and skill elicit more reading, thus impacting further skill development? Scarborough and Dobrich concluded that reading aloud to children is a complex practice worthy of additional research; subsequent studies and reviews of the practice have since flourished.

Karweit and Wasik (1996) complemented Scarborough and Dobrich’s (1994) work by examining studies that focused on the effects of storybook reading on 4- and 5-year-olds in school settings. In their narrative review of studies that included children from disadvantaged settings conducted between 1988 and 1994, the authors identified the following practices associated with positive outcomes: small group size in favor of whole group or individual story reading, repeated readings to facilitate child involvement and comprehension, vocabulary instruction using synonyms or role playing, and teacher interaction and questioning strategies to support vocabulary learning and comprehension. Although the authors identified disadvantaged children as a focus, the data were not disaggregated to describe differential effects for children with varying abilities or risk status.

Blok (1999) conducted a meta-analysis of ten studies, published between 1968 and 1994, in which read alouds took place in educational settings. The studies included aggregated data for disadvantaged (not defined) and non-disadvantaged children ranging in age from 31 to 90 months. The overall effects of a variety of read aloud interventions were medium in size for oral language (ES = 0.63) and reading (ES = 0.41) outcomes. Blok (1999) identified study characteristics that acted as moderators for oral language and wide reading outcomes, citing that instruction with younger children and smaller instructional group sizes were associated with larger effects. In addition, the use of untrained adult readers was associated with greater effects in wide reading. It is unclear though, whether the provision of training for the readers was a constant variable across studies in which this was reported. An important, general conclusion of the meta-analysis was that although some studies revealed positive effects, the authors also reported some poor quality research and highlighted the need for additional empirical work on the use of reading aloud in educational settings as a means to improve literacy outcomes.

An increase in high-quality research on story read alouds has since occurred. A recent meta-analysis of early literacy practices for preschool and kindergarten children examined experimental and quasi-experimental literacy literature through 2003 (National Early Literacy Panel, 2009). The panel reported that shared reading interventions demonstrated moderate effects on children’s print knowledge and oral language skills. Shared reading was examined across settings (e.g. schools, homes, pediatricians providing books to parents) and across adult readers (parents, teachers). While the overall effect size for children who were not at risk seemed to be higher than for children who were at risk, the difference in effects was not statistically significant. Information on features of instruction and the effects on specific literacy outcomes for the at-risk population was not analyzed.

Our synthesis and meta-analysis differs from previous reviews in several ways. First, we focus on only teacher-delivered interventions, excluding parent interventions (e.g. Bus et al., 1995; NELP, 2009). Second, we limit our review to studies that focus on students at risk for reading difficulty. Third, because previous meta-analytic results indicate no differences in effect size due to age differences (NELP, 2009), we have included preschool through third graders. Finally, previous syntheses have focused on limited interventions (e.g. Bus et al., 1995 focused on frequency of book reading) or limited outcomes (e.g. Mol et al., 2009 focused on vocabulary and print knowledge). Our synthesis considers all early reading and language outcomes. We address the following questions:

What outcomes result from read-aloud practices in educational settings for children considered at risk for reading difficulty?

What features of read aloud instruction are associated with improved outcomes for children at risk for reading difficulty?

Method

A comprehensive search of the literature was performed through a three-step process (Cooper, 1998). First, a computer search of ERIC and PsycInfo was conducted to locate studies published between 1984 and 2008. Topic-related terms or root forms of those terms (story telling, story reading, book reading, storybook reading, read aloud, oral reading, retelling, and dialogic preschool*, prekindergarten, pre-kindergarten, kindergarten*, young children, early childhood, daycare, school, elementary school, teaching, intervention, literacy, oral language, communication, comprehension, language, LD, at-risk, disadvantaged, low income, disabilities, learning dis*, learning difficulties, language delays, vocabulary, struggling readers, emerging readers, reading disabilities, reading difficulties) were used in various combinations to identify the greatest number of related articles. A total of 3,752 articles were identified in the initial search.

Second, to ensure the most recently published articles were included, a hand search of the following nine journals from 2004 to 2008 was conducted: Journal of Educational Psychology; Reading Research Quarterly; Language, Speech, and Hearing Services in Schools; Early Childhood Research Quarterly; Topics in Early Childhood; American Journal of Speech-Language Pathology; Exceptional Children; American Educational Research Journal; and Journal of Communication Disorders. Third, we searched the reference list of each qualifying study to ensure that all studies meeting our criteria were identified.

Studies were selected if they met the following criteria:

Participants were in preschool through 3rd grade or ages 3 to 8. When a sample also included older participants, the study was retained if at least 50% of the sample met the age/grade criteria or outcome data for the targeted age/grade range (3–8 years; preschool-grade 3) could be disaggregated from the larger sample.

Children were at-risk for reading difficulties based on at least one of the following categories: low achievement in phonemic awareness, vocabulary, or letter identification; few preschool or home literacy experiences; low socioeconomic status; family history of reading disability; or attendance at a school with historically low reading achievement. When a study included average achieving children as well as those at risk, the study was retained if at least 50% of the sample was at-risk for reading difficulties or the reading outcomes could be disaggregated for at-risk children.

A treatment-comparison, multiple treatment, single-group, or single subject research design was utilized. Case studies and perception studies (e.g. Collins-Stanley & Gan, 1996) were excluded from the synthesis.

The study took place in a preschool, day care, or school setting. If an intervention took place in one of the selected settings and children also participated in a home-based or parental intervention, the study was included. Interventions conducted solely by parents were excluded.

The intervention consisted of a read aloud of a storybook in an alphabetic language by an adult or using audio-tape or computer assisted applications. Multi-component studies for which read aloud was not the primary focus (e.g. Hindson, Byrne, Fielding-Barnsley, Newman, Hine, & Shankweiler, 2005; Ukrainetz, Cooney & Dyer, 2000), interventions that included read alouds of expository text (e.g. Daly & Martens, 1994), and observation studies of existing practices (e.g. Purcell-Gates, McIntyre & Freppon, 1995) were excluded.

At least one dependent measure assessed reading or language-related outcomes.

The study was published in a refereed journal.

Twenty-seven articles (29 studies) met these criteria. Two articles included more than one study (Beck & McKeown, 2007; Coyne, McCoach & Kapp, 2007).
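The selection criteria above, including the two 50% retention rules, can be read as a screening predicate. The following is a minimal sketch for illustration only: the `Study` record, its field names, and the design labels are hypothetical and are not taken from the authors' coding materials.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical fields that mirror the selection criteria in the text.
    prop_in_age_range: float           # share of sample aged 3-8 / preschool-grade 3
    age_disaggregated: bool            # outcomes reported separately for target ages
    prop_at_risk: float                # share of sample at risk for reading difficulties
    risk_disaggregated: bool           # outcomes reported separately for at-risk children
    design: str                        # e.g. "treatment-comparison", "case study"
    school_based: bool                 # preschool, day care, or school setting
    parent_only: bool                  # intervention conducted solely by parents
    storybook_read_aloud: bool         # storybook read aloud is the primary focus
    reading_or_language_outcome: bool  # at least one such dependent measure
    refereed: bool                     # published in a refereed journal


ELIGIBLE_DESIGNS = {"treatment-comparison", "multiple treatment",
                    "single-group", "single subject"}


def meets_criteria(s: Study) -> bool:
    """Return True when a study satisfies every selection criterion."""
    # Retain mixed samples if at least 50% qualify or data can be disaggregated.
    age_ok = s.prop_in_age_range >= 0.5 or s.age_disaggregated
    risk_ok = s.prop_at_risk >= 0.5 or s.risk_disaggregated
    return (age_ok and risk_ok
            and s.design in ELIGIBLE_DESIGNS
            and s.school_based and not s.parent_only
            and s.storybook_read_aloud
            and s.reading_or_language_outcome
            and s.refereed)
```

A study with 40% at-risk children would pass this check only if its outcomes could be disaggregated for the at-risk subgroup.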

Data Analysis

Coding procedures

We coded each study to identify and organize pertinent information using a code sheet employed in earlier syntheses (Kim, Vaughn, Wanzek, & Wei, 2004; Vaughn, Kim, Sloan, Hughes, Elbaum, & Sridhar 2003) and aligned with the What Works Clearinghouse Design and Implementation Assessment Device (Institute for Educational Sciences, 2003), which is used to evaluate the quality of studies. The code sheet included information on participants, design, description of conditions, clarity of causal inference, possible sources of intervention contamination, and reported findings.

The participant section of the code sheet included four forced choice items (socioeconomic status, risk type, exceptionality, and gender) and two open-ended items (age as described in the text and risk type as described in the text). Design information was recorded using four forced choice items (research design, assignment of participants to study groups, use of fidelity checks, and report of pretest scores) and one open ended item (criteria for participant selection). Detailed information on both treatment and comparison groups was coded through open-ended items (age and grade of participants, site of intervention, role of person implementing intervention, length of each session, duration of intervention, total number of sessions, and frequency of sessions).

Clarity of causal inference was recorded using six items for experimental designs (e.g. presence/absence of differential attrition between intervention and comparison group, sample size of the intervention group at the beginning and end of the study) and nine items for quasi-experimental designs (e.g. procedures used to equate groups, evidence of differential attrition between groups). Additional items allowed coders to describe assessment measures. Finally, precision of outcomes for both effect size estimation and statistical reporting was coded using a series of eight forced-choice or yes/no questions (e.g. evidence of assumptions of independence, normality, and equal variance). Effect sizes were calculated using reported performance on outcome measures and sample sizes for each treatment and comparison condition.

Six researchers participated in a series of training sessions (approximately 8 hours total) on interpreting the code sheet. Following training, researchers coded one article to determine inter-rater agreement (agreements divided by agreements plus disagreements). Inter-rater reliability of .95 for study characteristic coding and .91 for effect size coding was achieved. Teams of two researchers coded each article, resolved any differences in coding, and reached final decisions by consensus. After coding was completed, information was compiled in table format. Table 1 includes information on study design, number and age of participants, type and duration of intervention, and person who implemented the intervention. In Table 2, interventions are described and effect sizes are reported when appropriate data were available.
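The agreement index described above (agreements divided by agreements plus disagreements) is straightforward to compute. The sketch below assumes each coder's responses are stored as equal-length lists of item codes; the example responses are invented for illustration and are not the study's data.

```python
def percent_agreement(coder_a, coder_b):
    """Agreements / (agreements + disagreements) over paired code-sheet items."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must rate the same items")
    if not coder_a:
        raise ValueError("no items to compare")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)


# Hypothetical responses to 20 forced-choice items; the coders differ on one item.
coder_a = ["yes"] * 19 + ["no"]
coder_b = ["yes"] * 20
agreement = percent_agreement(coder_a, coder_b)
```

Note that simple percent agreement does not adjust for chance agreement, which matters most when a code sheet has few response options.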

Table 1

Features of Intervention Studies

| Study | Study Design | Number of Participants | Age (Grade) | # of Sessions (frequency and/or duration) | Person Implementing | Type of Intervention | Type of Outcome |
|---|---|---|---|---|---|---|---|
| Boykin & Cunningham (2001); Fidelity reported: No | Multiple Treatment | 64 | 84–96 months (NR) | NR | Researcher | Other | Comprehension |
| Coyne, McCoach & Kapp (2007), Study 2; Fidelity reported: Yes | Multiple Treatment | 32 | 64–84 months (K) | 3 sessions (NR) | Researcher | Vocabulary | Vocabulary |
| Dickinson & Smith (1994); Fidelity reported: No | Multiple Treatment | 25 | 48–60 months (NR) | NR | Teacher | Dialogic | Vocabulary; |
| Pemberton & Watkins (1987); Fidelity reported: No | Multiple Treatment | 20 | 38–57 months (Pre-K) | 8 sessions (2/week × 4 weeks) | Researcher | Repeated Reading | Language Development; Vocabulary |
| Reutzel, Hollingsworth, & Elderege (1994); Random assignment*; Fidelity reported: Yes | Multiple Treatment | 79 | NR (2) | 80 sessions (daily × 16 weeks; 30-min) | Teacher | Dialogic | Word Recognition; Vocabulary; Comprehension |
| Beck & McKeown (2007), Study 2; Fidelity reported: Yes | Single Group | 76 | NR (K–1st) | 45 sessions (daily × 9 weeks; 20 min) | Teacher | Vocabulary | Vocabulary |
| Davies, Shanks, & Davies (2004); Fidelity reported: No | Single Group | 31 | 59 months (K/1) | 24 sessions (3/week × 8 weeks; 40 min) | Speech Teacher Paraprofessional | Dialogic | Comprehension |
| Bellon-Harn, Hoffman, & Harn (2004); Fidelity reported: No | Single Subject | 3 | 66–72 months (NR) | # sessions NR (4 weeks) | Researcher | Other | Language Development |
| Aram & Biron (2004); Fidelity reported: No | Treatment/Comparison | 95 | 36–60 months (Pre-K) | 66 sessions (2/week; 33 weeks) | Researcher | Dialogic | Phonological Awareness; Word Recognition; Comprehension |
| Beck & McKeown (2007), Study 1; Fidelity reported: Yes | Treatment/Comparison | 98 | NR (K–1st) | 50 sessions (daily × 10 weeks) | Teacher | Vocabulary | Vocabulary |
| Bygrave (1994); Fidelity reported: No | Treatment/Comparison | 29 | 91 months (NR) | 115 sessions (daily × 23 weeks) | Teacher | Limited Questioning | Language Development; Vocabulary; Comprehension |
| Combs (1987); Fidelity reported: No | Treatment/Comparison | 24 | NR (K) | # sessions NR (3 weeks) | Teacher | Dialogic | Comprehension |
| Coyne, McCoach & Kapp (2007), Study 1; Fidelity reported: Yes | Treatment/Comparison | 31 | 64–78 months (K) | 3 sessions (20–30 min) | Researcher | Vocabulary | Vocabulary |
| Coyne, Simmons, Kame’enui, & Stoolmiller (2004); Random assignment*; Fidelity reported: No | Treatment/Comparison | 64 | NR (K) | NR (21 weeks) | Teacher | Dialogic | Phonological Awareness; Vocabulary |
| Hargrave & Senechal (2000); Fidelity reported: Yes | Treatment/Comparison | 36 | 36–60 months (Pre-K) | 20 sessions (daily × 4 weeks) | Teacher | Dialogic | Vocabulary |
| Justice & Ezell (2002); Random assignment*; Fidelity reported: Yes | Treatment/Comparison | 30 (T: 15) | 41–62 months (K) | 32 sessions (4/week; 8 weeks) | Researcher | Repeated Reading | Phonological Awareness; Word Recognition |
| Justice, Meier & Walpole (2005); Random assignment; Fidelity reported: Yes | Treatment/Comparison | 57 | 60–77 months (K) | 20 sessions (1–3 days/week × 10 weeks; 20 min); 20 sessions (1–3 days/week × NR; 20 min) | Researcher (Graduate Students) | | |

Table 2

Study Findings and Effect Sizes

T1 (Low Movement Theme): The behaviors and activities of the characters reflected few expressive movement themes. In the first context, children were told to stand or sit while listening to the story. They then answered questions on that story. In the second context, children were told that they could dance and clap if they wanted to while listening to the story. They then answered questions on that story.

T2 (High Movement Theme): The behaviors and activities of the characters reflected expressive movement themes, such as running, dancing, jumping. Same contexts as T1

T1 (Extended Word Instruction): Variety of interactive activities during and following read aloud focused on recognizing, discussing, and answering questions related to target words encountered during read alouds.

T2 (Embedded Word Instruction): During read aloud, simple definitions of target words were provided and then sentences were reread with the definition replacing the target word.

T1 (Co-constructive Approach): Read-aloud characterized by high amounts of talk by both children and teachers during read-aloud.

T2 (Didactic – Interactional Approach): Engaged in limited amounts of talk before, during, and after read-aloud.

T3 (Performance Oriented Approach): Read-aloud characterized by talk before and after read-aloud; little talk during read-aloud.

T1 (Modeling Reading): Read-aloud story with same content as T2, but no recasting.

T2 (Recast Reading): Read-aloud story written with base sentence-recast pairings (e.g. This is a frog. He is a big, green frog); followed the same plot as T1 story.

T1 (Shared Book Experience): Texts introduced by big book read alouds, children read text with partners, echo, and choral reading

T2 (Oral Recitation Lesson): Texts read aloud, children practice, rehearse, and recite texts

T1 (Text Talk Rich Instruction): Read alouds followed with rich vocabulary instruction and discussion

T2 (Text Talk More Rich Instruction): Read alouds followed with more rich vocabulary instruction and discussion (i.e. instruction on words was more frequent and for a longer duration)

T (Narrative skills training): Children encouraged to derive own endings for read aloud stories, retell stories, and make up own stories using puppets and role plays; structures of narratives emphasized.

T (Repeated Story Book Reading): Using same book for a week, researchers responded to children’s responses to “wh” questions by using a cloze procedure with expansion, a cloze procedure without expansion, or a contrast word procedure depending on interaction.

T (Joint Reading): Open-ended questions, drawing, and drama were used to discuss read-aloud stories.

C (Comparison): No description given.

T (Text Talk): Read alouds followed with rich vocabulary instruction and discussion

C (Read aloud): Typical read aloud with similar trade-books

T1 (Music program): Musical activities, such as singing and movement, which provided opportunities for children to acquire knowledge of musical skills and concepts

T2 (Storytelling program): Read alouds aimed at improving children’s organization, comprehension, and memory skills

T3 (Combination program): Combination of music and storytelling programs

C (Control): No program

T (Modeled Approach): Modeled aspects of reading process using enlarged texts during read-aloud.

C (Comparison): Read books aloud with emphasis on enjoying the story; few questions were asked.

T (Extended Word Instruction): Variety of interactive activities during and following read aloud focused on recognizing, discussing, and answering questions related to target words encountered during read alouds.

C (Comparison): No instruction

T (Storybook): 3 target vocabulary words taught before the read aloud, after the read aloud children were given opportunities to retell stories using target vocabulary and selected illustrations as prompts

C (Control): Sounds and letters module of a commercial reading program

T (Dialogic Reading): Asked wh-questions, followed correct responses with another question, and emphasized importance of enjoying stories during read-alouds.

C (Comparison): Typical reading instruction.

T (Print-Focused): During read-aloud, prompts for print conventions, concepts of words, or alphabet knowledge were given.

C (Picture-Focused): During read-aloud, prompts regarding pictures (such as characters, perceptual focus, or action focus) were given.

Print Recognition Words in Print

T (Word Elaboration): Target words were explicitly defined as they occurred in the text followed by use of the word in context.

C (Comparison): Regular kindergarten curriculum

T1 (E-book): Children read story via computer in “read with dictionary” mode (oral reading with explanation of difficult words) as well as in “read story and play” mode (designed to enhance phonological awareness and comprehension).

T2 (Adult Readers): Adults completed read aloud with prescribed instructions that included pre-reading activities (predictions and introduction of vocabulary words and their definitions) as well as concrete and inferential questions during and after reading.

C (Comparison): Typical practice.

T1 (Typical Shared Reading): Read text aloud, commented on pictures aloud, and asked children if there were any questions

T2 (Dialogic Reading): Read text aloud, asked children to name or describe objects or actions from text, and asked open-ended questions

C (Control): Typical preschool curriculum

PA: Alliteration Oddity

T1 (Dialogic reading): Read text aloud, asked children to name or describe objects or actions from text, and asked open-ended questions. High compliance center.

T2 (Dialogic reading): Read text aloud, asked children to name or describe objects or actions from text, and asked open-ended questions.

C1 (Control): No specific instructions or activities. High compliance center.

C2 (Control): No specific instructions or activities. Low compliance center.

T1 (Structural Discussion): Questions posed to enhance listening comprehension before and after read aloud focused on setting, theme, characters, episodes, and resolution.

T2 (Traditional Discussion): Questions posed to enhance listening comprehension before and after read aloud focused on recalling facts and details, cause and effect relationships, classifying, interpreting characters’ feelings, problem solving, and relating the story to children’s experiences.

T3 (Combined Structural and Traditional Discussion): Questions posed to enhance listening comprehension before and after read aloud focused on both story structure elements and traditional discussion elements.

C (Comparison): No questions or discussion initiated to enhance listening comprehension before or after read aloud.

T1 (Different Book Reading): Listened to a different story read aloud each week.

T2 (Repeated Book Reading): Listened to 3 different books read 3 times each.

C (Reading Readiness Activities): Focused on visual and auditory discrimination skills using a commercial program.

Comprehension: Focus on Meaning

Comprehension: Story Structure

Comprehension: Focus on Print

T (Metacognitive language): Read aloud texts included numerous metacognitive verbs; teachers encouraged to use metacognitive verbs during text discussions

C (Comparison): Read aloud texts were edited to remove metacognitive verbs

T (CD-ROM Storybook Program): Allowed children to select and control the pace of the story read aloud. Included pre-and post-reading activities

C1 (Comparison): Children spent the same amount of time on the computer but did not use the CD-ROM storybook program

T1 (Concurrent Translation): During read-aloud, two languages used interchangeably or concurrently; care taken to avoid use of direct translation between languages

T2 (Preview/Review): Used two languages separately. Preview and review activities conducted in dominant language; read-aloud conducted in target language

C (Comparison): Read-aloud conducted in target language without explanation of stories.

T (Interactive Book Reading Techniques): Defined target vocabulary, used objects as visual representations, provided opportunities for children to use target vocabulary, and asked open-ended and reflective questions before, during, and after read-aloud.

C (Comparison): Typical read alouds; teachers used same texts but were not trained in the interactive book reading techniques

T: (Book Reading and Oral Language Strategies): Teachers promoted reflective discussion during and after reading by introducing target vocabulary words, objects that represented target words, and open-ended questions regarding the objects. Active listening, rich language, and expanded feedback were also promoted.

C: Head Start centers were provided with a list of the books used in T. The books were displayed and used by teachers on an independent basis.

T (Dialogic reading): Dialogic reading; books chosen because of their potential to support vocabulary growth

C (Control): Play activities in small groups with toys

Note. T = Treatment; C = Comparison; PPVT = Peabody Picture Vocabulary Test; WPPSI = Wechsler Preschool and Primary Scale of Intelligence; PPVT-R = Peabody Picture Vocabulary Test – Revised; EOWPVT-R = Expressive One Word Picture Vocabulary Test -Revised; LMAC – R = Lindamood Auditory Conceptualization Test – Revised; PCC = Percentage of Consonants Correct;

* norm-referenced measure; + This group did not meet inclusion criteria for use in synthesis or meta-analysis.

Article categorization

A team of two researchers independently categorized articles by intervention type. A total of six read-aloud intervention types were identified. Descriptors for each intervention type were developed (see Table 3) and used by a third researcher to verify the categorization.

Table 3

Descriptors for Article Categorization.

Dialogic reading: Authors call the intervention “dialogic reading.” The storytelling role is gradually shifted from the adult reader to the child through various techniques (e.g. open ended questions, repetition, modeling; Ezell & Justice, 2005). Conversational turn-taking occurred. Turns may consist of verbal, vocal, or nonverbal responses (Ezell & Justice, 2005). The teacher asks questions and prompts the child to increase the quality of descriptions with the goal of the child learning to become the storyteller (Lonigan, 1998).

Repeated reading: Same book is read aloud on several occasions. Repeated reading is the main component of the intervention.

Limited questioning: Questioning before, during, and/or after reading. Does not engage in extended dialogue.

Computer assisted: Read aloud is conducted through the use of a computer.

Extended vocabulary: Activities before, during and after reading focused on vocabulary words and/or concepts. More than simply defining words. Activities must be conducted with the purpose of increasing word knowledge.

Other: Read aloud of text is provided. No focus on dialogic reading, repeated reading, limited questioning, computer assisted, or extended vocabulary instruction.

Effect size calculation

The primary statistic used to analyze the relationship between treatment and outcomes across the outcome measures was Cohen’s d (Cohen, 1988). This statistic indicates, in standard deviation units, the extent to which the treatment groups outperformed the control groups. For studies in this synthesis that employed a treatment-comparison design, an effect size of d = 0.2 can be interpreted as small, d = 0.5 as medium, and d = 0.8 as large (Cohen, 1988). The authors calculated Cohen’s d by taking the difference of the treatment and control group means and dividing by the pooled standard deviation. In instances where F-test statistics were provided instead of means and standard deviations, the effect size was calculated using the F statistic and degrees of freedom. Nine studies did not report sufficient data to allow effect size calculation. Moreover, because effect sizes from small studies tend to be biased, a sample-weighted estimate of the standardized difference was computed.
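The calculations described in this section can be sketched as follows. This is an illustrative sketch, not the authors' code. The F conversion assumes a two-group comparison with one numerator degree of freedom, where F equals t squared and the direction of the effect must be supplied separately. The text does not say which small-sample adjustment was applied, so the Hedges correction factor shown here is an assumption chosen as one common option.

```python
import math


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled standard deviation of the treatment and comparison groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in group means divided by the pooled standard deviation."""
    return (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)


def d_from_f(f_stat, n_t, n_c, treatment_higher=True):
    """Recover d from a two-group F (1 numerator df); sign is not encoded in F."""
    d = math.sqrt(f_stat * (n_t + n_c) / (n_t * n_c))
    return d if treatment_higher else -d


def small_sample_correction(d, n_t, n_c):
    """Assumed adjustment: Hedges' approximate factor 1 - 3 / (4 * df - 1)."""
    df = n_t + n_c - 2
    return d * (1 - 3 / (4 * df - 1))


# Hypothetical summary statistics for one outcome measure.
d = cohens_d(mean_t=12.0, sd_t=4.0, n_t=20, mean_c=10.0, sd_c=4.0, n_c=20)
d_adjusted = small_sample_correction(d, n_t=20, n_c=20)
```

With these invented values the two groups differ by half a pooled standard deviation, and the same value is recovered from the equivalent F statistic of 2.5.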

Meta-analysis plan

We conducted a meta-analysis of a subset of 18 studies that employed a treatment-comparison design and included information about outcomes and sample sizes that allowed for meta-analytic analysis (studies are denoted in tables). Many of the studies included multiple treatment or control groups and administered more than one measure in order to assess performance on various outcomes. As such, in order to account for treatment and control differences specific to the different types of outcomes, individual meta-analyses were run for: language, phonological awareness, word recognition, print concepts, vocabulary, and reading comprehension. Estimation of the mean effect size, variability across studies, and moderator effects were assessed using a multilevel model approach using HLM 6.0 software (Raudenbush & Bryk, 2002). Multilevel models are aptly applied to meta-analytic data for a number of reasons: a) multi-level modeling can account for subjects nested within studies by calculating the variance at both the subject and study level; b) multilevel modeling provides flexibility in mixed model estimation; and c) the researcher can simultaneously estimate the average effect size (fixed effect), variance of the effect (random effect), and residual variance modeled by moderator variables.

Multilevel modeling has the benefit of estimating results based on several methods, including restricted maximum likelihood (REML; Raudenbush & Bryk, 1986). Traditionally, ordinary least squares or weighted least squares are used in a regression analysis, with both methods designed to reduce the squared differences between predicted and observed values of the dependent variables. In the context of a multilevel design, however, both methods are less efficient and require more participants to achieve results comparable to those estimated by REML (Kreft & Leeuw, 1998). Although multilevel analyses are typically performed with the presumption that the variances are previously unknown, the meta-analysis must be performed under the known variance models. In these models, the variances must be calculated and entered into the multivariate data matrix along with the study ID, effect size, and moderator variables. The estimation of multilevel meta-analytic models is performed in a two-stage process. Initially, the unconditional model is run to estimate the grand mean effect size and the variance of the true effects. The level-1 model (within-studies) is summarized as:

$$d_j = \delta_j + e_j$$

where $d_j$ is the effect size from a study; $\delta_j$ is the true effect size of $d_j$ across all studies, or the corresponding population parameter; and $e_j$ is the sampling error associated with $d_j$ as the estimate of $\delta_j$.

Should statistically significant variability be observed, the analyst may perform a conditional analysis, which includes the level-2 moderator variables. The purpose of this second model is to generate an effect size for prediction, and estimate remaining residual variance after controlling for the entered level-2 moderators, summarized as:

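$$\delta_j = \gamma_0 + \gamma_1 W_{1j} + \dots + \gamma_s W_{sj} + u_j$$

where $\gamma_0$ is the grand mean effect size; $\gamma_1, \dots, \gamma_s$ are regression coefficients; $W_{1j}, \dots, W_{sj}$ are moderator variables predicting $\delta_j$; and $u_j$ is the level-2 random error.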

When the two equations are combined, the model contains two sources of error: the level-1 sampling error ($e_j$) and the level-2 random error ($u_j$). The choice between the unconditional and conditional models is based on the estimate of τ (i.e., the variance of the true effect sizes across studies). A value of zero indicates that no variance in study effect sizes remains after partitioning out sampling error, so the level-2 model would not be needed. Should τ be significantly different from zero, the conditional model would be employed to test whether the residual variance could be explained by the selected moderator variables.
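The unconditional stage of this process can be sketched in Python as follows. This is an illustration only: it uses the DerSimonian-Laird moment estimator of τ in place of the REML estimation in HLM 6.0 described in the text, and the function name and input values are hypothetical. Each study contributes an effect size and its known sampling variance.

```python
import math


def unconditional_model(effects, variances):
    """Grand mean effect size and between-study variance (tau).

    Uses the DerSimonian-Laird moment estimator as a stand-in for REML.
    Returns (grand_mean, standard_error, tau, q_statistic).
    """
    k = len(effects)
    # Stage 1: precision-weighted mean, ignoring between-study variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum_w
    # Q measures how much the effects vary beyond sampling error.
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau = max(0.0, (q - (k - 1)) / c)
    # Stage 2: re-weight each study by total variance (sampling + tau).
    w_star = [1.0 / (v + tau) for v in variances]
    grand_mean = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    standard_error = math.sqrt(1.0 / sum(w_star))
    return grand_mean, standard_error, tau, q


# Hypothetical effect sizes and known sampling variances for four studies.
mean, se, tau, q = unconditional_model(
    [0.10, 0.45, 0.80, 1.20], [0.05, 0.08, 0.04, 0.10]
)
print(mean, se, tau, q)
```

When the estimated τ is zero, the re-weighted mean reduces to the precision-weighted mean, which mirrors the case in which no level-2 model is needed.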

Furthermore, at both the unconditional and conditional levels, empirical Bayes (EB) estimates may be derived to compare with the restricted maximum likelihood (REML) estimates. This process allows the researcher to supply the grand mean effect size and create a normal distribution of effect sizes, shrinking the extreme estimates closer towards the mean. An advantage of this procedure is that the analyst may observe the degree of unreliability that exists within a particular study effect size. Whereas the REML approach considers all values as likely, the EB estimates generate a normal distribution around the grand mean for the individual cases; the cases showing the most shrinkage are those with little data or those that may be considered outliers (Raudenbush & Bryk, 2002). Moreover, from both models, the variance components for the grand means can be used to estimate the proportion of reduction in variance accounted for when using the selected moderators in the analyses:

$$\text{Proportion of variance explained} = \frac{\hat{\tau}_{qq}(\text{UC}) - \hat{\tau}_{qq}(\text{C})}{\hat{\tau}_{qq}(\text{UC})}$$

In this equation, $\hat{\tau}_{qq}(\text{UC})$ represents the variance component for the unconditional model, and $\hat{\tau}_{qq}(\text{C})$ is the variance component for the conditional model (Raudenbush & Bryk, 2002).
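The shrinkage behavior of empirical Bayes estimates can be illustrated with a short sketch. In the unconditional known-variance model, the EB estimate for a study is a reliability-weighted average of its observed effect size and the grand mean, where the reliability is the share of total variance attributable to true between-study variance. The function name and the numbers below are hypothetical and are not taken from the studies in this synthesis.

```python
def eb_estimate(d, sampling_variance, grand_mean, tau):
    """Empirical Bayes (shrunken) effect size for one study.

    reliability = tau / (tau + sampling_variance); a study with a large
    sampling variance has low reliability and is pulled strongly toward
    the grand mean.
    """
    reliability = tau / (tau + sampling_variance)
    shrunken = reliability * d + (1.0 - reliability) * grand_mean
    return shrunken, reliability


# Two hypothetical studies with the same observed effect size (1.5)
# but very different precision; assumed grand mean 0.5 and tau 0.2.
precise = eb_estimate(1.5, 0.05, 0.5, 0.2)
imprecise = eb_estimate(1.5, 0.80, 0.5, 0.2)
print(precise, imprecise)
```

The imprecise study ends up much closer to the grand mean than the precise one, which is the pattern the text describes for units with little data.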

Prior to meta-analysis, estimation of the file-drawer problem was conducted to determine the extent to which unpublished, unsuccessful studies would need to be incorporated to make results non-significant. Since most published studies report statistically significant findings, it is supposed that greater non-published research exists for the topic. Results indicated that the file drawer problem did not pose a threat to the current study (1.72 < 1.96).
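The source does not say which file-drawer procedure was used, and the comparison reported above (1.72 < 1.96) is not reproduced here. As one commonly used approach, Rosenthal's fail-safe N estimates how many unpublished studies averaging a null result would be needed to bring the combined result down to the one-tailed .05 threshold. The sketch below assumes that each study supplies a standard normal deviate (Z); the function name and input values are hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N.

    Number of additional studies with an average Z of zero needed to
    reduce the combined (Stouffer) Z to z_alpha (one-tailed .05 default).
    """
    k = len(z_scores)
    return (sum(z_scores) ** 2) / (z_alpha ** 2) - k


# Hypothetical Z values from five studies.
print(fail_safe_n([2.1, 1.4, 2.8, 0.9, 1.7]))
```

A large value relative to the number of included studies suggests that unpublished null findings are unlikely to overturn the result.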

Results

A total of 27 articles, comprising 29 studies (Beck & McKeown, 2007 and Coyne et al., 2007 each contain two studies), are included in this synthesis and reflect a variety of study designs. Therefore, we conducted several types of analyses to fully explain the results of these studies. First, we analyzed study features (e.g. study design and use of random assignment). Second, we conducted a meta-analysis of all treatment-comparison design studies to determine the effect of read aloud interventions on several early reading outcomes, along with a follow-up moderator analysis to examine differences between studies using criterion measures (“hard” criteria; i.e. below average achievement) or background characteristics (“soft” criteria; i.e. low SES, low school quality, history of family risk) and differences using standardized or researcher developed measures. Finally, we synthesized all single-group, single subject, multiple treatment, and three treatment-comparison studies not included in the meta-analysis by outcome and intervention type.

Study Features

In the following section, we summarize information about study characteristics and design elements. Detailed information about each study is included in Tables 1 and 2.

Sample characteristics

Sample sizes ranged from one to 254 children, with a mean of 56.6, and a median of 58 participants. The majority of studies targeted preschool through kindergarten children (n = 22). However, three studies included first graders and two studies included second or third graders.

Study design

This corpus of studies included 21 treatment-comparison, five multiple treatment, two single group, and one single subject study. Several design elements strengthen the reliability of, and lend credibility to, findings from treatment-comparison studies. These include use of random assignment, fidelity of treatment procedures, and the use of standardized dependent measures (U.S. Department of Education, 2003; Raudenbush, 2005; Shadish, 2002). The number of treatment-comparison studies that possess these design elements is reported in Table 4. Of note, two treatment-comparison studies (Lonigan, Anthony, Bloomfield, Dyer & Samwel, 1999; Whitehurst, Arnold, Epstein, Angell, Smith & Fischel, 1994) reported all three elements.

Table 4

Quality of treatment-comparison studies.

| Element | Number of Studies |
|---|---|
| Random assignment to conditions | 11 |
| Fidelity of treatment reported | 12 |
| Standardized dependent measures | 14 |
| Random assignment, treatment fidelity, and standardized measures | 2 |

Interventions

Studies were placed into one of six categories (computer assisted, dialogic reading, limited questioning, repeated reading, vocabulary, and other) based on the authors’ description of the intervention. The distribution of intervention type by design is reported in Table 5. In addition, Table 2 includes a description of each study’s treatment and comparison conditions.

Table 5

Type of intervention by study design

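| Intervention Type | Treatment-Comparison | Multiple Treatment | Single Group | Single Subject | Marginal Totals |
|---|---|---|---|---|---|
| Dialogic | 10 | 2 | 1 | 0 | 13 |
| Extended Vocabulary | 3 | 1 | 1 | 0 | 5 |
| Repeated Reading | 1 | 1 | 0 | 0 | 2 |
| Limited Questioning | 2 | 0 | 0 | 0 | 2 |
| Computer Assisted | 2 | 0 | 0 | 0 | 2 |
| Other | 2 | 1 | 0 | 1 | 4 |
| Marginal Totals | 20 | 5 | 2 | 1 | 28* |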

* Korat & Shamir (2007) included a computer assisted treatment and a limited questioning treatment in their multiple treatment design study; this study is not included in the study totals.

Seventeen studies reported the number of intervention sessions, with a range from 3 to 155 and a mean of 30 sessions. The number of sessions was not reported in 12 studies. Seven studies reported the length of sessions, with a range of 6 to 40 minutes and a mean of 24 minutes. Among treatment-comparison studies, the number of intervention sessions averaged 29, while the length of sessions averaged 20 minutes. Available information about frequency and length of sessions is reported in Table 1.

Meta-Analysis

In this section, we report results of the meta-analysis by outcome (see Table 6), along with our investigation of the moderating effects of intervention type on each outcome (see Table 7).

Table 6

Grand Mean Weighted Effect Sizes from Unconditional Model

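| Outcome | Coefficient (fixed) | SE | t | p-value (fixed) | Variance Component (random) | SD | p-value (random) |
|---|---|---|---|---|---|---|---|
| Language | 0.29 | 0.09 | 3.16 | 0.005 | 0.003 | 0.06 | >.50 |
| PA | 0.78 | 0.15 | 5.23 | | 0.42 | 0.65 | |
| PC | 0.86 | 0.27 | 3.17 | 0.01 | 0.71 | 0.84 | |
| RC | 0.70 | 0.14 | 4.94 | | 0.28 | 0.53 | |
| Voc | 1.02 | 0.18 | 5.83 | | 1.48 | 1.21 | |
| WR | 0.23 | 0.37 | 0.62 | 0.55 | 1.00 | 0.99 | |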

Note. PA = Phonological Awareness; PC = Print Concepts; RC = Reading Comprehension; Voc = Vocabulary; WR = Word Recognition

Table 7

Moderator Analysis for Outcomes

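| Outcome | Moderator | # of ES | Coefficient | SE | t | p-value | Variance Component | SD | p-value | R2 Red. |
|---|---|---|---|---|---|---|---|---|---|---|
| PA | DR | 19 | 0.84 | 0.15 | 5.49 | | 0.41 | 0.64 | | 0.02 |
| | CA | 2 | −0.39 | 0.70 | −0.56 | 0.58 | | | | |
| | LQ | 1 | −1.07 | 0.70 | −1.53 | 0.14 | | | | |
| PC | DR | 3 | 0.70 | 0.43 | 1.65 | 0.14 | 0.66 | 0.82 | | 0.06 |
| | RR | 5 | 0.60 | 0.59 | 1.02 | 0.34 | | | | |
| | CA | 3 | −0.52 | 0.74 | −0.71 | 0.50 | | | | |
| RC | DR | 15 | 0.60 | 0.16 | 3.81 | 0.00 | 0.27 | 0.52 | | 0.04 |
| | CA | 5 | 0.67 | 0.39 | 1.74 | 0.09 | | | | |
| | Other | 1 | −0.18 | 0.62 | −0.30 | 0.77 | | | | |
| Voc | DR | 30 | 0.57 | 0.19 | 2.97 | 0.005 | 1.06 | 1.03 | | 0.28 |
| | CA | 6 | 1.15 | 0.50 | 2.32 | 0.03 | | | | |
| | LQ | 2 | 1.43 | 1.07 | 1.33 | 0.19 | | | | |
| | Voc | 10 | −0.42 | 0.65 | −0.65 | 0.52 | | | | |
| | Other | 3 | 1.61 | 0.38 | 4.18 | | | | | |
| WR | DR | 4 | 0.04 | 0.49 | 0.08 | 0.94 | 1.16 | 1.07 | | - |
| | RR | 2 | 1.95 | 1.31 | 1.49 | 0.21 | | | | |
| | LQ | 1 | −0.11 | 1.21 | −0.09 | 0.93 | | | | |
| | CA | 1 | 0.03 | 1.21 | 0.03 | 0.98 | | | | |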

Note. PA = Phonological Awareness; PC = Print Concepts; RC = Reading Comprehension; Voc = Vocabulary; WR = Word Recognition; CA = Computer Assisted; LQ = Limited Questioning; RR = Repeated Reading; DR = Dialogic Reading

Language Outcomes

The range of the 23 weighted effect sizes for language outcomes was −0.48 to 1.79, with an overall weighted mean from the unconditional model of 0.29 (t(23) = 3.16, p = .005). This suggests that children who received read aloud interventions significantly outperformed children in the comparison group on measures of language. The variance component from the multilevel model of 0.003 was not statistically significant (p > .50; Table 6), indicating that while intervention children outperformed comparison children, there was not significant variability among studies. Although the range suggested larger deflections from the mean in terms of magnitude, the amount of error in effect sizes precluded a rejection of the null hypothesis.

Phonological Awareness Outcomes

By modeling treatment differences, the amount of variance was reduced from 0.42 in the unconditional model to 0.41. Using the estimates from both models, the variance reduction was 0.02, indicating that 2% of the variance in effect sizes was explained by the type of intervention that was administered.
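The arithmetic behind this figure can be checked with a short sketch: the proportion reduction in variance is the unconditional variance component minus the conditional one, divided by the unconditional component. The function name is hypothetical, and the inputs are the rounded variance components shown in Tables 6 and 7. Because those tabled values are rounded to two decimals, a recomputed proportion may differ from the R2 Red. column in the second decimal.

```python
def proportion_variance_reduction(tau_unconditional, tau_conditional):
    """Share of between-study variance explained by the moderators."""
    return (tau_unconditional - tau_conditional) / tau_unconditional


# (unconditional, conditional) variance components from Tables 6 and 7.
components = {
    "PA": (0.42, 0.41),
    "PC": (0.71, 0.66),
    "RC": (0.28, 0.27),
    "Voc": (1.48, 1.06),
}

for outcome, (tau_uc, tau_c) in components.items():
    reduction = proportion_variance_reduction(tau_uc, tau_c)
    print(f"{outcome}: {reduction:.3f}")
```

Word recognition is left out because its tabled conditional component (1.16) is larger than the unconditional one (1.00), so no reduction is reported for it in Table 7.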

Print Concepts Outcomes

Comprehension Outcomes

Vocabulary Outcomes

Findings from Additional Studies

The meta-analysis of findings from treatment-comparison studies provides confident conclusions about causal inferences. However, results from single-group, multiple treatment, and single-subject studies can be used to support or refute findings from the meta-analysis. Findings are summarized by comprehension, vocabulary, language, and word recognition outcome measures. Within each outcome measure, findings from single group, multiple treatment, and single-subject studies are reported. Because all studies with phonological awareness or print concept outcomes were included in the meta-analysis, no additional information for these outcome variables is reported below.

Language Outcomes

In addition to the three studies included in the meta-analysis, one computer assisted (Verhallen, Bus & deJong, 2006), one limited questioning (Bygrave, 1994), one “other” (Bellon-Harn, Hoffman & Harn, 2004), and one repeated reading study (Pemberton & Watkins, 1987) included at least one language outcome. Verhallen and colleagues (2006) assessed understanding of syntax by having children repeat sentences from a story, with the idea that as children better understand the grammar of sentences, they make fewer mistakes in repeating the sentences. Effect sizes were larger for kindergarteners who listened to a story read four times using an interactive, computer-based story reading program (ES = 1.04) than for children who listened to the story one time on the computer-based program (ES = 0.27) or listened to an adult read the story aloud four times (ES = −0.04). Bygrave (1994) reported no differences among seven year olds between limited questioning and typical practice groups on a measure of receptive language concepts. In Bellon-Harn and colleagues’ (2004) single subject study, five and six year olds with phonological and language impairments increased the complexity of utterances after several types of scaffolds during read-aloud sessions, including cloze procedures (e.g. “There is some paint. They made…” and student completes the sentence), or a contrast word procedure (e.g. student says “weg” and teacher says, “weg or leg?”) paired with asking a series of “wh” questions. Pemberton and Watkins (1987) reported no difference in language outcomes for pre-kindergarteners who were repeatedly read a story (6–7 times) written with base sentence-recast pairings (e.g. This is a frog. He is a big green frog) versus children who were read a book without the recast pairings (ES = 0.06).

Comprehension

In addition to the five treatment-comparison studies included in the meta-analysis, three dialogic reading (Combs, 1987; Davies, Shanks & Davies, 2004; Reutzel, Hollingsworth & Eldredge, 1994), two limited questioning (Bygrave, 1994; Morrow, 1984), and one study categorized as “other” (Boykin & Cunningham, 2001) included at least one comprehension outcome. The goal of dialogic reading treatments in three studies (Combs, 1987; Davies et al., 2004; Reutzel et al., 1994) was for children to develop a deep understanding of the story through the use of several strategies, including retelling. On tests of story recall, Combs (1987) reported that below average pre-kindergarteners doubled their recall of stories at posttest. Davies and colleagues (2004) reported statistically significant growth in the quantity of information children recalled, but no statistically significant growth from pre to post test in the quality of story retellings. In Reutzel et al.’s (1994) study, second graders were assigned to one of two different dialogic reading interventions (oral recitation lessons or shared book experience), whereby each group was read stories aloud and provided opportunities to practice retelling the stories. Additionally, children in the oral recitation group each had a copy of the text and spent time focused on developing reading fluency. There was little detectable difference between the two groups on measures of oral retelling (ES = 0.09) and the comprehension subtest of the Iowa Test of Basic Skills (ITBS; ES = 0.13). A small effect size was detected on a measure of answering explicit questions (ES = 0.25). However, children in the shared book experience group outperformed children in the oral recitation group when answering implicit questions (ES = 0.85).

Bygrave (1994) and Morrow (1984) each implemented a limited questioning read-aloud intervention. In Bygrave’s study, seven-year-olds in the story reading condition were read one short story per day and asked questions aimed at increasing comprehension and memory skills over a 23-week period. Morrow (1984) provided teacher-identified, low achieving kindergarteners with one of three types of questions before and after read aloud sessions conducted once per week over an 8-week period: (a) questions focused on story structure (e.g. setting, theme), (b) “traditional discussion” of the story, or (c) combined structural and traditional discussion. Bygrave (1994) and Morrow (1984) both reported that comprehension outcomes for children at risk for reading difficulties assigned to the treatment conditions did not exceed those of children in the control conditions.

Vocabulary

Fifteen treatment-comparison studies with vocabulary outcomes were included in the meta-analysis. One additional dialogic (Reutzel et al., 1994), one limited questioning (Bygrave, 1994), one “other” (Pemberton & Watkins, 1987) and two vocabulary studies (Beck & McKeown, 2007 study 2; Coyne et al., 2007 study 2) reported at least one vocabulary outcome measure. Second graders who received dialogic reading intervention (Reutzel et al., 1994) slightly outperformed children in a more fluency focused intervention on a standardized measure of vocabulary (ES = 0.19). However, Bygrave (1994) reported no significant differences between groups at post test. Pemberton and Watkins (1987) examined the effect of language recastings on vocabulary outcomes. They provided one group of pre-kindergarteners with multiple readings of a text written with language recastings, and the comparison group with repeated readings of a story with the same content, but no language recastings. Teachers read the book aloud to children six to seven times. The authors reported no statistically significant differences between the treatment and comparison groups on a standardized test of vocabulary (ES = 0.05). However, both groups demonstrated statistically significant vocabulary gains from pre to post test.

Two authors designed vocabulary based interventions delivered during read aloud sessions (Beck & McKeown, 2007 study 2; Coyne et al., 2007 study 2). Both studies provided one treatment group with rich, extended vocabulary instruction that included a variety of interactive activities before, during, and following read alouds that focused on recognizing, discussing, and answering questions about target vocabulary words. However, the second treatment condition differed between these two studies. When Coyne and colleagues (2007, study 2) compared an extended vocabulary treatment with read alouds containing simple definitions of target words provided during reading, kindergarteners in the extended vocabulary treatment scored significantly higher on researcher developed measures of expressive (ES = 1.70), receptive (ES = 0.99), and context vocabulary (ES = 1.12). In addition, children with higher PPVT scores prior to treatment were more likely to learn word meanings through extended instruction than children with lower initial PPVT scores.

Beck and McKeown (2007, study 2) compared an extended vocabulary treatment (“Rich Instruction”) with a more intensive intervention where children received more frequent extended vocabulary instruction for a longer duration (“More Rich Instruction”). While effect sizes could not be calculated, the authors reported that among both kindergarteners and first graders, the pre to post test gain in words learned was higher among children who received “More Rich Instruction” than among children in the “Rich Instruction” condition.

Word Recognition

One dialogic reading study (Reutzel et al., 1994) reported a word recognition outcome. Reutzel and colleagues (1994) reported that second graders who received the dialogic intervention plus fluency instruction outperformed children who received the dialogic intervention alone on the Iowa Test of Basic Skills (Hoover, Hieronymus, Frisbie & Dunbar, 1993) Word Analysis subtest (ES = −1.36) and oral reading error rate (ES = −1.01). However, the dialogic intervention only group read a greater number of words per minute at post test (ES = 0.63).

Discussion

In this synthesis we sought to examine the effects of read-aloud practices on the literacy outcomes of children at-risk for reading difficulties. Read aloud instruction has been examined in several instructional formats including dialogic reading, repeated reading of stories, story reading with limited questioning before, during, and/or after reading, computer assisted story reading, and story reading with extended vocabulary activities. Dialogic reading has received the most examination in the literature. A few additional read aloud instructional techniques have been examined in single studies, reviewed in this synthesis as “other” read aloud interventions.

Previous syntheses of storybook reading have indicated a lack of high quality research, qualifying the findings and decreasing the ability to make robust statements regarding the effects of read alouds on literacy outcomes for children (Blok, 1999; Scarborough & Dobrich, 1994). Our synthesis of the current literature indicates that the amount of high quality research has increased. We found 20 treatment-comparison studies of read aloud interventions specifically for children with risk factors for reading difficulties. Fourteen studies included standardized assessments of children’s literacy outcomes, allowing for reliable and valid examination of broad literacy outcomes following interventions. The larger and higher quality corpus of studies allowed us to examine more differentiated categories of independent and dependent variables than has been previously reported.

The meta-analysis revealed significant, positive effects for read aloud interventions on children’s language, phonological awareness, print concepts, comprehension, and vocabulary outcomes. These results suggest that children at-risk for reading difficulties who receive read aloud interventions achieve higher literacy outcomes than children who do not participate in these interventions. Although the effect on language outcomes was small and not moderated by intervention type, all other mean effects were large, with intervention type explaining some of the variance in outcomes. These results are in line with the recent National Early Literacy Panel (NELP, 2008) review of the effects of shared-reading interventions. The Panel reported significant effects for oral language (including listening comprehension and vocabulary) and print knowledge, but did not disaggregate these individual outcomes for children at-risk for reading difficulties. Encouragingly, our findings suggest children at-risk for reading difficulties do benefit from read aloud interventions in these areas. In contrast to the NELP (2008) results, we also found significant effects for phonological awareness outcomes, suggesting a possible additional benefit for children who are at-risk for reading difficulties when read aloud interventions are implemented in educational settings.

Dialogic reading has the most causal evidence to support its effects on children’s literacy outcomes with 8 experimental studies. The meta-analysis indicated moderate to large mean effect sizes for dialogic reading interventions on child outcomes of phonological awareness, print concepts, reading comprehension, and vocabulary. Thus, the evidence suggests that extended dialogue around the read aloud with transfer of storytelling to the child can improve literacy outcomes for children at-risk for reading difficulties.

The meta-analysis also revealed that computer-assisted interventions demonstrated small to large mean effects on literacy outcomes while limited questioning interventions revealed a small, negative mean effect size on phonological awareness yet a large effect size related to vocabulary outcomes. In both cases, there were few effect sizes contributing to these mean effects, and the mean effects for limited questioning were not significantly different than the mean effect for dialogic reading. Computer-assisted interventions demonstrated significantly higher effects on reading comprehension and vocabulary outcomes for children than the dialogic reading interventions. The higher effects on vocabulary may be the result of the outcome measures. The computer-assisted studies used researcher-developed measures assessing student knowledge of the meaning of the words in the book read aloud during the intervention. In contrast, the dialogic reading studies largely measured broad vocabulary outcomes (beyond words in the intervention storybooks) on standardized measures of vocabulary.

The practice of repeatedly reading stories to children at-risk for reading difficulties has received limited attention, with only two studies examining the intervention effects on only two literacy outcomes, language and print concepts. A large mean effect was noted for print concepts; however, these effects are reported from only one study and were not significantly different from the print concepts outcomes for dialogic reading. A second study found no differences in language outcomes between two versions of a repeated reading treatment, but reported significant differences between pretest and posttest scores for children involved in the treatments. The evidence suggests that repeated reading interventions may have potential for positively affecting student outcomes, but additional research is needed.

Intervention type accounted for 28% of the variance in vocabulary effect sizes. Although seemingly counterintuitive, the meta-analysis indicates that read alouds with extended vocabulary had the lowest mean effect size for vocabulary outcomes. However, this small effect was not significantly different than the moderate effect for dialogic reading. A closer examination of the vocabulary studies indicates children are making gains in the words that are taught in the intervention with the storybook, but demonstrating less gain in uninstructed vocabulary words. Two studies in this synthesis also noted differential effects on vocabulary outcomes, reporting children with the lowest vocabulary at pretest made the most gains in vocabulary (Coyne et al., 2004; Justice et al., 2005). Importantly, our synthesis reveals that participation in the different types of read aloud interventions seems to stimulate vocabulary development for children at-risk for reading difficulties, even on standardized measures of vocabulary.

Despite the positive effects for read aloud interventions, only a small amount of variance was accounted for by intervention type in the outcomes. The meta-analysis indicates that other unknown factors beyond the provided intervention explain significant amounts of variance in child outcomes on each of the measures. Similarly, several studies examining multiple read aloud treatments reported no significant differences in outcomes for the different treatment conditions, though significant pre to posttest gains were realized in each treatment. These findings may suggest that while read aloud interventions are valuable for children’s literacy outcomes, some of the specific features related to improvement have not been fully realized in the literature. Currently, the strongest evidence comes from dialogic reading interventions suggesting that incorporating extended child-adult dialogue and questioning around storybooks is a valuable practice in educational settings. Computer-assisted read alouds demonstrate promise for improving children’s literacy outcomes as well. The studies examining computer-assisted read alouds generally included interactive features allowing the child to engage in key aspects of the story line or character manipulation.

The literature provides little information regarding the long-term effects of read aloud interventions. Only 3 studies examined delayed or long-term outcomes of the interventions (Coyne et al., 2007 Study 1 and Study 2; Hindson et al., 2005). In both of their studies, Coyne et al. (2007) reported children in the read aloud intervention continued to outperform children in the control group 6 weeks after intervention on words taught in the intervention. Hindson et al. (2005) reported 1 year follow-up results indicating children in the read aloud intervention continued to perform at grade level expectations, but children not at-risk for difficulties made more gains in the year thus widening the gap. Future research on read aloud interventions should provide additional information on the long-term outcomes of participation in the interventions to better inform practice.

Limitations

We have the most confidence in reporting findings related to dialogic reading given the relatively large number of studies examining this type of intervention. Only 2–3 studies each have examined computer-assisted read alouds, repeated reading of read alouds, or limited questioning. Thus, the estimates for these interventions often carried considerable error. The low number of studies with common intervention features precluded us from examining more specific features of intervention implementation, such as group size and amount of time in intervention, that may moderate student outcomes. These factors may explain some of the additional variance in child literacy outcomes and warrant continued research to inform early intervention efforts for children at-risk for reading difficulties. Finally, one should interpret the findings of the supplementary tests of differing risk criteria and standardized or researcher based assessments with caution. Because of the very small number of effect sizes contributing to some of these supplementary tests, the tests were underpowered to detect statistically significant differences, meaning that we cannot completely rule out the possibility of Type II error.

Contributor Information

Elizabeth A. Swanson, The University of Texas at Austin.

Jeanne Wanzek, Florida State University.

Yaacov Petscher, Florida Center for Reading Research.

Sharon Vaughn, The University of Texas at Austin.

Jennifer Heckert, Educational Consultant.

Christie Cavanaugh, The University of Florida.

Guliz Kraft, The University of Texas at Austin.