Statistical learning, learning biases and the nature of human languages
How do we learn language? In particular, how do young children end up as competent adult-like speakers of their native language(s)? I am interested in how the way children process the language input they receive leads them to derive a language “system” that can produce and understand novel utterances they have never heard before. I have a longstanding interest in how statistical learning over the language input allows children to identify recurring patterns and leads to appropriate generalization. Recently, we have become interested in whether these processes are underpinned by discriminative learning, drawing on a well-understood theory of learning developed in the study of animal learning. This work is in collaboration with Michael Ramscar (personal web-page) and is funded by a grant from the Leverhulme Trust.
The primary methodology we use to explore these questions is Artificial Language Learning experiments, where participants learn and are tested on novel languages created by the experimenter. The languages can be very simple, but this provides a controlled methodology for exploring how the statistics of language input affect what is learned. Our experiments have explored how the structure of language input can affect the extent to which learners extract generalizations [Wonnacott et al., 2017, 2012, 2008; Wonnacott, 2011; Perfors et al., 2010]. A current collaboration with Ben Ambridge continues this work as part of an ERC-funded project. Ongoing experiments explore what types of input lead learners to avoid over-generalization of linguistic constructions (e.g. not to generalize the verb “carry” to the construction *he carried the child the parcel). There is evidence that frequently hearing utterances such as he carried the parcel to the child plays a role, but what about the learners’ more general experience of hearing “carry” in other constructions?
Other work uses similar artificial language learning methodology to look at learners’ biases and how these might shape human languages. For example, languages exhibit variation: in English the precise way in which we pronounce the plural marker -s varies (e.g. sometimes “s” e.g. “cats”, sometimes “z” e.g. “dogs”). While it seems logically possible that this kind of variation could occur completely at random (e.g. randomly choosing to produce “s” or “z”), this type of behaviour very rarely (possibly never) occurs in human languages. Instead, linguistic variation is predictable: in the case of -s, the pronunciation is predictable from the last sound in the noun. Why do languages work like this? Seminal work by Hudson Kam & Newport suggests that this is due to strong learning biases in children which lead them to regularize inconsistent input. With collaborators Kenny Smith and Olga Feher, we have been exploring the extent to which these biases for regularization are also present to a weaker extent in adult learners, how they might be amplified by interactions between language users, and how this might influence language structure [Smith et al., 2017; Feher et al., 2016; Smith & Wonnacott, 2010]. With collaborator Anna Samara we have also looked at whether children and adults can learn probabilistic sociolinguistic conditioning (i.e. learning that certain speakers are more likely to use some forms than others) [Samara et al., 2017].
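One way to make the contrast between predictable and random variation concrete is to measure how uncertain the choice of variant is once the conditioning context is known. The sketch below is illustrative only and is not taken from the cited studies: the function name, the toy data and the two-way "voiceless"/"voiced" split (a simplification of the English plural) are all assumptions.

```python
import math
from collections import Counter


def conditional_entropy(pairs):
    """Return H(variant | context) in bits for (context, variant) pairs."""
    total = len(pairs)
    context_counts = Counter(context for context, _ in pairs)
    joint_counts = Counter(pairs)
    entropy = 0.0
    for (context, _variant), count in joint_counts.items():
        p_joint = count / total
        p_variant_given_context = count / context_counts[context]
        entropy -= p_joint * math.log2(p_variant_given_context)
    return entropy


# Hypothetical toy data: the context is the final sound of the noun.
# Conditioned variation: the variant is fully determined by the context.
conditioned = [("voiceless", "s")] * 10 + [("voiced", "z")] * 10

# Unconditioned variation: "s" and "z" are equally likely in every context.
unconditioned = (
    [("voiceless", "s")] * 5 + [("voiceless", "z")] * 5
    + [("voiced", "s")] * 5 + [("voiced", "z")] * 5
)

print(conditional_entropy(conditioned))
print(conditional_entropy(unconditioned))
```

With these toy data the conditioned system comes out at 0 bits (no remaining uncertainty) and the unconditioned one at 1 bit (a coin flip in every context). A learner who regularizes would, on this kind of measure, produce output with lower conditional entropy than the input received.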
Researchers in the Language Learning Lab
Anna Samara (former postdoc, now collaborator)
Second language learning in children and adults
See our ESRC research grant page here, and click here for more details on our past Teachers Workshops on Second Language Learning in the Primary Years.
This research program explores how the statistical structure of the input can affect learning of a modern foreign language, and how this differs for learners of different ages. This has potential implications for language teaching in schools.
One set of experiments, with Anastasia Giannakopoulou, Helen Brown and Meghan Clayards, explores the learning of non-native speech contrasts (e.g. Greek speakers of English learning the difference between “sheep” and “ship”). There is evidence that adults learn better when speech sounds are exemplified across multiple contexts; however, our experiments suggest that this is not necessarily the case for children, at least using standard language training methods [Giannakopoulou et al., 2017; Brekelmans, 2020]. Similarly, in the area of vocabulary learning, we have found that adults are better able to recall words they have heard from multiple talkers, but this is not beneficial for children [Sinkeviciute et al., 2019]. An experiment looking at learning of tones in Mandarin Chinese found no evidence for a benefit of hearing multiple talkers even in adults [Dong et al., 2019].
Another experiment looks at learning of grammatical gender classes (i.e. the division of words into masculine and feminine) in Italian. Seven-year-old children learned Italian words by playing a computerized training game. The words were “marked” as masculine or feminine: masculine words were preceded by the word “il” and feminine words by the word “la”, and masculine words ended in an “o” and feminine words ended in an “a”. We found that children showed strong learning of the gender markings for the trained words, but there was only weak generalization of the patterns (as seen with new words) [Brown et al., 2016]. Ongoing experiments explore how manipulating the input might boost learning (for example, “staging” the input so that singulars are learned before plurals; “skewing” the input, so that one marker is more frequent than the other).
This research has been funded by a research grant from the ESRC held in collaboration with Dr Helen Brown (see the link at the top) as well as an SSHRC Insight Grant held by collaborator Meghan Clayards.
Some of our research explores the processes involved in spelling development, in collaboration with Anna Samara. Spelling is a complex and challenging task, particularly in orthographies where letters and sounds do not have a one-to-one correspondence (e.g., in English, vowel sounds can be spelled in as many as five different ways!). In line with increasing evidence that memorization and explicit learning skills do not suffice for competent spelling skill to develop, we investigate spellers’ frequency-based sensitivity: for example, can beginner spellers pick up on untaught orthographic conventions (e.g. gz and dz are illegal spellings of frequent word-final sound combinations in English; *bagz, *padz) from simple text exposure, and what are the computational mechanisms at play? We address these questions using artificial lexicons, i.e., novel words which exemplify spelling patterns akin to those seen in natural orthographies. We incidentally expose participants to these words and subsequently ask them to make judgments about unseen words which either follow or violate the novel spelling patterns. Using these methods, we have shown that frequency statistics do have an influence on children’s spelling preferences: for example, beginning spellers rapidly learn and generalize over novel orthographic conventions for permissible letter contexts (e.g., d and o cannot occur next to one another) [Samara & Caravolas, 2014], both when these are embedded within rime units (i.e., vowel-plus-final-consonant units) and body units (i.e., initial-consonant-plus-vowel units) [Samara, Singh, & Wonnacott, in preparation]. Our ongoing work compares children’s ability to learn different types of statistics in orthographic stimuli (e.g. co-occurrence frequency vs. conditional probability) and explores co-dependencies with the processes of extracting similar statistics from spoken input.
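The difference between co-occurrence frequency and conditional probability can be illustrated with letter pairs in a small made-up lexicon. The sketch below is an illustration under stated assumptions, not the procedure used in the studies above: the function name and the nonwords are hypothetical, and conditional probability is taken here to mean the probability of the second letter given the first.

```python
from collections import Counter


def bigram_statistics(words):
    """Map each letter bigram to (co-occurrence count, P(second | first))."""
    bigram_counts = Counter()
    first_letter_counts = Counter()
    for word in words:
        for first, second in zip(word, word[1:]):
            bigram_counts[first + second] += 1
            first_letter_counts[first] += 1
    return {
        bigram: (count, count / first_letter_counts[bigram[0]])
        for bigram, count in bigram_counts.items()
    }


# Hypothetical artificial lexicon of three-letter nonwords.
lexicon = ["tes", "tef", "teb", "tos", "mob", "mof", "mos"]

stats = bigram_statistics(lexicon)
for bigram in ("te", "mo"):
    count, probability = stats[bigram]
    print(bigram, count, probability)
```

In this toy lexicon "te" and "mo" each occur three times, so they are matched on co-occurrence frequency, but "m" is always followed by "o" whereas "t" is followed by "e" on only three of its four occurrences. Stimuli built along these lines would let the two statistics be teased apart.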
Other research in reading development has been conducted in collaboration with Prof Kate Nation (Oxford) and Dr Holly Joseph (Reading). For example, we have shown experimentally [Joseph et al., 2014] that so-called “Age of Acquisition” effects in word reading (i.e. the fact that the age at which a word is first encountered affects the way it is later processed by a mature reader) can result from the order in which words are encountered during learning: if we teach participants new words via passive reading exposure, early exposed words are subsequently read differently from later exposed words. This speaks against accounts of reading development where Age-of-Acquisition effects result from changing brain plasticity in developing readers, or where these effects are an epiphenomenon due to other statistical properties (which were all controlled in the study). We have also conducted various eye-tracking experiments exploring children’s reading of syntactically ambiguous sentences [Wonnacott et al., 2016]. We were particularly interested in the relationship between children’s online processing (as revealed by eye movements) and offline comprehension.