
Haskins Laboratories
Facility: New Haven, Connecticut, United States
Research output, citation impact, and the most-cited recent papers from Haskins Laboratories (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Haskins Laboratories
Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words; therefore, the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.
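As a rough illustration of the mel-frequency cepstrum representation described in the abstract above, the following Python sketch computes ten coefficients every 6.4 ms. Only those two figures come from the abstract. The 10 kHz sample rate, 25.6 ms Hamming-windowed frames, 20 triangular filters, the particular mel formula, and all function names are assumptions chosen for the example and should not be read as the authors' exact procedure.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters with edges equally spaced on the mel scale.
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sample_rate, n_coeffs=10, n_filters=20, frame_s=0.0256, hop_s=0.0064):
    # frame_s and n_filters are illustrative; hop_s and n_coeffs follow the abstract.
    frame_len = int(round(frame_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    # Cosine transform of the log filter energies (coefficients 1..n_coeffs).
    k = np.arange(1, n_coeffs + 1)[:, None]
    j = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (j + 0.5) / n_filters)
    out = np.empty((n_frames, n_coeffs))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        log_energy = np.log(bank @ power + 1e-10)  # small floor avoids log(0)
        out[t] = basis @ log_energy
    return out

if __name__ == "__main__":
    sr = 10000  # hypothetical sample rate
    time = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * time)
    print(mfcc(tone, sr).shape)  # (number of frames, 10)
```

Each row of the result is one frame's coefficient vector. In a template-matching recognizer of the kind the abstract describes, such rows would be the sequences that are time-aligned against word templates.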
(1964). A Cross-Language Study of Voicing in Initial Stops: Acoustical Measurements. WORD: Vol. 20, No. 3, pp. 384-422.
An overview of the basic ideas of articulatory phonology is presented, along with selected examples of phonological patterning for which the approach seems to provide a particularly insightful account. In articulatory phonology, the basic units of phonological contrast are gestures, which are also abstract characterizations of articulatory events, each with an intrinsic time or duration. Utterances are modeled as organized patterns (constellations) of gestures, in which gestural units may overlap in time. The phonological structures defined in this way provide a set of articulatorily based natural classes. Moreover, the patterns of overlapping organization can be used to specify important aspects of the phonological structure of particular languages, and to account, in a coherent and general way, for a variety of different types of phonological variation. Such variation includes allophonic variation and fluent speech alternations, as well as 'coarticulation' and speech errors. Finally, it is suggested that the gestural approach clarifies our understanding of phonological development, by positing that prelinguistic units of action are harnessed into (gestural) phonological structures through differentiation and coordination.
Over the past few decades, neuroimaging has become a ubiquitous tool in basic research and clinical studies of the human brain. However, no reference standards currently exist to quantify individual differences in neuroimaging metrics over time, in contrast to growth charts for anthropometric traits such as height and weight. Here we assemble an interactive open resource to benchmark brain morphology derived from any current or future sample of MRI data (http://www.brainchart.io/). With the goal of basing these reference charts on the largest and most inclusive dataset available, acknowledging limitations due to known biases of MRI studies relative to the diversity of the global population, we aggregated 123,984 MRI scans, across more than 100 primary studies, from 101,457 human participants between 115 days post-conception to 100 years of age. MRI metrics were quantified by centile scores, relative to non-linear trajectories of brain structural changes, and rates of change, over the lifespan. Brain charts identified previously unreported neurodevelopmental milestones, showed high stability of individuals across longitudinal assessments, and demonstrated robustness to technical and methodological differences between primary studies. Centile scores showed increased heritability compared with non-centiled MRI phenotypes, and provided a standardized measure of atypical brain structure that revealed patterns of neuroanatomical variation across neurological and psychiatric disorders. In summary, brain charts are an essential step towards robust quantification of individual variation benchmarked to normative trajectories in multiple, commonly used neuroimaging phenotypes.
In listening to speech, one typically reduces the number and variety of the many sounds with which he is bombarded by casting them into one or another of the phoneme categories that his language allows. Thus, a listener will identify as b, for example, quite a large number of acoustically different sounds. Although these differences are likely to be many and various, some of them will occur along an acoustic continuum that contains cues for a different phoneme, such as d. This is important for the present study because it provides a basis for the question to be examined here: whether or not, with similar acoustic differences, a listener can better discriminate between sounds that lie on opposite sides of a phoneme boundary than he can between sounds that fall within the same phoneme category. There are grounds for expecting an affirmative answer to this question. The most obvious, perhaps, are to be found in the common experience that in learning a new language one often
We have argued that dynamically defined articulatory gestures are the appropriate units to serve as the atoms of phonological representation. Gestures are a natural unit, not only because they involve task-oriented movements of the articulators, but because they arguably emerge as prelinguistic discrete units of action in infants. The use of gestures, rather than constellations of gestures as in Root nodes, as basic units of description makes it possible to characterise a variety of language patterns in which gestural organisation varies. Such patterns range from the misorderings of disordered speech through phonological rules involving gestural overlap and deletion to historical changes in which the overlap of gestures provides a crucial explanatory element. Gestures can participate in language patterns involving overlap because they are spatiotemporal in nature and therefore have internal duration. In addition, gestures differ from current theories of feature geometry by including the constriction degree as an inherent part of the gesture. Since the gestural constrictions occur in the vocal tract, which can be characterised in terms of tube geometry, all the levels of the vocal tract will be constricted, leading to a constriction degree hierarchy. The values of the constriction degree at each higher level node in the hierarchy can be predicted on the basis of the percolation principles and tube geometry. In this way, the use of gestures as atoms can be reconciled with the use of constriction degree at various levels in the vocal tract (or feature geometry) hierarchy. The phonological notation developed for the gestural approach might usefully be incorporated, in whole or in part, into other phonologies. Five components of the notation were discussed, all derived from the basic premise that gestures are the primitive phonological unit, organised into gestural scores.
These components include (1) constriction degree as a subordinate of the articulator node and (2) stiffness (duration) as a subordinate of the articulator node. That is, both CD and duration are inherent to the gesture. The gestures are arranged in gestural scores using (3) articulatory tiers, with (4) the relevant geometry (articulatory, tube or feature) indicated to the left of the score and (5) structural information above the score, if desired. Association lines can also be used to indicate how the gestures are combined into phonological units. Thus, gestures can serve both as characterisations of articulatory movement data and as the atoms of phonological representation.
We propose an approach to phonological representation based on describing an utterance as an organised pattern of overlapping articulatory gestures. Because movement is inherent in our definition of gestures, these gestural ‘constellations’ can account for both spatial and temporal properties of speech in a relatively simple way. At the same time, taken as phonological representations, such gestural analyses offer many of the same advantages provided by recent nonlinear phonological theories, and we give examples of how gestural analyses simplify the description of such ‘complex segments’ as /s/–stop clusters and prenasalised stops. Thus, gestural structures can be seen as providing a principled link between phonological and physical description.
Learning to read requires an awareness that spoken words can be decomposed into the phonologic constituents that the alphabetic characters represent. Such phonologic awareness is characteristically lacking in dyslexic readers who, therefore, have difficulty mapping the alphabetic characters onto the spoken word. To find the location and extent of the functional disruption in neural systems that underlies this impairment, we used functional magnetic resonance imaging to compare brain activation patterns in dyslexic and nonimpaired subjects as they performed tasks that made progressively greater demands on phonologic analysis. Brain activation patterns differed significantly between the groups with dyslexic readers showing relative underactivation in posterior regions (Wernicke's area, the angular gyrus, and striate cortex) and relative overactivation in an anterior region (inferior frontal gyrus). These results support a conclusion that the impairment in dyslexia is phonologic in nature and that these brain activation patterns may provide a neural signature for this impairment.
A three-tone sinusoidal replica of a naturally produced utterance was identified by listeners, despite the readily apparent unnatural speech quality of the signal. The time-varying properties of these highly artificial acoustic signals are apparently sufficient to support perception of the linguistic message in the absence of traditional acoustic cues for phonetic segments.
Earlier experiments with dichotically presented nonsense syllables had suggested that perception of the sounds of speech depends upon unilateral processors located in the cerebral hemisphere dominant for language. Our aim in this study was to pull the speech signal apart to test its components in order to determine, if possible, which aspects of the perceptual process depend upon the specific language processing machinery of the dominant hemisphere. The stimuli were spoken consonant-vowel-consonant syllables presented in dichotic pairs which contrasted in only one phone (initial stop consonant, final stop consonant, or vowel). Significant right-ear advantages were found for initial and final stop consonants, nonsignificant right-ear advantages for six medial vowels, and significant right-ear advantages for the articulatory features of voicing and place of production in stop consonants. Analysis of correct responses and errors showed that consonant features are processed independently, in agreement with earlier research employing other methods. Evidence is put forward for the view that specialization of the dominant hemisphere in speech perception is due to its possession of a linguistic device, not to specialized capacities for auditory analysis. We have concluded that, while the general auditory system common to both hemispheres is equipped to extract the auditory parameters of a speech signal, the dominant hemisphere may be specialized for the extraction of linguistic features from those parameters.
Memory is fleeting. New material rapidly obliterates previous material. How, then, can the brain deal successfully with the continual deluge of linguistic input? We argue that, to deal with this "Now-or-Never" bottleneck, the brain must compress and recode linguistic input as rapidly as possible. This observation has strong implications for the nature of language processing: (1) the language system must "eagerly" recode and compress linguistic input; (2) as the bottleneck recurs at each new representational level, the language system must build a multilevel linguistic representation; and (3) the language system must deploy all available information predictively to ensure that local linguistic ambiguities are dealt with "Right-First-Time"; once the original input is lost, there is no way for the language system to recover. This is "Chunk-and-Pass" processing. Similarly, language learning must also occur in the here and now, which implies that language acquisition is learning to process, rather than inducing, a grammar. Moreover, this perspective provides a cognitive foundation for grammaticalization and other aspects of language change. Chunk-and-Pass processing also helps explain a variety of core properties of language, including its multilevel representational structure and duality of patterning. This approach promises to create a direct relationship between psycholinguistics and linguistic theory. More generally, we outline a framework within which to integrate often disconnected inquiries into language processing, language acquisition, and language change and evolution.
By watching each other's lower oscillating leg, 2 seated Ss kept a common tempo and a particular phase relation of either 0 degrees (symmetric mode) or 180 degrees (alternate mode). This study investigated the differential stability of the 2 phase modes. In Experiment 1, in which Ss were instructed to remain in the initial phase mode, the alternate phase mode was found to be less stable as the frequency of oscillation increased. In addition, analysis of the nonsteady state cycles revealed evidence of a switching to the symmetric phase mode for the initial alternate phase mode trials. In Experiments 2 and 3, Ss were instructed to remain at a noninitial phase angle if it was found to be more comfortable. The transition observed between the 2 phase modes satisfies the criteria of a physical bifurcation--hysteresis, critical fluctuations, and divergence--and is consonant with previous findings on transitions in limb coordination within a person.
Classic non-native speech perception findings suggested that adults have difficulty discriminating segmental distinctions that are not employed contrastively in their own language. However, recent reports indicate a gradient of performance across non-native contrasts, ranging from near-chance to near-ceiling. Current theoretical models argue that such variations reflect systematic effects of experience with phonetic properties of native speech. The present research addressed predictions from Best's perceptual assimilation model (PAM), which incorporates both contrastive phonological and noncontrastive phonetic influences from the native language in its predictions about discrimination levels for diverse types of non-native contrasts. We evaluated the PAM hypotheses that discrimination of a non-native contrast should be near-ceiling if perceived as phonologically equivalent to a native contrast, lower though still quite good if perceived as a phonetic distinction between good versus poor exemplars of a single native consonant, and much lower if both non-native segments are phonetically equivalent in goodness of fit to a single native consonant. Two experiments assessed native English speakers' perception of Zulu and Tigrinya contrasts expected to fit those criteria. Findings supported the PAM predictions, and provided evidence for some perceptual differentiation of phonological, phonetic, and nonlinguistic information in perception of non-native speech. Theoretical implications for non-native speech perception are discussed, and suggestions are made for further research.
The discovery of audiovisual mirror neurons in monkeys gave rise to the hypothesis that premotor areas are inherently involved not only when observing actions but also when listening to action-related sound. However, the whole-brain functional formation underlying such "action-listening" is not fully understood. In addition, previous studies in humans have focused mostly on relatively simple and overexperienced everyday actions, such as hand clapping or door knocking. Here we used functional magnetic resonance imaging to ask whether the human action-recognition system responds to sounds found in a more complex sequence of newly acquired actions. To address this, we chose a piece of music as a model set of acoustically presentable actions and trained non-musicians to play it by ear. We then monitored brain activity in subjects while they listened to the newly acquired piece. Although subjects listened to the music without performing any movements, activation was found bilaterally in the frontoparietal motor-related network (including Broca's area, the premotor region, the intraparietal sulcus, and the inferior parietal region), consistent with neural circuits that have been associated with action observations, and may constitute the human mirror neuron system. Presentation of the practiced notes in a different order activated the network to a much lesser degree, whereas listening to an equally familiar but motorically unknown music did not activate this network. These findings support the hypothesis of a "hearing-doing" system that is highly dependent on the individual's motor repertoire, gets established rapidly, and consists of Broca's area as its hub.
The language environment modifies the speech perception abilities found in early development. In particular, adults have difficulty perceiving many nonnative contrasts that young infants discriminate. The underlying perceptual reorganization apparently occurs by 10-12 months. According to one view, it depends on experiential effects on psychoacoustic mechanisms. Alternatively, phonological development has been held responsible, with perception influenced by whether the nonnative sounds occur allophonically in the native language. We hypothesized that a phonemic process appears around 10-12 months that assimilates speech sounds to native categories whenever possible; otherwise, they are perceived in auditory or phonetic (articulatory) terms. We tested this with English-speaking listeners by using Zulu click contrasts. Adults discriminated the click contrasts; performance on the most difficult (80% correct) was not diminished even when the most obvious acoustic difference was eliminated. Infants showed good discrimination of the acoustically modified contrast even by 12-14 months. Together with earlier reports of developmental change in perception of nonnative contrasts, these findings support a phonological explanation of language-specific reorganization in speech perception.
The cerebral organization of word identification processes in reading was examined using functional magnetic resonance imaging (fMRI). Changes in fMRI signal intensities were measured in 38 subjects (19 males and 19 females) during visual (line judgement), orthographic (letter case judgement), phonological (nonword rhyme judgement) and semantic (semantic category judgement) tasks. A strategy of multiple subtractions was employed in order to validate relationships between structure and function. Orthographic processing made maximum demands on extrastriate sites, phonological processing on a number of frontal and temporal sites, and lexical-semantic processing was most strongly associated with middle and superior temporal sites. Significant sex differences in the cerebral organization of reading-related processes were also observed.
Previous studies with synthetic speech have shown that second-formant transitions are cues for the perception of the stop and nasal consonants. The results of those experiments can be simplified if it is assumed that each consonant has a characteristic and fixed frequency position, or locus, for the second formant, corresponding to the relatively fixed place of production of the consonant. On that basis, the transitions may be regarded as “movements” from the locus to the steady state of the vowel. The experiments reported in this paper provide additional evidence concerning the existence and positions of these second-formant loci for the voiced stops, b, d, and g. There appears to be a locus for d at 1800 cps and for b at 720 cps. A locus for g can be demonstrated only when the adjoining vowel has its second formant above about 1200 cps; below that level no g locus was found. The results of these experiments indicate that, for the voiced stops, the transition cannot begin at the locus and go from there to the steady-state level of the vowel. Rather, if we are to hear the appropriate consonant, the first part of the transition must be silent. The voiced stops are best synthesized by making the duration of the silent interval equal to the duration of the transition itself. An experiment on the first formant revealed that its locus is the same for b, d, and g.
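The locus idea in the abstract above can be pictured with a small Python sketch that builds a second-formant track for a stop-plus-vowel syllable. The locus values for b (720 cps) and d (1800 cps) and the rule that the silent interval equals the transition duration are taken from the abstract. The linear trajectory, the 50 ms transition, the 200 ms steady state, the 5 ms step, and the function name `f2_track` are assumptions made only for illustration.

```python
import numpy as np

# Second-formant loci reported in the abstract (cps, i.e. Hz).
LOCI_HZ = {"b": 720.0, "d": 1800.0}

def f2_track(consonant, vowel_f2_hz, transition_ms=50.0, steady_ms=200.0, step_ms=5.0):
    """Return (time_ms, f2_hz, audible) for a hypothetical synthetic syllable.

    The formant is modelled as moving linearly (an assumption) from the
    locus to the vowel's steady state. The first part of that movement is
    silent, with the silent interval equal in duration to the audible
    transition, as the abstract recommends for voiced stops.
    """
    locus = LOCI_HZ[consonant]
    silent_ms = transition_ms
    movement_ms = silent_ms + transition_ms
    t = np.arange(0.0, movement_ms + steady_ms + step_ms / 2.0, step_ms)
    frac = np.clip(t / movement_ms, 0.0, 1.0)
    f2 = locus + frac * (vowel_f2_hz - locus)
    audible = t >= silent_ms  # only this part would be synthesized as sound
    return t, f2, audible

if __name__ == "__main__":
    t, f2, audible = f2_track("d", vowel_f2_hz=1200.0)
    for ti, fi, ai in zip(t[::5], f2[::5], audible[::5]):
        print(f"{ti:6.1f} ms  {fi:7.1f} Hz  {'audible' if ai else 'silent'}")
```

Under these assumptions the audible transition starts halfway between the locus and the vowel's second formant, so the formant "points at" the locus without ever sounding there.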
Converging evidence from a number of neuroimaging studies, including our own, suggests that fluent word identification in reading is related to the functional integrity of two consolidated left hemisphere (LH) posterior systems: a dorsal (temporo-parietal) circuit and a ventral (occipito-temporal) circuit. This posterior system is functionally disrupted in developmental dyslexia. Reading disabled readers, relative to nonimpaired readers, demonstrate heightened reliance on both inferior frontal and right hemisphere posterior regions, presumably in compensation for the LH posterior difficulties. We propose a neurobiological account suggesting that for normally developing readers the dorsal circuit predominates at first, and is associated with analytic processing necessary for learning to integrate orthographic features with phonological and lexical-semantic features of printed words. The ventral circuit constitutes a fast, late-developing, word identification system which underlies fluent word recognition in skilled readers.
In three experiments we show that articulatory patterns in response to jaw perturbations are specific to the utterance produced. In Experiments 1 and 2, an unexpected constant force load (5.88 N) applied during upward jaw motion for final /b/ closure in the utterance /baeb/ revealed nearly immediate compensation in upper and lower lips, but not the tongue, on the first perturbation trial. The same perturbation applied during the utterance /baez/ evoked rapid and increased tongue-muscle activity for /z/ frication, but no active lip compensation. Although jaw perturbation represented a threat to both utterances, no perceptible distortion of speech occurred. In Experiment 3, the phase of the jaw perturbation was varied during the production of bilabial consonants. Remote reactions in the upper lip were observed only when the jaw was perturbed during the closing phase of motion. These findings provide evidence for flexibly assembled coordinative structures in speech production.