Language, Speech & Song
The prerequisites for speech
Human beings are the only species known to use a spoken grammatical language, although research suggests that whales and dolphins most likely use some form of grammatical structure and syntax in their communications.
In general, paleoanthropologists have focused on the question of why we developed the ability to speak, rather than how. The most common explanation has been that we acquired speech when we began to hunt on the savannah, so that we could pass hunting strategy to each other. But no other pack animal that we know of (with the possible exception of orcas and dolphins) uses language as a means of working together to hunt its prey. Wolves and lions seem to get on extremely well without it. In any case, wolves and lions cannot use language to communicate their hunting strategy, because they lack the three basic prerequisites for language.
So what are the prerequisites for a spoken language or grammatical song, and how did we acquire them?
The Origin of Articulate Language Revisited: The Potential of a Semi-Aquatic Past of Human Ancestors to Explain the Origin of Human Musicality and Articulate Language
"A musical origin of language at the evolutionary level (for the species Homo sapiens) and at the ontogenetic level (for each newborn) is parsimonious and no longer refutable.
We then should ask why song, i.e., vocal dexterity and vocal learning, was evolved in our species and why it is largely absent from other ‘terrestrial’ animals, including other primates, but present in disjoint groups such as cetaceans, seals, bats and three orders of birds? I argue that this enigma, together with a long list of other specifically human characteristics, is best understood by assuming that our recent ancestors (from 3 million years ago onwards) adopted a shallow water diving lifestyle. The swimming and diving adaptations of the upper airway (and vocal) tract led to increased vocal dexterity and song, and to increased fine tuning of motoric and mimicking abilities. These are shared by creatures that can freely move in three dimensions (swimming and flying animals) and that can respond instantaneously to the behavioural changes of other animals. Increased bodily mimicking, together with increased vocal dexterity, both a consequence of a semi-aquatic lifestyle, led to integrated song and dance, which predisposed towards producing and mimicking speech and gesture, and to the ability to use prosodic clues to learn the grammar of whichever language."
Voluntary Control of Sound Production
The crucial step in the evolution of human speech might have been the linking of varied song production (as in gibbons) with voluntary breathing (as in diving mammals). Gibbon song, like bird song, is a territorial (emotional) behavior that is not under direct voluntary control.
Mammals that regularly dive not only have to be able to seal their airways whenever necessary to prevent water entering the lungs, but must also be able to hyperventilate at exactly the moment they intend to dive, and to hold their breath under water. Terrestrial mammals automatically breathe deeper and faster when they exercise and need more oxygen, and this is directly regulated by the partial pressure of carbon dioxide (pCO2) at the respiratory centre in the brain stem. Such a breathing reflex, however, would be catastrophic for a diving mammal, since its pCO2 rises the longer it swims under water. Diving mammals need direct conscious (voluntary) control of their breathing musculature when they prepare to dive and when they resurface.
When human brain organization is compared with that of the chimpanzee, human voluntary breath control becomes apparent. In all primates and in many other mammals, fine voluntary movements of the skeletal muscles are initiated in the precentral cortex, which in humans and other primates is called Brodmann’s Area 4 of the neocortex. However, in humans, as opposed to apes and other terrestrial mammals, the breathing musculature is represented in the precentral cortex. Thus, humans differ from other hominoids in that they are able to breathe whenever they want, i.e., at free will.
Whereas cetaceans only breathe at free will (through the blowhole, the equivalent of our nostrils), humans possess two types of breathing: autonomous abdominal or diaphragmatic breathing through the nose, and free thoracic or chest breathing through the mouth. Most of the time we are not consciously breathing, and at rest we breathe automatically through the nose using our diaphragm and abdominal muscles, but during exercise we switch to open-mouthed breathing with the thoracic musculature, also using the intercostal muscles of the rib cage. In the water, we hold our breath beneath the surface, and at the surface we breathe through the mouth, using only our thoracic muscles. Humans, unlike chimpanzees and other primates, not only have the laryngeal musculature represented in Area 4 of the neocortex, but also have a very large representation of the oral musculature in Area 4. (Interestingly, and as a consequence of this organisation, only in humans does damage to Area 4 produce muteness.)
This in turn would have made possible the production of laryngeal sounds (rhythmic tones, modified by the configuration of the lips, tongue and oral cavity into vowels) at free will, i.e., influenced by connections to other neocortical centres such as the visual (Brodmann’s Area 17 etc.) or auditory cortex (Area 41 etc.), thus making it possible to arbitrarily attribute a particular sound or melody to what was seen or heard.
When this voluntary control of laryngeal sounds was combined with the fine control of lips, tongue and velum (initially developed for suction feeding of slippery foods and/or underwater food manipulation, see above), this could have led to the introduction of consonants, produced by brief interruptions of the airway by the tongue or lips at the labial, dental, palatal or velar regions of the oral cavity. In combination with the vowels, produced by the vocal cords and comparable to the sounds made by apes, consonants would have dramatically increased the number of possible phonemes (sound units) available for communication.
Comparative studies between apes and monkeys indicate that the early hominoids well before the split of lesser and great apes (~ 18 Ma?) probably already had evolved a larynx that had descended in the neck relative to the hyoid bone and was able to move freely and independently of the hyoid bone, presumably for the production of loud rhythmical and melodic territorial duetting songs. After the Homo-Pan split (~ 5 Ma?), Homo populations apparently developed voluntary breath-holding abilities and the ability to close the airway entrances through the evolution of voluntary control of the oral musculature and of a round tongue, perfectly fitting in the smooth and vaulted palate, all unique among primates, and possibly explained as an adaptation to underwater and/or suction feeding on smooth seafood.
Indeed, the specific Homo characteristics are in our opinion best explained by assuming that sooner or later Homo populations dispersed to other continents along the coasts, where they collected littoral foods, not only through beach-combing and wading, but also through diving. Comparative data suggest that the consumption of slippery seafoods might help explain why human ancestors evolved flatter faces, smaller mouth openings, reduced dentition, smooth palates, round tongues, and descended hyoids. These innovations facilitated mouth closure at the labial, dental, alveolar, palatal and velar articulation places, allowing the production of consonants.
In combination with the song production already present in (some of) the early apes, this voluntary airway control made possible the extraordinary song capacities of the human species. Later, the attachment of an arbitrary meaning to a musical phrase or utterance could have resulted in songs that conveyed free information, a precursor to spoken language.
In addition, the diving-for-seafood scenario as an explanation for our voluntary breathing control and our flexible tongue and mouth coincides perfectly with the idea that the rapid expansion of the human neocortex came about as a result of an increase in the consumption of brain-specific nutrients such as DHA found in seafood.
In conclusion, on top of inherent song production as already present in early apes, the evolution of voluntary breath control, of increased oral musculature flexibility, of increased musicality and song production, and of very large brains (all four explainable, directly or indirectly, as consequences of seafood consumption) may explain why the human species is the only species on Earth to use spoken (grammatical) language.
M. Vaneechoutte, S. Munro & M. Verhaegen. Chapter 12: "Seafood, diving, song and speech", p. 181-9, in M. Vaneechoutte et al. (eds), "Was Man More Aquatic in the Past? Fifty Years after Alister Hardy", ebook, Bentham Science Publishers, 2011.
 Deacon TW. The symbolic species. New York: Norton 1997.
 Nishimura T. Comparative morphology of the hyo-laryngeal complex in anthropoids: Two steps in the evolution of the descent of the larynx. Primates 2003; 44: 41-9.
Morgan, Elaine: The Descent of the Child, Souvenir Press, 1994, chapter 20: Talking, p. 134-135
Human Uniqueness Compared to "Great Apes": Absolute Difference
Humans do not have the pharyngeal air sacs that are present in many non-human primates. This may reflect an adaptation enhancing the robustness of speech communication. Speech is produced by filtering a source of acoustic energy, which is generated at the larynx by the process of phonation (in English vowels and “voiced” consonants such as [m]), by turbulent air flow at the larynx (the sound [h]), or at constrictions formed by the tongue (such as the sound [s]). The airway above the larynx, the supralaryngeal vocal tract (SVT), acts as a filter, allowing maximum energy to pass through it at formant frequencies that are determined by the SVT’s shape and length. Formant frequencies are arguably the primary acoustic features that differentiate the phonetic elements that specify words. Pharyngeal air sacs would absorb energy at specific frequencies, interfering with the complex perceptual processes involved in recovering the formant frequency pattern from the flow of speech. Some of the war gases used in World War One enlarged the vestigial pharyngeal air sacs present in humans. Observers of the gassed soldiers noted that their speech was often distorted and difficult to comprehend. However, if recordings of their speech were made, they have been lost. Computer modeling studies of the acoustic effects of adding pharyngeal air sacs to a human SVT would address this question.
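How formant frequencies depend on SVT length can be illustrated with a common textbook idealisation, which is not taken from the source above: the vocal tract is treated as a uniform tube, closed at the larynx and open at the lips, whose resonances fall at odd multiples of c / 4L. The function name, the assumed speed of sound and the example lengths below are hypothetical choices for illustration. The sketch says nothing about air sacs or about real, non-uniform vocal tract shapes.

```python
# Illustrative only: resonances of an idealised uniform tube, closed at the
# larynx and open at the lips (quarter-wavelength approximation).
# This is NOT a model of pharyngeal air sacs or of a real vocal tract shape.

SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, humid air


def tube_formants(length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the first `count` resonance frequencies (Hz) of a uniform tube
    of the given length, closed at one end and open at the other."""
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    # F_n = (2n - 1) * c / (4 * L) for n = 1, 2, 3, ...
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical tube lengths in centimetres, chosen only for illustration.
    for length_cm in (17.5, 15.0, 12.0):
        formants = tube_formants(length_cm / 100.0)
        print(length_cm, "cm:", [round(f) for f in formants], "Hz")
```

Under these assumptions a shorter tube gives proportionally higher resonances, which is the sense in which the length of the SVT sets the formant pattern; the shape of a real vocal tract shifts the individual formants away from this evenly spaced pattern.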
Music and Language Syntax Interact in Broca's Area: An fMRI Study
Richard Kunert et al., 2015
A language main effect in Broca's area only emerged in the complex music harmony condition, suggesting that (with our stimuli & tasks) a language
|Website: F. Mansfield, 2015|