Aquatic Ape Human Ancestor Theory


Language, Speech & Song

The prerequisites for speech

Human beings are unique in being the only species known to be able to use a spoken grammatical language, although research has shown that whales and dolphins most likely use some form of grammatical structure and syntax in their communications. [1] [2] [3]

In general, paleoanthropologists have focused on the question of "why" we developed the ability to speak, rather than "how". The most common explanation was that we acquired speech when we began to hunt on the savannah, so that we could pass hunting strategy to each other. But no other pack animal that we know of (orcas and dolphins might be exceptions) uses language as a means of working together to hunt its prey. Wolves and lions seem to get on extremely well without it. In any case, wolves and lions could not use language to communicate their hunting strategy, because they lack the three basic prerequisites for language.

So what are the prerequisites for a spoken language or grammatical song, and how did we acquire them?

  1. Sufficient intelligence: Dogs that bark at an intruder, for instance, are responding by instinct to a perceived threat but are limited by their vocal range and intelligence to a basic warning: "Stay away or I'll bite you!" Adapting one's message to include greater descriptive detail (nouns, adjectives, verbs, adverbs, prepositions, conjunctions, relative clauses, etc.) and tense modifiers that indicate when something happens (past, present, future) is possible only for an animal that has a clear and conscious understanding of these elements. We also need to be able to formulate and understand what we want to say before we say it. The human brain is relatively larger and more complex than our ape cousins' brains and, as studies have shown, higher intelligence and brain growth in humans can be linked to a diet rich in long-chain polyunsaturated fatty acids, notably omega-3 fatty acids such as DHA, which are abundant in seafood.

  2. Suitable vocal apparatus: The sounds we produce stem from the larynx, are modified by the pharynx and are shaped into specific vowel or consonant sounds by the palate, the tongue and the lips. Most primates are already equipped for vocal communication. Darwin suggested that one of the origins of speech could be found in early-hominoid territorial sound production, e.g. gibbon duetting (song, tones, rhythm, dialogue). However, humans differ remarkably from chimpanzees and other primates, not only in the nasal, oral, pharyngeal and laryngeal anatomy (Fig. 1), but also in the neurological control of these structures. It appears that these morphological changes could have developed as a result of eating soft and slippery foods such as shellfish, which require relatively little chewing and can be swallowed whole under water. These adaptations in humans can best be explained by part-time collection of food in shallow water, including by shallow diving.

  3. Breath control abilities: We speak by being able to control the flow of air over our larynx and vocal cords. If we could not control our breathing we would not be able to speak. Breath control abilities are also a prerequisite for any diving air-breathing animal.

Fig. 1: Midsagittal sections through chimpanzee and human heads.

The Origin of Articulate Language Revisited: The Potential of a Semi-Aquatic Past of Human Ancestors to Explain the Origin of Human Musicality and Articulate Language

Vaneechoutte, M.
Laboratory Bacteriology Research, Faculty of Medicine & Health Sciences, University of Ghent, De Pintelaan 185, 9000 Gent, Belgium.

"A musical origin of language at the evolutionary level (for the species Homo sapiens) and at the ontogenetic level (for each newborn) is parsimonious and no longer refutable.

We then should ask why song, i.e., vocal dexterity and vocal learning, was evolved in our species and why it is largely absent from other ‘terrestrial’ animals, including other primates, but present in disjoint groups such as cetaceans, seals, bats and three orders of birds? I argue that this enigma, together with a long list of other specifically human characteristics, is best understood by assuming that our recent ancestors (from 3 million years ago onwards) adopted a shallow water diving lifestyle. The swimming and diving adaptations of the upper airway (and vocal) tract led to increased vocal dexterity and song, and to increased fine tuning of motoric and mimicking abilities. These are shared by creatures that can freely move in three dimensions (swimming and flying animals) and that can respond instantaneously to the behavioural changes of other animals. Increased bodily mimicking, together with increased vocal dexterity, both a consequence of a semi-aquatic lifestyle, led to integrated song and dance, which predisposed towards producing and mimicking speech and gesture, and to the ability to use prosodic clues to learn the grammar of whichever language."


Voluntary Control of Sound Production

Placido Domingo - opera singers need amazing breath control abilities, as do divers.

The crucial step in the evolution of human speech might have been the linking of varied song production (as in gibbons) with voluntary breathing (as in diving mammals). Gibbon song, like bird song, is a territorial (emotional) behavior that is not under direct voluntary control.

Mammals that regularly dive not only have to be able to seal their airways whenever necessary to prevent water entering the lungs, but must also be able to hyperventilate at exactly the moment they intend to dive, and to hold their breath under water. Terrestrial mammals automatically breathe deeper and faster when they exercise and need more oxygen, and this is directly regulated by the partial pressure of carbon dioxide (pCO2) at the respiratory centre in the brain stem. Such a breathing reflex, however, would be catastrophic for a diving mammal, since its pCO2 rises the longer it swims under water. Diving mammals need to have direct conscious (voluntary) control of their breathing musculature when they prepare to dive and when they re-surface.

When human brain organization is compared with that of the chimpanzee, the neural basis of human voluntary breath control becomes apparent. In all primates and in many other mammals, fine voluntary movements of the skeletal muscles are initiated in the precentral cortex, which in humans and other primates is called Brodmann’s Area 4 of the neocortex. However, in humans, as opposed to apes and other terrestrial mammals, the breathing musculature is also represented in the precentral cortex. Thus, humans differ from other hominoids in that they are able to breathe whenever they want, i.e., at free will.

The 'nostrils' of a whale's blowhole.

Whereas cetaceans only breathe at free will (through the blowhole, the equivalent of our nostrils), humans possess two types of breathing: autonomous abdominal or diaphragmatic breathing through the nose, and free thoracic or chest breathing through the mouth. Most of the time we are not consciously breathing, and at rest we breathe automatically through the nose using our diaphragm and abdominal muscles, but during exercise we switch to open-mouthed breathing with the thoracic musculature, also using the intercostal muscles of the rib cage. In the water, we hold our breath beneath the surface, and at the surface we breathe through the mouth, using only our thoracic muscles. Humans, unlike chimpanzees and other primates, not only have the laryngeal musculature represented in Area 4 of the neocortex, but also have a very large representation of the oral musculature in Area 4. (Interestingly, and as a consequence of this organisation, only in humans does damage to Area 4 produce muteness [1].)

Moreover, humans have direct fibers connecting Area 4 to the nucleus ambiguus (cortico-ambiguus connections), so they can voluntarily control the larynx muscles (nucleus ambiguus) and the breathing muscles (brain stem). When humans acquired this expansion of Area 4 to include breathing musculature (as required for diving), this would have encompassed or intensified the representation of the laryngeal musculature in Area 4, and brought laryngeal sound production under strong voluntary control.

This in turn would have made possible the production of laryngeal sounds (rhythmic tones, modified by the configuration of the lips, tongue and oral cavity into vowels) at free will, i.e., influenced by connections to other neocortical centres such as the visual (Brodmann’s Area 17 etc.) or auditory cortex (Area 41 etc.), thus making it possible to arbitrarily attribute a particular sound or melody to what was seen or heard.

Representation of body parts in Brodmann’s Area 4.

When this voluntary control of laryngeal sounds was combined with the fine control of lips, tongue and velum (initially developed for suction feeding of slippery foods and/or underwater food manipulation, see above), this could have led to the introduction of consonants, produced by brief interruptions of the airway by the tongue or lips at the labial, dental, palatal or velar regions of the oral cavity. In combination with the vowels, produced by the vocal cords and comparable to the sounds made by apes, consonants would have dramatically increased the number of possible phonemes (sound units) available for communication.

Comparative studies between apes and monkeys indicate that the early hominoids well before the split of lesser and great apes (~ 18 Ma?) probably already had evolved a larynx that had descended in the neck relative to the hyoid bone and was able to move freely and independently of the hyoid bone, presumably for the production of loud rhythmical and melodic territorial duetting songs. After the Homo-Pan split (~ 5 Ma?), Homo populations apparently developed voluntary breath-holding abilities and the ability to close the airway entrances through the evolution of voluntary control of the oral musculature and of a round tongue, perfectly fitting in the smooth and vaulted palate, all unique among primates, and possibly explained as an adaptation to underwater and/or suction feeding on smooth seafood.

Indeed, the specific Homo characteristics are in our opinion best explained by assuming that sooner or later Homo populations dispersed to other continents along the coasts, where they collected littoral foods, not only through beach-combing and wading, but also through diving. Comparative data suggest that the consumption of slippery seafoods might help explain why human ancestors evolved flatter faces, smaller mouth openings, reduced dentition, smooth palates, round tongues, and descended hyoids. These innovations facilitated mouth closure at the labial, dental, alveolar, palatal and velar articulation places, allowing the production of consonants.

In combination with the song production already present in (some of) the early apes, this voluntary airway control made possible the extraordinary song capacities of the human species [2]. Later, the attachment of an arbitrary meaning to a musical phrase or utterance could have resulted in songs that conveyed free information, a precursor to spoken language.

In addition, the diving-for-seafood scenario as an explanation for our voluntary breathing control and our flexible tongue and mouth coincides perfectly with the idea that the rapid expansion of the human neocortex came about as a result of an increase in the consumption of brain-specific nutrients such as DHA found in seafood.

In conclusion, on top of inherent song production as already present in early apes, the evolution of voluntary breath control, of increased oral musculature flexibility, of increased musicality and song production, and of very large brains, all four explainable, directly or indirectly, as consequences of seafood consumption, may explain why the human species is the only species on Earth to use spoken (grammatical) language.


M. Vaneechoutte, S. Munro & M. Verhaegen, Chapter 12: "Seafood, diving, song and speech", p. 181-9, in M. Vaneechoutte et al. (eds), "Was Man More Aquatic in the Past? Fifty Years after Alister Hardy", ebook, Bentham Science Publishers, 2011.

[1] Deacon TW. The symbolic species. New York: Norton 1997.
[2] Vaneechoutte M, Skoyles J. The memetic origin of language: Humans as musical primates. J Memetics 1999; 2. Available at: Cited 2010 October 25.

Seafood, Diving, Song and Speech

Abstract: In this paper we present comparative data suggesting that the various elements of human speech evolved at different times, and originally had different functions. Recent work by Nishimura [1-6] shows that what is commonly known as the laryngeal descent actually evolved in a mosaic way in at least 2 steps: (a) a descent of the thyroid cartilage (Adam's apple) relative to the hyoid (tongue bone), a descent also seen in non-human hominoids; (b) a descent of the hyoid bone relative to the palate, which is less obvious in non-human hominoids, and which is accentuated by the absence of prognathism in the short & flat human face. Comparisons with other animals suggest: (a) the 1st descent might be associated with loud and/or varied sound production; (b) the 2nd might be part of an adaptation to eating seafoods such as shellfish, which can be sucked into the mouth & swallowed without chewing, even underwater. We argue that the origin of human speech is based on different preadaptations that were present in human ancestors: (a) sound production adaptations related to the descent of the thyroid cartilage, associated with the territorial calls of apes; (b) transformation of the oral & dental anatomy, including the descent of the hyoid, associated with reduced biting & chewing; (c) diving adaptations, leading to voluntary control of the airway entrances & voluntary breath control. Whereas chimpanzee ancestors became frugivores in tropical forests after they split from human ancestors c. 5 Ma, human ancestors became littoral omnivores, explaining why chimpanzees did not evolve language skills, why human language is a relatively recent phenomenon, and why it is so strongly dependent upon the availability of voluntary breath control, not seen in other hominoids, but clearly present in diving mammals.


M. Vaneechoutte, S. Munro & M. Verhaegen, Chapter 12: "Seafood, diving, song and speech", p. 181-9, in "Was Man More Aquatic in the Past? Fifty Years after Alister Hardy", ebook, Bentham Science Publishers.

[1] Nishimura T. Comparative morphology of the hyo-laryngeal complex in anthropoids: Two steps in the evolution of the descent of the larynx. Primates 2003; 44: 41-9.
[2] Nishimura T, Mikami A, Suzuki J, Matsuzawa T. Descent of the larynx in chimpanzee infants. Proc Natl Acad Sci USA 2003; 100: 6930-3.
[3] Nishimura T. Descent of the larynx in chimpanzees: Mosaic and multiple-step evolution of the foundations for human speech. In: Matsuzawa T, Tomonaga M, Tanaka M, Eds. Cognitive development in chimpanzees. Tokyo: Springer 2006; pp. 75-95.
[4] Nishimura T, Mikami A, Suzuki J, Matsuzawa T. Descent of the hyoid in chimpanzees: Evolution of face flattening and speech. J Human Evol 2006; 51: 244-54.
[5] Nishimura T. Understanding the dynamics of primate vocalization and its implications for the evolution of human speech. In: Masataka N, Ed. The origin of language. Tokyo: Springer 2008; pp. 111-30.
[6] Nishimura T, Oishi T, Suzuki J, Matsuda K, Takahashi T. Development of the supralaryngeal vocal tract in Japanese macaques: Implications for the evolution of the descent of the larynx. Am J Phys Anthropol 2008; 135: 182-94.


Elaine Morgan's book: The Descent of the Child

"There was a popular theory that man became intelligent when he became carnivorous, because carnivores spend less of their time eating, so they have leisure hours in which to play and experiment and indulge their curiosity — which is, as often as necessity, the mother of invention. It is said, too, that the secret of humanity's success is the prolonged retention of a childlike openness to new experiences and ideas. But even among carnivores, the age group which does most of the playing does not consist of adults, and it is not easy to see why being childlike should be productive of innovation in everyone except children.

There are three points worth making in this connection. One is that when instinctive sound repertoires in primates differ radically between the infants and the adult — as, for example, in mouse lemurs and bush babies — the infant repertoire is always the more varied and extensive.

The second is that girls learn to speak sooner and more fluently than boys. This seems to suggest that ancestrally vocal communication was more important within the mother/infant relationship than it was in the case of other social relationships.

The third is that in the first four or five years of life a language, or even two or three languages simultaneously, can be acquired avidly and apparently without effort. From then on, new languages become harder to learn; and if no language is learned before the age of five, the ability to acquire one at all is severely impaired. This is not true of any other human skill.

In other words, adults are incapable of learning to speak. Only children can do it. It may conceivably be true that adults nevertheless invented speech but, if so, then some evidence must be advanced in support of the proposition. It is by no means self-evident. What adults are extremely good at is exploiting this tool acquired in childhood to pass on the accumulated knowledge of their later years to the children who come after them."

Morgan, Elaine: The Descent of the Child, Souvenir Press, 1994, chapter 20: Talking, p. 134-135

Human Uniqueness Compared to "Great Apes": Absolute Difference

Humans do not have the pharyngeal air sacs that are present in many non-human primates. This may reflect an adaptation enhancing the robustness of speech communication. Speech is produced by filtering the source of acoustic energy generated at the larynx by the process of phonation in English vowels and “voiced” consonants such as [m], or by turbulent air flow at the larynx (the sound [h]), or at constrictions formed by the tongue such as the sound [s]. The airway above the larynx, the supralaryngeal vocal tract (SVT), acts as a filter, allowing maximum energy to pass through it at formant frequencies that are determined by the SVT’s shape and length. Formant frequencies are arguably the primary acoustic features that differentiate the phonetic elements that specify words. Pharyngeal air sacs would absorb energy at specific frequencies, interfering with the complex perceptual processes involved in recovering the formant frequency pattern from the flow of speech. Some of the war gases used in World War One enlarged the vestigial pharyngeal air sacs present in humans. Observers of the gassed soldiers noted that their speech was often distorted and difficult to comprehend. However, if recordings of their speech were made, they have been lost. Computer modeling studies of the acoustic effects of adding pharyngeal air sacs to a human SVT would address this question. [1]
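
As a rough illustration of how formant frequencies depend on the length of the vocal tract, the sketch below treats the SVT as a uniform tube closed at the larynx and open at the lips (a quarter-wavelength resonator). This is a textbook simplification and not the modeling study the passage calls for: it ignores the tract's shape and does not model air sacs at all. The function name, the speed of sound and the two tract lengths are assumptions chosen only for the example.

```python
# Hypothetical illustration: resonances of a uniform tube closed at one end
# (larynx) and open at the other (lips). Real vocal tracts are not uniform,
# and pharyngeal air sacs are NOT modeled here.

SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, humid air


def uniform_tube_formants(length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Return the first `count` resonance frequencies (Hz) of a quarter-wave tube."""
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Both tract lengths are assumed, illustrative values.
    for label, length in [("longer tract, 0.175 m", 0.175),
                          ("shorter tract, 0.140 m", 0.140)]:
        formants = uniform_tube_formants(length)
        print(label, [round(f) for f in formants])
```

In this simplified model a shorter tube shifts every resonance upward by the same factor, which is one reason tract length matters for the formant pattern a listener has to recover.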

Music and Language Syntax Interact in Broca's Area: An fMRI Study

Richard Kunert et al., 2015
PLoS ONE, doi: 10.1371/journal.pone.0141069 (free access)

Instrumental music & language are both syntactic systems, employing complex, hierarchically-structured sequences built using implicit structural norms. This organization allows listeners to understand the role of individual words or tones in the context of an unfolding sentence or melody.

Previous studies suggest that the brain mechanisms of syntactic processing may be partly shared between music & language. However, functional neuro-imaging evidence for anatomical overlap of brain activity involved in linguistic & musical syntactic processing has been lacking.

In the present study, we used functional magnetic resonance imaging (fMRI) in conjunction with an interference paradigm based on sung sentences. We show that the processing demands of musical syntax (harmony) & language syntax interact in Broca's area in the left inferior frontal gyrus (without leading to music & language main effects).

A language main effect in Broca's area only emerged in the complex music harmony condition, suggesting that (with our stimuli & tasks) a language effect only becomes visible under conditions of increased demands on shared neural resources.

In contrast to previous studies, our design allows us to rule out that the observed neural interaction is due to (1) general attention mechanisms, as a psycho-acoustic auditory anomaly behaved unlike the harmonic manipulation, or (2) error processing, as the language & the music stimuli contained no structural errors.

The current results thus suggest that 2 different cognitive domains (music & language) might draw on the same high level syntactic integration resources in Broca's area.


Website: F. Mansfield, 2015
