Cladistics (ancient Greek: κλάδος, klados, "branch") is a form of biological systematics which classifies living organisms on the basis of shared ancestry. It can be distinguished from other taxonomic systems, such as phenetics, by its focus on evolutionary relationships; while other systems usually use morphological similarities to group similar species into genera, families and other higher level classification, cladistics tries to construct a tree representing the ancestry of organisms and species.
Cladistics is also distinguished by its emphasis on objective, quantitative analysis, rather than subjective decisions that some other taxonomic systems rely upon[1].

Cladistics originated in the work of the German entomologist Willi Hennig, who himself referred to it as phylogenetic systematics; the use of the terms "cladistics" and "clade" was popularized by other researchers[2]. Cladistics originated in the field of biological systematics, but has been successfully applied in other disciplines: for example, to determine the relationships between the surviving manuscripts of the Canterbury Tales[3].

Cladists use cladograms, diagrams which show ancestral relations between organisms, to represent the evolutionary tree of life. Although traditionally such cladograms were generated largely on the basis of morphological characters, DNA and RNA sequencing data and computational phylogenetics are now very commonly used in the generation of cladograms.

Clades

The yellow group (sauropsids) is monophyletic, the blue group (reptiles) is paraphyletic, and the red group (warm-blooded animals) is polyphyletic.

The concept of a clade is central to cladistics. A clade consists of an ancestral organism and all of its descendants. In the diagram provided alongside, it is hypothesized that a vertebrate ancestor is the common ancestor of all vertebrates, including fishes (Pisces). A single tetrapod ancestor is the ancestor of all tetrapods, including amphibians, reptiles, mammals and birds. The tetrapod ancestor was a descendant of the original vertebrate ancestor, but is not a descendant of any fish. We can also think of a vertebrate clade, encompassing all vertebrates, which consists of a fish clade (encompassing all fish) and a tetrapod clade (consisting of all the tetrapods) and so on.
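
The definition of a clade as an ancestor plus all of its descendants can be illustrated with a short sketch. The tree below is a toy encoding of the vertebrate example; the node names and the `children` mapping are assumptions made for illustration, not data from a real analysis.

```python
# Toy tree: each ancestor maps to its immediate descendants (illustrative only).
children = {
    "vertebrate ancestor": ["fishes", "tetrapod ancestor"],
    "tetrapod ancestor": ["amphibians", "reptiles", "mammals", "birds"],
}

def clade(node, children):
    """Return the node together with all of its descendants."""
    members = [node]
    for child in children.get(node, []):
        members.extend(clade(child, children))
    return members

tetrapod_clade = clade("tetrapod ancestor", children)
vertebrate_clade = clade("vertebrate ancestor", children)
```

In this sketch the tetrapod clade is nested inside the vertebrate clade, and "fishes" belongs to the latter only.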

An important idea here is that this cladogram is an evolutionary hypothesis. It is falsifiable: further genetic or morphological evidence might suggest that fish and amphibians shared a common ancestor which was not an ancestor of the other tetrapods, for instance, which would cause us to define a fish-and-amphibian clade which, along with the tetrapod clade, is descended from the vertebrate ancestor (thus putting both clades into the vertebrate clade).

Three main types of groups can be identified in cladograms:

The following terms are used to identify shared or distinct characters amongst groups:

Clades or species relate to each other in different ways:

Three definitions of clade

The three ways to define a clade.

There are three major ways to define a clade for use in a cladistic taxonomy.[4]

History of cladistics

Hennig's major book, even the 1979 version, does not contain the term cladistics in the index. He referred to his own approach as phylogenetic systematics, as implied by the book's title. A review paper by Dupuis observes that the term clade was introduced in 1958 by Julian Huxley, cladistic by Cain and Harrison in 1960, and cladist (for an adherent of Hennig's school) by Mayr in 1965.[5]

From the time of Hennig's original formulation until the end of the 1980s, cladistics remained a minority approach to classification. However, in the 1990s it rapidly became the dominant method of classification in evolutionary biology. Cheap but increasingly powerful personal computers made it possible to process large quantities of data about organisms and their characteristics. At about the same time, the development of effective polymerase chain reaction techniques made it possible to apply cladistic methods of analysis to biochemical and molecular genetic features of organisms as well as to anatomical ones.[6]

Cladistics as a successor to phenetics

For some decades in the mid to late twentieth century, a commonly used methodology was phenetics ("numerical taxonomy"). This can be seen as a predecessor[7] to some methods of today's cladistics (namely distance matrix methods like neighbor-joining), but made no attempt to resolve phylogeny, only similarities.
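
A rough sketch of what "overall similarity" means in a phenetic setting is given below. The taxa, the 0/1 trait scores, and the choice of a simple mismatch fraction as the distance are all assumptions for illustration; real distance matrix methods such as neighbor-joining build a tree from a matrix like this one rather than stopping at the closest pair.

```python
from itertools import combinations

# Hypothetical 0/1 trait scores for three made-up taxa.
traits = {
    "taxon A": [1, 0, 1, 1],
    "taxon B": [1, 0, 1, 0],
    "taxon C": [0, 1, 0, 0],
}

def distance(x, y):
    """Fraction of characters on which two taxa differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Pairwise distance matrix, stored as a dictionary keyed by taxon pair.
distances = {
    (p, q): distance(traits[p], traits[q])
    for p, q in combinations(traits, 2)
}

# The most similar pair under this measure; nothing here refers to ancestry.
closest_pair = min(distances, key=distances.get)
```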

Cladograms

The starting point of cladistic analysis is a group of species and molecular, morphological, or other data characterizing those species. The end result is a tree-like relationship diagram called a cladogram,[8] or sometimes a dendrogram (Greek for "tree drawing").[9] The cladogram graphically represents a hypothetical evolutionary process. Cladograms are subject to revision as additional data become available.

Synonyms

The terms evolutionary tree, and sometimes phylogenetic tree, are often used synonymously with cladogram,[10] but others treat phylogenetic tree as a broader term that includes trees generated with a nonevolutionary emphasis.

Subtrees are clades

In cladograms, all organisms lie at the leaves.[11] The two taxa on either side of a split are called sister taxa or sister groups. Each subtree, whether it contains only two or a hundred thousand items, is called a clade.
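
As a minimal sketch of the idea that every subtree is a clade, the cladogram below is written as nested tuples with the organisms at the leaves. The taxon names and the topology are hypothetical, and single leaves are counted here as trivial one-member subtrees.

```python
# A cladogram as nested tuples; leaves are strings (hypothetical taxa).
tree = (("A", "B"), ("C", ("D", "E")))

def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def clades(node):
    """Return the leaf set of every subtree, i.e. every clade."""
    found = [leaves(node)]
    if not isinstance(node, str):
        for child in node:
            found.extend(clades(child))
    return found

all_clades = clades(tree)

# The two subtrees on either side of the root split are sister groups.
sister_groups = [leaves(child) for child in tree]
```

Under this topology, D and E form a clade, whereas C and D alone do not, because no subtree contains exactly those two leaves.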

2-way versus 3-way forks

Main article: Polytomy

Many cladists require that all forks in a cladogram be 2-way forks. Some cladograms include 3-way or 4-way forks when there are insufficient data to resolve the forking to a higher level of detail. See phylogenetic tree for more information about forking choices in trees.

Number of distinct cladograms

For a given set of species, the number of distinct cladograms that can be drawn (ignoring which cladogram best matches the species characteristics) is:[12]

Number of species    Number of cladograms
2                    1
3                    3
4                    15
5                    105
6                    945
7                    10,395
8                    135,135
9                    2,027,025
10                   34,459,425
N                    1 × 3 × 5 × 7 × ... × (2N − 3)

This superexponential growth of the number of possible cladograms explains why manual creation of cladograms becomes very difficult when the number of species is large.
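
The product formula in the table can be evaluated directly. The sketch below assumes, as the table's values suggest, that the cladograms being counted are rooted and contain only 2-way forks; the function name is chosen here for illustration.

```python
def count_cladograms(n_species):
    """Number of distinct cladograms for n species: 1 * 3 * 5 * ... * (2n - 3)."""
    if n_species < 2:
        raise ValueError("need at least two species")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (an empty product when n = 2).
    for factor in range(3, 2 * n_species - 2, 2):
        count *= factor
    return count

table = {n: count_cladograms(n) for n in range(2, 11)}
```

Each additional species multiplies the count by the next odd number, which is why the totals outgrow any fixed exponential rate.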

Depth

If a cladogram represents N species, the number of levels (the "depth") in the cladogram is on the order of log2(N).[13] For example, if there are 32 species of deer, a cladogram representing deer will be around 5 levels deep (because 2^5 = 32). A cladogram representing the complete tree of life, with about 10 million species, would be about 23 levels deep. This formula gives a lower limit: in most cases the actual depth will be a larger value because the various branches of the cladogram will not be uniformly deep. Conversely, the depth may be shallower if forks larger than 2-way forks are permitted.
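
The depth estimate can be checked with a few lines of arithmetic. The sketch below assumes only 2-way forks; the function names are illustrative. It rounds log2(N) up to a whole number of levels for the lower limit, and also gives the opposite extreme, a completely unbalanced tree in which one species splits off at every level.

```python
import math

def min_depth(n_species):
    """Smallest possible depth of a cladogram with only 2-way forks."""
    return math.ceil(math.log2(n_species))

def max_depth(n_species):
    """Depth of a fully unbalanced cladogram with only 2-way forks."""
    return n_species - 1

deer_depth = min_depth(32)              # 2^5 = 32
tree_of_life_log = math.log2(10_000_000)
tree_of_life_depth = min_depth(10_000_000)
```

For 10 million species, log2(N) is a little over 23, so a whole number of levels comes to 24 once rounded up.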

Time scale

A cladogram tree has an implicit time axis,[14] with time running forward from the base of the tree to the leaves of the tree. If the approximate date (for example, expressed as millions of years ago) of all the evolutionary forks were known, those dates could be captured in the cladogram. Thus, the time axis of the cladogram could be assigned a time scale (e.g. 1 cm = 1 million years), and the forks of the tree could be graphically located along the time axis. Such cladograms are called scaled cladograms. Many cladograms are not scaled along the time axis, for a variety of reasons:

Extinct species

Cladistics makes no distinction between extinct and extant species,[16] and it is appropriate to include extinct species in the group of organisms being analyzed. Cladograms that are based on DNA/RNA generally do not include extinct species because DNA/RNA samples from extinct species are rare. Cladograms based on morphology, especially morphological characteristics that are preserved in fossils, are more likely to include extinct species.

Cladistics in taxonomy

Cladistics contrasted with traditional taxonomy

A highly resolved, automatically generated tree of life based on completely sequenced genomes[17]

Prior to the advent of cladistics, most taxonomists used Linnaean taxonomy and later Evolutionary taxonomy to organize life forms. These traditional approaches, still in use by some researchers (especially in works intended for a more general audience[18]) use several fixed levels of a hierarchy, such as kingdom, phylum, class, order, and family. Cladistics does not use those terms, because one of the fundamental premises of cladistics is that the evolutionary tree is so deep and so complex that it is inadvisable to set a fixed number of levels.

Evolutionary taxonomy insists that groups reflect phylogenies. In contrast, Linnaean taxonomy allows both monophyletic and paraphyletic groups as taxa. Since the early 20th century, Linnaean taxonomists have generally attempted to make genus-level and lower-level taxa monophyletic. Ernst Mayr drew a distinction between the terms cladistics and phylogeny, using the term cladistics to refer to classifications which only take into account genealogy, as opposed to phylogeny, which had previously been used in a broader sense to refer to the combination of genealogy and amount of divergence from an ancestor (i.e. Evolutionary taxonomy). Mayr wrote, in 1985:

It would seem to me to be quite evident that the two concepts of phylogeny (and their role in the construction of classifications) are sufficiently different to require terminological distinction. The term phylogeny should be retained for the broad concept of phylogeny, promoted by Darwin and adopted by most students of phylogeny in the ensuing 90 years. The concept of phylogeny as mere genealogy should be terminologically distinguished as cladistics. To lump the two concepts together terminologically could not help but produce harmful equivocation.

[19]

Willi Hennig's pioneering work provoked a spirited debate[20] about the relative merits of cladistics versus traditional taxonomy which has continued down to the present.[21] Some of the debates that the cladists engaged in had been running since the 19th century, but they entered these debates with a new fervor,[22] as can be seen from the Foreword to Hennig (1979) by Rosen, Nelson, and Patterson:

Encumbered with vague and slippery ideas about adaptation, fitness, biological species and natural selection, neo-Darwinism (summed up in the "evolutionary" systematics of Mayr and Simpson) not only lacked a definable investigatory method, but came to depend, both for evolutionary interpretation and classification, on consensus or authority.

Foreword, page ix

Cladistics strictly and exclusively follows phylogeny and has arbitrarily deep trees with binary branching: each taxon is a clade. Linnaean taxonomy, while following phylogeny, also subjectively considers morphology and has a fixed hierarchy, whose taxa are not always clades.

Paraphyletic groups discouraged

Many cladists discourage the use of paraphyletic groups in classification of organisms, because they detract from cladistics' emphasis on clades (monophyletic groups). In contrast, proponents of the use of paraphyletic groups argue that any dividing line in a cladogram creates both a monophyletic section above and a paraphyletic section below. They also contend that paraphyletic taxa are necessary for classifying earlier sections of the tree – for instance, the early vertebrates that would someday evolve into the family Hominidae cannot be placed in any other monophyletic family. They also argue that paraphyletic taxa provide information about significant changes in organisms' morphology, ecology, or life history – in short, that both paraphyletic groups and clades are valuable notions with separate purposes.

Complexity of the Tree of Life

One argument in favor of cladistics is that it supports arbitrarily complex, arbitrarily deep trees. Especially when extinct species are considered (both known and unknown), the complexity and depth of the tree can be very large. Every single speciation event, including all the species that are now extinct, represents an additional fork on the hypothetical, complete cladogram representing the full tree of life. Fractals can be used to represent this notion of increasing detail: as a viewpoint zooms into the tree of life, the complexity remains virtually constant[23]. This great complexity of the tree, and the uncertainty associated with the complexity, are among the reasons that cladists cite for the attractiveness of cladistics over traditional taxonomy.

Proponents of noncladistic approaches to taxonomy point to punctuated equilibrium to bolster the case that the tree of life has a finite depth and finite complexity. If the number of species currently alive is finite, and the number of extinct species that we will ever know about is finite, then the depth and complexity of the tree of life are bounded, and there is no need to handle arbitrarily deep trees.

PhyloCode approach to naming species

A formal code of phylogenetic nomenclature, the PhyloCode[24], is currently under development for cladistic taxonomy. It is intended for use by both those who would like to abandon Linnaean taxonomy and those who would like to use taxa and clades side by side. In several instances (see for example Hesperornithes) it has been employed to clarify uncertainties in Linnaean systematics so that in combination they yield a taxonomy that unambiguously places problematic groups in the evolutionary tree in a way that is consistent with current knowledge.

Example

For more details on this topic, see Reptilia#History of classification.

For example, Linnaean taxonomy contains the taxon Tetrapoda, defined morphologically as vertebrates with four limbs (as well as animals with four-limbed ancestors, such as snakes), which is often given the rank of superclass, and divides into the classes Amphibia, Reptilia, Aves, Mammalia, and some extinct families.

Cladistics also contains the taxon Tetrapoda, whose living members can be classified phylogenetically as "the clade defined by the common ancestor of amphibians and mammals", or more precisely the clade defined by the common ancestor of a specific amphibian and mammal (or bird or reptile), but whose tree is still being worked out (there are a number of extinct branches). The taxon does not have a rank, and its subtaxa are subclades: these can be contained within one another, but one does not divide the clade into several non-overlapping taxa (as in traditional taxonomy); one can split it into two clades at the first branching, but that is all. With regard to the traditional classes, Aves and Mammalia are subclades, contained in the subclade Amniota, but Reptilia* is a paraphyletic taxon, not a clade ("At best, the cladists suggest, we could say that the traditional Reptilia are 'non-avian, non-mammalian amniotes'"[25]), and instead one divides Amniota into the two clades Sauropsida (which contains birds and all living amniotes other than mammals, including all living traditional reptiles) and Theropsida (mammals and the extinct "mammal-like reptiles"). Similarly, Amphibia* is a paraphyletic taxon.
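
Whether a named group is a clade can be tested mechanically once a tree is fixed. The sketch below uses a deliberately simplified topology of living amniotes, written as nested tuples; the topology and the group names as Python sets are assumptions for illustration only. A group is treated as monophyletic when some subtree has exactly that group as its set of leaves.

```python
# Simplified, illustrative topology of living amniotes (an assumption).
tree = ("mammals", ("turtles", ("lizards", ("crocodiles", "birds"))))

def leaves(node):
    """Return the set of leaf names under a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))

def is_monophyletic(group, node):
    """True if some subtree of `node` has exactly `group` as its leaf set."""
    if leaves(node) == set(group):
        return True
    if isinstance(node, str):
        return False
    return any(is_monophyletic(group, child) for child in node)

sauropsids = {"turtles", "lizards", "crocodiles", "birds"}
traditional_reptiles = {"turtles", "lizards", "crocodiles"}

sauropsids_are_clade = is_monophyletic(sauropsids, tree)
reptiles_are_clade = is_monophyletic(traditional_reptiles, tree)
```

On this tree the sauropsids pass the test, while the traditional reptiles fail it because the smallest subtree containing them also contains birds.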

Summary of advantages of cladistics

Proponents of cladistics enumerate key distinctions between cladistics and Linnaean taxonomy as follows:[26]

Cladistics | Linnaean Taxonomy
Handles arbitrarily deep trees. | Often must invent new level names (such as superorder, suborder, infraorder, parvorder, magnorder) to accommodate new discoveries. Biased towards trees about 4 to 12 levels deep.
Discourages naming or use of groups that are not monophyletic | Acceptable to name and use paraphyletic groups
Primary goal is to reflect actual process of evolution | Primary goal is to group species based on morphological similarities
Assumes that the shape of the tree will change frequently, with new discoveries | New discoveries often require renaming or releveling of Classes, Orders, and Kingdoms

Summary of criticisms of cladistics

Critics of cladistics include Ashlock,[27] Mayr,[28] and Williams.[29] Some of their criticisms include:

Cladistics | Linnaean Taxonomy
Limited to entities related by evolution or ancestry | Supports groupings without reference to evolution or ancestry
Does not include a process for naming species | Includes a process for giving unique names to species
Difficult to understand the essence of a clade, because clade definitions emphasize ancestry at the expense of meaningful characteristics | Taxa definitions based on tangible characteristics
Ignores sensible, clearly defined paraphyletic groups such as reptiles | Permits clearly defined groups such as reptiles
Difficult to determine if a given species is in a clade or not (e.g. if clade X is defined as "most recent common ancestor of A and B along with its descendants", then the only way to determine if species Y is in the clade is to perform a complex evolutionary analysis) | Straightforward process to determine if a given species is in a taxon or not
Limited to organisms that evolved by inherited traits; not applicable to organisms that evolved via complex gene sharing or lateral transfer | Applicable to all organisms, regardless of evolutionary mechanism

Process to generate a cladogram

Unrooted cladogram of the myosin supergene family[30]

A simplified procedure for generating a cladogram is:[31]

  1. Gather and organize data
  2. Consider possible cladograms
  3. Select best cladogram
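
The three steps can be sketched in code under some loudly stated assumptions. The list above does not say how the "best" cladogram is chosen; the sketch uses a parsimony criterion (fewest character-state changes), counted with Fitch's algorithm, which is only one of several possible criteria. The species, the 0/1 character scores, and the candidate trees are all hypothetical.

```python
# Step 1: gather data. Hypothetical 0/1 scores for three characters.
characters = {
    "A": [0, 0, 0],
    "B": [1, 0, 0],
    "C": [1, 1, 0],
    "D": [1, 1, 1],
}

# Step 2: consider possible cladograms (nested tuples, 2-way forks only).
candidates = [
    (("A", "C"), ("B", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "D"), ("B", "C")),
]

def fitch(node, column):
    """Return (candidate ancestral states, changes needed) for one character."""
    if isinstance(node, str):
        return {characters[node][column]}, 0
    left_states, left_cost = fitch(node[0], column)
    right_states, right_cost = fitch(node[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra change is needed at this fork.
    return left_states | right_states, left_cost + right_cost + 1

def tree_length(tree):
    """Total number of character-state changes the tree requires."""
    n_chars = len(next(iter(characters.values())))
    return sum(fitch(tree, col)[1] for col in range(n_chars))

# Step 3: select the candidate that requires the fewest changes.
lengths = [tree_length(tree) for tree in candidates]
best = min(candidates, key=tree_length)
```

With four species the three candidates can be listed by hand; for larger groups the number of candidates grows too quickly for this, and practical software searches the space heuristically.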

Step 1

A cladistic analysis begins with the following data:

For example, if analyzing 20 species of birds, the data might be:

Molecular versus morphological data

The characteristics used to create a cladogram can be roughly categorized as either morphological (synapsid skull, warm-blooded, notochord, unicellular, etc.) or molecular (DNA, RNA, or other genetic information).[31] Prior to the advent of DNA sequencing, all cladistic analysis used morphological data.

As DNA sequencing has become cheaper and easier, molecular systematics has become a more and more popular way to reconstruct phylogenies.[32] Using a parsimony criterion is only one of several methods to infer a phylogeny from molecular data; maximum likelihood and Bayesian inference, which incorporate explicit models of sequence evolution, are non-Hennigian ways to evaluate sequence data. Another powerful method of reconstructing phylogenies is the use of genomic retrotransposon markers, which are thought to be less prone to the problem of reversion that plagues sequence data. They are also generally assumed to have a low incidence of homoplasies because it was once thought that their integration into the genome was entirely random; this seems at least sometimes not to be the case, however.

Ideally, morphological, molecular, and possibly other phylogenies should be combined into an analysis of total evidence, because each kind of data has different intrinsic sources of error. For example, character convergence (homoplasy) is much more common in morphological data than in molecular sequence data, but character reversions that are unrecognizable as such are more common in the latter (see long branch attraction). Morphological homoplasies can usually be recognized as such if character states are defined with enough attention to detail.

Plesiomorphies and synapomorphies

The researcher must decide which character states are ancestral, that is, already present before the last common ancestor of the species group (plesiomorphies), and which are derived, that is, present in the last common ancestor and shared by its descendants (synapomorphies). This is done by comparison with one or more outgroups. The choice of an outgroup is a crucial step in cladistic analysis because different outgroups can produce trees with profoundly different topologies. Note that only synapomorphies are of use in characterizing clades.
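Outgroup comparison can be illustrated with a minimal Python sketch. It makes the simplifying assumption that every state seen in a single outgroup is ancestral, which real analyses do not take for granted. The function names `polarize` and `shared_derived`, the four taxa, and the three characters are illustrative choices, not part of any standard package.

```python
def polarize(matrix, outgroup):
    """Recode states as 0 (same as outgroup, assumed ancestral) or 1 (derived)."""
    root_states = matrix[outgroup]
    return {
        taxon: [0 if state == root else 1
                for state, root in zip(states, root_states)]
        for taxon, states in matrix.items()
        if taxon != outgroup
    }


def shared_derived(polarized, group):
    """Indices of characters derived in every member of group and in no other taxon."""
    n_chars = len(next(iter(polarized.values())))
    result = []
    for i in range(n_chars):
        inside = all(polarized[t][i] == 1 for t in group)
        outside = any(polarized[t][i] == 1 for t in polarized if t not in group)
        if inside and not outside:
            result.append(i)
    return result


# Illustrative, heavily simplified data set.
characters = ["hair", "amniotic egg", "four limbs"]
matrix = {
    "salmon": ["absent", "absent", "absent"],   # outgroup
    "frog":   ["absent", "absent", "present"],
    "lizard": ["absent", "present", "present"],
    "mouse":  ["present", "present", "present"],
}

polarized = polarize(matrix, outgroup="salmon")
amniote_support = shared_derived(polarized, {"lizard", "mouse"})
tetrapod_support = shared_derived(polarized, {"frog", "lizard", "mouse"})
```

In this toy data set, "four limbs" is derived relative to the outgroup but is also found in the frog, so it cannot characterize the lizard and mouse group on its own; only "amniotic egg" does.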

Avoid homoplasies

A homoplasy is a character that is shared by multiple species due to some cause other than common ancestry.[33] Typically, homoplasies occur due to convergent evolution. Using homoplasies when building a cladogram is sometimes unavoidable, but they should be avoided where possible.

A well-known example of homoplasy due to convergent evolution is the character "presence of wings". Though the wings of birds, bats, and insects serve the same function, each evolved independently, as can be seen from their anatomy. If a bird, a bat, and a winged insect were all scored for the character "presence of wings", a homoplasy would be introduced into the dataset, and this would confound the analysis, possibly resulting in a false evolutionary scenario.

Homoplasies can often be avoided outright in morphological datasets by defining characters more precisely and increasing their number. When analyzing "supertrees" (datasets incorporating as many taxa of a suspected clade as possible), it may become unavoidable to introduce character definitions that are imprecise, as otherwise the characters might not apply at all to a large number of taxa; to continue with the "wings" example, the presence of wings would hardly be a useful character in an attempted phylogeny of all Metazoa, as most of these do not have wings at all. Cautious choice and definition of characters is thus another important element in cladistic analyses. With a faulty outgroup or character set, no method of evaluation is likely to produce a phylogeny representing the evolutionary reality.

Step 2

Main article: Computational phylogenetics

When there are just a few species being organized, it is possible to do this step manually, but most cases require a computer program. There are scores of computer programs available to support cladistics.[34] See phylogenetic tree for more information about tree-generating computer programs.

Because the total number of possible cladograms grows faster than exponentially with the number of species, it is impractical for a computer program to evaluate every individual cladogram. A typical cladistic program begins by using heuristic techniques to identify a small number of candidate cladograms. Many cladistic programs then continue the search with the following repetitive steps:

  1. Evaluate the candidate cladograms by comparing them to the characteristic data
  2. Identify the best candidates that are most consistent with the characteristic data
  3. Create additional candidates by creating several variants of each of the best candidates from the prior step
  4. Use heuristics to create several new candidate cladograms unrelated to the prior candidates
  5. Repeat these steps until the cladograms stop getting better
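The control flow of these five steps can be sketched in Python as follows. This is only an outline under stated assumptions: the function `heuristic_search` and its parameters are hypothetical, lower scores are taken to be better, and the demonstration replaces cladograms with simple orderings of labels and replaces the fit to character data with a count of out-of-order pairs, purely so that the loop can run. A real program would supply trees, a tree-scoring metric, and tree rearrangements instead.

```python
import random


def heuristic_search(initial, score, make_variants, random_candidate,
                     keep=3, max_rounds=100):
    """Generic version of the five-step loop; a lower score is better."""
    pool = list(initial)
    best_score = min(score(c) for c in pool)
    for _ in range(max_rounds):
        # Steps 1 and 2: evaluate the candidates and keep the best ones.
        pool.sort(key=score)
        best = pool[:keep]
        # Step 3: create variants of each of the best candidates.
        variants = [v for c in best for v in make_variants(c)]
        # Step 4: add a few new candidates unrelated to the current ones.
        fresh = [random_candidate() for _ in range(keep)]
        pool = best + variants + fresh
        # Step 5: stop once a round brings no improvement.
        new_best = min(score(c) for c in pool)
        if new_best >= best_score:
            break
        best_score = new_best
    return min(pool, key=score)


# Stand-in problem (NOT a real tree search): candidates are orderings of labels.
rng = random.Random(0)
labels = ("A", "B", "C", "D", "E")


def toy_score(order):
    """Number of label pairs that appear out of alphabetical order."""
    return sum(1 for i in range(len(order))
               for j in range(i + 1, len(order)) if order[i] > order[j])


def swap_variants(order):
    """All orderings reachable by swapping two adjacent labels."""
    out = []
    for i in range(len(order) - 1):
        changed = list(order)
        changed[i], changed[i + 1] = changed[i + 1], changed[i]
        out.append(tuple(changed))
    return out


def random_order():
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return tuple(shuffled)


start = [random_order() for _ in range(3)]
result = heuristic_search(start, toy_score, swap_variants, random_order)
```

The stand-in score has no local minima, so the loop always reaches the best ordering here; real tree searches offer no such guarantee, which is why step 4 injects unrelated candidates.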

Computer programs that generate cladograms use algorithms that are very computationally intensive,[35] because finding an optimal cladogram is, for criteria such as maximum parsimony, an NP-hard problem.
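To give a sense of the size of the search space, the short Python sketch below counts the distinct rooted, fully bifurcating trees for a given number of labelled species using the standard double-factorial formula (2n − 3)!!. The function names are illustrative.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_tree_count(n_taxa):
    """Number of rooted binary trees on n_taxa labelled tips: (2n - 3)!!."""
    if n_taxa < 2:
        return 1
    return double_factorial(2 * n_taxa - 3)


for n in (3, 4, 5, 10, 20):
    print(n, rooted_tree_count(n))
```

Ten species already allow more than 34 million rooted trees, which is why exhaustive evaluation is feasible only for very small data sets.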

Step 3

There are several algorithms available to identify the "best" cladogram.[36] Most algorithms use a metric to measure how consistent a candidate cladogram is with the data. Most cladogram algorithms use the mathematical techniques of optimization and minimization.
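One widely used metric of this kind is the parsimony length of a tree: the minimum number of character-state changes the tree requires. The Python sketch below computes it with Fitch's small-parsimony procedure for unordered characters. The representation (nested 2-tuples for trees, strings of 0/1 states for taxa), the function names, and the four-taxon data set are assumptions made for illustration only.

```python
def fitch_score(tree, states):
    """Minimum number of state changes one character needs on a rooted binary tree."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):          # a tip: its observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:                         # children agree: no change needed here
            return common
        changes += 1                       # children disagree: one change
        return left | right

    visit(tree)
    return changes


def parsimony_length(tree, matrix):
    """Sum of Fitch scores over all characters; lower means a better fit."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_score(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical data: three binary characters scored for four taxa.
matrix = {"A": "110", "B": "110", "C": "001", "D": "000"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

length_1 = parsimony_length(tree_1, matrix)
length_2 = parsimony_length(tree_2, matrix)
```

On this data the first tree needs 3 changes and the second needs 5, so a parsimony-based program would prefer the first.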

In general, cladogram generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are trivial (for example, just a few species and a couple of characteristics).

Some algorithms are useful only when the characteristic data are molecular (DNA, RNA); other algorithms are useful only when the characteristic data are morphological. Other algorithms can be used when the characteristic data includes both molecular and morphological data.

Algorithms for cladograms include least squares, neighbor-joining, parsimony, maximum likelihood, and Bayesian inference.

Biologists sometimes use the term parsimony for a specific kind of cladogram generation algorithm and sometimes as an umbrella term for all cladogram algorithms.[37]

Algorithms that perform optimization tasks (such as building cladograms) can be sensitive to the order in which the input data (the list of species and their characteristics) is presented. Inputting the data in various orders can cause the same algorithm to produce different "best" cladograms. In these situations, the user should input the data in various orders and compare the results.

Using different algorithms on a single data set can sometimes yield different "best" cladograms, because each algorithm may have a unique definition of what is "best".

Because of the astronomical number of possible cladograms, algorithms cannot guarantee that the solution is the overall best solution. A nonoptimal cladogram will be selected if the program settles on a local minimum rather than the desired global minimum.[38] To help solve this problem, many cladogram algorithms use a simulated annealing approach to increase the likelihood that the selected cladogram is the optimal one.[39]
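The idea behind simulated annealing can be sketched in Python on a toy problem. Here the "candidates" are positions on a short one-dimensional list of scores rather than cladograms, and the function name `anneal`, the cooling schedule, and all parameter values are arbitrary assumptions. The point is only the acceptance rule: a worse candidate is sometimes accepted, with a probability that shrinks as the temperature falls, so the search is not confined to the first local minimum it finds.

```python
import math
import random


def anneal(scores, start, steps=2000, t_start=5.0, cooling=0.995, seed=0):
    """Return the best position visited; a lower score is better."""
    rng = random.Random(seed)
    current = best = start
    temperature = t_start
    for _ in range(steps):
        candidate = current + rng.choice((-1, 1))   # move to a neighbour
        if 0 <= candidate < len(scores):
            delta = scores[candidate] - scores[current]
            # Always accept improvements; accept a worse move with
            # probability exp(-delta / temperature).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if scores[current] < scores[best]:
                    best = current
        temperature *= cooling                      # cool down gradually
    return best


# Toy landscape: position 1 is a local minimum, position 5 the global one.
landscape = [5, 3, 4, 6, 2, 0, 1]
start = 1
found = anneal(landscape, start)
```

A purely greedy search started at position 1 would stay there, because both neighbours score worse. The annealing search keeps track of the best position it has visited, so its result is never worse than the starting point, though it is still not guaranteed to be the global minimum.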

Application to other disciplines

A triple family tree of Linux distributions.

The processes used to generate cladograms are not limited to the field of biology.[40] The generic nature of cladistics means that it can be used to organize groups of items in many different academic realms. The only requirement is that the items have characteristics that can be identified and measured.

Recent attempts in the use of cladistic methods outside of biology attack problems in:

Footnotes

  1. ^ "Natural History Collections: Cladistics". http://www.nhc.ed.ac.uk/index.php?page=236.273.444. Retrieved on 4 July 2009.
  2. ^ Phylogenetic Systematics is the title of Hennig's 1966 book
  3. ^ "Canterbury Tales Project". http://www.canterburytalesproject.org. Retrieved on 4 July 2009.
  4. ^ de Queiroz, K. and J. Gauthier (1994). "Toward a phylogenetic system of biological nomenclature". Trends in Ecology and Evolution 9 (1): 27–31. doi:10.1016/0169-5347(94)90231-3.
  5. ^ Dupuis, Claude (1984). "Willi Hennig's impact on taxonomic thought". Annual Review of Ecology and Systematics 15: 1–24. ISSN 0066-4162.
  6. ^ Baron, C., and Høeg, J.T. (2005). "Gould, Schram and the paleontological perspective in evolutionary biology". in Koenemann, S., and Jenner, R.A. Crustacea and Arthropod Relationships. CRC Press. pp. 3–14. ISBN 0849334985. http://books.google.co.uk/books?id=LalmQ4346O0C&dq=Nielsen,+C.+2001+%22Animal+evolution%22+Chelicerata&source=gbs_summary_s&cad=0. Retrieved on 2008-10-15.
  7. ^ Mayr, Ernst (1982). The growth of biological thought: diversity, evolution and inheritance. Cambridge, MA: Harvard Univ. Press. p. 221. ISBN 0-674-36446-5.
  8. ^ pp. 45, 78 and 555 of Joel Cracraft and Michael J. Donoghue, eds. (2004). Assembling the Tree of Life. Oxford, England: Oxford University Press.
  9. ^ Weygoldt, P. (February 1998). "Evolution and systematics of the Chelicerata". Experimental and Applied Acarology 22 (2): 63–79. doi:10.1023/A:1006037525704.
  10. ^ Singh, Gurcharan (2004). Plant Systematics: An Integrated Approach. Science. pp. 203–4. ISBN 1578083516.
  11. ^ Albert, Victor (2006). Parsimony, Phylogeny, and Genomics. Oxford University Press. pp. 3–55. ISBN 0199297304.
  12. ^ Lowe, Andrew (2004). Ecological Genetics: Design, Analysis, and Application. Blackwell Publishing. p. 164. ISBN 1405100338.
  13. ^ Aldous, David (1996). "Probability Distributions on Cladograms". Random Discrete Structures. Springer. p. 13.
  14. ^ Freeman, Scott (1998). Evolutionary Analysis. Prentice Hall. p. 380. ISBN 0135680239.
  15. ^ Carroll, Robert Lynn (1997). Patterns and Processes of Vertebrate Evolution. Cambridge University Press. p. 80. ISBN 052147809X.
  16. ^ Scott-Ram, N. R. (1990). Transformed Cladistics, Taxonomy and Evolution. Cambridge University Press. p. 83. ISBN 0521340861.
  17. ^ Letunic, I (2007). "Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation" (Pubmed). Bioinformatics 23(1): 127–8. doi:10.1093/bioinformatics/btl529. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17050570.
  18. ^ Unwin, David M. (2006). The Pterosaurs: From Deep Time. New York: Pi Press. p. 246. ISBN 0-13-146308-X.
  19. ^ Mayr, E. (1985). "Darwin and the Definition of Phylogeny." Systematic Zoology, 34(1): 97-98.
  20. ^ Wheeler, Quentin (2000). Species Concepts and Phylogenetic Theory: A Debate. Columbia University Press. ISBN 0231101430.
  21. ^ Benton, M. (2000). "Stems, nodes, crown clades, and rank-free lists: is Linnaeus dead?". Biological Reviews 75 (4): 633–648.
  22. ^ Hull, David (1988). Science as a Process. University of Chicago Press. pp. 232–276. ISBN 0226360512.
  23. ^ Gordon, Richard (1999). The Hierarchical Genome and Differentiation Waves. World Scientific. p. 632. ISBN 9810222688.
  24. ^ Pennisi, E. (2001). "Evolutionary Biology: Preparing the Ground for a Modern 'Tree of Life'". Science 293: 1979–1980. doi:10.1126/science.293.5537.1979.
  25. ^ Colin Tudge (2000). The Variety of Life. Oxford University Press. ISBN 0198604262.
  26. ^ Hennig, Willi (1975). "'Cladistic analysis or cladistic classification': a reply to Ernst Mayr". Systematic Zoology 24: 244–256. doi:10.2307/2412765.
  27. ^ Ashlock PD. 1971. Monophyly and associated terms. Systematic Zoology 20: 63–69. Ashlock PD. 1972. Monophyly again. Systematic Zoology 21: 430–438. Ashlock PD. 1974. The uses of cladistics. Annual Review of Ecology and Systematics 5: 81–89. Ashlock PD. 1979. An evolutionary systematist’s view of classification. Systematic Zoology 28: 441–450.
  28. ^ Mayr E. 1974. Cladistic analysis or cladistic classification? Zeitschrift für Zoologische Systematik und Evolutionsforschung 12: 94–128. Mayr E. 1978. Origin and history of some terms in systematic and evolutionary biology. Systematic Zoology 27: 83–88. Mayr E, Bock WJ. 2002. Classifications and other ordering systems. Journal of Zoological Systematics and Evolutionary Research 40: 169–194.
  29. ^ Williams, P.A. 1992. Confusion in cladism. Synthese 01:135-132
  30. ^ Hodge T, Cope M (October 1, 2000). "A myosin family tree". J Cell Sci 113 Pt 19 (19): 3353–4. PMID 10984423. http://jcs.biologists.org/cgi/content/full/113/19/3353.
  31. ^ a b DeSalle, Rob (2002). Techniques in Molecular Systematics and Evolution. Birkhäuser. ISBN 376436257X.
  32. ^ Hillis, David (1996). Molecular Systematics. Sinauer. ISBN 0878932828.
  33. ^ West-Eberhard, Mary Jane (2003). Developmental Plasticity and Evolution. Oxford Univ. Press. pp. 353–376. ISBN 0195122356.
  34. ^ "List of Cladistics Software Programs". http://evolution.genetics.washington.edu/phylip/software.pars.html.
  35. ^ Hodkinson, Trevor (2006). Reconstructing the Tree of Life: Taxonomy and Systematics of Species Rich Taxa. CRC Press. pp. 61–128. ISBN 0849395798.
  36. ^ Kitching, Ian (1998). Cladistics: The Theory and Practice of Parsimony Analysis. Oxford University Press. ISBN 0198501382.
  37. ^ Stewart, Caro-Beth (1993). "The Powers and Pitfalls of Parsimony". Nature 361: 603–607. doi:10.1038/361603a0.
  38. ^ Foley, Peter (1993). Cladistics: A Practical Course in Systematics. Oxford Univ. Press. p. 66. ISBN 0198577664.
  39. ^ Nixon K. C. (1999). "The Parsimony Ratchet: a new method for rapid parsimony analysis". Cladistics 15: 407–414. doi:10.1111/j.1096-0031.1999.tb00277.x.
  40. ^ Mace, Ruth (2005). The Evolution of Cultural Diversity: A Phylogenetic Approach. Routledge Cavendish. ISBN 1844720993.
  41. ^ Lipo, Carl (2005). Mapping Our Ancestors: Phylogenetic Approaches in Anthropology and Prehistory. Aldine Transaction. ISBN 0202307514.
  42. ^ See for example the surveys in Stephen Oppenheimer, The Origins of the British (London: Robinson, 2006), pp. 290-300, 340-56.
  43. ^ Metzger & Ehrman, The text of the New Testament, OUP, 2005, p.207f.
  44. ^ Peter M.W. Robinson, & Robert J. O’Hara. 1996. Cladistic analysis of an Old Norse manuscript tradition. Research in Humanities Computing, 4: 115–137. Available at http://rjohara.net/cv/1996-rhc.
  45. ^ Lundqvist, Andreas; Rodic, Donjan (2009-03-12), GNU/Linux distro timeline: distribution cladogram, http://futurist.se/gldt/, retrieved on 2009-03-19

The above information uses material from Wikipedia and is licensed under the GNU Free Documentation License.
This page was last archived by our server on Sun Jul 12 05:24:38 2009.