Henry Louis Gates: "Exactly How 'Black' Is Black America?"
Back in 2002, I wrote a pioneering story on DNA testing for racial admixture based on the work of geneticist Mark Shriver: "How White Are Blacks? How Black Are Whites?"
Now, in The Root, Harvard African-American studies professor Henry Louis Gates continues his interest in ancestry testing, updating the preliminary data in my 2002 article. Gates writes:
* According to Ancestry.com, the average African American is 65 percent sub-Saharan African, 29 percent European and 2 percent Native American.
* According to 23andme.com, the average African American is 75 percent sub-Saharan African, 22 percent European and only 0.6 percent Native American.
* According to Family Tree DNA.com, the average African American is 72.95 percent sub-Saharan African, 22.83 percent European and 1.7 percent Native American.
* According to National Geographic's Genographic Project, the average African American is 80 percent sub-Saharan African, 19 percent European and 1 percent Native American.
(Of course National Geographic is lying too, when it comes to race, that's what Albinos do, they lie).
To be sure, much/most(?) of the world's population is Mulatto of one kind or another - there are many names for racially mixed people. The problem with these admixture studies is that they are made-up lies. By necessity they have to be, simply because their methods and materials are not exclusive.
For example, let's do a logical exercise:
If I said that Mr. Jones looked like "Santa Claus", you would right away compile in your mind a list of the traits that have been established for Santa Claus, i.e. a Middle-aged, Fat White guy, with long White hair on his head and a Beard, and Red Cheeks, wearing a Red suit with White fur trim and a wide Black belt. Any one of those things alone would mean nothing, but together they mean Santa Claus. So now you have a pretty good idea of what Mr. Jones looks like.
As an aside: the actual Santa Claus (Saint Nicholas) was a tall thinnish Black man in Anatolia (Turkey).
So in keeping with the Albino fantasy, the one item that would blow the description up, is the (White Guy) part: because without that how do you establish exclusivity?
Point being: how can you say that a group is 10% Northern European, if you can't establish EXCLUSIVELY just what a "Northern European" IS!
Of course these silly "FOR PROFIT" studies never tell you what the technical basis for their claims is; they just want you to pay for a genetics test. However, years ago one group did publish the technical aspects of their study:
Volume 63, Issue 6, December 1998, Pages 1839–1851
Estimating African American Admixture Proportions by Use of Population-Specific Alleles
Esteban J. Parra, Amy Marcini, Joshua Akey, Jeremy Martinson, Mark A. Batzer, Richard Cooper, Terrence Forrester, David B. Allison, Ranjan Deka, Robert E. Ferrell, Mark D. Shriver
We analyzed the European genetic contribution to 10 populations of African descent in the United States (Maywood, Illinois; Detroit; New York; Philadelphia; Pittsburgh; Baltimore; Charleston, South Carolina; New Orleans; and Houston) and in Jamaica, using nine autosomal DNA markers. These markers either are population-specific or show frequency differences >45% between the parental populations and are thus especially informative for admixture. European genetic ancestry ranged from 6.8% (Jamaica) to 22.5% (New Orleans). The unique utility of these markers is reflected in the low variance associated with these admixture estimates (SEM 1.3%–2.7%). We also estimated the male and female European contribution to African Americans, on the basis of informative mtDNA (haplogroups H and L) and Y Alu polymorphic markers. Results indicate a sex-biased gene flow from Europeans, the male contribution being substantially greater than the female contribution. mtDNA haplogroups analysis shows no evidence of a significant maternal Amerindian contribution to any of the 10 populations. We detected significant nonrandom association between two markers located 22 cM apart (FY-null and AT3), most likely due to admixture linkage disequilibrium created in the interbreeding of the two parental populations. The strength of this association and the substantial genetic distance between FY and AT3 emphasize the importance of admixed populations as a useful resource for mapping traits with different prevalence in two parental populations.
The history of African Americans can be traced back to 1619, when the first Africans arrived at the British colonies (Jamestown, Virginia), although the presence of African slaves has been reported as early as 1526 in Spanish expeditions to what would become South Carolina, Georgia, Florida, and New Mexico (Piersen 1996). Although institutional slavery began very soon after, it was not until the beginning of the 18th century that the importation of slaves reached significant rates, in parallel with the demand for workers to cultivate the tobacco, indigo, and rice plantations in the southern colonies. The highest peaks occurred during 1790–1800 and the first years of the 19th century. In 1808, slave trade became illegal but continued at a low rate for several more decades (Tanner 1995). Various estimates of the total number of slaves imported into the United States have been offered, with generally accepted numbers in the range 380,000–570,000 (Curtin 1969; Johnson and Campbell 1981). At present, >33 million U.S. residents are of African descent (U.S. Census Bureau).
Although it is very difficult to determine the precise ethnic origins of the African slaves, information from shipping lists has provided an approximate picture of their geographic provenance. The slave trade affected a very wide area of western and west central Africa, mainly the coastline between present-day Senegal in the north and Angola in the south. The most important regions were Senegambia (Gambia and Senegal), Sierra Leone (Guinea and Sierra Leone), Windward Coast (Ivory Coast and Liberia), Gold Coast (Ghana), Bight of Benin (from the Volta River to the Benin River), Bight of Biafra (east of the Benin River to Gabon), and Angola (southwest Africa, including part of Gabon, Congo, and Angola). Curtin (1969) has offered, on the basis of data on English trade during the 18th century (the peak of the Atlantic slave trade), estimates of the proportional contributions by areas. His analysis shows that Angola and Bight of Biafra contributed the highest numbers of slaves imported into the North American mainland (25% each). However, there were significant differences in ethnic origin depending on the port of entry in the United States, and the figures for the colonies of Virginia and South Carolina differed considerably.
The history of African Americans has been marked not only by the forced migration from Africa, but also by admixture with the other ethnic groups they met when they arrived in North America, namely Europeans and Native Americans. Determination of the extent of that hybridization is of great anthropological, epidemiological, and historical interest. Unfortunately, although the first attempts to characterize admixture proportions in African Americans by means of genetic markers date back to the 1950s (Glass and Li 1953), the field remains underdeveloped. The main limitations for obtaining precise admixture estimates have been the limited number of classical or DNA markers appropriate for this type of study and the scarcity of data concerning the distribution of allele frequencies in the parental populations, particularly in Africa. In the last few years, however, the number of dimorphic and hypervariable markers showing large frequency differentials between the major geographic or ethnic groups has increased substantially (Shriver et al. 1997). These markers, which we have designated “population-specific alleles” (“PSAs”), are potentially very useful in forensic anthropology, epidemiology, and population genetics.
Recently, we initiated a project to systematically characterize admixture proportions in populations throughout the United States and in Jamaica, using autosomal PSAs. In this article we present data with regard to 10 populations of African descent from nine different areas of the United States and from Jamaica. Two of the markers we have used (FY-Null and ICAM) have alleles that are found only in persons with African ancestry, whereas eight (FY-Null, AT3, APO, GC, LPL, OCA2, RB2300 and Sb19.3) show differences in allele frequency >48% between Africans and Europeans. Using markers with unique alleles (those found in only one population; Chakraborty et al. 1992) and PSAs (those with high levels of allele frequency differential; Shriver et al. 1997), it is possible to generate more precise estimates of the ancestral proportions of an admixed population. In an effort to obtain the best possible estimates of the parental frequencies of these markers, we also analyzed three samples from Africa (two from Nigeria and one from Central African Republic) and three from Europe (England, Ireland, and Germany). We discuss the estimates of admixture in 10 populations of African descent in the context of the history of African American populations and previous genetic studies on admixture proportions in these groups. We also estimated the male and female European contribution to African Americans on the basis of mtDNA (haplogroups H and L) and Y Alu polymorphic (YAP) informative markers.
Differences in the racial distribution of the Duffy antigens were discovered in 1954, when it was found that the overwhelming majority of blacks had the erythrocyte phenotype Fy(a-b-): 68% in African Americans and 88-100% in African blacks (including more than 90% of West African blacks). This phenotype is exceedingly rare in Caucasians. Because the Duffy antigen is uncommon in those of Black African descent, the presence of this antigen has been used to detect genetic admixture. In a sample of unrelated African Americans (n = 235), Afro-Caribbeans (n = 90) and Colombians (n = 93), the frequency of the -46T (Duffy positive) allele was 21.7%, 12.2% and 74.7% respectively.
Overall the frequencies of Fya and Fyb antigens in Caucasians are 66% and 83% respectively, in Asians 99% and 18.5% respectively and in blacks 10% and 23% respectively. The frequency of Fy3 is 100% in Caucasians, 99.9% in Asians and 32% in Blacks. Phenotype frequencies are:
Fy(a+b+): 49% Caucasians, 1% Blacks, 9% Chinese
Fy(a-b+): 34% Caucasians, 22% Blacks, <1% Chinese
Fy(a+b-): 17% Caucasians, 9% Blacks, 91% Chinese
While a possible role in the protection of humans from malaria had been previously suggested, this was only confirmed clinically in 1976. Since then many surveys have been carried out to elucidate the prevalence of Duffy antigen alleles in different populations, including the following.
The mutation Ala100Thr (G -> A in the first codon position, base number 298) within the FY*B allele was thought to be purely a Caucasian genotype, but has since been described in Brazilians. However, the study's authors point out that the Brazilian population has arisen from intermarriage between Portuguese, Black Africans, and Indians, which accounts for the presence of this mutation in a few members of Brazil's non-Caucasian groups. Two of the three Afro-Brazilian test subjects that were found to have the mutation (out of a total of 25 Afro-Brazilians tested) were also related to one another, as one was a mother and the other her daughter.
This antigen along with other blood group antigens was used to identify the Basque people as a genetically separate group. Its use in forensic science is under consideration.
The Andaman and Nicobar Islands, now part of India, were originally inhabited by 14 aboriginal tribes. Several of these have gone extinct. One surviving tribe, the Jarawas, lives in three jungle areas of South Andaman and one jungle area in Middle Andaman. The area is endemic for malaria. The causative species is Plasmodium falciparum: there is no evidence for the presence of Plasmodium vivax. Blood grouping revealed an absence of both Fy(a) and Fy(b) antigens in two areas and a low prevalence in two others.
In the Yemenite Jews the frequency of the Fy allele is 0.5879. The frequency of this allele varies from 0.1083 to 0.2191 among Jews from the Middle East, North Africa and Southern Europe. The incidence of Fya among Ashkenazi Jews is 0.44 and among the non-Ashkenazi Jews it is 0.33. The incidence of Fyb is higher in both groups with frequencies of 0.53 and 0.64 respectively.
In the Chinese ethnic populations—the Han and the She people—the frequencies of Fya and Fyb alleles were 0.94 and 0.06 and 0.98 and 0.02 respectively.
The frequency of the Fya allele in most Asian populations is ~95%.
In Grande Comore (also known as Ngazidja) the frequency of the Fy(a- b-) phenotype is 0.86.
A survey of 115 unrelated Tunisians using both serological and DNA-based methods gave the following results: FY*X frequency 0.0174; FY*1 = 0.291 (expressed 0.260, silent 0.031); FY*2 = 0.709 (expressed 0.427; silent 0.282). FY*2 silent is the most common allele in West African blacks and the high prevalence in this sample was interpreted as historical admixture.
The incidence of Fy(a+b-) in northern India among blood donors is 43.85%. In Nouakchott, Mauritania overall 27% of the population are Duffy-positive. 54% of Moors are Duffy antigen positive, while only 2% of black ethnic groups (mainly Poular, Soninke and Wolof) are Duffy positive. The most prevalent allele globally is FY*A. Across sub-Saharan Africa the predominant allele is the silent FY*BES variant. In Iran the Fy (a-b-) phenotype was found in 3.4%. There appears to have been a selective sweep in Africa which reduced the incidence of this antigen there. This sweep appears to have occurred between 6,500 and 97,200 years ago (95% confidence interval).
A map of the Duffy antigen distribution has been produced.
In molecular biology, intercellular adhesion molecules (ICAMs) and vascular cell adhesion molecule-1 (VCAM-1) are part of the immunoglobulin superfamily. They are important in inflammation, immune responses and in intracellular signalling events. The ICAM family consists of five members, designated ICAM-1 to ICAM-5. They are known to bind to leucocyte integrins CD11/CD18 such as LFA-1 and Macrophage-1 antigen, during inflammation and in immune responses. In addition, ICAMs may exist in soluble forms in human plasma, due to activation and proteolysis mechanisms at cell surfaces.
The White Haplogroup
In human mitochondrial genetics, Haplogroup H is a human mitochondrial DNA (mtDNA) haplogroup that likely originated in Southwest Asia 20,000-25,000 years Before Present. Mitochondrial haplogroup H is a predominantly European haplogroup that originated outside of Europe before the last glacial maximum (LGM). It first expanded in the northern Near East and southern Caucasus between 33,000 and 26,000 years ago, and later migrations from Iberia suggest it reached Europe before the LGM. It has also spread to Siberia and Inner Asia. Today, about 40% of all mitochondrial lineages in Europe are classified as haplogroup H.
The Black Haplogroup
In human mitochondrial genetics, L is the mitochondrial DNA macro-haplogroup that is at the root of the human mtDNA phylogenetic tree. As such, it represents the most ancestral mitochondrial lineage of all currently living modern humans. Macro-haplogroup L's origin is connected with Mitochondrial Eve, and thus, is believed to suggest an ultimate African origin of modern humans. Its major sub-clades include L0, L1, L2, L3, L4, L5 and L6, with all non-Africans exclusively descended from just haplogroup L3.
Haplogroup L3 descendants notwithstanding, the designation "haplogroup L" is typically used to designate the family of mtDNA clades that are most frequently found in Sub-Saharan Africa. However, all non-African haplogroups coalesce onto either haplogroup M or haplogroup N, and both these macrohaplogroups are simply sub-branches of haplogroup L3. Consequently, L in its broadest definition is really a paragroup containing all of modern humanity, and all human mitochondrial DNA from around the world are subclades of haplogroup L. Haplogroups M and N are sometimes referred to as haplogroups L3M and L3N respectively. Mitochondrial Eve is defined as the female human ancestor who is the most recent common ancestor of the most deep-rooted lineages of humanity: haplogroups L, L0 and L1-6.
In human genetics, Haplogroup E-M96 is a human Y-chromosome DNA haplogroup. Haplogroup E-M96 is one of the two main branches of the older Haplogroup DE, the other main branch being haplogroup D. The E-M96 clade is divided into two subclades: The more common E-P147 and the less common E-M75.
Underhill (2001) proposed that haplogroup E may have arisen in East Africa. Some authors, such as Chandrasekar (2007), continue to accept the earlier position of Hammer (1997) that Haplogroup E may have originated in Asia, given that:
E is a clade of Haplogroup DE, with the other major clade, haplogroup D, being Asian.
DE is a clade within M168 with the other two major clades, C and F, considered to have a Eurasian origin.
However, several discoveries made since the Hammer articles are thought to make an Asian origin less likely:
Underhill and Kivisild (2007) demonstrated that C and F have a common ancestor meaning that DE has only one sibling which is non-African.
DE* is found in both Asia and Africa, meaning that not only one, but several siblings of D are found in Asia and Africa.
Karafet (2008), in which Hammer is a co-author, significantly rearranged time estimates leading to "new interpretations on the geographical origin of ancient sub-clades". Amongst other things this article proposed a much older age for haplogroup E-M96 than had been considered previously, giving it a similar age to Haplogroup D, and DE itself, meaning that there is no longer any strong reason to see it as an offshoot of DE which must have happened long after DE came into existence and had entered Asia.
In human genetics, Haplogroup DE is a human Y-chromosome DNA haplogroup. It is defined by the single nucleotide polymorphism (SNP) mutations, or UEPs, M1(YAP), M145(P205), M203, P144, P153, P165, P167, P183. Haplogroup DE is often referred to by the most well-known unique event polymorphism (UEP) which defines it, the Y-chromosome Alu Polymorphism (YAP). The YAP mutation was caused when a strand of DNA called Alu, which copies itself, inserted a copy into the Y chromosome. A Y chromosome that has the YAP mutation is called YAP-positive (YAP+), and a Y chromosome that does not have the YAP mutation is labeled YAP-negative (YAP-). Haplogroup DE is an estimated 65,000 years old. The majority of DE male lines can be categorized as being in either Haplogroup D (Y-DNA), which likely originated in Asia, the only place where it has been found, or haplogroup E, which is believed to have originated in East Africa or the Near East. The remainder are said to be in the paragroup DE*, confirmed cases of which are extremely rare.
The scenarios outlined by Hammer include an out of Africa migration over 100,000 years ago, the YAP+ insertion on an Asian Y-chromosome 55,000 years ago and a back migration of YAP+ from Asia to Africa 31,000 years ago by its subclade haplogroup E. This analysis was based on the fact that older African lineages, such as haplogroups A and B, were YAP negative whereas the younger lineage, haplogroup E, was YAP positive. Haplogroup D, which is YAP positive, was clearly an Asian lineage, being found only in East Asia with high frequencies in Japan and Tibet. Because the mutations that define haplogroup E were observed to be in the ancestral state in haplogroup D, and haplogroup D, at 55 kya, was considerably older than haplogroup E, at 31 kya, Hammer concluded that haplogroup E was a subclade of haplogroup D.
In human genetics, Haplogroup D-M174 is a Y-chromosome haplogroup. Both D-M174 and E lineages also exhibit the single-nucleotide polymorphism M168 which is present in all Y-chromosome haplogroups except A and B, as well as the YAP unique-event polymorphism, which is unique to Haplogroup DE.
It is found today at high frequency among populations in Tibet, the Japanese archipelago, and the Andaman Islands, though curiously not in India. The Ainu of Japan are notable for possessing almost exclusively Haplogroup D-M174 chromosomes, although Haplogroup C-M217 chromosomes also have been found in 15% (3/20) of sampled Ainu males. Haplogroup D-M174 chromosomes are also found at low to moderate frequencies among populations of Central Asia and northern East Asia as well as the Han and Miao–Yao peoples of China and among several minority populations of Sichuan and Yunnan that speak Tibeto-Burman languages and reside in close proximity to the Tibetans.
Haplogroup D-M174 is also remarkable for its rather extreme geographic differentiation, with a distinct subset of Haplogroup D-M174 chromosomes being found exclusively in each of the populations that contains a large percentage of individuals whose Y-chromosomes belong to Haplogroup D-M174: Haplogroup D-M15 among the Tibetans (as well as among the mainland East Asian populations that display very low frequencies of Haplogroup D-M174 Y-chromosomes), Haplogroup D-M55 among the various populations of the Japanese Archipelago, Haplogroup D-P99 among the inhabitants of Tibet, Tajikistan and other parts of mountainous southern Central Asia, and paragroup D-M174 without tested positive subclades (probably another monophyletic branch of Haplogroup D) among the Andaman Islanders. Another type (or types) of paragroup D-M174 without tested positive subclades is found at a very low frequency among the Turkic and Mongolic populations of Central Asia, amounting to no more than 1% in total. This apparently ancient diversification of Haplogroup D-M174 suggests that it may perhaps be better characterized as a "super-haplogroup" or "macro-haplogroup." In one study, the frequency of Haplogroup D-M174 without tested positive subclades found among Thais was 10%.
Haplogroup B is found frequently in southeastern Asia. A subclade of B4b that is labelled irregularly as B2 is one of five haplogroups found in the indigenous peoples of the Americas, the others being A, C, D, and X. Because the migration to the Americas by the ancestors of Indigenous Americans is generally believed to have been from northeastern Siberia via Beringia, it is surprising that Haplogroup B and Haplogroup X have not been found in Paleo-Siberian tribes of northeastern Siberia. However, Haplogroup B has been found among Turkic, Mongolic, and Tungusic populations of Siberia, such as Tuvans, Altays, Shors, Khakassians, Yakuts, Buryats, Khamnigans, Negidals, and Evenks. This haplogroup is also found among populations in China, Indonesia, Iran, Iraq, Japan, Korea, Laos, Madagascar, Malaysia, Melanesia, Micronesia, Mongolia, the Philippines, Polynesia, Taiwan, Thailand, Tibet, and Vietnam.
Haplogroup X is found in approximately 7% of native Europeans, and 3% of all Native Americans from North America. Overall haplogroup X accounts for about 2% of the population of Europe, the Near East, and North Africa. Sub-group X1 is much less numerous, and is largely restricted to North and East Africa, and also the Near East.
Sub-group X2 appears to have undergone extensive population expansion and dispersal around or soon after the last glacial maximum, about 21,000 years ago. It is more strongly present in the Near East, the Caucasus, and Mediterranean Europe; and somewhat less strongly present in the rest of Europe. Particular concentrations appear in Georgia (8%), the Orkney Islands (in Scotland) (7%), and amongst the Israeli Druze community (27%). Subclades X2a and X2g are found in North America, but are not present in native South Americans.
Haplogroup X is also one of the five haplogroups found in the indigenous peoples of the Americas. Although it occurs only at a frequency of about 3% for the total current indigenous population of the Americas, it is a bigger haplogroup in northern North America, where among the Algonquian peoples it comprises up to 25% of mtDNA types. It is also present in lesser percentages to the west and south of this area—among the Sioux (15%), the Nuu-Chah-Nulth (11%–13%), the Navajo (7%), and the Yakama (5%). Unlike the four main Native American mtDNA haplogroups (A, B, C, D), X is not at all strongly associated with East Asia.
Subjects and Methods
The subjects analyzed in this study came from a number of sources, primarily paternity identity testing labs (the Detroit, Houston, and New Orleans samples), anthropological studies, and volunteers in medical studies. Table 1 shows the names of the populations analyzed and the number of individuals studied. The samples from Maywood (Illinois), Jamaica, and Nigeria-2 (from a traditional Yoruba community in the city of Ibadan, in southwestern Nigeria) were collected as healthy random subjects in an ongoing study of hypertension (see Ataman et al. 1996 and Cooper et al. 1997). The Nigeria-1 sample was collected from a group of civil servants in Benin City, Nigeria. The Central African Republic sample was collected as part of an anthropological survey of a village along the Oubangui river near the capital, Bangui. Related individuals were excluded from the sample. The New York sample comprised case and control subjects in an ongoing study of obesity in African Americans being conducted at Columbia University. Both samples from Philadelphia were collected as healthy control subjects during independent studies of hypertension in the African American population of Philadelphia. The Baltimore sample was collected as part of a study on the dynamics of HIV infection among intravenous drug users. The sample from Charleston was collected as part of a study on efforts for prenatal lead screening. All subjects in the Charleston sample were pregnant women. The samples of Europeans from Germany, Ireland, and England were collected at random as part of anthropological surveys.
Primer Sequences and PCR Conditions
The identified PSA markers were genotyped by standard PCR and electrophoretic separation of DNA fragments. Tables 2 and 3 show the sequences of the PCR primers and the reaction conditions for the autosomal PSA and sex-linked markers, respectively. Most of these markers are restriction site polymorphisms, which are detected by digestion with the appropriate restriction enzyme after PCR. All of these loci, except FY-Null and ICAM, were scored after electrophoresis through agarose gels. The fragments generated by the FY and ICAM digestions were smaller and required electrophoresis through polyacrylamide gels for accurate fragment sizing.
Statistical Analysis
The admixture proportions of the African American and European American populations were estimated by means of the weighted least squares (WLS) (Long 1991) and gene identity (Chakraborty 1975) methods. Long's method incorporates the effect of the evolutionary and sampling variance in the admixture estimates and a χ² test of heterogeneity of admixture estimates from the different loci. A computer program implementing this method (ADMIX.PAS) was kindly provided by Dr. Jeffrey C. Long (National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health). Dr. Ranajit Chakraborty (University of Texas Health Science Center, Houston) kindly provided a program (ADMIX2.FOR) for the estimation of admixture proportions by means of the gene identity method.
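To make concrete what an admixture proportion means at the marker level, here is a minimal sketch. It uses the textbook single-locus relation m = (p_admixed - p_African) / (p_European - p_African) and then pools loci by ordinary least squares through the origin. This is not Long's WLS method and not the ADMIX.PAS program, which also model evolutionary and sampling variance. The function names and every allele frequency below are hypothetical, chosen only for illustration.

```python
def single_locus_m(p_admixed, p_african, p_european):
    """European ancestry proportion implied by one marker (textbook relation)."""
    delta = p_european - p_african  # allele frequency differential
    if delta == 0:
        raise ValueError("marker is uninformative: parental frequencies are equal")
    return (p_admixed - p_african) / delta


def pooled_m(loci):
    """Least-squares estimate of m across loci (regression through the origin).

    Each entry of `loci` is (p_admixed, p_african, p_european).
    Markers with a large frequency differential automatically carry more weight.
    """
    numerator = sum((pe - pa) * (ph - pa) for ph, pa, pe in loci)
    denominator = sum((pe - pa) ** 2 for ph, pa, pe in loci)
    if denominator == 0:
        raise ValueError("no informative markers")
    return numerator / denominator


# Hypothetical frequencies, NOT taken from the paper's tables.
example_loci = [
    (0.15, 0.00, 1.00),  # allele absent in one parental group, fixed in the other
    (0.80, 0.90, 0.40),  # allele with a 50% frequency differential
]

per_locus = [single_locus_m(*locus) for locus in example_loci]
combined = pooled_m(example_loci)
print(per_locus, combined)
```

The pooled estimate sits closer to the first marker's value because that marker has the larger differential, which is the intuition behind preferring markers with large frequency differences.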
Haplotype frequencies and gametic disequilibrium coefficients for pairs of loci were estimated by means of an expectation maximization algorithm described by Long et al. (1995). Hypothesis testing was performed with the likelihood ratio statistic (G²), which has a χ² distribution for large sample sizes. Alternatively, by a data-resampling approach, this program estimates the distribution of test statistics for the observed data given there was no association (Long et al. 1995). We used a simulated distribution based on 1,000 replications. A program (3LOCUS.PAS) implementing the aforementioned method was made available by Dr. Long. D′ coefficients, in which the gametic disequilibrium (D) is standardized by the theoretical maximum disequilibrium (Dmax), were calculated on the basis of the estimated haplotype frequencies (Lewontin 1964, 1988; Thomson et al. 1988).
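The standardization of D by Dmax can be sketched in a few lines. The example below assumes the haplotype frequency is already known. In the study it had to be estimated from genotype data by expectation maximization, a step not reproduced here. The function name and the input frequencies are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    Returns D / Dmax, keeping the sign of D.
    """
    d = p_ab - p_a * p_b  # gametic disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no information
    return d / d_max


# Hypothetical frequencies for illustration only.
print(d_prime(0.30, 0.40, 0.50))  # partial association
print(d_prime(0.25, 0.50, 0.50))  # independence, D = 0
print(d_prime(0.50, 0.50, 0.50))  # complete association
```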
The fit of the genotype frequencies to the Hardy-Weinberg proportions was tested by Guo and Thomson's exact test (Guo and Thomson 1992) with the program ARLEQUIN 1.0 (Schneider et al. 1997), and the heterogeneity in the allele frequencies of the parental populations was analyzed by means of the STRUC program of the GENEPOP 2.0 computer package (Raymond and Rousset 1995).
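For readers unfamiliar with Hardy-Weinberg proportions, the sketch below compares observed genotype counts at a biallelic locus with the counts expected from the allele frequency. It uses a plain chi-square goodness-of-fit statistic as a simpler stand-in. The study itself used Guo and Thomson's exact test in ARLEQUIN, which is not reproduced here. The function name and genotype counts are hypothetical.

```python
def hardy_weinberg_chi_square(n_11, n_12, n_22):
    """Chi-square statistic for fit to Hardy-Weinberg proportions.

    n_11, n_12, n_22: observed counts of the *1/*1, *1/*2 and *2/*2 genotypes.
    Returns (frequency of allele *1, expected counts, chi-square statistic).
    """
    n = n_11 + n_12 + n_22
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_11 + n_12) / (2 * n)  # frequency of the *1 allele
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_11, n_12, n_22)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, expected, chi_square


# Hypothetical counts: the first sample fits exactly, the second has too few heterozygotes.
print(hardy_weinberg_chi_square(25, 50, 25))
print(hardy_weinberg_chi_square(30, 40, 30))
```

With one degree of freedom, a statistic above about 3.84 would be significant at the 5% level under the usual large-sample approximation.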
Admixture Estimates Based on Autosomal PSAs
Using nine autosomal PSA markers, we estimated the admixture proportions in samples from several populations of African descent. We also typed parental population samples from Africa (Nigeria and Central African Republic) and Europe (England, Ireland, and Germany) to verify the PSA status of the loci, to test for intracontinental heterogeneity, and to estimate the parental allele frequencies. Table 4 shows the allele frequencies estimated for the populations typed. All of the loci we used, except GC, are biallelic, and we show the frequency of the *1 allele, following the convention that the *1 allele corresponds to the larger band on the gel because of either the presence of an insertion or the absence of a restriction enzyme cut site. In Table 4, we also show the average of the parental allele frequencies for African and European populations and the levels of allele frequency differential for each marker. We detected no systematic deviations of the genotype frequencies with respect to the Hardy-Weinberg proportions in any population or marker (data not shown).
Table 5 summarizes the admixture results for the African American and Jamaican populations that we have analyzed. Shown is the city and state where the sample was collected and the proportion of European ancestry (m) in the population, obtained by two different methods to estimate admixture—the WLS method (Long 1991) and the gene identity method (Chakraborty 1975). The results of these methods are highly concordant (r=0.9949, P<.001). The level of European admixture in these groups ranges from 6.8% in Jamaica to 22.5% in New Orleans. In the northern urban populations, we observed m values between 12.7% (Philadelphia) and 20.2% (Pittsburgh). It is important to note that two independent samples from African Americans living in Philadelphia point to a relatively low European contribution (12.7% and 13.8%, respectively). Southern African Americans show a wide range of European influence, from 11.6% (Charleston) to 22.5% (New Orleans), the lowest and highest values, respectively, observed for the U.S. populations we analyzed. Finally, the sample from Jamaica shows evidence of a much lower European genetic contribution (6.8%) than that found in any of the African American populations. The variance associated with the admixture estimates is very low for all the populations studied. By Long's method, which incorporates the effect of the evolutionary and sampling variance in the admixture estimates, the standard errors range between 1.3% (Charleston and Jamaica) and 2.7% (Detroit).
We also tested for heterogeneity in the individual locus admixture estimates within the populations sampled. The χ² test showed no evidence of significant heterogeneity in any of the populations (data not shown), and we observed no systematic deviations for any of the loci and therefore no evidence of the action of natural selection in the markers considered in the present analysis.
Admixture Estimates Based on mtDNA and Y Chromosome Data
We analyzed these 10 African American and Jamaican samples for the presence of six population-specific mtDNA haplogroups (L, H, A, B, C, and D) and the YAP element. The relevant data are summarized in Table 6. L and H are the most common haplogroups that are unique to African and European populations, respectively (Torroni et al. 1994, 1996; Chen et al. 1995), and can be used to test the relative African and European maternal contribution to African Americans and Jamaicans. The first two data columns of Table 6 indicate the m values based on the L and H haplogroups, and, in the third data column, we indicate the average mtDNA value. The European maternal contribution is lower than the average estimate obtained for the nine autosomal markers analyzed in this study (see Table 5).
Haplogroups A, B, C, and D are Amerindian-specific haplogroups that together account for almost all Amerindian mtDNAs (Wallace and Torroni 1992) and are thus especially suitable for testing the importance of the Amerindian influence in the African American maternal line. Of the >1,000 African Americans analyzed, we detected only 4 individuals with an Amerindian haplogroup. Two individuals in Maywood, one in Baltimore, and one in Houston showed the Amerindian B haplogroup. Several other samples have the 9-bp deletion, but since it appears to be associated with the L African haplogroup and lacks the characteristic pattern observed in Amerindian B haplogroups for the diagnostic sites DdeI 10394 and AluI 10397 (−−), it is most likely of African origin (Soodyall et al. 1996).
The YAP marker (Hammer 1994) is very useful for the characterization of the male European contribution, given the difference in frequency of the Alu insertion between Europeans and Africans (>80%). The m estimates are also indicated in Table 6. The male European contribution is substantially higher than the female contribution in every population, as is evident from the estimated m values obtained for YAP and mtDNA.
Demonstration of Admixture Linkage Disequilibrium between Two Markers 22 cM Apart
Two of the PSA markers used to estimate admixture (FY and AT3) are located in the same chromosomal band. In fact, mapping data show that FY and AT3 are linked at a distance of 22 cM (male distance = 18 cM and female distance = 23 cM [Cooperative Human Linkage Center, Genetic Location Database]). We created pairwise haplotypes of FY and the other eight loci to test whether there is detectable linkage disequilibrium between FY and AT3 or between FY and any of the other PSAs typed. Haplotype frequencies were estimated by means of the expectation maximization algorithm as implemented in a program provided by Dr. Long (1995). This method has proved capable of generating very accurate estimates of multilocus haplotype frequencies without families. Table 7 shows the level of D′, the likelihood ratio statistic, and the corresponding P value for significant results. A positive D′ indicates a higher-than-expected frequency of haplotypes with both African-specific alleles, and a negative D′ indicates the combination of a European allele in one locus with an African allele in the other locus. In the case of the haplotype frequencies of FY and AT3, a positive disequilibrium is consistently found in all the African American populations (with the exception of Maywood, which is in equilibrium), and in 6 of the 10 populations (New York, Baltimore, Charleston, New Orleans, Houston, and Jamaica) there are significant differences with respect to the expected frequencies. With the Bonferroni correction for multiple tests (α=0.005), the deviations are still significant in two populations (New York and New Orleans, P<.001) and border on significance in Baltimore (P=.006). We constructed haplotypes of the other seven loci with FY to test whether the significant association observed between FY and AT3 is truly a function of the linkage between these two markers or is the result of genomewide association among informative PSA markers due to substructure. 
In these comparisons we observe both positive and negative D′ values, and only 7 of the 70 tests show significant deviations. After the Bonferroni correction for multiple tests, none of the deviations were significant.
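For readers unfamiliar with the statistic, the following sketch shows how a signed D′ can be computed from a two-locus haplotype frequency and the two allele frequencies, using the standard normalization by the maximum attainable disequilibrium. It also shows a Bonferroni threshold. Reading the α=0.005 in the text as 0.05 divided by 10 populations is an assumption on our part. The frequencies are hypothetical, and haplotype frequencies in the study came from an expectation maximization program that is not reproduced here.

```python
def d_prime(p_ab, p_a, p_b):
    """Signed, normalized linkage disequilibrium D' between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b                      # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else 0.0    # sign of D is preserved


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical frequencies for two African-specific alleles
print(f"D' = {d_prime(0.70, 0.80, 0.82):.3f}")
print(f"corrected alpha = {bonferroni_alpha(0.05, 10):.3f}")
```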
We have estimated the admixture proportions in 10 populations from different geographic areas in the United States and Jamaica, using a set of very informative autosomal markers. These values can be compared with those reported in the literature (Table 8). Our estimate for the Pittsburgh sample (20.2%±1.6%) is not significantly different from the one obtained by Chakraborty et al. (1992) for the same population (25.2%±2.7%), employing the identical statistical method (Long's WLS method). The m value for New York (19.8%) is also consistent with previously reported estimates (18.9%; Reed 1969). However, there are also several discrepancies with respect to data published elsewhere. Our estimate for Baltimore (15.5%) does not seem to agree with the estimates based on Rh, GM, and FY (Glass and Li 1953; Glass 1955; Workman 1968; Reed 1969), >20% in all cases. A similar situation is observed in the sample from Detroit, which shows a lower level (16.3%) in the present study than in previous studies (26%, Reed 1969). With respect to the southern populations, our m value for Charleston (11.6%) is slightly higher than previous estimates (4%–8%, Workman 1968). There are no data concerning the other populations included in this analysis (Maywood, Philadelphia, New Orleans, Houston, and Jamaica).
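As a rough check of the Pittsburgh comparison above, two estimates with standard errors can be compared with a simple z statistic. The sketch assumes the two estimates are independent and approximately normal. The text does not state which test was actually used, so this is an illustration only.

```python
import math


def z_for_difference(est_a, se_a, est_b, se_b):
    """z statistic for the difference between two independent estimates."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values quoted in the text, in percent: Chakraborty et al. (1992) vs. this study
z = z_for_difference(25.2, 2.7, 20.2, 1.6)
print(f"z = {z:.2f}")                          # about 1.59
print("significant at 5% level:", abs(z) > 1.96)
```

A difference of 5 percentage points against a combined standard error of about 3.1 gives z below 1.96, which agrees with the statement that the two estimates are not significantly different.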
Previous studies have indicated that northern U.S. populations show a higher level of European ancestry than do southern U.S. populations. Nevertheless, the results of the present study seem to indicate that the situation is much more complex than previously thought. There appears to be a significant degree of variation in the admixture level of northern populations (from 13% in Philadelphia to 20% in Pittsburgh). It is also clear that, in general, the European ancestry of northern African American populations is somewhat lower than previous reports have described. The agreement of estimates based on independent African American population samples from Philadelphia is notable and strengthens the support for the accuracy of these estimates.
The three southern African American populations (New Orleans, Houston, and Charleston) show a wide range of admixture values (11.6%–22.5%). The Charleston population is of special interest because data on admixture proportions in African Americans from the former southern British colonies (South Carolina and Georgia) have been used to postulate differences in gene flow between the northern and southern African American populations. The population of Charleston shows the lowest m value (11.6%) of all the U.S. populations analyzed in the present study, but it is not very different from the estimates of one of the northern African American populations—namely, Philadelphia. It would be very interesting to have data on additional samples of southern African American populations to confirm the existence of a low European contribution in this particular area and to study the extent of heterogeneity in the admixture proportions at this geographical level.
One explanation for the lower-than-expected and heterogeneous levels of European admixture in the urban northern African Americans can be formed on the basis of the demographic history of African American populations. In the period after World War I, there were significant changes in the distribution of African Americans in the United States. In the largest internal migration in the history of North America, southern African Americans, constituting the immense majority (∼90%) of the total African American population, left the rural South in search of new opportunities in the urban areas of the North. It is known that big cities such as Chicago, Detroit, New York, Philadelphia, Pittsburgh, and Baltimore experienced a very significant increase in the number of African American residents, both in absolute and in relative terms (Johnson and Campbell 1981; Tanner 1995). Given the existence of a North/South cline in admixture proportion, the lower European admixture observed in particular populations may be due to a larger share of more recent immigrants from the rural South. Unfortunately, we have no data concerning the geographic origin of the individuals in any of our samples, so there is no direct way to test this hypothesis. Further knowledge of the European genetic contribution to African American populations from additional southern states that greatly contributed to the “Great Migration” (the cotton belt states of Mississippi, Alabama, and Georgia) and the availability of northern samples with family demographic information would be important to clarify this point.
In any case, our study shows that not all the southern African American populations have as low a European genetic contribution as that found in the Charleston sample. The estimate for Houston (16.9%) is similar to other values observed in northern urban populations (Detroit and Baltimore), and New Orleans shows the highest m value of the cities studied (22.5%), which deserves special attention. The history of the Louisiana territory has been quite different from the history of other southern regions in the United States. This area was under French rule for a substantial period, until it became part of the Spanish territory in 1763 and, finally, of the United States some decades later, in 1803. Both the geographic origin of the slaves imported to Louisiana and their status during the French domination have been distinct from what happened in the southern British colonies (e.g., South Carolina). There have been historical accounts of more substantial intermixture in the New Orleans area (Williamson 1995; Piersen 1996), so this could partly explain the observed differences in ancestral proportions between Charleston and New Orleans.
Finally, we also characterized the European admixture in a sample from Jamaica, which shows a very low m value (6.8%). Further studies of Caribbean populations of African ancestry are needed to confirm this low European genetic contribution.
The standard errors of the estimates are very small, ranging from 1.3% (Charleston and Jamaica) to 2.7% (Detroit). It is not possible to directly compare the magnitude of the standard errors of our estimates with those of many of the classical estimates in the literature, which were based mainly on single markers and used a different statistical methodology (in which only the sampling error, but not the evolutionary error, was taken into account). However, we may use as a reference the paper of Chakraborty et al. (1992), in which Long's WLS method was used to analyze data on 52 alleles at 15 protein-coding loci in a sample of African Americans living in Pittsburgh. All of our estimates have a lower associated standard error than that (2.7%) reported by Chakraborty et al. (1992). This comparison stresses the importance of an appropriate selection of markers for a precise estimate of the admixture proportions. Another critical factor for admixture estimation is the representative parental population samples that are available. An inadequate selection of parental populations may seriously bias estimates of admixture. An interesting example is the well-known estimate of Glass and Li (1953) of the European gene contribution to the Baltimore population on the basis of the Rh system. The original estimate was 31%, which, in light of new data on African frequencies, was revised, 2 years later, to 22% (Glass 1955). We have typed three African samples (two from Nigeria and one from Central African Republic) and three European samples (from Great Britain, Ireland, and Germany) to estimate the parental frequencies. The aforementioned populations contributed substantially to the origin of the African American populations. These African populations are reasonably good representatives of the populations involved in the slave trade that affected a wide area of western and west central Africa.
In addition, none of the markers we tested show any evidence of heterogeneity in the gene frequencies of the three samples representative of the African parental populations, which minimizes the possibility of introducing bias due to the unequal contributions of the different slave areas to the original populations of African descent living in the United States. With respect to the European samples, England, Ireland, and Germany have been main sources for the European migration to North America (Tanner 1995). Other relevant areas of Europe (e.g., Italy) are not represented in this study, but, given the known genetic homogeneity of the European populations, it is unlikely that this would affect the admixture estimates in any significant way. Supporting this is the fact that the gene frequencies of the three European American populations analyzed here are very similar to the European average frequencies (Table 4). The European samples also show homogeneity for the gene frequencies of almost all markers, with the exception of LPL and Sb19.3.
In addition to the data on the autosomal markers, further insight on the nature and dynamics of admixture may be obtained by using maternally and paternally transmitted markers (mtDNA and the nonpseudoautosomal region of the Y chromosome, respectively). The results of this analysis strongly indicate a sex-biased European contribution, in contradiction with the only other information available to date (Hsieh and Sutton 1992). In every population there is evidence of a higher European male contribution, as indicated by the m values obtained for YAP and mtDNA. Therefore, even if marriages between African American men and European American women are currently more common than marriages between African American women and European American men (see, e.g., Wilkinson 1975 and Piersen 1996), it seems clear that during a substantial part of African American history, men of European descent have made a more significant genetic contribution to the African American gene pool than have women of European descent. This is in accordance with the historical data regarding the period of slavery in the United States (Williamson 1995).
We have also tried to clarify the extent of the Amerindian contribution to the African American gene pool. There have been accounts of substantial contact among North American Indians and people of African descent in specific periods of U.S. history, especially in regions such as the Mississippi delta and Florida (Katz 1986). Some early anthropological reports have emphasized the high proportion of African American college students claiming some Amerindian ancestry (Herskovits 1930; Meier 1949). In fact, the importance of the Amerindian contribution to the African American gene pool has been a matter of controversy since the first studies of African American admixture (Roberts 1955; Glass 1955). However, practically all admixture studies of African American populations to date have employed a dihybrid model (African/European) instead of a trihybrid model (African/European/Amerindian). We tested our African American samples for the presence of the common Amerindian-specific mtDNA haplogroups (A, B, C, and D), and detected just 4 individuals with an Amerindian haplogroup, among >1,000 African Americans. This indicates that the contribution from Amerindians has been of little importance in the 10 populations of African descent we have characterized, at least on the maternal line.
We also determined the extent of the African contribution to three European American populations from several areas in the United States: Detroit, Pittsburgh, and Louisiana (Cajuns). The presence of the FY null allele in the three populations clearly indicates an introgression of African genes into the European American gene pool, but the overall African contribution seems to have been very limited, on the order of 1% (mean ± SEM: Detroit 0.5%±0.7%; Pittsburgh 1.2%±0.9%; and Cajuns 0.7%±0.6%).
Application of Admixed Populations for Mapping Disease Genes
In the last few years, interest in admixed populations has been increasing. In 1988, Chakraborty and Weiss described a new method for mapping disease genes, using admixed populations. This method is based on the linkage disequilibrium created when two ethnically distinct populations hybridize, and it should be very useful for mapping disease genes showing high prevalence differences among the parental populations. Non–insulin-dependent diabetes mellitus and obesity (disproportionately common among Hispanics and African Americans), hypertension (among African Americans), lung and prostate cancer (among African Americans), and other anthropological traits could be studied by this method. Stephens et al. (1994) and Briscoe et al. (1994) further extended the work of Chakraborty and Weiss (1988), using computer simulations, and introduced the acronym “MALD” (mapping by admixture linkage disequilibrium) to designate this method. Their results indicated that, using sample sizes of 200–300 patients, typed for 200–300 evenly spaced markers, each having >30% frequency difference between the parental populations, one would have a >95% chance of locating the causative gene. Our own simulations (unpublished data) show that microsatellites would be as informative as dimorphic markers for MALD studies. It has also been proposed by McKeigue (1997) and Kaplan et al. (1997) that the linkage disequilibrium that results from recent admixture could also be used to detect disease genes for qualitative or quantitative traits by means of the transmission disequilibrium test (Spielman et al. 1993; Allison 1997). The aforementioned theoretical studies predict that linkage disequilibrium would be detectable in admixed populations even between relatively distant markers (10–20 cM). As described above, we detected a significant nonrandom association between two of the PSAs analyzed in the present study: FY-null and AT3, located on the long arm of chromosome 1 at a distance of 22 cM.
The most likely explanation is that this association is the result of admixture linkage disequilibrium that was generated through the hybridization of the parental populations of these African American populations and has persisted, extending over a distance of 22 cM. Population substructure could also potentially result in nonrandom associations among such PSA loci. However, if this were the cause of the FY/AT3 associations observed in these populations, we would expect to detect a higher number of significant associations for the other seven loci and would also expect to observe deviations from Hardy-Weinberg equilibrium. This observation of a significant linkage disequilibrium over such long distances as a result of admixture is encouraging, and it emphasizes the utility of admixed populations as an important resource for mapping disease genes.
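Why admixture can leave detectable disequilibrium at 22 cM can be illustrated with a textbook single-pulse model. This is a simplifying sketch, not the simulation used by the authors or by the MALD papers. It assumes one admixture event, random mating afterward, and no disequilibrium within either parental population. Under those assumptions the initial disequilibrium is m(1 − m) times the product of the two allele frequency differentials, and it decays by a factor (1 − θ) per generation, where θ is the recombination fraction. The Haldane map function is used here to convert centimorgans to θ. The differentials and generation counts are hypothetical.

```python
import math


def haldane_theta(centimorgans):
    """Recombination fraction from map distance, using Haldane's map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * centimorgans / 100.0))


def admixture_ld(m, delta_a, delta_b, theta, generations):
    """Expected disequilibrium D after a single admixture pulse.

    m: proportion contributed by one parental population
    delta_a, delta_b: allele frequency differentials at the two loci
    theta: recombination fraction between the loci
    generations: generations of random mating since admixture
    """
    d0 = m * (1.0 - m) * delta_a * delta_b    # D created by mixing the gene pools
    return d0 * (1.0 - theta) ** generations


theta = haldane_theta(22.0)                    # roughly 0.18 for 22 cM
for t in (0, 5, 10):                           # hypothetical generation counts
    d = admixture_ld(0.2, 0.9, 0.8, theta, t)  # hypothetical differentials
    print(f"generation {t:2d}: D = {d:.4f}")
```

Real admixture was continuous rather than a single pulse, which would tend to maintain more disequilibrium than this simple model predicts.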
The present study indicates that the admixture process that began with the arrival of the first Africans at the British colonies >250 years ago has been very complex. Even if our data tend to corroborate the existence of differences in the extent of European contribution to southern and nonsouthern African Americans, it seems that recent demographic processes that have dramatically changed the distribution of African Americans in the United States have substantially altered the global picture. Of special importance has been the Great Migration, a massive movement of African Americans from the rural areas in the South to the urban areas in the North, which took place after World War I. Thus, it is possible that the differences in the admixture proportion observed among African American samples in northern cities are a consequence of the different percentages of African Americans of southern origin currently living in those areas. In addition, the substantial differences in the histories of the diverse areas of the United States may account for the variation observed in the admixture proportions. Such seems to be the case for New Orleans, which shows a much higher European contribution than that found in Charleston.
Admixed populations are an important resource that can and should be used to study the genetics of complex disease. A prerequisite to this application is a better understanding of the admixture proportions and dynamics of the admixture process. We have established a panel of genetic markers that have high levels of allele frequency differential between the parental populations and low levels of heterogeneity within continents. We propose that these markers could serve as a core marker panel for future studies of admixture in additional population samples. It is notable that most of these markers also show substantial frequency differences between African and Amerindian populations (data not shown) and should thus also be useful for estimating the African contribution to U.S. Hispanic and Amerindian populations. Use of a common set of informative markers for studies of admixture will make it possible to compile data from a number of populations and laboratories to construct a U.S. admixture map.