Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies. Following the arrival of Columbus and his contemporaries, population expansion in the Americas has proceeded at an exceptionally rapid pace, with factors such as war, slavery, disease and climate shaping human demography. Our first indication that demography could be inferred from genomic sharing among present-day Americans was the relationship we observed between US geography and the projection of state-level IBD summary statistics onto their first two principal components (PCs); PC 1 is correlated with north-south geography, and PC 2 is correlated with east-west (Fig. Following this initial observation, we turned to using IBD to discover previously unidentified population structure. We applied a weight function to each edge, setting the edge weight .
We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records.
Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow.
Points on the map with higher odds ratios indicate geographic locations that are more associated with cluster membership.
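As an illustration of the statistic behind such a map, the odds ratio for one location and one cluster can be computed from a 2x2 table of counts. The sketch below is a minimal example, not the procedure used in the study: the function name, its argument names, and the 0.5 pseudocount (a common continuity correction that avoids division by zero) are all assumptions introduced here.

```python
def location_odds_ratio(in_cluster_here, in_cluster_elsewhere,
                        out_cluster_here, out_cluster_elsewhere,
                        pseudocount=0.5):
    """Odds ratio of being at a location for cluster members vs non-members.

    All four arguments are counts of individuals (hypothetical inputs).
    The pseudocount is an assumed continuity correction, not from the source.
    """
    a = in_cluster_here + pseudocount        # members at this location
    b = in_cluster_elsewhere + pseudocount   # members at other locations
    c = out_cluster_here + pseudocount       # non-members at this location
    d = out_cluster_elsewhere + pseudocount  # non-members at other locations
    # (a / b) is the odds for members, (c / d) the odds for non-members.
    return (a * d) / (b * c)


# Toy counts: 30 of 100 members and 10 of 100 non-members are at the location.
toy_or = location_odds_ratio(30, 70, 10, 90, pseudocount=0)
```

An odds ratio above 1 means cluster members are over-represented at that location relative to everyone else, which is what a darker point on such a map would indicate.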
These data are made available in the public domain (Creative Commons CC0).
Recent genetic studies of the United States and North America have drawn insights into ancient human migrations.
These insights have been primarily drawn from modelling variation in allele frequencies (for example, refs 11, 12, 13, 14, 15), which typically diverge slowly.
Principal components (PCs) are computed using kernel PCA, in which the kernel matrix is defined by total IBD between pairs of states, normalized to remove the effect of variation in within-state IBD.
US states that share high levels of IBD on average are placed closer to each other in the projection onto the first two principal components.
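The projection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the normalization shown (dividing the IBD between two states by the geometric mean of their within-state IBD) is one plausible way to remove within-state variation, the eigenvectors are found by simple power iteration (which assumes the centered kernel is positive semi-definite), and all function names and the toy matrix are hypothetical.

```python
import math


def normalize_kernel(ibd):
    """Assumed normalization: IBD(i, j) / sqrt(IBD(i, i) * IBD(j, j))."""
    n = len(ibd)
    return [[ibd[i][j] / math.sqrt(ibd[i][i] * ibd[j][j]) for j in range(n)]
            for i in range(n)]


def center_kernel(k):
    """Double-center a symmetric kernel matrix (standard kernel PCA step)."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    grand_mean = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def top_eigenpairs(m, n_components=2, n_iter=500):
    """Leading eigenpairs of a symmetric PSD matrix: power iteration + deflation."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy; deflation modifies it
    pairs = []
    for c in range(n_components):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * mv[i] for i in range(n))  # Rayleigh quotient (v is unit)
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def kernel_pca_coords(ibd, n_components=2):
    """Coordinates of each state on the leading kernel principal components."""
    centered = center_kernel(normalize_kernel(ibd))
    pairs = top_eigenpairs(centered, n_components)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(ibd))]


# Toy total-IBD matrix for four hypothetical states: 0 and 1 share heavily,
# as do 2 and 3, with little sharing across the two pairs.
toy_ibd = [[10, 8, 1, 1],
           [8, 10, 1, 1],
           [1, 1, 10, 8],
           [1, 1, 8, 10]]
coords = kernel_pca_coords(toy_ibd)
```

In the toy matrix, the two pairs of heavily sharing states fall on opposite sides of the first component, mirroring how states with high average IBD sharing are placed close together.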
Since these small clusters were difficult to interpret and may correspond to subpopulations that have poor representation in our database, or to unusually over-represented families, we did not investigate them further.
To examine finer-scale population structure, we formed five sub-networks corresponding to the five largest clusters, then partitioned these sub-networks using the same clustering algorithm.
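The two-level procedure can be sketched as follows. The clustering algorithm itself is not named in this passage, so the example takes it as a parameter and supplies a deliberately simple stand-in (drop edges below the mean weight of the current network, then take connected components). That stand-in, the function names, and the toy network are all assumptions for illustration; only the outer structure (cluster, keep the largest clusters, re-cluster each induced sub-network with the same function) follows the text.

```python
from collections import defaultdict


def threshold_components(edges):
    """Stand-in clusterer (an assumption, not the study's algorithm):
    keep edges with weight >= the mean weight, return connected components."""
    if not edges:
        return []
    mean_w = sum(w for _, _, w in edges) / len(edges)
    adj = defaultdict(set)
    nodes = set()
    for u, v, w in edges:
        nodes.update((u, v))
        if w >= mean_w:
            adj[u].add(v)
            adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(nodes):  # sorted for deterministic output
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(adj[node] - seen)
        clusters.append(comp)
    return sorted(clusters, key=len, reverse=True)  # largest first


def induced_edges(edges, members):
    """Edges of the sub-network whose endpoints both lie in `members`."""
    return [(u, v, w) for u, v, w in edges if u in members and v in members]


def two_level_clusters(edges, cluster_fn, n_largest=5):
    """Cluster the network, then re-cluster each of the largest clusters."""
    top = cluster_fn(edges)
    return [(c, cluster_fn(induced_edges(edges, c))) for c in top[:n_largest]]


# Toy weighted network: two groups (a*, b*), each with two tighter sub-groups.
toy_edges = [
    ("a1", "a2", 1.0), ("a3", "a4", 1.0), ("a2", "a3", 0.6),
    ("b1", "b2", 1.0), ("b3", "b4", 1.0), ("b2", "b3", 0.6),
    ("a4", "b1", 0.1), ("a1", "b4", 0.1), ("a2", "b2", 0.1), ("a3", "b3", 0.1),
]
hierarchy = two_level_clusters(toy_edges, threshold_components)
```

Because the clusterer is passed in as `cluster_fn`, any community detection routine that maps a weighted edge list to a list of node sets could be substituted without changing the outer loop.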