Background

The neotropical tree species Cedrela odorata is a target of illegal logging and has been documented in regional timber trade for over 250 years (Pennington and Muellner, 2010). Of these species, Cedrela odorata (Spanish cedar; cedro), is the second most traded neotropical tree species, following bigleaf mahogany (Swietenia macrophylla). In 2001, C. odorata was listed under the protections of CITES Appendix III requiring validated documentation of species identity and source for both export and import documentation, protecting populations in Bolivia, Brazil, Colombia, Guatemala, and Peru (Ferriss, 2014). Consequently, two other species (C. fissilis Vell. and C. lilloi syn. angustifolia) have also been afforded protection under CITES Appendix III, since their survival is threatened due to strong similarities with C. odorata and its trade-demand (Ferriss, 2014; UNEP-WCMC, 2015). Despite the anticipated increase in protection for C. odorata and its “look-alikes”, law enforcement of CITES regulations and the continuance of CITES protections requires credible estimates of species presence in trade, accurate geographic distributions, and up-to-date extinction risk (Text of the Convention; Contracting States, 1973). However, differentiating Cedrela species is challenging, and differentiating species based on wood tissue alone is nearly impossible, even for experts. The data presented here are part of a larger project to provide genomic resources for Cedrela, including a SNP genotyping assay for species and origin identification.

Although the geographic distribution of Cedrela odorata is widespread, occurring from Mexico to northern Argentina (24º N to 27º S) and including the islands of the Caribbean (Pennington and Muellner, 2010), it primarily exists in low-density stands with maybe one individual per hectare in Peru, Costa Rica, Colombia, and Guyana (Tajikistan 2019). Strong light dependency and pests in its native range lead to poor natural regeneration of stands and fragmented populations (Cintron, 1990; Pennington and Muellner, 2010). For example, reduced population connectivity may limit the spread of regional genotypes that are tolerant to the mahogany shoot borer, Hypsipyla grandella Zeller (Newton et al., 1999). Deforestation has led to a nearly 30% decline in the global distribution of C. odorata in the last 100 years, and that estimate is expected to increase by over 10% in the next 100 years (Tajikistan 2019).The primary threat to C. odorata and other Cedrela is decreased population connectivity from exploitative logging (Muellner et al., 2010, 2011; Pennington and Muellner, 2010; Cavers et al., 2013), and the demand for Cedrela wood is insatible.

Oddly, Cedrela grow very well outside of their native range, even being considered invasive in some areas. This could be a example of the, “enemy release hypothesis,” in action, where species escape the primary threat to their population growth outside of their native range (e.g., the mahogany shoot borer). One place where Cedrela are considered invasive is the Galapagos archipeligo (Ecuador). Part of this project is to determine the source of this population, and this is the data presented here and the reseach question for this session.

Data presented here are next generation sequence data from dissertation research (KNF advised by: Andy Jones [Oregon State University], Rich Cronn [USDA Forest Service PNW Research Station]) and an active collaboration with Gonzalo Rivas-Torres and Maria de Lordes Torres of Universidad San Francisco de Quito. Dissertation specimens were collected from the Missouri Botanical Garden Herbarium, and Galapagos/Ecuador specimens of interest were provided by Rivas-Torres and de Lourdes-Torres. Specimen sequence data was either generated by target capture, amplicon sequencing, or both.

The full specimen alignment contained 282 specimens and nearly 1 million SNPs. Variant call format file (vcf) was generated with BCFtools mpileup Filtering was performed with VCFtools: (biallelic, minimum depth = 100, 0% missing data, and 1 SNP per 10kb). For the exercise, I focused on Ecuador.

The vcf provided here contains 127 individuals and 725 SNPs.

#To begin, set your working directory to the directory containing this R script manually or with the following code.

setwd("~/Documents/UW/Finch_R_tutorials/DAPC_ADMIXTURE/")

#read in the meta data for the specimens (contains species and geographic information)
meta<-read.csv("20220213_snp_seminar_samples.csv",header=1)

#we will label the next map with country label, so let's read those in now:
Labs<-read.csv("country_labels_2020.csv",header=1)

#we will use the following packages to build a map
library(ggplot2)
library(maps)
library(maptools)
library(raster)
library(rgdal)
library(rgeos) 
library(scales)
library(sp)

Learn more about ggplot2 from the cheat sheet: https://www.maths.usyd.edu.au/u/UG/SM/STAT3022/r/current/Misc/data-visualization-2.1.pdf

#The following lines bring in the shapefiles for the map (i.e., coutries). Maps were drawn using the base map shapefiles from the World Borders Dataset http://thematicmapping.org/.

worldMap <- readOGR(dsn="map_files/TM_WORLD_BORDERS-0.3.shp", layer="TM_WORLD_BORDERS-0.3")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/finchkri/Documents/UW/Finch_R_tutorials/DAPC_ADMIXTURE/map_files/TM_WORLD_BORDERS-0.3.shp", layer: "TM_WORLD_BORDERS-0.3"
## with 246 features
## It has 11 fields
## Integer64 fields read as strings:  POP2005
worldMap.fort <- fortify(worldMap, region = "ISO3")
idList <- worldMap@data$ISO3
centroids.df <- as.data.frame(coordinates(worldMap))
names(centroids.df) <- c("Longitude", "Latitude")
popList <- worldMap@data$POP2005
pop.df <- data.frame(id = idList, centroids.df)

Zoomed out version of the map showing specimens color coded by species names appearing on herbarium specimen labels or provided by the collector.

ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object 
    geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
    expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
    coord_equal(xlim = c(-103,-55), ylim = c(-25, 22)) + 
    geom_point(data=meta, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
    geom_point(data = meta, aes(map_id=NA,x = long, y = lat, color=species_full), 
               size = 2, alpha=0.7)+
    guides(color = guide_legend("Species name provided"))+
    geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
    theme(legend.text = element_text(face = "italic",size=8),
          axis.text.x=element_text(size=7),
          axis.title.x=element_text(size=9),
          axis.text.y=element_text(size=7),
          axis.title.y=element_text(size=9),
          legend.background = element_blank(),
          legend.key = element_blank(),
          panel.border = element_rect(fill=NA))+
    xlab("Longitude")+ylab("Latitude")

I want to simplify the map, so we’ll make a filter list with minor species, or species that are not included in the current monograph of the genus.

#How many species and how many specimens of each?
table(meta$species_full)
## 
## Cedrela  petiolulata Cedrela angusticarpa Cedrela angustifolia 
##                    2                    4                    8 
##  Cedrela domatifolia      Cedrela falcata     Cedrela fissilis 
##                    5                    4                    3 
##  Cedrela kuelapensis   Cedrela magnifolia       Cedrela minima 
##                    1                    1                    5 
##      Cedrela montana     Cedrela nebulosa      Cedrela odorata 
##                   19                   18                   47 
##    Cedrela pubescens    Cedrela saltensis          Cedrela spp 
##                    5                    3                    2
#show me species names only as a list
levels(factor(meta$species_full))
##  [1] "Cedrela  petiolulata" "Cedrela angusticarpa" "Cedrela angustifolia"
##  [4] "Cedrela domatifolia"  "Cedrela falcata"      "Cedrela fissilis"    
##  [7] "Cedrela kuelapensis"  "Cedrela magnifolia"   "Cedrela minima"      
## [10] "Cedrela montana"      "Cedrela nebulosa"     "Cedrela odorata"     
## [13] "Cedrela pubescens"    "Cedrela saltensis"    "Cedrela spp"
#make filter list
other_spp<-list()
other_spp<-c("Cedrela  petiolulata","Cedrela angusticarpa","Cedrela domatifolia" ,"Cedrela falcata",     
             "Cedrela kuelapensis","Cedrela magnifolia", "Cedrela minima",
             "Cedrela spp","Cedrela pubescens")
library(dplyr) #load package dplyr

Learn more about dplyr (one of the best packages ever for data wrangling) https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

#make subsetted datasets by filtering to exclude and include "lesser species"
main_taxa<-filter(meta,!species_full%in%other_spp) #! means "not"
other_taxa<-filter(meta,species_full%in%other_spp)

#add a new column called "species_simple" for the greater taxa, this matches species names
main_taxa$species_simple<-main_taxa$species_full
#for the lesser taxa, we just want to populate a column with "Other spp"
other_taxa$species_simple<-"Other spp"

#row bind these two subsets again
meta<-rbind.data.frame(main_taxa,other_taxa)

#view our new names column
head(meta)
##   vcf_index     vcf_id sampleID country inst species         species_full
## 1         5 CEAN_18375    18375 Ecuador USFQ    CEAN Cedrela angustifolia
## 2         6 CEAN_18387    18387 Ecuador USFQ    CEAN Cedrela angustifolia
## 3         7 CEAN_18439    18439 Ecuador USFQ    CEAN Cedrela angustifolia
## 4         8 CEAN_18451    18451 Ecuador USFQ    CEAN Cedrela angustifolia
## 5         9 CEAN_18481    18481 Ecuador USFQ    CEAN Cedrela angustifolia
## 6        10   CEAN_208 CEAN_208 Ecuador   MO    CEAN Cedrela angustifolia
##         lat      long     F_MISS          genetic_spp galapagos
## 1 -2.263100 -78.19230 0.00789937 Cedrela angustifolia        no
## 2 -0.202200 -78.43060 0.01476840 Cedrela angustifolia        no
## 3 -4.349170 -79.75389 0.01828880 Cedrela angustifolia        no
## 4 -0.185000 -78.48690 0.00369210 Cedrela angustifolia        no
## 5 -3.315588 -78.56496 0.00635384 Cedrela angustifolia        no
## 6 -3.830270 -79.42027 0.03769370      Cedrela montana        no
##         species_simple
## 1 Cedrela angustifolia
## 2 Cedrela angustifolia
## 3 Cedrela angustifolia
## 4 Cedrela angustifolia
## 5 Cedrela angustifolia
## 6 Cedrela angustifolia

Zoomed in version of the map with a simplified color scheme.

ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object 
    geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
    expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
    coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
    geom_point(data=meta, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
    geom_point(data = meta, aes(map_id=NA,x = long, y = lat, color=species_simple), 
               size = 2, alpha=0.7)+
    geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
    theme(legend.text = element_text(face = "italic",size=8),
          axis.text.x=element_text(size=7),
          axis.title.x=element_text(size=9),
          axis.text.y=element_text(size=7),
          axis.title.y=element_text(size=9),
          legend.background = element_blank(),
          legend.key = element_blank(),
          panel.border = element_rect(fill=NA))+
    xlab("Longitude")+ylab("Latitude")

Now that we know a little more about the data and sample distribution, we can load in the VCF and do some cool visualizations.

Next we will use vcfR and adegenet. There is so much to learn about adegenet and we will barely scratch the surface here.

Learn more: http://adegenet.r-forge.r-project.org/

Also see excellent tutorials for from code nerds at Oregon State https://knausb.github.io/vcfR_documentation/

Some of the code here was adapted from the same badass Beavers: https://grunwaldlab.github.io/Population_Genetics_in_R/gbs_analysis.html

#load vcfR and adegenet packages
library(adegenet)
library(vcfR)

#load vcf into R
vcf<-read.vcfR("20220213_snp_seminar_fil_thin.recode.vcf")
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 1279
##   header_line: 1280
##   variant count: 725
##   column count: 136
## 
Meta line 1000 read in.
Meta line 1279 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 725
##   Character matrix gt cols: 136
##   skip: 0
##   nrows: 725
##   row_num: 0
## 
Processed variant: 725
## All variants processed
#convert vcf into a genind format object used by adegenet
gind<-vcfR2genind(vcf) 

#use the str() function to examine complex objects
#str(gind)

#we don't have time to go over summary statistics, but check out adegenet materials above to learn how to do that, or ask me later.

#make sure the meta data matches the order of the vcf arrange() sorts by a column
meta<-arrange(meta,vcf_index)

#assign the species names provided by curators and collectors as the populations.
pop(gind)<-c(meta$species_full)

DAPC

DAPC is an ordination method based on genetic distances. With DAPC we can evaluate genetic structure among specimens.

set.seed(20220209) # Setting a seed for a consistent result 

#allow adegenet to find the number of clusters (up to 25) that best fit the data
grp <- find.clusters(gind, max.n.clust = 25, n.pca = 25, 
                     choose.n.clust = FALSE,criterion = "goodfit") 

#adegenet identified 10 clusters
summary(grp$grp) 
##  1  2  3  4  5  6  7  8  9 
## 13 19 12 10  7  4 19 18 25
#save this information as a dataframe
dapc_grp <- data.frame(grp$grp)

#combine the DAPC Cluster/cluster information with the meta data the vcf_id is in the row names, and we need it to be a column to combine.
dapc_grp$vcf_id<-row.names(dapc_grp)
#make a new dataframe called testing that has all our juicy data.
testing <- data.frame(left_join(dapc_grp, meta, by = "vcf_id"))
head(testing)
##   grp.grp     vcf_id vcf_index sampleID country inst species
## 1       8 CEAG_18406         1    18406 Ecuador USFQ    CEAG
## 2       8 CEAG_18407         2    18407 Ecuador USFQ    CEAG
## 3       8 CEAG_18411         3    18411 Ecuador USFQ    CEAG
## 4       8 CEAG_18445         4    18445 Ecuador USFQ    CEAG
## 5       8 CEAN_18375         5    18375 Ecuador USFQ    CEAN
## 6       8 CEAN_18387         6    18387 Ecuador USFQ    CEAN
##           species_full      lat      long     F_MISS          genetic_spp
## 1 Cedrela angusticarpa -0.02140 -78.88610 0.01210660 Cedrela angusticarpa
## 2 Cedrela angusticarpa -0.04340 -78.96030 0.07972350 Cedrela angusticarpa
## 3 Cedrela angusticarpa -0.17900 -79.03060 0.01249300 Cedrela angusticarpa
## 4 Cedrela angusticarpa  0.02583 -78.85944 0.00957369 Cedrela angusticarpa
## 5 Cedrela angustifolia -2.26310 -78.19230 0.00789937 Cedrela angustifolia
## 6 Cedrela angustifolia -0.20220 -78.43060 0.01476840 Cedrela angustifolia
##   galapagos       species_simple
## 1        no            Other spp
## 2        no            Other spp
## 3        no            Other spp
## 4        no            Other spp
## 5        no Cedrela angustifolia
## 6        no Cedrela angustifolia
#rename the DAPC Cluster column to something more intuitive
names(testing)[1]<-"dapc_grp"

DAPC is sensitive to missing information. Specimens with a lot of missing information usually cluster together and in the middle (spatially) of the DAPC scatter plot, which we will generate next.

#use dplyr to calculate mean individual-missingness ("F_MISS") for each DAPC group.
(imiss<-testing[c(1,11)]%>%group_by(dapc_grp)%>%summarise_all(funs(mean)))
## # A tibble: 9 × 2
##   dapc_grp   F_MISS
##   <fct>       <dbl>
## 1 1        0.000258
## 2 2        0.00134 
## 3 3        0.00465 
## 4 4        0.00331 
## 5 5        0.00609 
## 6 6        0.000676
## 7 7        0.0105  
## 8 8        0.0157  
## 9 9        0.00711

This level of missingness should not be an issue >10% would be problematic. Technically we don’t have any missing information because I filtered those SNPs out, but I included the estimates from the full vcf (282 individuals) to demonstrate this code.

#build the DAPC ordination
dapc1 <- dapc(gind, grp$grp, n.pca = 20, n.da = 6) 

#plot dapc
scatter(dapc1, posi.da="bottomleft")

These colors make no sense to me because they are from R base plots. I want to use the ggplot2 default color scheme and keep the color assignments consistent for the rest of our plots. We’ll make a scale of 10 hues using hex codes.

library(scales) 
n1 <- 10  # Amount of default colors you want (10 because 10 clusters)
hex_codes1 <- hue_pal()(n1) # Identify hex codes
hex_codes1 #our color list
##  [1] "#F8766D" "#D89000" "#A3A500" "#39B600" "#00BF7D" "#00BFC4" "#00B0F6"
##  [8] "#9590FF" "#E76BF3" "#FF62BC"
show_col(hex_codes1) #ggplot2 default is an inaccessible palette for colorblind readers and should be avoided in final versions of figures. 

#apply the colors
scatter(dapc1, posi.da="bottomleft",col=hex_codes1)

The bar plot shows us that the largest proportion of variation (genetic distance) is captured by discriminant axis (DA) 1.

#let's filter with the DAPC Clusters to see what species are in each and where the
#Galapagos specimens ended up.
filter(testing,dapc_grp ==1)#C. nebulosa + C. odorata s.l. 
##    dapc_grp   vcf_id vcf_index sampleID country inst species     species_full
## 1         1 CENE_129        51 CENE_129 Ecuador   MO    CENE Cedrela nebulosa
## 2         1 CENE_199        61 CENE_199 Ecuador   MO    CENE Cedrela nebulosa
## 3         1 CENE_209        62 CENE_209 Ecuador   MO    CENE Cedrela nebulosa
## 4         1 CENE_218        63 CENE_218    Peru   MO    CENE Cedrela nebulosa
## 5         1 CENE_280        64 CENE_280 Ecuador   MO    CENE Cedrela nebulosa
## 6         1 CENE_306        65 CENE_306 Ecuador   MO    CENE Cedrela nebulosa
## 7         1  CENE_39        66  CENE_39 Ecuador   MO    CENE Cedrela nebulosa
## 8         1   CENE_8        68   CENE_8 Ecuador   MO    CENE Cedrela nebulosa
## 9         1 CEOD_168        79 CEOD_168 Ecuador   MO    CEOD  Cedrela odorata
## 10        1 CEOD_241        86 CEOD_241 Ecuador   MO    CEOD  Cedrela odorata
## 11        1 CEOD_282        93 CEOD_282    Peru   MO    CEOD  Cedrela odorata
## 12        1  CEOD_47       102  CEOD_47 Ecuador   MO    CEOD  Cedrela odorata
## 13        1  CEOD_73       108  CEOD_73 Ecuador   MO    CEOD  Cedrela odorata
##         lat      long      F_MISS        genetic_spp galapagos   species_simple
## 1  -0.66666 -77.66666 0.000214657   Cedrela nebulosa        no Cedrela nebulosa
## 2  -0.36666 -78.86666 0.000171725   Cedrela nebulosa        no Cedrela nebulosa
## 3  -0.30000 -77.78333 0.000729833   Cedrela nebulosa        no Cedrela nebulosa
## 4  -5.01666 -78.90000 0.000000000   Cedrela nebulosa        no Cedrela nebulosa
## 5  -4.18333 -78.91666 0.000000000   Cedrela nebulosa        no Cedrela nebulosa
## 6  -4.71694 -78.96305 0.000085900   Cedrela nebulosa        no Cedrela nebulosa
## 7  -0.63333 -77.85000 0.000042900   Cedrela nebulosa        no Cedrela nebulosa
## 8  -1.96666 -77.81666 0.000472245   Cedrela nebulosa        no Cedrela nebulosa
## 9  -1.96667 -77.81667 0.000300519 Cedrela odorata sl        no  Cedrela odorata
## 10 -3.41666 -78.58333 0.000128794 Cedrela odorata sl        no  Cedrela odorata
## 11 -5.75000 -77.66666 0.000472245   Cedrela nebulosa        no  Cedrela odorata
## 12 -1.96666 -77.81666 0.000085900 Cedrela odorata sl        no  Cedrela odorata
## 13 -0.71666 -77.65000 0.000643970   Cedrela nebulosa        no  Cedrela odorata
filter(testing,dapc_grp ==2)#C. odorata s.l. 
##    dapc_grp   vcf_id vcf_index sampleID country inst species     species_full
## 1         2 CEFI_112        22 CEFI_112 Ecuador   MO    CEFI Cedrela fissilis
## 2         2 CEOD_118        70 CEOD_118    Peru   MO    CEOD  Cedrela odorata
## 3         2 CEOD_126        71 CEOD_126    Peru   MO    CEOD  Cedrela odorata
## 4         2 CEOD_142        73 CEOD_142    Peru   MO    CEOD  Cedrela odorata
## 5         2 CEOD_146        74 CEOD_146    Peru   MO    CEOD  Cedrela odorata
## 6         2 CEOD_157        75 CEOD_157    Peru   MO    CEOD  Cedrela odorata
## 7         2  CEOD_15        76  CEOD_15    Peru   MO    CEOD  Cedrela odorata
## 8         2 CEOD_164        77 CEOD_164 Ecuador   MO    CEOD  Cedrela odorata
## 9         2 CEOD_165        78 CEOD_165    Peru   MO    CEOD  Cedrela odorata
## 10        2 CEOD_173        81 CEOD_173 Ecuador   MO    CEOD  Cedrela odorata
## 11        2 CEOD_189        84 CEOD_189    Peru   MO    CEOD  Cedrela odorata
## 12        2 CEOD_260        89 CEOD_260 Ecuador   MO    CEOD  Cedrela odorata
## 13        2 CEOD_261        90 CEOD_261 Ecuador   MO    CEOD  Cedrela odorata
## 14        2 CEOD_295        96 CEOD_295 Ecuador   MO    CEOD  Cedrela odorata
## 15        2  CEOD_49       103  CEOD_49 Ecuador   MO    CEOD  Cedrela odorata
## 16        2   CEOD_4       104   CEOD_4    Peru   MO    CEOD  Cedrela odorata
## 17        2  CEOD_66       106  CEOD_66 Ecuador   MO    CEOD  Cedrela odorata
## 18        2  CEOD_93       109  CEOD_93    Peru   MO    CEOD  Cedrela odorata
## 19        2  CEOD_95       110  CEOD_95    Peru   MO    CEOD  Cedrela odorata
##         lat      long      F_MISS        genetic_spp galapagos   species_simple
## 1  -0.03333 -76.70000 0.002747610 Cedrela odorata sl        no Cedrela fissilis
## 2  -3.83333 -73.33333 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 3  -4.16666 -73.50000 0.000214657 Cedrela odorata sl        no  Cedrela odorata
## 4  -3.83333 -73.33333 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 5  -3.75000 -73.25000 0.000042900 Cedrela odorata sl        no  Cedrela odorata
## 6  -3.91666 -73.58333 0.003219850 Cedrela odorata sl        no  Cedrela odorata
## 7  -3.74805 -73.24722 0.014510800 Cedrela odorata sl        no  Cedrela odorata
## 8  -0.92111 -75.36639 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 9  -3.80000 -73.41666 0.000901558 Cedrela odorata sl        no  Cedrela odorata
## 10 -1.48333 -77.45000 0.000042900 Cedrela odorata sl        no  Cedrela odorata
## 11 -3.66666 -72.91666 0.000858627 Cedrela odorata sl        no  Cedrela odorata
## 12 -2.33333 -76.33333 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 13 -1.56666 -77.41666 0.001030350 Cedrela odorata sl        no  Cedrela odorata
## 14 -0.50000 -76.83333 0.000085900 Cedrela odorata sl        no  Cedrela odorata
## 15 -1.50000 -78.50000 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 16 -3.85444 -73.41388 0.001073280 Cedrela odorata sl        no  Cedrela odorata
## 17 -1.06666 -77.61666 0.000729833 Cedrela odorata sl        no  Cedrela odorata
## 18 -2.50000 -75.83333 0.000042900 Cedrela odorata sl        no  Cedrela odorata
## 19 -3.33333 -71.81666 0.000042900 Cedrela odorata sl        no  Cedrela odorata
filter(testing,dapc_grp ==3)#C. odorata s.l.  
##    dapc_grp   vcf_id vcf_index sampleID  country inst species     species_full
## 1         3  CEFI_61        24  CEFI_61     Peru   MO    CEFI Cedrela fissilis
## 2         3 CEOD_107        69 CEOD_107  Ecuador   MO    CEOD  Cedrela odorata
## 3         3 CEOD_128        72 CEOD_128 Colombia   MO    CEOD  Cedrela odorata
## 4         3 CEOD_172        80 CEOD_172     Peru   MO    CEOD  Cedrela odorata
## 5         3 CEOD_242        87 CEOD_242 Colombia   MO    CEOD  Cedrela odorata
## 6         3  CEOD_25        88  CEOD_25  Ecuador   MO    CEOD  Cedrela odorata
## 7         3 CEOD_285        94 CEOD_285  Ecuador   MO    CEOD  Cedrela odorata
## 8         3 CEOD_296        97 CEOD_296 Colombia   MO    CEOD  Cedrela odorata
## 9         3   CEOD_2        98   CEOD_2  Ecuador   MO    CEOD  Cedrela odorata
## 10        3 CEOD_301        99 CEOD_301     Peru   MO    CEOD  Cedrela odorata
## 11        3  CEOD_38       101  CEOD_38     Peru   MO    CEOD  Cedrela odorata
## 12        3  CEOD_51       105  CEOD_51  Ecuador   MO    CEOD  Cedrela odorata
##         lat      long      F_MISS        genetic_spp galapagos   species_simple
## 1  -2.91667 -76.25000 0.000085900 Cedrela odorata sl        no Cedrela fissilis
## 2  -0.43333 -76.51667 0.014811300 Cedrela odorata sl        no  Cedrela odorata
## 3   4.88333 -75.83333 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 4  -5.16000 -78.28889 0.029966100 Cedrela odorata sl        no  Cedrela odorata
## 5   3.63333 -76.71667 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 6  -1.06666 -77.60000 0.000515176 Cedrela odorata sl        no  Cedrela odorata
## 7  -1.50000 -77.95000 0.007770570 Cedrela odorata sl        no  Cedrela odorata
## 8   6.91666 -75.06666 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 9  -0.36667 -76.55000 0.000000000 Cedrela odorata sl        no  Cedrela odorata
## 10 -4.91666 -78.31666 0.001974840 Cedrela odorata sl        no  Cedrela odorata
## 11 -5.01000 -78.34000 0.000686902 Cedrela odorata sl        no  Cedrela odorata
## 12 -0.65000 -76.43333 0.000000000 Cedrela odorata sl        no  Cedrela odorata
filter(testing,dapc_grp ==4)#USFQ provided, similar to Galapagos
##    dapc_grp     vcf_id vcf_index sampleID country inst species     species_full
## 1         4 CEFA_18332        18    18332 Ecuador USFQ    CEFA  Cedrela falcata
## 2         4 CEMI_18423        27    18423 Ecuador USFQ    CEMI   Cedrela minima
## 3         4 CEMI_18424        28    18424 Ecuador USFQ    CEMI   Cedrela minima
## 4         4 CEMI_18426        29    18426 Ecuador USFQ    CEMI   Cedrela minima
## 5         4 CEMI_18429        30    18429 Ecuador USFQ    CEMI   Cedrela minima
## 6         4 CEMI_18453        31    18453 Ecuador USFQ    CEMI   Cedrela minima
## 7         4 CENE_18343        55    18343 Ecuador USFQ    CENE Cedrela nebulosa
## 8         4 CENE_18366        56    18366 Ecuador USFQ    CENE Cedrela nebulosa
## 9         4 CENE_18372        57    18372 Ecuador USFQ    CENE Cedrela nebulosa
## 10        4 CEOD_18377        83    18377 Ecuador USFQ    CEOD  Cedrela odorata
##         lat      long     F_MISS        genetic_spp galapagos   species_simple
## 1   0.07380 -77.23960 0.00300519    Cedrela falcata        no        Other spp
## 2   0.21922 -79.88104 0.00180312     Cedrela minima        no        Other spp
## 3   0.24578 -79.86882 0.00120208     Cedrela minima        no        Other spp
## 4   0.64255 -79.96195 0.00111622     Cedrela minima        no        Other spp
## 5   0.38845 -79.64418 0.00201777     Cedrela minima        no        Other spp
## 6   0.23878 -78.81811 0.02193790     Cedrela minima        no        Other spp
## 7  -0.72889 -77.49560 0.00000000   Cedrela nebulosa        no Cedrela nebulosa
## 8  -1.36650 -77.95270 0.00141673   Cedrela nebulosa        no Cedrela nebulosa
## 9  -1.75210 -77.84550 0.00064397   Cedrela nebulosa        no Cedrela nebulosa
## 10 -2.02000 -77.94720 0.00000000 Cedrela odorata sl        no  Cedrela odorata
filter(testing,dapc_grp ==5)#contains Galapagos Cedrela + C. petiolutata 
##   dapc_grp       vcf_id vcf_index sampleID country inst species
## 1        5     CEOD_332       100 CEOD_302 Ecuador   MO    CEOD
## 2        5   CEPE_18422       111    18422 Ecuador USFQ    CEPE
## 3        5   CEPE_18425       112    18425 Ecuador USFQ    CEPE
## 4        5     CO26_STX       123 CO26.STX Ecuador USFQ    CEOD
## 5        5      CO7_SCR       124  CO7.SCR Ecuador USFQ    CEOD
## 6        5  COFL20_S220       125   COFL20 Ecuador USFQ    CEOD
## 7        5 COISA10_S234       126  COISA10 Ecuador USFQ    CEOD
##           species_full       lat      long     F_MISS          genetic_spp
## 1      Cedrela odorata -0.416660 -76.61666 0.00553814   Cedrela odorata sl
## 2 Cedrela  petiolulata  0.035840 -79.95113 0.00000000 Cedrela  petiolulata
## 3 Cedrela  petiolulata  0.622910 -79.90772 0.00141673 Cedrela  petiolulata
## 4      Cedrela odorata -0.615320 -90.36913 0.01365220    galapagos odorata
## 5      Cedrela odorata -0.877033 -89.43773 0.00905852    galapagos odorata
## 6      Cedrela odorata -1.315420 -90.44492 0.01111920    galapagos odorata
## 7      Cedrela odorata -0.812400 -91.05247 0.00184605    galapagos odorata
##   galapagos  species_simple
## 1        no Cedrela odorata
## 2        no       Other spp
## 3        no       Other spp
## 4       yes Cedrela odorata
## 5       yes Cedrela odorata
## 6       yes Cedrela odorata
## 7       yes Cedrela odorata
filter(testing,dapc_grp ==6)#random - need more data to explain these than included here 
##   dapc_grp   vcf_id vcf_index sampleID country inst species     species_full
## 1        6  CEFI_35        23  CEFI_35    Peru   MO    CEFI Cedrela fissilis
## 2        6  CEOD_18        85  CEOD_18 Ecuador   MO    CEOD  Cedrela odorata
## 3        6  CEOD_26        91  CEOD_26 Ecuador   MO    CEOD  Cedrela odorata
## 4        6 CEOD_291        95 CEOD_291 Ecuador   MO    CEOD  Cedrela odorata
##        lat      long      F_MISS        genetic_spp galapagos   species_simple
## 1 -4.15000 -80.61666 0.002275360   Cedrela fissilis        no Cedrela fissilis
## 2 -0.01666 -77.50000 0.000300519 Cedrela odorata sl        no  Cedrela odorata
## 3 -0.01667 -77.50000 0.000128794 Cedrela odorata sl        no  Cedrela odorata
## 4 -2.00000 -79.96666 0.000000000 Cedrela odorata ss        no  Cedrela odorata
filter(testing,dapc_grp ==7)#C. montana
##    dapc_grp   vcf_id vcf_index sampleID  country inst species
## 1         7 CEAN_208        10 CEAN_208  Ecuador   MO    CEAN
## 2         7  CEAN_58        11  CEAN_58     Peru   MO    CEAN
## 3         7 CEMO_108        32 CEMO_108  Ecuador   MO    CEMO
## 4         7 CEMO_109        33 CEMO_109  Ecuador   MO    CEMO
## 5         7 CEMO_133        34 CEMO_133  Ecuador   MO    CEMO
## 6         7 CEMO_215        39 CEMO_215  Ecuador   MO    CEMO
## 7         7 CEMO_225        40 CEMO_225  Ecuador   MO    CEMO
## 8         7 CEMO_232        41 CEMO_232  Ecuador   MO    CEMO
## 9         7 CEMO_239        42 CEMO_239  Ecuador   MO    CEMO
## 10        7 CEMO_248        43 CEMO_248  Ecuador   MO    CEMO
## 11        7  CEMO_27        44  CEMO_27  Ecuador   MO    CEMO
## 12        7 CEMO_304        45 CEMO_304 Colombia   MO    CEMO
## 13        7  CEMO_50        46  CEMO_50 Colombia   MO    CEMO
## 14        7  CEMO_55        47  CEMO_55  Ecuador   MO    CEMO
## 15        7  CEMO_62        48  CEMO_62  Ecuador   MO    CEMO
## 16        7  CEMO_68        49  CEMO_68  Ecuador   MO    CEMO
## 17        7  CEMO_77        50  CEMO_77  Ecuador   MO    CEMO
## 18        7 CEOD_271        92 CEOD_271  Ecuador   MO    CEOD
## 19        7   CEOD_6       107   CEOD_6  Ecuador   MO    CEOD
##            species_full      lat      long      F_MISS     genetic_spp
## 1  Cedrela angustifolia -3.83027 -79.42027 0.037693700 Cedrela montana
## 2  Cedrela angustifolia -9.13333 -77.75000 0.000515176 Cedrela montana
## 3       Cedrela montana -0.21666 -78.80000 0.001803120 Cedrela montana
## 4       Cedrela montana -0.45000 -77.91667 0.000128794 Cedrela montana
## 5       Cedrela montana  0.08333 -78.50000 0.001631390 Cedrela montana
## 6       Cedrela montana  0.21666 -78.25000 0.000643970 Cedrela montana
## 7       Cedrela montana -0.25000 -78.50000 0.001502600 Cedrela montana
## 8       Cedrela montana -0.33333 -78.50000 0.089211400 Cedrela montana
## 9       Cedrela montana -3.60916 -79.69666 0.002704680 Cedrela montana
## 10      Cedrela montana -0.23138 -78.80277 0.001330870 Cedrela montana
## 11      Cedrela montana -4.38500 -79.12555 0.000128794 Cedrela montana
## 12      Cedrela montana  4.31667 -74.30000 0.001202080 Cedrela montana
## 13      Cedrela montana  4.01805 -75.88027 0.000214657 Cedrela montana
## 14      Cedrela montana  0.33333 -78.43333 0.000257588 Cedrela montana
## 15      Cedrela montana -1.50000 -78.45000 0.057012800 Cedrela montana
## 16      Cedrela montana -0.38333 -78.01667 0.000558108 Cedrela montana
## 17      Cedrela montana -0.60000 -78.45000 0.000987421 Cedrela montana
## 18      Cedrela odorata -0.21666 -78.80000 0.000300519 Cedrela montana
## 19      Cedrela odorata -0.71666 -77.66666 0.000901558 Cedrela montana
##    galapagos       species_simple
## 1         no Cedrela angustifolia
## 2         no Cedrela angustifolia
## 3         no      Cedrela montana
## 4         no      Cedrela montana
## 5         no      Cedrela montana
## 6         no      Cedrela montana
## 7         no      Cedrela montana
## 8         no      Cedrela montana
## 9         no      Cedrela montana
## 10        no      Cedrela montana
## 11        no      Cedrela montana
## 12        no      Cedrela montana
## 13        no      Cedrela montana
## 14        no      Cedrela montana
## 15        no      Cedrela montana
## 16        no      Cedrela montana
## 17        no      Cedrela montana
## 18        no      Cedrela odorata
## 19        no      Cedrela odorata
filter(testing,dapc_grp ==8)#C. angustifolia + C. montana 
##    dapc_grp     vcf_id vcf_index sampleID country inst species
## 1         8 CEAG_18406         1    18406 Ecuador USFQ    CEAG
## 2         8 CEAG_18407         2    18407 Ecuador USFQ    CEAG
## 3         8 CEAG_18411         3    18411 Ecuador USFQ    CEAG
## 4         8 CEAG_18445         4    18445 Ecuador USFQ    CEAG
## 5         8 CEAN_18375         5    18375 Ecuador USFQ    CEAN
## 6         8 CEAN_18387         6    18387 Ecuador USFQ    CEAN
## 7         8 CEAN_18439         7    18439 Ecuador USFQ    CEAN
## 8         8 CEAN_18451         8    18451 Ecuador USFQ    CEAN
## 9         8 CEAN_18481         9    18481 Ecuador USFQ    CEAN
## 10        8 CEMO_18317        35    18317 Ecuador USFQ    CEMO
## 11        8 CEMO_18319        36    18319 Ecuador USFQ    CEMO
## 12        8 CEMO_18324        37    18324 Ecuador USFQ    CEMO
## 13        8 CEMO_18383        38    18383 Ecuador USFQ    CEMO
## 14        8 CEPU_18398       113    18398 Ecuador USFQ    CEPU
## 15        8 CEPU_18418       114    18418 Ecuador USFQ    CEPU
## 16        8 CEPU_18420       115    18420 Ecuador USFQ    CEPU
## 17        8 CEPU_18444       116    18444 Ecuador USFQ    CEPU
## 18        8 CEPU_18458       117    18458 Ecuador USFQ    CEPU
##            species_full       lat      long     F_MISS          genetic_spp
## 1  Cedrela angusticarpa -0.021400 -78.88610 0.01210660 Cedrela angusticarpa
## 2  Cedrela angusticarpa -0.043400 -78.96030 0.07972350 Cedrela angusticarpa
## 3  Cedrela angusticarpa -0.179000 -79.03060 0.01249300 Cedrela angusticarpa
## 4  Cedrela angusticarpa  0.025830 -78.85944 0.00957369 Cedrela angusticarpa
## 5  Cedrela angustifolia -2.263100 -78.19230 0.00789937 Cedrela angustifolia
## 6  Cedrela angustifolia -0.202200 -78.43060 0.01476840 Cedrela angustifolia
## 7  Cedrela angustifolia -4.349170 -79.75389 0.01828880 Cedrela angustifolia
## 8  Cedrela angustifolia -0.185000 -78.48690 0.00369210 Cedrela angustifolia
## 9  Cedrela angustifolia -3.315588 -78.56496 0.00635384 Cedrela angustifolia
## 10      Cedrela montana -0.399600 -78.04770 0.00197484      Cedrela montana
## 11      Cedrela montana -0.451600 -77.94320 0.01008890      Cedrela montana
## 12      Cedrela montana -0.467050 -77.91430 0.07362730      Cedrela montana
## 13      Cedrela montana -1.227200 -78.62090 0.00523763      Cedrela montana
## 14    Cedrela pubescens  0.156200 -78.81880 0.00459365    Cedrela pubescens
## 15    Cedrela pubescens -0.231040 -78.75646 0.00403555    Cedrela pubescens
## 16    Cedrela pubescens -0.929580 -79.05002 0.00249002    Cedrela pubescens
## 17    Cedrela pubescens  0.034440 -78.72139 0.00970249    Cedrela pubescens
## 18    Cedrela pubescens  0.323850 -78.75555 0.00674022    Cedrela pubescens
##    galapagos       species_simple
## 1         no            Other spp
## 2         no            Other spp
## 3         no            Other spp
## 4         no            Other spp
## 5         no Cedrela angustifolia
## 6         no Cedrela angustifolia
## 7         no Cedrela angustifolia
## 8         no Cedrela angustifolia
## 9         no Cedrela angustifolia
## 10        no      Cedrela montana
## 11        no      Cedrela montana
## 12        no      Cedrela montana
## 13        no      Cedrela montana
## 14        no            Other spp
## 15        no            Other spp
## 16        no            Other spp
## 17        no            Other spp
## 18        no            Other spp
filter(testing,dapc_grp ==9)#C. angustifolia + C. montana
##    dapc_grp     vcf_id vcf_index  sampleID country inst species
## 1         9    CEAN_64        12   CEAN_64    Peru   MO    CEAN
## 2         9 CEDO_18329        13     18329 Ecuador USFQ    CEDO
## 3         9 CEDO_18330        14     18330 Ecuador USFQ    CEDO
## 4         9 CEDO_18379        15     18379 Ecuador USFQ    CEDO
## 5         9 CEDO_18448        16     18448 Ecuador USFQ    CEDO
## 6         9 CEDO_18449        17     18449 Ecuador USFQ    CEDO
## 7         9 CEFA_18351        19     18351 Ecuador USFQ    CEFA
## 8         9 CEFA_18352        20     18352 Ecuador USFQ    CEFA
## 9         9 CEFA_18358        21     18358 Ecuador USFQ    CEFA
## 10        9 CEKU_18288        25     18288 Ecuador USFQ    CEKU
## 11        9 CEMA_18480        26     18480 Ecuador USFQ    CEMA
## 12        9 CENE_18296        52     18296 Ecuador USFQ    CENE
## 13        9 CENE_18326        53     18326 Ecuador USFQ    CENE
## 14        9 CENE_18334        54     18334 Ecuador USFQ    CENE
## 15        9 CENE_18474        58     18474 Ecuador USFQ    CENE
## 16        9 CENE_18476        59     18476 Ecuador USFQ    CENE
## 17        9 CENE_18479        60     18479 Ecuador USFQ    CENE
## 18        9    CENE_43        67   CENE_43 Ecuador   MO    CENE
## 19        9 CEOD_18359        82     18359 Ecuador USFQ    CEOD
## 20        9 CESA_18469       118     18469 Ecuador USFQ    CESA
## 21        9 CESA_18470       119     18470 Ecuador USFQ    CESA
## 22        9 CESA_18472       120     18472 Ecuador USFQ    CESA
## 23        9   CLOSF1_1       121 CLOSF1-01    Peru PERU    UNKN
## 24        9   CLOSF3_1       122 CLOSF3-01    Peru PERU    UNKN
## 25        9  COT8_S215       127      COT8 Ecuador USFQ    CEOD
##            species_full       lat      long      F_MISS          genetic_spp
## 1  Cedrela angustifolia -5.681380 -79.26694 0.000558108 Cedrela angustifolia
## 2   Cedrela domatifolia -0.038890 -77.44500 0.000000000  Cedrela domatifolia
## 3   Cedrela domatifolia -0.038500 -77.44540 0.003048130  Cedrela domatifolia
## 4   Cedrela domatifolia -2.043800 -77.90540 0.003219850  Cedrela domatifolia
## 5   Cedrela domatifolia -0.038150 -77.44522 0.018675100  Cedrela domatifolia
## 6   Cedrela domatifolia -1.414346 -77.72346 0.007470060  Cedrela domatifolia
## 7       Cedrela falcata -0.813360 -76.89020 0.001502600      Cedrela falcata
## 8       Cedrela falcata -0.836990 -76.89780 0.001245010      Cedrela falcata
## 9       Cedrela falcata -0.706780 -77.08010 0.000386382      Cedrela falcata
## 10  Cedrela kuelapensis -4.359000 -79.82940 0.001287940  Cedrela kuelapensis
## 11   Cedrela magnifolia -3.315290 -78.56510 0.000257588   Cedrela magnifolia
## 12     Cedrela nebulosa -0.671080 -77.79249 0.000171725     Cedrela nebulosa
## 13     Cedrela nebulosa -0.450100 -77.87540 0.000000000     Cedrela nebulosa
## 14     Cedrela nebulosa  0.313200 -77.46400 0.000085900     Cedrela nebulosa
## 15     Cedrela nebulosa -4.398020 -78.93543 0.001202080     Cedrela nebulosa
## 16     Cedrela nebulosa -3.813780 -78.76078 0.000515176     Cedrela nebulosa
## 17     Cedrela nebulosa -3.335330 -78.56057 0.002447090     Cedrela nebulosa
## 18     Cedrela nebulosa -2.983330 -78.38333 0.041342900     Cedrela nebulosa
## 19      Cedrela odorata -0.892290 -77.18190 0.004336070   Cedrela odorata sl
## 20    Cedrela saltensis -3.867930 -79.10236 0.000042900    Cedrela saltensis
## 21    Cedrela saltensis -3.874294 -79.09540 0.000000000    Cedrela saltensis
## 22    Cedrela saltensis -3.890730 -79.09817 0.000858627    Cedrela saltensis
## 23          Cedrela spp -4.119167 -72.18444 0.000343451          Cedrela spp
## 24          Cedrela spp -4.255340 -71.76125 0.088138100          Cedrela spp
## 25      Cedrela odorata -0.634750 -76.15532 0.000515176    galapagos odorata
##    galapagos       species_simple
## 1         no Cedrela angustifolia
## 2         no            Other spp
## 3         no            Other spp
## 4         no            Other spp
## 5         no            Other spp
## 6         no            Other spp
## 7         no            Other spp
## 8         no            Other spp
## 9         no            Other spp
## 10        no            Other spp
## 11        no            Other spp
## 12        no     Cedrela nebulosa
## 13        no     Cedrela nebulosa
## 14        no     Cedrela nebulosa
## 15        no     Cedrela nebulosa
## 16        no     Cedrela nebulosa
## 17        no     Cedrela nebulosa
## 18        no     Cedrela nebulosa
## 19        no      Cedrela odorata
## 20        no    Cedrela saltensis
## 21        no    Cedrela saltensis
## 22        no    Cedrela saltensis
## 23        no            Other spp
## 24        no            Other spp
## 25        no      Cedrela odorata
filter(testing,dapc_grp ==10)#one Galapagos + C. nebulosa + C. saltensis
##  [1] dapc_grp       vcf_id         vcf_index      sampleID       country       
##  [6] inst           species        species_full   lat            long          
## [11] F_MISS         genetic_spp    galapagos      species_simple
## <0 rows> (or 0-length row.names)

DAPC Visualizations

library(reshape2)
#melt the data (convert short form to long form) and examine composition of species
counts <- table(testing$dapc_grp, testing$species_simple)
m_counts<-melt(counts)
#ignore warning that melt() is soon deprecated :(

ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
  guides(fill = guide_legend("DAPC Cluster"))+
  theme_classic()+
  theme(
    legend.position = "top",
    axis.text.y=element_text(face="italic"))+
  labs(x="Species",y="Genotype Counts")+coord_flip()

counts <- table(testing$dapc_grp, testing$genetic_spp)
m_counts<-melt(counts)

In the following plot, I have reassigned some specimens to genetic groups where available, and I have labeled “galapagos odorata” as we saw, group 5 and group 9 contain Galapagos trees.

ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
  guides(fill = guide_legend("DAPC Cluster"))+
  theme_classic()+
  theme(
    legend.position = "top",
    axis.text.y=element_text(face="italic"))+
  labs(x="Genetic Species",y="Genotype Counts")+coord_flip()

#do the same for provided species names
counts <- table(testing$dapc_grp, testing$species_full)
m_counts<-melt(counts)

As you can see Cedrela is a cluster fuck. Very few species groups are exclusively one cluster. Are these nomial species distinct genetic lineages? TBD!

#We can do the same for country
counts <- table(testing$dapc_grp, testing$country)
m_counts<-melt(counts)

ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
  guides(fill = guide_legend("DAPC Cluster"))+
  theme_classic()+
  theme(
    legend.position = "top",
    axis.text.y=element_text(face="italic"))+
  labs(x="Country",y="Genotype Counts")+coord_flip()

Let’s plot the map again to see the distribution fo the DAPC clusters.

ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object 
    geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
    expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
    coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
    geom_point(data=testing, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
    geom_point(data = testing, aes(map_id=NA,x = long, y = lat, color=dapc_grp), 
               size = 2, alpha=0.7)+
    geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
  guides(color = guide_legend("DAPC Cluster"))+
  theme(axis.text.x=element_text(size=7),
          axis.title.x=element_text(size=9),
          axis.text.y=element_text(size=7),
          axis.title.y=element_text(size=9),
          legend.background = element_blank(),
          legend.key = element_blank(),
          panel.border = element_rect(fill=NA))+
    xlab("Longitude")+ylab("Latitude")

It looks like a potential source of the Galapagos Cedrela is coastal Ecuador, but let’s make the map a bit less busy so we can view this better.

#subset data to include clusters we saw to be close to the Galapagos samples in DAPC space (genetic distance).
#use subset() with | meaning "or"
galapagos<-subset(testing,dapc_grp == 5 | dapc_grp == 9 | dapc_grp == 4)

ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object 
    geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
    expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
    coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
    geom_point(data=galapagos, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
    geom_point(data = galapagos, aes(map_id=NA,x = long, y = lat, color=dapc_grp), 
               size = 2, alpha=0.7)+
    scale_color_manual(values=c("#F8766D","#00BFC4","#00B0F6"), name="DAPC Cluster")+
    geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
    theme(legend.text = element_text(face = "italic",size=8),
          axis.text.x=element_text(size=7),
          axis.title.x=element_text(size=9),
          axis.text.y=element_text(size=7),
          axis.title.y=element_text(size=9),
          legend.background = element_blank(),
          legend.key = element_blank(),
          panel.border = element_rect(fill=NA))+
    xlab("Longitude")+ylab("Latitude")

ADMIXTURE

Let’s dig into this a bit more with ADMIXTURE. Like STRUCTURE, ADMIXTURE estimates the optimal number of ancestral populations (K) among individuals and membership coefficients to ancestral populations. I performed ancestry prediction with ADMIXTURE for K values of 1 (one ancestral population) through to 10 (ten ancestral populations). The optimal K (number of ancestral populations) was determined via five-fold cross-validation and measured in prediction error (Alexander et al., 2009). I plot ancestry proportions for each individual as bar plots.

Today we will just look at K = 2, 3, 4, and 7. These are, to me, the most interesting.

#load cross validation estimates for each K
cv<-read.table("snp_seminar_admixture/snp_seminar_fil_thin_cv.txt",header=1)

#plot CV errors
ggplot(aes(x=K,y=CV),data=cv)+geom_line()+geom_point()+
  scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10))+
  theme_bw()+ylab("ADMIXTURE Cross Validation Error")

#low error indicates better fitting K. 

#read in ADMIXTURE outputs ending in .Q. Q is the ancestry proportion for each ancestral population.
K2Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.2.Q",header=FALSE)
K3Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.3.Q",header=FALSE)
K4Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.4.Q",header=FALSE)
K7Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.7.Q",header=FALSE)

#add vcf IS from meta, we know meta is sorted by vcf_id because we did that
#earlier.
K2Q$vcf_id<-meta$vcf_id
K3Q$vcf_id<-meta$vcf_id
K4Q$vcf_id<-meta$vcf_id
K7Q$vcf_id<-meta$vcf_id

#join the ADMIXTURE outputs with the rest of our data. 
K2Q<-left_join(testing,K2Q)
K3Q<-left_join(testing,K3Q)
K4Q<-left_join(testing,K4Q)
K7Q<-left_join(testing,K7Q)

#melt the data into long form for plotting. We just want the vcf_id, dapc
#clusters, and the K columns (V1-V10) containing the Q proportions.
m_K2Q<-melt(K2Q[,c("vcf_id","dapc_grp","V1","V2")])
m_K3Q<-melt(K3Q[,c("vcf_id","dapc_grp","V1","V2","V3")])
m_K4Q<-melt(K4Q[,c("vcf_id","dapc_grp","V1","V2","V3","V4")])
m_K7Q<-melt(K7Q[,c("vcf_id","dapc_grp","V1","V2","V3","V4","V5","V6","V7")])
#ignore errors

#fix the names
names(m_K2Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K3Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K4Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K7Q)<-c("vcf_id","dapc_grp","K_layer","Q")

head(m_K2Q)
##       vcf_id dapc_grp K_layer        Q
## 1 CEAG_18406        8      V1 0.999990
## 2 CEAG_18407        8      V1 0.999990
## 3 CEAG_18411        8      V1 0.999990
## 4 CEAG_18445        8      V1 0.980049
## 5 CEAN_18375        8      V1 0.999990
## 6 CEAN_18387        8      V1 0.999990
head(m_K7Q)
##       vcf_id dapc_grp K_layer        Q
## 1 CEAG_18406        8      V1 0.000010
## 2 CEAG_18407        8      V1 0.100886
## 3 CEAG_18411        8      V1 0.000010
## 4 CEAG_18445        8      V1 0.000010
## 5 CEAN_18375        8      V1 0.000010
## 6 CEAN_18387        8      V1 0.000010

ADMIXTURE bar plots GALAPAGOS SPECIMENS ARE UNDER DAPC CLUSTER 7

note:ADMIXTURE doesn’t know the color scheme we have assigned, and CANNOT keep color assignments consistent between runs. The same group of specimens might be assigned ancestral population 4 out of 6 for K=6 and ancestral population 2 out of 7 for K=7. For your final manuscript figures, it is up to you to make the colors cohesive. This is typically done manually, and it is a real pain. Good luck.

(K2<-ggplot(m_K2Q, aes(x=vcf_id, y=Q, fill=K_layer))+
   geom_bar(stat='identity',show.legend=FALSE) + 
   #scale_fill_manual(values = c("#bf812d","#4daf4a")) + 
   facet_grid(~dapc_grp, scales = "free")+
   theme_classic()+
   labs(x="Species")+
   geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
             size=3, data=m_K2Q)+
   #theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
   theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
   ggtitle("K=2"))

(K3<-ggplot(m_K3Q, aes(x=vcf_id, y=Q, fill=K_layer))+
   geom_bar(stat='identity',show.legend=FALSE) + 
   #scale_fill_manual(values = c("#bf812d","#4daf4a")) + 
   facet_grid(~dapc_grp, scales = "free")+
   theme_classic()+
   labs(x="Species")+
   geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
             size=3, data=m_K3Q)+
   #theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
   theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
   ggtitle("K=3"))

(K4<-ggplot(m_K4Q, aes(x=vcf_id, y=Q, fill=K_layer))+
   geom_bar(stat='identity',show.legend=FALSE) + 
   #scale_fill_manual(values = c("#bf812d","#4daf4a")) + 
   facet_grid(~dapc_grp, scales = "free")+
   theme_classic()+
   labs(x="Species")+
   geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
             size=3, data=m_K4Q)+
   #theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
   theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
   ggtitle("K=4"))

(K7<-ggplot(m_K7Q, aes(x=vcf_id, y=Q, fill=K_layer))+
   geom_bar(stat='identity',show.legend=FALSE) + 
   #scale_fill_manual(values = c("#bf812d","#4daf4a")) + 
   facet_grid(~dapc_grp, scales = "free")+
   theme_classic()+
   labs(x="Species")+
   geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
             size=3, data=m_K7Q)+
   #theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
   theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
   ggtitle("K=7"))

Use grid.arrange to look at these barplots side by side (or rather vertically).

library(gridExtra)
grid.arrange(K2, K3, K4, K7,ncol=1)

Let’s make some ultra-classic pie chart maps with package scatterpie.

library(scatterpie)

#we can plot the pies with or without black outline
(K2_map<-ggplot(pop.df, aes(map_id = id)) + 
    geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
    expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
    coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
    theme_bw()+
    geom_scatterpie(aes(x=long, y=lat),color="black",
                    data=K2Q, cols=c("V1","V2"),alpha=.7)+
    #geom_scatterpie(aes(x=long, y=lat),color="black",
    #                data=K2Q, cols=c("V1","V2"),alpha=.7)+
    guides(shape = guide_legend("Species"))+
    theme(axis.text.x=element_text(size=7),
          axis.title.x=element_text(size=9),
          axis.text.y=element_text(size=7),
          axis.title.y=element_text(size=9),
          legend.position = "none")+
    xlab("Longitude")+ylab("Latitude"))

(K3_map<-ggplot(pop.df, aes(map_id = id)) + 
   geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
   expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
   coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
   theme_bw()+
   geom_scatterpie(aes(x=long, y=lat),color="black",
                   data=K3Q, cols=c("V1","V2","V3"),alpha=.7)+
   guides(shape = guide_legend("Species"))+
   theme(axis.text.x=element_text(size=7),
         axis.title.x=element_text(size=9),
         axis.text.y=element_text(size=7),
         axis.title.y=element_text(size=9),
         legend.position = "none")+
   xlab("Longitude")+ylab("Latitude"))

(K4_map<-ggplot(pop.df, aes(map_id = id)) + 
   geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
   expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
   coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
   theme_bw()+
   geom_scatterpie(aes(x=long, y=lat),color="black",
                   data=K4Q, cols=c("V1","V2","V3","V4"),alpha=.7)+
   theme(axis.text.x=element_text(size=7),
         axis.title.x=element_text(size=9),
         axis.text.y=element_text(size=7),
         axis.title.y=element_text(size=9),
         legend.position = "none")+
   xlab("Longitude")+ylab("Latitude"))

(K7_map<-ggplot(pop.df, aes(map_id = id)) + 
   geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
   expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
   coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) + 
   theme_bw()+
   geom_scatterpie(aes(x=long, y=lat),color="black",
                   data=K7Q, cols=c("V1","V2","V3","V4","V5","V6","V7"),alpha=.7)+
   guides(shape = guide_legend("Species"))+
   theme(axis.text.x=element_text(size=7),
         axis.title.x=element_text(size=9),
         axis.text.y=element_text(size=7),
         axis.title.y=element_text(size=9),
         legend.position = "none")+
   xlab("Longitude")+ylab("Latitude"))

For fun, let’s plot the barplots and the maps together.

grid.arrange(K2_map,K2,ncol=1)

grid.arrange(K3_map,K3,ncol=1)

grid.arrange(K4_map,K4,ncol=1)

grid.arrange(K7_map,K7,ncol=1)

If you would like to quit R forever after this demonstration, you are in luck. You can perform these ADMIXTURE visualizations with an R shiny app available online: https://nicocriscuolo.shinyapps.io/StructuRly/

You simply upload your .Q file outputs from ADMIXTURE and ba-da-bing, you have plots. However, the issue with color tracking genetically similar groups remains.

Literature Cited

Alexander, D. H., J. Novembre, and K. Lange. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19: 1655–1664. Cintron, B. B. 1990. Cedrela odorata L. Cedro hembra, Spanish cedar. Silvics of North America 2: 250. Ferriss, S. 2014. An analysis of trade in five CITES-listed taxa. The Royal Institute of International Affairs, Chatham House, London, England, UK. Newton, A. C., T. R. Allnutt, A. C. M. Gillies, A. J. Lowe, R. A. Ennos, A. C. Newton, T. R. Allnutt, et al. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends in Ecology & Evolution 14: 140–145. Pennington, T. D., and A. N. Muellner. 2010. A monograph of Cedrela (Meliaceae). dh books, Milborne Port, England, UK. Tajikistan, M. 2019. Proposal for amendment of Appendix I or II for CITES CoP18 Prop. 57. Consideration of proposals for amendment of appendices I and II, 1–26. CITES, Colombo, Sri Lanka. UNEP-WCMC. 2015. Overview of CITES Appendix III listings. UNEP-WCMC, Cambridge, U. K.

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] scatterpie_0.1.7 gridExtra_2.3    reshape2_1.4.4   vcfR_1.12.0     
##  [5] adegenet_2.1.5   ade4_1.7-18      dplyr_1.0.7      scales_1.1.1    
##  [9] rgeos_0.5-8      rgdal_1.5-27     raster_3.5-2     maptools_1.1-2  
## [13] sp_1.4-5         maps_3.4.0       ggplot2_3.3.5   
## 
## loaded via a namespace (and not attached):
##  [1] tidyr_1.1.4       sass_0.4.0        jsonlite_1.7.2    viridisLite_0.4.0
##  [5] splines_4.1.1     bslib_0.3.1       shiny_1.7.1       memuse_4.1-0     
##  [9] highr_0.9         yaml_2.2.1        pillar_1.6.3      lattice_0.20-45  
## [13] glue_1.4.2        digest_0.6.28     polyclip_1.10-0   promises_1.2.0.1 
## [17] ggfun_0.0.4       colorspace_2.0-2  htmltools_0.5.2   httpuv_1.6.3     
## [21] Matrix_1.3-4      plyr_1.8.6        pkgconfig_2.0.3   purrr_0.3.4      
## [25] xtable_1.8-4      tweenr_1.0.2      terra_1.4-11      later_1.3.0      
## [29] ggforce_0.3.3     tibble_3.1.5      mgcv_1.8-36       generics_0.1.0   
## [33] farver_2.1.0      ellipsis_0.3.2    withr_2.4.2       cli_3.0.1        
## [37] magrittr_2.0.1    crayon_1.4.1      mime_0.12         evaluate_0.14    
## [41] fansi_0.5.0       nlme_3.1-153      MASS_7.3-54       foreign_0.8-81   
## [45] vegan_2.5-7       tools_4.1.1       lifecycle_1.0.1   stringr_1.4.0    
## [49] munsell_0.5.0     cluster_2.1.2     compiler_4.1.1    jquerylib_0.1.4  
## [53] rlang_0.4.11      grid_4.1.1        rstudioapi_0.13   igraph_1.2.6     
## [57] labeling_0.4.2    rmarkdown_2.11    gtable_0.3.0      codetools_0.2-18 
## [61] DBI_1.1.1         R6_2.5.1          knitr_1.36        pinfsc50_1.2.0   
## [65] fastmap_1.1.0     seqinr_4.2-8      utf8_1.2.2        permute_0.9-5    
## [69] ape_5.5           stringi_1.7.5     parallel_4.1.1    Rcpp_1.0.7       
## [73] vctrs_0.3.8       tidyselect_1.1.1  xfun_0.26