The neotropical tree species Cedrela odorata is a target of illegal logging and has been documented in regional timber trade for over 250 years (Pennington and Muellner, 2010). Of these species, Cedrela odorata (Spanish cedar; cedro), is the second most traded neotropical tree species, following bigleaf mahogany (Swietenia macrophylla). In 2001, C. odorata was listed under the protections of CITES Appendix III requiring validated documentation of species identity and source for both export and import documentation, protecting populations in Bolivia, Brazil, Colombia, Guatemala, and Peru (Ferriss, 2014). Consequently, two other species (C. fissilis Vell. and C. lilloi syn. angustifolia) have also been afforded protection under CITES Appendix III, since their survival is threatened due to strong similarities with C. odorata and its trade-demand (Ferriss, 2014; UNEP-WCMC, 2015). Despite the anticipated increase in protection for C. odorata and its “look-alikes”, law enforcement of CITES regulations and the continuance of CITES protections requires credible estimates of species presence in trade, accurate geographic distributions, and up-to-date extinction risk (Text of the Convention; Contracting States, 1973). However, differentiating Cedrela species is challenging, and differentiating species based on wood tissue alone is nearly impossible, even for experts. The data presented here are part of a larger project to provide genomic resources for Cedrela, including a SNP genotyping assay for species and origin identification.
Although the geographic distribution of Cedrela odorata is widespread, occurring from Mexico to northern Argentina (24º N to 27º S) and including the islands of the Caribbean (Pennington and Muellner, 2010), it primarily exists in low-density stands with maybe one individual per hectare in Peru, Costa Rica, Colombia, and Guyana (Tajikistan 2019). Strong light dependency and pests in its native range lead to poor natural regeneration of stands and fragmented populations (Cintron, 1990; Pennington and Muellner, 2010). For example, reduced population connectivity may limit the spread of regional genotypes that are tolerant to the mahogany shoot borer, Hypsipyla grandella Zeller (Newton et al., 1999). Deforestation has led to a nearly 30% decline in the global distribution of C. odorata in the last 100 years, and that estimate is expected to increase by over 10% in the next 100 years (Tajikistan 2019).The primary threat to C. odorata and other Cedrela is decreased population connectivity from exploitative logging (Muellner et al., 2010, 2011; Pennington and Muellner, 2010; Cavers et al., 2013), and the demand for Cedrela wood is insatible.
Oddly, Cedrela grow very well outside of their native range, even being considered invasive in some areas. This could be a example of the, “enemy release hypothesis,” in action, where species escape the primary threat to their population growth outside of their native range (e.g., the mahogany shoot borer). One place where Cedrela are considered invasive is the Galapagos archipeligo (Ecuador). Part of this project is to determine the source of this population, and this is the data presented here and the reseach question for this session.
Data presented here are next generation sequence data from dissertation research (KNF advised by: Andy Jones [Oregon State University], Rich Cronn [USDA Forest Service PNW Research Station]) and an active collaboration with Gonzalo Rivas-Torres and Maria de Lordes Torres of Universidad San Francisco de Quito. Dissertation specimens were collected from the Missouri Botanical Garden Herbarium, and Galapagos/Ecuador specimens of interest were provided by Rivas-Torres and de Lourdes-Torres. Specimen sequence data was either generated by target capture, amplicon sequencing, or both.
The full specimen alignment contained 282 specimens and nearly 1 million SNPs. Variant call format file (vcf) was generated with BCFtools mpileup Filtering was performed with VCFtools: (biallelic, minimum depth = 100, 0% missing data, and 1 SNP per 10kb). For the exercise, I focused on Ecuador.
The vcf provided here contains 127 individuals and 725 SNPs.
#To begin, set your working directory to the directory containing this R script manually or with the following code.
setwd("~/Documents/UW/Finch_R_tutorials/DAPC_ADMIXTURE/")
#read in the meta data for the specimens (contains species and geographic information)
meta<-read.csv("20220213_snp_seminar_samples.csv",header=1)
#we will label the next map with country label, so let's read those in now:
Labs<-read.csv("country_labels_2020.csv",header=1)
#we will use the following packages to build a map
library(ggplot2)
library(maps)
library(maptools)
library(raster)
library(rgdal)
library(rgeos)
library(scales)
library(sp)
Learn more about ggplot2 from the cheat sheet: https://www.maths.usyd.edu.au/u/UG/SM/STAT3022/r/current/Misc/data-visualization-2.1.pdf
#The following lines bring in the shapefiles for the map (i.e., coutries). Maps were drawn using the base map shapefiles from the World Borders Dataset http://thematicmapping.org/.
worldMap <- readOGR(dsn="map_files/TM_WORLD_BORDERS-0.3.shp", layer="TM_WORLD_BORDERS-0.3")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/finchkri/Documents/UW/Finch_R_tutorials/DAPC_ADMIXTURE/map_files/TM_WORLD_BORDERS-0.3.shp", layer: "TM_WORLD_BORDERS-0.3"
## with 246 features
## It has 11 fields
## Integer64 fields read as strings: POP2005
worldMap.fort <- fortify(worldMap, region = "ISO3")
idList <- worldMap@data$ISO3
centroids.df <- as.data.frame(coordinates(worldMap))
names(centroids.df) <- c("Longitude", "Latitude")
popList <- worldMap@data$POP2005
pop.df <- data.frame(id = idList, centroids.df)
Zoomed out version of the map showing specimens color coded by species names appearing on herbarium specimen labels or provided by the collector.
ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-103,-55), ylim = c(-25, 22)) +
geom_point(data=meta, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
geom_point(data = meta, aes(map_id=NA,x = long, y = lat, color=species_full),
size = 2, alpha=0.7)+
guides(color = guide_legend("Species name provided"))+
geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
theme(legend.text = element_text(face = "italic",size=8),
axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.background = element_blank(),
legend.key = element_blank(),
panel.border = element_rect(fill=NA))+
xlab("Longitude")+ylab("Latitude")
I want to simplify the map, so we’ll make a filter list with minor species, or species that are not included in the current monograph of the genus.
#How many species and how many specimens of each?
table(meta$species_full)
##
## Cedrela petiolulata Cedrela angusticarpa Cedrela angustifolia
## 2 4 8
## Cedrela domatifolia Cedrela falcata Cedrela fissilis
## 5 4 3
## Cedrela kuelapensis Cedrela magnifolia Cedrela minima
## 1 1 5
## Cedrela montana Cedrela nebulosa Cedrela odorata
## 19 18 47
## Cedrela pubescens Cedrela saltensis Cedrela spp
## 5 3 2
#show me species names only as a list
levels(factor(meta$species_full))
## [1] "Cedrela petiolulata" "Cedrela angusticarpa" "Cedrela angustifolia"
## [4] "Cedrela domatifolia" "Cedrela falcata" "Cedrela fissilis"
## [7] "Cedrela kuelapensis" "Cedrela magnifolia" "Cedrela minima"
## [10] "Cedrela montana" "Cedrela nebulosa" "Cedrela odorata"
## [13] "Cedrela pubescens" "Cedrela saltensis" "Cedrela spp"
#make filter list
other_spp<-list()
other_spp<-c("Cedrela petiolulata","Cedrela angusticarpa","Cedrela domatifolia" ,"Cedrela falcata",
"Cedrela kuelapensis","Cedrela magnifolia", "Cedrela minima",
"Cedrela spp","Cedrela pubescens")
library(dplyr) #load package dplyr
Learn more about dplyr (one of the best packages ever for data wrangling) https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf
#make subsetted datasets by filtering to exclude and include "lesser species"
main_taxa<-filter(meta,!species_full%in%other_spp) #! means "not"
other_taxa<-filter(meta,species_full%in%other_spp)
#add a new column called "species_simple" for the greater taxa, this matches species names
main_taxa$species_simple<-main_taxa$species_full
#for the lesser taxa, we just want to populate a column with "Other spp"
other_taxa$species_simple<-"Other spp"
#row bind these two subsets again
meta<-rbind.data.frame(main_taxa,other_taxa)
#view our new names column
head(meta)
## vcf_index vcf_id sampleID country inst species species_full
## 1 5 CEAN_18375 18375 Ecuador USFQ CEAN Cedrela angustifolia
## 2 6 CEAN_18387 18387 Ecuador USFQ CEAN Cedrela angustifolia
## 3 7 CEAN_18439 18439 Ecuador USFQ CEAN Cedrela angustifolia
## 4 8 CEAN_18451 18451 Ecuador USFQ CEAN Cedrela angustifolia
## 5 9 CEAN_18481 18481 Ecuador USFQ CEAN Cedrela angustifolia
## 6 10 CEAN_208 CEAN_208 Ecuador MO CEAN Cedrela angustifolia
## lat long F_MISS genetic_spp galapagos
## 1 -2.263100 -78.19230 0.00789937 Cedrela angustifolia no
## 2 -0.202200 -78.43060 0.01476840 Cedrela angustifolia no
## 3 -4.349170 -79.75389 0.01828880 Cedrela angustifolia no
## 4 -0.185000 -78.48690 0.00369210 Cedrela angustifolia no
## 5 -3.315588 -78.56496 0.00635384 Cedrela angustifolia no
## 6 -3.830270 -79.42027 0.03769370 Cedrela montana no
## species_simple
## 1 Cedrela angustifolia
## 2 Cedrela angustifolia
## 3 Cedrela angustifolia
## 4 Cedrela angustifolia
## 5 Cedrela angustifolia
## 6 Cedrela angustifolia
Zoomed in version of the map with a simplified color scheme.
ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
geom_point(data=meta, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
geom_point(data = meta, aes(map_id=NA,x = long, y = lat, color=species_simple),
size = 2, alpha=0.7)+
geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
theme(legend.text = element_text(face = "italic",size=8),
axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.background = element_blank(),
legend.key = element_blank(),
panel.border = element_rect(fill=NA))+
xlab("Longitude")+ylab("Latitude")
Now that we know a little more about the data and sample distribution, we can load in the VCF and do some cool visualizations.
Next we will use vcfR and adegenet. There is so much to learn about adegenet and we will barely scratch the surface here.
Learn more: http://adegenet.r-forge.r-project.org/
Also see excellent tutorials for from code nerds at Oregon State https://knausb.github.io/vcfR_documentation/
Some of the code here was adapted from the same badass Beavers: https://grunwaldlab.github.io/Population_Genetics_in_R/gbs_analysis.html
#load vcfR and adegenet packages
library(adegenet)
library(vcfR)
#load vcf into R
vcf<-read.vcfR("20220213_snp_seminar_fil_thin.recode.vcf")
## Scanning file to determine attributes.
## File attributes:
## meta lines: 1279
## header_line: 1280
## variant count: 725
## column count: 136
##
Meta line 1000 read in.
Meta line 1279 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 725
## Character matrix gt cols: 136
## skip: 0
## nrows: 725
## row_num: 0
##
Processed variant: 725
## All variants processed
#convert vcf into a genind format object used by adegenet
gind<-vcfR2genind(vcf)
#use the str() function to examine complex objects
#str(gind)
#we don't have time to go over summary statistics, but check out adegenet materials above to learn how to do that, or ask me later.
#make sure the meta data matches the order of the vcf arrange() sorts by a column
meta<-arrange(meta,vcf_index)
#assign the species names provided by curators and collectors as the populations.
pop(gind)<-c(meta$species_full)
DAPC is an ordination method based on genetic distances. With DAPC we can evaluate genetic structure among specimens.
set.seed(20220209) # Setting a seed for a consistent result
#allow adegenet to find the number of clusters (up to 25) that best fit the data
grp <- find.clusters(gind, max.n.clust = 25, n.pca = 25,
choose.n.clust = FALSE,criterion = "goodfit")
#adegenet identified 10 clusters
summary(grp$grp)
## 1 2 3 4 5 6 7 8 9
## 13 19 12 10 7 4 19 18 25
#save this information as a dataframe
dapc_grp <- data.frame(grp$grp)
#combine the DAPC Cluster/cluster information with the meta data the vcf_id is in the row names, and we need it to be a column to combine.
dapc_grp$vcf_id<-row.names(dapc_grp)
#make a new dataframe called testing that has all our juicy data.
testing <- data.frame(left_join(dapc_grp, meta, by = "vcf_id"))
head(testing)
## grp.grp vcf_id vcf_index sampleID country inst species
## 1 8 CEAG_18406 1 18406 Ecuador USFQ CEAG
## 2 8 CEAG_18407 2 18407 Ecuador USFQ CEAG
## 3 8 CEAG_18411 3 18411 Ecuador USFQ CEAG
## 4 8 CEAG_18445 4 18445 Ecuador USFQ CEAG
## 5 8 CEAN_18375 5 18375 Ecuador USFQ CEAN
## 6 8 CEAN_18387 6 18387 Ecuador USFQ CEAN
## species_full lat long F_MISS genetic_spp
## 1 Cedrela angusticarpa -0.02140 -78.88610 0.01210660 Cedrela angusticarpa
## 2 Cedrela angusticarpa -0.04340 -78.96030 0.07972350 Cedrela angusticarpa
## 3 Cedrela angusticarpa -0.17900 -79.03060 0.01249300 Cedrela angusticarpa
## 4 Cedrela angusticarpa 0.02583 -78.85944 0.00957369 Cedrela angusticarpa
## 5 Cedrela angustifolia -2.26310 -78.19230 0.00789937 Cedrela angustifolia
## 6 Cedrela angustifolia -0.20220 -78.43060 0.01476840 Cedrela angustifolia
## galapagos species_simple
## 1 no Other spp
## 2 no Other spp
## 3 no Other spp
## 4 no Other spp
## 5 no Cedrela angustifolia
## 6 no Cedrela angustifolia
#rename the DAPC Cluster column to something more intuitive
names(testing)[1]<-"dapc_grp"
DAPC is sensitive to missing information. Specimens with a lot of missing information usually cluster together and in the middle (spatially) of the DAPC scatter plot, which we will generate next.
#use dplyr to calculate mean individual-missingness ("F_MISS") for each DAPC group.
(imiss<-testing[c(1,11)]%>%group_by(dapc_grp)%>%summarise_all(funs(mean)))
## # A tibble: 9 × 2
## dapc_grp F_MISS
## <fct> <dbl>
## 1 1 0.000258
## 2 2 0.00134
## 3 3 0.00465
## 4 4 0.00331
## 5 5 0.00609
## 6 6 0.000676
## 7 7 0.0105
## 8 8 0.0157
## 9 9 0.00711
This level of missingness should not be an issue >10% would be problematic. Technically we don’t have any missing information because I filtered those SNPs out, but I included the estimates from the full vcf (282 individuals) to demonstrate this code.
#build the DAPC ordination
dapc1 <- dapc(gind, grp$grp, n.pca = 20, n.da = 6)
#plot dapc
scatter(dapc1, posi.da="bottomleft")
These colors make no sense to me because they are from R base plots. I want to use the ggplot2 default color scheme and keep the color assignments consistent for the rest of our plots. We’ll make a scale of 10 hues using hex codes.
library(scales)
n1 <- 10 # Amount of default colors you want (10 because 10 clusters)
hex_codes1 <- hue_pal()(n1) # Identify hex codes
hex_codes1 #our color list
## [1] "#F8766D" "#D89000" "#A3A500" "#39B600" "#00BF7D" "#00BFC4" "#00B0F6"
## [8] "#9590FF" "#E76BF3" "#FF62BC"
show_col(hex_codes1) #ggplot2 default is an inaccessible palette for colorblind readers and should be avoided in final versions of figures.
#apply the colors
scatter(dapc1, posi.da="bottomleft",col=hex_codes1)
The bar plot shows us that the largest proportion of variation (genetic distance) is captured by discriminant axis (DA) 1.
#let's filter with the DAPC Clusters to see what species are in each and where the
#Galapagos specimens ended up.
filter(testing,dapc_grp ==1)#C. nebulosa + C. odorata s.l.
## dapc_grp vcf_id vcf_index sampleID country inst species species_full
## 1 1 CENE_129 51 CENE_129 Ecuador MO CENE Cedrela nebulosa
## 2 1 CENE_199 61 CENE_199 Ecuador MO CENE Cedrela nebulosa
## 3 1 CENE_209 62 CENE_209 Ecuador MO CENE Cedrela nebulosa
## 4 1 CENE_218 63 CENE_218 Peru MO CENE Cedrela nebulosa
## 5 1 CENE_280 64 CENE_280 Ecuador MO CENE Cedrela nebulosa
## 6 1 CENE_306 65 CENE_306 Ecuador MO CENE Cedrela nebulosa
## 7 1 CENE_39 66 CENE_39 Ecuador MO CENE Cedrela nebulosa
## 8 1 CENE_8 68 CENE_8 Ecuador MO CENE Cedrela nebulosa
## 9 1 CEOD_168 79 CEOD_168 Ecuador MO CEOD Cedrela odorata
## 10 1 CEOD_241 86 CEOD_241 Ecuador MO CEOD Cedrela odorata
## 11 1 CEOD_282 93 CEOD_282 Peru MO CEOD Cedrela odorata
## 12 1 CEOD_47 102 CEOD_47 Ecuador MO CEOD Cedrela odorata
## 13 1 CEOD_73 108 CEOD_73 Ecuador MO CEOD Cedrela odorata
## lat long F_MISS genetic_spp galapagos species_simple
## 1 -0.66666 -77.66666 0.000214657 Cedrela nebulosa no Cedrela nebulosa
## 2 -0.36666 -78.86666 0.000171725 Cedrela nebulosa no Cedrela nebulosa
## 3 -0.30000 -77.78333 0.000729833 Cedrela nebulosa no Cedrela nebulosa
## 4 -5.01666 -78.90000 0.000000000 Cedrela nebulosa no Cedrela nebulosa
## 5 -4.18333 -78.91666 0.000000000 Cedrela nebulosa no Cedrela nebulosa
## 6 -4.71694 -78.96305 0.000085900 Cedrela nebulosa no Cedrela nebulosa
## 7 -0.63333 -77.85000 0.000042900 Cedrela nebulosa no Cedrela nebulosa
## 8 -1.96666 -77.81666 0.000472245 Cedrela nebulosa no Cedrela nebulosa
## 9 -1.96667 -77.81667 0.000300519 Cedrela odorata sl no Cedrela odorata
## 10 -3.41666 -78.58333 0.000128794 Cedrela odorata sl no Cedrela odorata
## 11 -5.75000 -77.66666 0.000472245 Cedrela nebulosa no Cedrela odorata
## 12 -1.96666 -77.81666 0.000085900 Cedrela odorata sl no Cedrela odorata
## 13 -0.71666 -77.65000 0.000643970 Cedrela nebulosa no Cedrela odorata
filter(testing,dapc_grp ==2)#C. odorata s.l.
## dapc_grp vcf_id vcf_index sampleID country inst species species_full
## 1 2 CEFI_112 22 CEFI_112 Ecuador MO CEFI Cedrela fissilis
## 2 2 CEOD_118 70 CEOD_118 Peru MO CEOD Cedrela odorata
## 3 2 CEOD_126 71 CEOD_126 Peru MO CEOD Cedrela odorata
## 4 2 CEOD_142 73 CEOD_142 Peru MO CEOD Cedrela odorata
## 5 2 CEOD_146 74 CEOD_146 Peru MO CEOD Cedrela odorata
## 6 2 CEOD_157 75 CEOD_157 Peru MO CEOD Cedrela odorata
## 7 2 CEOD_15 76 CEOD_15 Peru MO CEOD Cedrela odorata
## 8 2 CEOD_164 77 CEOD_164 Ecuador MO CEOD Cedrela odorata
## 9 2 CEOD_165 78 CEOD_165 Peru MO CEOD Cedrela odorata
## 10 2 CEOD_173 81 CEOD_173 Ecuador MO CEOD Cedrela odorata
## 11 2 CEOD_189 84 CEOD_189 Peru MO CEOD Cedrela odorata
## 12 2 CEOD_260 89 CEOD_260 Ecuador MO CEOD Cedrela odorata
## 13 2 CEOD_261 90 CEOD_261 Ecuador MO CEOD Cedrela odorata
## 14 2 CEOD_295 96 CEOD_295 Ecuador MO CEOD Cedrela odorata
## 15 2 CEOD_49 103 CEOD_49 Ecuador MO CEOD Cedrela odorata
## 16 2 CEOD_4 104 CEOD_4 Peru MO CEOD Cedrela odorata
## 17 2 CEOD_66 106 CEOD_66 Ecuador MO CEOD Cedrela odorata
## 18 2 CEOD_93 109 CEOD_93 Peru MO CEOD Cedrela odorata
## 19 2 CEOD_95 110 CEOD_95 Peru MO CEOD Cedrela odorata
## lat long F_MISS genetic_spp galapagos species_simple
## 1 -0.03333 -76.70000 0.002747610 Cedrela odorata sl no Cedrela fissilis
## 2 -3.83333 -73.33333 0.000000000 Cedrela odorata sl no Cedrela odorata
## 3 -4.16666 -73.50000 0.000214657 Cedrela odorata sl no Cedrela odorata
## 4 -3.83333 -73.33333 0.000000000 Cedrela odorata sl no Cedrela odorata
## 5 -3.75000 -73.25000 0.000042900 Cedrela odorata sl no Cedrela odorata
## 6 -3.91666 -73.58333 0.003219850 Cedrela odorata sl no Cedrela odorata
## 7 -3.74805 -73.24722 0.014510800 Cedrela odorata sl no Cedrela odorata
## 8 -0.92111 -75.36639 0.000000000 Cedrela odorata sl no Cedrela odorata
## 9 -3.80000 -73.41666 0.000901558 Cedrela odorata sl no Cedrela odorata
## 10 -1.48333 -77.45000 0.000042900 Cedrela odorata sl no Cedrela odorata
## 11 -3.66666 -72.91666 0.000858627 Cedrela odorata sl no Cedrela odorata
## 12 -2.33333 -76.33333 0.000000000 Cedrela odorata sl no Cedrela odorata
## 13 -1.56666 -77.41666 0.001030350 Cedrela odorata sl no Cedrela odorata
## 14 -0.50000 -76.83333 0.000085900 Cedrela odorata sl no Cedrela odorata
## 15 -1.50000 -78.50000 0.000000000 Cedrela odorata sl no Cedrela odorata
## 16 -3.85444 -73.41388 0.001073280 Cedrela odorata sl no Cedrela odorata
## 17 -1.06666 -77.61666 0.000729833 Cedrela odorata sl no Cedrela odorata
## 18 -2.50000 -75.83333 0.000042900 Cedrela odorata sl no Cedrela odorata
## 19 -3.33333 -71.81666 0.000042900 Cedrela odorata sl no Cedrela odorata
filter(testing,dapc_grp ==3)#C. odorata s.l.
## dapc_grp vcf_id vcf_index sampleID country inst species species_full
## 1 3 CEFI_61 24 CEFI_61 Peru MO CEFI Cedrela fissilis
## 2 3 CEOD_107 69 CEOD_107 Ecuador MO CEOD Cedrela odorata
## 3 3 CEOD_128 72 CEOD_128 Colombia MO CEOD Cedrela odorata
## 4 3 CEOD_172 80 CEOD_172 Peru MO CEOD Cedrela odorata
## 5 3 CEOD_242 87 CEOD_242 Colombia MO CEOD Cedrela odorata
## 6 3 CEOD_25 88 CEOD_25 Ecuador MO CEOD Cedrela odorata
## 7 3 CEOD_285 94 CEOD_285 Ecuador MO CEOD Cedrela odorata
## 8 3 CEOD_296 97 CEOD_296 Colombia MO CEOD Cedrela odorata
## 9 3 CEOD_2 98 CEOD_2 Ecuador MO CEOD Cedrela odorata
## 10 3 CEOD_301 99 CEOD_301 Peru MO CEOD Cedrela odorata
## 11 3 CEOD_38 101 CEOD_38 Peru MO CEOD Cedrela odorata
## 12 3 CEOD_51 105 CEOD_51 Ecuador MO CEOD Cedrela odorata
## lat long F_MISS genetic_spp galapagos species_simple
## 1 -2.91667 -76.25000 0.000085900 Cedrela odorata sl no Cedrela fissilis
## 2 -0.43333 -76.51667 0.014811300 Cedrela odorata sl no Cedrela odorata
## 3 4.88333 -75.83333 0.000000000 Cedrela odorata sl no Cedrela odorata
## 4 -5.16000 -78.28889 0.029966100 Cedrela odorata sl no Cedrela odorata
## 5 3.63333 -76.71667 0.000000000 Cedrela odorata sl no Cedrela odorata
## 6 -1.06666 -77.60000 0.000515176 Cedrela odorata sl no Cedrela odorata
## 7 -1.50000 -77.95000 0.007770570 Cedrela odorata sl no Cedrela odorata
## 8 6.91666 -75.06666 0.000000000 Cedrela odorata sl no Cedrela odorata
## 9 -0.36667 -76.55000 0.000000000 Cedrela odorata sl no Cedrela odorata
## 10 -4.91666 -78.31666 0.001974840 Cedrela odorata sl no Cedrela odorata
## 11 -5.01000 -78.34000 0.000686902 Cedrela odorata sl no Cedrela odorata
## 12 -0.65000 -76.43333 0.000000000 Cedrela odorata sl no Cedrela odorata
filter(testing,dapc_grp ==4)#USFQ provided, similar to Galapagos
## dapc_grp vcf_id vcf_index sampleID country inst species species_full
## 1 4 CEFA_18332 18 18332 Ecuador USFQ CEFA Cedrela falcata
## 2 4 CEMI_18423 27 18423 Ecuador USFQ CEMI Cedrela minima
## 3 4 CEMI_18424 28 18424 Ecuador USFQ CEMI Cedrela minima
## 4 4 CEMI_18426 29 18426 Ecuador USFQ CEMI Cedrela minima
## 5 4 CEMI_18429 30 18429 Ecuador USFQ CEMI Cedrela minima
## 6 4 CEMI_18453 31 18453 Ecuador USFQ CEMI Cedrela minima
## 7 4 CENE_18343 55 18343 Ecuador USFQ CENE Cedrela nebulosa
## 8 4 CENE_18366 56 18366 Ecuador USFQ CENE Cedrela nebulosa
## 9 4 CENE_18372 57 18372 Ecuador USFQ CENE Cedrela nebulosa
## 10 4 CEOD_18377 83 18377 Ecuador USFQ CEOD Cedrela odorata
## lat long F_MISS genetic_spp galapagos species_simple
## 1 0.07380 -77.23960 0.00300519 Cedrela falcata no Other spp
## 2 0.21922 -79.88104 0.00180312 Cedrela minima no Other spp
## 3 0.24578 -79.86882 0.00120208 Cedrela minima no Other spp
## 4 0.64255 -79.96195 0.00111622 Cedrela minima no Other spp
## 5 0.38845 -79.64418 0.00201777 Cedrela minima no Other spp
## 6 0.23878 -78.81811 0.02193790 Cedrela minima no Other spp
## 7 -0.72889 -77.49560 0.00000000 Cedrela nebulosa no Cedrela nebulosa
## 8 -1.36650 -77.95270 0.00141673 Cedrela nebulosa no Cedrela nebulosa
## 9 -1.75210 -77.84550 0.00064397 Cedrela nebulosa no Cedrela nebulosa
## 10 -2.02000 -77.94720 0.00000000 Cedrela odorata sl no Cedrela odorata
filter(testing,dapc_grp ==5)#contains Galapagos Cedrela + C. petiolutata
## dapc_grp vcf_id vcf_index sampleID country inst species
## 1 5 CEOD_332 100 CEOD_302 Ecuador MO CEOD
## 2 5 CEPE_18422 111 18422 Ecuador USFQ CEPE
## 3 5 CEPE_18425 112 18425 Ecuador USFQ CEPE
## 4 5 CO26_STX 123 CO26.STX Ecuador USFQ CEOD
## 5 5 CO7_SCR 124 CO7.SCR Ecuador USFQ CEOD
## 6 5 COFL20_S220 125 COFL20 Ecuador USFQ CEOD
## 7 5 COISA10_S234 126 COISA10 Ecuador USFQ CEOD
## species_full lat long F_MISS genetic_spp
## 1 Cedrela odorata -0.416660 -76.61666 0.00553814 Cedrela odorata sl
## 2 Cedrela petiolulata 0.035840 -79.95113 0.00000000 Cedrela petiolulata
## 3 Cedrela petiolulata 0.622910 -79.90772 0.00141673 Cedrela petiolulata
## 4 Cedrela odorata -0.615320 -90.36913 0.01365220 galapagos odorata
## 5 Cedrela odorata -0.877033 -89.43773 0.00905852 galapagos odorata
## 6 Cedrela odorata -1.315420 -90.44492 0.01111920 galapagos odorata
## 7 Cedrela odorata -0.812400 -91.05247 0.00184605 galapagos odorata
## galapagos species_simple
## 1 no Cedrela odorata
## 2 no Other spp
## 3 no Other spp
## 4 yes Cedrela odorata
## 5 yes Cedrela odorata
## 6 yes Cedrela odorata
## 7 yes Cedrela odorata
filter(testing,dapc_grp ==6)#random - need more data to explain these than included here
## dapc_grp vcf_id vcf_index sampleID country inst species species_full
## 1 6 CEFI_35 23 CEFI_35 Peru MO CEFI Cedrela fissilis
## 2 6 CEOD_18 85 CEOD_18 Ecuador MO CEOD Cedrela odorata
## 3 6 CEOD_26 91 CEOD_26 Ecuador MO CEOD Cedrela odorata
## 4 6 CEOD_291 95 CEOD_291 Ecuador MO CEOD Cedrela odorata
## lat long F_MISS genetic_spp galapagos species_simple
## 1 -4.15000 -80.61666 0.002275360 Cedrela fissilis no Cedrela fissilis
## 2 -0.01666 -77.50000 0.000300519 Cedrela odorata sl no Cedrela odorata
## 3 -0.01667 -77.50000 0.000128794 Cedrela odorata sl no Cedrela odorata
## 4 -2.00000 -79.96666 0.000000000 Cedrela odorata ss no Cedrela odorata
filter(testing,dapc_grp ==7)#C. montana
## dapc_grp vcf_id vcf_index sampleID country inst species
## 1 7 CEAN_208 10 CEAN_208 Ecuador MO CEAN
## 2 7 CEAN_58 11 CEAN_58 Peru MO CEAN
## 3 7 CEMO_108 32 CEMO_108 Ecuador MO CEMO
## 4 7 CEMO_109 33 CEMO_109 Ecuador MO CEMO
## 5 7 CEMO_133 34 CEMO_133 Ecuador MO CEMO
## 6 7 CEMO_215 39 CEMO_215 Ecuador MO CEMO
## 7 7 CEMO_225 40 CEMO_225 Ecuador MO CEMO
## 8 7 CEMO_232 41 CEMO_232 Ecuador MO CEMO
## 9 7 CEMO_239 42 CEMO_239 Ecuador MO CEMO
## 10 7 CEMO_248 43 CEMO_248 Ecuador MO CEMO
## 11 7 CEMO_27 44 CEMO_27 Ecuador MO CEMO
## 12 7 CEMO_304 45 CEMO_304 Colombia MO CEMO
## 13 7 CEMO_50 46 CEMO_50 Colombia MO CEMO
## 14 7 CEMO_55 47 CEMO_55 Ecuador MO CEMO
## 15 7 CEMO_62 48 CEMO_62 Ecuador MO CEMO
## 16 7 CEMO_68 49 CEMO_68 Ecuador MO CEMO
## 17 7 CEMO_77 50 CEMO_77 Ecuador MO CEMO
## 18 7 CEOD_271 92 CEOD_271 Ecuador MO CEOD
## 19 7 CEOD_6 107 CEOD_6 Ecuador MO CEOD
## species_full lat long F_MISS genetic_spp
## 1 Cedrela angustifolia -3.83027 -79.42027 0.037693700 Cedrela montana
## 2 Cedrela angustifolia -9.13333 -77.75000 0.000515176 Cedrela montana
## 3 Cedrela montana -0.21666 -78.80000 0.001803120 Cedrela montana
## 4 Cedrela montana -0.45000 -77.91667 0.000128794 Cedrela montana
## 5 Cedrela montana 0.08333 -78.50000 0.001631390 Cedrela montana
## 6 Cedrela montana 0.21666 -78.25000 0.000643970 Cedrela montana
## 7 Cedrela montana -0.25000 -78.50000 0.001502600 Cedrela montana
## 8 Cedrela montana -0.33333 -78.50000 0.089211400 Cedrela montana
## 9 Cedrela montana -3.60916 -79.69666 0.002704680 Cedrela montana
## 10 Cedrela montana -0.23138 -78.80277 0.001330870 Cedrela montana
## 11 Cedrela montana -4.38500 -79.12555 0.000128794 Cedrela montana
## 12 Cedrela montana 4.31667 -74.30000 0.001202080 Cedrela montana
## 13 Cedrela montana 4.01805 -75.88027 0.000214657 Cedrela montana
## 14 Cedrela montana 0.33333 -78.43333 0.000257588 Cedrela montana
## 15 Cedrela montana -1.50000 -78.45000 0.057012800 Cedrela montana
## 16 Cedrela montana -0.38333 -78.01667 0.000558108 Cedrela montana
## 17 Cedrela montana -0.60000 -78.45000 0.000987421 Cedrela montana
## 18 Cedrela odorata -0.21666 -78.80000 0.000300519 Cedrela montana
## 19 Cedrela odorata -0.71666 -77.66666 0.000901558 Cedrela montana
## galapagos species_simple
## 1 no Cedrela angustifolia
## 2 no Cedrela angustifolia
## 3 no Cedrela montana
## 4 no Cedrela montana
## 5 no Cedrela montana
## 6 no Cedrela montana
## 7 no Cedrela montana
## 8 no Cedrela montana
## 9 no Cedrela montana
## 10 no Cedrela montana
## 11 no Cedrela montana
## 12 no Cedrela montana
## 13 no Cedrela montana
## 14 no Cedrela montana
## 15 no Cedrela montana
## 16 no Cedrela montana
## 17 no Cedrela montana
## 18 no Cedrela odorata
## 19 no Cedrela odorata
filter(testing,dapc_grp ==8)#C. angustifolia + C. montana
## dapc_grp vcf_id vcf_index sampleID country inst species
## 1 8 CEAG_18406 1 18406 Ecuador USFQ CEAG
## 2 8 CEAG_18407 2 18407 Ecuador USFQ CEAG
## 3 8 CEAG_18411 3 18411 Ecuador USFQ CEAG
## 4 8 CEAG_18445 4 18445 Ecuador USFQ CEAG
## 5 8 CEAN_18375 5 18375 Ecuador USFQ CEAN
## 6 8 CEAN_18387 6 18387 Ecuador USFQ CEAN
## 7 8 CEAN_18439 7 18439 Ecuador USFQ CEAN
## 8 8 CEAN_18451 8 18451 Ecuador USFQ CEAN
## 9 8 CEAN_18481 9 18481 Ecuador USFQ CEAN
## 10 8 CEMO_18317 35 18317 Ecuador USFQ CEMO
## 11 8 CEMO_18319 36 18319 Ecuador USFQ CEMO
## 12 8 CEMO_18324 37 18324 Ecuador USFQ CEMO
## 13 8 CEMO_18383 38 18383 Ecuador USFQ CEMO
## 14 8 CEPU_18398 113 18398 Ecuador USFQ CEPU
## 15 8 CEPU_18418 114 18418 Ecuador USFQ CEPU
## 16 8 CEPU_18420 115 18420 Ecuador USFQ CEPU
## 17 8 CEPU_18444 116 18444 Ecuador USFQ CEPU
## 18 8 CEPU_18458 117 18458 Ecuador USFQ CEPU
## species_full lat long F_MISS genetic_spp
## 1 Cedrela angusticarpa -0.021400 -78.88610 0.01210660 Cedrela angusticarpa
## 2 Cedrela angusticarpa -0.043400 -78.96030 0.07972350 Cedrela angusticarpa
## 3 Cedrela angusticarpa -0.179000 -79.03060 0.01249300 Cedrela angusticarpa
## 4 Cedrela angusticarpa 0.025830 -78.85944 0.00957369 Cedrela angusticarpa
## 5 Cedrela angustifolia -2.263100 -78.19230 0.00789937 Cedrela angustifolia
## 6 Cedrela angustifolia -0.202200 -78.43060 0.01476840 Cedrela angustifolia
## 7 Cedrela angustifolia -4.349170 -79.75389 0.01828880 Cedrela angustifolia
## 8 Cedrela angustifolia -0.185000 -78.48690 0.00369210 Cedrela angustifolia
## 9 Cedrela angustifolia -3.315588 -78.56496 0.00635384 Cedrela angustifolia
## 10 Cedrela montana -0.399600 -78.04770 0.00197484 Cedrela montana
## 11 Cedrela montana -0.451600 -77.94320 0.01008890 Cedrela montana
## 12 Cedrela montana -0.467050 -77.91430 0.07362730 Cedrela montana
## 13 Cedrela montana -1.227200 -78.62090 0.00523763 Cedrela montana
## 14 Cedrela pubescens 0.156200 -78.81880 0.00459365 Cedrela pubescens
## 15 Cedrela pubescens -0.231040 -78.75646 0.00403555 Cedrela pubescens
## 16 Cedrela pubescens -0.929580 -79.05002 0.00249002 Cedrela pubescens
## 17 Cedrela pubescens 0.034440 -78.72139 0.00970249 Cedrela pubescens
## 18 Cedrela pubescens 0.323850 -78.75555 0.00674022 Cedrela pubescens
## galapagos species_simple
## 1 no Other spp
## 2 no Other spp
## 3 no Other spp
## 4 no Other spp
## 5 no Cedrela angustifolia
## 6 no Cedrela angustifolia
## 7 no Cedrela angustifolia
## 8 no Cedrela angustifolia
## 9 no Cedrela angustifolia
## 10 no Cedrela montana
## 11 no Cedrela montana
## 12 no Cedrela montana
## 13 no Cedrela montana
## 14 no Other spp
## 15 no Other spp
## 16 no Other spp
## 17 no Other spp
## 18 no Other spp
filter(testing,dapc_grp ==9)#C. angustifolia + C. montana
## dapc_grp vcf_id vcf_index sampleID country inst species
## 1 9 CEAN_64 12 CEAN_64 Peru MO CEAN
## 2 9 CEDO_18329 13 18329 Ecuador USFQ CEDO
## 3 9 CEDO_18330 14 18330 Ecuador USFQ CEDO
## 4 9 CEDO_18379 15 18379 Ecuador USFQ CEDO
## 5 9 CEDO_18448 16 18448 Ecuador USFQ CEDO
## 6 9 CEDO_18449 17 18449 Ecuador USFQ CEDO
## 7 9 CEFA_18351 19 18351 Ecuador USFQ CEFA
## 8 9 CEFA_18352 20 18352 Ecuador USFQ CEFA
## 9 9 CEFA_18358 21 18358 Ecuador USFQ CEFA
## 10 9 CEKU_18288 25 18288 Ecuador USFQ CEKU
## 11 9 CEMA_18480 26 18480 Ecuador USFQ CEMA
## 12 9 CENE_18296 52 18296 Ecuador USFQ CENE
## 13 9 CENE_18326 53 18326 Ecuador USFQ CENE
## 14 9 CENE_18334 54 18334 Ecuador USFQ CENE
## 15 9 CENE_18474 58 18474 Ecuador USFQ CENE
## 16 9 CENE_18476 59 18476 Ecuador USFQ CENE
## 17 9 CENE_18479 60 18479 Ecuador USFQ CENE
## 18 9 CENE_43 67 CENE_43 Ecuador MO CENE
## 19 9 CEOD_18359 82 18359 Ecuador USFQ CEOD
## 20 9 CESA_18469 118 18469 Ecuador USFQ CESA
## 21 9 CESA_18470 119 18470 Ecuador USFQ CESA
## 22 9 CESA_18472 120 18472 Ecuador USFQ CESA
## 23 9 CLOSF1_1 121 CLOSF1-01 Peru PERU UNKN
## 24 9 CLOSF3_1 122 CLOSF3-01 Peru PERU UNKN
## 25 9 COT8_S215 127 COT8 Ecuador USFQ CEOD
## species_full lat long F_MISS genetic_spp
## 1 Cedrela angustifolia -5.681380 -79.26694 0.000558108 Cedrela angustifolia
## 2 Cedrela domatifolia -0.038890 -77.44500 0.000000000 Cedrela domatifolia
## 3 Cedrela domatifolia -0.038500 -77.44540 0.003048130 Cedrela domatifolia
## 4 Cedrela domatifolia -2.043800 -77.90540 0.003219850 Cedrela domatifolia
## 5 Cedrela domatifolia -0.038150 -77.44522 0.018675100 Cedrela domatifolia
## 6 Cedrela domatifolia -1.414346 -77.72346 0.007470060 Cedrela domatifolia
## 7 Cedrela falcata -0.813360 -76.89020 0.001502600 Cedrela falcata
## 8 Cedrela falcata -0.836990 -76.89780 0.001245010 Cedrela falcata
## 9 Cedrela falcata -0.706780 -77.08010 0.000386382 Cedrela falcata
## 10 Cedrela kuelapensis -4.359000 -79.82940 0.001287940 Cedrela kuelapensis
## 11 Cedrela magnifolia -3.315290 -78.56510 0.000257588 Cedrela magnifolia
## 12 Cedrela nebulosa -0.671080 -77.79249 0.000171725 Cedrela nebulosa
## 13 Cedrela nebulosa -0.450100 -77.87540 0.000000000 Cedrela nebulosa
## 14 Cedrela nebulosa 0.313200 -77.46400 0.000085900 Cedrela nebulosa
## 15 Cedrela nebulosa -4.398020 -78.93543 0.001202080 Cedrela nebulosa
## 16 Cedrela nebulosa -3.813780 -78.76078 0.000515176 Cedrela nebulosa
## 17 Cedrela nebulosa -3.335330 -78.56057 0.002447090 Cedrela nebulosa
## 18 Cedrela nebulosa -2.983330 -78.38333 0.041342900 Cedrela nebulosa
## 19 Cedrela odorata -0.892290 -77.18190 0.004336070 Cedrela odorata sl
## 20 Cedrela saltensis -3.867930 -79.10236 0.000042900 Cedrela saltensis
## 21 Cedrela saltensis -3.874294 -79.09540 0.000000000 Cedrela saltensis
## 22 Cedrela saltensis -3.890730 -79.09817 0.000858627 Cedrela saltensis
## 23 Cedrela spp -4.119167 -72.18444 0.000343451 Cedrela spp
## 24 Cedrela spp -4.255340 -71.76125 0.088138100 Cedrela spp
## 25 Cedrela odorata -0.634750 -76.15532 0.000515176 galapagos odorata
## galapagos species_simple
## 1 no Cedrela angustifolia
## 2 no Other spp
## 3 no Other spp
## 4 no Other spp
## 5 no Other spp
## 6 no Other spp
## 7 no Other spp
## 8 no Other spp
## 9 no Other spp
## 10 no Other spp
## 11 no Other spp
## 12 no Cedrela nebulosa
## 13 no Cedrela nebulosa
## 14 no Cedrela nebulosa
## 15 no Cedrela nebulosa
## 16 no Cedrela nebulosa
## 17 no Cedrela nebulosa
## 18 no Cedrela nebulosa
## 19 no Cedrela odorata
## 20 no Cedrela saltensis
## 21 no Cedrela saltensis
## 22 no Cedrela saltensis
## 23 no Other spp
## 24 no Other spp
## 25 no Cedrela odorata
filter(testing,dapc_grp ==10)#one Galapagos + C. nebulosa + C. saltensis
## [1] dapc_grp vcf_id vcf_index sampleID country
## [6] inst species species_full lat long
## [11] F_MISS genetic_spp galapagos species_simple
## <0 rows> (or 0-length row.names)
library(reshape2)
#melt the data (convert short form to long form) and examine composition of species
counts <- table(testing$dapc_grp, testing$species_simple)
m_counts<-melt(counts)
#ignore warning that melt() is soon deprecated :(
ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
guides(fill = guide_legend("DAPC Cluster"))+
theme_classic()+
theme(
legend.position = "top",
axis.text.y=element_text(face="italic"))+
labs(x="Species",y="Genotype Counts")+coord_flip()
counts <- table(testing$dapc_grp, testing$genetic_spp)
m_counts<-melt(counts)
In the following plot, I have reassigned some specimens to genetic groups where available, and I have labeled “galapagos odorata” as we saw, group 5 and group 9 contain Galapagos trees.
ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
guides(fill = guide_legend("DAPC Cluster"))+
theme_classic()+
theme(
legend.position = "top",
axis.text.y=element_text(face="italic"))+
labs(x="Genetic Species",y="Genotype Counts")+coord_flip()
#do the same for provided species names
counts <- table(testing$dapc_grp, testing$species_full)
m_counts<-melt(counts)
As you can see Cedrela is a cluster fuck. Very few species groups are exclusively one cluster. Are these nomial species distinct genetic lineages? TBD!
#We can do the same for country
counts <- table(testing$dapc_grp, testing$country)
m_counts<-melt(counts)
ggplot()+geom_bar(aes(x=Var2,y=value,fill=factor(Var1)),data=m_counts,stat="identity")+
guides(fill = guide_legend("DAPC Cluster"))+
theme_classic()+
theme(
legend.position = "top",
axis.text.y=element_text(face="italic"))+
labs(x="Country",y="Genotype Counts")+coord_flip()
Let’s plot the map again to see the distribution fo the DAPC clusters.
ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
geom_point(data=testing, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
geom_point(data = testing, aes(map_id=NA,x = long, y = lat, color=dapc_grp),
size = 2, alpha=0.7)+
geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
guides(color = guide_legend("DAPC Cluster"))+
theme(axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.background = element_blank(),
legend.key = element_blank(),
panel.border = element_rect(fill=NA))+
xlab("Longitude")+ylab("Latitude")
It looks like a potential source of the Galapagos Cedrela is coastal Ecuador, but let’s make the map a bit less busy so we can view this better.
#subset data to include clusters we saw to be close to the Galapagos samples in DAPC space (genetic distance).
#use subset() with | meaning "or"
galapagos<-subset(testing,dapc_grp == 5 | dapc_grp == 9 | dapc_grp == 4)
ggplot(pop.df, aes(map_id = id)) + #"id" is col in your pop.df, not in the map object
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
geom_point(data=galapagos, aes(map_id=NA,long, lat), shape=1,size = 2.5,color="black",alpha=.7)+
geom_point(data = galapagos, aes(map_id=NA,x = long, y = lat, color=dapc_grp),
size = 2, alpha=0.7)+
scale_color_manual(values=c("#F8766D","#00BFC4","#00B0F6"), name="DAPC Cluster")+
geom_label(aes(map_id=NA, x=LONG, y=LAT,label=lab),size=3, data=Labs)+
theme(legend.text = element_text(face = "italic",size=8),
axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.background = element_blank(),
legend.key = element_blank(),
panel.border = element_rect(fill=NA))+
xlab("Longitude")+ylab("Latitude")
Let’s dig into this a bit more with ADMIXTURE. Like STRUCTURE, ADMIXTURE estimates the optimal number of ancestral populations (K) among individuals and membership coefficients to ancestral populations. I performed ancestry prediction with ADMIXTURE for K values of 1 (one ancestral population) through to 10 (ten ancestral populations). The optimal K (number of ancestral populations) was determined via five-fold cross-validation and measured in prediction error (Alexander et al., 2009). I plot ancestry proportions for each individual as bar plots.
Today we will just look at K = 2, 3, 4, and 7. These are, to me, the most interesting.
#load cross validation estimates for each K
cv<-read.table("snp_seminar_admixture/snp_seminar_fil_thin_cv.txt",header=1)
#plot CV errors
ggplot(aes(x=K,y=CV),data=cv)+geom_line()+geom_point()+
scale_x_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10))+
theme_bw()+ylab("ADMIXTURE Cross Validation Error")
#low error indicates better fitting K.
#read in ADMIXTURE outputs ending in .Q. Q is the ancestry proportion for each ancestral population.
K2Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.2.Q",header=FALSE)
K3Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.3.Q",header=FALSE)
K4Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.4.Q",header=FALSE)
K7Q<-read.table("snp_seminar_admixture/20220213_snp_seminar_fil_thin.7.Q",header=FALSE)
#add vcf IS from meta, we know meta is sorted by vcf_id because we did that
#earlier.
K2Q$vcf_id<-meta$vcf_id
K3Q$vcf_id<-meta$vcf_id
K4Q$vcf_id<-meta$vcf_id
K7Q$vcf_id<-meta$vcf_id
#join the ADMIXTURE outputs with the rest of our data.
K2Q<-left_join(testing,K2Q)
K3Q<-left_join(testing,K3Q)
K4Q<-left_join(testing,K4Q)
K7Q<-left_join(testing,K7Q)
#melt the data into long form for plotting. We just want the vcf_id, dapc
#clusters, and the K columns (V1-V10) containing the Q proportions.
m_K2Q<-melt(K2Q[,c("vcf_id","dapc_grp","V1","V2")])
m_K3Q<-melt(K3Q[,c("vcf_id","dapc_grp","V1","V2","V3")])
m_K4Q<-melt(K4Q[,c("vcf_id","dapc_grp","V1","V2","V3","V4")])
m_K7Q<-melt(K7Q[,c("vcf_id","dapc_grp","V1","V2","V3","V4","V5","V6","V7")])
#ignore errors
#fix the names
names(m_K2Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K3Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K4Q)<-c("vcf_id","dapc_grp","K_layer","Q")
names(m_K7Q)<-c("vcf_id","dapc_grp","K_layer","Q")
head(m_K2Q)
## vcf_id dapc_grp K_layer Q
## 1 CEAG_18406 8 V1 0.999990
## 2 CEAG_18407 8 V1 0.999990
## 3 CEAG_18411 8 V1 0.999990
## 4 CEAG_18445 8 V1 0.980049
## 5 CEAN_18375 8 V1 0.999990
## 6 CEAN_18387 8 V1 0.999990
head(m_K7Q)
## vcf_id dapc_grp K_layer Q
## 1 CEAG_18406 8 V1 0.000010
## 2 CEAG_18407 8 V1 0.100886
## 3 CEAG_18411 8 V1 0.000010
## 4 CEAG_18445 8 V1 0.000010
## 5 CEAN_18375 8 V1 0.000010
## 6 CEAN_18387 8 V1 0.000010
ADMIXTURE bar plots GALAPAGOS SPECIMENS ARE UNDER DAPC CLUSTER 7
note:ADMIXTURE doesn’t know the color scheme we have assigned, and CANNOT keep color assignments consistent between runs. The same group of specimens might be assigned ancestral population 4 out of 6 for K=6 and ancestral population 2 out of 7 for K=7. For your final manuscript figures, it is up to you to make the colors cohesive. This is typically done manually, and it is a real pain. Good luck.
(K2<-ggplot(m_K2Q, aes(x=vcf_id, y=Q, fill=K_layer))+
geom_bar(stat='identity',show.legend=FALSE) +
#scale_fill_manual(values = c("#bf812d","#4daf4a")) +
facet_grid(~dapc_grp, scales = "free")+
theme_classic()+
labs(x="Species")+
geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
size=3, data=m_K2Q)+
#theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
ggtitle("K=2"))
(K3<-ggplot(m_K3Q, aes(x=vcf_id, y=Q, fill=K_layer))+
geom_bar(stat='identity',show.legend=FALSE) +
#scale_fill_manual(values = c("#bf812d","#4daf4a")) +
facet_grid(~dapc_grp, scales = "free")+
theme_classic()+
labs(x="Species")+
geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
size=3, data=m_K3Q)+
#theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
ggtitle("K=3"))
(K4<-ggplot(m_K4Q, aes(x=vcf_id, y=Q, fill=K_layer))+
geom_bar(stat='identity',show.legend=FALSE) +
#scale_fill_manual(values = c("#bf812d","#4daf4a")) +
facet_grid(~dapc_grp, scales = "free")+
theme_classic()+
labs(x="Species")+
geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
size=3, data=m_K4Q)+
#theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
ggtitle("K=4"))
(K7<-ggplot(m_K7Q, aes(x=vcf_id, y=Q, fill=K_layer))+
geom_bar(stat='identity',show.legend=FALSE) +
#scale_fill_manual(values = c("#bf812d","#4daf4a")) +
facet_grid(~dapc_grp, scales = "free")+
theme_classic()+
labs(x="Species")+
geom_text(aes(x=vcf_id, y=.5,label=vcf_id),angle=90,
size=3, data=m_K7Q)+
#theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8))+
theme(axis.text.x = element_blank(),axis.title.x = element_blank())+
ggtitle("K=7"))
Use grid.arrange to look at these barplots side by side (or rather vertically).
library(gridExtra)
grid.arrange(K2, K3, K4, K7,ncol=1)
Let’s make some ultra-classic pie chart maps with package scatterpie.
library(scatterpie)
#we can plot the pies with or without black outline
(K2_map<-ggplot(pop.df, aes(map_id = id)) +
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
theme_bw()+
geom_scatterpie(aes(x=long, y=lat),color="black",
data=K2Q, cols=c("V1","V2"),alpha=.7)+
#geom_scatterpie(aes(x=long, y=lat),color="black",
# data=K2Q, cols=c("V1","V2"),alpha=.7)+
guides(shape = guide_legend("Species"))+
theme(axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.position = "none")+
xlab("Longitude")+ylab("Latitude"))
(K3_map<-ggplot(pop.df, aes(map_id = id)) +
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
theme_bw()+
geom_scatterpie(aes(x=long, y=lat),color="black",
data=K3Q, cols=c("V1","V2","V3"),alpha=.7)+
guides(shape = guide_legend("Species"))+
theme(axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.position = "none")+
xlab("Longitude")+ylab("Latitude"))
(K4_map<-ggplot(pop.df, aes(map_id = id)) +
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
theme_bw()+
geom_scatterpie(aes(x=long, y=lat),color="black",
data=K4Q, cols=c("V1","V2","V3","V4"),alpha=.7)+
theme(axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.position = "none")+
xlab("Longitude")+ylab("Latitude"))
(K7_map<-ggplot(pop.df, aes(map_id = id)) +
geom_map(fill="lightgray",colour= "black", map = worldMap.fort) +
expand_limits(x = worldMap.fort$long, y = worldMap.fort$lat) +
coord_equal(xlim = c(-95,-69), ylim = c(-10, 10)) +
theme_bw()+
geom_scatterpie(aes(x=long, y=lat),color="black",
data=K7Q, cols=c("V1","V2","V3","V4","V5","V6","V7"),alpha=.7)+
guides(shape = guide_legend("Species"))+
theme(axis.text.x=element_text(size=7),
axis.title.x=element_text(size=9),
axis.text.y=element_text(size=7),
axis.title.y=element_text(size=9),
legend.position = "none")+
xlab("Longitude")+ylab("Latitude"))
For fun, let’s plot the barplots and the maps together.
grid.arrange(K2_map,K2,ncol=1)
grid.arrange(K3_map,K3,ncol=1)
grid.arrange(K4_map,K4,ncol=1)
grid.arrange(K7_map,K7,ncol=1)
If you would like to quit R forever after this demonstration, you are in luck. You can perform these ADMIXTURE visualizations with an R shiny app available online: https://nicocriscuolo.shinyapps.io/StructuRly/
You simply upload your .Q file outputs from ADMIXTURE and ba-da-bing, you have plots. However, the issue with color tracking genetically similar groups remains.
Alexander, D. H., J. Novembre, and K. Lange. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Research 19: 1655–1664. Cintron, B. B. 1990. Cedrela odorata L. Cedro hembra, Spanish cedar. Silvics of North America 2: 250. Ferriss, S. 2014. An analysis of trade in five CITES-listed taxa. The Royal Institute of International Affairs, Chatham House, London, England, UK. Newton, A. C., T. R. Allnutt, A. C. M. Gillies, A. J. Lowe, R. A. Ennos, A. C. Newton, T. R. Allnutt, et al. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends in Ecology & Evolution 14: 140–145. Pennington, T. D., and A. N. Muellner. 2010. A monograph of Cedrela (Meliaceae). dh books, Milborne Port, England, UK. Tajikistan, M. 2019. Proposal for amendment of Appendix I or II for CITES CoP18 Prop. 57. Consideration of proposals for amendment of appendices I and II, 1–26. CITES, Colombo, Sri Lanka. UNEP-WCMC. 2015. Overview of CITES Appendix III listings. UNEP-WCMC, Cambridge, U. K.
sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] scatterpie_0.1.7 gridExtra_2.3 reshape2_1.4.4 vcfR_1.12.0
## [5] adegenet_2.1.5 ade4_1.7-18 dplyr_1.0.7 scales_1.1.1
## [9] rgeos_0.5-8 rgdal_1.5-27 raster_3.5-2 maptools_1.1-2
## [13] sp_1.4-5 maps_3.4.0 ggplot2_3.3.5
##
## loaded via a namespace (and not attached):
## [1] tidyr_1.1.4 sass_0.4.0 jsonlite_1.7.2 viridisLite_0.4.0
## [5] splines_4.1.1 bslib_0.3.1 shiny_1.7.1 memuse_4.1-0
## [9] highr_0.9 yaml_2.2.1 pillar_1.6.3 lattice_0.20-45
## [13] glue_1.4.2 digest_0.6.28 polyclip_1.10-0 promises_1.2.0.1
## [17] ggfun_0.0.4 colorspace_2.0-2 htmltools_0.5.2 httpuv_1.6.3
## [21] Matrix_1.3-4 plyr_1.8.6 pkgconfig_2.0.3 purrr_0.3.4
## [25] xtable_1.8-4 tweenr_1.0.2 terra_1.4-11 later_1.3.0
## [29] ggforce_0.3.3 tibble_3.1.5 mgcv_1.8-36 generics_0.1.0
## [33] farver_2.1.0 ellipsis_0.3.2 withr_2.4.2 cli_3.0.1
## [37] magrittr_2.0.1 crayon_1.4.1 mime_0.12 evaluate_0.14
## [41] fansi_0.5.0 nlme_3.1-153 MASS_7.3-54 foreign_0.8-81
## [45] vegan_2.5-7 tools_4.1.1 lifecycle_1.0.1 stringr_1.4.0
## [49] munsell_0.5.0 cluster_2.1.2 compiler_4.1.1 jquerylib_0.1.4
## [53] rlang_0.4.11 grid_4.1.1 rstudioapi_0.13 igraph_1.2.6
## [57] labeling_0.4.2 rmarkdown_2.11 gtable_0.3.0 codetools_0.2-18
## [61] DBI_1.1.1 R6_2.5.1 knitr_1.36 pinfsc50_1.2.0
## [65] fastmap_1.1.0 seqinr_4.2-8 utf8_1.2.2 permute_0.9-5
## [69] ape_5.5 stringi_1.7.5 parallel_4.1.1 Rcpp_1.0.7
## [73] vctrs_0.3.8 tidyselect_1.1.1 xfun_0.26