Applying High-Throughput Genomics
to Crops for the Developing World
Jason Wallace
Cornell University
The big picture: Global food security
Photo credit: NASA
• Food security means reliable access to food of sufficient quality and quantity to lead an active and healthy life [1]
• 842 million people worldwide are food insecure [2]
• Increasing food security is one of the surest ways to improve health, educational attainment, and political stability
[1] Paraphrased from FAO, Declaration of the World Summit on Food Security, 2009
[2] FAO, The State of Food Insecurity in the World, 2013
Major constraints on food security
• Environmental variability
[Figure: Projected surface temperature change [3]]
• Negative side-effects: erosion, pollution, deforestation
[Photo credits: NOAA, Rhett Butler]
• Changing consumption habits
[Figure: Consumption (billion tonnes/year, axis 1.0 to 3.0) of fat & oil, fish, dairy, meat, fruits, cereals, and vegetables [2]]
• Increasing population: ~9 billion by 2050
[Figure: Population (billions) from 2010 ("Today") to 2050 [1]]
[1] UN Department of Economic and Social Affairs, World Population Prospects: The 2012 Revision.
[2] Kearney 2010, Phil Trans Roy Soc B 365
[3] NOAA GFDL Climate Research Highlights Image Gallery
Reaching the goal
• Improved crops
• Government policies
• Agronomic practices
• Infrastructure development
• Technology development
• Agroecology
• Consumer habits
• Market incentives
The golden age of crop genetics
• Modern sequencing is opening the floodgates to genetic analysis
[Figure: Sequencing trends over time, 2000 to 2015. Cost of sequencing (cost/megabase, log scale from $0.1 to $10K) compared with Moore’s Law [1], and total plant genomes sequenced (axis 0 to 60) [2]]
[1] Wetterstrand KA. DNA Sequencing Costs, available at: www.genome.gov
[2] Michael & Jackson 2013, The Plant Genome 6
Case studies outline
Barnyard Millet
Diversity Analysis
Pearl Millet
Genetic Map Creation
Maize
Trait Mapping
Shramajeevi Agri Films
Case Study 1:
Barnyard millet diversity
Shramajeevi Agri Films
Barnyard Millet
(Echinochloa spp.)
• Barnyard millet (Echinochloa spp.) is an important
alternative crop in southern and eastern Asia
• Two species: E. colona (India) and E. crus-galli (Japan)
• Also grown as a forage crop in the US and Japan
(“billion-dollar grass”)
• Goal: Characterize the newly created core collection at
ICRISAT using genome-wide marker data
Genotyping-by-sequencing (GBS)
• Created for high-throughput, semi-automated genotyping
[Figure: GBS workflow: sample plants, isolate DNA, restriction digest, ligate adaptors, pool & amplify, sequence. Inset labels: sequencing adaptor, barcode, sticky ends, genomic DNA. Images: Qiagen, Illumina, Elshire et al 2011, PLoS]
• Advantages
  • One-step SNP discovery + genotyping
  • Simple protocol; no reference required
  • Large numbers of SNPs found cheaply
  • Broadly applicable
• Drawbacks
  • False SNPs from sequencing errors
  • Missing data from stochastic sampling
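The barcode step is what lets many plants share one sequencing lane: after sequencing, each read is assigned back to its plant by the barcode at its start. The sketch below is only an illustration of that idea. The barcodes, sample names, and reads are hypothetical, and the cut-site remnant (`CWGC`, as left by an ApeKI-style digest) is an assumption; real pipelines also tolerate sequencing errors in the barcode.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table; real GBS barcode sets differ.
BARCODES = {"ACGT": "plant_01", "TTGCA": "plant_02"}
# Assumed cut-site remnants expected right after the barcode (ApeKI-style CWGC).
CUT_SITE_REMNANTS = ("CAGC", "CTGC")


def demultiplex(reads, barcodes=BARCODES, remnants=CUT_SITE_REMNANTS):
    """Assign each read to a sample by its leading barcode.

    A read is kept only if a known barcode is followed directly by a
    cut-site remnant; the barcode is trimmed from the stored tag.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        for barcode, sample in barcodes.items():
            tag = read[len(barcode):]
            if read.startswith(barcode) and tag.startswith(remnants):
                by_sample[sample].append(tag)
                break
        else:
            # No barcode matched: the read cannot be traced to a plant.
            unassigned.append(read)
    return dict(by_sample), unassigned


# Made-up reads for illustration only.
reads = [
    "ACGTCAGCTTAGGCA",
    "TTGCACTGCAAGTCC",
    "ACGTCTGCTTAGGCA",
    "GGGGCAGCTTAGGCA",  # unknown barcode
]
samples, unassigned = demultiplex(reads)
```

Variable-length barcodes are used here on purpose, so the lookup tries each barcode in turn instead of slicing a fixed-width prefix.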
Cleaning up the data
• Have ~20,000 SNPs after basic filtering
• Problem: Both barnyard millet species are hexaploid -> false
SNPs due to paralogs
• Filter by “heterozygosity”
[Figure: Site Frequency Spectrum (raw) vs. Site Frequency Spectrum (filtered): relative abundance against minor allele frequency for the combined population, E. colona, and E. crus-galli. Annotations: ideal, paralogs, differentially segregating alleles]
Wallace et al. 2015, Plant Genome (in press)
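A heterozygosity filter of this kind can be sketched in a few lines. The version below is a minimal illustration, not the published pipeline: it assumes genotypes are coded as alternate-allele counts (0, 1, 2, with `None` for missing), that true SNPs in mostly inbred accessions are rarely heterozygous, and that a paralogous site looks heterozygous in nearly every sample. The cutoff `max_het=0.2` and the SNP names are hypothetical.

```python
def site_stats(genotypes):
    """Return (minor allele frequency, observed heterozygosity) for one SNP.

    Genotypes are alt-allele counts (0, 1, 2); None marks missing data.
    """
    calls = [g for g in genotypes if g is not None]
    if not calls:
        return None, None
    alt_freq = sum(calls) / (2 * len(calls))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in calls if g == 1) / len(calls)
    return maf, het


def filter_by_heterozygosity(sites, max_het=0.2):
    """Keep SNPs whose observed heterozygosity is at most max_het."""
    kept = {}
    for name, genotypes in sites.items():
        _, het = site_stats(genotypes)
        if het is not None and het <= max_het:
            kept[name] = genotypes
    return kept


# Made-up genotype calls for illustration only.
sites = {
    "snp_a": [0, 0, 2, 2, 0, None, 2, 0],  # homozygous calls only
    "snp_b": [1, 1, 1, 1, 1, 1, None, 1],  # heterozygous everywhere: paralog-like
    "snp_c": [0, 0, 0, 1, 0, 0, 0, 0],     # rare allele, one heterozygote
}
kept = filter_by_heterozygosity(sites)
```

The minor allele frequencies returned by `site_stats` are the values one would bin to draw a site frequency spectrum before and after filtering.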
Phylogenetics
• Phylogeny splits the two species as expected
• Population structure within species closely
matches phylogeny and geography
[Figure: Phylogeny with E. colona and E. crus-galli clades; potential hybrids marked]
Wallace et al. 2015, Plant Genome (in press)
Genetic Maps for Pearl Millet
Pearl Millet (Pennisetum glaucum)
• Staple crop for India and Sub-Saharan Africa
• Large (2.3 Gb), diverse genome
• Reference genome in progress
• Goal: Assemble genetic maps to anchor scaffolds into pseudochromosomes
Mapping Populations
• 3 biparental populations used for genetic mapping:
  • 841 x 863 (“Patancheru”)
    • ~100 RILs from ICRISAT-Patancheru
  • Tift 99B x Tift 454 (“Tifton”)
    • ~180 RILs from Som Punnuri, Ft. Valley State University, USA
  • Wild x Domestic F2s (“Sadore”)
    • ~300 F2 plants from Boubacar Kountche, ICRISAT-Niamey
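Summary statistics
[Figure: Comparison of Genotyping Depths: number of genotypes (log scale, 10^0 to 10^8) against call depth (= number of reads) for the Tifton, Patancheru, and Sadore populations. Annotation: best read depth]
[Figure: SNP counts per population (axis 0 to 80k); bars labeled 48k, 75k, and 76k]
• Fewer SNPs = less diversity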
Making individual maps
1. Call SNPs
2. Group via hierarchical clustering
3. Merge linkage groups
4. Order markers
5. Cleanup
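The grouping step (step 2) can be sketched as follows. The slides name hierarchical clustering but not the distance or linkage rule, so this sketch makes its own assumptions: distance is the recombination fraction between RIL genotype calls (`A`/`B`, with `-` for missing), and groups are formed by single linkage cut at a hypothetical threshold `max_rf=0.2`. Cutting a single-linkage tree at a fixed height gives the same groups as connected components over pairs below that height, which is how it is implemented here. Marker names and calls are made up.

```python
from itertools import combinations


def recomb_fraction(m1, m2):
    """Fraction of lines whose parental allele differs between two markers."""
    pairs = [(a, b) for a, b in zip(m1, m2) if a != "-" and b != "-"]
    if not pairs:
        return 0.5  # no shared data: treat the markers as unlinked
    return sum(a != b for a, b in pairs) / len(pairs)


def linkage_groups(markers, max_rf=0.2):
    """Single-linkage grouping: join markers linked by any chain with rf <= max_rf."""
    parent = {name: name for name in markers}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(markers, 2):
        if recomb_fraction(markers[a], markers[b]) <= max_rf:
            parent[find(a)] = find(b)

    groups = {}
    for name in markers:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# One string per marker, one character per RIL (hypothetical calls).
markers = {
    "m1": "AABBABAB",
    "m2": "AABBABAA",
    "m3": "AABBBBAA",
    "m4": "BABAABBA",
    "m5": "BABAABBB",
}
groups = linkage_groups(markers)
```

Note that `m1` and `m3` differ in 2 of 8 lines (above the threshold) yet land in the same group through `m2`; that chaining is the defining behavior of single linkage.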
Merge maps for final assembly
• 4824 contigs assembled
into 1.68 Gb reference
• 92.8% of sequence data
• 60% have putative
orientations
• Not perfect, but pretty good
Case Study 3:
Trait Mapping in the CIMMYT WEMA Populations
• WEMA = Water-Efficient Maize for Africa
• ~20 biparental families, ~200 lines each
• Goal: Use data from across families to map trait loci with high resolution
[Figure: 3D PCA plot of the WEMA families (axes PC1, PC2, PC3)]
Trait mapping
• Two approaches to mapping traits in WEMA
[Figure: Traditional joint GWAS vs. Bayesian GWAS, in which results from Env 1, Env 2, Env 3, and Env 4 are merged into unified posterior probabilities]
Both methods get similar results
[Figure: Traditional GWAS (-log10 p-value) compared with Bayesian GWAS (cumulative Bayes factor)]
• Mappings in both methods are roughly equivalent
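The two plotting scales can be related with a small worked example. The talk does not spell out how the Bayesian results are combined, so this is only a sketch under stated assumptions: evidence from each environment is treated as independent, so log10 Bayes factors add, and a hypothetical prior probability of association (0.01) converts the total into a posterior probability. All numbers are made up.

```python
import math


def cumulative_log10_bf(log10_bfs):
    """Sum per-environment log10 Bayes factors (assumes independent evidence)."""
    return sum(log10_bfs)


def posterior_probability(log10_bf, prior=0.01):
    """Posterior probability of association from a log10 Bayes factor and a prior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * 10 ** log10_bf
    return posterior_odds / (1 + posterior_odds)


def neg_log10_p(p_value):
    """Scale used on the traditional GWAS axis."""
    return -math.log10(p_value)


# Hypothetical log10 Bayes factors for one SNP in four environments.
per_env = [0.5, 1.0, 0.75, 0.75]
total = cumulative_log10_bf(per_env)
posterior = posterior_probability(total)
```

Under these assumptions, four environments that each give only modest support add up to strong combined support, which is the appeal of merging across environments.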
Preliminary trait-mapping results
[Figure: Association for Days to Anthesis (well-watered) on Chromosome 8: -log10 p-value against position (0 to 150 Mb). Labeled peaks: ZCN8, VGT1, ZmRAP2.7, GIGZ1A?; further peaks marked "?"]
Conclusions
Photo credit: NASA
• Genomic technology can
rapidly characterize
almost any crop
• These genetic tools help
breed crops faster and
better
• Genotyping is basically
solved; the bottlenecks
are now phenotyping and
selection
Future Need 1:
High-throughput phenotyping
Photo credits: CIMMYT & Michael Gore
• Genotyping is frequently cheaper than dirt (field space)
• Phenotyping is now the limiting factor
[Photos: Manual recording, rapid phenotyping, high-throughput phenotyping]
Future Need 2:
Data infrastructure
• Both genotyping and
phenotyping threaten to
drown us in data.
• Data is only useful if it is
usable
• Need to develop solutions
so genotypes, phenotypes,
and germplasm are
integrated and linked
[Image: server farm. Photo credit: Torkild Retvedt]
Future Need 3:
Faster breeding methods
[Figure: Genomic Selection scheme, compared with the standard breeding cycle. Steps shown: make crosses, genotype, phenotype, (re)train model, predict via model]
• Selection cycle (faster, less expensive)
• Training cycle (slower, expensive)
Model: $y_i = \mu + \sum_{j=1}^{m} z_{ij} u_j \delta_j + e_i$
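The train-then-predict loop behind genomic selection can be sketched in plain Python. This is an illustration only, assuming a simple ridge-regression model on marker effects, $y_i = \mu + \sum_j z_{ij} u_j + e_i$; the model on the slide may include further terms. Marker calls are coded -1/+1, and the training data, candidate lines, and penalty `ridge=1.0` are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        rest = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - rest) / aug[r][r]
    return x


def train(genotypes, phenotypes, ridge=1.0):
    """Training cycle: estimate marker effects from genotyped and phenotyped lines."""
    n, m = len(genotypes), len(genotypes[0])
    mu = sum(phenotypes) / n
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    z = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    y = [p - mu for p in phenotypes]
    # Normal equations with a ridge penalty: (Z'Z + ridge * I) u = Z'y
    ztz = [[sum(z[i][j] * z[i][k] for i in range(n)) + (ridge if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    zty = [sum(z[i][j] * y[i] for i in range(n)) for j in range(m)]
    return mu, means, solve(ztz, zty)


def predict(model, genotype):
    """Selection cycle: predicted value for a genotyped but unphenotyped line."""
    mu, means, effects = model
    return mu + sum((g - c) * u for g, c, u in zip(genotype, means, effects))


# Hypothetical training population: 6 lines x 3 markers, phenotypes in arbitrary units.
train_genotypes = [[1, 1, 1], [1, -1, 1], [-1, 1, 1],
                   [-1, -1, -1], [1, 1, -1], [-1, 1, -1]]
train_phenotypes = [11.5, 13.5, 7.5, 8.5, 10.5, 6.5]
model = train(train_genotypes, train_phenotypes)

# Rank new candidates from genotype alone, without growing them out.
candidates = {"line_x": [1, -1, -1], "line_y": [-1, 1, 1]}
ranked = sorted(candidates, key=lambda name: predict(model, candidates[name]),
                reverse=True)
```

Only `train` needs phenotypes, which is why the training cycle is the slow, expensive one; `predict` needs nothing but marker calls, so selection can run as fast as genotyping allows.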
Acknowledgements
The Buckler Lab
Collaborators
• C. Tom Hash (ICRISAT-Niamey)
• Boubacar Kountche (ICRISAT-Niamey)
• Som Punnuri (Fort Valley State University)
• Hari Upadhyaya (ICRISAT-Patancheru)
• Rajeev Varshney (ICRISAT-Patancheru)
• Xin Liu (BGI)
• Xuecai Zhang (CIMMYT-Mexico)
• The Institute for Genomic Diversity (Cornell)
• The Maize Diversity Project
• The Pearl Millet Genome Sequencing Consortium
Funding
• National Science Foundation (NSF)
• Plant Genome Research Program
• Basic Research to Enable Agricultural
Development (BREAD)
• The International Crops Research Institute
for the Semi-Arid Tropics (ICRISAT)
• The International Maize and Wheat
Improvement Center (CIMMYT)
• The United States Agency for International
Development (USAID)
• The United States Department of Agriculture
Agricultural Research Service (USDA-ARS)

2015. Jason Wallace. Applying high throughput genomics to crops for the developing world
