A cost-effective approach to diploid genome assembly using sequence graphs
Shilpa Garg
Postdoc
Church lab
Harvard Medical School
Diploid genome assembly
[Figure: example sequences ACGTCT, CCGTATCGT, ACGTAT, TACGTAT, CCGTATGT; panels "Input: Reads" and "Diploid assemblies"]
Applications:
● Personalized medicine
● Human migration, evolutionary selection and population structure
Different approaches
Reference-based haplotyping
[Figure: reference genome, aligned reads, haplotypes]
Drawbacks
● Reference bias
● Does not work for structural variants (SVs)
Tools: WhatsHap, HapCut, HapCol
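Reference-based haplotyping assigns the alleles that aligned reads show at heterozygous sites to two haplotypes. A common formulation is minimum error correction (MEC): choose two complementary haplotypes so that the fewest read alleles must be flipped for every read to agree with one of them. WhatsHap, HapCUT and HapCol address variants of this problem. The exhaustive search below is only a toy sketch for a handful of sites, not how these tools work (they use dynamic programming or graph heuristics to scale), and the read representation and data are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical representation: a read maps heterozygous site index -> observed allele (0 or 1).
Read = Dict[int, int]


def mec_cost(reads: List[Read], hap: Tuple[int, ...]) -> int:
    """MEC score of the haplotype pair (hap, complement of hap)."""
    total = 0
    for read in reads:
        # Alleles to flip if the read is assigned to hap ...
        flips_h1 = sum(1 for site, allele in read.items() if allele != hap[site])
        # ... or to the complementary haplotype (every match with hap becomes a mismatch).
        flips_h2 = len(read) - flips_h1
        total += min(flips_h1, flips_h2)
    return total


def phase_brute_force(reads: List[Read], n_sites: int) -> Tuple[Tuple[int, ...], int]:
    """Try every haplotype (exponential in n_sites; toy sizes only, n_sites >= 1)."""
    best_hap, best_cost = None, None
    # Fixing site 0 to allele 0 skips mirror solutions (hap and its complement score the same).
    for rest in product((0, 1), repeat=n_sites - 1):
        hap = (0,) + rest
        cost = mec_cost(reads, hap)
        if best_cost is None or cost < best_cost:
            best_hap, best_cost = hap, cost
    return best_hap, best_cost


if __name__ == "__main__":
    # Hypothetical reads over four heterozygous sites; the last read is meant
    # to carry one sequencing error at site 2.
    reads = [
        {0: 0, 1: 1},
        {1: 1, 2: 0, 3: 1},
        {0: 1, 1: 0, 2: 1},
        {2: 1, 3: 0},
        {1: 1, 2: 1, 3: 1},
    ]
    hap, cost = phase_brute_force(reads, n_sites=4)
    print("haplotype 1:", hap)
    print("haplotype 2:", tuple(1 - a for a in hap))
    print("MEC score:", cost)
```

On this toy input the search returns the haplotypes (0, 1, 0, 1) and (1, 0, 1, 0) with an MEC score of 1, i.e. a single flipped allele (site 2 of the last read).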
De novo assembly
[Figure: reads r1, r2, r3, r4, r5 → assembly graph (r1 r2 r3 and r4 r5) → assembly]
Drawbacks
● Assume a haploid assembly
● Require high-coverage data
Tools:
Genome assembly: Canu, SPAdes…
Diploid genome assembly: FALCON-Unzip, TrioCanu
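The assembly-graph panel (reads r1 to r5 merged into r1 r2 r3 and r4 r5) can be illustrated with a toy overlap graph: each read is a node, an edge joins two reads when a suffix of one equals a prefix of the other, and non-branching paths are merged into contigs. This is a minimal sketch under strong assumptions: error-free reads from a single haplotype, exact suffix-prefix overlaps, and hypothetical sequences and threshold. Real assemblers use error-tolerant overlaps (Canu) or de Bruijn graphs (SPAdes) and far more graph cleaning.

```python
from typing import Dict, List


def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b, shorter than both reads; 0 if below min_len."""
    for length in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def build_overlap_graph(reads: Dict[str, str], min_len: int) -> Dict[str, Dict[str, int]]:
    """graph[a][b] = overlap length for every read pair a -> b that overlaps."""
    graph: Dict[str, Dict[str, int]] = {name: {} for name in reads}
    for a, seq_a in reads.items():
        for b, seq_b in reads.items():
            if a != b:
                olen = suffix_prefix_overlap(seq_a, seq_b, min_len)
                if olen:
                    graph[a][b] = olen
    return graph


def unitigs(reads: Dict[str, str], graph: Dict[str, Dict[str, int]]) -> List[List[str]]:
    """Maximal non-branching paths (isolated cycles are ignored in this sketch)."""
    preds: Dict[str, List[str]] = {name: [] for name in reads}
    for a, succs in graph.items():
        for b in succs:
            preds[b].append(a)

    def extends_predecessor(v: str) -> bool:
        # v continues its predecessor's path only if the link is unambiguous both ways.
        return len(preds[v]) == 1 and len(graph[preds[v][0]]) == 1

    paths = []
    for start in reads:
        if extends_predecessor(start):
            continue  # start lies inside a path that begins elsewhere
        path, node = [start], start
        while len(graph[node]) == 1:
            (nxt,) = graph[node]
            if not extends_predecessor(nxt):
                break
            path.append(nxt)
            node = nxt
        paths.append(path)
    return paths


def spell(path: List[str], reads: Dict[str, str], graph: Dict[str, Dict[str, int]]) -> str:
    """Concatenate reads along a path, dropping each overlap once."""
    seq = reads[path[0]]
    for a, b in zip(path, path[1:]):
        seq += reads[b][graph[a][b]:]
    return seq


if __name__ == "__main__":
    # Hypothetical error-free reads; min_len=4 is an arbitrary toy threshold.
    reads = {
        "r1": "ACGTTGCA",
        "r2": "TGCAGGTC",
        "r3": "GGTCCATG",
        "r4": "TTTAAACC",
        "r5": "AACCGGAA",
    }
    graph = build_overlap_graph(reads, min_len=4)
    for path in unitigs(reads, graph):
        print(" -> ".join(path), spell(path, reads, graph))
```

With these reads the graph has edges r1 → r2, r2 → r3 and r4 → r5, so the sketch prints two contigs, r1 -> r2 -> r3 (ACGTTGCAGGTCCATG) and r4 -> r5 (TTTAAACCGGAA).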
Why a graph-based approach?
[Figure: sequence graph with variants SNV, SV, SV, SNV]
Phasing bubble chains
[Figure: bubble chain over variants SNV, SV, SV, SNV]
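In a sequence graph, each heterozygous variant appears as a bubble: two alternative branches between shared flanking sequence. For an SNV the branches differ by one base; for an SV they can differ greatly in length. Consecutive bubbles form a bubble chain such as SNV, SV, SV, SNV, and phasing the chain means deciding, for each bubble, which branch belongs to which haplotype. The sketch below assumes a simplified linear chain with exactly two branches per bubble and a phase given as one bit per bubble (a hypothetical representation); it spells out the two haplotype sequences for a given phase but does not infer the phase itself.

```python
from typing import List, Tuple, Union

# Hypothetical representation of a linear bubble chain: a str is sequence shared by
# both haplotypes, a 2-tuple is a bubble whose two branches are the alternative alleles.
Segment = Union[str, Tuple[str, str]]


def spell_haplotypes(chain: List[Segment], phase: List[int]) -> Tuple[str, str]:
    """Return (haplotype 1, haplotype 2) for a bubble chain.

    phase[i] == 0 puts branch 0 of the i-th bubble on haplotype 1,
    phase[i] == 1 puts branch 1 there; haplotype 2 always gets the other branch.
    """
    n_bubbles = sum(1 for seg in chain if isinstance(seg, tuple))
    if len(phase) != n_bubbles:
        raise ValueError("need exactly one phase bit per bubble")
    hap1: List[str] = []
    hap2: List[str] = []
    bubble_idx = 0
    for seg in chain:
        if isinstance(seg, str):
            # Homozygous sequence is shared by both haplotypes.
            hap1.append(seg)
            hap2.append(seg)
        else:
            bit = phase[bubble_idx]
            hap1.append(seg[bit])
            hap2.append(seg[1 - bit])
            bubble_idx += 1
    return "".join(hap1), "".join(hap2)


if __name__ == "__main__":
    chain: List[Segment] = [
        "ACG",
        ("T", "C"),          # SNV bubble
        "GAT",
        ("TTAGGCA", "T"),    # deletion-like SV bubble (toy scale)
        "CCA",
        ("G", "GACTGACTG"),  # insertion-like SV bubble (toy scale)
        "TA",
        ("A", "G"),          # SNV bubble
        "CT",
    ]
    hap1, hap2 = spell_haplotypes(chain, phase=[0, 1, 1, 0])
    print(hap1)
    print(hap2)
```

Flipping every bit of the phase simply swaps the two outputs, so a phasing is only defined up to that swap; what matters is the relative orientation of neighbouring bubbles, which reads spanning several bubbles can constrain.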
Our diploid assembly pipeline
[Figure: pipeline for a single individual and for trios]
Advantages
- Can be applied to genomes of any complexity
- Handles different heterozygosity rates and repeat content
- Phases complex SVs
Results
We require as little as 15x long-read coverage from each individual in a trio.
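To put 15x per individual in perspective: coverage is the number of sequenced bases divided by genome size, so the required yield is coverage × genome size. The sketch below assumes a haploid human genome size of about 3.1 Gbp, an approximation that does not come from the slides.

```python
def required_bases(coverage: float, genome_size_bp: float) -> float:
    """Sequenced bases needed to reach `coverage` over a genome of `genome_size_bp`."""
    return coverage * genome_size_bp


HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size

per_individual = required_bases(15, HUMAN_GENOME_BP)
trio_total = 3 * per_individual  # mother, father and child each at 15x

print(f"per individual: {per_individual / 1e9:.1f} Gbp")
print(f"whole trio:     {trio_total / 1e9:.1f} Gbp")
```

Under that assumption, 15x amounts to roughly 46.5 Gbp of long reads per individual and about 139.5 Gbp for the whole trio.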
Conclusion
- Developed a pipeline for diploid assembly of single individuals and trios
- Demonstrated it on genomes with various heterozygosity rates
Next steps:
- Apply the pipeline to the GIAB trio samples from PGP
- Fill gaps in the PGP-1 genome and in a variety of other samples
- Apply haplotype-aware assembly to diverse human genomes, including clinically relevant regions, to understand complex biology
Open to collaborations, and recruiting people at the Church lab
Acknowledgements
George Church
Tobias Marschall
Richard Durbin
...
Thanks
