© 2015 Invitae Corporation. All Rights Reserved. | CONFIDENTIAL
Analytic Validation and
Performance Monitoring of
Clinical NGS Assays
Germline
A systematic comparison of traditional and multi-gene
panel testing for hereditary breast and ovarian cancer in
more than 1000 patients
Stephen E. Lincoln¹, Yuya Kobayashi¹, Michael J. Anderson¹, Shan Yang¹,
Andrea J. Desmond², Meredith A. Mills³, Geoffrey B. Nilsen¹, Kevin B. Jacobs¹,
Federico A. Monzon¹, Allison W. Kurian³, James M. Ford³, Leif W. Ellisen²,⁴
1. Invitae, San Francisco, CA
2. Massachusetts General Hospital Cancer Center, Boston, MA
3. Stanford University School of Medicine, Stanford, CA
4. Harvard Medical School, Boston, MA
Lincoln et al., J Mol Diag 2015
Companion Clinical Actionability Research Study
Desmond et al., JAMA Oncol. 2015
Swisher, JAMA Oncol. 2015
WCGs' Contribution to the JMD Study
▪ NA12878 and 6 other Well-Characterized Genomes (WCGs) were
used in a 1105-sample study to evaluate a 29-gene hereditary
cancer panel test
▪ The 7 WCGs contributed 310 of 750 comparable variants to both
the sensitivity and specificity analyses
▪ But the 77% coverage of GIAB data was a substantial limitation
– No exonic variants in 5 of 29 panel genes in any of the 7 samples
• Only 1 coding variant each in 2 other genes
• Reasons: (a) the 23% not covered by GIAB and (b) population genetics
– Almost all GIAB variants are SNVs
• Only 6 of 310 were very small deletions (max 4 bp)
• 0 insertions, 0 other variant types
• No GIAB CNV data yet (but we’d expect 0 CNVs in these 29 genes)
– The 77% is biased toward the “easy” subset of the genome
Lincoln et al., J Mol Diag 2015; Lincoln GIAB Spring 2015
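The counts on this slide translate into a few simple proportions. The sketch below only restates the slide's numbers as percentages; the variable names are illustrative, and the SNV count is derived by assuming that every WCG variant that is not one of the 6 deletions is an SNV (the slide reports 0 insertions and 0 other types).

```python
# Counts quoted on the slide (Lincoln et al., J Mol Diag 2015).
wcg_variants = 310         # comparable variants contributed by the 7 WCGs
comparable_variants = 750  # all comparable variants in the study
wcg_deletions = 6          # very small deletions among the WCG variants

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Share of all comparable variants that came from the WCGs.
wcg_share = percent(wcg_variants, comparable_variants)

# Share of WCG variants that are deletions, and (by assumption) SNVs.
deletion_share = percent(wcg_deletions, wcg_variants)
snv_share = percent(wcg_variants - wcg_deletions, wcg_variants)

print(f"WCG share of comparable variants: {wcg_share}%")
print(f"Deletions among WCG variants:     {deletion_share}%")
print(f"SNVs among WCG variants:          {snv_share}%")
```

So the WCGs supply roughly two fifths of the comparable variants, but under 2% of those are anything other than SNVs.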
A Significant Fraction of Pathogenic Variants in Clinical
Cases Are Technically Challenging
Pathogenic and likely pathogenic variants (n=260) among the
clinical cases (n=1062) by variant type:
[Figure: breakdown by variant type; surviving labels: “Small Indel”, “i.e. CNVs”]
Lincoln et al., J Mol Diag 2015
BRCA2 c.9203del126
[Figure: read alignments; labels: split-read signal at 5’ end of deletion, split-read signal at 3’ end of deletion, exon target]
BRCA2 c.156_insAlu
[Figure: read alignments; label: split-read signal of Alu sequence]
MSH2 c.943+3T>C
Homopolymer-A: Alignment and Biochemical Artifacts
[Figure missing; slide note: “Get IGV”]
CDKN2A c.9_32dup24
Insertion of repeat in correctly mapped NGS reads
[Figure: read alignments; labels: split-read signal (twice), Repeat Copy 1, Repeat Copy 2, Translation, 5’ Met]
Idealized Workflow
David Litwack, FDA, Feb 2015
Idealized Workflow
[Workflow diagram]
Steps: DNA Prep → Targeting & Library → Sequencing → Bioinformatics → Interpretation and Reporting
Products: Specimen → DNA → Library → FASTQ → VCF → Report
Annotations: GIAB Sample(s); GIAB Data Comparison
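The "GIAB Data Comparison" step can be sketched as a set comparison between a lab's calls and a truth set, restricted to the regions where the truth set is considered reliable. This is a minimal illustration under stated assumptions, not the tooling used in the study: the `Variant` type, the function names, and the toy coordinates are all hypothetical, and exact matching on (chrom, pos, ref, alt) ignores genotype and alternative representations of the same variant.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str

def in_regions(variant, regions):
    """True if the variant position lies in any (chrom, start, end) region (1-based, inclusive)."""
    return any(chrom == variant.chrom and start <= variant.pos <= end
               for chrom, start, end in regions)

def compare_to_truth(calls, truth, confident_regions):
    """Count matches and mismatches, considering only variants inside the confident regions."""
    calls_in = {v for v in calls if in_regions(v, confident_regions)}
    truth_in = {v for v in truth if in_regions(v, confident_regions)}
    true_pos = calls_in & truth_in
    return {
        "true_positives": len(true_pos),
        "false_negatives": len(truth_in - calls_in),
        "false_positives": len(calls_in - truth_in),
        "sensitivity": len(true_pos) / len(truth_in) if truth_in else None,
    }

# Toy data (hypothetical coordinates, not real variants).
regions = [("13", 100, 200)]
truth = [Variant("13", 120, "A", "G"), Variant("13", 150, "C", "T"),
         Variant("13", 500, "G", "A")]   # position 500 is outside the confident region
calls = [Variant("13", 120, "A", "G"), Variant("13", 180, "T", "C"),
         Variant("13", 500, "G", "A")]

result = compare_to_truth(calls, truth, regions)
print(result)
```

Note that the variant at position 500 is called correctly but contributes nothing to the result, because it falls outside the confident regions.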
Idealized Workflow
Benefits:
1. Easy (in principle; software tools still need improvement)
2. GIAB gives much broader coverage compared to traditional reference samples
3. Virtually unlimited sample supply
[Workflow diagram]
Steps: DNA Prep → Targeting & Library → Sequencing → Bioinformatics → Interpretation and Reporting
Products: Specimen → DNA → Library → FASTQ → VCF → Report
Annotations: GIAB Sample(s); GIAB Data Comparison
Idealized Workflow
[Workflow diagram]
Steps: DNA Prep → Targeting & Library → Sequencing → Bioinformatics → Interpretation and Reporting
Products: Specimen → DNA → Library → FASTQ → VCF → Report
Annotations: GIAB Sample(s); GIAB Data Comparison
Challenges:
1. GIAB data are NOT representative of clinical practice
2. The VCF is NOT the report, and there are substantial differences even in genotypes
3. Does not evaluate specimen collection, storage, transfer, or DNA prep
Callouts on the slide:
– By far our greatest source of problems
– In fact, pre-prepared gDNA is not in spec for our Dx test
Idealized Workflow
[Workflow diagram]
Steps: DNA Prep → Targeting & Library → Sequencing → Bioinformatics → Interpretation and Reporting
Products: Specimen → DNA → Library → FASTQ → VCF → Report
Callout: A complex, multi-step process which can and does change analytic results
Actual Interpretation/Reporting Workflow:
VCF Genotypes Are NOT the Reported Analytical Data
[Workflow diagram: VCF → Report; not a 1-1 process]
Steps: More QC and Filtering; Convert into Transcript Variants; Lab Director Review of Raw Data; Lab Director Interpretation; Orthogonal Confirmation; Re-“spelling” QC
Removed along the way: Confirmation Failures; Common Polymorphisms and Wild-type Calls; Benigns; Known Artifacts
Gap Filling: Variants may be Added
▪ Many steps after the VCF and before the Dx report can and do add, remove, edit, or
change genotypes before reporting
▪ This happens on transcript variants (in HGVS), which are not convertible back to
VCF
▪ Benign variants with no clinical significance do NOT get the same quality control
as reported variants
▪ Many of these steps involve human medical experts, not algorithms
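One well-known reason the same variant can be "spelled" differently is indel placement in repeats: VCF normalization tools commonly left-align indels on the genome, while HGVS places them at the most 3’ position of the transcript. The sketch below illustrates that ambiguity on a toy sequence. It is an assumption about what re-"spelling" can involve, not a description of the pipeline on the slide, and the function names and the sequence are hypothetical.

```python
def apply_deletion(seq, start, length):
    """Return seq with `length` bases removed beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

def shift_deletion(seq, start, length, direction):
    """Slide a deletion as far as possible without changing the resulting sequence.

    direction="left" gives the left-aligned placement; direction="right" gives
    the right-most placement (the 3' end for a plus-strand transcript).
    """
    if direction == "right":
        # Moving right by one swaps seq[start] out of the deleted block and
        # seq[start + length] into it, so the two bases must be identical.
        while start + length < len(seq) and seq[start] == seq[start + length]:
            start += 1
    elif direction == "left":
        while start > 0 and seq[start - 1] == seq[start + length - 1]:
            start -= 1
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return start

# Toy reference with a CA repeat; delete 2 bases starting at index 3.
reference = "GCACACAT"
leftmost = shift_deletion(reference, 3, 2, "left")
rightmost = shift_deletion(reference, 3, 2, "right")

# Every placement between the two extremes yields the same edited sequence.
placements = {apply_deletion(reference, s, 2) for s in range(leftmost, rightmost + 1)}
print(leftmost, rightmost, placements)
```

Five different start positions describe one and the same deletion here, which is why comparing variant lists by position alone can report false differences.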
Challenges in NGS Validation with GIAB
1. 77% completeness of GIAB vs. hg19/build37 is a very
substantial limitation
– The other 23% includes all or part of many commonly
tested genes
– The 77% is biased toward easy stuff
– Reminder: this 23/77 issue is different than the “dark
matter” issue (regions not in the reference genome)
2. The very limited number of more challenging variants in
coding regions of commonly tested genes is an even
more substantial limitation
– Indels, complex sequence changes, CNVs
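The completeness limitation can be checked for a specific panel by asking what fraction of its target bases fall inside the truth set's confident regions. The sketch below does this with a simple interval intersection. It assumes half-open (start, end) intervals on a single chromosome, as in BED files; the function names and coordinates are hypothetical, and a real analysis would repeat this per chromosome.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

def overlap_bases(targets, confident):
    """Number of bases shared between two interval sets (two-pointer sweep)."""
    t, c = merge(targets), merge(confident)
    i = j = total = 0
    while i < len(t) and j < len(c):
        lo = max(t[i][0], c[j][0])
        hi = min(t[i][1], c[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if t[i][1] < c[j][1]:
            i += 1
        else:
            j += 1
    return total

def covered_fraction(targets, confident):
    """Fraction of target bases lying inside the confident regions."""
    size = sum(end - start for start, end in merge(targets))
    return overlap_bases(targets, confident) / size if size else 0.0

# Toy example: two 100-base exon targets, each only half covered.
targets = [(100, 200), (300, 400)]
confident = [(50, 150), (350, 500)]
fraction = covered_fraction(targets, confident)
print(f"{fraction:.0%} of target bases are in confident regions")
```

Running the same calculation per gene would show which genes have little or no usable truth data, which is the situation the slide describes for several panel genes.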
Other Challenges in NGS Validation with GIAB
3. The actual references used in (most) reporting are
transcripts, not the genome sequence
– They can be different in significant ways
– We use our own curated set of transcript sequences
and alignments (Hart et al., Bioinformatics 2014)
4. Sample type (pre-prepared gDNA) is not in spec for our
validated assay
Philosophical Digression
What is the purpose of analytic validation of an NGS
germline assay?
1. Essentially impossible to capture real world variability
and challenges
2. In practice, online QC/QA plays a much more important
role than validation
Aug2015 steve lincoln analytical validation
