Report card for Correlation Mapping on the mouse whole brain (cortex) benchmark

Overview

The accuracy of cell type mapping using the Correlation Mapping algorithm was evaluated against the mouse whole brain (WB) cortical taxonomy.

In summary, Correlation Mapping achieved strong accuracy at the Class, Neighborhood, and Subclass levels of the mouse WB cortical taxonomy (F1-scores of 0.999, 0.938, and 0.937), even though the query data contain sequencing-technology batch effects. Accuracy at the Cluster level was lower (F1-score 0.655).

  • Summary:
    • Inputs X are log(CPM)-normalized expression values of marker genes.
    • The hierarchy was encoded as Class, Neighborhood, Subclass, Cluster.
    • Confidence values were derived via bootstrapping.
  • Runtime: 0.77 hours
  • Version: X.Y.Z
  • Repository: TBD
  • Publication: –
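
The mapping and confidence steps summarized above can be sketched roughly as follows. This is a minimal illustration, not the benchmarked implementation: the function names (`pearson`, `map_sample`), the use of Pearson correlation against per-cluster centroids, the 90% marker-gene subsampling fraction, and the 100 bootstrap iterations are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def map_sample(query, centroids, n_boot=100, gene_fraction=0.9, seed=0):
    """Assign one query sample to its best-correlated reference centroid.

    query:     log(CPM) values of the marker genes for one sample
    centroids: dict mapping label -> mean log(CPM) profile over the same genes
    Returns (label, confidence), where confidence is the fraction of
    bootstrap iterations that voted for the winning label.
    """
    rng = random.Random(seed)
    n_genes = len(query)
    k = max(2, int(gene_fraction * n_genes))
    votes = Counter()
    for _ in range(n_boot):
        # Assumption: each iteration subsamples marker genes without replacement.
        genes = rng.sample(range(n_genes), k)
        q = [query[g] for g in genes]
        best = max(
            centroids,
            key=lambda label: pearson(q, [centroids[label][g] for g in genes]),
        )
        votes[best] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_boot
```

For a hierarchical taxonomy, the same routine could be applied at the Cluster level and the result rolled up to Subclass, Neighborhood, and Class through the taxonomy's parent labels; whether the evaluated tool does this is not stated here.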

Tasks

  • Primary tasks:
    1. Classification of scRNA-seq samples into whole brain clusters.
    2. Determining how well Correlation Mapping classification generalizes to samples from multiple sequencing technologies.
  • Users: AIBS scientists and community mapping tool users.
  • Out of scope: Classification of other modalities (e.g. SMART-seq, Patch-seq, MERFISH), regions (e.g. V1), or species (e.g. primate).

Metrics

  • Accuracy
  • Precision, Recall, F1-score on validation set
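
As a reminder of how these metrics relate, the sketch below computes accuracy and label-wise precision, recall, and F1-score from paired true and predicted labels. The function names and the toy labels in the usage note are illustrative assumptions; the report does not state which library or averaging scheme produced its numbers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def labelwise_scores(y_true, y_pred):
    """Return {label: {"precision", "recall", "f1"}} for every observed label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores
```

With true labels `["A", "A", "B", "B"]` and predictions `["A", "B", "B", "B"]`, label A has precision 1.0 and recall 0.5, label B has precision 2/3 and recall 1.0, and accuracy is 0.75.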

Reference and query evaluation data

  • Reference
    • Mouse whole brain taxonomy single nucleus 10xV3 dataset from aged healthy individuals.
    • Cluster and sequencing technology metadata provided for each reference sample.
  • Query
    • Mouse whole brain taxonomy data from multiple sequencing technologies.
      • SmartSeq_cells_AIBS
      • SmartSeq_nuclei_AIBS
      • 10X_cells_v2_AIBS
      • 10X_nuclei_v2_AIBS
      • 10X_cells_v3_AIBS
      • 10X_nuclei_v3_AIBS
      • 10X_nuclei_v3_Broad

Quantitative analysis

Here we evaluate how well Correlation Mapping predicts labels for high-quality samples from each of the query datasets. Each annotation level can be expanded to reveal additional evaluation metrics.

Annotation      F1-score
Class           0.999
Neighborhood    0.938
Subclass        0.937
Cluster         0.655

The following metrics are reported at each annotation level (Class, Neighborhood, Subclass, Cluster):
    1. Label-wise F1-score
    2. Confidence values for correctly and incorrectly assigned labels
    3. Label-wise recall
    4. Label-wise precision
    5. Confusion matrix (row-normalized)
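
For reference, a row-normalized confusion matrix divides each row of counts (one row per true label) by that row's total, so each row sums to 1 and the diagonal entry equals the label's recall. The sketch below is a generic illustration with a hypothetical function name, not the code used to produce this report.

```python
def row_normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix with rows = true labels, columns = predicted labels.

    Each row is divided by its total so entries are fractions of the true
    label's samples; rows for labels with no samples are left as zeros.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```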

Sequencing technology effect analysis

Here we evaluate how well Correlation Mapping predicts the Subclass and Cluster labels for each sequencing technology.

Query                   Subclass F1-score   Cluster F1-score
10X_cells_v3_AIBS       0.979               0.869
10X_nuclei_v3_AIBS      0.886               0.665
10X_nuclei_v3_Broad     0.975               0.849
10X_cells_v2_AIBS       0.975               0.837
10X_nuclei_v2_AIBS      0.848               0.256
SmartSeq_cells_AIBS     0.944               0.752
SmartSeq_nuclei_AIBS    0.969               0.811
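
A per-technology breakdown like the one above can be obtained by grouping predictions by query dataset before scoring. The sketch below assumes macro-averaged F1 (an unweighted mean over labels); the averaging actually used for this report is not stated, and the function names and record layout are hypothetical.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in truth or prediction."""
    pairs = list(zip(y_true, y_pred))
    f1s = []
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def f1_by_dataset(records):
    """records: iterable of (dataset, true_label, predicted_label) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for dataset, true_label, pred_label in records:
        grouped[dataset][0].append(true_label)
        grouped[dataset][1].append(pred_label)
    return {d: macro_f1(t, p) for d, (t, p) in grouped.items()}
```

Calling `f1_by_dataset` once with Subclass labels and once with Cluster labels would give the two score columns for each query dataset.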

Recommendations and caveats

  • At the Class, Neighborhood, and Subclass levels, for high-quality RNA-seq data, Correlation Mapping makes more errors than HANN but still performs strongly.
  • Correlation Mapping robustly classifies samples from multiple sequencing technologies, which introduce changes in gene expression. The exception is the 10X_nuclei_v2_AIBS protocol, where it fails to correctly label samples at the Cluster level (F1-score 0.256) and has its lowest Subclass F1-score (0.848).