pipeline_cell_qc.py

Overview

This pipeline performs the following steps:

  • Calculates per-cell QC metrics: ngenes, total_UMI, pct_mitochondrial, pct_ribosomal, pct_immunoglobin, pct_hemoglobin, and any specified geneset percentage

  • Runs scrublet to calculate a per-cell doublet score
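
To make the per-cell metrics above concrete, here is a minimal sketch of how ngenes, total_UMI and a geneset percentage could be derived from one cell's counts. It is not the pipeline's implementation (the pipeline computes these in R/calculate_qc_metrics.R). The function name cell_qc_metrics, the toy counts, and the rule of selecting a geneset by gene-symbol prefix (such as "MT-") are assumptions made for illustration only.

```python
def cell_qc_metrics(counts, geneset_prefixes):
    """Return ngenes, total_UMI and one pct_<name> value per geneset for one cell.

    counts: dict mapping gene symbol -> UMI count for a single cell.
    geneset_prefixes: dict mapping geneset name -> gene-symbol prefix
    (selecting genes by prefix is an illustrative assumption).
    """
    total_umi = sum(counts.values())
    metrics = {
        # A gene counts as detected when it has at least one UMI.
        "ngenes": sum(1 for n in counts.values() if n > 0),
        "total_UMI": total_umi,
    }
    for name, prefix in geneset_prefixes.items():
        geneset_umi = sum(n for gene, n in counts.items() if gene.startswith(prefix))
        # Guard against empty cells to avoid division by zero.
        metrics["pct_" + name] = 100.0 * geneset_umi / total_umi if total_umi else 0.0
    return metrics


# Toy cell with made-up counts.
cell = {"MT-CO1": 5, "MT-ND1": 5, "RPS3": 10, "CD3E": 30, "HBB": 0}
qc = cell_qc_metrics(cell, {"mitochondrial": "MT-", "ribosomal": "RPS"})
print(qc)
```

Percentages are expressed relative to the cell's total UMI count, so a cell with no UMIs is reported as 0.0 rather than raising an error.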

Configuration

The pipeline requires a configured pipeline.yml file. Default configuration files can be generated by executing:

python <srcdir>/pipeline_cell_qc.py config

Input files

A tsv file called ‘libraries.tsv’ is required. It must have column names and must not include row names. Add one row per input channel/library to be analysed. The file must have the following columns:

  • library_id - name used throughout. This could be the channel_pool id, e.g. A1

  • path - path to the filtered_matrix folder from cellranger count
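
As an illustration of the layout described above, the following sketch writes a two-library table with the required column names and no row names. The library ids and paths are placeholders, and write_libraries_tsv is a hypothetical helper, not part of the pipeline.

```python
import csv
import io

# Placeholder libraries; replace the paths with real filtered_matrix folders.
libraries = [
    {"library_id": "A1", "path": "/path/to/A1/filtered_matrix"},
    {"library_id": "A2", "path": "/path/to/A2/filtered_matrix"},
]


def write_libraries_tsv(rows, handle):
    """Write rows as tab-separated text with a header line and no row names."""
    writer = csv.DictWriter(
        handle,
        fieldnames=["library_id", "path"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here so the result can be inspected.
buffer = io.StringIO()
write_libraries_tsv(libraries, buffer)
libraries_tsv = buffer.getvalue()
print(libraries_tsv)
```

To create the file on disk, pass a handle opened with open("libraries.tsv", "w", newline="") instead of the in-memory buffer.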

Dependencies

This pipeline requires:

  • cgat-core: https://github.com/cgat-developers/cgat-core

  • R dependencies required in the R scripts

Pipeline output

The pipeline returns:

  • qcmetrics.dir folder with per-input qcmetrics.tsv.gz table

  • scrublet.dir folder with per-input scrublet.tsv.gz table
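
The output tables are gzip-compressed tsv files, so they can be read with the standard library alone. The sketch below writes a toy table and reads it back. The file name A1_qcmetrics.tsv.gz, the column names and the values are assumptions for illustration, since the exact columns are not listed here, and read_tsv_gz is a hypothetical helper.

```python
import csv
import gzip
import os
import tempfile


def read_tsv_gz(path):
    """Read a gzip-compressed tsv file into a list of dicts (values stay strings)."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Create a toy table standing in for a per-library output (made-up content).
tmpdir = tempfile.mkdtemp()
toy_path = os.path.join(tmpdir, "A1_qcmetrics.tsv.gz")
with gzip.open(toy_path, "wt", newline="") as handle:
    handle.write("barcode\tngenes\ttotal_UMI\n")
    handle.write("AAAC-1\t1200\t3500\n")
    handle.write("AAAG-1\t800\t2100\n")

rows = read_tsv_gz(toy_path)
for row in rows:
    print(row["barcode"], int(row["ngenes"]), int(row["total_UMI"]))
```

csv.DictReader returns every field as a string, so numeric columns need an explicit conversion before filtering or plotting.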

Code

cellhub.pipeline_cell_qc.qcmetrics(infile, outfile)

This task runs R/calculate_qc_metrics.R. It uses input_libraries.tsv to read the path to the cellranger directory for each input.

Output: creates a cell.qc.dir folder and a library_qcmetrics.tsv.gz table per library/channel.

For additional input files, check the calculate_qc_metrics sections of pipeline.yml. These optional inputs are used to:

  • Calculate the percentage of UMIs for the genesets provided

  • Label barcodes as True/False according to whether or not they appear in each of the barcode lists provided
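
The True/False barcode labelling mentioned above amounts to a set-membership test per barcode list. The following is a sketch of that idea only, not the code in R/calculate_qc_metrics.R. The function name label_barcodes, the list names and the barcodes are hypothetical.

```python
def label_barcodes(barcodes, barcode_lists):
    """Return, for each barcode, one True/False flag per named barcode list."""
    # Convert each list to a set once so every membership test is O(1).
    label_sets = {name: set(members) for name, members in barcode_lists.items()}
    return {
        barcode: {name: barcode in members for name, members in label_sets.items()}
        for barcode in barcodes
    }


# Made-up barcodes and list names.
labels = label_barcodes(
    ["AAAC-1", "AAAG-1", "AAAT-1"],
    {
        "in_keep_list": ["AAAC-1", "AAAT-1"],
        "in_flagged_list": ["AAAG-1"],
    },
)
for barcode, flags in labels.items():
    print(barcode, flags)
```

Each provided list yields its own True/False column, so a barcode can be flagged by several lists at once or by none.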

cellhub.pipeline_cell_qc.qcmetricsAPI(infiles, outfile)

Add the QC metrics results to the API

cellhub.pipeline_cell_qc.scrublet(infile, outfile)

This task runs python/run_scrublet.py. It uses input_libraries.tsv to read the path to the cellranger directory for each input.

Output: creates a scrublet.dir folder and a library_scrublet.tsv.gz table per library/channel. It also creates a doublet score histogram and a doublet score UMAP for each library/channel.

Check the scrublet section in pipeline.yml to specify other parameters.

cellhub.pipeline_cell_qc.scrubletAPI(infiles, outfile)

Add the scrublet results to the API

cellhub.pipeline_cell_qc.plot(infile, outfile)

Draw the pipeline flowchart

cellhub.pipeline_cell_qc.full()

Run the full pipeline.