MSc student for Next Generation Sequencing Single Cell software development at QIAGEN

Posted 1 week ago

Single Cell Analysis in CLC Genomics Workbench

Today, single cell analysis requires skilled bioinformaticians. At the beginning of 2021, QIAGEN released the CLC Single Cell Analysis Module, lowering the barrier to entering the field.

  • https://digitalinsights.qiagen.com/plugins/clc-single-cell-analysis-module/
  • https://resources.qiagenbioinformatics.com/manuals/clcsinglecellanalysis/current/index.php?manual=Introduction.html

The module includes user-friendly tools for QC, normalization, batch correction, dimensionality reduction, clustering, differential expression, and cell type prediction. Visualizations facilitate the analysis and empower the biologist, and the module speeds up both the computationally heavy and the labour-intensive steps compared to open-source solutions, which require scripting skills in R and/or Python.

Two types of projects are available:

  1. Adding new functionality
  2. Benchmarking existing functionality

Adding New Functionality

A project in this category involves creating a Java re-implementation of a permissively licensed single cell algorithm that is not currently part of the module.

The re-implementation should be compared against the original algorithm to ensure consistency; improvements can then be suggested and implemented.
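One way to check consistency is to run the original and the re-implementation on the same dataset and compare their outputs. For clustering results, label ids usually differ between implementations, so a permutation-invariant measure such as the Adjusted Rand Index (ARI) is a common choice. The Java sketch below assumes both tools export one integer cluster label per cell, in the same cell order; the class name is hypothetical and not part of the CLC module.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusteringConsistency {

    // Number of unordered pairs that can be drawn from n items, i.e. n choose 2.
    private static double pairs(long n) {
        return n * (n - 1) / 2.0;
    }

    /**
     * Adjusted Rand Index between two clusterings of the same cells (same order).
     * Returns 1.0 when both group the cells identically, even with different label ids,
     * and values near 0.0 when agreement is no better than chance.
     */
    public static double adjustedRandIndex(int[] a, int[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("Need two labelings of the same cells (at least 2)");
        }
        Map<Long, Long> joint = new HashMap<>();     // contingency table: (label in a, label in b) -> count
        Map<Integer, Long> sizesA = new HashMap<>(); // cluster sizes in a
        Map<Integer, Long> sizesB = new HashMap<>(); // cluster sizes in b
        for (int i = 0; i < a.length; i++) {
            long key = ((long) a[i] << 32) | (b[i] & 0xffffffffL); // pack the label pair into one key
            joint.merge(key, 1L, Long::sum);
            sizesA.merge(a[i], 1L, Long::sum);
            sizesB.merge(b[i], 1L, Long::sum);
        }
        double index = 0, sumA = 0, sumB = 0;
        for (long c : joint.values()) index += pairs(c);
        for (long c : sizesA.values()) sumA += pairs(c);
        for (long c : sizesB.values()) sumB += pairs(c);
        double expected = sumA * sumB / pairs(a.length);
        double max = 0.5 * (sumA + sumB);
        if (max == expected) {
            // Only happens when both labelings are all one cluster or all singletons, i.e. identical.
            return 1.0;
        }
        return (index - expected) / (max - expected);
    }

    public static void main(String[] args) {
        int[] original = {0, 0, 0, 1, 1, 1, 2, 2};
        int[] reimplementation = {5, 5, 5, 7, 7, 7, 9, 9}; // same grouping, different label ids
        System.out.println(adjustedRandIndex(original, reimplementation)); // prints 1.0
    }
}
```

What ARI threshold counts as "consistent" would have to be agreed per dataset, since stochastic steps in either implementation can cause small differences.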

The project offers the opportunity to become deeply familiar with a recently published algorithm at the leading edge of bioinformatics. A successful re-implementation would be considered for inclusion in the CLC Single Cell Analysis Module for use across academia and industry.

Possible algorithms include (but are not limited to):

  • Trajectory analysis: https://github.com/theislab/cellrank
  • Data integration: https://github.com/brianhie/scanorama
  • Cell cycle inference: https://github.com/rfechtner/pypairs

Suitable background: computer science, bioinformatics, or a related field

Start date: anytime

Benchmarking Existing Functionality

To ensure the quality of the CLC Single Cell Analysis Module, the implemented tools have been compared against community-standard open-source pipelines. This has been done both quantitatively, using open-source benchmarks, and qualitatively, by reproducing published results on publicly available datasets with the module. More benchmarks are needed, and two projects are envisaged, as detailed below.

Qualitative Benchmark

In this project, you will reproduce the results of a paper of your choice using the CLC Single Cell Analysis Module.

The project allows you to gain experience with real-world single cell data analysis. You will gain a deep understanding of single cell bioinformatics pipelines. You will be in constant dialog with the developers, and your experiences will be used to improve future releases of the software.

Suitable background: biology, bioinformatics, or a related field

Start date: anytime

Cell Type Prediction Benchmark

One of the key features of the CLC Single Cell Analysis Module is the ability to predict cell types:

  • https://resources.qiagenbioinformatics.com/manuals/clcsinglecellanalysis/current/index.php?manual=Predict_Cell_Types.html

During development, the chosen algorithm was benchmarked against the top performers, following this 2019 article:

  • https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1795-z
  • https://resources.qiagenbioinformatics.com/manuals/clcsinglecellanalysis/current/index.php?manual=SVMs_cell_type_classification.html

The benchmark consisted of first training a classifier on a provided training data set and then predicting the cell types in a separate test data set.
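To illustrate the scoring side of such a train-then-predict benchmark, the predictions on the test set can be compared with its known labels cell type by cell type. The Java sketch below computes an F1 score per cell type and summarizes them with the median, which is one common choice for this kind of comparison; it assumes labels are plain strings in matching cell order. The class and method names are hypothetical and this is not the module's own evaluation code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class CellTypeBenchmark {

    /** Per-cell-type F1 score of predicted labels against the known labels of a test set. */
    public static Map<String, Double> f1PerCellType(String[] truth, String[] predicted) {
        if (truth.length != predicted.length) {
            throw new IllegalArgumentException("truth and predicted must cover the same cells");
        }
        Map<String, Double> f1 = new TreeMap<>();
        for (String type : new TreeSet<>(List.of(truth))) {
            int tp = 0, fp = 0, fn = 0;
            for (int i = 0; i < truth.length; i++) {
                boolean actual = truth[i].equals(type);
                boolean called = predicted[i].equals(type);
                if (actual && called) tp++;
                else if (called) fp++; // predicted as this type, but it is something else
                else if (actual) fn++; // this type, but predicted as something else
            }
            // F1 = 2TP / (2TP + FP + FN); the denominator is > 0 because the type occurs in truth.
            f1.put(type, 2.0 * tp / (2.0 * tp + fp + fn));
        }
        return f1;
    }

    /** Median of a non-empty collection of scores. */
    public static double median(Collection<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1 ? sorted.get(mid) : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }

    public static void main(String[] args) {
        // Toy labels for illustration only.
        String[] truth = {"T cell", "T cell", "B cell", "B cell", "NK cell"};
        String[] predicted = {"T cell", "B cell", "B cell", "B cell", "NK cell"};
        Map<String, Double> scores = f1PerCellType(truth, predicted);
        System.out.println(scores);
        System.out.println("median F1 = " + median(scores.values()));
    }
}
```

Reporting per-cell-type scores alongside the median makes it visible when a classifier does well overall but struggles with a specific, possibly rare, population.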

To let users predict cell types in their data without first training a classifier, the module is released together with two pre-trained classifiers, one for human and one for mouse:

  • https://resources.qiagenbioinformatics.com/manuals/clcsinglecellanalysis/current/index.php?manual=_Reference_Data_Manager.html#sec:biomedicalreferencedatamanager

These classifiers will be continuously improved by adding new training data sets and new possible cell types. During this process, the performance of these classifiers needs to be evaluated to ensure it does not degrade.
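Checking that performance does not degrade can be framed as a simple regression gate: score the current and the re-trained classifier on the same fixed, labeled test data and flag every cell type whose score drops by more than a chosen tolerance. The Java sketch below assumes per-cell-type scores (for example F1) have already been computed for both classifiers; the class name, the tolerance and the example scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ClassifierRegressionCheck {

    /**
     * Cell types whose benchmark score dropped by more than `tolerance` from the baseline
     * classifier to the re-trained candidate. A cell type missing from the candidate's
     * results counts as a regression; cell types that are new in the candidate have no
     * baseline and are not checked here.
     */
    public static List<String> regressions(Map<String, Double> baseline,
                                           Map<String, Double> candidate,
                                           double tolerance) {
        List<String> degraded = new ArrayList<>();
        // TreeMap gives a deterministic, alphabetical report order.
        for (Map.Entry<String, Double> entry : new TreeMap<>(baseline).entrySet()) {
            Double after = candidate.get(entry.getKey());
            if (after == null || entry.getValue() - after > tolerance) {
                degraded.add(entry.getKey());
            }
        }
        return degraded;
    }

    public static void main(String[] args) {
        // Made-up per-cell-type scores, for illustration only.
        Map<String, Double> baseline = Map.of("B cell", 0.90, "T cell", 0.85, "NK cell", 0.80);
        Map<String, Double> candidate = Map.of("B cell", 0.91, "T cell", 0.70, "NK cell", 0.79, "Platelet", 0.60);
        List<String> degraded = regressions(baseline, candidate, 0.02);
        System.out.println(degraded.isEmpty() ? "no regressions" : "regressed: " + degraded);
    }
}
```

Comparing each cell type separately, rather than a single overall score, keeps a gain on common cell types from hiding a loss on a rare one.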

This project consists of two steps:

  • Establish a benchmark for the existing classifiers. Unlike the 2019 benchmark, this should not require training a classifier.
  • Investigate how best to re-train a classifier with new data sets. Three re-training strategies are already available, and you may devise more:
    • https://resources.qiagenbioinformatics.com/manuals/clcsinglecellanalysis/current/index.php?manual=Train_Cell_Type_Classifier.html

The project allows you to gain experience with real-world single cell data analysis and to become familiar with the problem of cell type prediction, a crucial part of many single cell data analyses. You will gain a deep understanding of single cell bioinformatics pipelines. You will be in constant dialog with the developers, and your experiences will be used to improve future releases of the software.

Suitable background: biology, bioinformatics, or a related field

Start date: as early as summer 2021

Attention: You often need pre-approval from your university or study counselor to ensure that projects or theses found on CBS CareerGate will be accepted as part of your education. Please contact the relevant office in good time to make sure you pick the right project.