Developing computational methods for systems biology of cancer

Achieving our team's goals is possible only with up-to-date computational methodology: methods for the analysis and use of networks, integrative data analysis methods, and statistical and machine learning methods. To remain competitive in the field, our group maintains a level of expertise that allows us not only to assess existing methods and select the most suitable ones, but also to adapt them to ongoing projects and, whenever necessary, to propose novel methodological developments. The group packages the developed methodology so that it can be used by the rest of the computational biology community.
The omics data produced by modern biotechnologies are characterized by a level of complexity that has not been faced before. In particular, single-cell data reflect, at a much finer-grained level than before, the genetic and epigenetic heterogeneity of tumors, which is shaped by the action of transcriptional programs controlling cell fate decisions within the cell and is affected by the environment. Visualizing and understanding this heterogeneity requires a new generation of computational methods able to capture strong non-linearities and non-Gaussian multidimensional distributions in the space of cell omics profiles. For this reason, a number of new methods (t-SNE, topological data analysis) and concepts (pseudo-time, branching cell trajectories, etc.) have emerged in the field. We are developing computational methods for advanced dimension reduction of omics data, based on the construction of principal graphs or on the application of omics deconvolution methods such as Independent Component Analysis (ICA).
Our group has developed a number of advanced methods for NGS data analysis that help to better interpret sequencing data in cancer biology. Control-FREEC is a continuation of the successful FREEC pipeline for assessing copy number profiles, extended to include the detection of LOH profiles from sequencing data (Boeva et al., Bioinformatics, 2012a). The Nebula web server, based on the Galaxy open-source platform, was developed for user-friendly analysis of ChIP-seq data, including de novo discovery of sequence motifs (Boeva et al., Bioinformatics, 2012b). The SV-Bay tool was developed for the analysis of paired-end data in order to detect structural variants in the genome while taking copy number changes into account (Iakovishina et al., Bioinformatics, 2016). The HMCan and HMCan-diff tools were developed to quantify chromatin modifications in cancer while taking copy number changes into account (Ashoor et al., Bioinformatics, 2013; Ashoor et al., Nucleic Acids Res., 2017).
We are developing advanced tools for the analysis of biological networks together with omics data. Several Cytoscape plugins have been developed in the past (BiNoM, OCSANA, DeDaL). Based on the NaviCell technology, we have developed user-friendly Google Maps-based interfaces for visualizing omics data on top of large and complex biological networks. The NaviCom web portal connects the Atlas of Cancer Signaling Network with cBioPortal, the major source of high-throughput data in cancer biology.
Finally, the group has invested substantial effort in the methodology and applications of discrete modeling of biological networks.
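To give a flavor of what discrete modeling of a biological network can look like, here is a minimal sketch of a Boolean network with synchronous updates, written in plain Python. The three-node network, its logical rules and all function names are hypothetical and chosen only for illustration; they are not taken from the group's models or software, and synchronous updating is only one of several possible update schemes.

```python
from itertools import product

# Hypothetical three-node toy network, for illustration only.
# Each rule maps the current state (a dict of booleans) to the node's next value.
RULES = {
    "DNA_damage": lambda s: s["DNA_damage"],             # external input, held constant
    "p53": lambda s: s["DNA_damage"] and not s["MDM2"],  # activated by damage, inhibited by MDM2
    "MDM2": lambda s: s["p53"],                          # induced by p53
}
NODES = sorted(RULES)  # fixed node order used to encode states as tuples


def step(state):
    """Synchronous update: every node is recomputed from the same current state."""
    return {node: bool(RULES[node](state)) for node in NODES}


def attractor(state):
    """Iterate from `state` until a state repeats; return the cycle that was reached."""
    seen = []
    key = tuple(state[n] for n in NODES)
    while key not in seen:
        seen.append(key)
        state = step(state)
        key = tuple(state[n] for n in NODES)
    return seen[seen.index(key):]


def all_attractors():
    """Enumerate all initial states and collect the distinct attractors."""
    found = set()
    for values in product([False, True], repeat=len(NODES)):
        cycle = attractor(dict(zip(NODES, values)))
        # Canonical form: rotate the cycle so that it starts at its smallest state.
        i = cycle.index(min(cycle))
        found.add(tuple(cycle[i:] + cycle[:i]))
    return found


if __name__ == "__main__":
    for cyc in sorted(all_attractors(), key=len):
        kind = "fixed point" if len(cyc) == 1 else f"cycle of length {len(cyc)}"
        print(kind, [dict(zip(NODES, s)) for s in cyc])
```

In this toy example, exhaustive enumeration of the eight possible states is enough to find every attractor. For networks of realistic size the state space grows exponentially, so exhaustive enumeration quickly becomes impractical and other strategies are needed.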

Highlights