Multi-level analysis of molecular mechanisms underlying cancer development

At the genome level:

Modelling the effect of condensin molecules on chromosome condensation

Little is known about the structural details of mitotic chromosomes, and it remains unclear how an extended chromatin chain condenses into a short, compact chromosome during mitosis. This is because current microscopy techniques cannot resolve enough detail of chromosome structure (see Figure A, where only limited structural detail of a chromosome can be observed). To gain more insight into the structural and dynamical aspects of chromosome condensation, I constructed a molecular dynamics simulation model to study the movement of a chromatin chain in the nucleus following Brownian dynamics trajectories (Figure B).
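As a rough illustration of what a Brownian dynamics trajectory of a chain involves, the sketch below propagates a simple bead-spring polymer with an overdamped Langevin update. It is not the model used in the study: the bead-spring representation, the harmonic bonds, all parameter values (k_bond, r0, dt, diffusion, kT) and the function names are assumptions chosen only to keep the example small.

```python
import numpy as np

def bond_forces(pos, k_bond=100.0, r0=1.0):
    """Harmonic spring forces between consecutive beads of a linear chain."""
    bonds = pos[1:] - pos[:-1]                      # vector from bead i to bead i+1
    lengths = np.linalg.norm(bonds, axis=1, keepdims=True)
    f = k_bond * (lengths - r0) * bonds / lengths   # pulls beads together when stretched
    forces = np.zeros_like(pos)
    forces[:-1] += f                                # force on bead i
    forces[1:] -= f                                 # equal and opposite force on bead i+1
    return forces

def brownian_step(pos, rng, dt=1e-4, diffusion=1.0, kT=1.0):
    """One overdamped Langevin step: deterministic drift plus random displacement."""
    drift = (diffusion / kT) * bond_forces(pos) * dt
    noise = np.sqrt(2.0 * diffusion * dt) * rng.standard_normal(pos.shape)
    return pos + drift + noise

def simulate_chain(n_beads=50, n_steps=2000, seed=0):
    """Start from a straight chain with unit spacing and let it diffuse."""
    rng = np.random.default_rng(seed)
    pos = np.zeros((n_beads, 3))
    pos[:, 0] = np.arange(n_beads)
    for _ in range(n_steps):
        pos = brownian_step(pos, rng)
    return pos
```

A realistic chromatin model would add at least excluded volume, bending stiffness and nuclear confinement, none of which are included here.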

Using chromosome 5 in the budding yeast (Saccharomyces cerevisiae) as a target system, my research shows that pair-wise stochastic interaction between condensin molecules, rather than higher-order clustering, is more advantageous for representing in vivo chromosome conformations and for maintaining the individualization of chromosomes during condensation (Figure C). Furthermore, the dynamic web-like structures favoured by the pair-wise stochastic interactions between condensins help to produce locally compact structures resembling topologically associating domains (TADs) (eLife (2015) 4:e05565).
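One way to picture a pair-wise stochastic interaction rule is sketched below: each condensin site holds at most one partner at a time, free sites within a capture radius bind with some probability, and bound pairs dissociate with some probability. This is only an illustration of the pair-wise constraint; the rule itself, the parameters capture_radius, p_on and p_off, and the function name update_pairs are hypothetical and are not taken from the published model.

```python
import numpy as np

def update_pairs(pos, partner, rng, capture_radius=1.5, p_on=0.5, p_off=0.1):
    """Update condensin pairing for one time step.

    pos     : (n, 3) array of condensin site positions
    partner : int array of length n; partner[i] is the bound site, or -1 if free
    """
    partner = np.array(partner, dtype=int)   # work on a copy
    n = len(pos)

    # 1) Existing pairs dissociate stochastically (each pair visited once).
    for i in range(n):
        j = partner[i]
        if j > i and rng.random() < p_off:
            partner[i] = partner[j] = -1

    # 2) Free sites may bind one free neighbour within the capture radius.
    for i in rng.permutation(n):
        if partner[i] != -1:
            continue
        dist = np.linalg.norm(pos - pos[i], axis=1)
        candidates = [j for j in range(n)
                      if j != i and partner[j] == -1 and dist[j] <= capture_radius]
        if candidates and rng.random() < p_on:
            j = int(rng.choice(candidates))
            partner[i], partner[j] = j, i
    return partner
```

Because a bound site is never offered as a candidate, three nearby sites can form only one pair and leave one site free, which is what distinguishes this rule from higher-order clustering.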

At the cell level:

A structural systems biology approach for quantifying the phenotypic impact of missense mutations

Annotating the phenotypic outcomes of individual missense mutations is an important but challenging task for understanding the molecular mechanisms of multigenic diseases such as cancers. Information at both the protein and pathway levels can be used to increase the accuracy of interpreting the phenotypic effects of a target mutation; nevertheless, few methodologies are available for this purpose. To tackle this issue, I have explored and validated a multi-level approach, PEPP (Phenotype Extrapolation via Pathway and Protein information), to effectively integrate the structural factors of proteins into pathway dynamics.

So far PEPP has been validated using the G2 to mitosis transition mechanism in the cell cycle of fission yeast (Schizosaccharomyces pombe) as a model. In this specific case, PEPP is able to quantify the systemic impact of individual missense mutations in a way that correlates with the size of in vivo yeast cells at mitosis. This work was awarded an RCSB PDB Poster Prize at ISMB in 2011, and was also presented as one of the Late Breaking Research presentations at the ISMB Conference 2012. More details of the PEPP methodology are available here.

At the pathway or network level:

A multivariate statistical model to identify analyte clusters that serve as useful biomarkers

Most conventional methods for biomarker studies focus on individual analytes that are differentially expressed between patients and normal controls. An obvious disadvantage of this type of approach is its weakness in capturing collective relationships between multiple analytes.

To tackle this problem, I introduced a new statistical model, targeted analyte cluster (TAC, PubMed link), which considers the patterned behaviours of a small set of analytes. TAC can be used to analyse gene or protein expression data and hence can be applied to a broad range of research. So far TAC has facilitated studies in bipolar disorder and schizophrenia by bringing new insights into their disease mechanisms (Neuropsychopharmacology 37: 364-377; Mol Psychiatry 16:848-859).


At the protein level:

Analysing structural effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on proteins

nsSNPs are single nucleotide variations in the human genome that cause amino acid changes in proteins. They can affect an individual's susceptibility to cancers and response to drugs, so a good estimate of their tendency to cause cancers is an essential step towards personalised medicine.

To estimate the likelihood that an nsSNP is associated with cancers, Bongo (PubMed link) applies graph theory to project the disease susceptibility of a specific nsSNP by evaluating its impact on the amino acid network within the host protein. nsSNPs that significantly destabilise the internal networks of their host proteins are considered potential mutations contributing to cancer development. (A new webserver of Bongo is currently under construction.)
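The general idea of judging a variant by its effect on a residue network can be sketched as follows. This is not Bongo's actual algorithm or scoring: the contact graph built from one coordinate per residue, the 8 Å cutoff, the fraction-of-changed-edges measure and the function names are all assumptions made for illustration, and the mutant coordinates are assumed to come from some separately modelled structure.

```python
import numpy as np

def contact_edges(coords, cutoff=8.0):
    """Return the set of residue pairs (i, j), i < j, closer than the cutoff (in Å)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i_idx, j_idx = np.where(np.triu(dist <= cutoff, k=1))   # upper triangle only
    return {(int(i), int(j)) for i, j in zip(i_idx, j_idx)}

def network_change(wild_coords, mutant_coords, cutoff=8.0):
    """Fraction of contacts gained or lost between wild-type and mutant networks."""
    wt = contact_edges(wild_coords, cutoff)
    mt = contact_edges(mutant_coords, cutoff)
    changed = wt ^ mt                                       # edges present in only one graph
    return len(changed) / max(len(wt | mt), 1)
```

In this toy measure, a value of 0 means the contact network is unchanged, and larger values indicate a more strongly rewired network.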

Estimating structural effects of nsSNPs on protein complexes through protein-protein docking

In addition to their structural impact on individual proteins, nsSNPs can also affect interactions between proteins. To identify nsSNPs that are likely to affect protein interactions, I worked with Dr Juan Fernandez-Recio on a rigid-body protein-protein docking program, pyDock (PubMed link). pyDock models the structure of protein complexes and thus helps to identify nsSNPs at protein interfaces. It uses either FTDOCK or ZDOCK to generate the complex conformations and has an optimised scoring function for selecting the best solutions. The performance of pyDock is similar, if not superior, to that of contemporary rigid-body approaches.
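Once a complex structure has been modelled, locating nsSNPs at the interface reduces to a distance check between the two partners. The sketch below shows that step only; it does not use or reproduce pyDock itself, and the single-coordinate-per-residue representation, the 10 Å cutoff and the function names are assumptions for illustration.

```python
import numpy as np

def interface_residues(coords_a, coords_b, cutoff=10.0):
    """Indices of residues in protein A lying within the cutoff (Å) of any residue in B."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)   # all A-B distances
    return set(np.where(dist.min(axis=1) <= cutoff)[0].tolist())

def nssnps_at_interface(snp_positions, coords_a, coords_b, cutoff=10.0):
    """Subset of nsSNP positions (indices into protein A) that fall at the interface."""
    iface = interface_residues(coords_a, coords_b, cutoff)
    return sorted(p for p in snp_positions if p in iface)
```

For example, with protein A residues at 0 Å and 30 Å along one axis and a single protein B residue at 5 Å, only the first residue of A is reported as an interface position.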

Because approximately two thirds of proteins in prokaryotes and four fifths of proteins in eukaryotes are multi-domain proteins, I also developed pyDockTET (PubMed link), a distance-restrained docking method for predicting the structural assembly of two-domain proteins. pyDockTET is not only helpful for identifying nsSNPs located at domain-domain interfaces but is also one of the few methods available for predicting the structural assembly of two-domain proteins.
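A distance restraint can be combined with a docking score in many ways; the sketch below shows one simple possibility, in which poses whose domain termini are further apart than a maximum distance receive a linear penalty. It is not pyDockTET's scoring function: the Pose class, the lower-is-better energy convention, the penalty form, the weight and the use of roughly 3.8 Å per linker residue (the approximate spacing of consecutive Cα atoms) as an upper bound are all assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    name: str
    energy: float    # docking score; lower is better (assumed convention)
    c_term: tuple    # (x, y, z) of the C-terminal residue of domain 1, in Å
    n_term: tuple    # (x, y, z) of the N-terminal residue of domain 2, in Å

def restrained_score(pose, max_dist, weight=1.0):
    """Docking energy plus a linear penalty for exceeding the allowed distance."""
    d = math.dist(pose.c_term, pose.n_term)
    violation = max(0.0, d - max_dist)
    return pose.energy + weight * violation

def rank_poses(poses, linker_residues, per_residue=3.8, weight=1.0):
    """Sort poses by restrained score; the linker length sets the distance limit."""
    max_dist = linker_residues * per_residue
    return sorted(poses, key=lambda p: restrained_score(p, max_dist, weight))
```

With a five-residue linker the limit is 19 Å, so a pose with a better raw energy but termini 50 Å apart is ranked below a slightly worse pose whose termini are 10 Å apart.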