
Biophysics and Bioinformatics of Nucleic Acids

Dr Julian Huppert

Research Councils UK Academic Fellow
in Computational Biology
Fellow, Clare College

Telephone: 01223 337256
E-mail: jlh29 [at] cam.ac.uk
Note: As MP for Cambridge, I am on long-term leave from my post in the University.

Nucleic acids are capable of forming a wide variety of different structures, far removed from the Watson-Crick double helix, itself a Cavendish discovery. Many of the alternative structures that can be formed have physiological functions, such as controlling gene expression via gene transcription or translation. Our goal is to use biophysical methods to identify these structures, and enable prediction of their structure and stability, and then to combine this work with bioinformatic analyses in order to identify and understand their functions. Ultimately this work may lead to novel insights into natural regulatory processes, as well as new targets for pharmaceutical development.

G-quadruplex nucleic acids

Background

G-quadruplex schematics

Schematic outline of a G-quadruplex, showing tetrads (left) stacking to form an intramolecular structure (right)

G-rich regions of DNA or RNA of an appropriate sequence can fold up into four-stranded G-quadruplex structures, held together by hydrogen-bonded squares of guanine, as shown schematically in the diagram. These structures have been shown to form in telomeres, and are an active target for anti-telomerase therapies. We have worked on DNA and RNA sequences in the rest of the genome that can form G-quadruplexes. We initially developed an algorithm to predict which sequences could form G-quadruplexes, based on a series of biophysical experiments. We then showed that there are around 375,000 putative quadruplex sequences in the human genome, although many are likely to be due to chance and to be relatively unstable.
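As an illustration of how a sequence-based search of this kind can work, the sketch below scans a sequence for a simple folding pattern. The pattern is an assumption: four runs of at least three guanines separated by loops of one to seven bases, a rule commonly used for putative quadruplex sequences. The function name `find_putative_quadruplexes` is hypothetical, and this is not the quadparser code itself.

```python
import re

# Assumed folding rule (not taken from this page): four G-runs of length >= 3,
# separated by three loops of 1-7 bases each.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_putative_quadruplexes(sequence):
    """Return (start, end, motif) for each non-overlapping match of the rule.

    Coordinates are 0-based and half-open. Only the strand supplied is
    scanned; C-rich matches on the opposite strand are not reported.
    """
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # Four repeats of the human telomeric repeat unit.
    for hit in find_putative_quadruplexes("AGGGTTAGGGTTAGGGTTAGGG"):
        print(hit)
```

A pattern match of this kind only flags candidates. It says nothing about whether a given sequence folds stably, which is why many hits in a genome-wide search may be chance occurrences.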

Functions

quadruplex structures

Detailed structures of G-quadruplexes. Top: parallel form, taken from PDB 1KF1. Bottom: antiparallel form, taken from PDB 143D. Guanines are shown in blue, the backbones in gold.

We have shown that G-quadruplexes are highly enriched in gene promoters, with almost half of all genes containing quadruplexes in this region. These may control transcription, as has been shown by us and others for some quadruplexes in oncogenes. We also examined G-quadruplexes in the 5' untranslated region, and showed that there are many quadruplex motifs here that could regulate translation, which we demonstrated experimentally for the oncogene NRAS. These are now targets for drug development.

We are currently looking at other functional roles that can be performed by G-quadruplexes, at the DNA or RNA level, as well as using data from evolutionary and human variation studies to understand the importance of these motifs. We are also working to develop more sophisticated algorithms for predicting G-quadruplexes, with the intention of being able to predict both structure and stability directly from sequence data, so that this information can be used to inform other experimental and computational studies. On behalf of the G-quadruplex community, we run a website, quadruplex.org, which among other items hosts quadparser, the program we use to identify G-quadruplexes, and some of the data we have generated in genome-wide searches, in a database called Quadbase.

functional model

Model for how G-quadruplexes could control transcription

RNA/DNA hybrids

Normally, after transcription of a DNA duplex, the new RNA strand remains single-stranded (and probably structured), whereas the two DNA strands recombine to form a duplex. However, it has been found that for sequences where the coding strand is G-rich, a DNA/RNA duplex can form, with the G-rich DNA strand looping out. These G-loops can be very large (>1 kbase), are easily visible using electron microscopy, and are believed to be physiologically important. The stability of this unusual structure relies on the stability of the rGG/dCC base stacking interaction, but the studies to date have involved only very short oligonucleotides, with a maximum of around 12 bases. There appears to be an association between regions that can form G-loops and those where the single-stranded DNA can form G-quadruplex structures. We are investigating this hypothesis by developing simulations of possible sequences using known thermodynamic parameters, and comparing the predictions to genomic sequences, in vivo experiments, and model FRET systems.

microRNA

MicroRNAs (miRNAs) are small (21-23 nucleotide) pieces of RNA that play a very significant regulatory role in vivo, and are currently the subject of considerable interest. They may play a number of roles, either temporarily repressing translation of a target mRNA sequence, importing it into vesicles for later release, or leading to degradation of the target. Detailed bioinformatic techniques have been developed for the prediction of miRNAs and their targets, but these are still at an early stage. There are many hundreds of predicted miRNAs in the human genome, of which only a small proportion have been confirmed or studied in any detail. To date, work has focused solely on predicting the sequences of the miRNA and target mRNA, and nothing has been done towards predicting what effect the miRNA will have once it reaches the target. Our work focuses on understanding how mRNA and miRNA sequence and structure control both specificity and function.
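To make the idea of sequence-only target prediction concrete, here is a minimal sketch of one widely used heuristic, seed matching. It assumes that a candidate site is any stretch of the mRNA that is perfectly complementary to nucleotides 2-8 of the miRNA. The function names and the toy sequences are hypothetical and are not taken from our own methods.

```python
# Watson-Crick complements for RNA; G-U wobble pairs are deliberately ignored.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_match_sites(mirna, mrna, seed_start=1, seed_end=8):
    """Return 0-based mRNA positions that pair perfectly with the miRNA seed.

    The seed is taken as mirna[seed_start:seed_end], i.e. nucleotides 2-8
    by default (an assumed convention). Overlapping sites are reported.
    """
    mirna = mirna.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")
    site = reverse_complement(mirna[seed_start:seed_end])
    positions = []
    index = mrna.find(site)
    while index != -1:
        positions.append(index)
        index = mrna.find(site, index + 1)
    return positions


if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # toy 22-nt sequence
    toy_mrna = "AAACUACCUCAAA"            # toy target fragment
    print(seed_match_sites(toy_mirna, toy_mrna))
```

A search like this reports only where pairing is possible. It carries no information about mRNA structure or about the functional outcome at the site, which is the gap described above.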