Single-molecule genomics

Conventional approaches to genomics rely on ensemble measurements - billions of (usually identical) molecules are analysed at once. This has a couple of drawbacks:

  • Any variation between the molecules in the ensemble is masked or averaged out, and
  • Quantitation takes the form of an "analog" signal (for instance, the intensity of a hybridisation signal in array work, or the rise of a PCR signal in qPCR). Analog measurements are very susceptible to error and noise.

My group's methods, in contrast, are based on the analysis of single DNA molecules. Variants are not lost by averaging, and the readout is digital rather than analog: we count the number of molecules rather than measuring the strength of a signal.

Typically, a DNA sample is diluted into many minute sub-samples, so that only a few of these contain a single molecule of the DNA of interest.

By simply counting how many of the samples contain the target molecule, and noting how molecules of different targets are distributed amongst the samples, some exquisitely sensitive genomic analyses can be performed.
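
As a rough illustration of why dilution gives a digital readout, the sketch below assumes that molecules are distributed at random among the sub-samples, so that the number in each follows a Poisson distribution. The function name and the mean of 0.5 molecules per sub-sample are illustrative assumptions, not values taken from our protocols.

```python
import math

def occupancy_fractions(mean_molecules_per_subsample: float) -> dict:
    """Expected fraction of sub-samples holding 0, 1, or 2+ target molecules.

    Assumes random (Poisson) distribution of molecules among sub-samples.
    """
    lam = mean_molecules_per_subsample
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": 1.0 - p_empty - p_single,
    }

# Hypothetical dilution: on average half a target molecule per sub-sample.
fractions = occupancy_fractions(0.5)
for label, value in fractions.items():
    print(f"{label}: {value:.3f}")
```

At this dilution most sub-samples are empty, and most of the positive ones hold exactly one molecule, which is what makes simple presence/absence scoring informative.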

Highly-multiplexed single-molecule PCR (smPCR) is the key to most of my single-molecule methods. It makes it possible to detect single molecules of many different sequences in a single sample.

Suppose we have a sample which we want to test for the presence of three sequences (red, green, blue) - the sample actually contains only the red and blue molecules.

We first do a multiplex PCR with all primers, to amplify any of the target sequences. The products of this reaction are split into replicate sub-samples, and each sub-sample is re-amplified using a single primer pair (red, green or blue). These products are then analysed (on a gel or by melting-curve analysis) - we find products in the "red" and "blue" sub-samples, but not the "green" sub-sample.

In practice, up to about 1000 different sequences can be tested at a time using this approach.

Much of the variation in the human genome comes not from sequence variation (polymorphisms, mutations) but from copy-number variations - the presence of duplicated (or missing) segments of the genome. In the cartoon the cell on the left is normal, and contains two copies of each of the red and blue sequences. The cell on the right, however, has acquired additional copies of the "red" sequence on one of its two homologous chromosomes, giving it a total of four copies; the "blue" sequence is unaffected, and is present in two copies per cell.

Inherited copy-number variations (CNVs) account for many of the differences between individuals, for example, in the way we respond to certain drugs.

Somatic CNVs (sCNVs) are copy-number differences between different cells in the same individual. sCNVs are particularly important in most forms of cancer: the cancer cells acquire extra copies of certain genes which allow them to proliferate, or lose copies of genes which normally limit cell proliferation.

CNVs (both inherited and somatic) can be analysed by methods such as qPCR or array hybridisation, but these methods often lack sensitivity. Also, because these analog methods require large amounts of good-quality DNA, they are not well suited to analysing small sub-populations of cells.

My group developed Molecular Copy-number Counting (MCC) as a simple digital PCR method to analyse CNVs; it is described in the next section.

My group developed Molecular Copy-number Counting, or MCC, as a way to detect and quantify CNVs.

Suppose we have a population of cells carrying a CNV (in this case, extra copies of the "red" sequence; the "blue" sequence is at normal copy number). We prepare DNA from these cells, and dilute and dispense this into a microtitre plate. Naturally, more wells contain a "red" fragment than a "blue" fragment.

Using the multiplex smPCR method, we test the wells of this plate to see which wells contain "red" or "blue" fragments. The ratio of "red" to "blue" then gives us a measure of the copy-number of the red sequence. (We use Poisson statistics to give a better estimate of the copy number.)
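
The Poisson correction mentioned above can be sketched as follows. A positive well may hold more than one molecule, so the fraction of positive wells underestimates the number of molecules; if molecules are distributed at random, the mean number per well is -ln(1 - fraction positive). The function names, the well counts and the use of "blue" as a two-copy reference are assumptions made for this example, not a description of our analysis software.

```python
import math

def molecules_per_well(positive_wells: int, total_wells: int) -> float:
    """Poisson estimate of the mean number of target molecules per well."""
    if not 0 <= positive_wells < total_wells:
        raise ValueError("need 0 <= positive_wells < total_wells")
    return -math.log(1.0 - positive_wells / total_wells)

def copy_number(target_positive: int, reference_positive: int,
                total_wells: int, reference_copies: float = 2.0) -> float:
    """Copy number of the target relative to a reference of known copy number."""
    target = molecules_per_well(target_positive, total_wells)
    reference = molecules_per_well(reference_positive, total_wells)
    if reference == 0.0:
        raise ValueError("reference sequence was not detected in any well")
    return reference_copies * target / reference

# Hypothetical plate: 88 wells, "red" found in 55 of them, "blue" in 33.
estimate = copy_number(55, 33, 88)
naive = 2.0 * 55 / 33  # simple ratio of positive wells, no correction
print(f"Poisson-corrected: {estimate:.2f} copies; uncorrected: {naive:.2f}")
```

With these made-up counts the uncorrected ratio suggests about 3.3 copies, whereas the corrected estimate is about 4.2: the correction matters most for the more abundant sequence, where wells holding two or more molecules are common.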

In practice, we can use this method to measure the copy number of tens or hundreds of sequences at once.

MCC gives a more accurate measure than array-based or qPCR methods. It also uses far less DNA: just a few cells' worth.

The biggest challenge for measuring CNVs is biopsy material which has been fixed in formalin, embedded in paraffin wax, sectioned and stained for histological studies. The DNA is badly damaged in such samples.

To address the need to measure CNVs in such sections, Frank McCaughan in my group developed µMCC. By modifying the basic MCC protocol, it becomes possible to get accurate measurements of copy-number changes in these samples. One of the main modifications needed is to ensure that all amplimers are of the same length; in this way, random damage to the DNA is equally likely to affect all targets, and bias (favouring shorter targets) is eliminated.
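
The reasoning behind equal-length amplimers can be illustrated with a toy model. It assumes that lesions which block amplification fall at random along the DNA at a constant rate per base, so that the chance of an amplimer being intact falls off exponentially with its length. The model, the function names and the damage rate are assumptions chosen for illustration; they are not measurements from fixed tissue.

```python
import math

def intact_fraction(amplimer_length_bp: int, lesions_per_bp: float) -> float:
    """Probability that a stretch of DNA of this length carries no lesion."""
    return math.exp(-lesions_per_bp * amplimer_length_bp)

def apparent_ratio(target_length_bp: int, reference_length_bp: int,
                   lesions_per_bp: float) -> float:
    """Apparent target:reference ratio for two sequences at equal true copy number."""
    return (intact_fraction(target_length_bp, lesions_per_bp)
            / intact_fraction(reference_length_bp, lesions_per_bp))

damage = 0.005  # hypothetical: one blocking lesion per 200 bp on average
print(f"80 bp vs 200 bp amplimers:  {apparent_ratio(80, 200, damage):.2f}")
print(f"100 bp vs 100 bp amplimers: {apparent_ratio(100, 100, damage):.2f}")
```

Under this model, unequal lengths make the shorter target look over-represented even when the true copy numbers are identical, while equal lengths give a ratio of exactly one whatever the level of damage.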

In the example on the left (taken from McCaughan et al. 2008), the first image shows a section of lung with islands of cancerous cells. Using laser-capture microdissection, these cancerous cells are excised, leaving behind the normal cells. µMCC analysis of the excised, cancerous cells reveals a region of amplification (extra copies) on the distal end of chromosome 3. (The isolated peak labelled "M" is a sequence which is normally present in multiple copies, and is included as a positive control.)

Cancers frequently show copy-number changes relative to the normal genome. In particular, they acquire extra copies of chromosomal regions harbouring "driver" genes beneficial to the cancer. By finding the parts of the genome that are consistently "amplified" in this way, those driver genes can be found and, potentially, targeted in order to slow or halt the growth of the cancer.

By analysing sequential pre-cancerous and cancerous biopsies using µMCC, Frank McCaughan in my group was able to pinpoint SOX2 as a key driver gene in the development of squamous lung cancer, making it a candidate for further research or for therapeutic and diagnostic targeting. µMCC also enabled him to map precisely the boundaries of the amplified chromosomal segment in each sample, and to show that multiple lesions at different sites in the lungs of one patient arise from a single founder, rather than independently.

This work is detailed in McCaughan et al. (2010) and in McCaughan et al. (2011).

Colorectal cancer is an epithelial cancer, affecting about 1.2 million people per year worldwide, and causing over half a million fatalities.

By analysing microarray data, we were able to confirm previous observations that a large region on chromosome 13 is frequently amplified (that is, present in extra copies) in cells from colon cancer, and to pinpoint a narrower region for closer examination.

MCC analysis was uniquely able to define, in some colon cancers, a very small region of amplification containing only a single gene. This is presumed to be the "driver" gene which drives the more common amplification of the larger region. Analysis of a large series of lesions has confirmed that this gene is indeed amplified in a large proportion of colon cancers and precancerous lesions, that its expression is increased in colon cancers, and that transient overexpression in cultured colon cancer cells activates oncogenic pathways.

We hope that this discovery will lead to a new therapeutic or diagnostic target. The work has been submitted for publication.

In collaborative work led by Dr. Angelika Daser, we have developed a single-cell karyotyping method to improve the success rate of IVF.

Many oocytes (particularly from older women) are aneuploid: they have missing or extra chromosomes. Identifying euploid oocytes (ones with the right number of chromosomes) can increase the success rate of IVF, and reduce the need to implant multiple eggs.

During its development, an oocyte undergoes two cell divisions; each division gives rise to a small cell - a polar body - containing the surplus chromosomes. It is in these divisions that aneuploidy tends to arise.

A variant of MCC lets us count the chromosomes in each of the two polar bodies; any surplus chromosomes in the polar bodies mean missing chromosomes in the oocyte, and vice versa. In this way, we can rapidly identify and reject aneuploid oocytes before fertilisation. The work recently won a GFI grant for fertility innovation.

This method is being trialled at SH-Gen Forschungsgesellschaft, and recently won the Kade Prize in Reproductive Biology and Medicine.
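
The inference from polar bodies to oocyte can be sketched as simple bookkeeping. The sketch assumes that copies are counted as chromatids: each chromosome starts as four chromatids, of which two are normally expected in the first polar body, one in the second, and one in the oocyte. The function names and the example counts are hypothetical, and a real analysis would also have to allow for measurement error in the counts.

```python
# Assumption: each chromosome is present as four chromatids before the two divisions.
TOTAL_CHROMATIDS = 4
NORMAL_OOCYTE_COPIES = 1

def infer_oocyte_copies(pb1_copies: int, pb2_copies: int) -> int:
    """Copies left in the oocyte, given the copies counted in each polar body."""
    remaining = TOTAL_CHROMATIDS - pb1_copies - pb2_copies
    if pb1_copies < 0 or pb2_copies < 0 or remaining < 0:
        raise ValueError("polar-body counts are inconsistent with four chromatids")
    return remaining

def screen_oocyte(polar_body_counts: dict) -> tuple:
    """Map chromosome -> (pb1, pb2) counts to inferred oocyte copies and abnormalities."""
    inferred = {
        chrom: infer_oocyte_copies(pb1, pb2)
        for chrom, (pb1, pb2) in polar_body_counts.items()
    }
    abnormal = {c: n for c, n in inferred.items() if n != NORMAL_OOCYTE_COPIES}
    return inferred, abnormal

# Hypothetical counts: chromosome 21 is missing from the second polar body.
counts = {"chr13": (2, 1), "chr16": (2, 1), "chr21": (2, 0)}
inferred, abnormal = screen_oocyte(counts)
print(inferred)
print("euploid" if not abnormal else f"aneuploid: {abnormal}")
```

In this made-up case the copy of chromosome 21 that is missing from the second polar body is inferred to have stayed in the oocyte, which would therefore be rejected.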

Single-molecule genomics also provides a simple way to make genome maps - a method called HAPPY mapping which I developed a number of years ago.

Suppose we want to know the positions of the red, green and blue sequences on a chromosome. We break the DNA randomly into a pool of fragments. Since the red and green sequences are close together, they will generally be found together on the same fragments, whereas the blue sequence occurs on independent fragments.

We dilute and dispense the fragments into aliquots (only seven are shown here), so that each aliquot contains only a few fragments. We then use multiplexed smPCR to score which sequences are in each aliquot.

We find that all combinations of sequences occur, but that red and green are found together more often than, say, blue and red. This "co-segregation" lets us calculate the physical distance between the red and green sequences.
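
The idea of co-segregation can be illustrated by comparing how often two sequences are found together with how often they would be expected together if they were distributed independently. The scoring data below are invented for seven aliquots, and the function name is hypothetical. The sketch stops at this observed-versus-expected comparison; converting it into a physical distance needs a model of the fragment sizes, which is not attempted here.

```python
from itertools import combinations

# Hypothetical scoring results: the set of sequences detected in each aliquot.
aliquots = [
    {"red", "green"},
    {"blue"},
    {"red", "green", "blue"},
    set(),
    {"red", "green"},
    {"green", "blue"},
    {"red"},
]

def cosegregation(aliquots: list, a: str, b: str) -> tuple:
    """Observed number of aliquots holding both markers, and the number
    expected if the two markers were distributed independently."""
    n = len(aliquots)
    with_a = sum(1 for found in aliquots if a in found)
    with_b = sum(1 for found in aliquots if b in found)
    both = sum(1 for found in aliquots if a in found and b in found)
    return both, with_a * with_b / n

for a, b in combinations(["red", "green", "blue"], 2):
    observed, expected = cosegregation(aliquots, a, b)
    print(f"{a}-{b}: observed {observed}, expected {expected:.2f}")
```

In this toy data set, red and green occur together more often than expected by chance, whereas red and blue do not, which is the pattern that indicates red and green are physically close.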

In this way, we can map the positions of many thousands of sequences throughout the genome (see, for instance, this paper).

We can use a similar approach for haplotyping.

My group's single-molecule PCR methods were originally developed to simplify the analysis of modern DNA samples.

However, because smPCR is so sensitive, it is also useful in the analysis of ancient samples, which may contain only minute traces of surviving DNA. The ability to amplify many specific sequences using multiplex smPCR is particularly valuable, as it helps to conserve scarce samples.

In collaboration with Michael Hofreiter and Svante Pääbo, we have used our methods for the analysis of DNA from mammoth and from cave-bears, allowing the phylogeny of these species to be established. A protocol for multiplex amplification of ancient DNA can be found here.