New statistical method improves genomic analyses
A new statistical method provides a more efficient way to uncover biologically meaningful changes in genomic data that span multiple conditions — such as cell types or tissues.
Whole-genome studies produce enormous amounts of data, ranging from millions of individual DNA sequences, to information about where and how many of the thousands of genes are expressed, to the locations of functional elements across the genome. Because of the amount and complexity of the data, comparing different biological conditions, or comparing results across studies performed by separate laboratories, can be statistically challenging.
“The difficulty when you have multiple conditions is how to analyze the data together in a way that can be both statistically powerful and computationally efficient,” said Qunhua Li, associate professor of statistics at Penn State. “Existing methods are computationally expensive or produce results that are difficult to interpret biologically. We developed a method called CLIMB that improves on existing methods, is computationally efficient, and produces biologically interpretable results. We test the method on three types of genomic data collected from hematopoietic cells — related to blood stem cells — but the method could also be used in analyses of other ‘omic’ data.”
The researchers describe the CLIMB (Composite LIkelihood eMpirical Bayes) method in a paper appearing online Nov. 12 in the journal Nature Communications.
“In experiments where there is so much information but from relatively few individuals, it helps to be able to use information as efficiently as possible,” said Hillary Koch, a graduate student at Penn State at the time of the research and now a senior statistician at Moderna. “There are statistical advantages to be able to look at everything together and even to use information from related experiments. CLIMB allows us to do just that.”
The CLIMB method uses principles from two traditional techniques to analyze data across multiple conditions. One technique uses a series of pairwise comparisons between conditions but becomes increasingly challenging to interpret as additional conditions are added.