By Lingling An and Naruekamol Pookhao

Year 2013

The 14th TSAE National Conference and the 6th TSAE International Conference : TSAE 2013. p.142-144


Metagenomics is a relatively new but fast-growing field within environmental biology and medical science. It enables researchers to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology cannot capture the structure of the broad microbial community within an environmental sample (e.g., soil, seawater, or human gut). High-throughput next generation sequencing technologies now provide a powerful tool for metagenomic studies. However, because these technologies produce massive numbers of short DNA sequences, there is an urgent need for efficient statistical methods that can rapidly analyze the sequencing data generated from microbial communities and accurately detect the features and functions present in a metagenomic sample or community. Although the functions of metagenomes have been investigated at the pathway or subsystem level, functional analysis of metagenomes at a lower, more specific level remains largely unexplored.

This study focuses on identifying all functional roles at this lower level that are present in a metagenomic sample or community. We propose a statistical mixture model at the codon level of the genes that globally assigns short reads to candidate functional roles based on the SEED classification, with sequencing error taken into account. In comprehensive simulation studies comparing it with other available algorithms and tools designed for metagenomic analysis, the proposed approach detects functional roles more specifically and estimates their abundance more accurately. The methods are also applied to a real metagenomic data set.
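The general idea of assigning reads through a mixture model can be illustrated with a toy example. The sketch below is not the authors' codon-level model: it assumes a much simpler setup in which each functional role is represented by a single hypothetical reference sequence, reads are compared to it base by base, and a uniform per-base sequencing error rate is used. The function names, the reference sequences, and the error rate are all illustrative assumptions. An expectation-maximization (EM) loop then estimates the relative abundance of each role and the probability that each read belongs to it.

```python
def read_likelihood(read, reference, error_rate=0.01):
    """P(read | role) under an assumed uniform per-base error model.

    A matching base has probability 1 - error_rate; a mismatching base
    has probability error_rate / 3 (error spread over the other 3 bases).
    """
    if len(read) != len(reference):
        raise ValueError("toy model assumes equal-length read and reference")
    p = 1.0
    for a, b in zip(read, reference):
        p *= (1.0 - error_rate) if a == b else error_rate / 3.0
    return p


def estimate_abundance(likelihoods, n_iter=200, tol=1e-10):
    """EM for mixture proportions with fixed component likelihoods.

    likelihoods[i][k] is P(read i | role k). Returns the estimated
    abundance of each role and the per-read assignment probabilities.
    """
    n_reads = len(likelihoods)
    n_roles = len(likelihoods[0])
    weights = [1.0 / n_roles] * n_roles  # start from equal abundances
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each role for each read.
        resp = []
        for row in likelihoods:
            joint = [w * p for w, p in zip(weights, row)]
            total = sum(joint)
            if total == 0.0:
                raise ValueError("a read has zero likelihood under every role")
            resp.append([j / total for j in joint])
        # M-step: abundance is the average assignment probability.
        new_weights = [sum(r[k] for r in resp) / n_reads for k in range(n_roles)]
        converged = max(abs(a - b) for a, b in zip(weights, new_weights)) < tol
        weights = new_weights
        if converged:
            break
    return weights, resp


# Hypothetical reference sequences for two roles and four short reads.
references = ["ATGGCC", "ATGGTC"]
reads = ["ATGGCC", "ATGGCC", "ATGGCC", "ATGGTC"]
lik = [[read_likelihood(r, ref) for ref in references] for r in reads]
weights, resp = estimate_abundance(lik)
print("estimated abundances:", weights)
```

Because every read contributes to every role in proportion to its posterior probability, ambiguous reads are shared between roles rather than forced onto a single best match, which is the property that makes a mixture formulation attractive for abundance estimation.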

Download: Statistical methods for functional metagenomic analysis based on next generation sequencing data