QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data
Zhou, Qian (1,2); Su, Xiaoquan (1,2); Wang, Anhui (1,2,3); Xu, Jian (1,2); Ning, Kang (1,2)
2013-04-02
Journal: PLOS ONE
Volume: 8; Issue: 4; Pages: e60234
Abstract: Next-generation sequencing (NGS) technologies have been widely used in life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, are quite common in raw sequencing data and compromise downstream analysis. Quality control (QC) is therefore essential for raw NGS data. Although a few NGS data quality control tools are publicly available, they have two limitations. First, their processing speed cannot keep up with the rapid growth in data volume. Second, none of them can identify contaminating sources de novo; they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. It combines user-friendly tools for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read processing tool; and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. It was optimized for parallel computation, so its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data showed that reads with low sequencing quality could be identified and filtered, and that possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) improved significantly in completeness and accuracy when based on processed reads. With regard to processing speed, QC-Chain achieves a 7- to 8-fold speed-up through parallel computation compared with traditional methods.
Therefore, QC-Chain is a fast and useful quality control tool for read quality processing and de novo contamination filtration of NGS reads, which could significantly facilitate downstream analysis.
QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html
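The abstract describes quality trimming of raw reads accelerated by parallel computation. As a rough illustration of that general idea only, here is a minimal Python sketch. It is not QC-Chain's or Parallel-QC's actual implementation: the function names, the Phred+33 encoding, the 3'-end trimming rule, and the thresholds are all assumptions made for the example.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Tuple

# A read is (identifier, bases, quality string); hypothetical in-memory form.
Read = Tuple[str, str, str]

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def trim_read(read: Read, min_quality: int = 20, min_length: int = 5) -> Optional[Read]:
    """Cut the read at the first base below min_quality; drop it if too short."""
    name, bases, quals = read
    keep = len(bases)
    for i, ch in enumerate(quals):
        if ord(ch) - PHRED_OFFSET < min_quality:
            keep = i
            break
    if keep < min_length:
        return None  # read is filtered out entirely
    return (name, bases[:keep], quals[:keep])


def trim_reads(reads: List[Read], workers: int = 1) -> List[Read]:
    """Trim all reads, optionally spreading the work over several processes."""
    if workers <= 1:
        results = [trim_read(r) for r in reads]
    else:
        # Reads are independent, so they can be trimmed in parallel.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(trim_read, reads, chunksize=1024))
    return [r for r in results if r is not None]


if __name__ == "__main__":
    demo = [
        ("r1", "ACGTACGT", "IIIII###"),  # low-quality tail is trimmed
        ("r2", "ACGT", "I#II"),          # too short after trimming, dropped
        ("r3", "ACGTAC", "IIIIII"),      # kept unchanged
    ]
    for read in trim_reads(demo, workers=2):
        print(read)
```

Because each read is processed independently, the work divides cleanly across processes; any real speed-up depends on data size, I/O, and core count, and this sketch makes no claim about the 7- to 8-fold figure reported above.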
Article type: Article
Subject area: Functional genomics
WOS heading: Science & Technology
DOI: 10.1371/journal.pone.0060234
Keywords [WOS]: SHORT READ ALIGNMENT; ULTRAFAST; ARB
Indexed by: SCI
Language: English
WOS research area: Science & Technology - Other Topics
WOS category: Multidisciplinary Sciences
WOS accession number: WOS:000317717300074
Document type: Journal article
Item identifier: http://ir.qibebt.ac.cn/handle/337004/1621
Collection: Single-Cell Center Group
Author affiliations:
1. Chinese Acad Sci, CAS Key Lab Biofuels, Qingdao Inst Bioenergy & Bioproc Technol, Qingdao, Shandong, Peoples R China
2. Chinese Acad Sci, Shandong Key Lab Energy Genet, Qingdao Inst Bioenergy & Bioproc Technol, Qingdao, Shandong, Peoples R China
3. China Three Gorges Univ, Coll Comp & Informat Technol, Yichang, Hubei, Peoples R China
Recommended citation
GB/T 7714: Zhou, Qian, Su, Xiaoquan, Wang, Anhui, et al. QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data[J]. PLOS ONE, 2013, 8(4): e60234.
APA: Zhou, Qian, Su, Xiaoquan, Wang, Anhui, Xu, Jian, & Ning, Kang. (2013). QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data. PLOS ONE, 8(4), e60234.
MLA: Zhou, Qian, et al. "QC-Chain: Fast and Holistic Quality Control Method for Next-Generation Sequencing Data." PLOS ONE 8.4 (2013): e60234.
Files in this item
File name/size: QC-Chain Fast and Ho (1111KB); Access type: Open access; License: CC BY-NC-SA
File name: QC-Chain Fast and Holistic Quality Control Method forNext-Generation Sequencing Data.pdf
Format: Adobe PDF