Repost

An evaluation of sequence error correction for RNA-seq reads

Paper:

Optimizing error correction of RNAseq reads

Abstract:

Motivation: The correction of sequencing errors contained in Illumina reads derived from genomic DNA is a common pre-processing step in many de novo genome assembly pipelines, and has been shown to improve the quality of resultant assemblies. In contrast, the correction of errors in transcriptome sequence data is much less common, but can potentially yield similar improvements in mapping and assembly quality. This manuscript evaluates several popular read-correction tools' ability to correct sequence errors commonplace to transcriptome derived Illumina reads.

Results: I evaluated the efficacy of correction of transcriptome derived sequencing reads using several metrics across a variety of sequencing depths. This evaluation demonstrates a complex relationship between the quality of the correction, depth of sequencing, and hardware availability which results in variable recommendations depending on the goals of the experiment, tolerance for false positives, and depth of coverage. Overall, read error correction is an important step in read quality control, and should become a standard part of analytical pipelines.

Availability: Results are non-deterministically repeatable using AMI:ami-3dae4956 (MacManes EC 2015) and the Makefile available here: https://goo.gl/oVIuE0

Paper link:

http://biorxiv.org/content/early/2015/05/29/020123

GitHub repository:

https://github.com/macmanes/read_error_corr

Reading guide:

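Biostack lists more than 40 tools for sequence error correction (EC). Because PacBio reads have a high sequencing error rate, and demand tends to produce solutions, many of these tools target PacBio data, for example Proovread and ECTools.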

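Illumina sequencing is more accurate than PacBio, but applying EC during genome assembly still improves the result. For example, the small-genome assembler SPAdes and the metagenome assembler IDBA-UD both include an error correction step (optional in both).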

Far fewer tools are designed specifically for correcting RNA-seq reads; SEECER is one of them.

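This paper evaluates how well several popular error correction tools (Lighter, SGA, BLESS, SEECER, BFC) correct errors in RNA-seq reads. Overall, BFC comes out ahead; BFC is a tool recently developed by Heng Li. For datasets below 50M PE reads the author recommends BFC. For 50M to 100M PE reads SEECER gives the best results but needs more memory. Memory is no longer very expensive, and 256 GB can be fitted to an ordinary server; if you really do not have that much memory available, use BFC.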
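The recommendation above can be written down as a small decision rule. The following Python sketch is only an illustration of this post's summary, not something taken from the paper or its repository: the function name `recommend_corrector` and its parameters are hypothetical, and whether a machine has "enough memory" for SEECER is left to the caller because no exact requirement is given here. No recommendation is stated for datasets above 100M PE reads, so the sketch returns `None` in that case.

```python
from typing import Optional


def recommend_corrector(pe_reads_millions: float,
                        enough_memory_for_seecer: bool) -> Optional[str]:
    """Return the corrector suggested by the summary above, or None.

    pe_reads_millions: number of paired-end reads, in millions.
    enough_memory_for_seecer: caller's judgment (hypothetical flag);
        the post gives no exact memory requirement for SEECER.
    """
    if pe_reads_millions < 0:
        raise ValueError("read count must be non-negative")

    # Below 50M PE reads: BFC is recommended.
    if pe_reads_millions < 50:
        return "BFC"

    # 50M to 100M PE reads: SEECER corrects best but needs more memory;
    # fall back to BFC when that memory is not available.
    if pe_reads_millions <= 100:
        return "SEECER" if enough_memory_for_seecer else "BFC"

    # Above 100M PE reads: no recommendation is stated in this post.
    return None


if __name__ == "__main__":
    for depth, has_mem in [(20, False), (75, True), (75, False), (150, True)]:
        print(depth, has_mem, recommend_corrector(depth, has_mem))
```

A rule like this is mainly useful as a reminder that the choice depends on both sequencing depth and available hardware; the paper itself should be consulted for the metrics behind each threshold.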

The GitHub repository provides the commands used to run the project, and among them I spotted the familiar JavaScript shell K8.
