A maximum-likelihood base caller for DNA sequencing

Abstract

The procedures used to sequence the human genome involve the electrophoretic separation of mixtures of deoxyribonucleic acid (DNA) fragments tagged with reporter groups, usually fluorescent dyes. Each fluorescent pulse that arrives from an optical detector corresponds to a nucleotide (base) in the DNA sequence, and the subsequent process of identifying those bases is known as base calling. Generating longer and more accurate sequences during base calling will reduce the high cost of DNA sequencing. This paper presents an automated base-calling algorithm, the maximum-likelihood base caller (MLB), which is based on maximum-likelihood equalization for digital communication channels. On 125 experimental datasets, MLB made on average up to 40% fewer errors than the widely used ABI base caller from the Applied Biosystems Division of PE Corporation. MLB's accuracy rivaled that of another well-known base caller, Phred, and surpassed it on datasets with high background noise.
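The abstract does not spell out MLB's channel model, so the following is only a toy illustration of what maximum-likelihood equalization means when base calling is viewed as a communication channel. It assumes (hypothetically, not from the paper) that each sample is a 4-vector of dye intensities equal to `H0` times a one-hot vector for the current base plus `H1` times a one-hot vector for the previous base (a crude model of peak spill-over), corrupted by white Gaussian noise. Under that assumption, maximizing the likelihood is equivalent to minimizing total squared error, which the Viterbi algorithm does exactly. The names `H0`, `H1`, `expected`, and `viterbi_call` are illustrative inventions.

```python
"""Toy maximum-likelihood sequence estimation (Viterbi) for base calling.

Hypothetical channel model (an assumption, not MLB's actual model):
    y_k = H0 * onehot(b_k) + H1 * onehot(b_{k-1}) + white Gaussian noise,
with one entry per dye channel A, C, G, T. With Gaussian noise, the
maximum-likelihood sequence is the one with minimum total squared error.
"""
from typing import List, Sequence

BASES = "ACGT"
H0, H1 = 1.0, 0.4  # hypothetical taps: main peak and spill-over from the previous base


def expected(cur: int, prev: int) -> List[float]:
    """Noise-free 4-channel sample for base `cur` preceded by `prev` (-1 = none)."""
    y = [0.0] * 4
    y[cur] += H0
    if prev >= 0:
        y[prev] += H1
    return y


def branch_cost(obs: Sequence[float], cur: int, prev: int) -> float:
    """Squared Euclidean distance: the negative log-likelihood up to constants."""
    return sum((o - e) ** 2 for o, e in zip(obs, expected(cur, prev)))


def viterbi_call(trace: Sequence[Sequence[float]]) -> str:
    """Return the base sequence that minimizes total squared error over `trace`."""
    if not trace:
        return ""
    # Trellis state = most recent base; the first sample has no predecessor.
    cost = [branch_cost(trace[0], s, -1) for s in range(4)]
    back: List[List[int]] = []  # back[k][cur] = best previous state
    for obs in trace[1:]:
        new_cost, ptr = [], []
        for cur in range(4):
            best_prev = min(range(4), key=lambda p: cost[p] + branch_cost(obs, cur, p))
            new_cost.append(cost[best_prev] + branch_cost(obs, cur, best_prev))
            ptr.append(best_prev)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final state.
    state = min(range(4), key=lambda s: cost[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(BASES[s] for s in reversed(path))


if __name__ == "__main__":
    import random

    random.seed(0)
    truth = "GATTACA"
    idx = [BASES.index(b) for b in truth]
    # Simulate a noisy trace under the same hypothetical channel model.
    trace = [[v + random.gauss(0.0, 0.15) for v in expected(c, idx[i - 1] if i else -1)]
             for i, c in enumerate(idx)]
    print(truth, viterbi_call(trace))
```

The trellis has only four states because the assumed channel memory is one base; a real electropherogram has longer, position-dependent peak overlap, variable peak spacing, and baseline drift, all of which this sketch ignores.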

DOI
10.1109/10.867962
Year