Faculty Publications

Indexing Genomic Sequence Libraries

Document Type

Article

Keywords

Bioinformatics, Genomics, Information retrieval, Mumps, Sequence retrieval

Journal/Book/Conference Title

Information Processing and Management

First Page

265

Last Page

274

Abstract

This paper describes an extensible, open-source (GPL) data repository and retrieval system that supports fast, efficient, keyword-based retrieval of genomic sequences from multiple libraries, with retrieved sequences post-processed by FASTA, Smith-Waterman, and other analysis software. The application is implemented for Linux and is written in Mumps, C, and C++, with supporting components that include the Berkeley Data Base, the Perl Compatible Regular Expression Library, GLADE, and tools such as FASTA, Smith-Waterman, and modules from EMBOSS. The package described here can quickly index data sets of up to 256 terabytes using a B-tree based multi-dimensional data model. An example is presented that indexes the text of the full NCBI GenBank library. © 2003 Elsevier Ltd. All rights reserved.
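The abstract mentions keyword-based retrieval over a B-tree based multi-dimensional data model. As a rough illustration only, the sketch below emulates a Mumps-style global array (sparse, multi-subscript keys kept in sorted order, as a B-tree would keep them) with a Python sorted list, and uses it as an inverted index of the form `idx(word, accession)`. The class `SortedGlobal`, the functions `build_index` and `search`, and the toy records are hypothetical and are not taken from the paper's implementation.

```python
import bisect
import re


class SortedGlobal:
    """Hypothetical stand-in for a Mumps global: tuple subscripts kept in
    sorted order (as a B-tree would keep them). Not the paper's code."""

    def __init__(self):
        self._keys = []  # sorted list of subscript tuples
        self._vals = {}  # subscript tuple -> stored value

    def set(self, key, value=""):
        # Insert the subscript in sorted position if it is new.
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def children(self, prefix):
        """Yield the subscript that follows `prefix` for each key under it,
        in sorted order (loosely analogous to walking with Mumps $ORDER)."""
        i = bisect.bisect_left(self._keys, prefix)
        n = len(prefix)
        while i < len(self._keys) and self._keys[i][:n] == prefix:
            yield self._keys[i][n]
            i += 1


def build_index(records):
    """Build an inverted index idx(word, accession) from description text."""
    idx = SortedGlobal()
    for accession, description in records.items():
        for word in set(re.findall(r"[a-z0-9]+", description.lower())):
            idx.set((word, accession))
    return idx


def search(idx, *keywords):
    """Return accessions whose descriptions contain every keyword (AND query)."""
    result = None
    for kw in keywords:
        hits = set(idx.children((kw.lower(),)))
        result = hits if result is None else result & hits
    return sorted(result or [])


# Toy, hypothetical records (not real GenBank entries).
records = {
    "SEQ1": "Homo sapiens hemoglobin beta mRNA",
    "SEQ2": "Mus musculus hemoglobin alpha gene",
    "SEQ3": "Homo sapiens insulin gene",
}
idx = build_index(records)
print(search(idx, "hemoglobin"))          # → ['SEQ1', 'SEQ2']
print(search(idx, "homo", "hemoglobin"))  # → ['SEQ1']
```

Because the keys stay sorted, all accessions for one word sit contiguously, so a lookup is a binary search followed by a short scan; in a real B-tree store the same property lets the index scale to disk-resident data. Sequences retrieved this way could then be handed to tools such as FASTA or Smith-Waterman, as the abstract describes.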

Department

Department of Computer Science

Original Publication Date

3-1-2005

DOI of published version

10.1016/j.ipm.2003.09.001

Recommended Citation

O'Kane, Kevin C. and Lockner, Matthew J., "Indexing Genomic Sequence Libraries" (2005). Faculty Publications. 2961.
https://scholarworks.uni.edu/facpub/2961
