High-throughput Biological Sequence Search
Description
Computer science has made great strides in the past decade, and
scientific fields such as biology and bioinformatics could benefit
greatly from these advances. However, fully utilizing the
computational power and storage resources offered by new technologies
in computer hardware, architecture, and system software requires
in-depth knowledge of multiple computer science subfields.
Increasingly complex computer systems have made performance tuning an
overwhelming task for computer scientists, let alone for domain
scientists such as biologists or biochemists. As a result, most
bioinformatics applications that scientists use on a daily basis fail
to take advantage of state-of-the-art computer systems. In this
project, we propose to focus on one important type of bioinformatics
research tool and investigate high-throughput biological sequence
search. We will conduct a comprehensive study of performance
optimization for popular biological sequence search programs and
develop a set of techniques that work across different execution
environments to enhance the programs' overall performance
automatically and transparently. More specifically, we propose to
develop the following key techniques:
- Efficient and transparent background I/O schemes through
light-weight data management facilities.
- Optimized data access and memory management methods customized for
sequence search programs.
- Scalable and flexible collective I/O for parallel execution.
- Automatic optimization of parallel execution plans.
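As a rough illustration of the first technique only, the sketch below overlaps reading database fragments with searching them by moving the reads into a background thread. It is not the project's implementation: the names `count_hits` and `_reader`, the `depth` parameter, the plain-text fragment files, and the exact substring match standing in for a real sequence search are all assumptions made for this example.

```python
import queue
import threading

_DONE = object()  # sentinel: the reader has delivered every fragment


def _reader(paths, out_queue):
    """Background thread: read fragments one by one and queue them for the searcher."""
    try:
        for path in paths:
            with open(path) as handle:
                # put() blocks while the queue is full, which bounds memory use.
                out_queue.put((path, handle.read()))
        out_queue.put(_DONE)
    except Exception as exc:
        # Hand I/O errors to the searcher instead of losing them in the thread.
        out_queue.put(exc)


def count_hits(paths, query, depth=2):
    """Count non-overlapping exact matches of `query` in each fragment.

    The exact match is a placeholder for a real search kernel; the point is
    that fragment i+1 is being read while fragment i is being searched.
    `depth` is the (hypothetical) number of fragments buffered ahead.
    """
    buffered = queue.Queue(maxsize=depth)
    reader = threading.Thread(target=_reader, args=(paths, buffered), daemon=True)
    reader.start()

    hits = {}
    while True:
        item = buffered.get()
        if item is _DONE:
            break
        if isinstance(item, Exception):
            raise item
        path, text = item
        hits[path] = text.count(query)

    reader.join()
    return hits
```

Calling `count_hits(["frag0.txt", "frag1.txt"], "ACGT")` (hypothetical file names) returns a dictionary mapping each fragment path to its match count. The calling code never sees the thread or the queue, which is the sense in which such a scheme can be transparent to the search program.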
We expect this research to:
- speed up existing, widely used sequence search programs under a variety of system configurations
- alleviate the performance-tuning burden on bioinformatics application developers and users
- influence the design of future bioinformatics applications by proposing and evaluating scalable execution models
Research Sponsor
People
Faculty
Collaborators
Students
- Heshan Lin (PhD student)
- Jiangtian Li (PhD student)
- Guanghua Zhao (Part-time PhD student)
Publications
- Heshan Lin, Xiaosong Ma, Praveen Chandramohan, Al Geist, and Nagiza Samatova,
  Efficient Data Access for Parallel BLAST,
  to appear, 2005 International Parallel and
  Distributed Processing Symposium (IPDPS 2005).
Last modified: Mon Mar 21 11:27:46 EST 2005