High-throughput Biological Sequence Search


Computer science has made great strides in the past decade, and scientific fields such as biology and bioinformatics could benefit greatly from these advances. However, fully utilizing the computational power and storage resources offered by new computer hardware, architectures, and system software requires in-depth knowledge of multiple computer science subfields. Increasingly complex computer systems have made performance tuning an overwhelming task for computer scientists, let alone for domain scientists such as biologists or biochemists. As a result, most bioinformatics applications that scientists use on a daily basis fail to take advantage of state-of-the-art computer systems. In this project, we propose to focus on one important class of bioinformatics research tools: high-throughput biological sequence search. We will conduct a comprehensive study of performance optimization for popular biological sequence search programs and develop a set of techniques that work in different execution environments to automatically and transparently enhance the programs' overall performance. More specifically, we propose to develop the following key techniques:
  1. Efficient and transparent background I/O schemes through light-weight data management facilities.
  2. Optimized data access and memory management methods customized for sequence search programs.
  3. Scalable and flexible collective I/O for parallel execution.
  4. Automatic optimization of parallel execution plans.
We expect this research to:
  1. speed up existing widely used sequence search programs under a variety of system configurations
  2. ease the performance-tuning burden on bioinformatics application developers and users
  3. influence the design of future bioinformatics applications by proposing and evaluating scalable execution models

Last modified: Mon Mar 21 11:27:46 EST 2005