New Generation Computing, 22(2004)127-136
Ohmsha, Ltd. and Springer-Verlag

The Encyclopedia of Life Project: Grid Software and Deployment

Wilfred W. LI*1, Robert W. BYRNES*1, Jim HAYES*1,3, Adam BIRNBAUM*1,
Vicente M. REYES*1, Atif SHAHAB*5, Coleman MOSLEY*4,
Dmitry PEKUROVSKY*1, Greg B. QUINN*1, Ilya N. SHINDYALOV*1,
Henri CASANOVA*1,3, Larry ANG*5, Fran BERMAN*1,3,
Peter W. ARZBERGER*1, Mark A. MILLER*1, Phillip E. BOURNE*1,2
Integrative Biosciences Program
San Diego Supercomputer Center*1
Department of Pharmacology*2
Department of Computer Science and Engineering*3
Bioinformatics Program*4
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093, USA
Bioinformatics Institute*5
21 Heng Mui Keng Terrace
12R, Level 3
Singapore 119612

{Wilfred,rbyrnes,jhayes,birnbaum,vreyes,cmosley,dmitry,quinn,shinyal,
casanova,berman,parzberger,mmiller,bourne}@sdsc.edu
{atif,larry}@bii-sg.org

Received 15 June 2003
Revised manuscript received 7 November 2003

Abstract

The ongoing global effort of genome sequencing is making large-scale comparative proteomic analysis an intriguing task. The Encyclopedia of Life (EOL; http://eol.sdsc.edu) project aims to provide current functional and structural annotations for all available proteomes, a computational challenge never seen before in biology. Using an integrative genome annotation pipeline (iGAP), we have produced 3D models and functional annotations for more than 100 proteomes thus far. This process is greatly facilitated by grid compute resources, and especially by the development of a grid application execution environment. The AppLeS (Application-Level Scheduling) Parameter Sweep Template (APST) has been adopted by the EOL project as a mediator to grid middleware. APST has made the annotation process much more efficient, highly automated, and scalable. We are currently building a domain-specific bioinformatics workflow management system (BWMS) on top of APST, which further streamlines grid deployment of life science applications. With these developments in mind, we discuss some common problems and expectations of grid computing for high-throughput proteomics.

Keywords: Biology on the Grid, Integrative Genome Annotation Pipeline, Encyclopedia of Life, AppLeS Parameter Sweep Template, Bioinformatics.
