Digital Library GRID - Enhancing Infrastructure of OAI


OAI is becoming widely accepted, and many archives are already OAI-compliant or soon will be. A federated search service as efficient as Google, providing a unified interface to all of these libraries, would be useful to a wide audience. Google does an incredible job of providing discovery services for the 'shallow' web to the general public; we envision a sustainable, free discovery service of similar quality for students and researchers, covering parts of the 'deep' web. The parts of the deep web we refer to in this vision are digital libraries and collections that expose their metadata using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting).

A high-performance federated search service that exploits the resources of a Grid will make available a large amount of information that is distributed among heterogeneous digital libraries. A user will be able to find a research paper, a preprint, a technical report, an image of a great painting, or a performance of a musical piece within a few seconds, from thousands of libraries scattered all over the world.

As part of this project we propose to build a testbed that will use 3 grid nodes to perform the high-latency tasks of harvesting and indexing from 3 data providers. We will also use the grid to transmit these indices and metadata to a small cluster (3 nodes) of search engines, each of which will work on one or more of the indices it receives from the harvesting nodes.
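To make the harvesting step concrete, here is a minimal sketch in Python of how a harvesting node might page through a data provider with the OAI-PMH `ListRecords` verb. It is an illustration only and not the Arc harvester's code: the function names and the `http://example.org/oai` base URL are assumptions, and the network fetch itself is left out so that the sketch stays self-contained.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper).

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, it is sent alone with the verb.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for header in root.iter(OAI_NS + "header"):
        ident = header.find(OAI_NS + "identifier")
        if ident is not None and ident.text:
            identifiers.append(ident.text)
    token_el = root.find(".//" + OAI_NS + "resumptionToken")
    # An empty or missing token means the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# A hand-written sample response, used here in place of a live provider.
SAMPLE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    "<record><header><identifier>oai:example.org:1</identifier></header></record>"
    "<resumptionToken>tok-2</resumptionToken>"
    "</ListRecords></OAI-PMH>"
)

if __name__ == "__main__":
    ids, token = parse_page(SAMPLE)
    print(ids, token)
    print(list_records_url("http://example.org/oai", resumption_token=token))
```

A harvesting node would repeat the request with each returned token until none is returned, then hand the collected metadata to the indexer.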

We will develop the software tools to: 

  • Adapt existing OAI-PMH harvesting (Arc) and Lucene indexing software to the grid

  • Deploy a cluster to do parallel, high-performance search based on the Lucene engine

  • Develop software support to move indices and metadata between low- and high-latency nodes
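The parallel search in the second item can be pictured as a scatter-gather step: the query goes to every search node, each node searches its own indices, and the partial results are merged into one ranked list. The Python sketch below illustrates that pattern under loud assumptions: `search_node` is a toy term-count scorer standing in for a Lucene search, the in-memory dictionaries stand in for indices, and none of the names come from the project.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_node(index, query):
    """Toy stand-in for one search node (not Lucene).

    `index` maps a record identifier to its title; the score is the
    number of times the query terms occur in the title.
    """
    terms = query.lower().split()
    hits = []
    for doc_id, title in index.items():
        words = title.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append((score, doc_id))
    return hits


def federated_search(indices, query, k=3):
    """Query all nodes in parallel and merge their hits into a top-k list.

    Assumes `indices` is non-empty, one entry per search node.
    """
    with ThreadPoolExecutor(max_workers=len(indices)) as pool:
        per_node = list(pool.map(lambda idx: search_node(idx, query), indices))
    merged = [hit for hits in per_node for hit in hits]
    return heapq.nlargest(k, merged)


if __name__ == "__main__":
    # Three toy indices, one per search node in the proposed testbed.
    indices = [
        {"a:1": "grid computing for digital libraries"},
        {"b:1": "digital libraries and digital archives"},
        {"c:1": "painting catalogue"},
    ]
    print(federated_search(indices, "digital libraries", k=2))
```

One design caveat: scores produced by independent indices generally depend on each index's own term statistics, so a real merge step would likely need some form of score normalization rather than the direct comparison used here.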