06/13/2007: Preliminary list of requirements proposed for the EFIGI webservice.
Rationale
Instead of developing a web service dedicated only to galaxy morphology measurements, we investigate the possibility of offering a simple and generic means of accessing several services related to astronomical image analysis and processing.
These services have in common that they are all directly related to executables that work on or generate FITS data files and XML-VOTable metadata files in batch mode. The executables have been developed and are maintained in house, which means that they can easily be modified if needed.
Job Handling
Because of the CPU- and/or I/O-intensive nature of such web services, a mechanism for distributing jobs over a cluster of machines is necessary. Some research done by Olivier Ricou led us to select Condor as the platform of choice for job distribution.
The sequencing of jobs and the dependencies between job executions lie outside the scope of this web service, i.e. they are the responsibility of the client.
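As an illustration of what the job-handling back end would do, here is a minimal Python sketch that hands a single batch job over to Condor. The submit-description keywords are standard Condor syntax, while the file names and layout are placeholders, not the actual interface of the service:

    import os
    import subprocess

    def submit_to_condor(executable, arguments, input_files, work_dir):
        # Write a Condor submit description and hand it to condor_submit.
        # Keywords are standard Condor submit-file syntax; paths and names
        # are placeholders only.
        submit_path = os.path.join(work_dir, "job.submit")
        with open(submit_path, "w") as f:
            f.write("universe                = vanilla\n")
            f.write("executable              = %s\n" % executable)
            f.write("arguments               = %s\n" % arguments)
            f.write("should_transfer_files   = YES\n")
            f.write("when_to_transfer_output = ON_EXIT\n")
            f.write("transfer_input_files    = %s\n" % ",".join(input_files))
            f.write("output = job.out\nerror = job.err\nlog = job.log\n")
            f.write("queue\n")
        # condor_submit returns 0 on success and prints the assigned cluster id
        return subprocess.call(["condor_submit", submit_path])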
Technology and conformance to standards
The astronomical community is currently involved in an international effort to normalise the format of metadata and web service protocols: the Virtual Observatory (VO). It would therefore be logical to design a system that follows the VO recommendations as closely as reasonably possible.
- The tools designed at TERAPIX provide support for VOTables in output, although it is not yet clear whether this standard will remain popular in the future.
- Concerning VO protocols, besides those limited to publication of data based on simple HTTP GET requests (e.g. Simple Image Access (SIA)), existing web services generally use SOAP.
More information on VO philosophy and mythology relevant for these developments can be found here or here.
The AstroGrid Common Execution Architecture is worth investigating. Based on Java code, it makes it possible to wrap "legacy" command-line applications so that they can be configured to run on demand in an application server. Wrapped applications can be called from desktop applications, from workflows, or from VO services; they are automatically registered in the IVOA registry. It should be checked that the AstroGrid system is not too constraining or inefficient for the job.
Although it is not a strict requirement, use of Python as a language for wrappers on the server side is recommended, because of the large number of Python-enabled libraries in the scientific community, especially astronomy.
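As a rough illustration of such a wrapper, the sketch below runs an in-house command-line executable in batch mode from Python; the tool name, options and file names are hypothetical and do not prescribe the actual interface:

    import subprocess

    def run_tool(tool, options, input_files):
        # Run one of the in-house command-line executables in batch mode and
        # return (exit status, stdout, stderr). 'tool' and 'options' are
        # placeholders, not the final interface of the service.
        cmd = [tool] + options + input_files
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()
        return proc.returncode, out, err

    # Hypothetical example: run SExtractor on one image with a given configuration
    # status, out, err = run_tool("sex", ["-c", "default.sex"], ["image.fits"])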
Specific requirements
For maximum flexibility and maintainability, it is suggested to offer the execution of a tool chosen from a restricted list of about a dozen executables (for security reasons). In addition, calls to the service should provide a command-line string, a list of files to be uploaded to the server (as Condor does), and the required amounts of free CPU, memory and disk space. It should be possible to request a specific version number x.y.z of the executable. The possibility of sending executables for testing purposes might be worth investigating (it should of course be restricted to local users). A possible shape for such a request is sketched below.
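For illustration, a job request could be captured by a structure like the following; all field names and values are only a suggestion, not a fixed interface:

    # Sketch of the information a call to the service would carry.
    # Field names and values are illustrative only.
    job_request = {
        "tool":         "sextractor",              # chosen from the authorized list
        "version":      "2.5.0",                   # requested executable version x.y.z
        "command_line": "-c default.sex image.fits",
        "input_files":  ["image.fits", "default.sex"],  # uploaded with the request
        "requirements": {
            "cpus":      1,
            "memory_mb": 1024,
            "disk_mb":   2048,
        },
    }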
One particular aspect of the web service is that it must be able to cope with a large number of big file uploads: typically thousands of files for a total of several terabytes. There are obviously some consequences to this:
- The protocol used for transferring data from the client to the web server, as well as the one used by Condor, must make optimal use of the available bandwidth (e.g. be able to saturate a gigabit Ethernet connection if possible). Some benchmarking might be required here.
- The transfer of such amounts of data can take a few hours even over gigabit Ethernet. Timeouts must be set accordingly, and status requests (see below) should be able to indicate what fraction of the data has been transferred so far, and even an estimated time of arrival (see the sketch after this list).
- Some tasks executed in sequence might actually share a common set of data. It should be investigated whether it is possible to "tag" some datasets and reuse the uploaded data across several task calls, for a limited amount of time and for some users.
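As a sketch of the kind of progress information a status request could return during an upload; the field names and the crude ETA estimate are illustrative only, assuming a roughly constant transfer rate:

    import time

    def upload_progress(bytes_done, bytes_total, start_time):
        # Return the fraction of data transferred and a crude ETA in seconds,
        # assuming a roughly constant transfer rate since start_time.
        fraction = float(bytes_done) / bytes_total if bytes_total else 0.0
        elapsed = time.time() - start_time
        rate = bytes_done / elapsed if elapsed > 0 else 0.0
        eta = (bytes_total - bytes_done) / rate if rate > 0 else None
        return {"fraction_done": fraction, "eta_seconds": eta}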
A mechanism for requesting the job status/info should be available. Initially, it could use some standard Condor/system status call. More individualized status procedures will be defined later.
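A minimal sketch of such a Condor-based status call is given below; it relies on the standard condor_q -format option and the usual JobStatus ClassAd codes, while the mapping to user-visible states is an assumption:

    import subprocess

    # Standard Condor JobStatus ClassAd values
    JOB_STATES = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}

    def condor_job_status(cluster_id):
        # Query the Condor queue for all jobs of one cluster and return
        # a list of (job id, state name) pairs.
        cmd = ["condor_q", str(cluster_id),
               "-format", "%d.", "ClusterId",
               "-format", "%d ", "ProcId",
               "-format", "%d\n", "JobStatus"]
        output = subprocess.check_output(cmd, universal_newlines=True)
        jobs = []
        for line in output.splitlines():
            job_id, status = line.rsplit(" ", 1)
            jobs.append((job_id, JOB_STATES.get(int(status), "Unknown")))
        return jobs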
It should be possible to abort and, if possible, freeze job execution.
A management mechanism for lists of users, or at least user levels must be implemented. For each user level it should be possible to define a maximum job priority, amount of disk space, number of running jobs and execution time.
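As a first cut, the per-level limits could be held in a simple table such as the following; the level names and numbers are placeholders to be discussed:

    # Per-user-level limits (illustrative values only).
    USER_LEVELS = {
        "guest":    {"max_priority":  0, "disk_gb":   10, "max_jobs":  2, "max_hours":   1},
        "standard": {"max_priority": 10, "disk_gb":  200, "max_jobs": 10, "max_hours":  24},
        "local":    {"max_priority": 20, "disk_gb": 2000, "max_jobs": 50, "max_hours": 168},
    }

    def check_quota(level, running_jobs, used_disk_gb):
        # Return True if a new job submission is allowed for this user level.
        limits = USER_LEVELS[level]
        return running_jobs < limits["max_jobs"] and used_disk_gb < limits["disk_gb"]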
The service should be able to deal with node and network failures as well as recoveries, and respond accordingly to status inquiries.
06/14/2007: Summary of the discussion
Present: Olivier, Yannick, Emmanuel, Frederic, Chiara
About the global architecture: Olivier has basically identified two main options for the service:
- go for the full monty, using grid toolkits and protocols: Globus and some of its components, especially RFT (Reliable File Transfer). RFT provides high reliability and security for long-distance transfers, as well as support for redundant data resources. Another useful component is WSRF, the Web Services Resource Framework. WSRF brings statefulness to web services, a feature essential to grid computing. Events can be managed across the grid using WSN (Web Services Notification).
Although the Globus approach offers many interesting features (in particular concerning data access), it is not clear whether it will be the most relevant for the system we are trying to build: the current specifications for VO web services are actually more basic, and TERAPIX is not yet interested in developing a full-fledged data-processing grid, at least not in the EFIGI framework.
Moreover, the additional layers present in the Globus toolkit may actually slow down local data transfers, something we cannot afford.
- build a simpler system around Condor, using Condor's built-in Web Service API (for a tutorial on web service programming, see Olivier's lecture).
It is decided to go for the second option.
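To give a rough idea of what a client-side call could look like with a SOAP toolkit, here is a sketch using the third-party suds library; the WSDL location and the submitJob operation are purely hypothetical and will depend on how the service (or Condor's own SOAP interface) is finally exposed:

    from suds.client import Client   # generic third-party Python SOAP library

    # Hypothetical WSDL location and operation name.
    client = Client("http://terapix.example.org/efigi/service?wsdl")
    job_id = client.service.submitJob("sextractor",
                                      "-c default.sex image.fits",
                                      ["image.fits", "default.sex"])
    print(job_id)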
The main architectural issue with the system is the handling of data I/O from the computing nodes. Since we want to drop the current design, in which each node only has access to a specific data subset, we have two options:
- find some fast and robust way to access transparently a common pool of data from all nodes. For budget reasons, our cluster is not built around a dedicated SAN, and must rely on Gigabit Ethernet interfaces for data transfers. But so far the NFS approach has not proven to be reliable enough for our data access pattern (especially image stacking).
- send data together with the processing jobs to the nodes from a centralized data pool.
The second option needs to be benchmarked first: how fast and how reliable is it for several terabytes? A deadline has been set for July 1st. When Olivier tried to make network throughput measurements from the 32-bit machines where Condor had been installed initially, he discovered that the RAID arrays were degraded and could not provide more than 16 MB/s in sequential reads. As the RAID systems on these machines are already old (6 years) and intrinsically slow anyway (60 MB/s write speed), it was decided to install the Condor test system on recent 64-bit systems with fast RAIDs. Benchmarking tests are currently ongoing.
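For reference, a very crude sequential-read benchmark of the kind used here can be written in a few lines of Python (the file path and block size are arbitrary):

    import time

    def sequential_read_mb_per_s(path, block_size=8 * 1024 * 1024):
        # Read a large file sequentially and return the throughput in MB/s.
        # Use a file much larger than the machine's RAM so that the page
        # cache does not bias the figure.
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                total += len(block)
        return total / (1024.0 * 1024.0) / (time.time() - start)

    # Example: print(sequential_read_mb_per_s("/data/raid/big_image.fits"))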