Our problem was the following: data processing at Terapix is dominated by single-precision floating-point arithmetic on large arrays. For most tasks, these large arrays are accessed almost sequentially (a few image lines are fetched at a time). Given that we can operate several processors in parallel, which processor combination gives us the best performance/price ratio?
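The near-sequential access pattern described above can be illustrated with a minimal Python sketch. It assumes a headerless, row-major file of native-endian single-precision pixels (real FITS images carry a header and store floats big-endian, which this sketch ignores), and the names `read_strips` and `line_means` are hypothetical, not taken from Terapix or SWarp code. Only one strip of a few lines is held in memory at any time, whatever the image size.

```python
import io
from array import array
from typing import BinaryIO, Iterator, List

# Size of one single-precision (C float) pixel, 4 bytes on common platforms.
BYTES_PER_PIXEL = array("f").itemsize


def read_strips(stream: BinaryIO, width: int, lines_per_strip: int) -> Iterator[array]:
    """Yield successive strips of up to `lines_per_strip` image lines, read
    sequentially from a raw, row-major stream of single-precision pixels.
    Hypothetical helper: assumes no file header and native byte order."""
    if width <= 0 or lines_per_strip <= 0:
        raise ValueError("width and lines_per_strip must be positive")
    line_bytes = width * BYTES_PER_PIXEL
    while True:
        chunk = stream.read(line_bytes * lines_per_strip)
        if not chunk:
            break  # end of image
        if len(chunk) % line_bytes:
            raise ValueError("truncated image line")
        strip = array("f")
        strip.frombytes(chunk)  # the last strip may hold fewer lines
        yield strip


def line_means(stream: BinaryIO, width: int, lines_per_strip: int = 4) -> List[float]:
    """Mean value of every image line, computed strip by strip."""
    means = []
    for strip in read_strips(stream, width, lines_per_strip):
        for start in range(0, len(strip), width):
            means.append(sum(strip[start:start + width]) / width)
    return means


if __name__ == "__main__":
    # Toy 3-pixel-wide, 5-line image whose line i holds the value i.
    pixels = array("f", [float(i) for i in range(5) for _ in range(3)])
    print(line_means(io.BytesIO(pixels.tobytes()), width=3, lines_per_strip=2))
    # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Reading whole lines at a time keeps the disk access sequential while bounding memory use to `lines_per_strip` lines.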
Comparison of CPUs available in Q4/2001
Based on the SPEC2000 benchmarks shown above, we decided to go for 1.53GHz AMD Athlons in dual-processor configuration (MP1800+), as these processors offer the best performance/price ratio. The Alpha processors we have been using so far perform better in double precision and have a wider addressable memory space than the Athlons (they are 64-bit processors), but they are also about 5 times more expensive. And what about INTEL?
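The performance/price comparison can be made explicit with a small sketch. It assumes that a box's throughput is the per-CPU benchmark score times the number of CPUs (perfect parallel scaling, which is optimistic when processors share memory bandwidth), divided by the price of the complete box. `Config`, `perf_per_price` and `rank` are hypothetical names, and the numbers in the usage example are placeholders, not actual SPEC2000 results or prices.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    name: str
    score_per_cpu: float  # benchmark score of one processor (e.g. a SPECfp-style figure)
    n_cpus: int           # processors per box
    box_price: float      # price of the complete box, in any single currency

    def throughput(self) -> float:
        # Assumes perfect parallel scaling: n processors do n times the work.
        return self.score_per_cpu * self.n_cpus

    def perf_per_price(self) -> float:
        return self.throughput() / self.box_price


def rank(configs: List[Config]) -> List[Config]:
    """Sort configurations from best to worst performance/price ratio."""
    return sorted(configs, key=lambda c: c.perf_per_price(), reverse=True)


if __name__ == "__main__":
    # Placeholder values only: a faster but pricier single-CPU box vs a cheap dual-CPU box.
    boxes = [Config("fast-single", 150.0, 1, 5000.0), Config("cheap-dual", 100.0, 2, 2000.0)]
    for box in rank(boxes):
        print(f"{box.name}: {box.perf_per_price():.3f} per currency unit")
```

Dividing by the price of the whole box, rather than of the CPU alone, accounts for the motherboard, memory and disks shared by the processors of a dual configuration.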
INTEL Pentium4 vs AMD Athlon XP1800+ for image resampling
We subsequently ran a performance test using SWarp version 1.34 on Athlon MP1800+ (dual-processor box), Athlon XP1800+, and INTEL Pentium4 1.8GHz processors. We also threw in an Alpha XP1000 and a quad-processor Alpha ES40 to check how these expensive pieces of hardware perform compared to our humble PCs. All machines had 512 MB of memory or more. The test was run on UDMA-IDE disks or on SCSI UW devices, and execution time was recorded using the UNIX time command. On the dual-processor Athlon MP1800+ and the quad-processor Alpha, SWarp was run in parallel mode to take advantage of the multiple CPUs. The test consisted of warping a 2k×4k CCD image using a 3rd-order polynomial correction through a tangential de-projection/re-projection. The results are shown below in Mpixel/s (higher is better):
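Throughput in Mpixel/s is presumably the number of image pixels divided by the elapsed time. A minimal sketch, assuming that "2k×4k" means 2048×4096 pixels and that 1 Mpixel is 10^6 pixels (neither is stated in the test description); `mpixel_per_s` is a hypothetical helper, not part of SWarp.

```python
def mpixel_per_s(width: int, height: int, elapsed_s: float) -> float:
    """Throughput in Mpixel/s, taking 1 Mpixel = 10**6 pixels (an assumption)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return width * height / elapsed_s / 1e6


if __name__ == "__main__":
    # Hypothetical elapsed ("real") time, as read from the output of the UNIX time command.
    real_time_s = 10.0
    print(f"{mpixel_per_s(2048, 4096, real_time_s):.2f} Mpixel/s")
```

For the multi-processor runs, the elapsed ("real") time is the relevant one: the "user" time reported by time adds up the CPU time of all threads and would hide the parallel speed-up.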
This graph calls for two remarks:
The INTEL Pentium4 performance is miserable given its clock speed (1.8GHz) and its expensive memory (dual PC800 RDRAM): it is totally smoked by the Athlon at 1.53GHz with DDR-SDRAM, which costs half as much. Ironically, in both cases SWarp was compiled with INTEL's own compiler with all optimization options turned on, including SSE/SSE2 vectorization (the Athlon supports SSE but not SSE2). We reproduced the same results on a second Pentium4 machine. We also checked that the Pentium4s were not overheating (which causes these processors to slow down automatically). Linux rates the Pentium4s at 3604 BogoMIPS, compared to 2667 for the Athlons, which seems to confirm that the Pentium4s are running at full clock speed.
For this kind of processing (dominated by single-precision arithmetic), the Athlons are 50% faster than 667MHz EV67 Alpha processors. Hence it seems unlikely that the newer 833MHz Alphas will outperform the newer 1.7GHz Athlons by a large margin.
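The extrapolation in the last remark can be made explicit. The sketch below assumes, optimistically for both processors, that performance scales linearly with clock frequency; `scaled_speed_ratio` is a hypothetical helper, and the inputs are the figures quoted above (a 1.5 speed ratio, 1.53 vs 1.7GHz for the Athlon, 667 vs 833MHz for the Alpha).

```python
def scaled_speed_ratio(ratio_now: float,
                       a_clock_now: float, a_clock_new: float,
                       b_clock_now: float, b_clock_new: float) -> float:
    """Project the speed ratio of processor A over processor B to new clock
    frequencies, assuming performance scales linearly with clock speed."""
    a_gain = a_clock_new / a_clock_now
    b_gain = b_clock_new / b_clock_now
    return ratio_now * a_gain / b_gain


if __name__ == "__main__":
    # A = Athlon (1.53 -> 1.7GHz), B = EV67 Alpha (667 -> 833MHz), clocks in MHz.
    ratio = scaled_speed_ratio(1.5, 1530.0, 1700.0, 667.0, 833.0)
    print(f"projected Athlon/Alpha speed ratio: {ratio:.2f}")  # 1.33
```

Under this assumption the Alpha's larger relative clock increase (about +25% vs about +11%) only shrinks the Athlon's lead from about 1.5 to about 1.33 times the Alpha's speed.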