High Performance Computing Competence Center
Vector Parallel Supercomputer NEC SX-6X at HLRS

The High Performance Computing Center Stuttgart (HLRS) and NEC have signed a contract to install a leading-edge supercomputing system in Stuttgart in 2004/2005. In an initial phase, a medium-sized SX-6 system is installed, which will later be upgraded to the full production system.

Basic Architecture

The supercomputer belongs to NEC's SX family and is based on NEC's well-known vector architecture. The basic building block is a vector processor with the following characteristics:
The theoretical peak performance of the CPU is slightly higher than 9 GFlop/s because of an additional scalar unit. This single-processor peak performance alone is outstanding. Eight of these vector processors are integrated into a shared-memory node. The main characteristics of such a node are as follows.
The outstanding feature is the extremely high memory bandwidth: for each operation, the system can load or store 4 bytes. This leads to an extremely high level of sustained performance, in the range of 60% of peak for a single processor. Sharing memory, however, comes at a cost: a full node operates at a sustained speed of around 50% of peak.

The nodes are interconnected by NEC's well-known IXS crossbar switch, which connects each node to the switch with a bandwidth of 8 GB/s. The MPI communication latency is less than 8 µs. The initial cluster will consist of 6 such SMP nodes, so that the initial installation will have the following parameters:
This interim system will be used for code porting and software development. Its sustained performance is expected to be about 200-220 GFlop/s, roughly three times that of the old HLRS systems.

The final system will have the same type of architecture. Performance figures for the individual processor will more than double. The final cluster will have 64 nodes with a total of 512 processors. This will result in a total peak performance of more than 11 TFlop/s and an expected sustained performance of between 3 and 4 TFlop/s.

Overall Architecture

The overall concept consists of the following key parts:
Delivery Schedule

The first part of the system was delivered in March 2004 and has been operational since April 2004. The final system will be installed in two phases, starting late in 2004 and scheduled to be finished by June 2005. The system will then be fully operational by mid-2005.
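
The performance figures quoted for the interim and final systems can be cross-checked with a short calculation. The sketch below is only an illustration: it assumes a rounded per-CPU peak of 9 GFlop/s (the text says "slightly higher than 9") and applies the roughly 50% full-node efficiency to the whole cluster, which ignores any inter-node communication cost. All function and variable names are hypothetical.

```python
# Figures taken from the text. CPU_PEAK_GFLOPS is rounded down:
# the text gives "slightly higher than 9 GFlop/s" per CPU.
CPU_PEAK_GFLOPS = 9.0
CPUS_PER_NODE = 8
NODE_EFFICIENCY = 0.50  # "around 50% of the peak speed" for a full node


def peak_gflops(nodes, cpus_per_node=CPUS_PER_NODE, cpu_peak=CPU_PEAK_GFLOPS):
    """Aggregate theoretical peak of a cluster in GFlop/s."""
    return nodes * cpus_per_node * cpu_peak


def sustained_gflops(nodes, efficiency=NODE_EFFICIENCY):
    """Rough sustained estimate: peak scaled by the full-node efficiency.

    Assumption: the per-node efficiency carries over to the cluster.
    """
    return peak_gflops(nodes) * efficiency


# Interim system: 6 nodes of 8 CPUs each.
interim_peak = peak_gflops(6)
interim_sustained = sustained_gflops(6)

# Final system: 64 nodes. A peak of more than 11 TFlop/s (11,000 GFlop/s)
# implies at least this much peak per CPU.
final_cpus = 64 * CPUS_PER_NODE
final_cpu_peak_min = 11_000 / final_cpus

print(f"interim peak:       {interim_peak:.0f} GFlop/s")
print(f"interim sustained:  {interim_sustained:.0f} GFlop/s")
print(f"final CPU count:    {final_cpus}")
print(f"final per-CPU peak: > {final_cpu_peak_min:.1f} GFlop/s")
```

Under these assumptions the interim estimate falls inside the quoted 200-220 GFlop/s range, and the implied per-CPU peak of the final system is more than twice the interim figure, consistent with the statement that processor performance will more than double.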