High Performance Computing Competence Center
Baden-Württemberg (hkz-bw)



 

 


 


Vector Parallel Supercomputer NEC SX-6X at HLRS

The High Performance Computing Center Stuttgart (HLRS) and NEC have signed a contract to install a leading-edge supercomputing system at Stuttgart in 2004/2005. In an initial phase, a medium-sized SX-6 system is being installed; it will later be upgraded to the full production system.

Basic Architecture

The supercomputer of the SX family is based on NEC's well-known vector architecture. The basic building block is a vector processor with the following characteristics:

Clock Rate               565 MHz
Parallel Vector Pipes    2*8
Processor Speed          9 GFlop/s

The theoretical peak performance of the CPU is slightly higher than 9 GFlop/s because of an additional scalar unit. Even for a single processor, this peak performance is outstanding.
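The relation between the clock rate, the number of vector pipes, and the quoted processor speed can be checked with a short calculation. This is only a sketch: it assumes that each vector pipe completes one floating-point operation per clock cycle, which the text does not state explicitly, and the function name is made up for illustration.

```python
# Sketch of the vector peak arithmetic.
# ASSUMPTION: each vector pipe completes one floating-point operation
# per clock cycle; this is not stated in the text.

CLOCK_HZ = 565e6        # clock rate from the table: 565 MHz
VECTOR_PIPES = 2 * 8    # parallel vector pipes from the table


def vector_peak_gflops(clock_hz: float, pipes: int) -> float:
    """Return the vector peak in GFlop/s under the assumption above."""
    return clock_hz * pipes / 1e9


peak = vector_peak_gflops(CLOCK_HZ, VECTOR_PIPES)
print(f"Vector peak: {peak:.2f} GFlop/s")
```

Under this assumption the product comes to about 9.04 GFlop/s, which is in line with the rounded table value of 9 GFlop/s. How the contribution of the scalar unit is counted is not specified in the text, so it is left out here.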

Eight of these vector processors are integrated into a shared memory system. The main characteristics of such a node are as follows:

Node Speed                  72 GFlop/s
Total Memory Speed          288 GB/s
Memory Speed / Processor    36 GB/s
Memory Size                 64 GB

The outstanding feature is the extremely high memory bandwidth: for each floating-point operation, the system can load or store 4 bytes (36 GB/s of memory bandwidth against 9 GFlop/s per processor). This leads to a very high sustained performance of around 60% of peak for a single processor. Sharing the memory among the processors of a node has a cost, however, so a full node sustains around 50% of its peak speed.
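The figures in this paragraph follow directly from the table values. The sketch below reproduces them; the variable names are chosen here for illustration, and the 60% and 50% fractions are the approximate values quoted in the text, not measurements.

```python
# Sketch: memory balance and sustained speed from the quoted figures.

PEAK_GFLOPS_CPU = 9.0     # processor speed from the table
MEM_BW_GBS_CPU = 36.0     # memory speed per processor from the table
PEAK_GFLOPS_NODE = 72.0   # node speed from the table

SUSTAINED_FRACTION_CPU = 0.60    # approximate value quoted in the text
SUSTAINED_FRACTION_NODE = 0.50   # approximate value quoted in the text

# Bytes that can be loaded or stored per floating-point operation.
bytes_per_flop = MEM_BW_GBS_CPU / PEAK_GFLOPS_CPU

# Sustained speed implied by the quoted fractions.
sustained_cpu = SUSTAINED_FRACTION_CPU * PEAK_GFLOPS_CPU
sustained_node = SUSTAINED_FRACTION_NODE * PEAK_GFLOPS_NODE

print(f"Memory balance:      {bytes_per_flop:.1f} bytes per flop")
print(f"Sustained, one CPU:  about {sustained_cpu:.1f} GFlop/s")
print(f"Sustained, one node: about {sustained_node:.1f} GFlop/s")
```

The quoted fractions thus correspond to roughly 5.4 GFlop/s for a single processor and 36 GFlop/s for a full node.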

These nodes are interconnected by NEC's well-known IXS crossbar switch, which connects each node with a bandwidth of 8 GB/s. The MPI communication latency is less than 8 µs.
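To get a feeling for what these two numbers mean for message passing, one can use the common linear cost model in which the transfer time is the latency plus the message size divided by the bandwidth. The sketch below is such an estimate and not a measurement. It assumes that the latency equals the quoted upper bound of 8 µs, that one message can use the full 8 GB/s of a node's link, and that 1 GB means 10^9 bytes. The function names are hypothetical.

```python
# Sketch of a linear latency/bandwidth cost model for one message.
# ASSUMPTIONS: latency taken at the quoted upper bound, full link
# bandwidth available to a single message, 1 GB = 1e9 bytes.

LATENCY_S = 8e-6             # quoted upper bound on MPI latency
BANDWIDTH_BYTES_PER_S = 8e9  # quoted node-to-switch bandwidth


def transfer_time(message_bytes: float,
                  latency_s: float = LATENCY_S,
                  bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Estimated time in seconds to transfer one message."""
    return latency_s + message_bytes / bandwidth


def half_performance_size(latency_s: float = LATENCY_S,
                          bandwidth: float = BANDWIDTH_BYTES_PER_S) -> float:
    """Message size in bytes at which half the link bandwidth is reached."""
    return latency_s * bandwidth


for size in (8, 64_000, 1_000_000):
    t = transfer_time(size)
    effective_gbs = size / t / 1e9
    print(f"{size:>9} bytes: {t * 1e6:8.2f} us, "
          f"{effective_gbs:5.2f} GB/s effective")
```

Under these assumptions, messages of about 64,000 bytes reach half of the link bandwidth; much smaller messages are dominated by latency.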

The initial cluster will consist of 6 such SMP nodes, giving the initial installation the following parameters:

Total Number of Processors    48
Total Peak Performance        434 GFlop/s
Total Main Memory             384 GB
Total Memory Bandwidth        1.7 TB/s
Total Communication Speed     48 GB/s
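These totals are the per-node figures multiplied by the number of nodes. The following sketch shows the arithmetic; the class and function names are illustrative only, and the per-node values are the rounded figures from the node table.

```python
# Sketch: cluster totals derived from the rounded per-node figures.
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Per-node figures as quoted for one eight-processor node."""
    processors: int = 8
    peak_gflops: float = 72.0
    memory_gb: int = 64
    memory_bw_gbs: float = 288.0
    link_bw_gbs: float = 8.0


def cluster_totals(node: Node, n_nodes: int) -> dict:
    """Scale the per-node figures to a cluster of n_nodes nodes."""
    return {
        "processors": node.processors * n_nodes,
        "peak_gflops": node.peak_gflops * n_nodes,
        "memory_gb": node.memory_gb * n_nodes,
        "memory_bw_tbs": node.memory_bw_gbs * n_nodes / 1000.0,
        "comm_bw_gbs": node.link_bw_gbs * n_nodes,
    }


totals = cluster_totals(Node(), n_nodes=6)
for name, value in totals.items():
    print(f"{name:>14}: {value}")
```

With the rounded node speed of 72 GFlop/s the peak comes to 432 GFlop/s. The quoted 434 GFlop/s is consistent with a per-processor peak slightly above 9 GFlop/s.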

This interim system will be used for code porting and software development. Its sustained performance is expected to be about 200-220 GFlop/s, roughly three times that of the old HLRS systems.

The final system will have the same type of architecture. Performance figures for the individual processor will more than double. The final cluster will have 64 nodes with a total of 512 processors, resulting in a total peak performance of more than 11 TFlop/s and an expected sustained performance of between 3 and 4 TFlop/s.
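The quoted peak and sustained figures can be related to each other with a few divisions. The sketch below does this for the interim and the final system. Since the final peak is only given as "more than 11 TFlop/s", the value 11 is used as a lower bound, so the per-processor peak shown is a lower bound and the sustained fractions are upper bounds. All names are chosen here for illustration.

```python
# Sketch: sustained fraction of peak and per-processor peak,
# based only on the figures quoted in the text.

INTERIM_PEAK_GFLOPS = 434.0
INTERIM_SUSTAINED_GFLOPS = (200.0, 220.0)   # expected range

FINAL_PEAK_TFLOPS = 11.0                    # lower bound ("more than 11")
FINAL_SUSTAINED_TFLOPS = (3.0, 4.0)         # expected range
FINAL_PROCESSORS = 64 * 8                   # 64 nodes, 512 processors

INTERIM_CPU_PEAK_GFLOPS = 9.0               # rounded value from the table


def fraction_range(sustained: tuple, peak: float) -> tuple:
    """Return the sustained range as fractions of the given peak."""
    low, high = sustained
    return (low / peak, high / peak)


interim_fraction = fraction_range(INTERIM_SUSTAINED_GFLOPS, INTERIM_PEAK_GFLOPS)
final_fraction = fraction_range(FINAL_SUSTAINED_TFLOPS, FINAL_PEAK_TFLOPS)

# Lower bound on the per-processor peak of the final system.
final_cpu_peak_gflops = FINAL_PEAK_TFLOPS * 1000.0 / FINAL_PROCESSORS
speedup_per_cpu = final_cpu_peak_gflops / INTERIM_CPU_PEAK_GFLOPS

print(f"Interim sustained fraction: {interim_fraction[0]:.0%} to {interim_fraction[1]:.0%}")
print(f"Final sustained fraction:   at most {final_fraction[0]:.0%} to {final_fraction[1]:.0%}")
print(f"Final per-processor peak:   at least {final_cpu_peak_gflops:.1f} GFlop/s "
      f"({speedup_per_cpu:.1f}x the interim processor)")
```

The per-processor result of at least about 21.5 GFlop/s agrees with the statement that the individual processor performance will more than double.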

Overall Architecture

The overall concept consists of the following key parts:

  • Compute Cluster: The cluster of shared-memory vector nodes that serves as the workhorse for simulation.
  • Pre-Processing Nodes: IA64-based 32-processor nodes with a large shared memory of about 0.5 TB.
  • Post-Processing Cluster: A cluster for post-processing will be integrated into the concept to move the compute load of visualization onto an appropriate platform while keeping the systems closely integrated.
  • Common File System: All three hardware architectures will be integrated through a common file system.

Figure: Architecture of the NEC SX system

Delivery Schedule

The first part of the system was delivered in March 2004 and has been operational since April 2004.

The final system will be installed in two phases, starting late in 2004 and scheduled to finish by June 2005. The system will then be fully operational by mid-2005.