The above graph gives a very approximate measure of the scalability of VASP as a function of the number of nodes, N. The test system was a 192-atom bulk Al cell, using 388 bands and 82,944 plane waves. K-point sampling was restricted to the Gamma point. These tests used the ScaLAPACK parallel linear algebra package. The wall-clock run time to achieve self-consistency (SCF) for the serial job was slightly less than 2 hours.

The Speed-up is defined as the serial time divided by the parallel time (which is a function of N). The Efficiency is the Speed-up divided by N.

These results should be considered only as a rough guide for choosing job sizes, and do not necessarily reflect the optimal performance of the code. (Your mileage may vary.) In particular, since the Origin uses a physically distributed but logically shared memory architecture (cc-NUMA), run times for identical jobs can vary depending on how a given data set is distributed across the nodes. Unfortunately, I haven't had the time to investigate this effect further. However, I have seen instances where run times differed by as much as 40%.
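The Speed-up and Efficiency definitions given above can be written out as a short calculation. The following is only a minimal sketch: the function names and the timings are hypothetical placeholders chosen for illustration, not values measured in these tests.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speed-up = serial wall-clock time / parallel wall-clock time."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("run times must be positive")
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, n_nodes: int) -> float:
    """Efficiency = Speed-up / N, where N is the number of nodes."""
    if n_nodes < 1:
        raise ValueError("number of nodes must be at least 1")
    return speedup(serial_time, parallel_time) / n_nodes


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds (NOT measured values).
    serial = 7000.0    # assumed serial run time
    parallel = 1000.0  # assumed run time on N nodes
    n = 8              # assumed number of nodes

    print(f"Speed-up:   {speedup(serial, parallel):.2f}")
    print(f"Efficiency: {efficiency(serial, parallel, n):.3f}")
```

An Efficiency close to 1 means the extra nodes are being used well; values well below 1 suggest that a smaller job size may be the better choice. Given the run-to-run variation noted above, any such number is best treated as approximate.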
Comparison of VASP Performance: Alliance RoadRunner Linux Cluster vs. NCSA Origin2000
The table below compares the run times (in seconds) obtained for the 192-atom Al cell (also used above) on the Origin2000 and the RoadRunner Linux Cluster for three different job sizes. The percentages by which the RoadRunner run times exceeded those of the corresponding Origin jobs are given in parentheses.
Notes: