High-performance computers make the parallel modeling codes of NCAR and STAR possible. NCAR has a rich history of high-performance computing dating back to the 1960s, and STAR has leveraged that experience in designing a multi-node, scalable research cluster with a low-latency Myrinet interconnect. Market forces and the physical limits of microprocessors constantly push the computer industry toward new paradigms. The latest of these, the multi-core microprocessor, has brought cluster-level parallel computation to commodity servers, desktops and laptops. Processors from the major chip makers now contain multiple homogeneous cores, each of which can perform independent tasks much like an independent node in a cluster.
IBM has taken the paradigm a step further with its Cell Broadband Engine, which features heterogeneous cores: one core for organizational work and eight cores for vectorized computational work. Additionally, streaming Graphics Processing Units (GPUs), each with hundreds of streaming compute cores, have become a popular and affordable way of giving certain highly parallel algorithms speedups of an order of magnitude or more relative to CPU-based implementations.
STAR is dedicated to understanding the compilers, libraries, APIs and best practices that make this hardware accessible to its software engineers and scientists. A development effort is currently underway at STAR to implement an accelerated particle dispersion model that runs on NVIDIA GPUs through the CUDA toolkit. Looking forward, STAR is particularly interested in the OpenCL standard (finalized in December 2008), which defines a common API for programming multi-core CPUs and GPUs, with the goal of letting OpenCL applications run efficiently across multiple computing and cluster platforms with minimal developer effort.
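
Particle dispersion suits this kind of hardware because each particle can be advanced independently of the others. The sketch below is only an illustration of that structure, not STAR's model: the function names, the uniform wind, the constant diffusivity and the random-walk update are all assumptions made for the example. It is written in Python for readability, so the thread pool shows how the work divides rather than delivering a real speedup; in a CUDA or OpenCL version the body of `step_particle` would correspond roughly to what one GPU thread or work-item does.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def step_particle(index, position, wind, diffusivity, dt, seed):
    """Advance one particle by one time step (hypothetical update rule).

    Assumed physics: advection by a uniform wind plus a Gaussian random
    walk with standard deviation sqrt(2 * K * dt) per axis.
    """
    # Each particle gets its own random stream, so the result does not
    # depend on which worker processes it or in what order.
    rng = random.Random(seed * 1_000_003 + index)
    sigma = math.sqrt(2.0 * diffusivity * dt)
    return tuple(
        p + w * dt + sigma * rng.gauss(0.0, 1.0)
        for p, w in zip(position, wind)
    )


def step_all(positions, wind, diffusivity, dt, seed, workers=4):
    """Advance every particle; no particle reads another particle's state."""
    def work(item):
        index, position = item
        return step_particle(index, position, wind, diffusivity, dt, seed)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so output i matches particle i.
        return list(pool.map(work, enumerate(positions)))
```

Because no particle depends on another, the same update can be spread across cluster nodes, CPU cores or GPU threads without changing the result, which is the property an accelerated implementation relies on.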