Abstract:
In an age of ever-expanding data sets, there is an increasing demand for high-performance computing (HPC) in the scientific community. To maximize the performance of existing hardware, researchers have been looking for innovative and cost-effective ways to expand and/or optimize their clusters. This thesis first provides a brief summary of the traditional methods currently in use in institutional clusters, followed by some of the new services being offered to help researchers expand their computational resources. Most importantly, it reviews three recent papers that propose new ways to optimize current hardware to increase the total job throughput and utilization of a cluster. The proposed methods take advantage of dynamic allocation, both at the application level and at the system level. The first paper (Nathaniel Kremer Herman and Thain 2018) finds that it is possible to dynamically size master-worker applications, freeing up significant resources for other applications to run on the system. The work done in (Feng Liu 2018) offers a method to integrate on-demand HPC requests into a traditional batch system, which yielded a large boost in system throughput and a reduction in batch wait time. The final paper (Suraj Prabhakaran 2015) looks into the potential of a new type of malleable job that can be run on HPC systems. Integrating these jobs into the batch scheduler increased total system throughput and lays a foundation for potential future workflows. This review is followed by the methods and results of a comparison between parallel jobs implemented in the Chapel language and in MPI. In my experience, Chapel was much easier to learn and to implement; however, when scaled, it could not match the speed-up of MPI.