The use of clusters of independent compute nodes as high capability and capacity computers is rapidly growing in industry, academia, and government. This growth is accompanied by fast-paced progress in cluster-aware hardware, and in particular in interconnection technology. Contemporary networks offer not only excellent performance as expressed by latency and bandwidth, but also advanced architectural features, such as programmable network interface cards, hardware support for collective communication operations, and support for modern communication protocols such as MPI and RDMA. The rapid progress in cluster hardware and usage is unfortunately not matched by similar progress in system software. This software consists of the middleware: the operating system, user libraries, and utilities that interface between the hardware and the user applications, allowing them to make use of the machine's resources. In fact, most of these clusters use common workstation operating systems such as Linux running on each of the cluster's nodes, with a collection of loosely-related libraries, utilities, and scripts to access the cluster's resources. Such solutions are hardly adequate for large-scale clusters and/or high-performance computing applications. The problems they cause include (but are not limited to): (1) poor performance and scalability of applications and system software; (2) reduced utilization of the machine due to suboptimal resource allocation; (3) reliability problems caused by the multitude of independent software modules, and the redundancy in their operation, and (4) difficulty in operating and making full use of these machines. The premise behind this dissertation is that system software can be dramatically improved in terms of performance, scalability, reliability, and simplicity by making use of the features offered by modern interconnects. Unlike single-node operating systems, most of a cluster's system software tasks involve efficient global synchronization of resources. As such, parallel system software can be designed to benefit from the novel hardware features offered by contemporary interconnection technology. This dissertation promotes the idea of treating a cluster's operating system as any other high-performance parallel application, and increasing its reliance on synchronization abilities while reducing its per-node complexity and redundancy. This dissertation makes the following primary contributions. First, a set of necessary network mechanisms to support this system software model is described. A prototype implementation of system software based on these mechanisms is then discussed. This system currently tackles three main aspects of parallel computers: resource management, communication libraries, and job scheduling methods. This model was implemented on three different cluster architectures. Extensive performance and scalability evaluations with real clusters and applications show significant improvements over previous work in all three areas. In particular, this research focuses primarily on job scheduling strategies, and demonstrates that through advanced algorithms, the system's throughput and responsiveness can be improved over a wide spectrum of workloads.