In this thesis a novel technique is introduced for job scheduling
in clusters and supercomputers with the goal of increasing the efficiency
and utilization of these machines. In particular, the problems arising
from heterogeneous architecture clusters and software load imbalances
are addressed. The suggested technique is a variation on gang scheduling
and other coscheduling methods, where several parallel jobs time-share
and space-share the same machine, using varying degrees of coordination
among processes.

The main idea behind this thesis is that a distributed/parallel scheduling
system can gather dynamic information on the synchronization behavior
of processes, and use this information to identify their different
coscheduling needs. Using this information, a scheduler can make better
scheduling decisions, to increase the overall system utilization and
decrease the runtime of applications in a multiprogramming environment.

The contribution of this thesis is threefold: (1) addressing the problems
that heterogeneous architectures and load imbalances pose to coscheduling
systems; (2) a methodological system of gathering job communication
information and subsequent process classification for the making of
better scheduling choices; and (3) experimental results that verify
the usefulness of applying dynamic communication statistics to scheduling
decisions. In addition, this work includes the implementation of an
efficient and flexible scheduler, with the ability to use many of
the scheduling algorithms found in the literature.

The main result of this thesis is the design and development of a
new approach to the identification of different process scheduling
requirements and their scheduling according to these requirements.
This approach is shown to be both feasible and performance-wise promising,
and may also prove to be useful when integrated with other approaches.
Another accomplishment of this work is the development of an extensive
scheduler system that is both very efficient and flexible, and allows
for testing real application behavior on real clusters, measuring
real scheduling issues.

This work was done partly at the parallel systems laboratory of the
Hebrew university in Jerusalem partly at the Modeling, Algorithms
and Informatics group of the Computer and Computational Sciences division
(CCS-3) of the Los Alamos national laboratory.