In this thesis a novel technique is introduced for job scheduling in clusters and supercomputers with the goal of increasing the efficiency and utilization of these machines. In particular, the problems arising from heterogeneous architecture clusters and software load imbalances are addressed. The suggested technique is a variation on gang scheduling and other coscheduling methods, where several parallel jobs time-share and space-share the same machine, using varying degrees of coordination among processes. The main idea behind this thesis is that a distributed/parallel scheduling system can gather dynamic information on the synchronization behavior of processes, and use this information to identify their different coscheduling needs. Using this information, a scheduler can make better scheduling decisions, to increase the overall system utilization and decrease the runtime of applications in a multiprogramming environment. The contribution of this thesis is threefold: (1) addressing the problems that heterogeneous architectures and load imbalances pose to coscheduling systems; (2) a methodological system of gathering job communication information and subsequent process classification for the making of better scheduling choices; and (3) experimental results that verify the usefulness of applying dynamic communication statistics to scheduling decisions. In addition, this work includes the implementation of an efficient and flexible scheduler, with the ability to use many of the scheduling algorithms found in the literature. The main result of this thesis is the design and development of a new approach to the identification of different process scheduling requirements and their scheduling according to these requirements. This approach is shown to be both feasible and performance-wise promising, and may also prove to be useful when integrated with other approaches. Another accomplishment of this work is the development of an extensive scheduler system that is both very efficient and flexible, and allows for testing real application behavior on real clusters, measuring real scheduling issues. This work was done partly at the parallel systems laboratory of the Hebrew university in Jerusalem partly at the Modeling, Algorithms and Informatics group of the Computer and Computational Sciences division (CCS-3) of the Los Alamos national laboratory.