Evaluating parallel job scheduling algorithms is a challenging task. Many factors affect the outcome of the evaluation, including workload and application choices, metrics, choice of scheduling algorithms and their parameters, and the hardware used or assumed. The large, non-continuous parameter space renders analytical evaluation extremely difficult, while simulation evaluations are very sensitive to the assumptions undertaken, sometimes resulting in contradicting results. Experimental evaluations are even rarer, due to the complexity of implementation, and the difficulty in obtaining a dedicated large machine for long periods of time. This talk will describe our efforts to evaluate various scheduling strategies in a dynamic, realistic environment. Of the various parameters that affect job scheduling performance, workload and implementation play a pivotal role. Most studies either employ simulations and/or simplistic workloads, which contain many assumptions, including unknown ones. Instead, we developed a scheduling framework that implements several existing and novel algorithms on various cluster architectures of up to hundreds of nodes. This framework was used to produce the first experimental evaluation of several job scheduling strategies in a dynamic workload environment, using synthetic and scientific MPI applications. This talk will discuss the challenges involved in evaluating job scheduling strategies, and the approaches we chose to address them. An analysis will be presented of three factors affecting scheduling systems running dynamic workloads: multiprogramming level, time quantum, and the use of backfilling for queue management -- and how they depend on offered load. Joint work with Dror Feitelson (Hebrew U.), Fabrizio Petrini (LANL), and Juan Fernandez (Murcia U.)