
2.0 Code Generation with Ptolemy

2.3 Schedulers


Given a Universe of functional blocks to be scheduled and a Target describing the topology and characteristics of the single- or multiple-processor system for which code is generated, it is the responsibility of the Scheduler object to perform some or all of the following functions:

Not all schedulers perform all these functions (for example, we permit manual assignments of actors to processors if that is desired).

A key idea in Ptolemy is that no single parallel scheduler is expected to handle all situations. Users can write schedulers and can use them in conjunction with schedulers we have written. As with the rest of Ptolemy, schedulers are written following object-oriented design principles. Thus a user never has to write a scheduler from the ground up; in fact, the user is free to derive a new scheduler from even our most advanced schedulers. We have designed a suite of specialized schedulers that can be mixed and matched for specific applications. After scheduling is performed, each processing element is assigned a set of blocks to be executed in a scheduler-determined order.

2.3.1 Single-processor schedulers

For targets consisting of a single processor, we provide two basic scheduling techniques. In the first approach, we simulate the execution of the graph on a dynamic dataflow scheduler and record the order in which the actors fire. To generate a periodic schedule, we first compute the number of firings of each actor in one iteration of the execution, which determines the number of appearances of the actor in the final scheduled list. An actor is called runnable when all input samples are available on its input arcs. If more than one actor is runnable at the same time, the scheduler chooses one according to some criterion; the simplest strategy is to choose one at random. There are many possible schedules for all but the most trivial graphs; the schedule chosen takes resource costs into account, such as the necessity of flushing registers and the amount of buffering required (see [8] for a detailed discussion of SDF scheduling). The Target then generates code by executing the actors in the sequence determined by the schedule.

We call the second approach "loop scheduling". In this approach, actors that have the same sample rate are merged (wherever this will not cause deadlock), and loops are introduced to match the sample rates. The result is a hierarchical clustering; within each cluster, the techniques described above can be used to generate a schedule. The generated code then contains nested loop constructs together with sequences of code from the actors. The loop scheduling techniques used in Ptolemy are described in [9]; the generalization of loop scheduling to include dynamic actors is discussed in [19].
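The first approach can be sketched as follows. This is our own illustration, not Ptolemy's C++ implementation: `repetitions` solves the SDF balance equations to find the number of firings of each actor per iteration, and `sdf_schedule` simulates execution, firing runnable actors until each has met its repetition count (choosing the first runnable actor in a fixed order rather than at random, for reproducibility). The graph representation — arcs as (source, destination, tokens produced, tokens consumed) — is an assumption of this sketch, and initial tokens (delays) are omitted for brevity.

```python
from fractions import Fraction
from functools import reduce
from math import lcm

def repetitions(arcs, actors):
    """Solve the SDF balance equations: for each arc (src, dst, p, c),
    r[src] * p == r[dst] * c.  Returns the smallest integer repetitions
    vector, i.e. the number of firings of each actor per iteration."""
    r = {a: None for a in actors}
    r[actors[0]] = Fraction(1)
    changed = True
    while changed:                      # propagate rates along arcs
        changed = False
        for src, dst, p, c in arcs:
            if r[src] is not None and r[dst] is None:
                r[dst] = r[src] * p / c
                changed = True
            elif r[dst] is not None and r[src] is None:
                r[src] = r[dst] * c / p
                changed = True
    # scale the rational solution to the smallest integer vector
    denom = reduce(lcm, (f.denominator for f in r.values()))
    return {a: int(f * denom) for a, f in r.items()}

def sdf_schedule(arcs, actors):
    """Simulate one iteration: repeatedly fire a runnable actor (one with
    enough tokens on every input arc) until every actor has fired the
    number of times given by the repetitions vector."""
    reps = repetitions(arcs, actors)
    tokens = {(s, d): 0 for s, d, p, c in arcs}   # one arc per (src, dst) pair assumed
    fired = {a: 0 for a in actors}
    schedule = []
    while any(fired[a] < reps[a] for a in actors):
        for a in actors:
            if fired[a] >= reps[a]:
                continue
            if all(tokens[(s, d)] >= c for s, d, p, c in arcs if d == a):
                for s, d, p, c in arcs:
                    if d == a: tokens[(s, d)] -= c
                    if s == a: tokens[(s, d)] += p
                fired[a] += 1
                schedule.append(a)
                break
        else:
            raise RuntimeError("deadlock: no runnable actor")
    return schedule
```

For a chain A -> B -> C where A produces 2 tokens consumed singly by B, and C consumes 3 of B's output tokens, the balance equations give the repetitions vector {A: 3, B: 6, C: 2}, and the simulation yields an 11-firing periodic schedule.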

2.3.2 Parallel scheduling

We have implemented three scheduling techniques that map SDF graphs onto multiple processors with various interconnection topologies: Hu's level-based list scheduling, Sih's dynamic level scheduling [17], and Sih's declustering scheduling [18]. The target architecture is described by its Target object, a kind of MultiTarget. The Target class provides the scheduler with the necessary information on interprocessor communication to enable both scheduling and code synthesis. Targets also have parameters that allow the user to select the type of schedule and (where appropriate) to experiment with the effect of altering the topology or the communication costs.

The scheduling techniques implemented in Ptolemy are retargetable in that they do not assume any particular interconnection topology. When a scheduler introduces a requirement for interprocessor communication, it instructs the target object to schedule the hardware resources for that communication and to compute its cost. The scheduler uses this information to decide whether the incurred communication cost is low enough to merit exploiting the parallelism. For example, in the OMA target [24], which has a shared-bus, shared-memory architecture, the requests for the shared bus from all processing elements are determined at compile time. Taking advantage of this compile-time communication information, we can reduce run-time communication costs.
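The flavor of level-based list scheduling with communication costs can be illustrated with the following sketch. It is a toy model of our own, not Ptolemy's scheduler: tasks carry execution times, priorities are Hu-style static levels (longest chain of execution times to a sink), and each ready task is greedily placed on the processor that finishes it earliest, charging a fixed cost whenever a predecessor ran on a different processor. In Ptolemy the communication cost would instead be computed by the target object, which may depend on the topology and on which resources are already scheduled.

```python
def levels(tasks, succ):
    """Static level of each task: its execution time plus the longest
    chain of execution times to any sink (Hu-style priority)."""
    lvl = {}
    def level_of(t):
        if t not in lvl:
            lvl[t] = tasks[t] + max((level_of(s) for s in succ.get(t, [])), default=0.0)
        return lvl[t]
    for t in tasks:
        level_of(t)
    return lvl

def list_schedule(tasks, succ, nprocs, comm_cost):
    """Greedy list scheduler: repeatedly take the ready task with the
    highest level and place it on the processor that finishes it earliest,
    charging comm_cost when a predecessor ran on a different processor.
    Returns {task: (processor, finish_time)}."""
    pred = {t: [] for t in tasks}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
    lvl = levels(tasks, succ)
    free = [0.0] * nprocs          # time at which each processor becomes free
    placed = {}
    unscheduled = list(tasks)
    while unscheduled:
        ready = [t for t in unscheduled if all(p in placed for p in pred[t])]
        t = max(ready, key=lambda x: lvl[x])
        best = None
        for p in range(nprocs):
            # earliest start on p: processor free, and all inputs arrived
            start = free[p]
            for q in pred[t]:
                qp, qf = placed[q]
                start = max(start, qf + (comm_cost if qp != p else 0.0))
            finish = start + tasks[t]
            if best is None or finish < best[1]:
                best = (p, finish)
        p, finish = best
        free[p] = finish
        placed[t] = (p, finish)
        unscheduled.remove(t)
    return placed
```

On a diamond-shaped graph (A feeds B and C, which both feed D) with two processors and a unit communication cost, the scheduler places B and C on different processors, accepting one unit of communication to overlap their execution.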

The multiple-processor scheduler produces a list of single-processor schedules, copying them to the child targets. Given these single-processor schedules, the same schemes discussed above are reused to generate the code for each child processor target. Currently, we are targeting the Sproc from Star Semiconductor, the CM5 from Thinking Machines, the DSP-3 from AT&T, and various parallel machines using the Motorola 56000 and 96000 DSPs.

2.3.3 Extending beyond the SDF model: dynamic constructs

We have applied the idea of mixed-domain scheduling to support dynamic constructs in code generation. Here a dynamic construct is a data-dependent construct such as if-then-else, do-while, or recursion. Ptolemy defines a new domain, called CGDDF, for these dynamic constructs. By placing this new domain inside an "SDF Wormhole", the whole application can be scheduled statically. The dynamic constructs inside the SDF Wormhole may cause the run-time execution profile to differ from the scheduled one. We have developed a technique that schedules dataflow graphs with run-time decision making, aiming to minimize the cost of run-time decisions [14]. We will define a specific Target class for this dynamic construct domain and generate control code for the dynamic constructs that is appropriate to the target architecture.
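The core trade-off — choosing a static execution profile for a construct whose actual behavior is decided at run time — can be illustrated with a deliberately simplified model. This is a toy of our own, not the technique of [14]: for an if-then-else with known branch probabilities, the static schedule reserves time for one branch's profile, and an iteration that takes the other branch is assumed to cost the larger of the two branch times. The cost model and function names are assumptions for illustration only.

```python
def expected_cost(p_true, cost_true, cost_false, profile):
    """Expected per-iteration cost of an if-then-else construct when the
    static schedule reserves time for `profile` ("true" or "false") and a
    mismatched iteration costs the larger of the two branch times.
    A toy model, not Ptolemy's actual CGDDF scheduling algorithm."""
    reserved = cost_true if profile == "true" else cost_false
    other = cost_false if profile == "true" else cost_true
    p_match = p_true if profile == "true" else 1.0 - p_true
    # matched iterations fit the reserved slot; mismatches pay the overrun
    return p_match * reserved + (1.0 - p_match) * max(reserved, other)

def best_profile(p_true, cost_true, cost_false):
    """Choose the static profile that minimizes expected iteration cost."""
    return min(("true", "false"),
               key=lambda pr: expected_cost(p_true, cost_true, cost_false, pr))
```

For instance, if the true branch costs 5, the false branch costs 10, and the true branch is taken 90% of the time, scheduling for the true branch's profile gives a lower expected cost than reserving the full false-branch slot every iteration.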


Software Synthesis for DSP Using Ptolemy - 04 SEP 94
