Automatic Code Generation for Heterogeneous Multiprocessors

Introduction

Dataflow is a natural representation for signal processing algorithms. One of its strengths is that it exposes parallelism by only expressing the actual data dependencies that exist in an algorithm. Applications are specified by a dataflow graph where the nodes represent computational actors, and data tokens flow between them along the arcs of the graph. Ptolemy [1] is a framework that supports dataflow programming as well as other computational models (such as discrete event), in which program graphs have different semantics.

Code generation consists of two phases, scheduling and synthesis. In the scheduling phase, the dataflow graph is partitioned for parallel execution. We splice send and receive actors into the graph for interprocessor communication. These actors do the synchronization necessary for a self-timed implementation [2]. For each target processor, an ordering of actor invocations is determined. In the synthesis phase, the code segments associated with each actor are stitched together, following the ordering specified by the scheduler. Commercial systems which use this "threading" technique include Comdisco's DPC [3] and CADIS's Descartes [4]. The techniques we describe here are complementary to those in DPC and Descartes, and could, in principle, be used in combination with them. In particular, we focus on management of data passed between actors when synchronous dataflow is used. DPC, by contrast, does not use dataflow semantics.

There are several forms of dataflow defined in Ptolemy. In synchronous dataflow (SDF) [5], the number of tokens produced or consumed in one firing of an actor is constant. This property makes it possible to determine execution order and memory requirements at compile time. Thus these systems do not have the overhead of run-time scheduling (in contrast to dynamic dataflow) and have very predictable run-time behavior. Some of the many SDF scheduling algorithms implemented in Ptolemy include: uniprocessor list scheduling [5], uniprocessor loop scheduling [6], Hu-level multiprocessor scheduling, and Sih's declustering multiprocessor scheduling [7].

Although SDF is sufficient to describe many signal processing algorithms, it cannot express data dependent control flow. A dataflow model which does allow this is boolean dataflow (BDF). This model subsumes SDF with two additional actors: switch and select. These actors allow the construction of data-dependent control flow structures such as do-while and if-then-else. Buck [8] shows how most of these graphs can be scheduled with bounded memory. Such graphs allow some dynamic dataflow expression with minimal run time overhead. Unfortunately, at the present time this scheduling works only for uniprocessor systems.

In this paper, we present three IPC interfaces that are constructed automatically from send and receive actors. The first interface uses send and receive actors to implement the IPC specified by an SDF parallel scheduler. In this configuration, we generate a stand-alone application where the entire dataflow graph is scheduled as a whole. The second interface, known as the CG Wormhole, also generates a stand-alone application. However, this interface isolates subsystems of the graph, using distinct schedulers on each side of the interface. CGWormholes are similar to the Ptolemy Wormhole construct, which is defined to be the interface between two different models of computation. CGWormholes, on the other hand, interface subgraphs which typically obey the same model of computation but are executed on different processors. The third type of interface is a Wormhole between code generation and simulation. We call this type of interface a CG-Sim-Wormhole. One of its primary uses is to embed actual hardware in high level simulations. The top-level simulation can any of the computation models available in Ptolemy, such as dynamic dataflow, discrete-event, or communicating processes.

We illustrate the use of these interfaces with a heterogeneous target that consists of a workstation in combination with several digital signal processor (DSP) cards: Ariel S-56X cards installed on the SBus of a SPARC workstation. Each DSP card has a single Motorola 56001 digital signal processor, a Xilinx programmable logic cell array, serial ports, and a DMA port. In this target, the workstation serves as the interface between the DSP cards and the user, the network, and other resources.

Automatic Code Generation for Heterogeneous Multiprocessors

[Next] [Previous] [Top]