Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors
To schedule SDF graphs onto multiple processors, a directed acyclic graph (DAG) is constructed from the original SDF graph. The SDF graph exposes the functional parallelism in the algorithm; the DAG additionally exposes the available data parallelism. The DAG for the graph of figure 1 is shown in figure 2. Notice that for each actor in the original SDF graph, there are multiple nodes in the DAG, corresponding to the repetition count derived from the balance equations.
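To make the repetition counts concrete, the following sketch solves the balance equations for a small, assumed SDF graph (the actual topology of figure 1 is not reproduced here): for every edge, the source's repetition count times its production rate must equal the destination's repetition count times its consumption rate.

```python
from fractions import Fraction
from math import lcm

# Hypothetical SDF graph: each edge is (src, dst, tokens_produced, tokens_consumed).
# This 3-actor chain is an assumed example, not the graph of figure 1.
edges = [("A", "B", 2, 3), ("B", "C", 1, 2)]

def repetitions(edges):
    """Solve the balance equations r[src]*produced == r[dst]*consumed
    for the smallest positive integer repetition vector (assumes a
    connected, consistent graph)."""
    r = {edges[0][0]: Fraction(1)}  # arbitrary reference actor
    changed = True
    while changed:
        changed = False
        for src, dst, p, c in edges:
            if src in r and dst not in r:
                r[dst] = r[src] * p / c
                changed = True
            elif dst in r and src not in r:
                r[src] = r[dst] * c / p
                changed = True
    # Scale the rational solution to the smallest integer vector.
    scale = lcm(*(f.denominator for f in r.values()))
    return {a: int(f * scale) for a, f in r.items()}

print(repetitions(edges))  # {'A': 3, 'B': 2, 'C': 1}
```

With these counts, actor A appears three times in the DAG, B twice, and C once, which is exactly the expansion described above.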
Unfortunately, the expansion due to the repetition count of each SDF actor can lead to an exponential growth in the number of nodes in the DAG. An SDF graph that exhibits this growth is shown in figure 3; the number of nodes in its DAG is O(M^N). This growth is undesirable, especially considering that known optimal multiprocessor scheduling algorithms under precedence constraints have complexity that is exponential in the number of nodes in the DAG [8]. Most uniprocessor SDF schedulers, on the other hand, do not require a DAG to be generated for scheduling purposes.
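The blow-up can be seen in a short calculation. Assuming (as an illustrative stand-in for figure 3) a chain of N actors in which every edge produces M tokens and consumes one, each actor must fire M times as often as its predecessor, so the repetition counts grow geometrically:

```python
# Chain of N actors where every edge produces M tokens and consumes 1:
# actor k must fire M**k times per iteration, so the DAG holds
# sum_{k=0}^{N-1} M**k = (M**N - 1) / (M - 1) nodes, i.e. O(M^N).
def dag_node_count(M, N):
    return sum(M**k for k in range(N))

print(dag_node_count(2, 10))  # 1023 DAG nodes from only 10 SDF actors
```

Even for modest rates, a handful of SDF actors can expand into thousands of DAG nodes, which motivates the clustering approach below.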
To limit this explosion of nodes when translating an SDF graph into a DAG, we introduce clustering of connected subgraphs into larger-grain "composite" actors. Each composite actor is then scheduled internally with one of the available uniprocessor schedulers. We cluster the actors in a manner that simplifies the DAG without hiding much of the parallelism.
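A minimal sketch of the idea, using an assumed three-actor chain A → B → C with repetition counts {A: 3, B: 2, C: 1}: clustering a connected subgraph replaces its actors by one composite actor whose repetition count is the greatest common divisor of the members' counts, and the members' firings become an internal uniprocessor schedule of the composite.

```python
from math import gcd
from functools import reduce

def cluster(reps, subgraph, name):
    """Replace the actors in `subgraph` by a single composite actor whose
    repetition count is the gcd of the clustered actors' counts; the
    remaining firings are absorbed into the composite's internal schedule."""
    composite = reduce(gcd, (reps[a] for a in subgraph))
    out = {a: r for a, r in reps.items() if a not in subgraph}
    out[name] = composite
    return out

# Assumed example: clustering {B, C} of a chain with counts {A: 3, B: 2, C: 1}.
# The composite fires gcd(2, 1) = 1 time per iteration and internally
# executes 2 firings of B followed by 1 firing of C.
reps = {"A": 3, "B": 2, "C": 1}
print(sum(reps.values()))                             # 6 DAG nodes before
print(sum(cluster(reps, {"B", "C"}, "BC").values()))  # 4 DAG nodes after
```

The DAG shrinks because only the composite's (smaller) repetition count is expanded; the firings hidden inside the composite no longer appear as separate DAG nodes.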
Many multiprocessor schedulers use clustering heuristics to map the DAG onto multiple processors [9-12]; note that each resultant cluster is mapped onto a single processor. The purpose of this paper is to introduce some simple clustering techniques that can be applied directly to the SDF graph, thereby reducing the number of SDF nodes that are expanded in the final DAG. Clustering the SDF graph also gives us the opportunity to use specialized uniprocessor SDF schedulers, which can optimize for such parameters as code size and memory usage [13].