15.5 UniProcessor

Class UniProcessor simulates a single processing element, or a processor. It is derived from class DoubleLinkList to hold the list of ParNodes from parallel scheduling. Class NodeSchedule is derived from class DoubleLink to register a ParNode into the DoubleLinkList.

A UniProcessor keeps two target pointers: one for multiprocessor target (mtarget), and the other for the processor (targetPtr). They are both protected members.

MultiTarget* mtarget; 
CGTarget* targetPtr;

The pointer to the processor can be obtained by a public method:

CGTarget* target();

The pointers to the multiprocessor target and to the ParProcessors class that this UniProcessor belongs to, are set by the following method:

void setTarget(MultiTarget* t, ParProcessors* p);

15.5.1 Class NodeSchedule

A NodeSchedule is an object to link a ParNode to a linked list. It indicates whether the node represents an idle time slot or not. It also contains the duration of the node. There is no protected member in this class.

void resetMembers(); 
void setMembers(ParNode* n, int dur);

These methods set the information for the associated node: the pointer to the node, duration, and a flag to tell whether it is an idle node or not. In the first method, the idle flag is set FALSE. The constructor also resets all internal information of the class.

ParNode getNode();
int getDuration();
int isIdleTime();

The above methods return the pointer to the node associated with this class, its duration, and the flag to say TRUE if it represents an idle time slot.

NodeSchedule* nextLink(); 
NodeSchedule* previousLink();

These methods return the next and the previous link in the linked list.

15.5.2 Members for scheduling

Since a list scheduling (with fixed assignment) will be performed as the last stage of all scheduling algorithms in Ptolemy , basic methods for list scheduling are defined in the UniProcessor class. In list scheduling, we need the available time of the processor.

int availTime;

Is a protected member to indicate the time when the processor available. There are public methods to access this member:

void setAvailTime(int t);
int getAvailTime(); 
NodeSchedule* curSchedule;

This protected member points to the NodeSchedule appended last to the linked list. There are two public methods to access a NodeSchedule:

NodeSchedule* getCurSchedule(); 
NodeSchedule* getNodeSchedule(ParNode* n);

The first method just returns curSchedule member while the second one returns the NodeSchedule associated with the argument ParNode.

When a ParNode is runnable earlier than the available time of the processor, we want to check whether there is an idle slot before availTime to fit the ParNode in the middle of the schedule:

int filledInIdleSlot(ParNode*, int start, int limit = 0);

The first two arguments are the pointer to the ParNode to be scheduled and the earliest time when the node can be scheduled. Without the third argument given explicitly, this method returns the earliest time that the processor is available to schedule the node. If the third argument is given, the available time of the processor should be less than limit. If this method could not find an idle slot to schedule the node, it returns -1. Otherwise, it returns the possible scheduling time of the node.

int schedInMiddle(ParNode* n, int when, int);

Schedule the node, n, at when inside an idle-time slot of the processor. The third argument indicates the duration of the node. This method returns the completion time of the schedule if scheduling is succeeded. If it fails to find an idle-time slot at when to accommodate the node, it returns -1.

If a node is to be appended at the end of the schedule in a processor,

void appendNode(ParNode* n, int val);

Does that blindly. To schedule a non-idle node, we have to use

int schedAtEnd(ParNode* n, int start, int leng);

In case start is larger than the processor available time, this method put an idle time slot in the processor and calls appendNode. And, it sets the schedule information of the node, and increases availTime of the processor.

void addNode(ParNode* node, int start);

This method is given a ParNode and its runnable time, and schedule the node either inside an idle time slot if possible, or at the end of the schedule list. The methods described above are used in this method.

void scheduleCommNode(ParNode* n, int start);

When we schedule the given communication node, n, available at start, we also have to check the communication resource whether the resources are available or not. For that purpose, we detect the time slot in this processor to schedule the node, and check whether the same time slot is available in the communication resources: we use scheduleComm of the multiprocessor target to check it. If we find a time slot available for this processor and the communication resources, we schedule the communication node.

int getStartTime();

Returns the earliest time when the processor is busy.

All methods described in this sub-section are public.

15.5.3 Sub-Universe creation

After scheduling is performed, a processor is given a set of assigned ParNodes. Using the scheduling result, we will generate code for the target processor. To generate code for the assigned nodes, we need to allocate resources for the nodes and examine the connectivity of the nodes in the original graph. These steps are common to the generic routine for single processor code generation in which a processor target is given a galaxy. Therefore, we want to create a sub-galaxy that consists of stars of which any invocation is assigned to the processor. Note that a sub-galaxy is NOT a subgraph of the original SDF graph. Besides the stars in the original program graph, we include other stars such as communication stars and spread/collect stars. This subsection will explain some details of sub-universe creation.

void createSubGal();

Is the main public method for sub-universe creation. It first creates a galaxy data structure, subGal, a private member. Then, it clones stars of at least one of whose invocations is assigned to the processor. Make a linked list for all assigned invocations (nodes) of each star in order of increasing invocation number, and set the pointer of cloned star. As for a wormhole, we create a CGWormStar instead of cloning the wormhole. A CGWormStar class will replace a wormhole in the sub-universe. If an original star is not supported by the processor target (for example, with heterogeneous scheduling ), we create a star with the same name as the original star in the target domain.

In the next step, we connect the cloned stars by referring to the original galaxy. If a star is at the wormhole boundary in the original graph, we connect the cloned star to the same event horizon; by doing so the wormhole in the original graph is connected to the sub-universe. If the star at the wormhole boundary is scheduled on more than one processors (or not all invocations are assigned to the same processor), the wormhole in the original graph will be connected to the last created sub-universe.

If an arc connects two stars who have some invocations assigned to the same processor, we examine whether all of the two stars' invocations are assigned to the same processor. If they are, we just connect the cloned stars in the sub-universe. If they aren't, we have a cloned star of either one star whose invocations are assigned to the current sub-universe. In this case, we create a send star (createSend method of the multiprocessor target) or a receive star (createReceive of the target), based on whether the cloned star is a source or destination of the connection . We create a communication star and set the communication star pointer of the communication nodes in the APEG , by matchCommNodes method of ParProcessors class. If the partner communication star was already created in another sub-universe, we pair the send and receive stars by pairSendReceive method of the multiprocessor target .

The last case is when an arc connects two stars whose invocations are distributed over more than one processor. If no invocation of the destination star is assigned to this processor, we call the makeBoundary method.

void makeBoundary(ParNode* src, PortHole* orgP);

The first argument points to the earliest invocation of the star assigned to this processor, and the second is the pointer to the output porthole in the original SDF graph as the source of the connection. We examine all assigned invocations to check whether the destination invocations are assigned to the same processor. If they are, we create one send star and connect it to the cloned star. Otherwise, we create a Spread star to distribute the output samples to multiple processors. We connect the cloned star to the Spread star, and the Spread star to multiple send stars.

Otherwise, we call the makeConnection method.

void makeConnection(ParNode* dest, ParNode* src, PortHole* ref, ParNode* firstS);

The first argument is the pointer to the first assigned invocation of the destination star while the second one is the source node connected to the first argument. The third argument is the pointer to the destination porthole of the connection in the original graph. The last argument is the pointer to the first invocation of the source star. Note that the last argument node may not be assigned to the current processor. This method examines all assigned invocations of the destination star to identify the sources of the samples to be consumed by the cloned star in this processor. If the number of sources are more than one, we create a Collect star and connect the Collect star to the cloned destination star in the sub-universe. For each source in the other processors, we create a receive star and connect it to the Collect star. Similarly, we examine all assigned invocations of the source star to identify the destinations of the samples to be produced by the source star in this processor. If the number of destinations is more than one, we create a Spread star and connect it to the source star. For each destination in the other processors, we create a send star and connect it to the Spread star. As a result, it may occur that to connect two stars in the sub-universe, we need to splice a Spread and a Collect star on that connection.

Spread and Collect stars

A Spread or a Collect star is created by createSpread or createCollect method of the multiprocessor target. The following illustrates when we need to create a Spread or a Collect star in a sub-universe.

Suppose we have star A connected to star B in original graph. Star A produces two samples and star B consumes one. Then, one invocation of star A is connected to two invocations of star B. If one invocation of star A and only one of invocation of star B are assigned to the current processor. Then, we need to connect the cloned stars of A and B in the sub-universe. We can not connect stars A and B in the sub-universe directly since among two samples generated by star A, one sample should be transferred to another processor through a send star. In this case, we connect a Spread star to star A, and one send star and star B to the Spread star in the sub-universe. Then, star A produces two samples to the Spread star while the Spread star spread the incoming two samples to the send star and star B one sample each. The number of output portholes and the sample rate of each porthole are determined during the sub-universe creation. If there is no initial delay on the connection and neither star A nor B needs to access the past samples, the Spread star does not imply additional copying of data in the generated code.

15.5.4 Members for code generation

void prepareCodeGen();

This method performs the following tasks before generating code.

(1) Initialize the sub-universe, which also initialize the cloned stars.

(2) Convert a schedule (or a linked list of ParNodes) obtained from the parallel scheduler to a SDFSchedule class format (list of stars). The code generation routine of the processor target assumes the SDFSchedule class as the schedule output.

(3) Simulate the schedule to compute the maximum number of samples to be collected at runtime if we follow the schedule. This information is used to determine the buffer size to be allocated to the arcs.

void simRunSchedule();

Performs the step (3). It is a protected member.

StringList& generateCode();

This method generate code for the processor target by calling targetPtr->generateCode() .

int genCodeTo(Target* t);

This method is used to insert the code of the sub-universe to the argument target class. It performs the almost same steps as prepareCodeGen and then calls insertGalaxyCode of the processor target class instead of generateCode method.

15.5.5 Other UniProcessor protected members

There are a set of methods to manage NodeSchedule objects to minimize the runtime memory usage as well as execution time.

void putFree(NodeSchedule* n); 
NodeSchedule* getFree();
void clearFree();

If a NodeSchedule is removed from the list, it is put into a pool of NodeSchedule objects. When we need a NodeSchedule object, a NodeSchedule in the pool is extracted and initialized. We deallocate all NodeSchedules in the pool by the third method.

void removeLink(NodeSchedule* x);

Removes the argument NodeSchedule from the scheduled list.

int sumIdle;

Indicates the sum of the idle time slots after scheduling is completed. The value is valid only after display method is called.

15.5.6 Other UniProcessor public members

There are a constructor with no argument to initialize all data members and a destructor to delete subGal and to delete all NodeSchedule objects associated with this processor.

Galaxy* myGalaxy();
int myId();
DoubleLinkList :: size; 
int getSumIdle();

The above methods return the pointer to the sub-universe, the index of the current processor, the number of scheduled nodes, and the sum of idle time after scheduling is completed (or sumIdle).

void initialize();

This method puts all NodeSchedules in the list to the pool of free NodeSchedules and initialize protected members.

void copy(UniProcessor* org);

Copies the schedule from the argument UniProcessor to the current processor.

StringList displaySubUniv(); 
StringList display(int makespan);
int writeGantt(ostream& os, const char* universe, int numProcs, int span);

The first method display the sub-universe. The second method displays the schedule in the textual form while the third one forms a string to be used by Gantt-chart display program.

15.5.7 Iterator for UniProcessor

Class ProcessorIter is the iterator for UniProcessor class. A NodeSchedule object is returned by next and ++ operator.

ParNode* nextNode();

Returns the ParNode in the list.

ptolemy@eecs.berkeley.edu