Annual Report
September 15, 1993 through September 15, 1994.
Title: "SYSTEM-LEVEL DESIGN METHODOLOGY FOR EMBEDDED SIGNAL PROCESSORS"
Contract Number: F33615-93-C-1317
Principal Investigator: Edward A. Lee
Organization: University of California at Berkeley
Current Project Participants at Berkeley:
1. Overview
The focus of this project is on design methodology for complex real-time
systems where a variety of design methodologies and implementation
technologies must be combined. Design methodologies are encapsulated in
one or more "models of computation", while implementation technologies are
implemented as synthesis tools. Applications that use more than one model
of computation and/or more than one synthesis tool are said to be
"heterogeneous".
The project aims to develop formal models for such heterogeneous systems, a
software environment for the design of such systems, and synthesis
technologies for implementation of such systems. In the latter category,
we are concentrating on problems not already well addressed elsewhere, such
as the synthesis of embedded software and the partitioning and scheduling
of heterogeneous parallel systems.
Our major accomplishments to date, in general terms, include:
A better formal understanding of the dataflow model of computation:
An object-oriented software system (called Ptolemy) that supports
multiple models of computation and implementation technologies:
Fundamental improvements in synthesis of heterogeneous systems:
The Ptolemy system serves as a very well-equipped laboratory for research
as well as repository for useful results. No small part of the effort is
devoted to maintaining and improving the software infrastructure.
Highlights of this work include:
2. Significant Accomplishments
2.1 Software Distribution
Three versions of Ptolemy have been released:
Version numbers begin with "0" to emphasize that this is research software,
not a commercial product. The "Ptiny" release is a demonstration subset
that is easy to install and requires much less disk space than the full
system. A number of our regular users started with this version.
Moreover, this version is designed to fully support our instructional
uses of Ptolemy.
Major features introduced in the 0.5 version include:
Major features introduced in the 0.5.1 version include:
All software and documentation is distributed with a very liberal
copyright agreement that encourages commercial use and redistribution.
2.2 Documentation and Public Information
With the objective of making Ptolemy more usable both within Berkeley and
outside, we completely rewrote the documentation. The 0.4.1 version had
been written using troff. The 0.5 version was converted to use
FrameMaker, and used a more tutorial, more narrative style with extensive
use of graphics. The complete manual, called "The Almagest", is divided
into four volumes:
The first two are intended for users who will not be writing code to
extend the system. The third is for users who will be writing new
functional blocks (called stars), and the fourth is for users who will be
extending the system in more fundamental ways, such as by adding new
models of computation or new synthesis tools.
Two additional volumes are available:
The first is a self-contained document for use with the demonstration
version of Ptolemy. The second is an incremental document that explains
the new features in the 0.5.1 release.
The User's manual and Kernel manual have both been converted to HTML for
on-line, hypertext access. Also providing improved on-line
documentation, two self-guided tours of Ptolemy are distributed with the
system:
- A "Quick Tour" takes the user through the features of the more mature
Ptolemy domains.
- A "What's New" tour guides the user through an overview of what has
been added in each new version of Ptolemy.
In addition to the Ptolemy documentation, we have created an extensive
hypertext document accessible on the worldwide web (URL:
http://ptolemy.eecs.berkeley.edu/). This document includes:
An extensive bibliography is also provided with an automated keyword
search mechanism for creating smaller specialized bibliographies.
A usenet newsgroup named comp.soft-sys.ptolemy has been created to
supplement the ptolemy-hackers mailing list, and the two have been linked.
2.3 Interactive, Direct-Manipulation Graphical User Interfaces
We have connected a Tcl kernel and its associated toolkit Tk to the Ptolemy
kernel. The GUI now uses this capability to achieve a concurrent,
event-driven interface compatible (mostly) with Motif. The GUI is
extensible even by casual users, allowing for customized, interactive,
animated simulations and prototypes of deployable user interfaces. Many
stars have been added to various domains to make use of the capability, and
animated demonstrations have been added to the distribution.
This work continues to be enhanced, with new visualization tools and new
mechanisms for sophisticated control of simulations being added all the
time.
2.4 Managing Complex, Heterogeneous Designs
We have created a Design Methodology Management (DMM) domain in Ptolemy to
be used to graphically manage design flows that involve a variety of tools
and modules working in concert. The domain has make-like semantics and
communication between primitives (stars) is via files. The objective is to
consider the design flow a first-class part of the design, having as much
importance as other design details. This work continues, and has not
yet been included in any release.
2.5 Interfacing to Hardware Design Systems
We have created two VHDL code-generation domains, called VHDLF and VHDLB.
The first of these uses homogeneous synchronous dataflow semantics to
describe signal processing systems at a functional level. The second uses
event-driven semantics to describe arbitrary systems at the behavioral
level. These domains have been used successfully already in industry, by a
startup company called DQDT (Dimensions in Quick Design Turnaround).
We have installed a Silage domain in main Ptolemy directory tree. This is
a code generation domain that couples to Prof. Rabaey's high-level
synthesis tool called Hyper. IMEC (in Belgium) has developed a tool, s2c,
which has an option to convert Silage programs to .pl files for inclusion
into Ptolemy as a single star. This program is publicly distributed.
Precedence, Inc. and Berkeley Design Technology, Inc. have committed to
integration of Ptolemy with Precedence's simulation backplane. This will
be carried out, with help from us if necessary, under the Martin Marrietta
RASSP project.
2.6 Heterogeneous Partitioning and Scheduling
We have made considerable progress in partitioning of systems for
hardware/software codesign. One automated method uses heuristic measures
of the goodness of fit of particular computations to the implementation
technology. We have also experimented with using the Tcl interface to
provide more user control over the partitioning.
In support of heterogeneous hardware, we have implemented a new "Target"
called "s56xwh" that defines a Sun workstation with the Ariel s56x DSP card
on its S bus. Using this, a code generation subsystem in Ptolemy can be
embedded within a simulation system transparently. This was demonstrated
at the RASSP conference in August.
The ordered transaction principle for low-overhead communication in
parallel heterogeneous systems continues to evolve. This communication
mechanism takes advantage of approximate compile-time predictability in an
algorithm. Transactions between processors are fully synchronized, as if
semaphores were being used, but without the overhead of semaphores. We
have been experimenting with a prototype of this architecture that we
constructed last year using four Motorola DSP96000 processors. The CG96
domain in Ptolemy generates code for this multiprocessor system, fully
automating the partitioning and scheduling, using synchronous dataflow
semantics. In particular, we have developed a mechanism for altering
schedules so that if execution times are predictable, absolutely no
performance penalty is incurred from ordered transaction execution compared
to self-timed execution.
2.7 Heterogeneous System Design
We have prototyped a heterogeneous scheduling framework in which a dataflow
graph is divided into subgraphs through a clustering procedure. A
parallelizing scheduler then partitions the clustered graph for parallel
execution, while a loop scheduler handles the software synthesis within
each cluster. We believe that this will solve a critical problem that we
had encountered with our parallel schedulers with applications involving
large sample-rate changes, namely that the code size of the synthesized
software was often very large. Being able to use loop scheduling
techniques greatly reduces the code size.
We have prototyped an environment for modeling candidate parallel
architectures at a high level for implementing signal processing
applications. The strategy is to describe the hardware architecture
using the CP (communicating processes domain), where the processor nodes
execute C programs synthesized from SDF (synchronous dataflow)
descriptions of signal processing applications. The basic C code
generation mechanism is being adapted for this by designing a specialized
target that can handle the partitioning and generate performance estimates.
2.8 Formal methods
In what could be a significant breakthrough in our formal understanding of
block-diagram languages, we have established a formal connection between
so-called "Kahn process networks" and our style of dataflow graphs. A
consequence of this is that we can rigorously prove necessary conditions
for determinacy for arbitrary dataflow graphs. It turns out, much to our
surprise, that the dataflow mechanism we use in our dynamic dataflow domain
guarantees determinacy if the actors are functional. This happened by
applying common sense to the design, rather than theory, but we now have
the theory.
This theory has already had an impact on the multithreaded dataflow model
of computation, which we now recognize to be a process network model, which
is a generalization of dataflow. This allows us to precisely
characterize the situations under which its more general capabilities would
be required.
We have developed a mathematical test for synchronous dataflow graphs that
determines whether a finite blocking factor can achieve maximal parallelism
in a blocked schedule. We expect to incorporate this into parallel
schedulers in order to enhance the parallelism in algorithms.
We have developed an optimal scheduling technique for chain-structured
synchronous dataflow graphs that produces single-appearance schedules with
minimal use of memory for buffering of data. The optimal algorithm is a
dynamic programming algorithm with complexity that is order N^3. We have
also developed an order N^2 heuristic that works well for a large number of
randomly generated test cases. Both techniques have a sizable practical
impact for programs that involve non-trivial sample-rate conversions. We
have demonstrated this with a test program that converts compact disc
recordings (at 44.1 kHz) to DAT recordings (at 48 kHz).
2.9 Algorithm representation
A higher-order functions (HOF) domain has been completed and incorporated
as a subdomain of SDF (synchronous dataflow) and DE (discrete event). This
is expected to have a major impact on the usability of visual (graphical)
system representations for large systems. We have developed (with help
from Thomson CSF) a variety of radar applications using these HOF
capabilities. We believe that the resulting system representations are
much more intuitive and maintainable than the traditional techniques based
on multidimensional arrays (using up to seven dimensions).
We have created a link between Matlab and Ptolemy so that stars can have
their functionality expressed as Matlab functions. One major impact of
this is that the full suite of graphical signal display facilities in
Matlab are now available under Ptolemy. Moreover, quick algorithmic
prototyping can now be done with an arbitrary mixture of Matlab
(imperative, matrix-oriented) code, and block-diagram (declarative,
signal-oriented) code.
We have built in to the Ptolemy kernel a set of matrix classes. Stars can
now exchange data in the form of matrices, in addition to scalars. The
matrix classes have a set of basic operations, such as matrix inversion,
multiplication, etc., built in to the class itself. Consequently, users of
the classes can specify their matrix operations at a high level. A set of
stars that operate on matrices have been written, together with
demonstrations of each. The set of demonstration applications includes a
Kalman filter implementation and an implementation of the MUSIC algorithm
for detection of sinusoids in noise.
We have completed an experimental "multidimensional dataflow" domain, where
arcs that connect blocks represent not simple sequences of tokens, but
rather two-dimensional orderings of tokens. This domain is well matched to
multidimensional signal processing and is capable of representing a broader
range of algorithms with static flow of control than the synchronous
dataflow model. The real potential, however, is in parallel computation,
because the model of computation exposes much more parallelism at a much
finer granularity than the SDF model. The domain has a rich enough set of
stars to be usable for experimentation. We hope soon to begin
experimenting with parallel scheduling by interfacing to our existing
parallel schedulers.
A number of stars and subsystems (galaxies) specialized to radar
applications, vector quantization, and chaos research have been created
and added to the Ptolemy system.
2.10 System-Level Modeling of Systems
We have developed a "communicating processes" (CP) domain in Ptolemy. This
domain has been used extensively for high-level modeling of a wireless
multimedia network.
We have completed a "message queue" (MQ) domain, which is an experimental
domain that models systems with highly dynamic topologies, such as
telecommunications switch software.
We have developed a process network (PN) domain working with Awesime,
a threaded task library covered by the GNU public license. Awesime is
a multitasking system that runs on a wide variety of platforms. This
domain is a superset of all existing dataflow domains. We expect that
the PN domain will have significant advantages for complex real-time
applications. For use in this domain, we have adapted results from
preemptive rate-monotonic scheduling theory to non-preemptive
scheduling of real-time signal processing systems.
2.11 Synthesis of Embedded Software and Firmware
The boolean dataflow (BDF) domain has been included in Ptolemy. It is
able to statically schedule most Dynamic Data Flow graphs.
We have made significant extensions to the "ptlang" preprocessor that make
specification of complicated conditional code generation stars much easier.
It permits free intermixing of code to be generated, and code that controls
the generation.
2.12 Technical improvements to Ptolemy
In addition to the above, a number of technical improvements made to the
Ptolemy system during the last year:
2.13 Miscellaneous
- We have installed and learned to use a suite of Mentor tools,
including the DSP station.
- We have installed (but not yet exercised extensively) a suite of
Synopsys tools.
- We have installed ADEPT (from the UVA RASSP project), and are
starting to learn to use it.
- We have installed Vantage VHDL and are evaluating it for use
with VHDL code generation domains.
3. Next Period Activities
The top priority topics that we will be addressing over the next year are:
4. Acknowledgements
We would like to give special thanks to the following people outside
Berkeley who contributed in major ways to the results reported above:
(Apologies to anyone we forgot to mention).
5. Papers
- M. J. Chen, "Developing a Multidimensional Synchronous Dataflow
Domain in Ptolemy", ERL Technical Report, UCB/ERL No.94/16,
Master's Report, May, 1994.
- S.-I. Shih, "Code Generation for VSP Software Tool in Ptolemy",
MS Report, Plan II, ERL Technical Report (number pending)
UC Berkeley, May 25, 1994, Master's report.
- S.Bhattacharyya, E.A. Lee, "Scheduling Synchronous Dataflow Graphs for
Efficient Looping", Journal of VLSI Signal Processing, 6, pp271-
288, December 1993.
- A. Kalavade, E.A. Lee, "Manifestations of Heterogeneity in Hardware/Software
Codesign", Proc. of DAC-94, San Diego, CA, June, 1994.
- J.L. Pino, T.M. Parks, and E.A. Lee, "Automatic Code Generation for
Heterogeneous Multiprocessors," ICASSP '94, Adelaide, Australia,
April 19-22, 1994, vol. II, pp. 445-448.
- P. Murthy, S. Bhattacharyya, and E.A. Lee, "Minimizing Memory
Requirements for Chain-Structured Synchronous Dataflow Programs,"
ICASSP '94, Adelaide, Australia, April 19-22, 1994, vol. II, pp. 453-456.
- E.A. Lee, "Computing and Signal Processing: An Experimental
Multidisciplinary Course", ICASSP '94, Adelaide, Australia, April 19-22,
1994, vol. VI, pp. 45-48.
- P. Murthy, E. A. Lee, "On the Optimal Blocking Factor for Blocked,
Non-Overlapped Schedules,"ERL Technical Report (number pending)
UC Berkeley, June 6, 1994. Also, invited to Asilomar '94.
- B.L. Evans, R.H. Bamberger, and J.H. McClellan, "Rules
for Multidimensional Multirate Structures", IEEE Transactions
on Signal Processing, vol. 42, no. 4, pp. 762-771, April, 1994.
- B.L. Evans, T.R. Gardos, and J.H. McClellan, "Imposing Structure
on Smith Form Decompositions of Rational Resampling Matrices",
IEEE Transactions on Signal Processing, vol. 42, no. 4,
pp. 970-973, April, 1994.
- B.L. Evans, F.A. Sakarya, "Interactive Graphical Design of Two-
Dimensional Compression Systems", Proceedings of Second National
Workshop on Signal Processing, Marman, Turkey, April, 1994.
- Shuvra Shikhar Bhattacharyya, Compiling Dataflow Programs for
Digital Signal Processing, Ph.D. Thesis, June, 1994.
- Jose L. Pino, Thomas M. Marks, and Edward A. Lee, "Mapping Multiple
Independent Synchronous Dataflow Graphs onto Heterogeneous
Multiprocessors," Asilomar '94.
- Brian L. Evans and James H. McClellan, "Algorithms for Symbolic
Linear Convolution", Asilomar '94.
- Brian L. Evans, Steve X. Gu, and Robert H. Bamberger, "Interactive
Solution Sets as Components of Fully Electronic Signals and Systems
Courseware," Asilomar '94.
- Brian L. Evans, Juergen Teich, and Christian Schwarz, "Automated
Design of Two-Dimensional Rational Decimation Systems," Asilomar '94.
- Brian L. Evans and Juergen Teich, "Families of Smith Form Decompositions
to Simplify Multidimensional Filter Bank Design," Asilomar '94.
- Brian L. Evans, Douglas R. Firth, Kennard D. White, and Edward A. Lee,
``Automatic Generation of Programs That Jointly Optimize Characteristics Of
Analog Filter Designs'', submitted to the Proceedings of the IEEE.
- Asawaree Kalavade and Edward A. Lee, "A Global Criticality/Local
Phase Driven Algorithm for the Constrained Hardware/Software
Partitioning Problem," Proc. of the Codesign Workshop,
Grenoble, France, September, 1994.
- M. J. Chen and E. A. Lee, "Design and Implementation of a
Multidimensional Synchronous Dataflow Environment", Asilomar '94.
- S. Sriram and E. A. Lee, "Static Scheduling of Communication
Resources in Multiprocessor DSP Architectures", Asilomar '94.
- Edward A. Lee, "Dataflow Process Networks," UCB/ERL memorandum
number 94/53, Electronics Research Laboratory, Univ. of California,
Berkeley, CA 94720.
- Karim P. Khiar and E. A. Lee, "Modeling Radar Systems Using
Hierarchical Dataflow," paper proposal sent to ICASSP '95.
- Jose Luis Pino and E. A. Lee, "Hierarchical Static Scheduling
of Dataflow Graphs onto Multiple Processors," paper proposal
sent to ICASSP '95.
- T. M. Parks and E. A. Lee, "Non-Preemptive Real-Time Scheduling
of Dataflow Systems," paper proposal sent to ICASSP '95.
- A. Kalavade, B. Evans, J. Pino, and E. A. Lee, "Managing Complexity
in Heterogeneous Specification, Simulation, and Synthesis,"
invited paper, ICASSP '95.
- S. Sriram and E. A. Lee, "Scheduling Communication Resources in
Statically Scheduled Multiprocessor Architectures," submitted to
IEEE Trans. on Parallel and Distributed Computing.
Back to the Ptolemy RASSP home page.
Last updated 11/08/96.
Send comments to
www@ptolemy.eecs.berkeley.edu.