Annual Report

September 15, 1993 through September 15, 1994.

Title: "SYSTEM-LEVEL DESIGN METHODOLOGY FOR EMBEDDED SIGNAL PROCESSORS"
Contract Number: F33615-93-C-1317
Principal Investigator: Edward A. Lee
Organization: University of California at Berkeley

Current Project Participants at Berkeley:

  • Wan-Teh Chang
  • Brian L. Evans
  • Christopher Hylands
  • Sangjin Hong
  • Asawaree Kalavade
  • Alan Kamas
  • Allen Lao
  • Bilung Lee
  • Edward A. Lee
  • David G. Messerschmitt
  • Praveen Murthy
  • Thomas M. Parks
  • Jose' Luis Pino
  • Farhana Sheikh
  • S. Sriram
  • Juergen Teich
  • Michael C. Williamson
  • Mei Xiao
  • 1. Overview

    The focus of this project is on design methodology for complex real-time systems where a variety of design methodologies and implementation technologies must be combined. Design methodologies are encapsulated in one or more "models of computation", while implementation technologies are implemented as synthesis tools. Applications that use more than one model of computation and/or more than one synthesis tool are said to be "heterogeneous".

    The project aims to develop formal models for such heterogeneous systems, a software environment for the design of such systems, and synthesis technologies for implementation of such systems. In the latter category, we are concentrating on problems not already well addressed elsewhere, such as the synthesis of embedded software and the partitioning and scheduling of heterogeneous parallel systems.

    Our major accomplishments to date, in general terms, include:

    A better formal understanding of the dataflow model of computation:

  • A dataflow process network formalism.
  • A multidimensional dataflow model for parallel algorithm development.
  • A variety of scheduling methodologies for dataflow systems.
  • Higher-order functions for visual dataflow languages.
  • Interaction with other models of computation.
  • An object-oriented software system (called Ptolemy) that supports multiple models of computation and implementation technologies:

  • Dataflow and process network software environments.
  • Discrete-event simulation.
  • Design flow management.
  • Synthesis of embedded software.
  • Synthesis of hardware specifications.
  • Fundamental improvements in synthesis of heterogeneous systems:

  • Partitioning and scheduling techniques.
  • Synthesis of embedded software.
  • A methodology for managing heterogeneous computational hardware.
  • The Ptolemy system serves as a very well-equipped laboratory for research as well as repository for useful results. No small part of the effort is devoted to maintaining and improving the software infrastructure. Highlights of this work include:

  • High quality documentation.
  • An interactive, animated user interface.
  • Carefully designed object-oriented infrastructure.
  • 2. Significant Accomplishments

    2.1 Software Distribution

    Three versions of Ptolemy have been released:

  • 0.5 (February 1994)
  • Ptiny 0.5 (April 1994)
  • 0.5.1 (September 1994)
  • Version numbers begin with "0" to emphasize that this is research software, not a commercial product. The "Ptiny" release is a demonstration subset that is easy to install and requires much less disk space than the full system. A number of our regular users started with this version. Moreover, this version is designed to fully support our instructional uses of Ptolemy.

    Major features introduced in the 0.5 version include:

  • Greatly improved documentation (see below).
  • Extensible, animated, interactive GUI based on Tcl/Tk.
  • The boolean dataflow domain.
  • VHDL code generation.
  • Silage code generation.
  • Fast discrete-event scheduling.
  • A communicating processes domain for event-driven simulation of hardware systems.
  • Fixed-point simulation.
  • Matrix data types and functional blocks.
  • Major features introduced in the 0.5.1 version include:

  • A Matlab interface.
  • Higher-order functions.
  • Multidimensional synchronous dataflow.
  • Initializable delays.
  • All software and documentation is distributed with a very liberal copyright agreement that encourages commercial use and redistribution.

    2.2 Documentation and Public Information

    With the objective of making Ptolemy more usable both within Berkeley and outside, we completely rewrote the documentation. The 0.4.1 version had been written using troff. The 0.5 version was converted to use FrameMaker, and used a more tutorial, more narrative style with extensive use of graphics. The complete manual, called "The Almagest", is divided into four volumes:

  • The User's Manual
  • The Star Atlas
  • The Programmer's Manual
  • The Kernel Manual
  • The first two are intended for users who will not be writing code to extend the system. The third is for users who will be writing new functional blocks (called stars), and the fourth is for users who will be extending the system in more fundamental ways, such as by adding new models of computation or new synthesis tools.

    Two additional volumes are available:

  • The Ptiny Almagest
  • The 0.5.1 Addendum
  • The first is a self-contained document for use with the demonstration version of Ptolemy. The second is an incremental document that explains the new features in the 0.5.1 release.

    The User's manual and Kernel manual have both been converted to HTML for on-line, hypertext access. Also providing improved on-line documentation, two self-guided tours of Ptolemy are distributed with the system:

    In addition to the Ptolemy documentation, we have created an extensive hypertext document accessible on the worldwide web (URL: http://ptolemy.eecs.berkeley.edu/). This document includes:

  • A brief overview of Ptolemy.
  • A mockup of some Ptolemy demos.
  • Access to the released software and all documentation.
  • Access to many of our published papers.
  • On-line access to the User's Manual.
  • On-line access to the Kernel Manual.
  • Biographies of many of the project participants.
  • An extensive bibliography is also provided with an automated keyword search mechanism for creating smaller specialized bibliographies.

    A usenet newsgroup named comp.soft-sys.ptolemy has been created to supplement the ptolemy-hackers mailing list, and the two have been linked.

    2.3 Interactive, Direct-Manipulation Graphical User Interfaces

    We have connected a Tcl kernel and its associated toolkit Tk to the Ptolemy kernel. The GUI now uses this capability to achieve a concurrent, event-driven interface compatible (mostly) with Motif. The GUI is extensible even by casual users, allowing for customized, interactive, animated simulations and prototypes of deployable user interfaces. Many stars have been added to various domains to make use of the capability, and animated demonstrations have been added to the distribution.

    This work continues to be enhanced, with new visualization tools and new mechanisms for sophisticated control of simulations being added all the time.

    2.4 Managing Complex, Heterogeneous Designs

    We have created a Design Methodology Management (DMM) domain in Ptolemy to be used to graphically manage design flows that involve a variety of tools and modules working in concert. The domain has make-like semantics and communication between primitives (stars) is via files. The objective is to consider the design flow a first-class part of the design, having as much importance as other design details. This work continues, and has not yet been included in any release.

    2.5 Interfacing to Hardware Design Systems

    We have created two VHDL code-generation domains, called VHDLF and VHDLB. The first of these uses homogeneous synchronous dataflow semantics to describe signal processing systems at a functional level. The second uses event-driven semantics to describe arbitrary systems at the behavioral level. These domains have been used successfully already in industry, by a startup company called DQDT (Dimensions in Quick Design Turnaround).

    We have installed a Silage domain in main Ptolemy directory tree. This is a code generation domain that couples to Prof. Rabaey's high-level synthesis tool called Hyper. IMEC (in Belgium) has developed a tool, s2c, which has an option to convert Silage programs to .pl files for inclusion into Ptolemy as a single star. This program is publicly distributed.

    Precedence, Inc. and Berkeley Design Technology, Inc. have committed to integration of Ptolemy with Precedence's simulation backplane. This will be carried out, with help from us if necessary, under the Martin Marrietta RASSP project.

    2.6 Heterogeneous Partitioning and Scheduling

    We have made considerable progress in partitioning of systems for hardware/software codesign. One automated method uses heuristic measures of the goodness of fit of particular computations to the implementation technology. We have also experimented with using the Tcl interface to provide more user control over the partitioning.

    In support of heterogeneous hardware, we have implemented a new "Target" called "s56xwh" that defines a Sun workstation with the Ariel s56x DSP card on its S bus. Using this, a code generation subsystem in Ptolemy can be embedded within a simulation system transparently. This was demonstrated at the RASSP conference in August.

    The ordered transaction principle for low-overhead communication in parallel heterogeneous systems continues to evolve. This communication mechanism takes advantage of approximate compile-time predictability in an algorithm. Transactions between processors are fully synchronized, as if semaphores were being used, but without the overhead of semaphores. We have been experimenting with a prototype of this architecture that we constructed last year using four Motorola DSP96000 processors. The CG96 domain in Ptolemy generates code for this multiprocessor system, fully automating the partitioning and scheduling, using synchronous dataflow semantics. In particular, we have developed a mechanism for altering schedules so that if execution times are predictable, absolutely no performance penalty is incurred from ordered transaction execution compared to self-timed execution.

    2.7 Heterogeneous System Design

    We have prototyped a heterogeneous scheduling framework in which a dataflow graph is divided into subgraphs through a clustering procedure. A parallelizing scheduler then partitions the clustered graph for parallel execution, while a loop scheduler handles the software synthesis within each cluster. We believe that this will solve a critical problem that we had encountered with our parallel schedulers with applications involving large sample-rate changes, namely that the code size of the synthesized software was often very large. Being able to use loop scheduling techniques greatly reduces the code size.

    We have prototyped an environment for modeling candidate parallel architectures at a high level for implementing signal processing applications. The strategy is to describe the hardware architecture using the CP (communicating processes domain), where the processor nodes execute C programs synthesized from SDF (synchronous dataflow) descriptions of signal processing applications. The basic C code generation mechanism is being adapted for this by designing a specialized target that can handle the partitioning and generate performance estimates.

    2.8 Formal methods

    In what could be a significant breakthrough in our formal understanding of block-diagram languages, we have established a formal connection between so-called "Kahn process networks" and our style of dataflow graphs. A consequence of this is that we can rigorously prove necessary conditions for determinacy for arbitrary dataflow graphs. It turns out, much to our surprise, that the dataflow mechanism we use in our dynamic dataflow domain guarantees determinacy if the actors are functional. This happened by applying common sense to the design, rather than theory, but we now have the theory.

    This theory has already had an impact on the multithreaded dataflow model of computation, which we now recognize to be a process network model, which is a generalization of dataflow. This allows us to precisely characterize the situations under which its more general capabilities would be required.

    We have developed a mathematical test for synchronous dataflow graphs that determines whether a finite blocking factor can achieve maximal parallelism in a blocked schedule. We expect to incorporate this into parallel schedulers in order to enhance the parallelism in algorithms.

    We have developed an optimal scheduling technique for chain-structured synchronous dataflow graphs that produces single-appearance schedules with minimal use of memory for buffering of data. The optimal algorithm is a dynamic programming algorithm with complexity that is order N^3. We have also developed an order N^2 heuristic that works well for a large number of randomly generated test cases. Both techniques have a sizable practical impact for programs that involve non-trivial sample-rate conversions. We have demonstrated this with a test program that converts compact disc recordings (at 44.1 kHz) to DAT recordings (at 48 kHz).

    2.9 Algorithm representation

    A higher-order functions (HOF) domain has been completed and incorporated as a subdomain of SDF (synchronous dataflow) and DE (discrete event). This is expected to have a major impact on the usability of visual (graphical) system representations for large systems. We have developed (with help from Thomson CSF) a variety of radar applications using these HOF capabilities. We believe that the resulting system representations are much more intuitive and maintainable than the traditional techniques based on multidimensional arrays (using up to seven dimensions).

    We have created a link between Matlab and Ptolemy so that stars can have their functionality expressed as Matlab functions. One major impact of this is that the full suite of graphical signal display facilities in Matlab are now available under Ptolemy. Moreover, quick algorithmic prototyping can now be done with an arbitrary mixture of Matlab (imperative, matrix-oriented) code, and block-diagram (declarative, signal-oriented) code.

    We have built in to the Ptolemy kernel a set of matrix classes. Stars can now exchange data in the form of matrices, in addition to scalars. The matrix classes have a set of basic operations, such as matrix inversion, multiplication, etc., built in to the class itself. Consequently, users of the classes can specify their matrix operations at a high level. A set of stars that operate on matrices have been written, together with demonstrations of each. The set of demonstration applications includes a Kalman filter implementation and an implementation of the MUSIC algorithm for detection of sinusoids in noise.

    We have completed an experimental "multidimensional dataflow" domain, where arcs that connect blocks represent not simple sequences of tokens, but rather two-dimensional orderings of tokens. This domain is well matched to multidimensional signal processing and is capable of representing a broader range of algorithms with static flow of control than the synchronous dataflow model. The real potential, however, is in parallel computation, because the model of computation exposes much more parallelism at a much finer granularity than the SDF model. The domain has a rich enough set of stars to be usable for experimentation. We hope soon to begin experimenting with parallel scheduling by interfacing to our existing parallel schedulers.

    A number of stars and subsystems (galaxies) specialized to radar applications, vector quantization, and chaos research have been created and added to the Ptolemy system.

    2.10 System-Level Modeling of Systems

    We have developed a "communicating processes" (CP) domain in Ptolemy. This domain has been used extensively for high-level modeling of a wireless multimedia network.

    We have completed a "message queue" (MQ) domain, which is an experimental domain that models systems with highly dynamic topologies, such as telecommunications switch software.

    We have developed a process network (PN) domain working with Awesime, a threaded task library covered by the GNU public license. Awesime is a multitasking system that runs on a wide variety of platforms. This domain is a superset of all existing dataflow domains. We expect that the PN domain will have significant advantages for complex real-time applications. For use in this domain, we have adapted results from preemptive rate-monotonic scheduling theory to non-preemptive scheduling of real-time signal processing systems.

    2.11 Synthesis of Embedded Software and Firmware

    The boolean dataflow (BDF) domain has been included in Ptolemy. It is able to statically schedule most Dynamic Data Flow graphs.

    We have made significant extensions to the "ptlang" preprocessor that make specification of complicated conditional code generation stars much easier. It permits free intermixing of code to be generated, and code that controls the generation.

    2.12 Technical improvements to Ptolemy

    In addition to the above, a number of technical improvements made to the Ptolemy system during the last year:

  • Ports to Cfront compilers, the Solaris 2.3 operating system, and the following platforms: Sun, DEC (mips), HP, Silicon Graphics, Linux-PC, IBM R6000.
  • Dynamic linking on all platforms.
  • Fixed-point class, stars, and demos.
  • Purified the code to eliminate memory leaks.
  • Initializable delays in dataflow domains.
  • Faster simulations by disabling the event loop.
  • Faster discrete-event scheduler.
  • Compilation of octtools with gcc.
  • Update of the StringArrayState class to support input from files.
  • Improved recursion in dataflow domains.
  • Bidirectional portholes support in ptlang.
  • Improved base classes for code generation.
  • Rewritten makefiles for the octtools to conform to the Ptolemy style.
  • Distribution of the compiler with the Ptolemy software.
  • Distribution of a subset of the octtools with the Ptolemy software.
  • Shared data structures between stars.
  • New "begin" method in the base class for stars.
  • Support in the CG56 Domain for a DSP card.
  • Simpler installation and user configuration.
  • Installed "gnats" bug tracking system
  • The pxgraph program has been linked to FrameMaker.
  • Improved the example "XXX" domain.
  • Many new stars.
  • Improved compile-SDF target.
  • Improved classes to support clustering in schedulers.
  • 2.13 Miscellaneous

    3. Next Period Activities

    The top priority topics that we will be addressing over the next year are:

  • Design flow management as an integrated part of design.
  • Redesign of our basic code generation mechanism.
  • Synthesis from multidimensional dataflow.
  • Better support for scripted simulations.
  • Improved dynamic dataflow scheduling and PGM compatibility.
  • Redesigned image processing infrastructure.
  • Improved algorithm-level design including symbolic manipulations.
  • System-level methodology for evaluating parallel architectures.
  • Finite-state machine control functions.
  • Interface to Esterel for specifying sophisticated controllers.
  • A real-time process network model of computation.
  • Automated regression tests for the complete system.
  • Use of the "cyclo-static" dataflow model for scheduling.
  • Higher-order functions in the code generation domains.
  • 4. Acknowledgements

    We would like to give special thanks to the following people outside Berkeley who contributed in major ways to the results reported above:

  • Egbert Ammicht (Bell Labs)
  • Neal Becker (Comsat)
  • Joe Buck (Synopsys)
  • Gyorgy Csertan (Technical University of Budapest)
  • Stefan De Troch (IMEC)
  • Chandan Egbert
  • Tom Lane (Structured Software Systems)
  • Douglas Niehaus (Univ. of Kansas)
  • Sunil Samel (IMEC)
  • Christopher Scannell (NRL)
  • Richard S. Stevens (NRL)
  • Richard Tobias (White Eagle Systems Technology)
  • Alberto Vignani (Fiat)
  • (Apologies to anyone we forgot to mention).

    5. Papers

    1. M. J. Chen, "Developing a Multidimensional Synchronous Dataflow Domain in Ptolemy", ERL Technical Report, UCB/ERL No.94/16, Master's Report, May, 1994.
    2. S.-I. Shih, "Code Generation for VSP Software Tool in Ptolemy", MS Report, Plan II, ERL Technical Report (number pending) UC Berkeley, May 25, 1994, Master's report.
    3. S.Bhattacharyya, E.A. Lee, "Scheduling Synchronous Dataflow Graphs for Efficient Looping", Journal of VLSI Signal Processing, 6, pp271- 288, December 1993.
    4. A. Kalavade, E.A. Lee, "Manifestations of Heterogeneity in Hardware/Software Codesign", Proc. of DAC-94, San Diego, CA, June, 1994.
    5. J.L. Pino, T.M. Parks, and E.A. Lee, "Automatic Code Generation for Heterogeneous Multiprocessors," ICASSP '94, Adelaide, Australia, April 19-22, 1994, vol. II, pp. 445-448.
    6. P. Murthy, S. Bhattacharyya, and E.A. Lee, "Minimizing Memory Requirements for Chain-Structured Synchronous Dataflow Programs," ICASSP '94, Adelaide, Australia, April 19-22, 1994, vol. II, pp. 453-456.
    7. E.A. Lee, "Computing and Signal Processing: An Experimental Multidisciplinary Course", ICASSP '94, Adelaide, Australia, April 19-22, 1994, vol. VI, pp. 45-48.
    8. P. Murthy, E. A. Lee, "On the Optimal Blocking Factor for Blocked, Non-Overlapped Schedules,"ERL Technical Report (number pending) UC Berkeley, June 6, 1994. Also, invited to Asilomar '94.
    9. B.L. Evans, R.H. Bamberger, and J.H. McClellan, "Rules for Multidimensional Multirate Structures", IEEE Transactions on Signal Processing, vol. 42, no. 4, pp. 762-771, April, 1994.
    10. B.L. Evans, T.R. Gardos, and J.H. McClellan, "Imposing Structure on Smith Form Decompositions of Rational Resampling Matrices", IEEE Transactions on Signal Processing, vol. 42, no. 4, pp. 970-973, April, 1994.
    11. B.L. Evans, F.A. Sakarya, "Interactive Graphical Design of Two- Dimensional Compression Systems", Proceedings of Second National Workshop on Signal Processing, Marman, Turkey, April, 1994.
    12. Shuvra Shikhar Bhattacharyya, Compiling Dataflow Programs for Digital Signal Processing, Ph.D. Thesis, June, 1994.
    13. Jose L. Pino, Thomas M. Marks, and Edward A. Lee, "Mapping Multiple Independent Synchronous Dataflow Graphs onto Heterogeneous Multiprocessors," Asilomar '94.
    14. Brian L. Evans and James H. McClellan, "Algorithms for Symbolic Linear Convolution", Asilomar '94.
    15. Brian L. Evans, Steve X. Gu, and Robert H. Bamberger, "Interactive Solution Sets as Components of Fully Electronic Signals and Systems Courseware," Asilomar '94.
    16. Brian L. Evans, Juergen Teich, and Christian Schwarz, "Automated Design of Two-Dimensional Rational Decimation Systems," Asilomar '94.
    17. Brian L. Evans and Juergen Teich, "Families of Smith Form Decompositions to Simplify Multidimensional Filter Bank Design," Asilomar '94.
    18. Brian L. Evans, Douglas R. Firth, Kennard D. White, and Edward A. Lee, ``Automatic Generation of Programs That Jointly Optimize Characteristics Of Analog Filter Designs'', submitted to the Proceedings of the IEEE.
    19. Asawaree Kalavade and Edward A. Lee, "A Global Criticality/Local Phase Driven Algorithm for the Constrained Hardware/Software Partitioning Problem," Proc. of the Codesign Workshop, Grenoble, France, September, 1994.
    20. M. J. Chen and E. A. Lee, "Design and Implementation of a Multidimensional Synchronous Dataflow Environment", Asilomar '94.
    21. S. Sriram and E. A. Lee, "Static Scheduling of Communication Resources in Multiprocessor DSP Architectures", Asilomar '94.
    22. Edward A. Lee, "Dataflow Process Networks," UCB/ERL memorandum number 94/53, Electronics Research Laboratory, Univ. of California, Berkeley, CA 94720.
    23. Karim P. Khiar and E. A. Lee, "Modeling Radar Systems Using Hierarchical Dataflow," paper proposal sent to ICASSP '95.
    24. Jose Luis Pino and E. A. Lee, "Hierarchical Static Scheduling of Dataflow Graphs onto Multiple Processors," paper proposal sent to ICASSP '95.
    25. T. M. Parks and E. A. Lee, "Non-Preemptive Real-Time Scheduling of Dataflow Systems," paper proposal sent to ICASSP '95.
    26. A. Kalavade, B. Evans, J. Pino, and E. A. Lee, "Managing Complexity in Heterogeneous Specification, Simulation, and Synthesis," invited paper, ICASSP '95.
    27. S. Sriram and E. A. Lee, "Scheduling Communication Resources in Statically Scheduled Multiprocessor Architectures," submitted to IEEE Trans. on Parallel and Distributed Computing.

    Back to the Ptolemy RASSP home page.


    Last updated 11/08/96. Send comments to www@ptolemy.eecs.berkeley.edu.