1. Java programming and compilation
Speaker: Milind Girkar
Milind_Girkar@ccm.sc.intel.com
Target Audience: Java programmers, compiler writers.
Assumed Background: Knowledge about object oriented languages,
code generation techniques.
Length: full day
About the Instructor:
Milind Girkar did his PhD at the University of Illinois
at Urbana-Champaign. Currently, he is a senior software
engineer in Intel's Microcomputer Research Labs. His
research interests are in parallelizing and optimizing
compilers.
Outline of the tutorial:
Introduction
what is java, history of java, buzzwords associated
with java, comparisons with C/C++, sample programs,
development environment.
Language
basic types, strings, arrays, objects, statements,
operators, classes, packages, input/output, variables,
scope rules.
Inheritance, interfaces
virtual functions, superclass dependencies, runtime class
loading, interfaces to simulate multiple inheritance,
run time type information.
Exceptions
error handling, throwable objects, catch or specify
requirement, finally clause, exception hierarchy.
Threads
thread context, priorities, synchronization, monitors,
wait and notify, threads by inheritance and interfaces,
thread states, examples, daemon threads, thread groups.
Applets
example applets, animation (with threads), sound clips,
security restrictions, in html files, parameters to applets.
Graphics
awt, containers, components, event model, frames, buttons,
choices, lists, labels, menus etc, examples.
Networking
examples, URLs, sockets, datagrams
Native methods
integrating native methods in java, examples, VM functions
available, signalling errors and exceptions, synchronization
in native methods.
Java virtual machine
architecture, registers, operand stack, current limits,
bytecode instruction set, strong typing, garbage collection.
Security
security management, security checks, class loader, name
spaces, bytecode verifier, verifier algorithm, verifier
example.
Just in time code generation
just-in-time compilation, interaction with VM, field/method
access, dependence on object model, runtime resolution,
exception handling and garbage collection interface.
Performance
current performance of various VMs, comparison with native
code generation, _quick bytecodes, performance/optimization
of bytecode, application characteristics determining
performance, issues with compilation at runtime.
Miscellaneous
hotjava, Sun's announced java chips, 100% pure Java,
upcoming releases of JDK, pointers to more information.
2. Instruction Level Parallel Processing: Architectures and
Code Generation
Speaker: Henk Corporaal
heco@cs.et.tudelft.nl
http://cs.et.tudelft.nl/~heco
Target Audience:
This tutorial is meant for all people interested in computer
architecture beyond the introductory level; especially for those who
are interested in new trends in computer architecture with the
emphasis on architectures exploiting instruction level parallelism
and in advanced code generation techniques for these architectures.
In order to enhance understanding a large introductory part has been
included covering current trends in computer architecture, and
treating principles of operation and design of instruction level
parallel processors.
The tutorial may also interest people involved with research in
embedded system design. ASIC designers will be interested, because
VLIWs and TTAs offer an attractive alternative to dedicated,
non-programmable, hardware solutions. Very short design times are
possible through the automatic design space exploration, and the
automatic generation of hardware and software for arbitrary
applications written in a high level language.
Assumed Background:
Length: full day
About the Instructor:
Henk Corporaal is Associate Professor in Computer Architecture at the
Delft University of Technology (TUD). At this university he initiated
and managed several research projects in the areas of computer
architecture, processor hardware design, parallel processing, MIMD
processor design, and concurrent simulation of VHDL. One of his key
projects, the MOVE project, concerns the automatic generation of
hardware and software for embedded systems (see:
http://cs.et.tudelft.nl/MOVE).
Dr. Corporaal has a Ph.D. in Electrical Engineering from the TUD and a
M.Sc. in mathematics and physics from the University of Groningen
(the Netherlands). He has extensive teaching experience, both at
college and university level. Currently Dr. Corporaal teaches
undergraduate, graduate and postgraduate courses on computer
programming, computer architecture and parallel processing at the TUD
and the Advanced School for Computing and Imaging (ASCI).
Corporaal has written a range of publications in different areas,
including computer architecture, embedded system design, design
automation, non-conventional architectures, garbage collection and
runtime support for high level languages, MIMD computing and
communication support, concurrent simulation, neural networks, and
compilation for instruction level parallel architectures.
Outline of the tutorial:
Introduction
============
The material of this tutorial covers a full day split into two parts,
one on processor design and the other on techniques for code
generation exploiting instruction level parallelism. Each part takes
approximately three hours.
Part of the covered material is taken from the following book:
Microprocessor Architectures:
from VLIW to TTA
author: Henk Corporaal
publisher: John Wiley
due September 1997
Handouts and copies of slides will be made available.
Below follows a summary of the tutorial, an indication of the intended
audience, and a contents overview of the tutorial. Finally a short
curriculum vitae is included.
Tutorial summary
================
A computer architect must decide what functionality a computer
system, and in particular its processor, has to support: is hardware
support needed for complex functionality, or does hardware support for
basic (simple) operations suffice? In the latter case more complex
functionality is handled by one or more layers of software, like the
operating system, special libraries, and high level languages.
Historically, this question has been answered in different ways. In
the seventies there was a strong tendency to support as much
functionality as possible, leading to the well-known complex
instruction set computers (CISCs). In the eighties computer architects
became more aware of taking the whole mapping trajectory from
applications to hardware into account, while performing a quantitative
analysis of cost and performance. This resulted in reduced
instruction set computers (RISCs), which pipelined the execution of
instructions.
The nineties again show a tendency towards more complex
processors. VLSI developments enable the design of processors
containing many concurrently operating function units; these
processors exploit the instruction level parallelism (ILP) which is
available in any program. Two important classes of ILP processors are
superscalars and very long instruction word processors (VLIWs).
Superscalar processors, which are now commonly used in PCs and
workstations, become very complex because of their support for
issuing multiple instructions per cycle and speculative execution.
In particular, the decoding and control logic of superscalars that use
out-of-order execution, such as the Pentium Pro, the MIPS R10000,
PowerPC 620, HP 8000, and the Alpha 21264, is getting very complex and
may require multiple stages of the instruction pipeline. These
processors are therefore expensive and oriented towards the (high end)
general purpose market; they are less suitable for low cost embedded
systems.
VLIWs circumvent the complex decoding and control logic and its
associated cost, at the price of not being binary code compatible with
predecessor architectures. VLIWs have better scaling properties. In
addition they are more flexible; their functionality can easily be
adapted to specific applications. Therefore they are good candidates
for high performance low cost embedded systems. Several interesting
and very powerful VLIWs have reached the (multimedia) market, such as
the TriMedia from Philips, the Mpact from Chromatic, and the TMS320C6x
from Texas Instruments. Despite their good properties, the data path of
VLIWs is still too complex, in particular when they are scaled to high
performance. This makes it interesting to look at alternative
architectures which avoid this complexity but keep the good properties
of VLIWs.
One such alternative is the transport triggered architecture (TTA)
which has extensively been researched within the MOVE project at the
Delft University of Technology. TTAs combine a number of interesting
properties, like functional flexibility, high performance
scalability, and design modularity, which make these architectures
particularly suited for being applied within embedded systems.
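As a rough illustration of the transport triggered idea: in a TTA the
program specifies data moves between function unit ports and
registers, and an operation starts as a side effect of a move to a
trigger port. The sketch below expands one three-address operation
into such moves. The port names (operand, trigger, result), the
function name to_transports, and the one-unit-per-opcode naming are
assumptions made for this example; they are not taken from the MOVE
project tools.

```python
def to_transports(op, dst, src1, src2):
    """Expand `dst = op(src1, src2)` into TTA-style data transports.

    Hypothetical model: one function unit per opcode, named after the
    opcode, with an operand port, a trigger port, and a result port.
    """
    fu = op
    return [
        (src1, f"{fu}.operand"),  # operand move: latches the first input
        (src2, f"{fu}.trigger"),  # trigger move: latches the second input and starts the operation
        (f"{fu}.result", dst),    # result move: copies the outcome to the destination
    ]


moves = to_transports("add", "r3", "r1", "r2")
for source, destination in moves:
    print(f"{source} -> {destination}")
```

Because each move is a separate schedulable unit, a compiler for such
a machine has more freedom than one that schedules whole operations.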
ILP architectures can only become successful when new compilers become
available which automatically detect and exploit the parallelism which
is available in sequential programs written in high level languages
like C, C++ and Fortran. For this purpose advanced scheduling
techniques have been developed which reorganize the code such that
many operations can be executed in parallel.
This tutorial treats the developments within the world of ILP
processing; it gives the audience an up-to-date overview. It consists
of two parts. The first part concerns ILP architectures and
processors. It discusses recent trends in computer architecture and
describes several recent superscalar and VLIW processors. It gives
an in-depth analysis of the complexity of these processors, and
analyzes methods which reduce this complexity. TTAs are described as
an example which resolves many of the ILP complexity
bottlenecks. The first part also shows how ILP architectures can be
automatically tuned to different applications, making them
attractive to the embedded system designers. Finally several hot
topics on ILP processor design are discussed.
The second part describes compilation techniques for ILP
processors. It starts with an introduction to the compilation
trajectory and to compiler developments, in particular
parallelization techniques. It then treats different scheduling
techniques which exploit ILP, ranging from basic block scheduling
techniques to advanced software pipelining methods. This part also
pays attention to specific issues when scheduling for TTAs. TTA
schedulers are somewhat different because they schedule data
transports instead of operations. At the end several hot topics in the
area of code generation for ILP processors are discussed.
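To make the notion of basic block scheduling more concrete, here is a
minimal sketch of greedy list scheduling with a critical-path
priority. It is one common textbook variant, not necessarily the one
presented in the tutorial. The operation names, latencies, and the
issue width are invented for the example, and the code assumes that
every operation depends only on operations earlier in program order.

```python
def list_schedule(ops, deps, latency, width):
    """Assign a start cycle to every operation of one basic block.

    ops     -- operation names in program order (a topological order)
    deps    -- dict: operation -> list of operations it depends on
    latency -- dict: operation -> cycles until its result is available
    width   -- number of operations that may issue per cycle
    """
    successors = {op: [] for op in ops}
    for op, preds in deps.items():
        for pred in preds:
            successors[pred].append(op)

    # Priority: longest latency-weighted path from the operation to the
    # end of the block (critical path heuristic).
    priority = {}
    for op in reversed(ops):
        tail = max((priority[s] for s in successors[op]), default=0)
        priority[op] = latency[op] + tail

    start = {}
    unscheduled = list(ops)
    cycle = 0
    while unscheduled:
        # An operation is ready once all its inputs have been produced.
        ready = [
            op for op in unscheduled
            if all(p in start and start[p] + latency[p] <= cycle
                   for p in deps.get(op, []))
        ]
        ready.sort(key=lambda op: -priority[op])  # stable: ties keep program order
        for op in ready[:width]:
            start[op] = cycle
            unscheduled.remove(op)
        cycle += 1
    return start


# Hypothetical block computing st(add(mul(ld_a, ld_b), ld_c)).
ops = ["ld_a", "ld_b", "ld_c", "mul", "add", "st"]
deps = {"mul": ["ld_a", "ld_b"], "add": ["mul", "ld_c"], "st": ["add"]}
latency = {"ld_a": 2, "ld_b": 2, "ld_c": 2, "mul": 2, "add": 1, "st": 1}

wide = list_schedule(ops, deps, latency, width=2)
narrow = list_schedule(ops, deps, latency, width=1)
print(wide)
print(narrow)
```

Comparing the two schedules shows how a second issue slot shortens the
block: the two loads on the critical path issue together, so the
dependent multiply can start one cycle earlier.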
Tutorial Contents
=================
Part I : Instruction level parallel architectures
-------------------------------------------------
1. Introduction
o Motivation
o Architecture design goals
2. Trends in computer architecture
o Performance developments of microprocessors
o VLSI developments
o Architecture developments
o Architecture design space: the big picture
3. Instruction Level Parallel Processors
o Superscalar examples like: HP-PA8000, Alpha 21264, MIPS
R10000, Pentium Pro
o VLIW examples like: TriMedia, Mpact, TMS320C6x
4. Complexity of ILP processors
o Control complexity
o Data path complexity
5. Complexity reduction techniques for ILP architectures
o Reducing control complexity
o Reducing data path complexity
6. Transport Triggered Architectures: a solution to the complexity
bottlenecks
o TTA programming concepts
o TTA design space:
- FU interconnection network
- Pipelining issues
- Register units
- Exception handling
7. A framework for automatic processor design for arbitrary applications
o System synthesis: let applications define architectures
o Intelligent search of the architecture design space
o Automatic processor generation
o Automatic code generation
8. Hot topics in ILP processor design
9. Summary and Conclusions
Part II : Compiling for instruction level parallel architectures
----------------------------------------------------------------
1. Introduction
o How much parallelism exists in applications?
o How much parallelism is exploitable by ILP scheduling
techniques?
2. Compiler developments
o Compilation trajectory
o Parallelization techniques
3. Basic block scheduling
o Classification of different methods
o List scheduling
o Useful heuristics
4. Extended basic block scheduling and Speculation
o Scheduling scope
o Trace scheduling
o Region scheduling
5. Software pipelining techniques
o Different methods
o Modulo scheduling
o Enhanced software pipelining
6. TTA specific code scheduling issues
o Scheduling transports instead of operations
o Exploiting TTA specific optimizations
o Scheduling for different word-sizes
7. Hot topics in ILP scheduling and code generation
8. Summary and Conclusions
4. Distributed Shared Memory: Concepts and Systems - The 1997 Update!
Speaker: JELICA PROTIC + MILO TOMASEVIC + VELJKO MILUTINOVIC
vm@etf.bg.ac.yu
Target Audience:
Assumed Background:
Length:
About the Instructors:
Ms. Jelica Protic received her B.S. and M.S. degrees from the
University of Belgrade, and is now working towards her Ph.D.
degree in the field of distributed shared memory (final stages).
Her general field of interest covers various issues of importance in
multiprocessor and multicomputer systems, computer architecture
and performance evaluation in general. She is heavily involved
in both academic research and industrial development, and she
worked on several projects related to distributed systems
implementations.
Prof. Dr. Milo Tomasevic received his B.S. in electrical
engineering, and M.S. and Ph.D. in computer engineering, from
University of Belgrade, in 1980, 1984, and 1992, respectively.
He is now with the University of Belgrade, and also associated
with IFACT (Institute for Advanced Computer Technology),
a research and development organization spread over
several countries, with strong business ties in the USA, Europe,
and Far East. There he was involved in several large research
projects (Encore, TDT,...) in the area of cache memory and
related issues. His Ph.D. thesis research (sponsored by NCR)
dealt with the cache coherence problem in shared-memory bus-based
multiprocessors. His current research interests cover computer
architecture and distributed shared memory multiprocessor
systems. He has received awards for some of his conference
papers.
Prof. Dr. Veljko Milutinovic is with the University of Belgrade,
and also associated with the IFACT (Institute for
Advanced Computer Technology), since 1989. Before that, he was
on the faculty of the School of Electrical Engineering, Purdue
University, West Lafayette, Indiana, USA (since 1982). He has been
active in the RISC field for about a decade, both in
technology related research (32-bit GaAs RISC for RCA) and in
application related research (multimedia oriented RISC based
multiprocessor efforts of NCR). He has published about 50 papers in
IEEE journals, and presented over 200 invited talks all over the
world. He received awards for some of his conference papers.
Some of his IEEE Computer Society Press authored/edited books
were best sellers in the past. He is one of the most widely referenced
researchers in the textbooks on the general field of computer
architecture. He was the project leader for a
number of DSM designs, and related research studies, for some of
the leading US and Japanese companies.
Outline of the tutorial:
INTRODUCTION
------------
This one-day (six-hour) tutorial is based on the IEEE tutorial
book of the same title, by the same authors/editors
(see the reference below). A half-day (three-hour) version is also
available. The presentation follows closely the organization of
the book, and if so desired, each attendee can obtain a copy of
the book. The book is scheduled to be published before March 97.
The tutorial speakers were involved in the research,
development, or actual VLSI design of cache coherent DSM systems
for companies like NCR, Encore, Unisys, Marubeni, and similar
(related examples are included in the presentation). In a pioneering
effort (back in 1993), they designed
a board which enables a personal computer to serve as a node in a DSM
system based on the reflective memory approach. Also, they have
published several papers in the field, of which two received
awards at international conferences, two appeared in the IEEE
MICRO magazine (October 94 and December 94), two in the IEEE COMPUTER
magazine (March 95 and December 95), and two in the IEEE PDT magazine (1996).
The three authors have extensive experience with pre-conference and
in-house tutorials; between them, they have
presented over 100 tutorial talks (a list of their most recent
tutorial presentations is available on request).
This tutorial is complemented by another full-day
(six-hour) tutorial on hardware solutions to cache consistency
in shared-memory multiprocessor systems, by Tomasevic and
Milutinovic (corresponding to an IEEE tutorial book which is
already published, and represents a best-seller of the IEEE CS
PRESS - see the reference below).
CONTENTS
------------
Introduction to DSM concepts and algorithms (distributed shared
memory coherence problem, global overview). Classification
criteria in the field (according to the type of DSM
implementation, DSM management, and DSM algorithm). Relevant
classification parameters (layout of shared data, granularity,
consistency mode, interconnection network, cache configuration).
DSM algorithms (SRSW, MRSW, and MRMW).
Special emphasis is devoted to memory consistency models
(sequential, processor, weak, release, lazy release, entry,
AURC, scope, ...) and their implementations. DSM implementations on
the hardware level (basic concepts, examples: DASH, SCI, KSR1, DDM,
Memnet, RMS, etc.). DSM implementations on the software level
(basic concepts, examples: Ivy, Munin, Mirage, Amber, Linda,
TreadMarks, etc.). DSM hybrid approach (basic concepts, examples: PLUS,
Paradigm, Lynx/Galactica Net, LimitLESS Directories, Flash, Shrimp, etc.).
Evaluation of DSM coherence schemes (analytic
and simulation methods, real and synthetic workloads, comparison
of hardware and software schemes). State-of-the-art research at leading
universities (Stanford, Princeton, etc.), or industry (STiNG, DEC, etc.).
All case studies are presented using the same presentation template,
so the subject is easy to follow and comprehend. Also, numerical
examples are included, so engineers feel good when they walk out
of the conference room.
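As a small illustration of the MRSW (multiple reader, single writer)
class of DSM algorithms named above, the following sketch models a
write-invalidate scheme in a single process: any number of nodes may
hold a read copy of a page, and a write invalidates all other copies
first. It is a toy model written for this announcement, not code from
the tutorial book. The class name MrswPageTable, its methods, and the
page and node identifiers are all hypothetical.

```python
from collections import defaultdict


class MrswPageTable:
    """Toy model of MRSW write-invalidate coherence for DSM pages."""

    def __init__(self):
        self.value = {}                 # page -> current contents
        self.copies = defaultdict(set)  # page -> nodes holding a valid copy
        self.owner = {}                 # page -> node that wrote last
        self.invalidations = 0          # invalidation messages sent so far

    def read(self, node, page):
        # Reading replicates the page: the node joins the copy set.
        self.copies[page].add(node)
        return self.value.get(page)

    def write(self, node, page, data):
        # Before writing, every other valid copy must be invalidated,
        # so that only the single writer holds the page.
        stale = self.copies[page] - {node}
        self.invalidations += len(stale)
        self.copies[page] = {node}
        self.owner[page] = node
        self.value[page] = data


table = MrswPageTable()
table.write(0, "p", 1)           # node 0 becomes the owner of page p
first = table.read(1, "p")       # nodes 1 and 2 obtain read copies
table.read(2, "p")
table.write(1, "p", 2)           # invalidates the copies at nodes 0 and 2
second = table.read(2, "p")      # node 2 fetches the page again
print(first, second, table.invalidations, sorted(table.copies["p"]))
```

The invalidation counter gives a feel for the cost that the choice of
page granularity and consistency model tries to reduce.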
REFERENCES
----------
J. Protic, M. Tomasevic, V. Milutinovic, "Distributed Shared
Memory: Concepts and Systems," IEEE Computer Society Press, Los
Alamitos, California, USA, 1997 (in production).
M. Tomasevic, V. Milutinovic, "Cache Coherence in Shared Memory
Multiprocessors: Hardware Solutions," IEEE Computer Society
Press, Los Alamitos, California, USA, 1993 (a best seller).