Towards Wide-Area Network Piranha: Implementing Java-Linda Andrew Smith Advisors: Profs. David Gelernter and Nick Carriero  
Introduction

The Piranha model of parallel computation allows the use of idle computer cycles for parallel computation, and is described by Kaminsky as follows:

"Adaptive parallelism refers to parallel computation on a dynamically changing set of processors: processors may join or withdraw from a computation as it proceeds...In the Piranha model of adaptive parallelism, idle network nodes are used to run explicitly parallel programs. Idle nodes host transient processes called Piranha. A Piranha process is started automatically when a node becomes idle and continues while it remains so. When a node becomes busy, its Piranha process executes a short retreat function and exits. One feeder process is created to manage the ongoing computation. The feeder does not retreat...Piranha processes transform a distributed data structure describing work into one describing results. Distributed data structures are stored in a Linda tuple space where they are accessible to all participating processes. A Piranha programmer writes a program by specifying three coordination functions---feeder, Piranha, and retreat---and an arbitrary number of computation functions. The coordination functions define the interaction among the processes. The computation functions perform the task at hand..."

The Piranha system as currently implemented is written in C and allows users to utilize idle cycles on a set of homogeneous UNIX workstations connected in a local area network (LAN) in order to run a parallel program in a master/worker style. Low intensity daemon processes are run on machines which will be hosting Piranha parallel programs; these daemons periodically check for application submission, monitor node usage (i.e., to determine if a user is back and thus the Piranha processes should be forced to retreat), etc. Piranha users must have accounts on all machines hosting Piranha (i.e., to do rsh/rcp commands on those machines) and all the machines must have access to the same file system where Piranha executables are stored. Piranha has been tested on non-Solaris Sparcstations and IBM RS6000's, and a limited amount of heterogeneity is supported in that Piranha applications can be run on these two architectures at the same time.

Key directions for Piranha development include heterogeneity and executable distribution (i.e., running Piranha parallel programs on a collection of heterogeneous computers, such as various types of UNIX workstations, Macintoshes, PC's, and possibly even specialized devices containing special chips/controllers, and how executables for the different architectures should be distributed), data portability (i.e., making certain that data passed between Piranha processes, possibly running on a heterogeneous collection of computers, is always interpreted correctly), Wide Area Network Piranha ("WAN Piranha") (i.e., allowing Piranha processes to run on machines not only in a LAN, possibly crossing organizational boundaries and spawning a new business of "selling idle cycles") and security (i.e., making certain Piranha cannot be used for malicious purposes, such as stealing data, deleting files, etc.---this will be especially important for the successful development of WAN Piranha). In addition, fault tolerance (i.e., allowing Piranha computations to continue and Piranha data structures to remain consistent in the presence of node/process failures) and scheduling (i.e., determining the number and location of nodes on which to run a certain Piranha computation, and partitioning the tasks among those nodes in the most effective way) are also very important. Of course, many of these directions are interrelated.

My long-term research goals are to continue development of Piranha by pursuing the key directions stated in the previous paragraph. As an important start, this paper proposes that future development of Piranha should be done in the Java language. The paper is organized into two major parts. The first part motivates why continuing development of Piranha in Java is beneficial. In particular, the paper discusses key features of Java that will solve, simplify, or at least provide a definite, coherent framework for dealing with the problems that will be encountered in trying to achieve the key directions of the previous paragraph. The paper then considers an important possible drawback to Piranha development in Java, base Java performance. The second major part of the paper describes the design and implementation (to date) of the Linda coordination and communication language in Java; the results of performance tests of the implementation will also be given. While the ultimate research goal is Piranha in Java, Linda plays a central role in Piranha; Linda is the "glue" which coordinates the Piranha parallel processes and allows them to communicate, and developing Linda in Java is an important prerequisite to developing Piranha in Java. In addition, such an implementation of Linda in Java is practically useful in itself.

It is important to stress before continuing that the key directions for Piranha discussed above are significant technical challenges. The effort described in this paper does not address all of these directions. The important accomplishment of the work described in this paper is to solve the two most pressing directions in an elegant way which lays a solid foundation for dealing with the remaining directions. In particular, continuing development of Piranha in Java solves the problems of heterogeneity and executable distribution, and data portability. In addition, Java’s comprehensive security features will provide an excellent framework for dealing with the important security requirements for WAN Piranha. Finally, the elegant solution Java provides for heterogeneity and executable distribution and data portability will allow scheduling and fault tolerance to be dealt with in a more consistent and coherent way. For example, Java’s "write once, run anywhere" property means that only a single executable for Java-Piranha applications will need to be created and this executable will run on all platforms; in addition to easing the burden on Java-Piranha application developers, this means that future scheduling algorithms will not have to deal with distributing multiple executables: the same executable can be distributed to all machines. Scheduling and fault tolerance will be significant challenges, but I do have a general strategy to deal with them. WAN Piranha will have to schedule effectively in an environment where 1) the availability of the computing nodes for the entire run of the computation is not certain, 2) the range of computing devices available and their capabilities will vary greatly, and 3) interprocess communication time will vary greatly. Some powerful heuristics will have to be invented to handle scheduling in such an environment, but the overall framework for dealing with the problem is the Network Trellis. This is another application of Linda tuplespace technology created by the Linda group. Its main purpose is as a system for real-time data fusion, where high-level views of data can be created in real-time by propagating low-level data streams up through a number of levels in a hierarchy; each level synthesizes the data streams below it and passes its results on to higher levels --- the updating of levels is handled in parallel to the extent possible. The main idea for how the trellis can be used for scheduling in WAN Piranha is that the data needed to make scheduling decisions is arranged hierarchically and a trellis can be constructed from the levels of this hierarchy. The lowest-level data is individual machine data, such as machine load, user activity, availability of memory, etc. The next level up is a local area network, where overall statistics about the network as a whole are synthesized and stored: number of idle machines, average load, etc. Higher levels would be created for medium area networks and wide area networks, using the results from lower levels.The trellis should also be useful as a framework for addressing fault tolerance.

Heterogeneity, Executable Distribution and Data Portability

The current Piranha prototype is written in C to be run on UNIX workstations; in fact, it is more restricted than that: it is written in C to be run on SunOS machines and IBM RS6000's---some effort would be required to port to machines running different versions of UNIX. Putting aside for the moment the goal of extending Piranha to WANs, even in a LAN there are 2 big problems with trying to extend the system as it is now to run in a heterogeneous environment:

(1) Executable distribution

The C language code for the current Piranha system would have to be modified, compiled and linked on each architecture on which it is desired to have Piranha run. The modifications necessary to the code to get it to run even on different flavors of UNIX would probably be somewhat difficult; and the port to Macintoshes and PCs would likely be quite complicated and time-consuming. Finally, a port to other computing devices (such as "settop boxes") might be impossible. In addition, while it would require much effort to port the Piranha system to various architectures, at least it would be a one-time effort. Unfortunately, the Piranha application programmer, in writing his piranha, feeder, retreat, and other functions, would have to take into account the different architectures his program might run on and create a separate version for each one; also, there would need to be a mechanism to distribute the different executable versions to their corresponding architectures.

(2) Data portability

Piranha applications need to exchange data among themselves through tuple space. However, when Piranha applications written in C are run on a heterogeneous collection of nodes, correct exchange of data becomes a problem. The problem is that different machines define different, incompatible formats for the way they represent data. Thus, when running Piranha in such a heterogeneous environment there must be some facility for translating data from one machine's format to another's. There are some protocols and software, such as XDR, for helping in this translation, but it is not a completely robust solution, since not all platforms support it. In addition, some effort would be required to incorporate such translation code into the various versions of Piranha.

The Java programming language platform has lately attracted much attention as a new and better language, especially for creating network/internet applications. In fact, by its nature, if the Piranha system were rewritten in Java the problems of heterogeneity, executable distribution and data portability would cease to be problems at all---the Piranha system, including code for tuple space, and end users' Piranha applications would only have to be written in Java once and would be executable as is on any platform which supports Java (there are many, and likely more to come). An excerpt from the Java white paper tells why:

"Java is designed to support applications that will be deployed into heterogeneous network environments. In such environments, applications must be capable of executing on a variety of hardware architectures. Within this variety of hardware platforms, applications must execute atop a variety of operating systems and inter operate with multiple programming language interfaces. To accommodate the diversity of operating environments, the Java compiler generates bytecodes-an architecture neutral intermediate format designed to transport code efficiently to multiple hardware and software platforms. The interpreted nature of Java solves both the binary distribution problem and the version problem; the same Java language byte codes will run on any platform......Architecture neutrality is just one part of a truly portable system. Java takes portability a stage further by being strict in its definition of the basic language. Java puts a stake in the ground and specifies the sizes of its basic data types and the behavior of its arithmetic operators. Your programs are the same on every platform-there are no data type incompatibilities across hardware and software architectures."

Wide Area Network Piranha

The Piranha system as currently implemented runs only on a local area network of homogeneous machines. In the ideal future Piranha system, users will have access to more machines than just those on their local area network for parallel applications --- ideally, all of the heterogeneous nodes connected to the internet; Piranha applications will thus be able to span wide area networks, if necessary to enhance performance, instead of being limited to local area networks as at present. Organizations will be able to rent the idle cycles of their computers for Piranha computations. There will be significant challenges in achieving such a system (as discussed previously). However, three central problems will clearly be heterogeneity and executable distribution, data portability (i.e., WANs are generally composed of many types of computers, having differing formats for data), and security (discussed in the next section). As discussed previously, the first two problems are completely eliminated by switching development to Java. In addition, since Java is a language especially suited to network and internet computing, security is a central part of its design; thus, while not eliminating the security problem for Piranha, it does provide a clear framework and mechanism for dealing with it.

Security

In the ideal Piranha system which allows Piranha computations to spread across wide area networks, possibly spanning many different organizations' computers, insuring security will be of prime importance. Ideally, such a Piranha system should insure the following two symmetric security rules:

1) Organizations which allow their computing resources to be used for Piranha computation by outside sources should not have their private, internal resources (e.g., top-secret files, etc.) compromised by the Piranha processes.

2) Likewise, Piranha application programmers in some organization who submit Piranha processes to be run on the computational resources of other organizations should be guaranteed that the data/code that their Piranha application is operating on (which might be sensitive data for their organization) is not accessible to the other, Piranha-hosting organizations.

To make the importance of these two rules more apparent, consider the following example. Ford and Chrysler are two competing automotive corporations and say that Ford wants to use Piranha parallel computation to speed up computations on the design of its new top-secret sports car. Chrysler wants to make sure that any Piranha processes from Ford running on Chrysler's computers cannot access sensitive Chrysler data, such as data about similar Chrysler sports cars, etc. Likewise, Ford wants to be sure that Chrysler cannot access the data its Piranha processes are operating on, and possibly even that the source code for its Piranha processes and computations being carried out cannot be accessed by Chrysler. In this example, failure to uphold the two security rules above could provide a way for Ford or Chrysler to obtain sensitive information from the other.

Java has the advantage that security concerns were directly considered and mechanisms for insuring security are an integral part of the language at several different levels (please see http://daffy.cs.yale.edu/java/java_sec/java_sec.html for an overview of Java's security features). However, it should be noted that Java's current security features concentrate on providing security of rule 1's type. Java security prevents rogue applets from outside the organization from compromising the resources of the organization, but does not protect the data/code of the applet from being accessed by the organization. In other words, Java security protects the computer hosting the applet from the applet but does not protect the applet from the computer hosting the applet. It is not clear how such protection for applets could be added to Java, or even if it is possible or necessary for application domains other than Piranha. It seems like it could be an interesting direction for future research in Java security; possibly some method for "encrypted computations" (in addition to the more traditional notion of encrypted data) would be necessary for this. Nevertheless, for the present much work has gone into Java's security mechanisms (and, given Java's continuing gain in popularity and widespread use, much more will probably be carried out) and it has many desirable security features which will provide a clear and effective means to deal with much of the security required by Piranha.

Java Performance - A possible caveat

Java has many nice features as a programming language, especially for network/internet applications like Piranha, and incorporates many desirable features of modern programming languages and methodologies. These include object oriented design (to ease client/server application creation, support software reuse, etc.), strong compile-time type checking, automatic garbage collection (to eliminate many troublesome memory problems, such as dangling pointers), and array bounds checking. However, this paper is proposing to implement Piranha (and Linda, as a first step), a system specifically for achieving high performance through parallelism, in Java, which is at present not a language for high performance computing. This presents a possible strong argument against developing Piranha in Java, and this section addresses this argument.

Java, as it was initially described and implemented, is an interpreted language. Java programs are compiled to a machine independent object file format (Java byte code), which can then be interpreted on any machine which runs the Java virtual machine. This has many advantages but, unfortunately, interpreted code is considerably slower than compiled C or C++ code---there are estimates of 10 to 50 times (on average about 20 times) slower than compiled C or C++. Since the whole goal of Piranha is to make programs run faster through parallelism, it seems somewhat counter intuitive to be thinking about making Piranha parallel applications 10 to 50 times slower by moving to Java over C.

The issue is that for high-performance, parallel computing people want to use the fastest base language that they can, and C and FORTRAN are currently such languages. As stated above, Java is currently 10 to 50 times slower than compiled C (and probably a similar result for FORTRAN holds). Is this the end of the story, however? Is Java always going to be so inefficient compared to C and FORTRAN? The answer to these questions is no. There is active work in the Java community on developing "just in time" compilers; these work by translating the Java byte codes to native machine code on the fly (at runtime). Sun's Java team makes the claim that "in interpreted code we're getting about 300,000 method calls per second on a Sun Microsystems SPARCStation 10. The performance of bytecodes converted to machine code is almost indistinguishable from native C or C++" in their White paper describing Java . Also, various Web pages describe other Java "just in time" compilers for PCs which give roughly C or C++ level performance (i.e., on some commonly known benchmarks). Finally, a chapter in the book "Using Java" gives an overview of "just in time" compilers and evaluates currently available ones. In particular, the latest versions of Netscape Navigator and Microsoft Internet Explorer both use "just in time" compilers within their Java interpreters and the chapter in "Using Java" has graphs showing the speedups of the "just in time" compiler enhanced Navigator and Internet Explorer as compared to older versions of Navigator and Internet Explorer (without "just in time" compilers) on a suite of benchmarks designed to measure various aspects of Java performance (specifically, the CaffeineMark benchmarks testing loops, string manipulations, method invocations, floating point operations, logic, image, graphics, and dialog operations --- test your own personal browser/machine at http://www.webfayre.com/pendragon/cm2/caffeinemark2.html). The results are dramatic improvements (more than 10 times speedup) on most of the benchmarks. Thus, it seems that C or C++ performance (or, at least, close to it) is possible with Java. Java is also a very young language; if its popularity doesn't wane, there will be much more work on such "just in time" compilers for the various architectures which run Java, and their performance will continue to improve.

Even if Java's performance never quite matches that of compiled C or C++, however, it seems that the benefits of Java far outweigh the small decrease in performance. In other words, even though the small decrease in performance is like taking a step back, the advantages discussed above are like taking a giant leap forward. Also, there are likely many people who use and will continue to use Java because it is the best fit to their application needs (e.g., world wide web applets); although Java might not be the ideal language for parallelism in general, such Java users could still benefit greatly from parallelism if Piranha were available to them.

Implementing Linda in Java as a First Step

The preceding sections described my long-term research goals of continuing development of Piranha, and give the context for the actual, concrete work done for my CS690 project to be described next. In particular, arguments were given for why it would be greatly advantageous to do further development of Piranha in the Java language. The Linda communication and coordination language plays a central role in Piranha by coordinating the Piranha parallel processes and providing the mechanisms for them to communicate through tuplespaces. Thus, adding Linda’s coordination and communication primitives to Java is an important and necessary prerequisite to development of Piranha in Java. Furthermore, just as Linda is practically useful and widely used apart from Piranha (for normal, non-adaptive parallel computations), creating Java-Linda would similarly be quite useful apart from Java-Piranha. Thus, as a first step I have designed and begun implementation of Linda in Java . In fact, this design and implementation of Java-Linda is the chief tangible result of my CS690 project. In the rest of the paper I will describe the high-level design of Java-Linda, the relevant details of a proof-of-concept implementation of it incorporating most of Linda’s basic features, the results from performance tests of the implementation, and future implementation plans.

Since Java-Linda itself is quite a large project, I took a divide-and-conquer approach and qualified the scope of the project for the time being as follows:

1) Focus on the Linda runtime system for now and leave compile-time analysis for later. Compile-time analysis of Linda Programs is necessary for efficiency and works by limiting runtime tuple matching through partitioning tuples into disjoint sets (whose tuples cannot intermatch) and determining hash codes for tuples. The runtime system uses information from compile-time analysis to handle tuple storage and matching efficiently, but it is not strictly necessary. My approach for the present is to assume that the user adds compile-time information manually if desired for efficiency (i.e., performs the compile-time analysis "by hand" --- this is described below). Future implementation efforts will concentrate on compile-time analysis (possibly using the Java compiler developed by Prof. Zhong Shao and his FlintJava project group) if the runtime system is found to have sufficiently good performance.

2) Focus on developing a single-server version. Multiple servers allow tuples to be spread among participating nodes based on hash codes, and give better efficiency through less runtime contention for any single server. However, multiple servers also add many complications, such as requiring more sophisticated distributed algorithms. Since my goal is quick implementation of a proof-of-concept system which can demonstrate the utility of Linda in Java, I will put off a multiple-server version for later.

3) Focus on developing the necessary Linda for Piranha. Since the purpose of building Linda in Java is for future development of Piranha in Java, it is not necessary to adhere strictly to the Linda model where it is not necessary for Piranha (and possibly cumbersome implementation-wise). For example, standard Linda specifies that evals turn into normal tuples upon completion of computation; this isn’t necessary for Piranha and we don’t support it here. However, since, as stated previously, Java-Linda will be useful in itself I will add full Linda support in the future.

4) Use TCP for simplicity of communication. UDP is generally used in production Linda systems due to its better performance. However, UDP is also much more cumbersome to program and thus TCP will be used to speed implementation. UDP support will be added later if performance requires it.

Related Work

There are a few other projects similar to the one described here that are implementing coordination and communication languages like Linda in Java. Importantly, however, the focus of most of these projects is on distributed computing, where efficiency, while important, is not as central as in parallel computing. The project most similar to ours is Sun’s own JavaSpace project. JavaSpace is Sun’s implementation of a Linda-like model for distributed computing. JavaSpace is built upon two other related projects of Sun’s. These are Remote Method Invocation ("RMI") and Object Serialization. RMI, which is Sun’s version of remote procedure calls for Java, allows the methods of objects residing on different Java virtual machines (possibly running on another machine) to be called. Object Serialization allows the creation and retrieval of serialized, byte-stream representations of objects; this is useful, for example, for object persistence: the serialized version of an object can be stored in a file and later read in and turned into the appropriate object. Java-Linda differs from JavaSpace in that it has multiple servers (eventually, anyway), uses hashing to limit runtime tuple matching for efficiency, uses (eventually) UDP for efficient communication (JavaSpace uses RMI, which uses TCP), and performs tuple matching differently (more faithful to the semantics of Linda and the object oriented nature of Java, I believe). We will make use of Object Serialization, as JavaSpace does, for sending objects (tuples) between participating nodes of the computation. Finally, Rob Bjornson’s 1993 Ph.D. thesis (and earlier ideas from Nick Carriero’s 1987 Ph.D. thesis) described and demonstrated how to build a high-performance network Linda system using hashing and tuple partitioning; these methods will be adapted for Java-Linda.

A Crash Course in Building Tuplespaces

Linda tuplespaces are implemented efficiently in such languages as C and FORTRAN by doing compile-time analysis of Linda source programs. The compile-time analysis gains efficiency by two primary means: tuple partitioning and hashing. Tuple partitioning works by partitioning tuples into disjoint sets, where members of a set might possibly match other tuples in that set at runtime but could never match tuples in other sets. The partitioning is done based on such aspects of tuples as the number of fields and the types of fields. For example, consider the following two Linda statements that could appear in some Linda program:

                    out("tup1",4,"hi",3.75);

                    in(6.32,"string","string2");

Here, we can clearly see that this in could never match the out (the types of the fields and the number of fields are different) and this can be determined at compile time. In general, tuples are only matched at runtime against other tuples in their tuple set. In addition, hashing is used to reduce tuple matching within tuple sets and the basic idea is easily explained by an example. Consider the following Linda statements:

                    out("tup", 2, 1.7, 3.34);
                    out("tup", 3, 4.27, 6.3);
                    ... ...
                    float x,y; in("tup", 3, ? x, ? y);

Here, the two out tuples can be stored in a hash table which is hashed based on the value of the second field (i.e., the two out tuples will be stored in separate hash buckets corresponding to the values of their second field). Then, in handling the in tuple, tuple matching only needs to be done against tuples stored in the hash bucket corresponding to the value 3 (i.e., the value of the in tuple’s second field). Thus, in this case the in tuple would only be matched against the second out tuple, since they share the same value in their second field (and would thus hash to the same bucket). Thus, in addition to partitioning tuples into disjoint sets, compile-time analysis also determines the fields of tuples which can be used to hash tuples as efficiently as possible. Tuple partitioning and hashing in compile-time analysis as described above are thus the two basic ways that runtime efficiency in tuple matching are attained; this section gave a brief overview of the process and the details can be found in Nick Carriero’s 1987 Ph.D. thesis and Rob Bjornson’s 1993 Ph.D thesis.

Adding Linda to the Object-oriented Language Java

This section describes the high-level design for Java-Linda. In addition, the user-interface to Java-Linda (i.e., how programmers will program in Java-Linda) will be explained indirectly in describing the design (actual Java-Linda example programs are also included in the appendix to show concretely how to create Java-Linda programs). The basic approach taken is to use ideas from C-Linda as much as possible, modifying or extending them for Java-Linda as appropriate. The important difference between Linda in C and FORTRAN and Linda in Java is that Java is an object-oriented language and C and FORTRAN are not. Object-oriented programming is a paradigmatically different, ostensibly better way of doing the business of programming than the structured programming methodology as embodied in C and FORTRAN. Thus, Java’s object-oriented nature must be considered in any proposal to extend Java; any such extension must be consistent with and take advantage of Java’s object-oriented features. Failure to do this could result in the extension being rejected, or not used if implemented. In short, we don’t want users of the extension to think it has "weird object-oriented semantics." Java-Linda takes advantage of Java’s object-orientedness in the way tuples are defined and in the semantics of tuple matching; this is the important difference between C-Linda and Java-Linda.

In Java-Linda, tuple sets are simply classes defined by users and actual tuples are simply instantiated objects of those classes. The only qualification is that the fields must be object types and not primitive types (such as int, float, etc.); this is so that the special object value null can be used to mark formal fields in an in tuple (see below), and does not present a problem in using primitive types since there are object "wrapper" types (such as Integer for int) for each primitive Java type. For example, a user could define a tuple set in Java-Linda by defining a class for it as follows:

                    public class Tuple {
                        Integer i;
                        Float x;
                        Float y;
                    }

and then create an actual tuple by creating an object of this class:

                    Tuple tuple = new Tuple(2, 1.7, 3.34);

This would correspond to the tuple (the tuple class for this would be generated by the C-Linda compile-time analyzer) for the following C-Linda statement:

                    out("Tuple",2,1.7,3.34);

Basic tuple matching in Java-Linda is performed as follows. First, a field in a tuple that has the value null is considered to be formal . Then, the fields of an in tuple and a tuple for which a match is being tested must have the same values, except for formal fields which match automatically. An example will make this clear. Consider the following Java-Linda code:

                    TupleSpace TS = new TupleSpace();
                    Tuple out_tuple = new Tuple(2,1.7,3.34);
                    TS.out(out_tuple);
                    ....
                    ....
                    Tuple in_tuple = new Tuple(2,null,null);
                    Tuple matched_tuple = TS.in(in_tuple);

Here, out_tuple could (there might be other matching tuples) be returned from TS.in, since the first field of out_tuple and in_tuple have the same value and the other two fields of in_tuple are null (and thus match the other two fields of out_tuple automatically).

Note that in our design tuplespaces are objects of a class TupleSpace (this is a Java-Linda system class; see the implementation section below for a detailed description) and tuplespace operations (in, out, etc.) are done by invoking methods of the tuplespace object. One could also envision supporting tuplespace operations as methods of tuple objects (e.g., "out_tuple.out(TS)" instead of "TS.out(out_tuple)" as above). This is unattractive because it would require users to subclass all of their tuple classes from some Java-Linda system class which would provide the methods for tuplespace operations. Users who want to make use of classes they have already written (i.e., originally written for some non-Java-Linda application) will have to go through and change all of these classes to subclass the Java-Linda system class. This would be quite cumbersome, as well as completely unnecessary for the original non-Java-Linda applications using these classes. Furthermore, classes loaded dynamically, for which the user has no control over the class definition, could not be used as tuples. Instead of subclassing, the user could simply encapsulate tuple objects inside extra container classes which subclass the system class; while this would provide a solution for dynamically loaded classes, it is still cumbersome and a needless burden to the user since the user would have to write all of these extra container classes. Our design allows any class to be used as a tuple class without any modifications (it must be serializable, however, but almost all classes are serializable).

Inheritance is handled in Java-Linda as follows. First, classes used for tuple sets can be subclassed. The matching rule is that an in tuple can match any tuple of its own class or that is a subclass of its own class. When matching an in tuple to a tuple of a class that is a subclass of the in tuple’s class, the fields added by the subclass are considered to be formal in the in tuple. The rationale for handling inheritance this way is that it is consistent with the way assignment is handled in Java (i.e., in Java a reference of some class A can be assigned to an object of class A or any of A’s subclasses); tuple matching is, after all, basically a kind of assignment and so this makes sense.

Compile-time analysis revisited

As stated previously, I have left compile-time analysis for future development in order to concentrate on building the runtime system. Nevertheless, if possible it is still desirable to get in some way the performance benefits that compile-time analysis gives. This is done as follows. First, note that partitioning into disjoint tuple classes is done directly by the user when s/he writes classes for tuples (like class Tuple in the examples above). Thus, in fact, compile-time analysis is not needed for this and only the determination of discriminating hash fields needs to be done. For this, the user is required to determine manually the field to be used for hashing and the hash function to be applied to this field. The Java base language provides a Hashtable class and objects are added to Hashtables based on the value given by their hashCode method (i.e., an object O is added to a bucket in a Hashtable corresponding to the value of O.hashCode()). Java-Linda makes use of this Hashtable class to store tuples (see below), and this provides a simple way to allow users to specify fields within tuples to be used as hash fields: the user simply is required to write a hashCode method for each tuple class if desired for efficiency (if the user doesn’t want to supply an optimal hashCode method, a default one which always returns some constant value can be used --- all tuples whose hashCode method is the default are then hashed to the same hash bucket). The user can write his own hash function, whose input is the chosen hash field, as this method; alternatively, since all fields must be object types and primitive types are usually used for hashing, the user can simply call the hashCode method of the chosen hash field (i.e., Java predefines hashCode methods for the primitive wrapper classes, such as Integer.hashCode(), Float.hashCode(), etc.) inside the hashCode method of the tuple class. Note that for out tuples, the hash field will always be available (i.e., not null --- Java-Linda does not currently allow formals in out tuples). In in (and inp, rd, and rdp) tuples, however, the chosen hash field might be null (i.e., formal); the hashCode method must therefore check if the hash field is null and return some "no hash" value if it is. At runtime, if an in (or inp, rd, and rdp) tuple object’s hashCode method returns anything other than "no hash," this value is used to limit runtime matching as described previously. If "no hash" is returned, exhaustive search of all tuples in the same tuple set as the in tuple must be performed in matching.

A compile-time analyzer will eventually be built which will automatically determine the best hash fields and hash functions (i.e., it will fill in automatically what the user is now required to add manually as described in the previous paragraph). It should be noted that building such an analyzer will probably be more complicated than for C or FORTRAN. This is mainly because Java is object-oriented with inheritance and any Java-Linda (which represents tuples as Java classes) compile-time analyzer, must take this into account. The complication is that while in C and FORTRAN Linda it is clear simply by looking at the source text the number and types of fields in a tuple (which at least makes it clear what the possible hash fields could be), the same is not true for Java-Linda; in Java-Linda, a tuple object can reference a tuple class, or any subclass of that class, and thus it is not even directly known exactly the fields available for hashing. Thus, development of the compile-time analyzer should be an interesting research topic. Practically, it will probably be beneficial to work some with Prof. Zhong Shao and his FlintJava group, which built a Java compiler in ML, on the development of the compile-time analyzer. Note that, even without a compile-time analyzer, the system can be practically used without too much difficulty.

Storing Tuples and More on Tuple Matching

Since inheritance plays a key role in Java-Linda’s tuple matching semantics (e.g., an in tuple can match any other tuple of its own class or subclasses of that class), it makes sense to store tuples within a tree structure that embodies the inheritance relationships of the tuple classes. For example, the tree shown in figure 1 on the following page could be used to store tuples for a hypothetical "geometric shape" Java-Linda program.We call this tree the tuple hierarchy tree. At each node of the tuple hierarchy tree is a hash table, a linked list, and an in/rd queue. Out tuples are stored in the hash table at the node in the tuple hierarchy tree for the tuple’s class and are stored in that hash table based on the result of their hashCode method. In addition, all out tuples are also stored in the linked list at the node; this facilitates exhaustive search in matching when no hash field is available in an in, inp, rd, or rdp tuple. Finally, the in/rd queue stores unfulfilled in, inp, rd, and rdp tuples (i.e., for which there was no matching out tuple when the request was first made).         Figure 1: the Tuple Hierarchy Tree.  
 
 

In Java-Linda, all tuples are stored in their serialized, byte-array form (i.e., the result of applying Object Serialization). Object Serialization stores all necessary information about an object (i.e., class information about the object and objects contained within, field values, etc.) in a standard format (Sun gives a complete BNF specification of this format in their Object Serialization specification). Thus, an object can be completely understood and recreated from just the information in its serialized, byte array. In Java-Linda, all tuple matching is done based on the serialized, byte-array form of tuples. This allows one generic tuple matching function to be used for all tuple classes, and tuple matching functions specific to each tuple class do not have to be generated. Since the complete BNF specification of the Object Serialization byte-array format is available, it is straightforward (albeit tedious and time-consuming) to build a "tuple matching parser" to perform tuple matching. Finally, Java provides mechanisms for synchronized access to objects and their methods and Java-Linda uses these to maintain consistency of the tuple and tuple hierarchy tree data structures. This is necessary so that, for example, the same tuple isn’t used to fulfill more than one in request (i.e., a tuple can only be tested for a match against one in request at a time), or other things like this do not happen.

Ins and Outs

Since Java-Linda is a parallel/distributed system, it consists of both client and server side code. This section outlines the sequence of steps that occur at the client and server sides when out and in (inp, rd, and rdp are similar and will not be described) operations are performed. An out operation has the following steps at client and server:

Client:

(1) Determine the class name and hash value of the tupleobject.

(2) Serialize the tuple object into a byte array.

(3) Send, using TCP, the class name, hash value, and
byte array to the tuple server for the tuple object’s tuplespace.

Server:

(1) Receive the class name, hash value, and byte array from
the client.

(2) If the class name has not been seen before, extend
the tuple hierarchy tree with nodes for the class and
all its superclasses.

(3) Use the class name and hash value to place the byte
array of the tuple at the correct node in the tuple hierarchy
tree. Also, link it into the linked list at the class’s node. In adding
the byte array, descend the class hierarchy from the root and
compare the byte array against any pending tuples in the in/rd
queue --- satisfy these requests if possible, without adding the
byte array to the tree.

An in operation has the following steps at client and server:

Client:

(1) Determine the class name and hash value (if it
can be computed --- otherwise "no hash") of
the in tuple object.

(2) Serialize the in tuple object into a byte array.

(3) Send, using TCP, the class name, hash value
(or possibly "no hash"), and byte array to the
tuple server for the tuplespace of the in tuple.

(4) Block, waiting for a reply.

(5) When the reply comes (in the form of a byte
array for a tuple object), deserialize it and
return the created object.

Server:

(1) Receive the class name, hash value, and byte
array for the in tuple.

(2) If the class has not been seen before, extend
the tuple hierarchy tree with the class and all its
superclasses.

(3) Go to the node in the tree for the class of
the in tuple. If the hash value was available,
match the in template against all tuples in
the hash bucket for the hash value; otherwise
use the linked list to match exhaustively against
all tuples at the node. Do this recursively for all
subclass nodes of the class of the in tuple until a
match is found or there are no more subclasses
to examine.

(4) If no match is found, store the in tuple
in the in/rd queue at its class’s node.

Implementation

A prototype version of Java-Linda has been implemented incorporating most of the features described above. This section gives an overview of this implementation. The appendix gives the actual source code for a few Java-Linda demo programs (the next section gives informal performance results for test runs of these demos), showing how the classes can be used to build a parallel Java-Linda application. Note that the prototype implementation is a Java application and users also write their Java-Linda programs as applications (not as applets --- it would not be difficult to modify the system so that applets could be used, however, and this will be a future extension). The key client, server, and auxiliary component classes which comprise Java-Linda are as follows:

Server-side classes:

(1) LindaDaemon: LindaDaemon objects run as background processes on nodes which participate in parallel Java-Linda computations. Its job is to handle eval and tupleserver creation requests. For a machine to have tupleservers and evaled processes running on it, it must have a LindaDaemon running. Users need not modify or subclass this class, but rather they simply must start it running on their machine if they want to service eval and tupleserver creation requests. Note that it is this class which will eventually evolve into a full-featured Piranha daemon (i.e., to monitor node usage, put Piranha processes to sleep, call retreat, etc.)

(2) TupleServer: This class implements tupleservers. It is a thread that is started by a LindaDaemon and it works by listening on a TCP port for tuplespace requests (i.e., in, out, eval, etc.) and servicing them accordingly.

(3) EvalServer: This class is used to handle the running of evals (i.e., Eval objects --- see below). An EvalServer object is created and started when a TupleSpace is created. It is a thread which works as follows. It first makes requests to all LindaDaemons which could service evals and tupleservers (i.e., the list of nodes in a user-supplied "tsnet.nodes" file) and asks them to start a special EvalClient thread; the EvalClient thread will execute an infinite loop which continuously asks the EvalServer for an eval to run, run it, and then ask for another eval. The EvalServer is thus a central dispatch point for evals: user code requests to have an eval run by sending an Eval object to the EvalServer and EvalClients request an Eval object to run from the EvalServer. This method of handling evals leads to better dynamic load balance, since Eval objects are run as soon as possible by the earliest available EvalClient.

Client-side classes:

(1) TupleSpace: This class represents a Java-Linda tuplespace and has methods for the standard Linda operations (i.e., in, out, eval, etc.) A TupleSpace is thus a normal Java object, and users can create as many TupleSpaces as they want (which will each have a separate, corresponding TupleServer started by a LindaDaemon).

(2) LindaProgram: This is the class which embodies the Java-Linda computation as a whole. Users must subclass this class and add a realMain static method. The main static method in this class performs some initialization (such as reading in a list of nodes which will serve to run evals and tupleservers) and then calls the realMain method written by the user. The realMain method starts the parallel computation and should create necessary tuplespaces, start evals, and perform computation.

(3) Eval: This is where user’s define their main remote computation functions. This class will extend runnable (so it can be run as an independent thread); i.e., users define a remote, evaled computation by supplying a run method. A user does an "eval" by creating an Eval object and calling TupleSpace.eval(EvalObj) --- this just serializes the Eval object and sends it to the EvalServer object for TupleSpace (see description of EvalServer). Note that evals don’t turn into passive tuples upon completion at the present time (as in standard Linda); this feature is not necessary for Piranha, but will be added in the future.

(4) EvalClient: This class is run by a LindaDaemon and is a thread which continuously requests evals to run and runs them (see description under EvalServer above).

Important auxiliary classes:

(1) ObjTree: This class defines a data structure representing the tuple hierarchy tree described previously. It is used extensively by TupleServer to store tuples.

As stated above, the prototype implementation of Java-Linda provides most of the basic Linda functionality. Two parts of the system which were within the qualified scope of the design (i.e., see the section "Implementing Linda in Java as a First Step" above) but which have not yet been implemented are the following. First, the "tuple matching parser" which compares the serialized, byte arrays of two tuple objects to see if they match has not yet been implemented. In planning out the steps of the implementation, it was decided that, while implementing the "tuple matching parser" would be straightforward (see the discussion of this above), it would also be tedious and time-consuming. More importantly, even without the "tuple matching parser" we can still handle limited, practically useful matching (i.e., we can easily distinguish tuples from different tuple sets because they will be different Java classes and this alone is sufficient for many kinds of master/worker-style parallel programs) and thus do some interesting and practical tests of the system. Thus, it was decided that implementing the "tuple matching parser" could be postponed to the latter part of the project, in order to be able to get some performance results as soon as possible.

Second, while not described previously, to have a truly seamless system in which Java-Linda class files can be dynamically loaded over the network (instead of having to be in the working directory of every running LindaDaemon as in our current prototype) a special Java-Linda ClassLoader is needed. Once again, this is not strictly necessary to carry out performance tests (i.e., we currently simply copy the needed class files to all file systems where LindaDaemons are running) but will be necessary in future versions. Building ClassLoaders is fairly standard fare in Java, however, and will not be a problem for Java-Linda. Implementing the "tuple matching parser" and ClassLoader will be my next implementation endeavor. Finally, we do not handle security in this prototype implementation; as discussed previously, Java provides an excellent framework for dealing with security and we will add support for it in the future.

Testing and Performance Results

Tests were performed to determine the prototype’s correctness and performance. First, simple tests were performed to determine the correctness of the Java-Linda primitives (i.e., in, out, rd, inp, rdp, eval) and all such tests performed correctly. The time to perform a single out, rd, or in is about 200 milliseconds. Next, stress testing of the single tupleserver was done to determine how its performance degrades as more tuplespace requests are made of it. Two tests were done. The first stress test attempts to measure how performance degrades as the total number of requests to the tupleserver increases, where the requests are not dependent (i.e., not in contention), and runs as follows. Evaled processes (each running on a separate CPU) continuously do X outs and then X ins of the outed tuples for some number of iterations and measure the time per iteration; we measure how the performance degrades as the number of evals is increased (i.e., as more processes are continuously banging on the tupleserver), and as the size of outed/ined tuples is increased. The second stress test attempts to measure how performance degrades as the number of requests to the tupleserver increases and the requests are in contention. In particular, the test we performed was to have the main Java-Linda process (i.e., LindaProgram) out a single tuple and then have evaled processes continuously in and out that tuple X times; we measured how performance degraded as the number of evaled processes increased and the size of the tuple increased.

The results of the stress tests can be seen in figures 2 and 3 below. For both figures 2 and 3, X was 15 and the small tuple size was one byte while the big tuple size was 50000 bytes; also, the values plotted are the medians of 5 trials. As can be seen in figure 2, performance is degrading as the number of evals is increased. In addition, the performance degrades by a slightly larger proportion for each doubling of the number of evals; this proportion appears to be approaching 2 for larger numbers of evals. Figure 3 shows that the contention factor has a significant effect and makes the degradation even worse. It is not possible from just these tests to explain the precise reason for these results; we intend to perform more comprehensive tests in the future to understand the tupleserver performance better. A plausible explanation for figure 2 is as follows, however. In general, the time to fulfill a tuplespace request will be greater when there are other requests being fulfilled simultaneously. This is because each tuplespace request is handled by a separate thread at the tupleserver and the threads are run (pseudo) concurrently with preemptive scheduling. Thus, for a small number of evals most of the tuplespace requests will not overlap and will be quickly fulfilled. For a larger number of evals, however, there will be more overlap of requests and more tuplespace requests will be handled simultaneously; thus, the average time to handle a tuplespace request will increase as the number of evals is increased. For figure 3, the same idea should hold but the requests being handled simultaneously will essentially be serialized since there is only one tuple to fulfill all in requests. Once again, more comprehensive tests need to be done to determine precisely the factors affecting tupleserver performance, but the explanation will probably be similar to this. Whatever the explanation, these results indicate that the single tupleserver is a bottleneck for anything but applications that use a small number of CPUs; a multiserver version of Java-Linda will be necessary for acceptable performance in scaling to run on larger numbers of CPUs. In addition, 200 ms per simple in/out/rd seems inefficient, even for Java, so the basic tupleserver performance will have to optimized as well (remember that this work is preliminary and no optimizations have been done yet; correct operation of the system was the top priority). Finally, note that the simple correctness tests were performed on different machines (Windows NT, Solaris, and AIX) while the stress tests were performed all on AIX machines for consistency.

In addition to the correctness and stress tests, we implemented several standard Linda demonstration programs (i.e., similar to C-Linda demonstration programs) to show that Java-Linda can be practically useful even at this early stage in its development. In particular, as stated above, even without fully functional tuple matching Java-Linda as it is now can be used for many master/worker-style programs. To demonstrate this, we implemented master/worker programs for the nqueens problem (i.e., finding the number of legal board positions for n queens such that they are not attacking one other) and the primes finding problem (i.e., finding the number of prime numbers between 1 and some limit). We ran these demos on a heterogeneous mixture of machines (Windows NT, Solaris, and AIX) and the computation proceeded correctly in all cases, thus showing that "write once, run anywhere" applies to Java-Linda as well as Java. Finally, we collected some informal performance results from these demos running on a collection of AIX machines. Since this is an initial version of Java-Linda which has not yet been explicitly optimized for speed, we did not expect great speedups. However, we did achieve distinct speedups and this was a pleasant surprise. In particular, for the primes finding demo we achieved a speedup of about 3 on 8 nodes and a speedup of about 4 on 16 nodes (compared to a single-CPU Java version) when computing the number of primes between 1 and 100,000. It is also important to to note that the AIX Java compiler has just-in-time compiler technology built in and the single-CPU Java version is only three to four times slower than the single-CPU, optimized C version; thus, this clearly shows that the possibility to perform better than native C code could become a reality.                           References

(1) Bjornson, Robert D., "Linda on Distributed Memory Multiprocessors," Yale University Department of Computer Science Ph.D. Thesis, May 1993.

(2) Ciancarini, Paolo, Tolksdorf, Robert, et al., "PageSpace: An Architecture to Coordinate Distributed Applications on the Web," Fifth International World Wide Web Conference, May 6-10, 1996, Paris, France.

(3) Flanagan, David, Java in a Nutshell, O’ Reilly & Associates, Inc., 1996.

(4) Fritzinger, J. Steven, Mueller, Marianne, "Java Security," Sun Microsystems, Inc., 1996. Available at http://www.javasoft.com.

(5) Kaminsky, David L., "Adaptive Parallelism with Piranha," Yale University Department of Computer Science Ph.D. Thesis, May 1994.

(6) Weber, Joseph, et al., Special Edition Using Java, Second Edition, QUE Publishing, 1996.

(7) "Frequently Asked Questions - Applet Security," Sun Microsystems, Inc. Available at http://www.javasoft.com.

(8) "JavaSpace Specification," Sun Microsystems, Inc., 1997. Available at http://chatsubo.javasoft.com.

(9) "Java ObjectSerialization Specification Prebeta Release Revision 1.1," Sun Microsystems, Inc., 1996. Available at http://chatsubo.javasoft.com.

(10) "Java Remote Method Invocation Specification Prebeta Draft Revision 1.1," Sun Microsystems, Inc., 1996. Available at http://chatsubo.javasoft.com.

(11) "The Java Language: A White Paper," Sun Microsystems, Inc., 1995. Available at http://www.javasoft.com/doc/language_environment.

Appendix

The appendix contains the LindaProgram.java and Eval.java files for a simple example Java-Linda program. The program is a very simple master/worker program (called "basic") which demonstrates the concrete details of building a Java-Linda program.

LindaProgram.java.basic
Eval.java.basic
Result.java.basic
Task.java.basic