The CML Model

CML provides the following concurrent features:

CML Threads

Figure 6-3 shows the typical structure of a thread in a CML program. The thread may receive messages from one or more input channels on the left. It may write messages to one or more output channels on the right. The thread may also do conventional file I/O. The body of the thread is implemented as a function. Typically this will contain a loop that runs a state machine. The function maps the pair (current state, inputs) to (next state, outputs) and then it loops. The function contains an environment which is the set of variables captured by it from surrounding scopes or passed in as arguments. These will typically supply the channels, files etc. that the thread will communicate on. The function can be written to be pure with all of its state being passed through the arguments.

Figure 6-3. A CML Thread

Threads can be created dynamically and are light-weight enough that you can structure your program with large numbers of threads.
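As a rough sketch of the thread structure described above, here is a thread that keeps a running total. The Accumulator structure and the step, body and start names are made up for this illustration. The code has to run under the CML scheduler, for example from a function passed to RunCML.doit.

```sml
(* Hypothetical example: a thread that keeps a running total. *)
structure Accumulator =
struct
    (* Pure transition function: (state, input) -> (next state, output). *)
    fun step (total: int, n: int) = let val t = total + n in (t, t) end

    (* The thread body loops with the state passed as an argument. *)
    fun body (inCh: int CML.chan, outCh: int CML.chan) =
    let
        fun loop state =
        let
            val (next, out) = step (state, CML.recv inCh)
        in
            CML.send (outCh, out);
            loop next
        end
    in
        loop 0
    end

    (* Create the channels and the thread, returning the channels. *)
    fun start () =
    let
        val inCh  = CML.channel ()
        val outCh = CML.channel ()
    in
        ignore (CML.spawn (fn () => body (inCh, outCh)));
        (inCh, outCh)
    end
end
```

A client would call Accumulator.start once and then alternate CML.send on the input channel with CML.recv on the output channel.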

Messages on a channel are normal values and can be of any type. However each channel has a single type. If you need to pass values of different types then you will need to either combine them into a datatype or have separate channels. Functions dealing with channels can be polymorphic over the type of the channel.

The CML model has some parallels with the object-oriented model. Originally classes and objects were introduced in the Simula language [Simula] which was designed for simulating (discrete event-based) real-world systems. Objects represented real-world entities. Today it is common to explain object-oriented concepts by showing how to model the real-world entities as objects and describe the differences, commonalities and relationships between objects using classes. Each object interacts with the other objects by sending and receiving messages. An object contains some private state which may be updated as a result of the messages it receives.

But real-world systems are naturally concurrent. There have been several attempts at designing concurrent object-oriented languages but this is difficult in the imperative programming paradigm because of the necessity to protect the imperative state from concurrent update and to manage the correct ordering of update operations when parts of the system are operating asynchronously. The root of the problem is that state in imperative programs is finely divided into imperative variables and spread throughout the program creating a great many points to pay attention to.

The way I use CML is to think of each thread as representing a concurrent object in the system. The objects will be coarse-grained representing major divisions of the system architecture rather than the fine-grained "everything is an object" idea that some languages push. The body of the object is usually implemented as a pure function with all of the state of the object segregated into a single state value that is passed around outside of the function. The result is a hybrid paradigm that is imperative with objects and state at the top level and is functional at the level of the implementation of the objects.

In a multi-threaded program written in a conventional language like Java or C++, a piece of code may be executed by more than one thread at a time. This creates the need to identify critical sections which must be executed by at most one thread at a time. You could get into similar difficulties in CML if you have threads updating shared reference variables. Instead, following a concurrent object paradigm, you would wrap each piece of state into an object which controls access to the state. The object updates the state in response to messages from other objects. It can then be single-threaded internally with each object having its own thread. Since CML threads are light-weight it is not a problem to have large numbers of threads.
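The following is a minimal sketch of such an object, assuming a simple counter as the piece of state. The Counter structure, its request datatype and the function names are invented for the example. The two kinds of message are combined into one datatype because the request channel has a single type.

```sml
structure Counter =
struct
    (* The messages the object understands. A Get carries its reply channel. *)
    datatype request = Incr of int | Get of int CML.chan

    (* The handle for the object is just its request channel. *)
    type counter = request CML.chan

    fun new () : counter =
    let
        val reqCh = CML.channel ()

        (* All of the object's state is the single argument of the loop. *)
        fun loop count =
            case CML.recv reqCh of
              Incr n    => loop (count + n)
            | Get reply => (CML.send (reply, count); loop count)
    in
        ignore (CML.spawn (fn () => loop 0));
        reqCh
    end

    fun incr (c: counter, n: int) = CML.send (c, Incr n)

    fun get (c: counter) : int =
    let
        val reply = CML.channel ()
    in
        CML.send (c, Get reply);
        CML.recv reply
    end
end
```

No locking appears anywhere. Only the object's own thread ever sees the count so there is no critical section to protect.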

The model I'm describing here appears more explicitly in other concurrent languages as a coordination sub-language. These languages have two parts, a sequential language for manipulating the data and a coordination language to control the interaction between the concurrent objects.

An example of a coordination language is Linda [Linda]. Linda is independent of the data language and can be used with a variety of languages, even C. Another interesting language is COOL, the Crisp object coordination language [CoolCrisp]. This coordinates concurrent objects called actors. The actors typically implement finite-state machines. They communicate via asynchronous messages (rather than the synchronous messages of CML). Getting closer to functional languages, there is the new research language called Hume [Hume]. This has a restricted purely functional language in the Haskell mold for the data language. All imperative state is handled at the level of the coordination language. Search for "coordination language" at Google for more examples.

CML Channels

A channel is a rendezvous point between two threads that allows them to pass a value. The value passing is synchronous. The sender of the value waits for the receiver and the receiver waits for the sender.

Each channel you create has a fixed type. The type of a channel is defined in the CML structure named CML[1].

type 'a chan

The type variable 'a is the place holder for the type of values passed through the channel. If you want to pass more than one type of value then you will need to either combine them in a datatype or use more than one channel.

Channels are bidirectional. A channel is not tied to a particular sender or receiver, so any thread that holds the channel can send or receive on it. The following functions defined in the CML structure deal with channels. (See the CML structure for more channel handling functions.)

val channel     : unit -> 'a chan 
val send        : ('a chan * 'a) -> unit 
val recv        : 'a chan -> 'a 
val sendEvt     : ('a chan * 'a) -> unit event 
val recvEvt     : 'a chan -> 'a event 

The channel function creates a new channel. The send and recv functions do just what the name says. The sendEvt and recvEvt functions return events (described in the next section). The event functions allow a thread to choose between several send or receive operations.
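Here is a small sketch of these functions in use. The demo function and its producer thread are made up for the illustration, and the code assumes it is called from within a running CML program. Each send blocks until the matching receive is ready.

```sml
(* Hypothetical example: pass three strings from one thread to another. *)
fun demo () =
let
    val ch : string CML.chan = CML.channel ()

    (* Each send waits until the receiver below is ready for it. *)
    fun producer () = List.app (fn s => CML.send (ch, s)) ["a", "b", "c"]

    val _ = CML.spawn producer

    val first  = CML.recv ch
    (* Synchronising on recvEvt does the same job as recv. *)
    val second = CML.sync (CML.recvEvt ch)
    val third  = CML.recv ch
in
    [first, second, third]
end
```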

CML Events

An event represents some activity that will be completed at a later time. An event is treated like any other value so it can be passed around and stored. An event is said to be enabled when its activity is completed. For example an event might represent the reception of a message on a channel or the completion of some I/O activity.

A program can choose to wait for an event to be completed. The act of waiting is called synchronising and is independent of launching the activity that the event represents. The program can choose one from a collection of events to synchronise on. This is similar to the traditional select or poll system calls of Unix but it is more general. An event can represent any concurrent activity such as the completion of a thread.

The type of an event is defined in the CML structure.

type 'a event

An event has an associated data value that is returned when the program synchronises on the event. The type variable 'a is a place-holder for the value's type. The following functions defined in the CML structure handle collections of events. (See the CML structure for more event handling functions.)

val wrap       : ('a event * ('a -> 'b)) -> 'b event 
val choose     : 'a event list -> 'a event 
val sync       : 'a event -> 'a 
val select     : 'a event list -> 'a 
val guard      : (unit -> 'a event) -> 'a event 
val timeOutEvt : Time.time -> unit event 
val atTimeEvt  : Time.time -> unit event 

These functions build up a representation of a network of events. The wrap function associates a function with an event which will process the event's value after the event is synchronised on. The choose function represents the choice of one event from the list of events. The choice is not actually made until a synchronisation is attempted. Then the first enabled event from the list is chosen or if several are enabled then one of them is chosen non-deterministically. A synchronisation is performed on the chosen event returning the event's value.

The following code illustrates the interaction of wrap and choose. The bev values are base events such as the reception of a message. The w values are wrapping functions.

val ev = choose [
    wrap (bev1, w1),
    wrap (choose [
        wrap (bev2, w2),
        wrap (bev3, w3)
        ], w4)
    ]

Figure 6-4 shows the network of events that results. The nodes labelled with "|" represent choices. When a synchronisation is attempted on the event ev then the program will wait for one of the events bev1, bev2 or bev3 to be enabled. If bev2 is the first to be enabled then its returned value will be run through the w2 and w4 functions in that order to produce the value returned by the ev event.

Figure 6-4. A Network of Events

The sync function waits for an event. The select function is equivalent to choose and then a sync but is more efficient.
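To make this concrete, here is a sketch that waits on two channels of different types. The msg datatype and the demo function are assumptions for the example. The wrapping functions convert both alternatives to a common type, which choose requires since all of the events in its list must have the same type.

```sml
(* Hypothetical common result type for the two alternatives. *)
datatype msg = Num of int | Text of string

fun demo () =
let
    val numCh  : int CML.chan    = CML.channel ()
    val textCh : string CML.chan = CML.channel ()

    (* Building the event does not wait for anything. *)
    val ev = CML.choose [
        CML.wrap (CML.recvEvt numCh,  Num),
        CML.wrap (CML.recvEvt textCh, Text)
        ]

    val _ = CML.spawn (fn () => CML.send (textCh, "hello"))
in
    (* Only the text channel has a sender so that alternative is taken. *)
    case CML.sync ev of
      Num n  => "number " ^ Int.toString n
    | Text s => "text " ^ s
end
```

The same wait could be written by passing the list of wrapped events directly to CML.select in place of the choose and sync pair.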

The guard function takes a function that is run at the time of synchronisation and that returns the event to be synchronised on. The function will typically be used to make preparations for the event. This is useful in a choice of events to have preparations specific to each event.

The timeOutEvt function produces an event that becomes enabled after some time interval has passed. The atTimeEvt function is similar but it becomes enabled at a specified point in time.
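The following sketch shows a guard and a time-out together. The names afterEvt and recvTimeout are made up for the example. The first function uses guard to work out a deadline at the moment of synchronisation, which gives much the same effect as timeOutEvt. The second puts a time limit on receiving from a channel.

```sml
(* The deadline is computed when the event is synchronised on, *)
(* not when afterEvt is called. *)
fun afterEvt (t: Time.time) : unit CML.event =
    CML.guard (fn () => CML.atTimeEvt (Time.+ (Time.now (), t)))

(* Wait for a message but give up after the interval t. *)
fun recvTimeout (ch: 'a CML.chan, t: Time.time) : 'a option =
    CML.select [
        CML.wrap (CML.recvEvt ch, SOME),
        CML.wrap (CML.timeOutEvt t, fn () => NONE)
        ]
```

A call such as recvTimeout (ch, Time.fromSeconds 5) would return NONE if nothing arrives on ch within five seconds.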

Synchronous Variables

A synchronous variable is a buffer with a capacity for one value that provides for asynchronous communication between threads. A writer can put a value into a variable without waiting for a reader. There are two kinds: an I-variable is write-once while an M-variable can be written to more than once. The writer cannot overwrite the value in an M-variable. The variable must be emptied by the reader before another value can be written. In the SyncVar structure a put to a variable that is already full raises the Put exception rather than blocking. Symmetrically a reader must wait for a writer to put a value into a variable. CML events are available for waiting to read a variable.

I-variables are useful when you want to pass only one message between two threads, for example when replying from a remote procedure call (RPC). They are more efficient than channels for this. For a more complex example, the CML library provides an implementation of multicasting using I-variables.
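Here is a sketch of the RPC case, assuming a toy server that squares integers. The Squarer structure and its function names are invented for the example. Each request carries a fresh I-variable that the server writes the reply into exactly once.

```sml
structure Squarer =
struct
    (* A request is the argument plus a write-once variable for the reply. *)
    type request = int * int SyncVar.ivar

    fun start () : request CML.chan =
    let
        val reqCh = CML.channel ()

        fun loop () =
        let
            val (n, reply) = CML.recv reqCh
        in
            SyncVar.iPut (reply, n * n);
            loop ()
        end
    in
        ignore (CML.spawn loop);
        reqCh
    end

    (* The client blocks in iGet until the server has written the reply. *)
    fun call (reqCh: request CML.chan, n: int) : int =
    let
        val reply = SyncVar.iVar ()
    in
        CML.send (reqCh, (n, reply));
        SyncVar.iGet reply
    end
end
```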

M-variables are a more general-purpose primitive for building up synchronisation operations. I've only used them to implement a mutex to protect a critical section. The value in the M-variable can be treated as a baton which gives access to the critical section. A thread takes the baton out of the M-variable, performs the critical code and puts the baton back into the M-variable. If another thread tries to take the baton at the same time it will block because the M-variable is empty.
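The baton idea can be sketched as follows. The Mutex structure and the withLock function are my own names for this illustration and are not part of CML. The baton is just the unit value.

```sml
structure Mutex =
struct
    (* The unit value held in the M-variable is the baton. *)
    type mutex = unit SyncVar.mvar

    (* Start with the baton present, that is, unlocked. *)
    fun new () : mutex = SyncVar.mVarInit ()

    (* Run f while holding the baton. The baton is put back *)
    (* even if f raises an exception. *)
    fun withLock (m: mutex) (f: unit -> 'a) : 'a =
    let
        val () = SyncVar.mTake m
        val result = f () handle e => (SyncVar.mPut (m, ()); raise e)
    in
        SyncVar.mPut (m, ());
        result
    end
end
```

A thread would wrap its critical code as Mutex.withLock m (fn () => ...). The put cannot find the variable full since only the holder of the baton ever puts it back.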

Synchronous variables are implemented in the SyncVar structure of CML. See the CML documentation for more details.

Mailboxes

A mailbox provides a buffer with unlimited capacity for asynchronous communication between threads. Mailboxes are implemented in the Mailbox structure of CML. Since the capacity is unlimited a program using them should implement some sort of flow control.
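A minimal sketch of a mailbox in use follows. The demo function is made up for the example and no flow control is attempted. The point to notice is that the sends complete before any receiver exists, which would not be possible with a channel.

```sml
(* Hypothetical example: buffer three values and then read them back. *)
fun demo () =
let
    val mbox : int Mailbox.mbox = Mailbox.mailbox ()

    (* None of these sends blocks even though nobody is receiving yet. *)
    val _ = List.app (fn n => Mailbox.send (mbox, n)) [1, 2, 3]

    val a = Mailbox.recv mbox
    val b = Mailbox.recv mbox
    (* A mailbox also offers an event for use with choose or select. *)
    val c = CML.sync (Mailbox.recvEvt mbox)
in
    [a, b, c]
end
```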

Notes

[1]

Read the CML reference documentation along with the following material.