Release 1.0  Copyright ©1994 by Arcane Systems Ltd.  All Rights Reserved.










ThreadKit








Concepts


Introduction

ThreadKit users are assumed to be experienced developers looking for tools to implement multi-threaded features in their applications.  This section of the ThreadKit manual is intended to provide a brief introduction to multi-threaded programming concepts.  It is by no means a comprehensive treatment of the subject. 

Mastering the art of multi-threaded programming will certainly require some time and effort, but ThreadKit goes a long way toward taking the sting out of the task.  Set aside some time to work through the included tutorial, and keep the class documentation handy at all times!



What is a Thread?

To answer this question, let's take a quick look at the basic concepts behind multitasking operating system designs.  Traditional multitasking systems maintain a distinct boundary between separate processes, placing each one in a different address space.  This prevents unwanted interactions between programs and provides a solid foundation for running multiple concurrent applications.

The traditional approach allows many unrelated processes to take place simultaneously.  It does not, however, provide a convenient mechanism for a single application to perform multiple tasks concurrently.  The obvious solution is to structure an application as a set of cooperating processes, but starting additional processes turns out to be quite resource intensive.  Worse yet, the barrier between them prevents effective sharing of data.  Threads are designed to provide an elegant solution to these problems.

The traditional UNIX task, or process, combines a unique address space with a single thread of control.  NEXTSTEP's Mach kernel separates the two concepts, allowing multiple threads of control to exist within a single address space.  The term "thread" is shorthand for "thread of control" and refers to a unit of scheduled execution in a multi-threaded environment.

Multiple threads may share a common address space.  Creating new threads within an existing address space and discarding them can be done with minimal overhead.  Because these threads share the same address space, they can communicate rapidly and manipulate data cooperatively.
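To make the idea concrete, here is a minimal sketch of two threads of control sharing one address space.  It uses POSIX threads purely for illustration, not ThreadKit's own facilities, and the variable and function names are invented for the example.

    #include <pthread.h>
    #include <stdio.h>

    static int shared_value = 0;     /* one copy, visible to every thread in the process */

    static void *worker(void *unused)
    {
        shared_value = 42;           /* no copying or interprocess messaging required */
        return NULL;
    }

    int main(void)
    {
        pthread_t thread;

        /* Spawning a thread reuses the existing address space, so it is far
           cheaper than creating a whole new process. */
        pthread_create(&thread, NULL, worker, NULL);
        pthread_join(thread, NULL);

        printf("main thread sees %d\n", shared_value);   /* prints 42 */
        return 0;
    }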



What Makes Multi-Threaded Programming Difficult?

The difficulties all arise from the need for concurrent access to data.  Allowing multiple threads of control to operate independently on the same data simultaneously has enormous potential for disaster.

Take a simple example:  A global counter in memory needs to be incremented by two different threads, which we'll call threads "A" and "B".  Each will perform the following actions:

1.	Read the counter from memory
2.	Add 1 to the value retrieved
3.	Store the result back in memory

In our example, let's say that thread "A" reads the counter first and finds a value of zero.  If control were to switch to thread "B" immediately, thread "B" would also read a value of zero.  Threads "A" and "B" then both increment the value and store their results.  The resulting stored value is one, when it should be two.

Of course, the result might well be correct if the operating system were to switch between the threads at a different time.  This is why multi-threaded applications can exhibit bugs that are very difficult to reproduce.
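The lost update is easy to demonstrate.  The following sketch, again using POSIX threads rather than ThreadKit, has two threads perform the read/add/store sequence many times on an unprotected counter; the final value routinely falls short of the expected total.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;

    static void *increment_many(void *unused)
    {
        for (long i = 0; i < 1000000; i++)
            counter = counter + 1;   /* read, add one, store: three separate steps */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;

        pthread_create(&a, NULL, increment_many, NULL);
        pthread_create(&b, NULL, increment_many, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* Increments that interleave overwrite each other, so the printed
           value is usually well below 2000000. */
        printf("counter = %ld, expected 2000000\n", counter);
        return 0;
    }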

The above is a trivial example, but it is representative of the problems with manipulating shared data on a larger scale.  A thread needs to ensure that a complete, meaningful atomic action can be applied to the data before another thread can interfere.

This is accomplished through the use of locks controlling access to information.  Each thread could acquire an exclusive lock, perform a complete operation, and then release the lock.  All you need to do, then, is ensure that every thread follows these semantics when manipulating shared data.
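In POSIX terms, the acquire/operate/release pattern for the counter looks like the sketch below; ThreadKit's lock objects are used in the same spirit.

    #include <pthread.h>

    static long counter = 0;
    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void safe_increment(void)
    {
        pthread_mutex_lock(&counter_lock);     /* acquire the exclusive lock      */
        counter = counter + 1;                 /* perform the complete operation  */
        pthread_mutex_unlock(&counter_lock);   /* release it for the other thread */
    }

As long as every thread goes through safe_increment(), no update can be lost.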



Lock Granularity

Do you really need to place a separate lock on every piece of shared data in your application?  Maybe, but probably not.

Consider the extreme alternative:  If a single lock controlled access to all shared data, you'd still be able to guarantee safe access.  Only one thread at a time would be capable of modifying any shared data.  This is simple to implement, but reduces the ability for multiple threads to perform significant work concurrently.

The ideal approach is normally somewhere between placing locks on small amounts of data and using locks that control large amounts.  The former is often referred to as "fine grained locking", while the latter is called "coarse grained locking".  ThreadKit supports either approach for your own application data, and applies coarse grained locking for the entire AppKit, which is not thread-safe.
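The trade-off can be seen in a small sketch.  With the coarse grained scheme a single lock serializes all access; with the fine grained scheme each structure has its own lock, so threads working on unrelated data never wait on each other.  The POSIX mutexes and table names below are illustrative only.

    #include <pthread.h>

    /* Coarse grained: one lock protects every piece of shared data.
       Simple, but threads touching unrelated data still block each other. */
    static pthread_mutex_t all_data_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Fine grained: one lock per independent structure.  More bookkeeping,
       but a thread updating the name table never waits on the count table. */
    static pthread_mutex_t name_table_lock  = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t count_table_lock = PTHREAD_MUTEX_INITIALIZER;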



Thread-Safe Code

Code that performs the necessary locking and unlocking of shared data is often referred to as "thread-safe".  You'll probably want to develop entire classes that are thread-safe for your own use.  The entire ThreadKit is itself thread-safe.

Unfortunately, most NEXTSTEP classes and some traditional UNIX calls aren't thread-safe at all.  You can probably assume that use of any AppKit class will manipulate shared information in an unsafe fashion and needs to be protected.  Similarly, ctime() is an example of a UNIX call that maintains a global buffer that can be corrupted if used by multiple threads concurrently.  The only way to deal with thread-unsafe code is to provide locking mechanisms wrapping calls to these facilities.
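For example, the static buffer used by ctime() can be protected by funnelling every call through a small locked wrapper, sketched here with a POSIX mutex; the wrapper name is invented for the example.

    #include <pthread.h>
    #include <string.h>
    #include <time.h>

    static pthread_mutex_t ctime_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Copies the formatted time into a caller-supplied buffer of at
       least 26 bytes, so the shared static buffer is never exposed. */
    static void locked_ctime(const time_t *clock, char *result)
    {
        pthread_mutex_lock(&ctime_lock);
        strcpy(result, ctime(clock));    /* copy out before releasing the lock */
        pthread_mutex_unlock(&ctime_lock);
    }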

The ThreadKit provides a way to lock any object, even those available from third-party developers, before sending a request.  Similarly, the ThreadKit allows arbitrary resources such as the ctime() buffer to be represented by a lock which your applications can use.  Lastly, a specialized facility is provided that allows the main thread of your application to use the AppKit without needing to explicitly lock it, while additional threads can lock it explicitly and utilize it safely.



Multi-Threaded NEXTSTEP Applications

Why are they so tough to write?  Writing a multi-threaded application for any environment is a non-trivial task, but NEXTSTEP throws in a particularly unpleasant twist:  The entire AppKit is not thread-safe, nor is it likely to become so in the near future.  Sending a message to an AppKit object from any thread other than the main one typically has disastrous results.

Solutions to this problem have been many and varied, mostly patterned after the excellent SortingInAction sample code supplied with NEXTSTEP.  That approach relies on a protocol for sending Mach messages to the main thread to request behavior on behalf of the sending thread.  While the approach works, it makes for code that is difficult to read and requires that a communication mechanism be established to cover all possible requests.
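The essence of that protocol is a request queue: worker threads enqueue descriptions of the work they need done, and only the main thread, which owns the thread-unsafe kit, dequeues and performs it.  The sketch below shows the general pattern with POSIX primitives rather than Mach messages; all names are invented for illustration.

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct request {
        void (*perform)(void *context);   /* work to be carried out on the main thread */
        void *context;
        struct request *next;
    } request;

    static request *queue_head = NULL;
    static request *queue_tail = NULL;
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  queue_cond = PTHREAD_COND_INITIALIZER;

    /* Called from any worker thread that needs the main thread's services. */
    static void enqueue_request(void (*perform)(void *), void *context)
    {
        request *r = malloc(sizeof *r);
        r->perform = perform;
        r->context = context;
        r->next = NULL;

        pthread_mutex_lock(&queue_lock);
        if (queue_tail != NULL)
            queue_tail->next = r;
        else
            queue_head = r;
        queue_tail = r;
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_lock);
    }

    /* Called only from the main thread's event loop. */
    static void perform_next_request(void)
    {
        pthread_mutex_lock(&queue_lock);
        while (queue_head == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);
        request *r = queue_head;
        queue_head = r->next;
        if (queue_head == NULL)
            queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);

        r->perform(r->context);           /* safe: we are on the main thread */
        free(r);
    }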

More recent attempts have used Distributed Objects as a generic communication mechanism for sending messages between threads.  While this works well in some situations, performance problems and limitations on Distributed Object messages prevent the approach from covering every case.

In an attempt to make multi-threaded programming much more flexible and natural, the ThreadKit takes a different approach entirely: it allows messages to be sent to kit objects at any time by providing a locking mechanism for the AppKit as a whole.


