Multiprocessor Semaphore

MULTIPROCESSOR SEMAPHOREmethod.
AbstractPeterson's Algorithm
=======Peterson's algorithm, published in 1981, provides an
Shared memory semaphores are basic tools forelegant software solution to the n-process critical
interprocessor synchronization. Althoughself-imposedsection problem and has two distinct advantages
design constraints can often reduce synchronizationover test-and-set spin locks. One is that atomic
requirements, semaphores offer significant flexibilitytest-and-set is not required - the algorithm eliminates
to multiprocessor system designers. Thethe need for special instructions and bus locking. The
implementation presented here illustrates someother is eventual entry - a task waiting for entry to
fundamental issues of multiprocessor concurrencya critical section won't starve in a weakly fair (typical)
and demonstrates the tremendous value of ascheduling environment. Although Peterson's algorithm
multitasking OS like DSP/BIOS.looks deceptively simple, it's a culmination of many
Many variations on this theme are possible. Mostattempts by researchers to solve the critical section
obviously, we can modify the semaphore to handleproblem.
multiple tasks on one processor or tasks on moreThe following pseudo-code shows the entry and exit
than two processors. And because our waitprotocols used to enforce two-process mutual
operation handles notification interrupts and taskexclusion. Note that Peterson adds a secondary
wake-up, we can implement any scheduling policies"turn" variable - this prevents incorrect results from
that make sense for more generalized versions.race conditions and also ensures that each waiting
MULTIPROCESSOR SEMAPHOREtask will eventually enter the critical section.
========Listing 1: Peterson's Algorithminitialization
Multiprocessing architectures are becoming pervasive.P1_wants_entry = P2_wants_entry = FALSEturn =
DSPs are widely used in dense multiprocessorP1task P1
arrangements at the network edge, andP1_wants_entry = TRUE /* set lock */turn = P2 /*
systems-on-chip often include DSP cores togrant turn to other task */loop while
accelerate math-intensive computation. Although DSP(P2_wants_entry and turn == P2) /* buzz waiting
BIOS provides a standard, efficient, robust API forfor lock */critical section /* execute critical section */
uniprocessor applications, we sometimes encounterP1_wants_entry = FALSE /* release lock */task P2
situations where interprocessor synchronization* same logic as P1 */
mechanisms would be very useful. Here we will takeP2_wants_entry = TRUEturn = P1loop while
a look at multiprocessor mutual exclusion and we will(P1_wants_entry and turn == P1)critical section
talk about new method to implement interprocessorP2_wants_entry = FALSE
semaphores using DSP-BIOS.We can easily imagine situations where more than
Many multiprocessor DSP systems are designed totwo processes try to enter their critical sections
share a physical pool of memory in the sense thatconcurrently. Peterson's algorithm can be generalized
each processor sees the memory as directlyto n processes and used to enforce mutual exclusion
addressable - a DSP shares a region of memory withbetween more than two tasks. And other n-process
a host processor or other DSPs-.solutions, such as the bakery algorithm, are readily
A common architecture uses a large region ofavailable in computer science textbooks. Our
single-port RAM shared by all devices, including thediscussion here is limited to the two-process case
host. Although arbitration issues complicate hardwareonly for clarity and brevity. Pseudo-code for the
design. A second architecture uses of dual-port RAMn-process Peterson algorithm can be found on the
between processors. The downside here is theElectric Sand web-site.
relatively high cost and small storage capacity ofBlocking mechanism
these devices - large banks of expensive DPRAM areNow that we have a low-level mutual exclusion tool
seldom practical. But in applications that useto safely manipulate shared data within a semaphore,
segmented data transport or small data sets whereconsider the other key ingredient of semaphores,
small amounts of DPRAM are sufficient, this methodblocking. Assuming that each processor runs DSP
is very useful. DPRAM is relatively fast,BIOS or another multitasking OS, we develop our
designer-friendly and, unlike FIFOs, can store sharedwait operation using services that are already
data structures used for interprocessoravailable on each individual processor. DSP/BIOS
communication.provides a flexible semaphore module (SEM) that we
We need to remember this when using shareduse in our implementation.
memory. When processors have on-chip cache or aWhen the owner of a uniprocessor semaphore
system uses write posting, software designers mustreleases it with a signal system call, the local
pay attention to shared variable coherence.scheduler has immediate knowledge of the signal
Depending on the processor, programmers canevent and can unblock a task waiting on the
disable cache, use cache by-pass, or flush cache tosemaphore. In contrast, a multiprocessor semaphore
ensure that a shared location is in a proper state. Theimplies that the owner and the requestor can reside
cache control API in TI's comprehensive Chip Supporton different processors. Because a remote kernel has
Library, for example, provides an ideal tool for cacheno implicit knowledge of signal calls to a local kernel,
subsystem management. Solutions to write-postthe remote kernel needs timely notification of local
delay problems are system-specific.signal events. Our solution uses interprocessor
Our discussion here assumes that two processorsinterrupts to notify other processors of local activity
use a common shared memory buffer to pass datainvolving a shared semaphore.
or to operate cooperatively on a data set. In eitherSemaphore implementation
case, one or more tasks on the processors mightThis implementation of a multiprocessor binary
need to know the state of the buffer beforesemaphore (MBS) assumes that the hardware
accessing it, and possibly to block while the buffer issupports interprocessor interrupts and that a task
in use. As in the case of single-processor multitasking,won't try to acquire a semaphore while another task
we need a mutual exclusion mechanism to preventon the same processor owns it. The latter restriction
inappropriate concurrent operations on the sharedsimplifies the example and can easily be removed
resource. Let's start with a quick review of mutualwith some additional design work.
exclusion to better understand multiprocessor issues.The wait operation
Shared resource management is a fundamentalMBS_wait is invoked to acquire a shared memory
challenge of multitasking. A task (or thread, orsemaphore. If the semaphore is available, MBS_wait
process) needs the ability to execute sequences ofdecrements it and continues. If the semaphore is
instructions without interference so it can atomicallyalready owned, the requestor task blocks within
manipulate shared data. These sequences, known asMBS_wait until a release notification interrupt makes
critical sections, are bracketed by entry and exitit ready-to-run. Once the interrupt occurs and higher
protocols that satisfy four properties - mutualpriority tasks have relinquished the CPU, the task
exclusion, absence of deadlock, absence ofwaiting on the semaphore wakes up within MBS_wait
unnecessary delay and eventual entry (no starvation).and loops to re-test it. Note that the task doesn't
Our focus here is mutual exclusion - the remainingassume ownership immediately when unblocked.
properties are detailed in any number of textbooksBecause a remote task might re-acquire the
and will be satisfied by our multiprocessor semaphore.semaphore by time the requestor wakes up,
Relative to a shared resource, mutual exclusionMBS_wait loops to compete for the semaphore
requires that only one task at a time execute in aagain.
critical section. Critical section entry and exit protocolsWhen MBS_wait determines that a semaphore is
use mechanisms such as polled flags (often calledunavailable, it sets a notification request flag in the
simple locks or spin locks) or more abstract entitiesshared semaphore data structure to indicate that the
such as blocking semaphores. Simple locks can beprocessor should be interrupted when the semaphore
used to build protection mechanisms of greateris released elsewhere in the system. To avoid a race
complexity.condition known as the lost wake-up problem,
B-SemaphoreMBS_wait atomically tests the semaphore and sets
The semaphore is a system-level abstraction usedthe notification request flag if the semaphore is
for interprocess synchronization. It provides twounavailable.
atomic operations, wait (P) and signal (V), which areCode for the wait operation is divided into two
invoked to manipulate a non-negative integer withindistinct parts. MBS_wait contains the blocking code
the semaphore data structure. The wait operationand is called by an application. MBS_interrupt runs in
checks the value of the integer and eitherresponse to the notification interrupt and posts a
decrements it if positive or blocks the calling task.local signal to the task waiting on the semaphore.
The signal operation either unblocks a task waiting onThis is very similar to a device driver model, where
the semaphore or increments the semaphore if nothe upper part of a driver suspends a task pending I
tasks are waiting. A binary semaphore, with valueO service and the interrupt-driven lower part wakes
limited to 0 and 1, can be used effectively by anit up.
application to guard critical sections.The signal operation
A multiprocessor semaphore can be implemented byMBS_signal releases a semaphore by incrementing its
placing its data structure in shared memory and usingvalue and posting an interrupt to a processor that
RTOS services on each processor to handle blocking.requested release notification. This causes
Before outlining an implementation, let's look at twoMBS_interrupt to execute on the remote processor
aspects of semaphores that cause complications in awhere a task is blocked on the semaphore. Note that
multiprocessor environment. One is low-level mutualthis varies slightly from the uniprocessor signal
exclusion to protect shared data within a semaphoreoperation described earlier where the semaphore is
and the other is wake-up notification when aincremented only if no tasks are blocked.
semaphore is released.Pseudo-code
Low-level mutual exclusionNow that we have a notion of shared semaphore
At its core, a semaphore has a count variable andarchitecture, let's look at pseudo-code describing the
possibly other data elements that must bewait and signal operations. Keep in mind that this
manipulated atomically. System calls use simple mutualexample applies to a two-processor version where
exclusion mechanisms to guard very short criticalonly one task at a time on each processor tries to
sections where the semaphore structure is accessed.acquire the semaphore. General implementations
This prevents incorrect results from concurrentservicing more processors and sharing the semaphore
modification of shared semaphore data.between multiple tasks on each processor can be
In a uniprocessor environment, interrupt masking is aimplemented by using the n-process Peterson
popular technique used to ensure that sequentialalgorithm and modifying the MBS operations.
operations occur without interference. Interrupts areNote that the critical sections, enforced by Peterson's
disabled at the entrance to a critical section andalgorithm (Peterson entry and Peterson exit), are
re-enabled on exit. In a multiprocessor situation,very short instruction sequences used to manipulate
however, this isn't an option. Even if one processorthe semaphore data structure. The details of
could disable the interrupts of another (rarely thePeterson's algorithm are not shown - these are
case), the second processor would still execute animplicit in the Peterson entry/exit operations. The lock
active thread and might inadvertently violate mutualand turn variables used in Peterson's algorithm are
exclusion requirements.distinct from the semaphore data elements accessed
A second technique uses an atomic test-and-set (orin the critical sections.
similar) instruction to manipulate a variable. ThisThe critical sections are preceded with DSP/BIOS
variable might be the semaphore count itself or aTSK_disable calls to prevent task switching. A task
simple lock used to guard critical sections whereswitch during a critical section could cause another
semaphore data is accessed. Either way, a specializedprocessor to spin indefinitely in Peterson entry if it
instruction guarantees atomic read-modify-write in atries to acquire the same semaphore. The critical
multitasking environment. Although this looks like asections should be executed as quickly as possible.
straightforward solution, test-and-set hasAlso note that the example omits error checking,
disadvantages in both uniprocessor andreturn values and timeouts. The pseudo-code is
multiprocessor scenarios. One drawback ismeant to highlight discussion topics rather than
dependence on machine instructions. These varyprovide a detailed implementation template.
across processors, provide only a small number ofMBS_wait () {success = FALSE /* local variable, not
atomic operations and are sometimes unavailable. Apart of semaphore */while (success == FALSE) { /*
second problem is bus locking. If multiple processorsrepeat semaphore acquisition attempt */
share a common bus that doesn't support lockingTSK_disable () /* prevent DSP/BIOS task switch */
during test-and-set, processors might interleavePeterson entry /* Peterson's entry protocol */
accesses to a shared variable at the bus level while/* critical section begins */if (sem_value > 1)
executing seemingly atomic test-and-set instructions.{sem_value = sem_value - 1success = TRUE
And a third problem is test-and-set behavior in}else {notification_request = TRUE
multi-port RAM systems. Even if all buses can be}
locked, simultaneous test-and-set sequences atPeterson exit /* end critical section */
different ports might produce overlapped accesses.TSK_enable () /* re-enable DSP/BIOS scheduler */if
Now consider two approaches that are very useful in(success == FALSE) { /* local variable shows result *
shared memory scenarios. One relies on simple atomic
hardware locks and the other is a general-purposeSEM_pend () /* sleep using DSP/BIOS semaphore */
software solution known as Peterson's algorithm.}
Hardware flags}
In shared memory systems, hardware-assisted}
mutual exclusion can be implemented with special bitMBS_interrrupt {
flags found in multi-port RAMs. DPRAM logic preventsSEM_post () /* local wake-up signal using DSP/BIOS
overlap of concurrent operations on these hardware*/
flags, forcing them to maintain correct state during}
simultaneous accesses. And because processors useMBS_post () { /* release the semaphore */
standard-issue read/write instructions to manipulateTSK_disable ()
the flags, special test-and-set-like instructions are notPeterson entry
required. But this is still a limited solution - software/* start critical section */sem_value = sem_value +1;
engineers often encounter shared memory systems/* increment the semaphore */if (notification_request
that lack this feature. So let's take one more step to== TRUE) { /* notify a remote task?
arrive at a general-purpose hardware-independent