| MULTIPROCESSOR SEMAPHORE | | | | method. |
| Abstract | | | | Peterson's Algorithm |
| ======= | | | | Peterson's algorithm, published in 1981, provides an |
| Shared memory semaphores are basic tools for | | | | elegant software solution to the n-process critical |
| interprocessor synchronization. Althoughself-imposed | | | | section problem and has two distinct advantages |
| design constraints can often reduce synchronization | | | | over test-and-set spin locks. One is that atomic |
| requirements, semaphores offer significant flexibility | | | | test-and-set is not required - the algorithm eliminates |
| to multiprocessor system designers. The | | | | the need for special instructions and bus locking. The |
| implementation presented here illustrates some | | | | other is eventual entry - a task waiting for entry to |
| fundamental issues of multiprocessor concurrency | | | | a critical section won't starve in a weakly fair (typical) |
| and demonstrates the tremendous value of a | | | | scheduling environment. Although Peterson's algorithm |
| multitasking OS like DSP/BIOS. | | | | looks deceptively simple, it's a culmination of many |
| Many variations on this theme are possible. Most | | | | attempts by researchers to solve the critical section |
| obviously, we can modify the semaphore to handle | | | | problem. |
| multiple tasks on one processor or tasks on more | | | | The following pseudo-code shows the entry and exit |
| than two processors. And because our wait | | | | protocols used to enforce two-process mutual |
| operation handles notification interrupts and task | | | | exclusion. Note that Peterson adds a secondary |
| wake-up, we can implement any scheduling policies | | | | "turn" variable - this prevents incorrect results from |
| that make sense for more generalized versions. | | | | race conditions and also ensures that each waiting |
| MULTIPROCESSOR SEMAPHORE | | | | task will eventually enter the critical section. |
| ======== | | | | Listing 1: Peterson's Algorithminitialization |
| Multiprocessing architectures are becoming pervasive. | | | | P1_wants_entry = P2_wants_entry = FALSEturn = |
| DSPs are widely used in dense multiprocessor | | | | P1task P1 |
| arrangements at the network edge, and | | | | P1_wants_entry = TRUE /* set lock */turn = P2 /* |
| systems-on-chip often include DSP cores to | | | | grant turn to other task */loop while |
| accelerate math-intensive computation. Although DSP | | | | (P2_wants_entry and turn == P2) /* buzz waiting |
| BIOS provides a standard, efficient, robust API for | | | | for lock */critical section /* execute critical section */ |
| uniprocessor applications, we sometimes encounter | | | | P1_wants_entry = FALSE /* release lock */task P2 |
| situations where interprocessor synchronization | | | | * same logic as P1 */ |
| mechanisms would be very useful. Here we will take | | | | P2_wants_entry = TRUEturn = P1loop while |
| a look at multiprocessor mutual exclusion and we will | | | | (P1_wants_entry and turn == P1)critical section |
| talk about new method to implement interprocessor | | | | P2_wants_entry = FALSE |
| semaphores using DSP-BIOS. | | | | We can easily imagine situations where more than |
| Many multiprocessor DSP systems are designed to | | | | two processes try to enter their critical sections |
| share a physical pool of memory in the sense that | | | | concurrently. Peterson's algorithm can be generalized |
| each processor sees the memory as directly | | | | to n processes and used to enforce mutual exclusion |
| addressable - a DSP shares a region of memory with | | | | between more than two tasks. And other n-process |
| a host processor or other DSPs-. | | | | solutions, such as the bakery algorithm, are readily |
| A common architecture uses a large region of | | | | available in computer science textbooks. Our |
| single-port RAM shared by all devices, including the | | | | discussion here is limited to the two-process case |
| host. Although arbitration issues complicate hardware | | | | only for clarity and brevity. Pseudo-code for the |
| design. A second architecture uses of dual-port RAM | | | | n-process Peterson algorithm can be found on the |
| between processors. The downside here is the | | | | Electric Sand web-site. |
| relatively high cost and small storage capacity of | | | | Blocking mechanism |
| these devices - large banks of expensive DPRAM are | | | | Now that we have a low-level mutual exclusion tool |
| seldom practical. But in applications that use | | | | to safely manipulate shared data within a semaphore, |
| segmented data transport or small data sets where | | | | consider the other key ingredient of semaphores, |
| small amounts of DPRAM are sufficient, this method | | | | blocking. Assuming that each processor runs DSP |
| is very useful. DPRAM is relatively fast, | | | | BIOS or another multitasking OS, we develop our |
| designer-friendly and, unlike FIFOs, can store shared | | | | wait operation using services that are already |
| data structures used for interprocessor | | | | available on each individual processor. DSP/BIOS |
| communication. | | | | provides a flexible semaphore module (SEM) that we |
| We need to remember this when using shared | | | | use in our implementation. |
| memory. When processors have on-chip cache or a | | | | When the owner of a uniprocessor semaphore |
| system uses write posting, software designers must | | | | releases it with a signal system call, the local |
| pay attention to shared variable coherence. | | | | scheduler has immediate knowledge of the signal |
| Depending on the processor, programmers can | | | | event and can unblock a task waiting on the |
| disable cache, use cache by-pass, or flush cache to | | | | semaphore. In contrast, a multiprocessor semaphore |
| ensure that a shared location is in a proper state. The | | | | implies that the owner and the requestor can reside |
| cache control API in TI's comprehensive Chip Support | | | | on different processors. Because a remote kernel has |
| Library, for example, provides an ideal tool for cache | | | | no implicit knowledge of signal calls to a local kernel, |
| subsystem management. Solutions to write-post | | | | the remote kernel needs timely notification of local |
| delay problems are system-specific. | | | | signal events. Our solution uses interprocessor |
| Our discussion here assumes that two processors | | | | interrupts to notify other processors of local activity |
| use a common shared memory buffer to pass data | | | | involving a shared semaphore. |
| or to operate cooperatively on a data set. In either | | | | Semaphore implementation |
| case, one or more tasks on the processors might | | | | This implementation of a multiprocessor binary |
| need to know the state of the buffer before | | | | semaphore (MBS) assumes that the hardware |
| accessing it, and possibly to block while the buffer is | | | | supports interprocessor interrupts and that a task |
| in use. As in the case of single-processor multitasking, | | | | won't try to acquire a semaphore while another task |
| we need a mutual exclusion mechanism to prevent | | | | on the same processor owns it. The latter restriction |
| inappropriate concurrent operations on the shared | | | | simplifies the example and can easily be removed |
| resource. Let's start with a quick review of mutual | | | | with some additional design work. |
| exclusion to better understand multiprocessor issues. | | | | The wait operation |
| Shared resource management is a fundamental | | | | MBS_wait is invoked to acquire a shared memory |
| challenge of multitasking. A task (or thread, or | | | | semaphore. If the semaphore is available, MBS_wait |
| process) needs the ability to execute sequences of | | | | decrements it and continues. If the semaphore is |
| instructions without interference so it can atomically | | | | already owned, the requestor task blocks within |
| manipulate shared data. These sequences, known as | | | | MBS_wait until a release notification interrupt makes |
| critical sections, are bracketed by entry and exit | | | | it ready-to-run. Once the interrupt occurs and higher |
| protocols that satisfy four properties - mutual | | | | priority tasks have relinquished the CPU, the task |
| exclusion, absence of deadlock, absence of | | | | waiting on the semaphore wakes up within MBS_wait |
| unnecessary delay and eventual entry (no starvation). | | | | and loops to re-test it. Note that the task doesn't |
| Our focus here is mutual exclusion - the remaining | | | | assume ownership immediately when unblocked. |
| properties are detailed in any number of textbooks | | | | Because a remote task might re-acquire the |
| and will be satisfied by our multiprocessor semaphore. | | | | semaphore by time the requestor wakes up, |
| Relative to a shared resource, mutual exclusion | | | | MBS_wait loops to compete for the semaphore |
| requires that only one task at a time execute in a | | | | again. |
| critical section. Critical section entry and exit protocols | | | | When MBS_wait determines that a semaphore is |
| use mechanisms such as polled flags (often called | | | | unavailable, it sets a notification request flag in the |
| simple locks or spin locks) or more abstract entities | | | | shared semaphore data structure to indicate that the |
| such as blocking semaphores. Simple locks can be | | | | processor should be interrupted when the semaphore |
| used to build protection mechanisms of greater | | | | is released elsewhere in the system. To avoid a race |
| complexity. | | | | condition known as the lost wake-up problem, |
| B-Semaphore | | | | MBS_wait atomically tests the semaphore and sets |
| The semaphore is a system-level abstraction used | | | | the notification request flag if the semaphore is |
| for interprocess synchronization. It provides two | | | | unavailable. |
| atomic operations, wait (P) and signal (V), which are | | | | Code for the wait operation is divided into two |
| invoked to manipulate a non-negative integer within | | | | distinct parts. MBS_wait contains the blocking code |
| the semaphore data structure. The wait operation | | | | and is called by an application. MBS_interrupt runs in |
| checks the value of the integer and either | | | | response to the notification interrupt and posts a |
| decrements it if positive or blocks the calling task. | | | | local signal to the task waiting on the semaphore. |
| The signal operation either unblocks a task waiting on | | | | This is very similar to a device driver model, where |
| the semaphore or increments the semaphore if no | | | | the upper part of a driver suspends a task pending I |
| tasks are waiting. A binary semaphore, with value | | | | O service and the interrupt-driven lower part wakes |
| limited to 0 and 1, can be used effectively by an | | | | it up. |
| application to guard critical sections. | | | | The signal operation |
| A multiprocessor semaphore can be implemented by | | | | MBS_signal releases a semaphore by incrementing its |
| placing its data structure in shared memory and using | | | | value and posting an interrupt to a processor that |
| RTOS services on each processor to handle blocking. | | | | requested release notification. This causes |
| Before outlining an implementation, let's look at two | | | | MBS_interrupt to execute on the remote processor |
| aspects of semaphores that cause complications in a | | | | where a task is blocked on the semaphore. Note that |
| multiprocessor environment. One is low-level mutual | | | | this varies slightly from the uniprocessor signal |
| exclusion to protect shared data within a semaphore | | | | operation described earlier where the semaphore is |
| and the other is wake-up notification when a | | | | incremented only if no tasks are blocked. |
| semaphore is released. | | | | Pseudo-code |
| Low-level mutual exclusion | | | | Now that we have a notion of shared semaphore |
| At its core, a semaphore has a count variable and | | | | architecture, let's look at pseudo-code describing the |
| possibly other data elements that must be | | | | wait and signal operations. Keep in mind that this |
| manipulated atomically. System calls use simple mutual | | | | example applies to a two-processor version where |
| exclusion mechanisms to guard very short critical | | | | only one task at a time on each processor tries to |
| sections where the semaphore structure is accessed. | | | | acquire the semaphore. General implementations |
| This prevents incorrect results from concurrent | | | | servicing more processors and sharing the semaphore |
| modification of shared semaphore data. | | | | between multiple tasks on each processor can be |
| In a uniprocessor environment, interrupt masking is a | | | | implemented by using the n-process Peterson |
| popular technique used to ensure that sequential | | | | algorithm and modifying the MBS operations. |
| operations occur without interference. Interrupts are | | | | Note that the critical sections, enforced by Peterson's |
| disabled at the entrance to a critical section and | | | | algorithm (Peterson entry and Peterson exit), are |
| re-enabled on exit. In a multiprocessor situation, | | | | very short instruction sequences used to manipulate |
| however, this isn't an option. Even if one processor | | | | the semaphore data structure. The details of |
| could disable the interrupts of another (rarely the | | | | Peterson's algorithm are not shown - these are |
| case), the second processor would still execute an | | | | implicit in the Peterson entry/exit operations. The lock |
| active thread and might inadvertently violate mutual | | | | and turn variables used in Peterson's algorithm are |
| exclusion requirements. | | | | distinct from the semaphore data elements accessed |
| A second technique uses an atomic test-and-set (or | | | | in the critical sections. |
| similar) instruction to manipulate a variable. This | | | | The critical sections are preceded with DSP/BIOS |
| variable might be the semaphore count itself or a | | | | TSK_disable calls to prevent task switching. A task |
| simple lock used to guard critical sections where | | | | switch during a critical section could cause another |
| semaphore data is accessed. Either way, a specialized | | | | processor to spin indefinitely in Peterson entry if it |
| instruction guarantees atomic read-modify-write in a | | | | tries to acquire the same semaphore. The critical |
| multitasking environment. Although this looks like a | | | | sections should be executed as quickly as possible. |
| straightforward solution, test-and-set has | | | | Also note that the example omits error checking, |
| disadvantages in both uniprocessor and | | | | return values and timeouts. The pseudo-code is |
| multiprocessor scenarios. One drawback is | | | | meant to highlight discussion topics rather than |
| dependence on machine instructions. These vary | | | | provide a detailed implementation template. |
| across processors, provide only a small number of | | | | MBS_wait () {success = FALSE /* local variable, not |
| atomic operations and are sometimes unavailable. A | | | | part of semaphore */while (success == FALSE) { /* |
| second problem is bus locking. If multiple processors | | | | repeat semaphore acquisition attempt */ |
| share a common bus that doesn't support locking | | | | TSK_disable () /* prevent DSP/BIOS task switch */ |
| during test-and-set, processors might interleave | | | | Peterson entry /* Peterson's entry protocol */ |
| accesses to a shared variable at the bus level while | | | | /* critical section begins */if (sem_value > 1) |
| executing seemingly atomic test-and-set instructions. | | | | {sem_value = sem_value - 1success = TRUE |
| And a third problem is test-and-set behavior in | | | | }else {notification_request = TRUE |
| multi-port RAM systems. Even if all buses can be | | | | } |
| locked, simultaneous test-and-set sequences at | | | | Peterson exit /* end critical section */ |
| different ports might produce overlapped accesses. | | | | TSK_enable () /* re-enable DSP/BIOS scheduler */if |
| Now consider two approaches that are very useful in | | | | (success == FALSE) { /* local variable shows result * |
| shared memory scenarios. One relies on simple atomic | | | | |
| hardware locks and the other is a general-purpose | | | | SEM_pend () /* sleep using DSP/BIOS semaphore */ |
| software solution known as Peterson's algorithm. | | | | } |
| Hardware flags | | | | } |
| In shared memory systems, hardware-assisted | | | | } |
| mutual exclusion can be implemented with special bit | | | | MBS_interrrupt { |
| flags found in multi-port RAMs. DPRAM logic prevents | | | | SEM_post () /* local wake-up signal using DSP/BIOS |
| overlap of concurrent operations on these hardware | | | | */ |
| flags, forcing them to maintain correct state during | | | | } |
| simultaneous accesses. And because processors use | | | | MBS_post () { /* release the semaphore */ |
| standard-issue read/write instructions to manipulate | | | | TSK_disable () |
| the flags, special test-and-set-like instructions are not | | | | Peterson entry |
| required. But this is still a limited solution - software | | | | /* start critical section */sem_value = sem_value +1; |
| engineers often encounter shared memory systems | | | | /* increment the semaphore */if (notification_request |
| that lack this feature. So let's take one more step to | | | | == TRUE) { /* notify a remote task? |
| arrive at a general-purpose hardware-independent | | | | |