A race hazard or race condition is a flaw in a system or process whereby the output of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first.
Race hazards can occur in poorly designed electronic systems, especially logic circuits, but they also arise frequently in computer software.
Solution: Design a machine so that each state is sensitive to only one input change.
For example, consider a two-input AND gate fed with a logic signal X on input A and its negation, NOT X, on input B. In theory, the output (X AND NOT X) should never be high. However, if changes in the value of X take longer to propagate to input B than to input A, then when X changes from false to true there is a brief period during which both inputs are true, and so the gate's output is also true.
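The glitch described above can be reproduced with a small discrete-time simulation. This is only a sketch: the function name simulate_and_glitch, the step-based time model, and the delay parameter are assumptions made for illustration, not a model of any particular circuit.

```python
def simulate_and_glitch(x_wave, inverter_delay):
    """Return the AND gate output at each time step.

    Input A sees X immediately. Input B sees NOT X as it was
    `inverter_delay` steps earlier (hypothetical propagation delay).
    """
    output = []
    for t in range(len(x_wave)):
        a = x_wave[t]
        # Before the delay has elapsed, B still reflects the initial value of X.
        delayed_x = x_wave[max(t - inverter_delay, 0)]
        b = not delayed_x
        output.append(a and b)
    return output


# X changes from false to true at step 2.
x = [False, False, True, True, True, True]

print(simulate_and_glitch(x, 0))  # [False, False, False, False, False, False]
print(simulate_and_glitch(x, 2))  # [False, False, True, True, False, False]
```

With no delay the output is never true. With a delay of two steps the output is true for exactly two steps after the transition, which is the brief spurious pulse.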
Proper design techniques, such as Karnaugh maps, encourage designers to recognise and eliminate race hazards before they cause problems. (The Karnaugh map article includes a concrete example of a race hazard and how to eliminate it.)
As well as these problems, logic gates can enter metastable states, which create further problems for circuit designers.
See critical race and non-critical race for more information on specific types of race hazards.
Race hazards may arise in software, especially when communicating between separate processes or threads of execution.
Here is a simple example:
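Let us assume that two threads T1 and T2 each want to increment the value of a global integer i, initially 0, by one. Ideally, the following sequence of operations would take place:
1) T1 reads the value of i from memory into a register: 0
2) T1 increments the value in the register: 0 + 1 = 1
3) T1 stores the value of the register in memory: 1
4) T2 reads the value of i from memory into a register: 1
5) T2 increments the value in the register: 1 + 1 = 2
6) T2 stores the value of the register in memory: 2
In the case shown above, the final value of i is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:
1) T1 reads the value of i from memory into a register: 0
2) T2 reads the value of i from memory into a register: 0
3) T1 increments the value in its register: 0 + 1 = 1
4) T2 increments the value in its register: 0 + 1 = 1
5) T1 stores the value of its register in memory: 1
6) T2 stores the value of its register in memory: 1
The final value of i is 1 instead of the expected result of 2, because T2's store overwrites the update made by T1.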
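The lost update can be sketched in Python. The first function replays, step by step and deterministically, the interleaving in which both threads read i before either writes it back. The second uses real threads with a lock around the read-modify-write. The function names and the choice of threading.Lock are illustrative assumptions.

```python
import threading


def lost_update_interleaving():
    """Replay the faulty interleaving by hand; the result is always 1."""
    i = 0
    reg1 = i      # T1 reads i (0)
    reg2 = i      # T2 reads i (0) before T1 has written back
    reg1 += 1     # T1 increments its register to 1
    reg2 += 1     # T2 increments its register to 1
    i = reg1      # T1 stores 1
    i = reg2      # T2 stores 1, overwriting T1's update
    return i


def locked_increments(num_threads=2, per_thread=1):
    """Increment a shared counter from several threads under a lock."""
    state = {"i": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            with lock:            # read, add, and store happen as one unit
                state["i"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["i"]


print(lost_update_interleaving())  # 1
print(locked_increments())         # 2
```

Because only one thread at a time can hold the lock, no thread can read i between another thread's read and its store, so every increment is preserved.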
For another example, consider the following two tasks, in pseudocode:
global integer A = 0;

task Received()
{
    A = A + 1;
    print "RX";
}

task Timeout() // Print only the even numbers
{
    if (A is divisible by 2)
    {
        print A;
    }
}
Task Received is activated whenever an interrupt is received from the serial controller, and increments the value of A.
Task Timeout runs every second. If A is divisible by 2, it prints A. Output would look something like:
0 0 0 RX RX 2 RX RX 4 4
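Now consider this chain of events, which might occur next:
1) A timeout occurs, activating task Timeout.
2) Task Timeout evaluates A and finds it is divisible by 2, so elects to execute the "print A" next.
3) Data is received on the serial port, causing an interrupt and a switch to task Received.
4) Task Received runs to completion, incrementing A and printing "RX".
5) Control returns to task Timeout.
6) Task Timeout executes "print A" using the current value of A, which is now 5, an odd number.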
Mutexes are used to address this problem in concurrent programming.
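As a minimal sketch of that approach, the two tasks can be written in Python so that the test of A and the print of A happen under the same mutex. The class name SharedCounter, the log list (used here in place of printing so the result can be inspected), and the run_demo helper are assumptions for illustration.

```python
import threading


class SharedCounter:
    """Holds A and guards every access to it with one mutex."""

    def __init__(self):
        self.a = 0
        self.log = []                  # stands in for printed output
        self._lock = threading.Lock()

    def received(self):
        with self._lock:
            self.a += 1
            self.log.append("RX")

    def timeout(self):
        with self._lock:
            # The check and the "print" cannot be separated by received().
            if self.a % 2 == 0:
                self.log.append(self.a)


def run_demo(iterations=1000):
    """Run both tasks concurrently and return the shared object."""
    shared = SharedCounter()

    def rx_loop():
        for _ in range(iterations):
            shared.received()

    def timeout_loop():
        for _ in range(iterations):
            shared.timeout()

    threads = [threading.Thread(target=rx_loop),
               threading.Thread(target=timeout_loop)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared
```

Whatever the scheduling, every number recorded by timeout is even, because received cannot run between the divisibility test and the append.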
In filesystems, file locking provides a commonly used solution. A more cumbersome remedy involves reorganizing the system so that one unique process (running a daemon or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process (which of course requires synchronization at the process level).
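A minimal sketch of file locking in Python, assuming a Unix-like system where the standard fcntl module is available. The lock taken by fcntl.flock is advisory, so it only protects against other processes that also take the lock. The helper names append_line_locked and demo are hypothetical.

```python
import fcntl
import os
import tempfile


def append_line_locked(path, line):
    """Append one line while holding an exclusive advisory lock."""
    with open(path, "a") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            f.write(line + "\n")
            f.flush()                            # write before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


def demo():
    """Write two lines to a temporary file and return what was stored."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        append_line_locked(path, "first")
        append_line_locked(path, "second")
        with open(path) as f:
            return f.read().splitlines()
    finally:
        os.remove(path)


print(demo())  # ['first', 'second']
```

Several cooperating processes calling append_line_locked on the same path would take turns, so their lines would not be interleaved within a single write.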
In networking, consider a distributed chat network like IRC, where a user acquires channel-operator privileges in any channel that user starts. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to its own user, since neither server will yet have received the other server's signal that it has allocated that channel. (Note that this problem has been largely solved by various IRC server implementations; see timestamping vs. nick/channel delay protocol.)
In this case of a race hazard, the concept of the "shared resource" covers the state of the network (what channels exist, as well as what users started them and therefore have what privileges), which each server can freely change as long as it signals the other servers on the network about the changes so that they can update their conception of the state of the network. However, the latency across the network makes possible the kind of race condition described. In this case, heading off race conditions by imposing a form of control over access to the shared resource—say, appointing one server to control who holds what privileges—would mean turning the distributed network into a centralized one (at least for that one part of the network operation). Where users find such a solution unacceptable, a pragmatic solution can have the system 1) recognize when a race hazard has occurred; and 2) repair the ill effects.
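The recognize-and-repair approach can be illustrated with a small sketch. It assumes, purely for illustration, that each server attaches a creation timestamp to its claim on a channel and that all servers apply the same rule when they learn of a conflict: the earliest claim keeps operator privileges. The ChannelClaim type and the resolve function are hypothetical and are not taken from any IRC server implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelClaim:
    channel: str
    operator: str
    created_at: float   # creation time recorded by the claiming server
    server: str


def resolve(claims):
    """Given conflicting claims on one channel, return (winner, demoted).

    The earliest claim wins; ties are broken by server name so that every
    server applying this rule reaches the same result.
    """
    ordered = sorted(claims, key=lambda c: (c.created_at, c.server))
    return ordered[0], ordered[1:]


alice = ChannelClaim("#chat", "alice", 100.0, "server-a")
bob = ChannelClaim("#chat", "bob", 100.5, "server-b")

# 1) Recognize: more than one claim exists for the same channel.
# 2) Repair: keep the winner's privileges and revoke the others.
winner, demoted = resolve([bob, alice])
print(winner.operator)                 # alice
print([c.operator for c in demoted])   # ['bob']
```

The design choice here is that the rule is deterministic and depends only on data every server eventually receives, so no central authority is needed to agree on the outcome.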
A race condition exemplifies an anti-pattern.
A particularly serious example of a race condition was one of the problems behind the accidents involving the Therac-25, a life-critical system. Another example is the Energy Management System used by Ohio-based FirstEnergy Corp., which had a race condition in the alarm subsystem; when three sagging power lines were tripped simultaneously, the condition prevented alerts from being raised to the monitoring technicians. This software flaw eventually led to the North American Blackout of 2003.
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Race hazard".