首先,您需要确定您想要实现的目标是什么,以及不同线程的指令可以通过哪些方式交织并防止这种情况发生。
C中的自增运算符++x
通常实现为将 i 的值从内存读取到寄存器中;增加寄存器;将值写入内存:
r1 ← xglobal
r1 ← r1 + 1
xglobal ← r1
So the value of xglobal is incremented by one.
如果您有两个并行线程,那么它们可能会破坏性地交错
initial xglobal = 99
r1 ← xglobal
compare r1 100
r2 ← xglobal
compare r2 100
r1 ← r1 + 1 == 100
r2 ← r2 + 1 == 100
xglobal ← r1 == 100
xglobal ← r2 == 100
r1 ← xglobal
compare r1 100
stop
r2 ← xglobal
compare r1 100
stop
final xglobal = 100
So the value of xglobal is incremented by one, despite both threads incrementing it.
(我忽略了缓存的影响,这意味着线程 2 中变量的读取可以表现得好像它发生在线程 1 中的写入之前,即使线程 1 的写入发生在挂钟时间读入之前。在 pthread 中获取和释放互斥体会导致内存屏障,迫使所有读取行为都发生在获取或释放之后,而写入行为就像发生在获取或释放之前。)
(上面相当于for ( int r; ( r = x ) < 100; x = r + 1 )
而不是for (; x < 100; x = x + 1 )
这可能会对 x 进行额外的读取,因此还有另一个线程可能干扰的点)
类似地,一个线程的增量可能会破坏另一个线程的增量,从而允许线程以 i
initial xglobal = 98
r2 ← xglobal
r1 ← xglobal
compare r1 100
r1 ← r1 + 1 == 99
xglobal ← r1
r1 ← xglobal
compare r1 100
r1 ← r1 + 1 == 100
xglobal ← r1
r1 ← xglobal
compare r1 100
stop
compare r2 100
r2 ← r2 + 1 == 99
xglobal ← r2
...
final xglobal = 99
因此,左线程的第二个增量被第一个增量覆盖,并且它将以全局可见值 x
您可能知道所有这些,并且可能希望使用一种机制来防范它。
我说“可能”是因为你的要求不清楚 - 当 x 达到 100 时上面的线程终止;要求没说,那里也没说。
So, since no thread will terminate without writing xglobal ← 100, the requirement may in fact be met without any locking, but x
may be incremented n
*100 times rather than 100 times. ( if the limit was larger than a byte then the writing of x might be non-atomic on some platforms, which could result in an infinite loop if bytes from different threads are mixed together, but for a limit of 100 that won't happen )
One technique is to use a mutex http://opengroup.org/onlinepubs/007908775/xsh/pthread_mutex_init.html which blocks other threads from running when one thread holds the lock on the mutex. If the lock is acquired before xglobal is read, and not released until xglobal is written, then the reads and writes of the thread cannot interlace.
initial xglobal = 98
lock (mutex)
mutex locked
lock(mutex)
blocked
r1 ← xglobal
compare r1 100
r1 ← r1 + 1 == 99
xglobal ← r1
release ( mutex )
mutex locked
r2 ← xglobal
compare r2 100
r2 ← r2 + 1 == 100
xglobal ← r2
release ( mutex )
lock (mutex)
mutex locked
r1 ← xglobal
compare r1 100
release ( mutex )
stop
...
final xglobal = 100
在 pthreads 之外,您可能想要使用平台的比较和交换操作(__sync_val_compare_and_swap
in gcc http://gcc.gnu.org/onlinedocs/gcc-4.4.1/gcc/Atomic-Builtins.html#Atomic-Builtins),它采用一个地址、一个旧值和一个新值,并且如果该地址处的内存等于旧值,则自动将其设置为新值。这使您可以将逻辑编写为:
for ( int v = 0; v < 100; ) {
int x_prev = __sync_val_compare_and_swap ( &x, v, v + 1 );
// if the CAS succeeds, the value of x has been set to is x_prev + 1
// otherwise, try again from current last value
if ( x_prev == v )
v = x_prev + 1;
else
v = x_prev;
}
So if
initial xglobal = 98
initial v1 = 0
initial v2 = 0
cmp v1 100
x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )
cmp v2 100
x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set fails with x == 98 )
v1 ← x_prev1 = 98 // x_prev != v
v2 ← x_prev2 = 98
cmp v2 100
x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 98 ( set succeeds with x == 99 )
v2 ← x_prev2 + 1 = 99 // as x_prev == v
cmp v1 100
x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set fails with x == 99 )
v1 ← x_prev1 = 99 // as x_prev != v
cmp v1 100
x_prev1 ← CASV ( xglobal, v1, v1 + 1 ) = 99 ( set succeeds with x == 100)
v1 ← x_prev1 + 1 = 100 // as x_prev == v
cmp v2 100
x_prev2 ← CASV ( xglobal, v1, v1 + 1 ) = 100 ( set fails with x == 100 )
v2 ← x_prev2 = 100 // as x_prev != v
cmp v1 100
cmp v2 100
stop
stop
On each loop, xglobal will atomically be set to the value of r1 + 1 if and only if its previous value was r1; if not, r1 will be set to the value of xglobal tested during the CASV operation. This reduces the amount of time locks are held on most implementations ( though it still requires locking the memory bus for the duration of the CAS operation, only those operations will be serialised. As performing the CAS is expensive on multi-cores, it probably won't be much better for such a simple case as this. )