__syncthreads()是cuda的内建函数,用于块内线程通信.
__syncthreads() is you garden variety thread barrier. Any thread reaching the barrier waits until all of the other threads in that block also reach it. It is
designed for avoiding race conditions when loading shared memory, and the compiler will not move memory reads/writes around a __syncthreads().
其中,最重要的理解是那些可以到达__syncthreads()的线程需要其他可以到达该点的线程,而不是等待块内所有其他线程。
一般使用__syncthreads()程序结构如下:
1 __share__ val[];
2 ...
3 if(index < n)
4 {