These are all barriers. Barriers prevent code execution beyond the barrier until some condition is met.
- cudaDeviceSynchronize() halts execution in the CPU/host thread (that the cudaDeviceSynchronize was issued in) until the GPU has finished processing all previously requested cuda tasks (kernels, data copies, etc.)
- cudaThreadSynchronize() as you've discovered, is just a deprecated version of
cudaDeviceSynchronize
. Deprecated just means that it still works for now, but it's recommended not to use it (use cudaDeviceSynchronize instead) and in the future, it may become unsupported. But cudaThreadSynchronize
() and cudaDeviceSynchronize
() are basically identical.
- cudaStreamSynchronize() is similar to the above two functions, but it prevents further execution in the CPU host thread until the GPU has finished processing all previously requested cuda tasks that were issued in the referenced stream. So
cudaStreamSynchronize
() takes a stream id as it's only parameter. cuda tasks issued in other streams may or may not be complete when CPU code execution continues beyond this barrier.
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系:hwhale#tublm.com(使用前将#替换为@)