They must have moved to a more dynamic approach, called sub-groups:
https://www.khronos.org/registry/OpenCL/specs/opencl-2.2.pdf
Sub-group: Sub-groups are an implementation-dependent grouping of work-items within a
work-group. The size and number of sub-groups is implementation-defined.
and
Work-groups are further divided into sub-groups,
which provide an additional level of control over execution.
and
The mapping of work-items to
sub-groups is implementation-defined and may be queried at runtime.
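So even if it is not called a wavefront, it can now be queried at runtime, and
In the absence of synchronization functions (e.g. a barrier),
work-items within a sub-group may be serialized. In the presence of
sub-group functions, work-items within a sub-group may be serialized
before any given sub-group function, between dynamically encountered
pairs of sub-group functions and between a work-group function and
the end of the kernel.
so even lockstep execution can sometimes be lost.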
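To make the "queried at runtime" point concrete, here is a minimal host-side sketch in plain C. It is not OpenCL API code: the names `sub_group_info`, `model_num_sub_groups` and `model_sub_group_info` are hypothetical, and the linear packing of work-items into sub-groups is an assumption, since the spec leaves the mapping implementation-defined. It only mirrors what the kernel built-ins get_num_sub_groups(), get_sub_group_id(), get_sub_group_local_id() and get_sub_group_size() would report under that assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT an OpenCL API. Assumes work-items are packed into
   sub-groups in linear local-ID order; real devices may map them differently. */
typedef struct {
    size_t sub_group_id;       /* models get_sub_group_id() */
    size_t sub_group_local_id; /* models get_sub_group_local_id() */
    size_t sub_group_size;     /* models get_sub_group_size() */
} sub_group_info;

/* Models get_num_sub_groups(): work-group size divided by the maximum
   sub-group size, rounded up. */
size_t model_num_sub_groups(size_t work_group_size, size_t max_sub_group_size)
{
    assert(max_sub_group_size > 0);
    return (work_group_size + max_sub_group_size - 1) / max_sub_group_size;
}

/* Computes the sub-group coordinates of one work-item under the assumed
   linear mapping. */
sub_group_info model_sub_group_info(size_t local_linear_id,
                                    size_t work_group_size,
                                    size_t max_sub_group_size)
{
    sub_group_info info;
    size_t first_item, remaining;

    assert(max_sub_group_size > 0);
    assert(local_linear_id < work_group_size);

    info.sub_group_id = local_linear_id / max_sub_group_size;
    info.sub_group_local_id = local_linear_id % max_sub_group_size;

    /* The last sub-group may be partial when the sizes do not divide evenly. */
    first_item = info.sub_group_id * max_sub_group_size;
    remaining = work_group_size - first_item;
    info.sub_group_size =
        remaining < max_sub_group_size ? remaining : max_sub_group_size;
    return info;
}
```

Under this model, a work-group of 100 items with a maximum sub-group size of 32 gets four sub-groups, the last one holding only 4 items. Code that assumes every sub-group is full would already be wrong there.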
On top of this,
sub_group_all() and
sub_group_broadcast() and are described in OpenCL C++ kernel language and IL specifications.
The use of these sub-group functions implies sequenced-before relationships between statements
within the execution of a single work-item in order to satisfy data dependencies.
means there is some kind of communication within a sub-group. And because OpenCL now has a child-kernel definition:
Device-side enqueue: A mechanism whereby a kernel-instance is enqueued by a kernel-instance
running on a device without direct involvement by the host program. This produces nested
parallelism; i.e. additional levels of concurrency are nested inside a running kernel-instance.
The kernel-instance executing on a device (the parent kernel) enqueues a kernel-instance (the
child kernel) to a device-side command queue. Child and parent kernels execute asynchronously
though a parent kernel does not complete until all of its child-kernels have completed.
eventually, with something like this
    kernel void launcher()
    {
        ndrange_t ndrange = ndrange_1D(1);
        enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
            ^{
                size_t id = get_global_id(0);
            }
        );
    }
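you should be able to spawn your own (upgraded?) wavefronts of any size you need, and they run concurrently with the parent kernel (and their threads can communicate within a sub-group), but they are not called wavefronts because they are not hard-coded by the hardware (IMHO).
The 2.0 API spec says: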
Extreme care should be exercised when writing code that uses
subgroups if the goal is to write portable OpenCL applications.
which brings to mind AMD's 16-wide SIMD and NVIDIA's 32-wide SIMD versus some imaginary FPGA's 95-wide compute cores. Pseudo-wavefronts, maybe?
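As a rough illustration of that portability warning, the sketch below models a sum in plain C with the sub-group width passed in as a parameter. It is a host-side model, not kernel code: `model_sub_group_reduce_add` stands in for what sub_group_reduce_add() would return to one sub-group, `model_work_group_sum` is a hypothetical helper, and contiguous slicing of the data into sub-groups is an assumption. Because nothing hard-codes 16 or 32, the result is the same for any width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of sub_group_reduce_add(): sums the values held by the
   work-items of a single sub-group. */
long model_sub_group_reduce_add(const int *values, size_t count)
{
    long sum = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum;
}

/* Splits n values into sub-groups of at most `width` items (assumed to be
   contiguous slices), reduces each one, and combines the partial sums.
   If num_sub_groups is not NULL it receives the number of sub-groups used. */
long model_work_group_sum(const int *values, size_t n, size_t width,
                          size_t *num_sub_groups)
{
    long total = 0;
    size_t groups = 0;
    size_t start;

    assert(width > 0);
    for (start = 0; start < n; start += width) {
        /* The final sub-group may hold fewer than `width` items. */
        size_t count = (n - start < width) ? (n - start) : width;
        total += model_sub_group_reduce_add(values + start, count);
        ++groups;
    }
    if (num_sub_groups != NULL) {
        *num_sub_groups = groups;
    }
    return total;
}
```

Calling `model_work_group_sum` with widths 16, 32 and 95 over the same 100 values uses 7, 4 and 2 sub-groups respectively but returns the same total. Only code that bakes one width into its indexing depends on the hardware.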