Commit 77ae995

drm/i915: Enable userspace to opt-out of implicit fencing
Userspace is faced with a dilemma. The kernel requires implicit fencing to manage resource usage (we always must wait for the GPU to finish before releasing its PTE) and for third parties. However, userspace may wish to avoid this serialisation if it is using explicit fencing between parties and wants more fine-grained access to buffers (e.g. it may partition the buffer between uses and track fences on ranges rather than the implicit fences tracking the whole object). It follows that userspace needs a mechanism to avoid the kernel's serialisation on its implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag. Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on shared winsys buffers, but implicit fencing on internal surfaces) require a per-object level flag. Given that this flag needs to be set only once for the lifetime of the object, this reduces the convenience of having an execbuf or context level flag (and avoids having multiple pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU hangs, but will not result in use-after-free or similar resource tracking issues.

Serious caveat: write ordering is not strictly correct after setting this flag on a render target on multiple engines. This affects all subsequent GEM operations (execbuf, set-domain, pread) and shared dma-buf operations. A fix is possible, but costly (both in terms of further ABI changes and runtime overhead).

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <[email protected]>
Reviewed-by: Joonas Lahtinen <[email protected]>
Acked-by: Chad Versace <[email protected]>
Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
1 parent 40f62bb commit 77ae995
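
The commit message mentions userspace that partitions a buffer and tracks fences on ranges rather than on the whole object. A minimal sketch of what such bookkeeping could look like follows. The `range_fence` structure and both helper functions are hypothetical names invented for this illustration; they are not part of i915 or of any userspace driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace bookkeeping: one entry per suballocated range. */
struct range_fence {
	uint64_t offset;  /* start of the range within the buffer */
	uint64_t length;  /* size of the range in bytes */
	int fence_fd;     /* explicit fence guarding the range, -1 if idle */
};

/* Do the half-open ranges [a, a+alen) and [b, b+blen) intersect? */
static bool ranges_overlap(uint64_t a, uint64_t alen,
			   uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/*
 * Collect the fences a new access to [offset, offset+length) has to wait
 * on. Only ranges that actually overlap contribute, which is the
 * fine-grained behaviour the whole-object implicit fence cannot offer.
 * Returns the number of fences written to out (at most max_out).
 */
static size_t fences_for_range(const struct range_fence *tracked, size_t count,
			       uint64_t offset, uint64_t length,
			       int *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < count && n < max_out; i++) {
		if (tracked[i].fence_fd < 0)
			continue; /* range is idle, nothing to wait for */
		if (ranges_overlap(tracked[i].offset, tracked[i].length,
				   offset, length))
			out[n++] = tracked[i].fence_fd;
	}
	return n;
}
```

With this kind of tracking, an access to one suballocation waits only on fences for overlapping ranges, whereas the kernel's implicit fence would serialise against every user of the object.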

3 files changed, 32 additions and 1 deletion

drivers/gpu/drm/i915/i915_drv.c

Lines changed: 1 addition & 0 deletions
@@ -349,6 +349,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
+	case I915_PARAM_HAS_EXEC_ASYNC:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
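
From the userspace side, the new parameter is meant to be probed before `EXEC_OBJECT_ASYNC` is used. The sketch below shows one way that decision could be structured. To stay runnable without a GPU it does not issue the real GETPARAM ioctl; instead it takes a hypothetical `getparam_fn` callback that a real client would implement on top of that ioctl. The two `fake_*` functions are stand-ins for testing, and the assumption that a kernel without this patch rejects the unknown parameter with `-EINVAL` is stated from general knowledge of the driver, not from this commit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the i915_drm.h changes in this commit. */
#define I915_PARAM_HAS_EXEC_ASYNC 43
#define EXEC_OBJECT_ASYNC (1 << 6)

/*
 * Hypothetical callback wrapping the GETPARAM query.
 * Returns 0 on success and a negative errno value on failure.
 */
typedef int (*getparam_fn)(void *ctx, int param, int *value);

/* True if the kernel reports support for EXEC_OBJECT_ASYNC. */
static bool has_exec_async(getparam_fn getparam, void *ctx)
{
	int value = 0;

	if (getparam(ctx, I915_PARAM_HAS_EXEC_ASYNC, &value) != 0)
		return false; /* query failed: treat as unsupported */
	return value > 0;
}

/*
 * Per-object flag choice for a hybrid client: only buffers that the
 * client fences explicitly opt out of implicit synchronisation.
 */
static uint64_t object_flags(bool kernel_has_async, bool explicitly_fenced)
{
	return (kernel_has_async && explicitly_fenced) ? EXEC_OBJECT_ASYNC : 0;
}

/* Stand-in for a kernel with this patch applied. */
static int fake_patched_kernel(void *ctx, int param, int *value)
{
	(void)ctx;
	if (param != I915_PARAM_HAS_EXEC_ASYNC)
		return -EINVAL;
	*value = 1;
	return 0;
}

/* Stand-in for a kernel without this patch (assumed to return -EINVAL). */
static int fake_old_kernel(void *ctx, int param, int *value)
{
	(void)ctx;
	(void)param;
	(void)value;
	return -EINVAL;
}
```

Falling back to a flag value of 0 keeps implicit fencing in place on kernels that do not advertise the feature, so the client stays correct at the cost of extra serialisation.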

drivers/gpu/drm/i915/i915_gem_execbuffer.c

Lines changed: 3 additions & 0 deletions
@@ -1111,6 +1111,9 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
+		if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
+			continue;
+
 		ret = i915_gem_request_await_object
 			(req, obj, obj->base.pending_write_domain);
 		if (ret)
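
The effect of the three added lines can be modelled outside the kernel. The following is a simplified sketch, not the driver code: `mock_object`, `mock_await_object` and `mock_move_to_gpu` are hypothetical stand-ins for the vma list, `i915_gem_request_await_object` and `i915_gem_execbuffer_move_to_gpu`. It only demonstrates that objects carrying `EXEC_OBJECT_ASYNC` are skipped before the await step.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag value copied from the i915_drm.h change in this commit. */
#define EXEC_OBJECT_ASYNC (1 << 6)

/* Hypothetical stand-in for an execbuf object entry. */
struct mock_object {
	uint64_t flags; /* per-object execbuf flags */
	int awaited;    /* how many times the object was waited upon */
};

/* Stand-in for awaiting the object's implicit fences; always succeeds. */
static int mock_await_object(struct mock_object *obj)
{
	obj->awaited++;
	return 0;
}

/* Mirrors the loop structure of the hunk above on the mock types. */
static int mock_move_to_gpu(struct mock_object *objs, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int ret;

		/* Opted out: do not serialise against implicit fences. */
		if (objs[i].flags & EXEC_OBJECT_ASYNC)
			continue;

		ret = mock_await_object(&objs[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

Note that the skip is per object and per execbuf: other objects in the same submission are still awaited as before.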

include/uapi/drm/i915_drm.h

Lines changed: 28 additions & 1 deletion
@@ -397,6 +397,12 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_SCHEDULER 41
 #define I915_PARAM_HUC_STATUS 42
 
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ * See EXEC_OBJECT_ASYNC.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC 43
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -737,8 +743,29 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ * The kernel will also forgo any CPU cache flushes prior to rendering from
+ * the object as the client is expected to be also handling such domain
+ * tracking.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synchronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC (1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {
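
The one-line change to `__EXEC_OBJECT_UNKNOWN_FLAGS` relies on two's-complement arithmetic: negating the bit one above the highest known flag yields a mask with that bit and every higher bit set. The sketch below works through this for the old and new definitions. `UNKNOWN_FLAGS_BEFORE`, `UNKNOWN_FLAGS_AFTER` and `has_unknown_flags` are names made up for this illustration, and the check is only an assumption about how a "must be zero" mask of this shape would typically be applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the header in this commit. */
#define EXEC_OBJECT_PAD_TO_SIZE (1 << 5)
#define EXEC_OBJECT_ASYNC       (1 << 6)

/* The mask as defined before and after the patch (illustrative names). */
#define UNKNOWN_FLAGS_BEFORE (-(EXEC_OBJECT_PAD_TO_SIZE << 1)) /* -64  */
#define UNKNOWN_FLAGS_AFTER  (-(EXEC_OBJECT_ASYNC << 1))       /* -128 */

/*
 * True if flags has any bit set that the mask marks as reserved.
 * The signed mask is sign-extended to 64 bits, so -128 becomes
 * 0xffffffffffffff80: bit 7 and every bit above it.
 */
static bool has_unknown_flags(uint64_t flags, int64_t unknown_mask)
{
	return (flags & (uint64_t)unknown_mask) != 0;
}
```

Under the old mask, bit 6 counts as reserved, so a caller setting `EXEC_OBJECT_ASYNC` would trip the check; under the new mask, bit 6 is accepted and only bits 7 and above remain reserved.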
