Skip to content
This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Conversation

@annarev
Copy link
Contributor

@annarev annarev commented Jun 12, 2020

The feedback phase will be open for two weeks until Tuesday June 30, 2020.

Status Accepted
RFC # 257
Author(s) Anna Revinskaya ([email protected]), Penporn Koanantakool ([email protected]), Yi Situ ([email protected]), Russell Power ([email protected])
Sponsor Gunhan Gulsoy ([email protected])
Updated 2020-09-08

Objective

Provide basic device management C API to allow new devices to modularly connect
to the current TensorFlow runtime.

yisitu and others added 4 commits June 26, 2020 10:04
Defined a portable bool type TF_BOOL.
Added structure size definitions and updated usage examples.
Fixed typos where member functions are supposed to be member function pointers.
Fixed missing extern "C".
TF_Status* status);

// Causes the host code to synchronously wait for the event to complete.
void (*block_host_for_event)(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @annarev It seems that block_host_until_done is removed, can we keep this API? block_host_until_done is a more large-grained API, it provides more flexibility for the backend to implement, and lots of device runtime already provide the capability to match this API(CUDA: cuStreamSynchronize(stream),SYCL queue.wait() ). I saw current implementation is composed of create_event+ record_event + block_host_for_event + delete event, that means each back-end needs to implements all of these API, which limits the flexibility, and I am not sure whether it will bring some performance penalty (for DPC++, we need to enqueue a barrier to retrieve the event)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@annarev
Copy link
Contributor Author

annarev commented Sep 9, 2020

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

typedef struct SP_DeviceDescription {
  size_t struct_size;
  
  // Device hardware name. Used for printing.
  // Must be null-terminated.
  char* name;
  
  // Device vendor name. Used for printing.
  // Must be null-terminated.
  char* device_vendor;

  // Returns the PCI bus identifier for this device, of the form
  // [domain]:[bus]:[device].[function]
  // Used for printing.
  // Must be null-terminated.
  char* pci_bus_id;

  TF_Bool has_numa_node;  // True if `numa_node` is set.
  // Returns the NUMA node associated with this device, for use in
  // determining socket locality.
  int numa_node;

  TF_Bool has_memory_bandwidth;  // True if `memory_bandwidth` is set.
  // Device's memory bandwidth in bytes/sec.  (This is for reads/writes to/from
  // the device's own memory, not for transfers between the host and device.)
  int64_t memory_bandwidth;

  TF_Bool has_gflops;  // True if `gflops` field is set.
  // Estimate of floating point operations per second for this device * 10e-9.
  double gflops;
} SP_DeviceDescription;

penpornk and others added 10 commits September 9, 2020 10:20
* Added the Implementation Conventions and Usage Overview section.
* Replaced the Stability / User Impact with the Versioning Strategy and Stability section. Also moved the section higher.
* Updated the API design according to the real implementation.
* Updated code examples.
…gableDevice RFC (PR tensorflow#262).

* Referenced semver and TensorFlow's Version Compatibility page.
* Added the Updating Guidelines, Convention, and Detecting Incompatibility sections.
* Merged the current code examples with @yisitu's newer ones on PR tensorflow#262 and simplified them a bit.
Also some more minor edits.
Changed "shall" to "must" to make sure people would not misinterpret "shall".
Updates based on recent changes and discussion on RFC PR tensorflow#262
@kulinseth
Copy link
Contributor

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

@annarev
Copy link
Contributor Author

annarev commented Sep 11, 2020

I am planning to add the following SP_DeviceDescription struct. I made some fields optional and some required. @wchao1115, @kulinseth, @jzhoulon let me know if this looks reasonable (for e.g. if some other fields should be optional as well, such as pci_bus_id or if some optional fields should be required).

Thanks!. This looks good to me. Just to confirm, all the versioning information is subsumed in struct_size, and we don't need any extra fields in DeviceDescription for version compatibility. Is that correct?

Yes, we rely on struct_size and version value in SE_PlatformRegistrationParams.

@annarev
Copy link
Contributor Author

annarev commented Sep 17, 2020

I updated the doc to add a way to use a custom allocator:
21be6e9
Addition of a custom allocator was requested in Pluggable Device RFC:
#262

@penpornk
Copy link
Member

This RFC has been up for a while. The implementation has even been checked in over a month ago. I think we should wrap this up (i.e., merge this PR).

Changes to StreamExecutor C API since the last combined design review meeting with PluggableDevice on 8/13/20:

  • Added more explanation of the overall functionality.
  • Added more clarifications on versioning strategy.
  • Corrected typos. Minor functionality additions such as block_host_until_done, unified_memory_allocate/deallocate, etc.

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

@kulinseth
Copy link
Contributor

If anyone has anything else that needs to be discussed/resolved before this RFC can be merged (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

Thanks Penporn, for reaching out. Nothing from our side.

@penpornk
Copy link
Member

@kulinseth Thank you for the quick confirmation! :)

@ematejska @alextp I believe there is no outstanding issue now. Could we move forward?

@alextp
Copy link
Contributor

alextp commented Sep 25, 2020 via email

Copy link
Contributor

@theadactyl theadactyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wdfnst
Copy link

wdfnst commented Sep 30, 2022

@penpornk Hi, is there any PluggableDevice template for beginner? TKS

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla: yes RFC: Accepted RFC Design Document: Accepted by Review

Projects

None yet

Development

Successfully merging this pull request may close these issues.