hidet.cuda¶
Contents¶
Device Management
- Returns True if CUDA is available, False otherwise.
- Get the number of available CUDA devices.
- Get the current cuda device.
- Set the current cuda device.
- Get the properties of a CUDA device.
- Get the compute capability of a CUDA device.
- Synchronize the host thread with the device.
- Mark the start of a profiling range.
- Mark the end of a profiling range.
Memory Management
- Allocate memory on the current device.
- Allocate memory on the current device asynchronously.
- Allocate pinned host memory.
- Free memory on the current cuda device.
- Free memory on the current cuda device asynchronously.
- Free pinned host memory.
- Set the gpu memory to a given value.
- Set the gpu memory to a given value asynchronously.
- Copy gpu memory from one location to another.
- Copy gpu memory from one location to another asynchronously.
- Get the free and total memory on the current device in bytes.
Stream and Event
- A CUDA stream.
- An external CUDA stream created from a handle.
- A CUDA event.
- Get the current stream.
- Get the default stream.
- Set the current stream.
CUDA Graph
- Create a cuda graph to capture and replay the execution of a series of cuda kernels launched in a function.
Device Management¶
- hidet.cuda.available()[source]¶
Returns True if CUDA is available, False otherwise.
This function uses ctypes to check whether libcuda.so is available, instead of calling the CUDA runtime (cudart) directly.
- Returns:
ret – Whether CUDA is available.
- Return type:
bool
- hidet.cuda.device_count()[source]¶
Get the number of available CUDA devices.
- Returns:
count – The number of available CUDA devices.
- Return type:
int
- hidet.cuda.current_device()[source]¶
Get the current cuda device.
- Returns:
device_id – The ID of the cuda device.
- Return type:
int
- hidet.cuda.set_device(device_id)[source]¶
Set the current cuda device.
- Parameters:
device_id (int) – The ID of the cuda device.
- hidet.cuda.properties(device_id=0)[source]¶
Get the properties of a CUDA device.
- Parameters:
device_id (int) – The ID of the device.
- Returns:
prop – The properties of the device.
- Return type:
cudaDeviceProp
- hidet.cuda.compute_capability(device_id=0)[source]¶
Get the compute capability of a CUDA device.
- Parameters:
device_id (int) – The ID of the device to query.
- Returns:
(major, minor) – The compute capability of the device.
- Return type:
Tuple[int, int]
Memory Allocation¶
- hidet.cuda.malloc(num_bytes)[source]¶
Allocate memory on the current device.
- Parameters:
num_bytes (int) – The number of bytes to allocate.
- Returns:
addr – The address of the allocated memory.
- Return type:
int
- hidet.cuda.malloc_async(num_bytes, stream=None)[source]¶
Allocate memory on the current device asynchronously.
- Parameters:
num_bytes (int) – The number of bytes to allocate.
stream (Optional[Union[Stream, cudaStream_t, int]]) – The stream to use for the allocation. If None, the current stream is used.
- Returns:
addr – The address of the allocated memory. When the allocation fails due to insufficient memory, 0 is returned.
- Return type:
int
- hidet.cuda.malloc_host(num_bytes)[source]¶
Allocate pinned host memory.
- Parameters:
num_bytes (int) – The number of bytes to allocate.
- Returns:
addr – The address of the allocated memory.
- Return type:
int
- hidet.cuda.free(addr)[source]¶
Free memory on the current cuda device.
- Parameters:
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc() or malloc_async().
- Return type:
None
- hidet.cuda.free_async(addr, stream=None)[source]¶
Free memory on the current cuda device asynchronously.
- Parameters:
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc() or malloc_async().
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the free. If None, the current stream is used.
- Return type:
None
- hidet.cuda.free_host(addr)[source]¶
Free pinned host memory.
- Parameters:
addr (int) – The address of the memory to free. This must be the address of memory allocated with malloc_host().
- Return type:
None
- hidet.cuda.memset(addr, value, num_bytes)[source]¶
Set the gpu memory to a given value.
- Parameters:
addr (int) – The start address of the memory region to set.
value (int) – The byte value to set the memory region to.
num_bytes (int) – The number of bytes to set.
- Return type:
None
- hidet.cuda.memset_async(addr, value, num_bytes, stream=None)[source]¶
Set the gpu memory to a given value asynchronously.
- Parameters:
addr (int) – The start address of the memory region to set.
value (int) – The byte value to set the memory region to.
num_bytes (int) – The number of bytes to set.
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the memset. If None, the current stream is used.
- Return type:
None
- hidet.cuda.memcpy(dst, src, num_bytes)[source]¶
Copy gpu memory from one location to another.
- Parameters:
dst (int) – The destination address.
src (int) – The source address.
num_bytes (int) – The number of bytes to copy.
- Return type:
None
- hidet.cuda.memcpy_async(dst, src, num_bytes, stream=None)[source]¶
Copy gpu memory from one location to another asynchronously.
- Parameters:
dst (int) – The destination address.
src (int) – The source address.
num_bytes (int) – The number of bytes to copy.
stream (Union[Stream, cudaStream_t, int], optional) – The stream to use for the memcpy. If None, the current stream is used.
- Return type:
None
CUDA Stream and Event¶
- class hidet.cuda.Stream(device=None, blocking=False, priority=0, **kwargs)[source]¶
A CUDA stream.
- Parameters:
device (int or hidet.Device, optional) – The device on which to create the stream. If None, the current device will be used.
blocking (bool) – Whether to enable the implicit synchronization between this stream and the default stream. When enabled, any operation enqueued in the stream will wait for all previous operations in the default stream to complete before beginning execution.
priority (int) – The priority of the stream. The priority is a hint to the CUDA driver that it can use to reorder operations in the stream relative to other streams. The priority can be 0 (default priority) or -1 (high priority). By default, all streams are created with priority 0.
- device_id()[source]¶
Get the device ID of the stream.
- Returns:
device_id – The device ID of the stream.
- Return type:
int
- handle()[source]¶
Get the handle of the stream.
- Returns:
handle – The handle of the stream.
- Return type:
cudaStream_t
- class hidet.cuda.ExternalStream(handle, device_id=None)[source]¶
An external CUDA stream created from a handle.
- Parameters:
handle (int or cudaStream_t) – The handle of the stream.
device_id (int, optional) – The device ID of the stream. If None, the current device will be used.
- class hidet.cuda.Event(enable_timing=False, blocking=False)[source]¶
A CUDA event.
- Parameters:
enable_timing (bool) – When enabled, the event is able to record the time between itself and another event.
blocking (bool) – When enabled, we can use the synchronize() method to block the current host thread until the event completes.
- handle()[source]¶
Get the handle of the event.
- Returns:
handle – The handle of the event.
- Return type:
cudaEvent_t
- elapsed_time(start_event)[source]¶
Get the elapsed time between the start event and this event in milliseconds.
- Parameters:
start_event (Event) – The start event.
- Returns:
elapsed_time – The elapsed time in milliseconds.
- Return type:
float
- record(stream=None)[source]¶
Record the event in the given stream.
After the event is recorded:
- We can synchronize the event to block the current host thread until all the tasks before the event are completed, via Event.synchronize().
- We can get the elapsed time between this event and another event via Event.elapsed_time() (when enable_timing is True).
- We can make another stream wait for the event via Stream.wait_event().
- Parameters:
stream (Stream, optional) – The stream where the event is recorded.
- hidet.cuda.current_stream(device=None)[source]¶
Get the current stream.
- Parameters:
device (int or hidet.Device, optional) – The device on which to get the current stream. If None, the current device will be used.
- Returns:
stream – The current stream.
- Return type:
Stream
CUDA Graph¶
- class hidet.cuda.graph.CudaGraph(f_create_inputs, f_run, ref_objs)[source]¶
Create a cuda graph to capture and replay the execution of a series of cuda kernels launched in a function.
The graph is created by calling the constructor with the following arguments:
- Parameters:
f_create_inputs (Callable[[], List[Tensor]]) – A function that creates the input tensors of the graph. This function is called before f_run.
f_run (Callable[[List[Tensor]], List[Tensor]]) – A function that runs the graph. Only the cuda kernels launched in this function will be captured. Rerunning this function must launch the same cuda kernels in the same order. The input tensors of this function will be the output tensors of the f_create_inputs function.
ref_objs (Any) – The objects that should keep alive during the lifetime of the cuda graph. It may contain the weight tensors that are used in the graph.
- run(inputs=None)[source]¶
Run the cuda graph synchronously. If the inputs are provided, the inputs will be copied to the internal inputs of the cuda graph before running.