Quick Start¶
This guide walks through the key functionality of Hidet for tensor computation.
Optimize PyTorch model with Hidet¶
Note
torch.compile(...) requires PyTorch 2.0+.
The easiest way to use Hidet is to call torch.compile() with hidet as the backend:
model_opt = torch.compile(model, backend='hidet')
Next, we use the resnet18 model as an example to show how to optimize a PyTorch model with Hidet.
Tip
Because TF32 is enabled by default in torch's cuDNN backend, torch's results are slightly less precise. You can disable TF32 if you need full float32 precision (see also PyTorch TF32), as sketched below.
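A minimal sketch of disabling TF32 with the standard PyTorch flags (this is plain torch configuration, not part of hidet):
import torch
# make float32 matmuls and cuDNN convolutions use full float32 precision
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False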
import hidet
import torch
# take resnet18 as an example
x = torch.randn(1, 3, 224, 224).cuda()
model = torch.hub.load('pytorch/vision:v0.9.0', 'resnet18', pretrained=True, verbose=False)
model = model.cuda().eval()
# uncomment the following line to enable kernel tuning
# hidet.torch.dynamo_config.search_space(2)
# optimize the model with 'hidet' backend
model_opt = torch.compile(model, backend='hidet')
# run the optimized model
y1 = model_opt(x)
y2 = model(x)
# check the correctness
torch.testing.assert_close(actual=y1, expected=y2, rtol=1e-2, atol=1e-2)
# benchmark the performance
for name, model in [('eager', model), ('hidet', model_opt)]:
    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start_event.record()
    for _ in range(100):
        y = model(x)
    end_event.record()
    torch.cuda.synchronize()
    print('{:>10}: {:.3f} ms'.format(name, start_event.elapsed_time(end_event) / 100.0))
eager: 1.250 ms
hidet: 2.137 ms
One operator can have multiple equivalent implementations (i.e., kernel programs) with different performance. We usually need to try different implementations for each concrete input shape to find the best one; this process is called kernel tuning. To enable kernel tuning, we can use the following config in hidet:
# 0 - no tuning, default kernel will be used
# 1 - tuning in a small search space
# 2 - tuning in a large search space, takes longer but achieves better performance
hidet.torch.dynamo_config.search_space(2)
When kernel tuning is enabled, hidet can achieve the following performance on an NVIDIA RTX 4090:
eager: 1.176 ms
hidet: 0.286 ms
Hidet provides several configurations to control the optimizations of the hidet backend, such as the following (a sketch of setting them is shown after this list):
Search Space: choose the search space for operator kernel tuning. A larger search space usually achieves better performance but takes longer to optimize.
Correctness Checking: print a correctness-checking report so you can see the numerical difference between the hidet-generated operator and the original PyTorch operator.
Other Configurations: you can also configure other optimizations of the hidet backend, such as automatically using a lower-precision data type (e.g., float16), or controlling how the reduction dimension of matrix multiplication and convolution operators is parallelized.
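A minimal sketch of setting these configurations via hidet.torch.dynamo_config; apart from search_space (used earlier in this guide), the commented-out method names below are assumptions for illustration and should be checked against the Optimize PyTorch Model tutorial:
import torch
import hidet
# tune each kernel in the large search space before compiling the model
hidet.torch.dynamo_config.search_space(2)
# the following calls are assumptions only; see the tutorial for the exact API in your version
# hidet.torch.dynamo_config.correctness_report()  # report numerical difference vs. PyTorch
# hidet.torch.dynamo_config.use_fp16(True)        # automatically use float16 where possible
model_opt = torch.compile(model, backend='hidet')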
See also
You can learn more about configuring hidet as a torch dynamo backend in the tutorial Optimize PyTorch Model.
In the remaining parts, we will show you the key components of Hidet.
Define tensors¶
Tip
Besides randn(), we can also use zeros(), ones(), full(), and empty() to create tensors with different initial values. We can use from_torch() to convert a PyTorch tensor to a Hidet tensor that shares the same memory, and asarray() to convert a Python list or numpy ndarray to a Hidet tensor.
A tensor is an n-dimensional array. Like other machine learning frameworks, Hidet uses Tensor as the core object for computation and manipulation.
The following code creates a randomly initialized tensor with hidet.randn().
a = hidet.randn([2, 3], device='cuda')
print(a)
Tensor(shape=(2, 3), dtype='float32', device='cuda:0')
[[-0.13 0.24 0.23]
[-1.3 0.61 -0.54]]
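A minimal sketch of the other constructors mentioned in the tip above, assuming they follow the same shape/device arguments as randn():
import torch
import hidet
z = hidet.zeros([2, 3], device='cuda')           # a 2x3 tensor filled with zeros
t = hidet.from_torch(torch.randn(2, 3).cuda())   # shares memory with the torch tensor
n = hidet.asarray([[1.0, 2.0, 3.0]])             # converts a python list (resides on cpu)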
Each Tensor has a dtype that defines the type of each tensor element, a device that tells which device the tensor resides on, and a shape that indicates the size of each dimension. The example above defines a float32 tensor on the cuda device with shape [2, 3].
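For instance, we can inspect these attributes on the tensor a defined above (the exact printed forms are assumptions and may vary):
print(a.dtype)   # float32
print(a.device)  # cuda device 0
print(a.shape)   # (2, 3)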
Run operators¶
Hidet provides a rich set of operators (e.g., matmul() and conv2d()) to compute and manipulate tensors. We can do a matrix multiplication as follows:
b = hidet.randn([3, 2], device='cuda')
c = hidet.randn([2], device='cuda')
d = hidet.ops.matmul(a, b)
d = d + c # 'd + c' is equivalent to 'hidet.ops.add(d, c)'
print(d)
Tensor(shape=(2, 2), dtype='float32', device='cuda:0')
[[-0.59 -1.01]
[ 0.28 -0.77]]
In this example, the operator is executed on the device at the time we call it, so this is an imperative style of execution. Imperative execution is intuitive and easy to debug, but it prevents some graph-level optimization opportunities and suffers from higher kernel dispatch latency.
In the next section, we introduce another way to execute operators.
Symbolic tensor and flow graph¶
In hidet, each tensor has an optional storage attribute that represents the block of memory that stores the contents of the tensor. If the storage attribute is None, the tensor is a symbolic tensor.
We can use hidet.symbol_like() or hidet.symbol() to create a symbolic tensor. Symbolic tensors are returned if any input tensor of an operator is symbolic. We can see how a symbolic tensor is computed via its trace attribute: a tuple (op, idx) where op is the operator that produces this tensor and idx is the index of this tensor in the operator's outputs.
def linear_bias(x, b, c):
    return hidet.ops.matmul(x, b) + c
x = hidet.symbol_like(a)
y = linear_bias(x, b, c)
assert x.trace is None
assert y.trace is not None
print('x:', x)
print('y:', y)
x: Tensor(shape=(2, 3), dtype='float32', device='cuda:0')
y: Tensor(shape=(2, 2), dtype='float32', device='cuda:0')
from (<hidet.graph.ops.arithmetic.AddOp object at 0x7f9ab84cb2b0>, 0)
We can use the trace attribute to construct the computation graph, starting from the symbolic output tensor(s). This is what the function hidet.trace_from() does. In hidet, we use hidet.graph.FlowGraph to represent the data flow graph (a.k.a., the computation graph).
graph: hidet.FlowGraph = hidet.trace_from(y)
print(graph)
Graph(x: float32[2, 3][cuda]){
c = Constant(float32[3, 2][cuda])
c_1 = Constant(float32[2][cuda])
x_1: float32[2, 2][cuda] = Matmul(x, c, require_prologue=False)
x_2: float32[2, 2][cuda] = Add(x_1, c_1)
return x_2
}
Optimize flow graph¶
Tip
We may configure the optimizations with PassContext. Potential configs:
Whether to use tensor cores.
Whether to use a low-precision data type (e.g., float16).
A sketch of this pattern is shown at the end of this section.
The flow graph is the basic unit of graph-level optimization in hidet. We can optimize a flow graph with hidet.graph.optimize(), which applies the predefined passes to the given flow graph.
In this example, the matrix multiplication and element-wise addition are fused into a single operator.
opt_graph: hidet.FlowGraph = hidet.graph.optimize(graph)
print(opt_graph)
Graph(x: float32[2, 3][cuda]){
c = Constant(float32[1, 3, 2][cuda])
c_1 = Constant(float32[2][cuda])
x_1: float32[2, 2][cuda] = FusedBatchMatmul(c, c_1, x, fused_graph=FlowGraph(Broadcast, BatchMatmul, Reshape, Add), anchor=1)
return x_1
}
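A minimal sketch of the PassContext pattern mentioned in the tip above, assuming it is used as a context manager around optimize(); the specific configuration methods for tensor cores and float16 vary by hidet version and are therefore only indicated in comments:
# run the optimization passes under a PassContext so that pass behavior can be configured
with hidet.graph.PassContext() as ctx:
    # configuration calls on ctx (e.g., enabling tensor cores or float16) would go here;
    # consult the hidet documentation for the methods available in your version
    opt_graph = hidet.graph.optimize(graph)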
Run flow graph¶
We can directly call the flow graph to run it:
y1 = opt_graph(a)
print(y1)
Tensor(shape=(2, 2), dtype='float32', device='cuda:0')
[[-0.59 -1.01]
[ 0.28 -0.77]]
For CUDA devices, a more efficient way is to create a CUDA graph that dispatches the kernels in the flow graph to the NVIDIA GPU.
cuda_graph = opt_graph.cuda_graph()
outputs = cuda_graph.run([a])
y2 = outputs[0]
print(y2)
Tensor(shape=(2, 2), dtype='float32', device='cuda:0')
[[-0.59 -1.01]
[ 0.28 -0.77]]
Summary¶
In this quick start guide, we walk through several important functionalities of hidet:
Define tensors.
Run operators imperatively.
Use symbolic tensors to create a computation graph (i.e., a flow graph).
Optimize and run the flow graph.
Next Step¶
It is time to learn how to use hidet in your project. A good start is to Optimize PyTorch Model and Optimize ONNX Model with Hidet.
Total running time of the script: (0 minutes 58.327 seconds)