Note
Go to the end to download the full example code
Using Rule-based Scheduling¶
In the previous tutorial, we have learned how to define the computation using compute primitives and wrap it into a
Task
. In this tutorial, we will learn how to add an operator (i.e.,
Operator
) with given computation definition, and use hidet’s provided rule-based scheduler to
automatically schedule the computation into a tensor program.
Three steps to define a new operator¶
There are three steps to define a new operator in Hidet.
Batch Matrix Multiplication Example¶
We will take the batch matrix multiplication as an example to illustrate the three steps.
1. Define the computation task class¶
We define the computation task class BatchMatmulTask by inheriting Task
class. The
BatchMatmulTask class’s constructor function takes two arguments, a and b that are the input tensor nodes
of the batch matrix multiplication.
from hidet.ir.compute import TensorNode, compute, reduce
from hidet.ir.task import Task
class BatchMatmulTask(Task):
def __init__(self, a: TensorNode, b: TensorNode):
# get the input sizes
batch_size, m_size, k_size = a.shape
batch_size, k_size, n_size = b.shape
# define the computation
c = compute(
name='c',
shape=[batch_size, m_size, n_size],
fcompute=lambda p, i, j: reduce(
shape=[k_size], fcompute=lambda k: a[p, i, k] * b[p, k, j], reduce_type='sum'
),
)
# call the parent class constructor to initialize the task
super().__init__(
name='batch_matmul', # the name of the task
inputs=[a, b], # the input tensor nodes
outputs=[c], # the output tensor nodes
)
2. Define the operator class¶
Our next step is to define the operator class BatchMatmulOp by inheriting Operator
class.
from hidet.graph import Operator, Tensor
from hidet.graph.ops.utils import input_like
class BatchMatmulOp(Operator):
def __init__(self, a: Tensor, b: Tensor):
# call the parent class constructor to initialize the operator
super().__init__(
inputs=[a, b], # the input tensors
attributes={},
task=BatchMatmulTask( # the task of the operator
# create tensor nodes (TensorNode) with the same shape and dtype as the tensors (Tensor)
input_like(a, 'a'),
input_like(b, 'b'),
),
)
3. Define a function to create the operator instance¶
We define a function batch_matmul to create the operator instance BatchMatmulOp and return the output tensor.
def batch_matmul(a: Tensor, b: Tensor) -> Tensor:
# get_output(0) returns the first output tensor of the operator
return BatchMatmulOp(a, b).outputs[0]
Use the defined operator¶
The new operator has no difference with the hidet provided operators, as we define hidet operators in the same way. For example, when we optimize the flow graph, this new operator can also fuse surrounding operators.
import hidet
def demo_usage():
a = hidet.randn([2, 2, 3])
b = hidet.randn([2, 3, 2])
c = batch_matmul(a, b)
print(a)
print(b)
print(c)
demo_usage()
Tensor(shape=(2, 2, 3), dtype='float32', device='cpu')
[[[ 1.44 1.14 -1.01]
[-0.01 -2.93 0.51]]
[[ 1.38 1.18 1.04]
[ 0.64 0.6 0.77]]]
Tensor(shape=(2, 3, 2), dtype='float32', device='cpu')
[[[ 0.14 -0.82]
[-0.56 -1.91]
[ 0.93 0.78]]
[[ 0.14 0.38]
[-1.55 0.03]
[-1.08 1.53]]]
Tensor(shape=(2, 2, 2), dtype='float32', device='cpu')
[[[-1.38 -4.15]
[ 2.12 6. ]]
[[-2.74 2.15]
[-1.67 1.44]]]
Two Scheduling Machanisms¶
We only define the computation of the operator, and leave the scheduling to the rule-based scheduler provided by hidet. We call this method of scheduling as rule-based scheduling. Most hidet operators are using the same rule-based scheduler as we used in this example. Our experience shows that the rule-based scheduler can achieve good performance for operators that do not have large amount of reduction. However, for operators like matrix multiplication, convolution, etc., the rule-based scheduler may not be able to achieve the best performance as it does not use shared memory to cache the data loading. Thus, hidet also provides another scheduling mechanism, the template-based scheduling.
Summary¶
In this tutorial, we have learned how to define a new operator with given computation definition, and use hidet’s provided rule-based scheduler to automatically schedule the computation into a tensor program. In the next tutorial, we will learn how to use the template-based scheduling to achieve better performance.
Total running time of the script: (0 minutes 0.317 seconds)