hidet.option

Classes:

OptionContext()

The option context.

compile_server()

Compilation server related options.

cpu()

The CPU related options.

cuda()

The CUDA related options.

internal()

Internal options.

Functions:

bench_config([warmup, number, repeat])

Set the benchmark config of operator tuning.

cache_dir(new_dir)

Set the directory to store the cache.

cache_operator([enabled])

Whether to cache compiled operators on disk.

context()

Create a new option context.

current_context()

Get the current option context.

debug_cache_tuning([enabled])

Whether to cache the generated kernels during tuning.

debug_enable_var_id([enable])

Whether to enable var id in the IR.

debug_show_var_id([enable])

Whether to show the var id in the IR.

debug_show_verbose_flow_graph([enable])

Whether to show verbose information (like the task) when converting a flow graph into human-readable text.

debug_strict_broadcast_check([enable])

Whether to enforce equality of shapes in symbolic broadcasts.

dump_options()

Dump the options in option context stack.

execution_mode([kind])

Set the execution mode ('symbolic', 'interpreter', or 'compilation') used by the run() function of Operator.

fix_gpu_frequency_for_tuning([enabled])

Whether to fix the GPU frequency during tuning to avoid frequency throttling.

get_bench_config()

Get the benchmark config of operator tuning.

get_cache_dir()

Get the directory to store the cache.

get_cache_operator()

Get the option value of whether to cache compiled operators on disk.

get_execution_mode()

Get the execution mode ('symbolic', 'interpreter', or 'compilation') used by the run() function of Operator.

get_hexcute_matmul()

Get the strategy for enabling the hexcute matmul kernels.

get_num_local_workers()

Get the number of local worker processes to use for parallel compilation/tuning.

get_option(name)

Get the value of an option in current option context.

get_parallel_build()

Get the option value of whether to build operators in parallel.

get_parallel_k()

Get the parallelization strategy for the k dimension of matrix multiplication.

get_parallel_tune()

Get the parallel tuning configuration: the maximum number of parallel jobs and the memory reserved per job.

get_runtime_check()

Get whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

get_save_lower_ir()

Get the option value of whether to save the lower IR.

get_search_space()

Get the schedule search space of tunable operators.

hexcute_matmul([strategy])

Set the strategy for enabling the hexcute matmul kernels.

is_fix_gpu_frequency_for_tuning()

Get the option value of whether to fix the GPU frequency during tuning to avoid frequency throttling.

is_option_exist(name)

Check whether an option exists (i.e., has been registered).

is_use_torch_stream()

Check whether the torch stream is currently in use.

num_local_workers([num_workers])

Set the number of local worker processes to use for parallel compilation/tuning.

parallel_build([enabled])

Whether to build operators in parallel.

parallel_k([strategy])

Set the parallelization strategy for the k dimension of matrix multiplication. Candidates are: default, disabled, search, 2, 3, 4...

parallel_tune([max_parallel_jobs, ...])

Specify the maximum number of parallel compilation jobs and the number of GiB reserved for each job.

restore_options(dumped_options)

Restore the options from dumped options.

runtime_check([enable])

Whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

save_lower_ir([enabled])

Whether to save the lower IR.

search_space(space)

Set the schedule search space of tunable operators.

set_option(name, value)

Set the value of an option in current option context.

use_torch_stream(use)

Set whether to use the torch stream.

class hidet.option.OptionContext[source]

The option context.

Methods:

current()

Get the current option context.

get_option(name)

Get the value of an option in the self option context.

set_option(name, value)

Set the value of an option in the self option context.

static current()[source]

Get the current option context.

Returns:

ret – The current option context.

Return type:

OptionContext

get_option(name)[source]

Get the value of an option in the self option context.

Parameters:

name (str) – The name of the option.

Returns:

ret – The value of the option.

Return type:

Any

set_option(name, value)[source]

Set the value of an option in the self option context.

Parameters:
  • name (str) – The name of the option.

  • value (Any) – The value of the option.
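
For example, the current context can be queried directly (a minimal sketch; 'cache_dir' is one of the options documented on this page):

ctx = hidet.option.OptionContext.current()
print(ctx.get_option('cache_dir'))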

class hidet.option.compile_server[source]

Compilation server related options.

Methods:

addr(addr)

Set the address of the compile server.

enable([flag])

Enable or disable the compile server.

enabled()

Get whether the compile server is enabled.

get_num_workers()

Get the number of worker processes of the compile server.

num_workers(num_workers)

Set the number of worker processes of the compile server.

password(password)

Set the password to access the compile server.

port(port)

Set the port of the compile server.

repo(repo_url[, version])

Set the repository that the remote server will use.

username(username)

Set the username to access the compile server.

static addr(addr)[source]

Set the address of the compile server.

Parameters:

addr (str) – The address of the compile server. Can be an IP address or a domain name.

static enable(flag=True)[source]

Enable or disable the compile server.

The compile server is disabled by default and must be enabled before use.

Parameters:

flag (bool) – Whether to enable the compile server.

static enabled()[source]

Get whether the compile server is enabled.

Returns:

ret – Whether the compile server is enabled.

Return type:

bool

static get_num_workers()[source]

Get the number of worker processes of the compile server.

Returns:

ret – The number of worker processes of the compile server.

Return type:

int

static num_workers(num_workers)[source]

Set the number of worker processes of the compile server.

Parameters:

num_workers (int) – The number of worker processes of the compile server.

static password(password)[source]

Set the password to access the compile server.

Parameters:

password (str) – The password to access the compile server.

static port(port)[source]

Set the port of the compile server.

Parameters:

port (int) – The port of the compile server.

static repo(repo_url, version='main')[source]

Set the repository that the remote server will use.

When we compile a tensor program with the remote server, it clones the given repository and checks out the given version. It then uses the code in that repository to compile the tensor program. Thus, it is important to make sure the code in the repository is consistent with the hidet code used locally.

Parameters:
  • repo_url (str) – The URL of the repository that the remote server will use. By default, it is the official hidet repository: hidet-org/hidet.

  • version (str) – The version (e.g., branch, commit, or tag) that the remote server will use. By default, it is the main branch: ‘main’.

static username(username)[source]

Set the username to access the compile server.

Parameters:

username (str) – The username to access the compile server.
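
Putting these together, a typical client-side setup might look like the following sketch (the address, port, credentials, and worker count below are placeholders, not defaults):

hidet.option.compile_server.addr('compile.example.com')  # placeholder address
hidet.option.compile_server.port(3281)                   # placeholder port
hidet.option.compile_server.username('alice')            # placeholder credentials
hidet.option.compile_server.password('...')
hidet.option.compile_server.num_workers(8)
hidet.option.compile_server.enable()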

class hidet.option.cpu[source]

The CPU related options.

Methods:

arch([arch])

Set the CPU architecture to use when building CPU kernels.

get_arch()

Get the CPU architecture to use when building CPU kernels.

static arch(arch='auto')[source]

Set the CPU architecture to use when building CPU kernels.

Parameters:

arch (Optional[str]) – The CPU architecture, e.g., ‘x86-64’, ‘alderlake’, etc. “auto” means using the architecture of the CPU on the current machine. Default “auto”.

static get_arch()[source]

Get the CPU architecture to use when building CPU kernels.

Returns:

ret – The CPU architecture, e.g., ‘x86-64’, ‘alderlake’, etc.

Return type:

str
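
For example (the architecture name is illustrative):

hidet.option.cpu.arch('x86-64')     # build CPU kernels for generic x86-64
print(hidet.option.cpu.get_arch())  # 'x86-64'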

class hidet.option.cuda[source]

The CUDA related options.

Methods:

arch([arch])

Set the CUDA architecture to use when building CUDA kernels.

get_arch()

Get the CUDA architecture to use when building CUDA kernels.

get_arch_pair()

Get the CUDA architecture to use when building CUDA kernels, with major and minor version as a tuple.

static arch(arch='auto')[source]

Set the CUDA architecture to use when building CUDA kernels.

Parameters:

arch (Optional[str]) – The CUDA architecture, e.g., ‘sm_35’, ‘sm_70’, ‘sm_80’, etc. “auto” means using the architecture of the first CUDA GPU on the current machine. Default “auto”.

static get_arch()[source]

Get the CUDA architecture to use when building CUDA kernels.

Returns:

ret – The CUDA architecture, e.g., ‘sm_35’, ‘sm_70’, ‘sm_80’, etc.

Return type:

str

static get_arch_pair()[source]

Get the CUDA architecture to use when building CUDA kernels, with major and minor version as a tuple.

Returns:

ret – The CUDA architecture, e.g., (3, 5), (7, 0), (8, 0), etc.

Return type:

Tuple[int, int]
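
For example (targeting compute capability 8.0 here is illustrative):

hidet.option.cuda.arch('sm_80')
print(hidet.option.cuda.get_arch_pair())  # (8, 0)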

class hidet.option.internal[source]

Internal options.

Methods:

is_torch_api_use_example_input_shapes()

Get whether to use example_inputs shapes instead of fx.graph shapes.

torch_api_use_example_input_shapes([enable])

Applicable when using torch.compile only.

static is_torch_api_use_example_input_shapes()[source]

Get whether to use example_inputs shapes instead of fx.graph shapes.

Returns:

ret – Whether to use example_inputs shapes instead of fx.graph shapes.

Return type:

bool

static torch_api_use_example_input_shapes(enable=False)[source]

Applicable when using torch.compile only. Use example_inputs shapes instead of fx.graph shapes.

Parameters:

enable (bool) – Whether to use example_inputs shapes instead of fx.graph shapes (applicable only when using torch.compile).

hidet.option.bench_config(warmup=1, number=5, repeat=5)[source]

Set the benchmark config of operator tuning.

To profile a schedule, hidet will run the following code:

for i in range(warmup):
    run()
latency = []
for i in range(repeat):
    # synchronize the device
    t1 = time()
    for j in range(number):
        run()
    # synchronize the device
    t2 = time()
    latency.append((t2 - t1) / number)
return median(latency)

Thus, there are warmup + number * repeat executions in total.

Parameters:
  • warmup (int) – The number of warmup runs.

  • number (int) – The number of runs in a repeat.

  • repeat (int) – The number of repeats.
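
For example, the following sketch trades longer profiling for more stable measurements (the values are illustrative):

hidet.option.bench_config(warmup=3, number=10, repeat=10)
print(hidet.option.get_bench_config())  # (3, 10, 10)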

hidet.option.cache_dir(new_dir)[source]

Set the directory to store the cache.

The default cache directory:

  • If the hidet code is in a git repo, the cache will be stored in the repo root: hidet-repo/.hidet_cache.

  • Otherwise, the cache will be stored in the user home directory: ~/.hidet/cache.

Parameters:

new_dir (str) – The new directory to store the cache.
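
For example (the path is a placeholder):

hidet.option.cache_dir('/tmp/hidet_cache')
print(hidet.option.get_cache_dir())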

hidet.option.cache_operator(enabled=True)[source]

Whether to cache compiled operators on disk.

By default, hidet caches all compiled operators and reuses them whenever possible.

To disable the cache, run:

hidet.option.cache_operator(False)
Parameters:

enabled (bool) – Whether to cache compiled operators.

hidet.option.context()[source]

Create a new option context.

To set options in the new context, use the with statement:

with hidet.option.context() as ctx:
    hidet.option.cache_dir('./new_cache_dir')               # set predefined option
    hidet.option.set_option('other_option', 'other_value')  # set a custom option
    ...
Returns:

ctx – The new option context.

Return type:

OptionContext

hidet.option.current_context()[source]

Get the current option context.

To get the value of an option in the current context:

ctx = hidet.option.current_context()
cache_dir: str = ctx.get_option('cache_dir')
cache_operator: bool = ctx.get_option('cache_operator')
...
Returns:

ctx – The current option context.

Return type:

OptionContext

hidet.option.debug_cache_tuning(enabled=True)[source]

Whether to cache the generated kernels during tuning.

Note

This option is only used for debugging purposes. It will generate a lot of files in the cache directory and take a lot of disk space.

Parameters:

enabled (bool) – Whether to debug cache tuning.

hidet.option.debug_enable_var_id(enable=True)[source]

Whether to enable var id in the IR.

When this option is enabled, each variable (i.e., hidet.ir.Var) will have a unique id. Otherwise, each variable’s ID will be 0.

Parameters:

enable (bool) – Whether to enable var id in the IR.

hidet.option.debug_show_var_id(enable=True)[source]

Whether to show the var id in the IR.

When this option is enabled, the IR will show the var id in the format var@id, like x@1 and d_1@1732. Two variables (i.e., hidet.ir.Var) a and b are the same variable if and only if a is b evaluates to True in Python.

Parameters:

enable (bool) – Whether to show the var id in the IR.

hidet.option.debug_show_verbose_flow_graph(enable=True)[source]

Whether to show verbose information (like the task) when converting a flow graph into human-readable text.

Parameters:

enable (bool) – Whether to show verbose information when converting a flow graph into human-readable text.

hidet.option.debug_strict_broadcast_check(enable=False)[source]

Whether to enforce equality of shapes in symbolic broadcasts.

If set to True, the symbolic equivalence checker is used to prove correctness of broadcasts, so broadcasting shapes [n] to [m] will raise ValueError. If set to False, broadcasting between shapes [n] and [m] will proceed assuming n == m.

Parameters:

enable (bool) – Whether to enforce equality of shapes in symbolic broadcasts.

hidet.option.dump_options()[source]

Dump the options in option context stack.

Returns:

ret – The dumped options, as a dict.

Return type:

Dict[str, Any]

hidet.option.execution_mode(kind='compilation')[source]

Set the execution mode ('symbolic', 'interpreter', or 'compilation') used by the run() function of Operator.

Parameters:

kind (str) – The execution mode: 'symbolic', 'interpreter', or 'compilation'.
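
For example, to run operators through the interpreter instead of compiled kernels:

hidet.option.execution_mode('interpreter')
assert hidet.option.get_execution_mode() == 'interpreter'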

hidet.option.fix_gpu_frequency_for_tuning(enabled=False)[source]

Whether to fix the GPU frequency during tuning to avoid frequency throttling.

Parameters:

enabled (bool) – Whether to fix the GPU frequency during tuning to avoid frequency throttling.

hidet.option.get_bench_config()[source]

Get the benchmark config of operator tuning.

Returns:

ret – The benchmark config.

Return type:

Tuple[int, int, int]

hidet.option.get_cache_dir()[source]

Get the directory to store the cache.

Returns:

ret – The directory to store the cache.

Return type:

str

hidet.option.get_cache_operator()[source]

Get the option value of whether to cache compiled operators on disk.

Returns:

ret – Whether to cache compiled operators.

Return type:

bool

hidet.option.get_execution_mode()[source]

Get the execution mode ('symbolic', 'interpreter', or 'compilation') used by the run() function of Operator.

Returns:

ret – The execution mode in use: 'symbolic', 'interpreter', or 'compilation'.

Return type:

str

hidet.option.get_hexcute_matmul()[source]

Get the strategy for enabling the hexcute matmul kernels.

Returns:

ret – The strategy: 'enable', 'disable', or 'auto'.

Return type:

str

hidet.option.get_num_local_workers()[source]

Get the number of local worker processes to use for parallel compilation/tuning.

Returns:

ret – The number of local worker processes.

Return type:

int

hidet.option.get_option(name)[source]

Get the value of an option in current option context.

Parameters:

name (str) – The name of the option.

Returns:

ret – The value of the option.

Return type:

Any

hidet.option.get_parallel_build()[source]

Get the option value of whether to build operators in parallel.

Returns:

ret – Whether to build operators in parallel.

Return type:

bool

hidet.option.get_parallel_k()[source]

Get the parallelization strategy for the k dimension of matrix multiplication.

Returns:

ret – The parallelization strategy.

Return type:

Union[str, int]

hidet.option.get_parallel_tune()[source]

Get the parallel tuning configuration.

Returns:

ret – The maximum number of parallel jobs and the minimum amount of memory (in GiB) reserved for each tuning job.

Return type:

Tuple[int, float]

hidet.option.get_runtime_check()[source]

Get whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

Returns:

ret – Whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

Return type:

bool

hidet.option.get_save_lower_ir()[source]

Get the option value of whether to save the lower IR.

Returns:

ret – Whether to save the lower IR.

Return type:

bool

hidet.option.get_search_space()[source]

Get the schedule search space of tunable operators.

Returns:

ret – The schedule space level.

Return type:

int

hidet.option.hexcute_matmul(strategy='enable')[source]

Set the strategy for enabling the hexcute matmul kernels. Candidates are: enable, disable, and auto.

  • enable:

    Always enable the hexcute matmul kernels on all the GPU platforms.

  • disable:

    Always disable the hexcute matmul kernels on all the GPU platforms.

  • auto:

    Use a heuristic strategy to decide whether to enable the hexcute matmul kernels. The decision is based on the metrics of the current GPU platform, such as the memory bandwidth, the compute throughput, etc.

Parameters:

strategy (str) – The strategy to enable the hexcute matmul kernels.
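
For example, to let hidet decide heuristically per GPU platform:

hidet.option.hexcute_matmul('auto')
print(hidet.option.get_hexcute_matmul())  # 'auto'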

hidet.option.is_fix_gpu_frequency_for_tuning()[source]

Get the option value of whether to fix the GPU frequency during tuning to avoid frequency throttling.

Returns:

ret – Whether to fix the GPU frequency during tuning to avoid frequency throttling.

Return type:

bool

hidet.option.is_option_exist(name)[source]

Check whether an option exists (i.e., has been registered).

Parameters:

name (str) – Name of the option.

Returns:

ret – True if option exists/registered, False otherwise.

Return type:

bool

hidet.option.is_use_torch_stream()[source]

Check whether the torch stream is currently in use.

Returns:

ret – True if using torch stream, False otherwise.

Return type:

bool

hidet.option.num_local_workers(num_workers=None)[source]

Set the number of local worker processes to use for parallel compilation/tuning.

Parameters:

num_workers (Optional[int]) – The number of local worker processes. If None, use os.cpu_count().

hidet.option.parallel_build(enabled=True)[source]

Whether to build operators in parallel.

Parameters:

enabled (bool) – Whether to build operators in parallel.

hidet.option.parallel_k(strategy='default')[source]

Set the parallelization strategy for the k dimension of matrix multiplication. Candidates are: default, disabled, search, 2, 3, 4…

  • default:

Default parallelization strategy. A heuristic is used to decide whether to parallelize on the k dimension and to choose the split factor.

  • disabled:

Disable parallelization on the k dimension.

  • search:

    Search for the best parallelization strategy. Takes more time but usually achieves the best performance.

Parameters:

strategy (str) – The parallelization strategy.
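
For example, to search for the best split factor at the cost of longer tuning:

hidet.option.parallel_k('search')
print(hidet.option.get_parallel_k())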

hidet.option.parallel_tune(max_parallel_jobs=-1, mem_gb_per_job=1.5, max_candidates_per_job=32)[source]

Specify the maximum number of parallel compilation jobs and the number of GiB reserved for each job.

Parameters:
  • max_parallel_jobs (int) – The maximum number of parallel jobs allowed, default -1 (the number of available vCPUs returned by os.cpu_count()).

  • mem_gb_per_job (float) – The minimum amount of memory (in GiB) reserved for each tuning job, default 1.5GiB.

  • max_candidates_per_job (int) – The maximum number of tuning candidates assigned to each job, default 32.
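
For example, the following sketch caps tuning at 8 parallel jobs with 2 GiB reserved per job (the values are illustrative):

hidet.option.parallel_tune(max_parallel_jobs=8, mem_gb_per_job=2.0)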

hidet.option.restore_options(dumped_options)[source]

Restore the options from dumped options.

Parameters:

dumped_options (Dict[str, Any]) – The dumped options.
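
dump_options() and restore_options() can be paired to snapshot and later re-apply the option stack, e.g., when forwarding options to a worker process:

snapshot = hidet.option.dump_options()
# ... options may be changed elsewhere ...
hidet.option.restore_options(snapshot)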

hidet.option.runtime_check(enable=True)[source]

Whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

Parameters:

enable (bool) – Whether to check shapes and dtypes of all input arguments to compiled Graphs or Tasks.

hidet.option.save_lower_ir(enabled=True)[source]

Whether to save the lower IR.

Parameters:

enabled (bool) – Whether to save the lower IR.

hidet.option.search_space(space)[source]

Set the schedule search space of tunable operators.

Some operators in hidet, such as matrix multiplication, can be tuned to achieve the best performance.

During tuning, different operator schedules are tried and profiled to find the best one.

We call the space of tried operator schedules the schedule space. There is a trade-off between tuning time and operator execution time: trying more schedules makes tuning take longer, but is more likely to find a better schedule.

This function allows the user to set the space level that controls the size of the search space.

By convention, we have space level

  • 0 for a schedule space that contains only a single schedule.

  • 1 for a schedule space that contains tens of schedules, so that tuning takes less than a minute.

  • 2 for an arbitrarily large space.

Usage

hidet.option.search_space(2)

After calling the above function, all subsequent compilations use space level 2, until this function is called again with another space level.

Parameters:

space (int) – The space level to use. Candidates: 0, 1, and 2.

hidet.option.set_option(name, value)[source]

Set the value of an option in current option context.

The option must be registered before setting via hidet.option.register_option().

Parameters:
  • name (str) – The name of the option.

  • value (Any) – The value of the option.
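
For example (assuming an option named 'my_option' has been registered beforehand via hidet.option.register_option(); the name is hypothetical):

hidet.option.set_option('my_option', 42)     # 'my_option' is a hypothetical registered option
print(hidet.option.get_option('my_option'))  # 42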

hidet.option.use_torch_stream(use)[source]

Set whether to use the torch stream.

Parameters:

use (bool) – Whether to use the torch stream.
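
For example:

hidet.option.use_torch_stream(True)
assert hidet.option.is_use_torch_stream()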