Ir.Low_level

module Scope_id : sig ... end

val sexp_of_scope_id : scope_id -> Sexplib0.Sexp.t

val hash_fold_scope_id :
  Ppx_hash_lib.Std.Hash.state ->
  scope_id ->
  Ppx_hash_lib.Std.Hash.state

val hash_scope_id : scope_id -> Ppx_hash_lib.Std.Hash.hash_value

type t =
  | Noop
  | Comment of Base.string
  | Staged_compilation of Base.unit -> PPrint.document
  | Seq of t * t
  | For_loop of {
      index : Indexing.symbol;
      from_ : Base.int;
      to_ : Base.int;
      body : t;
      trace_it : Base.bool;
    }
  | Zero_out of Tnode.t
  | Set of {
      tn : Tnode.t;
      idcs : Indexing.axis_index Base.array;
      llsc : scalar_t;
      mutable debug : Base.string;
    }
  | Set_from_vec of {
      tn : Tnode.t;
      idcs : Indexing.axis_index Base.array;
      length : Base.int;
      vec_unop : Ops.vec_unop;
      arg : scalar_arg;
      mutable debug : Base.string;
    }
  | Set_local of scope_id * scalar_t
  | Declare_local of { id : scope_id; needs_init : Base.bool }

Cases: t -- code, scalar_t -- single number at some precision.
and scalar_t =
  | Local_scope of {
      id : scope_id;
      body : t;
      orig_indices : Indexing.axis_index Base.array;
    }
  | Get_local of scope_id
  | Get of Tnode.t * Indexing.axis_index Base.array
  | Get_dynamic of {
      tn : Tnode.t;
          (* The gathered table; treated as a read of tn, like Get. *)
      idcs : Indexing.axis_index Base.array;
          (* Static everywhere except dyn_axis. *)
      dyn_axis : Base.int;
          (* Which idcs slot is replaced by dyn_value at codegen time. *)
      dyn_value : scalar_arg;
          (* Integer-valued index spliced into the row-major offset at dyn_axis. gh-343: produced only by rewrite_one_hot_reductions; never escapes low-level / backend codegen. *)
    }
  | Get_merge_buffer of Tnode.t * Indexing.axis_index Base.array
  | Ternop of Ops.ternop * scalar_arg * scalar_arg * scalar_arg
  | Binop of Ops.binop * scalar_arg * scalar_arg
  | Unop of Ops.unop * scalar_arg
  | Constant of Base.float
  | Constant_bits of Base.int64
      (* Direct bit representation, primarily for uint4x32. *)
  | Embed_index of Indexing.axis_index

The argument precision is preserved in heterogeneous precision operation arguments, and is ignored (overridden) in homogeneous precision operations.
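The Get_dynamic fields describe a gather whose index tuple is static except for one slot. The following is a minimal sketch of that offset computation, not the library's code generator: it assumes plain integer indices instead of Indexing.axis_index, and the helper names row_major_offset and dynamic_offset are hypothetical.

```ocaml
(* Hypothetical helper: row-major offset of [idcs] within a tensor of shape [dims]. *)
let row_major_offset ~dims ~idcs =
  let offset = ref 0 in
  Array.iteri (fun axis d -> offset := (!offset * d) + idcs.(axis)) dims;
  !offset

(* Hypothetical helper: replace the slot at [dyn_axis] by the run-time value
   [dyn_value], then compute the row-major offset as usual. *)
let dynamic_offset ~dims ~idcs ~dyn_axis ~dyn_value =
  let idcs = Array.copy idcs in
  idcs.(dyn_axis) <- dyn_value;
  row_major_offset ~dims ~idcs
```

For a table of shape 5 x 3, reading column 2 of the row selected at run time amounts to `dynamic_offset ~dims:[| 5; 3 |] ~idcs:[| 0; 2 |] ~dyn_axis:0 ~dyn_value:row`.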
val sexp_of_t : t -> Sexplib0.Sexp.t

val sexp_of_scalar_t : scalar_t -> Sexplib0.Sexp.t

val sexp_of_scalar_arg : scalar_arg -> Sexplib0.Sexp.t

val equal_scalar_arg : scalar_arg -> scalar_arg -> Base.bool

val compare_scalar_arg : scalar_arg -> scalar_arg -> Base.int

val loop_over_dims :
  Base.int Base.array ->
  body:(Indexing.axis_index Base.array -> t) ->
  t

val unroll_dims :
  Base.int Base.array ->
  body:(Indexing.axis_index Base.array -> offset:Base.int -> t) ->
  t

val loop_over_padding_region :
  dims:Base.int Base.array ->
  padding:Ops.axis_padding Base.array ->
  body:(Indexing.axis_index Base.array -> t) ->
  t

Generate loops that iterate only over the padding margins of a tensor. For dimensions with padding, generates separate loops for left margin, middle (recursing), and right margin. The middle region continues recursing to find padding in other dimensions.
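The left / middle / right recursion can be illustrated by enumerating the visited index tuples instead of generating loops. This is a sketch under stated assumptions: padding is modelled as hypothetical (left, right) integer pairs rather than Ops.axis_padding, dims are taken to be the padded sizes, and the function names are made up for the illustration.

```ocaml
(* Integers lo, lo+1, ..., hi-1 (empty when hi <= lo). *)
let range lo hi = List.init (max 0 (hi - lo)) (fun i -> lo + i)

(* Every index tuple of a tensor with the given dimensions. *)
let rec all_indices = function
  | [] -> [ [] ]
  | d :: rest ->
      let tails = all_indices rest in
      List.concat_map (fun i -> List.map (fun t -> i :: t) tails) (range 0 d)

(* Index tuples that lie in the padding margins only. *)
let rec padding_indices dims padding =
  match (dims, padding) with
  | [], [] -> []
  | d :: dims', (left, right) :: padding' ->
      (* Inside a margin of this axis, all remaining axes are covered fully. *)
      let full = all_indices dims' in
      let margin lo hi =
        List.concat_map (fun i -> List.map (fun t -> i :: t) full) (range lo hi)
      in
      (* In the middle of this axis, only padding of later axes counts. *)
      let inner = padding_indices dims' padding' in
      let middle =
        List.concat_map
          (fun i -> List.map (fun t -> i :: t) inner)
          (range left (d - right))
      in
      margin 0 left @ middle @ margin (d - right) d
  | _ -> invalid_arg "padding_indices: rank mismatch"
```

Each padding cell is produced exactly once, because a tuple is emitted at the first axis on which it falls into a margin.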
val virtualize_settings : virtualize_settings

val sexp_of_visits : visits -> Sexplib0.Sexp.t

val visits_of_sexp : Sexplib0.Sexp.t -> visits

val visits : Base.int -> visits

val recurrent : visits

val is_visits : visits -> Base.bool

val is_recurrent : visits -> Base.bool

val visits_val : visits -> Base.int Base.option

val recurrent_val : visits -> Base.unit Base.option

module Variants_of_visits : sig ... end

type traced_array = {
  tn : Tnode.t;
  assignments : Base.int Base.array Base.Hash_set.t;
  accesses : (Base.int Base.array, visits) Base.Hashtbl.t;
  mutable zero_initialized_by_code : Base.bool;
  mutable zeroed_out : Base.bool;
  mutable read_before_write : Base.bool;
      (* The node is read before it is written (i.e. it is recurrent). *)
  mutable read_only : Base.bool;
      (* Surprisingly, the notions of read-only and of constant memory mode come apart: small hosted constants are not read-only because they are initialized on devices by being assigned to; and a volatile memory mode is read-only from the devices' perspective. *)
  mutable is_scalar_constexpr : Base.bool;
      (* True only if the tensor node has all axes of dimension 1, is either zeroed-out or assigned before accessed, is assigned at most once, and is assigned from an expression involving only constants or tensor nodes that were is_scalar_constexpr at the time. *)
  mutable is_accessing : Base.bool;
      (* False only if the tensor node is built from index embeddings and scalar constant expressions. *)
  mutable is_complex : Base.bool;
      (* True only if the tensor node is built from a genuinely complex scalar computation (one that accesses other non-constexpr computations). Sharing a loop symbol with another tensor does not, by itself, make a node complex (see #134). *)
  mutable prefers_virtual_one_hot : Base.bool;
      (* True when at least one setter for this tensor is a one-hot selector assignment, i.e. a Cmpeq between the embedded range iterator and a loop-variable-free expression. When has_non_one_hot_setter is false, this tensor is exempt from the visit-count Never_virtual rule (task-73617488). *)
  mutable has_non_one_hot_setter : Base.bool;
      (* True when at least one setter is NOT a one-hot selector (including Set_from_vec). A tensor with prefers_virtual_one_hot && not has_non_one_hot_setter is the candidate for the one-hot virtualizer exemption. *)
  mutable is_range_producer : Base.bool;
      (* True when at least one Set assigns this tensor from a bare Embed_index scalar, i.e. the tensor is a Range_over_offsets producer. Used by the indirect arm of is_one_hot_selector_assignment to prove that a Get(rtn, [k]) will inline to Embed_index k rather than arbitrary values (task-73617488). *)
}

val sexp_of_traced_array : traced_array -> Sexplib0.Sexp.t

val get_node :
  (Tnode.t, traced_array) Base.Hashtbl.t ->
  Tnode.t ->
  traced_array

type traced_store = (Tnode.t, traced_array) Base.Hashtbl.t

val sexp_of_traced_store : traced_store -> Sexplib0.Sexp.t

type optimize_ctx = {
  computations :
    (Tnode.t, (Indexing.axis_index Base.array Base.option * t) Base.list)
    Base.Hashtbl.t;
      (* The computations (of the tensor node) are retrieved for optimization just as they are populated, so that the inlined code corresponds precisely to the changes to the arrays that would happen up till that point. Within the code blocks paired with an index tuple, all assignments and accesses must happen via the index tuple; if this is not the case for some assignment, the node cannot be virtual. Currently, we only allow for-loop symbols in assignment indices of virtual nodes. *)
}

val sexp_of_optimize_ctx : optimize_ctx -> Sexplib0.Sexp.t

type optimized = {
  traced_store : traced_store;
  optimize_ctx : optimize_ctx;
  llc : t;
  merge_node : Tnode.t Base.option;
}

val sexp_of_optimized : optimized -> Sexplib0.Sexp.t

val optimize :
  optimize_ctx ->
  unoptim_ll_source:(PPrint.document -> Base.unit) Base.option ->
  ll_source:(PPrint.document -> Base.unit) Base.option ->
  name:Base.string ->
  Indexing.static_symbol Base.list ->
  t ->
  optimized

reads_scope_before_set id body returns true if id is read (via Get_local) before the first definitely-executed Set_local id in body. Use this at code-generation time to decide whether a Local_scope or Declare_local declaration needs a zero initializer.
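The read-before-set check can be sketched on a much smaller IR. Everything below is a hypothetical stand-in rather than the library's implementation: the types expr and stmt, the Store constructor (any statement that only reads), integer scope ids, and the assumption that loop bounds are static and inclusive, so a loop with from_ > to_ never runs and its Set_local is not definitely executed.

```ocaml
type expr = Const of float | Get_local of int | Add of expr * expr

type stmt =
  | Noop
  | Seq of stmt * stmt
  | Set_local of int * expr
  | Store of expr (* stand-in for any statement that only reads *)
  | For_loop of { from_ : int; to_ : int; body : stmt }

(* Does the expression read the local [id]? *)
let rec reads id = function
  | Const _ -> false
  | Get_local i -> i = id
  | Add (a, b) -> reads id a || reads id b

type outcome = Read_first | Set_first | Neither

(* What happens first to [id] when the statement executes? *)
let rec scan id = function
  | Noop -> Neither
  | Store e -> if reads id e then Read_first else Neither
  | Set_local (i, e) ->
      (* The right-hand side is evaluated before the assignment takes effect. *)
      if reads id e then Read_first else if i = id then Set_first else Neither
  | Seq (a, b) -> ( match scan id a with Neither -> scan id b | r -> r)
  | For_loop { from_; to_; body } ->
      (* An empty loop executes nothing, so it neither reads nor sets. *)
      if from_ > to_ then Neither else scan id body

let reads_scope_before_set id body = scan id body = Read_first
```

An accumulator update such as `Set_local (1, Add (Get_local 1, Const 1.))` reads before it sets, so under this sketch its declaration would need a zero initializer.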
gh-343: rewrites the narrow one-hot embedding pattern into a guarded dynamic gather (Get_dynamic). The pattern is an Add reduction over a loop variable k whose body selects an embedding-table row via k == index_expr (a logical one-hot). The gather reads the table row at index_expr directly; an in-range guard returns 0 when index_expr falls outside [0, vocab_size), which preserves the one-hot semantics. Unmatched or unsupported reductions are left unchanged. Called internally by optimize between simplify_llc and eliminate_common_subexpressions; exposed for testing.
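Why the guard preserves the one-hot semantics can be checked on a one-dimensional table of scalars. This is only an illustration of the equivalence, assuming a float array in place of an embedding table with rows; one_hot_sum and guarded_gather are hypothetical names, not functions of this module.

```ocaml
(* One-hot form: sum over k of (if k = index then table.(k) else 0). *)
let one_hot_sum table index =
  let acc = ref 0.0 in
  Array.iteri (fun k v -> if k = index then acc := !acc +. v) table;
  !acc

(* Gather form: read the entry directly; out-of-range indices yield 0,
   just as a one-hot selector that matches no k contributes nothing. *)
let guarded_gather table index =
  if index >= 0 && index < Array.length table then table.(index) else 0.0
```

The two functions agree for every index, including out-of-range ones, while the gather form does constant work instead of scanning the whole table.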
Eliminates common subexpressions within each statement's scalar expression tree. Replaces duplicate Local_scope nodes (structurally identical modulo scope_id) with Get_local references to the first occurrence. Called internally by optimize; exposed for testing.
Hoists shared Local_scope computations from sibling statements to the enclosing scope. When two or more sibling statements share an alpha-equivalent Local_scope node, the computation is extracted as a Declare_local + body preceding the first user, and all occurrences are replaced with Get_local.
val input_and_output_nodes :
  optimized ->
  (Base.Set.M(Ir.Tnode).t * Base.Set.M(Ir.Tnode).t) * Tnode.t Base.option

Inputs are the materialized read-only and read-before-write (within the code) non-constant non-merge nodes. They are inputs in a broad sense, as they could be recurrent nodes or parameters. Outputs are all the materialized nodes written-to by the code. The last returned component is the input merge node, if used in the code.
val function_header_doc :
  ?name:Base.string ->
  ?static_indices:Indexing.static_symbol Base.list ->
  Base.unit ->
  PPrint.document

val to_doc_cstyle :
  ?name:Base.string ->
  ?static_indices:Indexing.static_symbol Base.list ->
  Base.unit ->
  t ->
  PPrint.document

Adheres more to the C syntax, outputs implicit type casts.

val to_doc :
  ?name:Base.string ->
  ?static_indices:Indexing.static_symbol Base.list ->
  Base.unit ->
  t ->
  PPrint.document

Adheres to the %cd syntax.