Ocannl.Nn_blocks
This file contains basic building blocks for neural networks, with limited functionality. Feel free to copy-paste and modify as needed.
Design principles, OCANNL fundamentals, and common patterns:
module Tn = Ocannl_tensor.Operation.DSL_modules.Ir.Tnode
val box_muller :
Ocannl_tensor.Tensor.grad_spec ->
(unit -> Ocannl_tensor.Tensor.t) ->
unit ->
Ocannl_tensor.Tensor.op_fun
val kaiming_impl :
?scale_sq:Base.float ->
Ocannl_tensor.Tensor.grad_spec ->
(unit -> Ocannl_tensor.Tensor.t) ->
unit ->
Ocannl_tensor.Tensor.op_fun
val xavier_impl :
?scale_sq:Base.float ->
Ocannl_tensor.Tensor.grad_spec ->
(unit -> Ocannl_tensor.Tensor.t) ->
unit ->
Ocannl_tensor.Tensor.op_fun
module DSL_modules : sig ... end
val class_ids_of_int_list :
?label:Base.string ->
int list ->
Ocannl_tensor.Tensor.t
Convert a list of integers to a compact tensor of class IDs (no num_classes allocation).
val one_hot_of_ids :
num_classes:Base__Int.t ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Build a logical one-hot tensor from a tensor of class IDs using only existing operations (range and equality), so the compiler keeps the proof that the result is one-hot (enabling the gh-343 embedding gather optimization). No dense len * num_classes data is materialized on the host. With ids shaped as a batch of size len (output rank 0), the result has shape [len; num_classes], with one_hot[i, k] = (k == ids[i]).
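To make the definition one_hot[i, k] = (k == ids[i]) concrete, here is a plain OCaml reference sketch. It is not the OCANNL API: logical_one_hot, dense_one_hot, embed_via_matmul and embed_via_gather are hypothetical helpers. The logical one-hot is a function of (i, k) computed from the compact IDs rather than stored data. Multiplying it by an embedding table selects the same rows as a direct gather, which is presumably the equivalence an embedding gather optimization exploits.

```ocaml
(* Reference semantics only (plain OCaml, not OCANNL): the one-hot value at
   (i, k) is computed on demand from the compact ids, so nothing of size
   len * num_classes is stored. *)
let logical_one_hot ~num_classes (ids : int array) : int -> int -> float =
 fun i k ->
  if k < 0 || k >= num_classes then
    invalid_arg "logical_one_hot: class index out of range"
  else if k = ids.(i) then 1.0
  else 0.0

(* The dense alternative: len * num_classes floats actually allocated. *)
let dense_one_hot ~num_classes (ids : int array) : float array array =
  Array.map
    (fun id -> Array.init num_classes (fun k -> if k = id then 1.0 else 0.0))
    ids

(* one_hot (len x num_classes) times table (num_classes x d), done naively. *)
let embed_via_matmul ~num_classes (ids : int array) (table : float array array)
    : float array array =
  let oh = logical_one_hot ~num_classes ids in
  let d = Array.length table.(0) in
  Array.mapi
    (fun i _ ->
      Array.init d (fun j ->
          let acc = ref 0.0 in
          for k = 0 to num_classes - 1 do
            acc := !acc +. (oh i k *. table.(k).(j))
          done;
          !acc))
    ids

(* The same result as a row lookup: what a gather computes directly. *)
let embed_via_gather (ids : int array) (table : float array array) :
    float array array =
  Array.map (fun id -> Array.copy table.(id)) ids
```

The matmul form costs O(len * num_classes * d) work, while the gather costs O(len * d); the two agree only because each row of the one-hot has exactly one 1.0.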
val one_hot_of_int_list :
num_classes:Base__Int.t ->
int list ->
Ocannl_tensor.Tensor.t
Convert a list of integers to a logical one-hot encoded tensor of shape [len; num_classes]. This composes class_ids_of_int_list and one_hot_of_ids: it stores only len compact IDs on the host and expresses the one-hot logically, rather than allocating a dense len * num_classes Bigarray. See dense_one_hot_of_int_list if a materialized host one-hot is genuinely required.
val dense_one_hot_of_int_list :
num_classes:Base__Int.t ->
int Base.List.t ->
Ocannl_tensor.Tensor.t
Convert a list of integers to a dense, host-materialized one-hot Bigarray-backed tensor of shape [len; num_classes]. Prefer one_hot_of_int_list (logical) unless a dense host fixture is needed; a materialized Bigarray carries no proof that it is one-hot, so it cannot be optimized into an embedding gather.
val mlp_layer :
label:Base.string Base.list ->
hid_dim:Base.int ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val dropout :
rate:Base.Float.t ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
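Ocannl_tensor.Tensor.t
During training, randomly masks elements at the given rate and scales the surviving elements by 1/keep_prob (keep_prob = 1 - rate) to preserve the expected value. When train_step = None, the dropout rate is ignored and the tensor is returned unmodified.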
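The masking-and-rescaling rule can be sketched on a plain float array. This is a reference sketch, not OCANNL's implementation: it uses OCaml's Random module in place of OCANNL's random number generation and assumes keep_prob = 1 - rate. The train_step argument mirrors the documented behavior: None means the input is returned unchanged.

```ocaml
(* Inverted dropout on a float array (plain OCaml sketch, not the OCANNL API).
   Each element survives with probability keep_prob = 1 - rate and is then
   divided by keep_prob, so E[output] = input. *)
let dropout ~rate ~(train_step : 'a option) (x : float array) : float array =
  match train_step with
  | None -> x (* inference: rate is ignored, input returned as-is *)
  | Some _ ->
      if rate < 0.0 || rate >= 1.0 then
        invalid_arg "dropout: rate must be in [0, 1)";
      let keep_prob = 1.0 -. rate in
      Array.map
        (fun v -> if Random.float 1.0 < keep_prob then v /. keep_prob else 0.0)
        x
```

Scaling at training time (rather than multiplying by keep_prob at inference) is what lets the inference path be a no-op.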
val mlp :
label:Base.string Base.list ->
hid_dims:Base.int Base.List.t ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Multi-layer perceptron of depth List.length hid_dims + 1, with a linear output layer.
val softmax :
spec:Base.String.t ->
?temperature:Base.Float.t ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Softmax across specified axes. Does not support non-default row variables.
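For readers unfamiliar with the optional temperature, here is a single-axis, numerically stable softmax in plain OCaml. It is a sketch, not OCANNL's spec-based version, and it assumes the usual convention that logits are divided by the temperature before exponentiation (temperature > 1 flattens the distribution, < 1 sharpens it).

```ocaml
(* Stable softmax over one axis: subtract the max before exp to avoid
   overflow; the result is unchanged because softmax is shift-invariant.
   Assumption: temperature divides the logits, the common convention. *)
let softmax ?(temperature = 1.0) (logits : float array) : float array =
  let scaled = Array.map (fun v -> v /. temperature) logits in
  let m = Array.fold_left max neg_infinity scaled in
  let exps = Array.map (fun v -> exp (v -. m)) scaled in
  let total = Array.fold_left ( +. ) 0.0 exps in
  Array.map (fun e -> e /. total) exps
```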
type position_embedding =
  | Learned_additive  (** Current default: learned parameter added to input embeddings. *)
  | Sinusoidal_additive of {
      enc_encoding : DSL_modules.Tensor.t;
      dec_encoding : DSL_modules.Tensor.t;
    }
      (** Fixed sinusoidal encoding added to input embeddings. Use separate tensors for encoder and decoder when d_enc <> d_dec. For equal widths, the same tensor can be passed for both. Build with sinusoidal_position_encoding. *)
  | RoPE of {
      freqs : DSL_modules.Tensor.t;
      positions : DSL_modules.Tensor.t;
    }
      (** Rotary embeddings applied to Q/K inside self-attention. No additive component. *)
  | No_pos_embed  (** No position information. *)
Strategy for positional encoding in attention / transformer blocks.
val rope_frequencies :
half_d:Base.int ->
?base:Base.Float.t ->
unit ->
Ocannl_tensor.Tensor.t
RoPE inverse frequencies: theta_k = base^(-2k/d) for k = 0, ..., half_d - 1, where d = 2 * half_d.
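The frequency formula can be checked with a few lines of plain OCaml. This sketch takes base as a required argument because the default used by OCANNL's ?base is not stated here; 10000.0, the value from the RoFormer paper, is used in the example only as an assumption.

```ocaml
(* theta_k = base ** (-2k / d) with d = 2 * half_d, k = 0 .. half_d - 1.
   theta_0 is always 1.0; later frequencies decay geometrically. *)
let rope_frequencies ~half_d ~base : float array =
  let d = float_of_int (2 * half_d) in
  Array.init half_d (fun k -> base ** (-2.0 *. float_of_int k /. d))
```

For example, rope_frequencies ~half_d:4 ~base:10000.0 starts at 1.0 and its k = 2 entry is 10000^(-0.5) = 0.01.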
val position_indices : seq_len:Base.int -> unit -> Ocannl_tensor.Tensor.t
Position indices 0, 1, ..., seq_len-1 as a non-learned batch-dim tensor.
val sinusoidal_position_encoding :
d_model:Base.int ->
max_len:Base.int ->
unit ->
Ocannl_tensor.Tensor.t
Sinusoidal positional encoding (Vaswani et al. 2017). Non-learned, with shape batch_dims=max_len, output_dims=d_model. Matches the model width at the transformer input level, NOT the per-head width.
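The encoding from Vaswani et al. is PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A plain OCaml sketch of that table follows, with rows standing in for the max_len batch axis and columns for the d_model output axis. It assumes the paper's interleaved sin/cos layout and an even d_model; whether OCANNL interleaves or concatenates the sin and cos halves is not stated in this entry.

```ocaml
(* max_len x d_model table: even columns use sin, odd columns use cos,
   each pair (2i, 2i+1) sharing the wavelength 10000^(2i / d_model). *)
let sinusoidal_position_encoding ~d_model ~max_len : float array array =
  if d_model mod 2 <> 0 then invalid_arg "d_model must be even";
  Array.init max_len (fun pos ->
      Array.init d_model (fun j ->
          let i = j / 2 in
          let angle =
            float_of_int pos
            /. (10000.0 ** (float_of_int (2 * i) /. float_of_int d_model))
          in
          if j mod 2 = 0 then sin angle else cos angle))
```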
val rope :
freqs:Ocannl_tensor.Tensor.t ->
positions:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Apply RoPE rotation to a tensor x whose last output axis has even size d. Rotates within the last output axis (per-head width d) without crossing head boundaries. freqs has output=d/2, positions has batch=seq_len.
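What the rotation does to one per-head vector can be sketched in plain OCaml. The pairing is an assumption: this sketch rotates adjacent elements (2k, 2k+1) by the angle pos * theta_k, the interleaved layout of the RoFormer paper, whereas OCANNL may pair the first half of the axis with the second half. The function name rope_rotate is hypothetical.

```ocaml
(* Rotate each pair (x.(2k), x.(2k+1)) by angle pos * freqs.(k).
   freqs has d/2 entries, matching "freqs has output=d/2" above.
   Assumption: interleaved pairing; OCANNL's layout may differ. *)
let rope_rotate ~(freqs : float array) ~(pos : int) (x : float array) :
    float array =
  let d = Array.length x in
  if d mod 2 <> 0 || Array.length freqs <> d / 2 then
    invalid_arg "rope_rotate: need even d and d/2 frequencies";
  let y = Array.copy x in
  Array.iteri
    (fun k theta ->
      let a = float_of_int pos *. theta in
      let c = cos a and s = sin a in
      let x0 = x.(2 * k) and x1 = x.(2 * k + 1) in
      y.(2 * k) <- (x0 *. c) -. (x1 *. s);
      y.(2 * k + 1) <- (x0 *. s) +. (x1 *. c))
    freqs;
  y
```

Because each pair is a 2D rotation, the vector norm is preserved, and the dot product of a query rotated at position m with a key rotated at position n depends only on m - n, which is how RoPE injects relative position into attention scores.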
val multi_head_attention :
label:Base.string Base.list ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
?temperature:Base.Float.t ->
?dropout_rate:Base.Float.t ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
?mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val multi_head_att_workshop :
num_heads:Base.int ->
d_k:Base.int ->
d_v:Base.int ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val layer_norm :
label:Base.string Base.list ->
?epsilon:Base.float ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer_encoder_block :
label:Base.string list ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val decoder_only_block :
label:Base.string list ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?dropout_rate:Base.Float.t ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Decoder-only transformer block: masked self-attention + FFN with post-norm LayerNorm. Like transformer_encoder_block, but accepts a ~mask parameter for causal masking. It has no cross-attention, which makes it suitable for autoregressive language models.
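A causal mask lets query position q attend only to key positions k <= q. The plain OCaml sketch below builds such a lower-triangular mask and shows one common way to apply it: masked scores become -infinity, so softmax gives them zero weight. The encoding is an assumption: here 1.0 means "may attend" and 0.0 means "masked", and this entry does not say whether OCANNL's ~mask expects 0/1 entries or an additive form. causal_mask and masked_scores are hypothetical helpers.

```ocaml
(* mask.(q).(k) = 1.0 when key k is at or before query q, else 0.0. *)
let causal_mask n : float array array =
  Array.init n (fun q -> Array.init n (fun k -> if k <= q then 1.0 else 0.0))

(* Replace masked attention scores with -infinity before the softmax. *)
let masked_scores ~(mask : float array array) (scores : float array array) :
    float array array =
  Array.mapi
    (fun q row ->
      Array.mapi (fun k s -> if mask.(q).(k) = 0.0 then neg_infinity else s) row)
    scores
```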
val decoder_only :
label:Base.string list ->
num_layers:int ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?dropout_rate:Base.Float.t ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Stack of decoder_only_block layers.
val cross_attention :
label:Base.string Base.list ->
num_heads:Base.int ->
d_k:Base.int ->
d_v:Base.int ->
?temperature:Base.Float.t ->
?dropout_rate:Base.Float.t ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
enc_output:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer_decoder_block :
label:Base.string list ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
enc_output:Ocannl_tensor.Tensor.t ->
mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer_encoder :
label:Base.string list ->
num_layers:int ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer_decoder :
label:Base.string list ->
num_layers:int ->
num_heads:Base.int ->
d_k:Base.Int.t ->
d_v:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
Ocannl_tensor.Tensor.t ->
enc_output:Ocannl_tensor.Tensor.t ->
mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer :
label:Base.string Base.list ->
num_encoder_layers:int ->
num_decoder_layers:int ->
num_heads:Base__Int.t ->
d_enc:Base.int ->
d_dec:Base.int ->
d_ff:Base.int ->
?epsilon:Base.float ->
?pos_embed:position_embedding ->
unit ->
train_step:Ir.Indexing.static_symbol option ->
src:Ocannl_tensor.Tensor.t ->
tgt:Ocannl_tensor.Tensor.t ->
mask:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
val transformer_with_loss :
label:'a ->
model:
(train_step:'b -> src:'c -> tgt:'d -> mask:'e -> Ocannl_tensor.Tensor.t) ->
unit ->
train_step:'b ->
src:'c ->
tgt_input:'d ->
tgt_target:Ocannl_tensor.Tensor.t ->
mask:'e ->
Ocannl_tensor.Tensor.t * Ocannl_tensor.Tensor.t
Transformer with teacher forcing for autoregressive training.
TODO: Simplify once tensor shifting/slicing is better supported in shape inference. Currently requires pre-shifted tgt_input (all but last token) and tgt_target (all but first token). During training, the model learns to predict tgt_target given tgt_input.
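The pre-shifting the caller must currently do can be sketched on a plain OCaml token list: tgt_input drops the last token and tgt_target drops the first, so position i of the input is trained to predict position i of the target. teacher_forcing_split and drop_last are hypothetical helpers, not part of Nn_blocks.

```ocaml
(* All elements except the last one. *)
let rec drop_last = function
  | [] | [ _ ] -> []
  | x :: xs -> x :: drop_last xs

(* (tgt_input, tgt_target) for teacher forcing: input[i] predicts target[i],
   which is the token that follows input[i] in the original sequence. *)
let teacher_forcing_split (tokens : int list) : int list * int list =
  match tokens with
  | [] | [ _ ] -> invalid_arg "teacher_forcing_split: need at least two tokens"
  | _ :: rest -> (drop_last tokens, rest)
```

For example, teacher_forcing_split [1; 2; 3; 4] yields ([1; 2; 3], [2; 3; 4]); the two lists could then be turned into tensors, e.g. with one_hot_of_int_list.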
val conv2d :
label:Base.string Base.list ->
?kernel_size:Base.int ->
?stride:Base.Int.t ->
?use_padding:bool ->
?out_channels:Base.int ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
2D convolution layer with flexible padding and stride options.
When use_padding=false and stride > 1, the input spatial dimensions must satisfy (input_size - kernel_size) mod stride = 0; otherwise shape inference fails with an "incompatible stride" error. The output size is (input_size - kernel_size) / stride + 1.
When use_padding=true, there is no such restriction and the output size is input_size / stride.
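The size rules above can be captured in a small plain OCaml helper (conv_output_size is hypothetical, not part of Nn_blocks). It returns Error where the text says shape inference fails. For use_padding=true the text gives input_size / stride without specifying rounding for non-divisible sizes, so the sketch's truncating integer division is an assumption.

```ocaml
(* Output spatial size of a conv along one axis, following the stated rules. *)
let conv_output_size ~input_size ~kernel_size ~stride ~use_padding :
    (int, string) result =
  if use_padding then Ok (input_size / stride) (* rounding: assumption *)
  else if input_size < kernel_size then Error "kernel larger than input"
  else if (input_size - kernel_size) mod stride <> 0 then
    Error "incompatible stride"
  else Ok (((input_size - kernel_size) / stride) + 1)
```

For a 28-pixel axis: kernel 3 with stride 1 gives 26, kernel 4 with stride 2 gives 13, but kernel 3 with stride 2 is rejected because 25 is odd.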
val depthwise_separable_conv2d :
label:Base.string Base.list ->
?kernel_size:Base.int ->
?stride:Base.Int.t ->
?use_padding:bool ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Depthwise separable convolution: more efficient for mobile/edge devices. Consists of a depthwise conv (spatial filtering per channel) followed by a pointwise conv (1x1 conv for channel mixing).
See conv2d for dimension constraints when use_padding=false.
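Where the efficiency comes from can be shown by counting weights. The plain OCaml sketch below compares a standard k x k convolution from c_in to c_out channels with the depthwise + pointwise pair. It ignores biases and assumes a depth multiplier of 1 in the depthwise stage; the helper names are hypothetical.

```ocaml
(* Standard conv: one k x k x c_in filter per output channel. *)
let standard_conv_weights ~k ~c_in ~c_out = k * k * c_in * c_out

(* Depthwise (one k x k filter per input channel) + pointwise (1x1, c_in -> c_out). *)
let separable_conv_weights ~k ~c_in ~c_out = (k * k * c_in) + (c_in * c_out)
```

With k = 3, c_in = 32 and c_out = 64, this gives 18432 versus 2336 weights, a ratio of 1/c_out + 1/k^2 (about 0.13). The multiply count per output position shrinks by the same factor.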
val max_pool2d :
?stride:Base.Int.t ->
?window_size:Base.int ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Max pooling for 2D spatial data: reduces spatial dimensions by taking maximum values.
The input spatial dimensions must satisfy: (input_size - window_size) mod stride = 0, otherwise shape inference will fail. The output size is (input_size - window_size) / stride + 1.
Note: The < in the einsum spec indicates no-padding mode (indices stay within bounds).
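A direct plain OCaml version of no-padding max pooling on a single h x w channel follows. It enforces the same (size - window) mod stride = 0 constraint on each axis, raising Invalid_argument where OCANNL's shape inference would fail. It is a reference sketch, not the einsum-based implementation.

```ocaml
(* No-padding 2D max pooling on one channel; every window stays in bounds. *)
let max_pool2d ~window ~stride (x : float array array) : float array array =
  let h = Array.length x and w = Array.length x.(0) in
  let out_size n =
    if n < window || (n - window) mod stride <> 0 then
      invalid_arg "max_pool2d: incompatible window/stride"
    else ((n - window) / stride) + 1
  in
  let oh = out_size h and ow = out_size w in
  Array.init oh (fun i ->
      Array.init ow (fun j ->
          let best = ref neg_infinity in
          for di = 0 to window - 1 do
            for dj = 0 to window - 1 do
              best := max !best x.((i * stride) + di).((j * stride) + dj)
            done
          done;
          !best))
```

Replacing the running maximum with a running sum divided by window * window gives the corresponding average pooling.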
val avg_pool2d :
?stride:Base.Int.t ->
?window_size:Base.int ->
unit ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Average pooling for 2D spatial data: reduces spatial dimensions by averaging values.
See max_pool2d for dimension constraints.
val global_avg_pool2d : DSL_modules.Tensor.t -> Ocannl_tensor.Tensor.t
Global average pooling: reduces each feature map to a single value by averaging. Commonly used before the final classification layer.
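In plain OCaml terms, global average pooling maps each channel's h x w feature map to its mean, leaving one value per channel. The channels x h x w layout below is an assumption of this sketch, not OCANNL's axis convention.

```ocaml
(* channels x h x w  ->  channels: the mean of each feature map. *)
let global_avg_pool2d (x : float array array array) : float array =
  Array.map
    (fun fmap ->
      let sum = ref 0.0 and count = ref 0 in
      Array.iter
        (fun row ->
          Array.iter
            (fun v ->
              sum := !sum +. v;
              incr count)
            row)
        fmap;
      !sum /. float_of_int !count)
    x
```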
val batch_norm2d :
label:Base.string Base.list ->
?epsilon:Base.float ->
?momentum:float ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Batch normalization for CNN layers: normalizes across the batch dimension for each channel. Typically applied after convolutions and before activations.
val batch_norm1d :
label:Base.string Base.list ->
?epsilon:Base.float ->
?momentum:float ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Batch normalization for MLP layers: normalizes across the batch axis only. Unlike batch_norm2d, there are no spatial axes to reduce over; channel axes are carried through unchanged via the ..c.. row variable.
See the FIXME on batch_norm2d: running statistics are not implemented, so momentum is ignored and inference falls back to the learned gamma/beta parameters rather than population statistics. Acceptable for tutorial examples; do not rely on inference correctness for distribution-shifted inputs.
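The training-mode computation can be sketched on a batch x features float matrix in plain OCaml: each feature column is normalized with its own batch mean and variance, then scaled by gamma and shifted by beta. Like the behavior described above, the sketch keeps no running statistics. The biased (divide-by-n) variance and the matrix layout are assumptions, not details confirmed for OCANNL.

```ocaml
(* y = gamma * (x - mean) / sqrt(var + epsilon) + beta, per feature column,
   with mean and var computed over the batch axis (rows) only. *)
let batch_norm1d ~epsilon ~(gamma : float array) ~(beta : float array)
    (x : float array array) : float array array =
  let n = float_of_int (Array.length x) in
  let features = Array.length gamma in
  let mean =
    Array.init features (fun c ->
        Array.fold_left (fun acc row -> acc +. row.(c)) 0.0 x /. n)
  in
  let var =
    Array.init features (fun c ->
        Array.fold_left
          (fun acc row ->
            let d = row.(c) -. mean.(c) in
            acc +. (d *. d))
          0.0 x
        /. n)
  in
  Array.map
    (fun row ->
      Array.mapi
        (fun c v ->
          (gamma.(c) *. (v -. mean.(c)) /. sqrt (var.(c) +. epsilon)) +. beta.(c))
        row)
    x
```

At inference time a full implementation would substitute population (running) statistics for mean and var, which is exactly the part noted above as missing.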
val conv_bn_relu :
label:Base.string list ->
?kernel_size:Base.int ->
?stride:Base.Int.t ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Convolution block following the conv -> batch norm -> activation pattern.
val resnet_block :
label:Base.string list ->
?stride:Base.Int.t ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Residual block for ResNet-style architectures. Features skip connections that help with gradient flow in deep networks.
val lenet :
?label:Base.string list ->
?out_channels1:Base.int ->
?out_channels2:Base.int ->
unit ->
train_step:'a ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
LeNet-style architecture for simple image classification (e.g., MNIST). Classic architecture: conv -> pool -> conv -> pool -> fc layers. Output shape is inferred from training data.
val vgg_block :
label:Base.string list ->
num_convs:int ->
?kernel_size:Base.int ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
VGG-style block: multiple convolutions with the same filter count, followed by pooling.
val sokoban_cnn :
label:Base.string Base.list ->
?num_actions:Base.int ->
unit ->
train_step:'a option ->
grid_state:Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t * Ocannl_tensor.Tensor.t
Simple CNN for Sokoban-like grid environments. Processes grid states with multiple conv layers and outputs action logits.
val mobile_cnn :
label:Base.string Base.list ->
?num_classes:Base.int ->
?width_mult:float ->
unit ->
train_step:'a option ->
Ocannl_tensor.Tensor.t ->
Ocannl_tensor.Tensor.t
Modern CNN with depthwise separable convolutions for efficiency. Suitable for mobile/edge deployment.