nanoBLT: A simplified, lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 layers deep (n_layers_encoder, n_layers_latent, n_layers_decoder) and is trained on ~1M bytes of Tiny Shakespeare with a patch size of 4.
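For orientation, here is a minimal PyTorch sketch of that 2x4x2 layout: byte-level encoder layers, latent layers that operate on patches of 4 bytes, and byte-level decoder layers. The class name, the mean-pooling of bytes into patches, and the additive broadcast of patch states back to bytes are illustrative assumptions, not the notebook's actual implementation (the full BLT uses entropy-based patching and cross-attention between byte and patch streams); see the linked code for the real details.

```python
import torch
import torch.nn as nn

class NanoBLTSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4, patch_size=4,
                 n_enc=2, n_lat=4, n_dec=2, vocab_size=256):
        super().__init__()
        self.patch_size = patch_size
        # one embedding per possible byte value (0-255)
        self.byte_emb = nn.Embedding(vocab_size, d_model)
        layer = lambda: nn.TransformerEncoderLayer(
            d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer(), n_enc)  # 2 byte-level layers
        self.latent = nn.TransformerEncoder(layer(), n_lat)   # 4 patch-level layers
        self.decoder = nn.TransformerEncoder(layer(), n_dec)  # 2 byte-level layers
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, byte_ids):  # byte_ids: (B, T), T divisible by patch_size
        B, T = byte_ids.shape
        h = self.encoder(self.byte_emb(byte_ids))  # contextualize raw bytes
        # pool every patch_size consecutive byte states into one patch vector
        # (a simplifying assumption; real BLT pools via cross-attention)
        p = h.reshape(B, T // self.patch_size, self.patch_size, -1).mean(dim=2)
        p = self.latent(p)  # global mixing at the cheaper patch level
        # broadcast each patch state back over its bytes, then refine per byte
        h = h + p.repeat_interleave(self.patch_size, dim=1)
        return self.head(self.decoder(h))  # per-byte logits

x = torch.randint(0, 256, (1, 16))  # 16 bytes = 4 patches of 4
print(NanoBLTSketch()(x).shape)     # torch.Size([1, 16, 256])
```

The point of the split is cost: the deepest stack (the 4 latent layers) attends over T/4 patches instead of T bytes, while the shallow encoder/decoder stacks handle byte-level detail.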
Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb