culturay_el_32000

About

🇬🇷 A Greek tokenizer, trained on the Greek (el) subset of the CulturaY dataset.

Description

This is a character-level Modern Greek (el) tokenizer, trained on the Greek subset of CulturaY. Its vocabulary size is 32,000, a multiple of 128, which helps a model's embedding and output layers run efficiently.
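The "multiple of 128" property is simple arithmetic, and the sketch below checks it. The helper names `is_multiple_of` and `pad_to_multiple` are hypothetical and not part of this tokenizer or of any library; the value 50257 is only an illustrative vocabulary size that is not a multiple of 128.

```python
def is_multiple_of(vocab_size, multiple=128):
    """Return True if vocab_size divides evenly by multiple."""
    return vocab_size % multiple == 0


def pad_to_multiple(vocab_size, multiple=128):
    """Round vocab_size up to the nearest multiple (ceiling division)."""
    return -(-vocab_size // multiple) * multiple


print(is_multiple_of(32000))   # True, since 32000 = 250 * 128
print(pad_to_multiple(32000))  # 32000, no padding needed
print(pad_to_multiple(50257))  # 50304, i.e. 393 * 128
```

Because 32,000 already divides evenly, a model using this tokenizer needs no extra padding rows in its embedding table.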

Usage

import tokenizers

tokenizer = tokenizers.Tokenizer.from_pretrained("gvlassis/culturay_el_32000")
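Once loaded, the tokenizer can encode and decode text. The following is a minimal sketch, assuming the `tokenizers` package is installed and the Hugging Face Hub is reachable. The helper `describe_encoding`, the `main` function, and the sample sentence are illustrative and not part of the original card.

```python
def describe_encoding(tokenizer, text):
    """Encode text and return (tokens, ids, decoded) for inspection."""
    encoding = tokenizer.encode(text)
    decoded = tokenizer.decode(encoding.ids)
    return encoding.tokens, encoding.ids, decoded


def main():
    # Imported here so the helper above can be reused without the dependency.
    import tokenizers

    # Fetches the tokenizer from the Hugging Face Hub (needs network access).
    tokenizer = tokenizers.Tokenizer.from_pretrained("gvlassis/culturay_el_32000")

    # Sample Greek sentence, chosen only for illustration.
    tokens, ids, decoded = describe_encoding(tokenizer, "Καλημέρα κόσμε")
    print(tokens)
    print(ids)
    print(decoded)
    print(tokenizer.get_vocab_size())
```

Calling `main()` downloads the tokenizer and prints the tokens, their ids, the decoded string, and the vocabulary size.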