CamemBERT 2.0: A Smarter French Language Model Aged to Perfection Paper • 2411.08868 • Published Nov 13, 2024 • 12
Awesome Document AI Collection A collection of open-source document AI • 27 items • Updated Mar 11, 2024 • 76
Harvesting Textual and Structured Data from the HAL Publication Repository Paper • 2407.20595 • Published Jul 30, 2024 • 22
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus Paper • 2406.08707 • Published Jun 13, 2024 • 15