François Remy

FremyCompany

AI & ML interests

NLP; Clinical NLP; Medical NLP; EHR; Web development;

Recent Activity

updated a collection 3 days ago
Biomedical NLP papers

Organizations

AZ Delta R&D (RADar); Spaces-explorers; Speech Recognition Community Event Version 2; Tweeties in a Tweety World; Social Post Explorers; Hugging Face Discord Community; Parallia

Posts 2

Post
Today, April 26, is the Day of the Tatar Language! 🌟
To celebrate, we release our new language model, Tweety Tatar 🐣

https://huggingface.co/Tweeties/tweety-tatar-base-7b-2024-v1

The model was converted from Mistral Instruct v0.2 using a novel technique called trans-tokenization. As a result, the model uses a brand-new tokenizer, fully tailored for the Tatar language.
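The post does not explain how trans-tokenization works, so the following is only a rough sketch of one way a new tokenizer's embeddings could be seeded from an existing model: each new token starts as a weighted average of the source-model embeddings it is aligned with. The function name `init_target_embeddings`, the toy tokens, and the alignment weights are all hypothetical; the actual method is described in the authors' paper.

```python
def init_target_embeddings(source_emb, alignment):
    """Seed embeddings for new-tokenizer tokens from source-model embeddings.

    source_emb: dict mapping source token -> embedding (list of floats)
    alignment:  dict mapping target token -> {source token: alignment weight}
    Both structures are hypothetical stand-ins for illustration only.
    """
    dim = len(next(iter(source_emb.values())))
    target_emb = {}
    for target_token, weights in alignment.items():
        total = sum(weights.values())
        if total <= 0:
            continue  # no usable alignment; a real system would need a fallback
        vec = [0.0] * dim
        for source_token, weight in weights.items():
            for i, value in enumerate(source_emb[source_token]):
                vec[i] += (weight / total) * value
        target_emb[target_token] = vec
    return target_emb


# Toy data (invented): two-dimensional embeddings and made-up alignment counts.
source_emb = {"lang": [1.0, 0.0], "uage": [0.0, 1.0], "day": [0.5, 0.5]}
alignment = {"тел": {"lang": 3.0, "uage": 1.0}, "көн": {"day": 1.0}}
target_emb = init_target_embeddings(source_emb, alignment)
```

In this sketch a token aligned with a single source token simply copies its embedding, while a token aligned with several inherits a mixture proportional to the weights.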

We also release a model that can be finetuned to translate English or Russian into Tatar, and that achieves performance similar to commercial offerings:

https://huggingface.co/Tweeties/tweety-tatar-hydra-base-7b-2024-v1

More details in our upcoming paper 👀
François REMY, Pieter Delobelle, Alfiya Khabibullina

Татар теле көне белән! (Happy Tatar Language Day!)
Post
🔥 What's that biomedical model that got 170,763 downloads last month on HuggingFace?! Well, the paper is finally published! #BioLORD

📰 Read our article in the Journal of the American Medical Informatics Association:
https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocae029/7614965

📝 TLDR: BioLORD-2023 is a series of semantic language models for the biomedical domain, capable of representing clinical concepts and sentences in a semantic space aligned with human preferences. Our new multilingual version supports 50+ languages and is further finetuned on 7 European languages. These models were trained contrastively and through distillation, using a corpus that brings the names of biomedical concepts and their descriptions into the same latent space. For concepts that lack a human-written description in UMLS, we use information from the SNOMED CT knowledge graph and the capabilities of ChatGPT to generate synthetic data and improve our results.
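To make "trained contrastively" a little more concrete, here is a minimal sketch of an InfoNCE-style loss that pulls each concept name toward its own description and away from the other descriptions in a batch. The post does not state which loss BioLORD uses, so the choice of InfoNCE, the temperature value, and the toy vectors are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(name_vecs, desc_vecs, temperature=0.05):
    """Average InfoNCE loss; name_vecs[i] is paired with desc_vecs[i]."""
    losses = []
    for i, name in enumerate(name_vecs):
        logits = [cosine(name, desc) / temperature for desc in desc_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (invented): matched pairs versus deliberately swapped pairs.
names = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(names, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(names, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when every name sits next to its own description and grows large when the pairs are swapped, which is the signal a contrastive objective optimizes.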

🤗 Access our models from the HuggingFace hub, including the new 2023-C and 2023-S variants:
FremyCompany/BioLORD-2023
FremyCompany/BioLORD-2023-M
FremyCompany/BioLORD-2023-S
FremyCompany/BioLORD-2023-C
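As a hedged usage sketch, the snippet below ranks candidate concepts by cosine similarity to a query embedding. It assumes the models load through the `sentence-transformers` library (check the model card to confirm); `embed_with_biolord` and `rank_by_similarity` are illustrative helper names, and the toy vectors stand in for real embeddings so the ranking logic runs without downloading anything.

```python
import math


def embed_with_biolord(texts, model_name="FremyCompany/BioLORD-2023"):
    """Embed texts with a hub model (assumes sentence-transformers is installed).

    The import is local so the rest of this sketch runs without the library.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return [[float(x) for x in vector] for vector in model.encode(texts)]


def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return candidate labels sorted from most to least similar to the query."""
    return sorted(
        candidates,
        key=lambda label: cosine_similarity(query_vec, candidates[label]),
        reverse=True,
    )


# Toy vectors (invented) in place of real model output.
toy_query = [0.9, 0.1]
toy_candidates = {
    "myocardial infarction": [1.0, 0.0],
    "bone fracture": [0.0, 1.0],
}
toy_ranking = rank_by_similarity(toy_query, toy_candidates)
```

With the library installed and network access, the vectors returned by `embed_with_biolord(["heart attack", "myocardial infarction", "bone fracture"])` could replace the toy values.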