Nicolay Rusnachenko

nicolay-r

https://nicolay-r.github.io/

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity

posted an update 7 days ago


replied to as-cle-bert's post 8 days ago

I got my GitHub Wrapped for 2024 today!🥂 Get yours here on HuggingFace 👉 https://huggingface.co/spaces/as-cle-bert/what-a-git-year GitHub repo with the code to reproduce it 👉 https://github.com/AstraBert/what-a-git-year Hope that everybody had a Git year!🎉

reacted to davanstrien's post with ❤️ 10 days ago

🇸🇰 Hovorte po slovensky? Help build better AI for Slovak! We only need 90 more annotations to include Slovak in the next Hugging Face FineWeb2-C dataset (https://huggingface.co/datasets/data-is-better-together/fineweb-c) release! Your contribution will help create better language models for 5+ million Slovak speakers. Annotate here: https://huggingface.co/spaces/data-is-better-together/fineweb-c. Read more about why we're doing it: https://huggingface.co/blog/davanstrien/fineweb2-community


Organizations

None yet

Posts 38

Post

600

📢 Throughout 2024 we have been working to advance opinion mining by proposing an evaluation that involves explanations!

A while ago we launched RuOpinionNE-2024, aimed at extracting sentiment opinions with spans (as explanations) from mass-media news written in Russian. The competition is now at its final stage on the Codalab platform:
📊 https://codalab.lisn.upsaclay.fr/competitions/20244

🔎 What do we observe so far? For the two types of sentiment labels (positive and negative), the top-performing submission reaches F1=0.34, while the baseline LLM approach reaches F1=0.17 (see the screenshot of the leaderboard below 📸)

⏰️ We have finally launched the final stage, with a decent number of submissions, which lasts until 15th of January 2025.

🙌 Everyone who wishes to evaluate the most recent advances in explainable opinion mining during the final stage is welcome!

Codalab main page:
https://codalab.lisn.upsaclay.fr/competitions/20244#learn_the_details
More details on github:
https://github.com/dialogue-evaluation/RuOpinionNE-2024

Post

2086

📢 Delighted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in GenAI-powered systems.

Releasing bulk-ner 0.25.0, a tiny framework that saves you time when deploying NER with any model.

💎 Why is this important? In the era of GenAI, handling textual output can be challenging. Instead, recognizing named entities via domain-oriented systems for your downstream LLM would be the preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that directly adapting an LM for NER results in spending a significant amount of time on formatting your texts according to the needs of the NER model.
In particular:
1. Processing CoNLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely
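
To make the two chores above concrete, here is a minimal sketch in plain Python. It is not the bulk-ner implementation: the helper names `bio_to_spans` and `chunk_tokens` are hypothetical, and the fixed-size, non-overlapping windowing is just one simple assumption about how long inputs could be handled.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group B-/I- tags into (entity_type, start, end) spans; end is exclusive.

    Hypothetical helper for illustration, not part of bulk-ner.
    """
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        # An I- tag continues the open span only when its type matches.
        if prefix == "I" and etype == label:
            continue
        # Any other tag closes the span that is currently open, if any.
        if etype is not None:
            spans.append((etype, start, i))
            start, etype = None, None
        # B- opens a new span; a stray I- is treated as an opening tag too.
        if prefix in ("B", "I") and label:
            start, etype = i, label
    # Close a span that runs to the end of the sequence.
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


def chunk_tokens(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a long token list into consecutive windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
print(bio_to_spans(tags))
print(chunk_tokens(list("abcdefg"), 3))
```

Non-overlapping windows can cut an entity in half at a boundary, so a real setup would likely need overlap or sentence-aware splitting.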

To cope with these problems, in version 0.25.0 I made huge steps forward by providing:
✅ 🐍 Python API support for quick deployment (see screenshot below 📸)
✅ 🪶 No-string: dependencies are now clear, so it is a pure Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts with inner lists that refer to annotated objects (see screenshot below 📸)
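
To picture the "lists with inner lists" idea, here is a rough sketch. The exact layout bulk-ner emits is not spelled out in this post, so the `[entity_text, entity_type]` shape of the inner lists and the helper name `to_nested` are assumptions made purely for illustration.

```python
from typing import List, Tuple, Union

# A text is a list whose items are plain tokens (str) or annotated objects (inner list).
Item = Union[str, List[str]]


def to_nested(tokens: List[str], spans: List[Tuple[str, int, int]]) -> List[Item]:
    """Replace each (type, start, end) span with an inner list [entity_text, type].

    Hypothetical helper; spans are assumed non-overlapping with exclusive end.
    """
    out: List[Item] = []
    pos = 0
    for etype, start, end in sorted(spans, key=lambda s: s[1]):
        out.extend(tokens[pos:start])  # plain tokens before the entity
        out.append([" ".join(tokens[start:end]), etype])
        pos = end
    out.extend(tokens[pos:])  # trailing plain tokens
    return out


tokens = ["Angela", "Merkel", "visited", "New", "York", "."]
spans = [("PER", 0, 2), ("LOC", 3, 5)]
print(to_nested(tokens, spans))
```

The appeal of such a layout is that downstream code can tell entities from plain text with a simple `isinstance(item, list)` check, with no B-I-O parsing needed.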

📒 We have a Colab notebook for a quick start here (or see the screenshot for bash / Python API 📸)
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit
