Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

Posts 38

Post
📢 Throughout 2024 we have been working to advance opinion mining by proposing an evaluation that involves explanations!

A while ago we launched RuOpinionNE-2024, aimed at extracting sentiment opinions with spans (as explanations) from mass-media news written in Russian. The competition is now at its final stage on the Codalab platform:
📊 https://codalab.lisn.upsaclay.fr/competitions/20244

🔎 What do we observe so far? For the two types of sentiment labels (positive and negative), the top-performing submission reaches F1=0.34, while the baseline LLM approach reaches F1=0.17 (see the leaderboard screenshot below 📸)

⏰️ We finally launch the final stage with a decent amount of submissions which lasts until
15th of January 2025.

🙌 Everyone who wishes to evaluate the most recent advances in explainable opinion mining during the final stage is welcome!

Codalab main page:
https://codalab.lisn.upsaclay.fr/competitions/20244#learn_the_details
More details on GitHub:
https://github.com/dialogue-evaluation/RuOpinionNE-2024
Post
📢 Delighted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in Gen-AI powered systems.

Releasing bulk-ner 0.25.0, a tiny framework that saves you time when deploying NER with any model.

💎 Why is this important? In the era of GenAI, handling textual output can be challenging. Instead, recognizing named entities via domain-oriented systems for your downstream LLM would be the preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that directly adapting an LM for NER means spending a significant amount of time on formatting your texts according to the needs of the NER model.
In particular:
1. Processing CoNLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely
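
To make the two pain points above concrete, here is a minimal sketch in plain Python. It is not the bulk-ner API: the function names `bio_to_spans` and `chunk_tokens`, the token-level span layout, and the fixed-size chunking strategy are all assumptions made for illustration.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group B-I-O tags into (start, end_exclusive, label) token spans.

    Hypothetical helper, not part of bulk-ner. A stray I-X tag that does
    not continue a span of the same label is leniently treated as a start.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == label:
            continue  # the current span keeps growing
        if label is not None:
            spans.append((start, i, label))  # close the open span
            start, label = None, None
        if prefix in ("B", "I") and name:
            start, label = i, name  # open a new span
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def chunk_tokens(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a long token list into pieces of at most max_len tokens.

    Hypothetical helper: a naive fixed-size split used here only to
    illustrate input trimming.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(bio_to_spans(tags))
print(chunk_tokens(["a", "b", "c", "d", "e"], 2))
```

Note that a fixed-size split like this can cut an entity in half at a chunk boundary, which is one reason this kind of plumbing is tedious to write by hand.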

To cope with these problems, in version 0.25.0 I made a huge step forward by providing:
✅ 🐍 Python API support: see screenshot below for a quick deployment 📸
✅ 🪶 No-string: dependencies are now clear, so it is a pure Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts, with inner lists that refer to annotated objects (see screenshot below 📸)
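
As a rough illustration of the "lists with inner lists" idea, the sketch below builds such a structure from tokens and entity spans. The exact layout produced by bulk-ner is not shown in this post, so the inner-list shape `[tokens, label]` and the function name `to_nested` are assumptions, not the library's actual output format.

```python
from typing import List, Tuple, Union

Nested = List[Union[str, list]]


def to_nested(tokens: List[str], spans: List[Tuple[int, int, str]]) -> Nested:
    """Represent a text as a flat list of tokens in which every annotated
    object is replaced by an inner list [entity_tokens, label].

    Hypothetical layout; assumes spans are (start, end_exclusive, label)
    and do not overlap.
    """
    out: Nested = []
    pos = 0
    for start, end, label in sorted(spans):
        out.extend(tokens[pos:start])            # plain tokens before the entity
        out.append([tokens[start:end], label])   # the annotated object
        pos = end
    out.extend(tokens[pos:])                     # plain tokens after the last entity
    return out


tokens = ["Anna", "Smith", "visited", "Paris", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC")]
print(to_nested(tokens, spans))
```

With a layout like this, a downstream consumer can tell plain text from annotated objects with a simple `isinstance(item, list)` check.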

📢 We have a Colab for a quick start here (or see the screenshot for the bash / Python API 📸)
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit

datasets

None public yet