Clelia (Astra) Bertelli's picture

Clelia (Astra) Bertelli PRO

as-cle-bert

AI & ML interests

Biology + Artificial Intelligence = โค๏ธ | AI for sustainable development, sustainable development for AI | Researching on Machine Learning Enhancement | I love automation for everyday things | Blogger | Open Source

Recent Activity

posted an update about 12 hours ago
Are you using Obsidian to write your notes? If the answer is yes, then this post might be for you!โœ… I recently created ๐จ๐›๐ฌ๐ข๐๐ข๐š๐ง-๐๐ข๐ ๐ž๐ฌ๐ญ, a Google Gemini-powered application that gives you feedback on style and contents of the documents you have been working on๐Ÿง  Repo ๐Ÿ‘‰ https://github.com/AstraBert/obsidian-digest PyPi Package ๐Ÿ‘‰ https://pypi.org/project/obsidian-digest/ The app is available as: - ๐œ๐จ๐ฆ๐ฆ๐š๐ง๐-๐ฅ๐ข๐ง๐ž ๐ญ๐จ๐จ๐ฅ: install it as a python package with ๐—ฝ๐—ถ๐—ฝ, and execute it from terminal anytime!๐Ÿ“ฆ -๐ƒ๐ข๐ฌ๐œ๐จ๐ซ๐ ๐๐จ๐ญ ๐›๐ฎ๐ข๐ฅ๐ญ ๐Ÿ๐ซ๐จ๐ฆ ๐ฌ๐จ๐ฎ๐ซ๐œ๐ž ๐œ๐จ๐๐ž: clone the GitHub repo, install the needed dependencies through ๐—ฐ๐—ผ๐—ป๐—ฑ๐—ฎ, and run the bot: you will get hourly messages with suggestions and considerations about your activity on Obsidian in the previous hour๐Ÿค– - ๐ƒ๐ข๐ฌ๐œ๐จ๐ซ๐ ๐๐จ๐ญ ๐๐ž๐ฉ๐ฅ๐จ๐ฒ๐ž๐ ๐ฅ๐จ๐œ๐š๐ฅ๐ฅ๐ฒ ๐ฐ๐ข๐ญ๐ก ๐๐จ๐œ๐ค๐ž๐ซ ๐œ๐จ๐ฆ๐ฉ๐จ๐ฌ๐ž: clone the GitHub repo and launch ๐—ฑ๐—ผ๐—ฐ๐—ธ๐—ฒ๐—ฟ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ผ๐˜€๐—ฒ ๐˜‚๐—ฝ. Docker builds an image on the fly with all the needed dependencies and scripts, and runs them. You'll have the same functionalities as the ones from source code, but with a way easier deployment process๐Ÿ‹ Go check out the GitHub repo for more info ๐Ÿ‘‰ https://github.com/AstraBert/obsidian-digest Have fun!โœจ
replied to their post 2 days ago
๐ŸŽ‰๐„๐š๐ซ๐ฅ๐ฒ ๐๐ž๐ฐ ๐˜๐ž๐š๐ซ ๐ซ๐ž๐ฅ๐ž๐š๐ฌ๐ž๐ฌ๐ŸŽ‰ Hi HuggingFacers๐Ÿค—, I decided to ship early this year, and here's what I came up with: ๐๐๐Ÿ๐ˆ๐ญ๐ƒ๐จ๐ฐ๐ง (https://github.com/AstraBert/PdfItDown) - If you're like me, and you have all your RAG pipeline optimized for PDFs, but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, markdown sheets and (why not?) CSVs and XMLs in PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft GitHub Repo ๐Ÿ‘‰ https://github.com/AstraBert/PdfItDown PyPi Package ๐Ÿ‘‰ https://pypi.org/project/pdfitdown/ ๐’๐ž๐ง๐“๐ซ๐„๐ฏ ๐ฏ๐Ÿ.๐ŸŽ.๐ŸŽ (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the ๐—ฟ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฒ๐˜ƒ๐—ฎ๐—น performance of your ๐˜๐—ฒ๐˜…๐˜ ๐—ฒ๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด models, I have good news for you๐Ÿฅณ๐Ÿฅณ The new release for ๐’๐ž๐ง๐“๐ซ๐„๐ฏ now supports ๐—ฑ๐—ฒ๐—ป๐˜€๐—ฒ and ๐˜€๐—ฝ๐—ฎ๐—ฟ๐˜€๐—ฒ retrieval (thanks to FastEmbed by Qdrant) with ๐˜๐—ฒ๐˜…๐˜-๐—ฏ๐—ฎ๐˜€๐—ฒ๐—ฑ ๐—ณ๐—ถ๐—น๐—ฒ ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐˜๐˜€ (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new ๐—ฟ๐—ฒ๐—น๐—ฒ๐˜ƒ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐—บ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ๐˜€! GitHub repo ๐Ÿ‘‰ https://github.com/AstraBert/SenTrEv Release Notes ๐Ÿ‘‰ https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0 PyPi Package ๐Ÿ‘‰ https://pypi.org/project/sentrev/ Happy New Year and have fun!๐Ÿฅ‚
View all activity

Articles

Organizations

Social Post Explorers's profile picture Hugging Face Discord Community's profile picture GreenFit AI's profile picture

as-cle-bert's activity

posted an update about 12 hours ago
view post
Post
193
Are you using Obsidian to write your notes?
If the answer is yes, then this post might be for you!โœ…
I recently created ๐จ๐›๐ฌ๐ข๐๐ข๐š๐ง-๐๐ข๐ ๐ž๐ฌ๐ญ, a Google Gemini-powered application that gives you feedback on style and contents of the documents you have been working on๐Ÿง 

Repo ๐Ÿ‘‰ https://github.com/AstraBert/obsidian-digest
PyPi Package ๐Ÿ‘‰ https://pypi.org/project/obsidian-digest/

The app is available as:
- ๐œ๐จ๐ฆ๐ฆ๐š๐ง๐-๐ฅ๐ข๐ง๐ž ๐ญ๐จ๐จ๐ฅ: install it as a python package with ๐—ฝ๐—ถ๐—ฝ, and execute it from terminal anytime!๐Ÿ“ฆ
-๐ƒ๐ข๐ฌ๐œ๐จ๐ซ๐ ๐๐จ๐ญ ๐›๐ฎ๐ข๐ฅ๐ญ ๐Ÿ๐ซ๐จ๐ฆ ๐ฌ๐จ๐ฎ๐ซ๐œ๐ž ๐œ๐จ๐๐ž: clone the GitHub repo, install the needed dependencies through ๐—ฐ๐—ผ๐—ป๐—ฑ๐—ฎ, and run the bot: you will get hourly messages with suggestions and considerations about your activity on Obsidian in the previous hour๐Ÿค–
- ๐ƒ๐ข๐ฌ๐œ๐จ๐ซ๐ ๐๐จ๐ญ ๐๐ž๐ฉ๐ฅ๐จ๐ฒ๐ž๐ ๐ฅ๐จ๐œ๐š๐ฅ๐ฅ๐ฒ ๐ฐ๐ข๐ญ๐ก ๐๐จ๐œ๐ค๐ž๐ซ ๐œ๐จ๐ฆ๐ฉ๐จ๐ฌ๐ž: clone the GitHub repo and launch ๐—ฑ๐—ผ๐—ฐ๐—ธ๐—ฒ๐—ฟ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ผ๐˜€๐—ฒ ๐˜‚๐—ฝ. Docker builds an image on the fly with all the needed dependencies and scripts, and runs them. You'll have the same functionalities as the ones from source code, but with a way easier deployment process๐Ÿ‹

Go check out the GitHub repo for more info ๐Ÿ‘‰ https://github.com/AstraBert/obsidian-digest

Have fun!โœจ
replied to their post 2 days ago
view reply

Hi and thanks a lot for the specification!๐Ÿฅฐ

Just as a note from my side, in the article I specify that there is a difference between "open weights" and "open source" models, and I link this blog post: https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/ for a deeper explanation of the difference. I never (and I would never) claimed that Llama is open source, let alone a free software (see the introduction in this article of mine on privacy and data "stealing" risks: https://huggingface.co/blog/as-cle-bert/build-an-ai-powered-search-engine-from-scratch).

And I would have gladly used also DeepSeek, if it had been available on HuggingChat! :)

I nevertheless highly appreciate your comment and I'll for sure be more cautious in using the word "open/open source" in the future. Thanks!โœจ

replied to their post 2 days ago
view reply

Both PdfItDown and SenTrEv only work with text for now: in future releases, support for image will be added :)
For text extraction, I use PyPDF + Langchain

posted an update 3 days ago
view post
Post
1918
๐ŸŽ‰๐„๐š๐ซ๐ฅ๐ฒ ๐๐ž๐ฐ ๐˜๐ž๐š๐ซ ๐ซ๐ž๐ฅ๐ž๐š๐ฌ๐ž๐ฌ๐ŸŽ‰

Hi HuggingFacers๐Ÿค—, I decided to ship early this year, and here's what I came up with:

๐๐๐Ÿ๐ˆ๐ญ๐ƒ๐จ๐ฐ๐ง (https://github.com/AstraBert/PdfItDown) - If you're like me, and you have all your RAG pipeline optimized for PDFs, but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, markdown sheets and (why not?) CSVs and XMLs in PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft
GitHub Repo ๐Ÿ‘‰ https://github.com/AstraBert/PdfItDown
PyPi Package ๐Ÿ‘‰ https://pypi.org/project/pdfitdown/

๐’๐ž๐ง๐“๐ซ๐„๐ฏ ๐ฏ๐Ÿ.๐ŸŽ.๐ŸŽ (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the ๐—ฟ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฒ๐˜ƒ๐—ฎ๐—น performance of your ๐˜๐—ฒ๐˜…๐˜ ๐—ฒ๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด models, I have good news for you๐Ÿฅณ๐Ÿฅณ
The new release for ๐’๐ž๐ง๐“๐ซ๐„๐ฏ now supports ๐—ฑ๐—ฒ๐—ป๐˜€๐—ฒ and ๐˜€๐—ฝ๐—ฎ๐—ฟ๐˜€๐—ฒ retrieval (thanks to FastEmbed by Qdrant) with ๐˜๐—ฒ๐˜…๐˜-๐—ฏ๐—ฎ๐˜€๐—ฒ๐—ฑ ๐—ณ๐—ถ๐—น๐—ฒ ๐—ณ๐—ผ๐—ฟ๐—บ๐—ฎ๐˜๐˜€ (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new ๐—ฟ๐—ฒ๐—น๐—ฒ๐˜ƒ๐—ฎ๐—ป๐—ฐ๐—ฒ ๐—บ๐—ฒ๐˜๐—ฟ๐—ถ๐—ฐ๐˜€!
GitHub repo ๐Ÿ‘‰ https://github.com/AstraBert/SenTrEv
Release Notes ๐Ÿ‘‰ https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPi Package ๐Ÿ‘‰ https://pypi.org/project/sentrev/

Happy New Year and have fun!๐Ÿฅ‚
  • 2 replies
ยท
reacted to nroggendorff's post with โž• 3 days ago
view post
Post
5078
hey nvidia, can you send me a gpu?
comment or react if you want ~~me~~ to get one too. ๐Ÿ‘‰๐Ÿ‘ˆ
ยท
posted an update 5 days ago
view post
Post
509
Hi HF Community!๐Ÿค—

As my last 2024 contribution, I decided to write an article about a Competitive Debate Championship simulation I ran with 5 LLMs as competitors and 2 as judges:

https://huggingface.co/blog/as-cle-bert/debate-championship-for-llms

The article covers code, analyses and results, and you can find everything to reproduce this tournament in the GitHub repo ๐Ÿ‘‰ https://github.com/AstraBert/DebateLLM-Championship

I also released a dataset related to the data (motions, arguments, topics, winners...) collected during the tournament ๐Ÿ‘‰ as-cle-bert/DebateLLMs

Happy reading and happy new yeAIr!๐ŸŽ‰
  • 3 replies
ยท
posted an update 9 days ago
posted an update 11 days ago
view post
Post
1700
Hi HuggingFacers!๐Ÿคถ๐Ÿผ

As my last 2024 project, I've dropped a Discord Bot that knows a lot about Pokemons๐Ÿฆ‹

GitHub ๐Ÿ‘‰ https://github.com/AstraBert/Pokemon-Bot
Demo Space ๐Ÿ‘‰ as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokemons
- Image-based search to identify Pokemons from their images (via Qdrant)
- Card package random extraction and description

HuggingFace๐Ÿค—, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun!๐Ÿ•
posted an update 24 days ago
posted an update about 1 month ago
view post
Post
1417
Hi HuggingFacers!๐Ÿค—
December is here and time has come, for most of us, to wrap up our code projects and take stock of our 2024 contributions๐Ÿ—“๏ธ
In order to do this, I made a small Gradio application, what-a-git-year:

as-cle-bert/what-a-git-year

that scrapes information from your GitHub profile and summarizes them, producing also nice plots๐Ÿ“Š
Find also the GitHub repo here: https://github.com/AstraBert/what-a-git-year โญ

Hope that everyone had a Git year!๐ŸŽ‰
posted an update about 1 month ago
view post
Post
1043
Hi there!๐Ÿค—

I just deployed a Streamlit-based space on HF that fetches your Home Feed on BlueSky and summarizes it with Cohere's CommandR via Langchain๐Ÿงช

Find it here:
as-cle-bert/bsky-feedllama-demo

I'm also working on a Gradio local implementation with Llama3.2 that for now only works with source code and doesn't have docs, but that will be soon supported by Docker๐Ÿณ and have a nice README:

https://github.com/AstraBert/bluesky-feedllama

Contributions and feedback are always welcome!๐Ÿค—๐Ÿฆ‹
posted an update about 1 month ago
view post
Post
1264
Hi HuggingFacers!๐Ÿค—
I'm thrilled to introduce my latest project: ๐—ฆ๐—ฒ๐—ป๐—ง๐—ฟ๐—˜๐˜ƒ (๐—ฆ๐—ฒ๐—ปtence ๐—ง๐—ฟansformers ๐—˜๐˜ƒaluator), a python package that offers simple customizable evaluation for text retrieval accuracy and time performance of Sentence Transformers-compatible text embedders on PDF data!๐Ÿ“Š

Learn more in my LinkedIn post: https://www.linkedin.com/posts/astra-clelia-bertelli-583904297_python-embedders-semanticsearch-activity-7266754133557190656-j1e3

And on the GitHub repo: https://github.com/AstraBert/SenTrEv

Have fun!๐Ÿ•
posted an update 2 months ago
view post
Post
1667
Hi HugginfgFacers!๐Ÿค—

If you're into biomedical sciences, you will know the pain that, sometimes, searching PubMed can be๐Ÿ™‡โ€โ™€๏ธ

For these purposes, I built a bot that scrapes PubMed for you, starting from the exact title of a publication or key word search - all beautifully rendered through Gradioโœ…

Find it here: as-cle-bert/BioMedicalPapersBot

And here's the GitHub repository๐Ÿฑ: https://github.com/AstraBert/BioMedicalPapersBot

It's also available as a Docker image!๐Ÿณ

docker pull ghcr.io/astrabert/biomedicalpapersbot:main


Best of luck with your research!

PS: in the very near future some AI summarization features will be included!
posted an update 2 months ago
view post
Post
764
Hi there HuggingFacers!๐Ÿค—

Are you working with Streamlit on Spaces and struggling with authentication and user management?๐Ÿง

Well, you can check out my last community article (https://huggingface.co/blog/as-cle-bert/streamlit-supabase-auth-ui) on a new python package I've been working on, that connects Supabase to Streamlit UI, in order to create a seamless authentication for your seamless Streamlit apps!๐Ÿš€

You can find a demo of it on Spaces: as-cle-bert/streamlit-supabase-auth-ui

Have fun!๐Ÿ•
posted an update 3 months ago
view post
Post
3216
Hi HuggingFacers!๐Ÿค—

As you may have probably heard, in the past weeks three Tech Giants (Microsoft, Amazon and Google) announced that they would bet on nuclear reactors to feed the surging energy demand of data centers, driven by increasing AI data and computational flows.

I try to explain the state of AI energy consumptions, its environmental impact and the key points of "turning AI nuclear" in my last article on HF community blog: https://huggingface.co/blog/as-cle-bert/ai-is-turning-nuclear-a-review

Enjoy the reading!๐ŸŒฑ
posted an update 3 months ago
view post
Post
1357
Hi there HuggingFacers!

Have you ever dreamt of an improbable books crossover, like Frodo from ๐˜“๐˜ฐ๐˜ณ๐˜ฅ ๐˜ฐ๐˜ง ๐˜ต๐˜ฉ๐˜ฆ ๐˜™๐˜ช๐˜ฏ๐˜จ๐˜ด becoming the main character of the ๐˜–๐˜ฅ๐˜บ๐˜ด๐˜ด๐˜ฆ๐˜บ or Emma Bovary from ๐˜”๐˜ข๐˜ฅ๐˜ข๐˜ฎ๐˜ฆ ๐˜‰๐˜ฐ๐˜ท๐˜ข๐˜ณ๐˜บ acting as a modern-days Shakespearean Juliet?

Well, all of this is now possible! I'm thrilled to introduce my latest opensource product for storytelling: ๐›๐จ๐จ๐ค๐ฌ-๐ฆ๐ข๐ฑ๐ž๐ซ-๐š๐ข ๐ฏ๐ŸŽ.๐ŸŽ.๐ŸŽ !

Built with ReactJS and shipped directly to you on Spaces thanks to Docker, this webapp combines the power of two AI tools:

- gpt-4o-mini by OpenAI, which takes care of cooking new and intriguing plots starting from the user's instructions, the titles and the summaries of the two books to mix (summaries are scraped through Wikipedia)
- text2img realtime API by ModelsLab, which provides a stable diffusion pipeline to create a thumbnail for your newly-generated story

Everything is provided under a simple and intuitive UI, which uses chatscope's React template kit.
Curious of trying? The app is already live at:

as-cle-bert/books-mixer-ai

And you can also have a tour of the GitHub repo (and leave a little โญ while you're there):

https://github.com/AstraBert/books-mixer-ai

The documentation is still under construction, but will become available soon๐Ÿ˜Š

Have fun!๐Ÿ“š๐Ÿ“š
posted an update 5 months ago
view post
Post
5060
Hi HF Community!๐Ÿค—

In the past days, OpenAI announced their search engine, SearchGPT: today, I'm glad to introduce you SearchPhi, an AI-powered and open-source web search tool that aims to reproduce similar features to SearchGPT, built upon microsoft/Phi-3-mini-4k-instruct, llama.cpp๐Ÿฆ™ and Streamlit.
Although not as capable as SearchGPT, SearchPhi v0.0-beta.0 is a first step toward a fully functional and multimodal search engine :)
If you want to know more, head over to the GitHub repository (https://github.com/AstraBert/SearchPhi) and, to test it out, use this HF space: as-cle-bert/SearchPhi
Have fun!๐Ÿฑ
posted an update 6 months ago
view post
Post
2601
Hi HF community!๐Ÿค—
Hope y'all are as excited as me for the release of Llama 3.1! ๐Ÿฆ™
Following the release, I built a space exploiting HF Inference API, thanks to a recipe you can find in this awesome GitHub repo (https://github.com/huggingface/huggingface-llama-recipes/): you can now run Llama-3.1-405B customizing its system instructions and other parameters, for free! ๐Ÿ˜‡
Follow this link: as-cle-bert/Llama-3.1-405B-FP8 and let the fun begin!๐Ÿ•
  • 1 reply
ยท
posted an update 6 months ago
view post
Post
1403
Hi HuggingFacers!๐Ÿค—

Good news concerning as-cle-bert/smolLM-arena, the chat arena where you can compare some of the Small Language Models (<1.7B) on the Hub and cast your vote to choose the best!๐Ÿ“ฑ
The space now has a new interface with chatbots instead of textboxs, it runs faster and it also comes with usage instructions :)
Have fun!๐Ÿ•
replied to their post 6 months ago
view reply

The SmolLM series is specifically designed to run on devices like smartphones, yes :) And, concerning the arena for models 7 to 20B, I didn't want to spoiler it, but It's coming soon! ;)