Clelia (Astra) Bertelli PRO

as-cle-bert

https://www.cleliasportfolio.xyz

AI & ML interests

as-cle-bert's activity

posted an update about 12 hours ago

Are you using Obsidian to write your notes?
If the answer is yes, then this post might be for you!✅
I recently created 𝐨𝐛𝐬𝐢𝐝𝐢𝐚𝐧-𝐝𝐢𝐠𝐞𝐬𝐭, a Google Gemini-powered application that gives you feedback on the style and content of the documents you have been working on🧠

Repo 👉 https://github.com/AstraBert/obsidian-digest
PyPI Package 👉 https://pypi.org/project/obsidian-digest/

The app is available as:
- 𝐜𝐨𝐦𝐦𝐚𝐧𝐝-𝐥𝐢𝐧𝐞 𝐭𝐨𝐨𝐥: install it as a Python package with 𝗽𝗶𝗽, and run it from the terminal anytime!📦
- 𝐃𝐢𝐬𝐜𝐨𝐫𝐝 𝐁𝐨𝐭 𝐛𝐮𝐢𝐥𝐭 𝐟𝐫𝐨𝐦 𝐬𝐨𝐮𝐫𝐜𝐞 𝐜𝐨𝐝𝐞: clone the GitHub repo, install the needed dependencies through 𝗰𝗼𝗻𝗱𝗮, and run the bot. You will get hourly messages with suggestions and considerations about your activity on Obsidian in the previous hour🤖
- 𝐃𝐢𝐬𝐜𝐨𝐫𝐝 𝐁𝐨𝐭 𝐝𝐞𝐩𝐥𝐨𝐲𝐞𝐝 𝐥𝐨𝐜𝐚𝐥𝐥𝐲 𝐰𝐢𝐭𝐡 𝐝𝐨𝐜𝐤𝐞𝐫 𝐜𝐨𝐦𝐩𝐨𝐬𝐞: clone the GitHub repo and launch 𝗱𝗼𝗰𝗸𝗲𝗿 𝗰𝗼𝗺𝗽𝗼𝘀𝗲 𝘂𝗽. Docker builds an image on the fly with all the needed dependencies and scripts, and runs them. You'll get the same functionality as the source-code version, but with a much easier deployment process🐋
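
A minimal sketch of the "activity in the previous hour" idea, assuming the notes are Markdown files in a vault folder and that "activity" means a recent modification time. The function name `recently_modified_notes` and its parameters are hypothetical and are not taken from obsidian-digest, whose actual implementation may differ.

```python
import os
import tempfile
import time
from pathlib import Path


def recently_modified_notes(vault_dir, window_seconds=3600, now=None):
    """Return the Markdown files under vault_dir modified within the window.

    Hypothetical helper for illustration only (not obsidian-digest's API).
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return sorted(
        path
        for path in Path(vault_dir).rglob("*.md")
        if path.stat().st_mtime >= cutoff
    )


if __name__ == "__main__":
    # Build a throwaway vault with one fresh note and one stale note.
    with tempfile.TemporaryDirectory() as vault:
        fresh = Path(vault) / "today.md"
        stale = Path(vault) / "last_week.md"
        fresh.write_text("# Draft\n")
        stale.write_text("# Old note\n")
        # Push the stale note's modification time one week into the past.
        week_ago = time.time() - 7 * 24 * 3600
        os.utime(stale, (week_ago, week_ago))
        print([p.name for p in recently_modified_notes(vault)])
```

The selected files could then be passed to a language model for feedback; that step is left out here because it depends on the model client and API key in use.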

Go check out the GitHub repo for more info 👉 https://github.com/AstraBert/obsidian-digest

Have fun!✨

replied to their post 2 days ago

Hi and thanks a lot for the specification!🥰

Just as a note from my side, in the article I specify that there is a difference between "open weights" and "open source" models, and I link this blog post: https://www.agora.software/en/llm-open-source-open-weight-or-proprietary/ for a deeper explanation of the difference. I never claimed (and never would claim) that Llama is open source, let alone free software (see the introduction in this article of mine on privacy and data "stealing" risks: https://huggingface.co/blog/as-cle-bert/build-an-ai-powered-search-engine-from-scratch).

And I would gladly have used DeepSeek too, if it had been available on HuggingChat! :)

I nevertheless highly appreciate your comment and I'll for sure be more cautious in using the word "open/open source" in the future. Thanks!✨

replied to their post 2 days ago

Both PdfItDown and SenTrEv only work with text for now: in future releases, support for images will be added :)
For text extraction, I use PyPDF + LangChain

posted an update 3 days ago

🎉𝐄𝐚𝐫𝐥𝐲 𝐍𝐞𝐰 𝐘𝐞𝐚𝐫 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬🎉

Hi HuggingFacers🤗, I decided to ship early this year, and here's what I came up with:

𝐏𝐝𝐟𝐈𝐭𝐃𝐨𝐰𝐧 (https://github.com/AstraBert/PdfItDown) - If you're like me, and you have your whole RAG pipeline optimized for PDFs, but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, Markdown files and (why not?) CSVs and XMLs to PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo 👉 https://github.com/AstraBert/PdfItDown
PyPI Package 👉 https://pypi.org/project/pdfitdown/

𝐒𝐞𝐧𝐓𝐫𝐄𝐯 𝐯𝟏.𝟎.𝟎 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the 𝗿𝗲𝘁𝗿𝗶𝗲𝘃𝗮𝗹 performance of your 𝘁𝗲𝘅𝘁 𝗲𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴 models, I have good news for you🥳🥳
The new release for 𝐒𝐞𝐧𝐓𝐫𝐄𝐯 now supports 𝗱𝗲𝗻𝘀𝗲 and 𝘀𝗽𝗮𝗿𝘀𝗲 retrieval (thanks to FastEmbed by Qdrant) with 𝘁𝗲𝘅𝘁-𝗯𝗮𝘀𝗲𝗱 𝗳𝗶𝗹𝗲 𝗳𝗼𝗿𝗺𝗮𝘁𝘀 (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new 𝗿𝗲𝗹𝗲𝘃𝗮𝗻𝗰𝗲 𝗺𝗲𝘁𝗿𝗶𝗰𝘀!
GitHub repo 👉 https://github.com/AstraBert/SenTrEv
Release Notes 👉 https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPI Package 👉 https://pypi.org/project/sentrev/
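
To make "retrieval performance" and "relevance metrics" concrete, here is a small sketch of two generic metrics, success rate at k and mean reciprocal rank (MRR), computed on made-up results. It assumes one relevant document per query; the function names and the toy data are hypothetical, and this is not SenTrEv's code or a statement of which metrics it ships.

```python
def success_rate(ranked_ids, relevant_ids, k=1):
    """Fraction of queries whose relevant document appears in the top k."""
    hits = sum(
        1 for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_ids)


def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the relevant document (0 if it is missing)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        if relevant in ranked:
            total += 1.0 / (ranked.index(relevant) + 1)
    return total / len(relevant_ids)


# Toy data: the ranked document IDs returned for three queries,
# and the single relevant document ID for each query.
results = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d9", "d8", "d7"]]
relevant = ["d1", "d4", "d0"]

if __name__ == "__main__":
    print("success@1:", success_rate(results, relevant, k=1))
    print("success@2:", success_rate(results, relevant, k=2))
    print("MRR:", mean_reciprocal_rank(results, relevant))
```

Success rate only asks whether the right document was found, while MRR also rewards ranking it higher, which is why the two are often reported together.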

Happy New Year and have fun!🥂

2 replies

reacted to nroggendorff's post with ➕ 3 days ago

hey nvidia, can you send me a gpu?
comment or react if you want ~~me~~ to get one too. 👉👈

22 replies

posted an update 5 days ago

Hi HF Community!🤗

As my last 2024 contribution, I decided to write an article about a Competitive Debate Championship simulation I ran with 5 LLMs as competitors and 2 as judges:

https://huggingface.co/blog/as-cle-bert/debate-championship-for-llms

The article covers code, analyses and results, and you can find everything to reproduce this tournament in the GitHub repo 👉 https://github.com/AstraBert/DebateLLM-Championship

I also released a dataset related to the data (motions, arguments, topics, winners...) collected during the tournament 👉 https://huggingface.co/datasets/as-cle-bert/DebateLLMs

Happy reading and happy new yeAIr!🎉

3 replies

posted an update 9 days ago

I got my GitHub Wrapped for 2024 today!🥂

Get yours here on HuggingFace 👉 as-cle-bert/what-a-git-year

GitHub repo with the code to reproduce it 👉 https://github.com/AstraBert/what-a-git-year

Hope that everybody had a Git year!🎉

1 reply

posted an update 11 days ago

Hi HuggingFacers!🤶🏼

As my last 2024 project, I've dropped a Discord Bot that knows a lot about Pokémon🦋

GitHub 👉 https://github.com/AstraBert/Pokemon-Bot
Demo Space 👉 as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokémon
- Image-based search to identify Pokémon from their images (via Qdrant)
- Card package random extraction and description
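
As a rough illustration of what "hybrid search" means, the sketch below merges a dense ranking and a sparse ranking with reciprocal rank fusion (RRF), one common fusion technique. This is an assumption made for illustration: the post does not say how the bot combines or reranks its results, and the function name, the constant `k=60`, and the example IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Made-up results from a dense (semantic) and a sparse (keyword) retriever.
dense_hits = ["pikachu", "raichu", "eevee"]
sparse_hits = ["raichu", "pichu", "pikachu"]

if __name__ == "__main__":
    print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

Documents found by both retrievers rise to the top, and a separate reranking model can then reorder the fused list.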

HuggingFace🤗, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun!🍕

posted an update 24 days ago

Hi HF Community!

I just published a blog article on building PrAIvateSearch (https://github.com/AstraBert/PrAIvateSearch), a user-owned, local and open-source AI-powered search engine🔍:

https://huggingface.co/blog/as-cle-bert/build-an-ai-powered-search-engine-from-scratch

"Own your AI, search the web with it🌐😎"

Feel free to try it out and contribute to it on GitHub: let's make OSS AI grow and thrive!🚀

posted an update about 1 month ago

Hi HuggingFacers!🤗
December is here and the time has come, for most of us, to wrap up our code projects and take stock of our 2024 contributions🗓️
In order to do this, I made a small Gradio application, what-a-git-year:

as-cle-bert/what-a-git-year

that scrapes information from your GitHub profile and summarizes it, also producing nice plots📊
You can also find the GitHub repo here: https://github.com/AstraBert/what-a-git-year ⭐

Hope that everyone had a Git year!🎉

posted an update about 1 month ago

Hi there!🤗

I just deployed a Streamlit-based space on HF that fetches your Home Feed on BlueSky and summarizes it with Cohere's Command-R via LangChain🧪

Find it here:
as-cle-bert/bsky-feedllama-demo

I'm also working on a local Gradio implementation with Llama3.2 that, for now, only works from source code and doesn't have docs, but it will soon be supported by Docker🐳 and have a nice README:

https://github.com/AstraBert/bluesky-feedllama

Contributions and feedback are always welcome!🤗🦋

posted an update about 1 month ago

Hi HuggingFacers!🤗
I'm thrilled to introduce my latest project: 𝗦𝗲𝗻𝗧𝗿𝗘𝘃 (𝗦𝗲𝗻tence 𝗧𝗿ansformers 𝗘𝘃aluator), a Python package that offers simple, customizable evaluation of the text retrieval accuracy and time performance of Sentence Transformers-compatible text embedders on PDF data!📊

Learn more in my LinkedIn post: https://www.linkedin.com/posts/astra-clelia-bertelli-583904297_python-embedders-semanticsearch-activity-7266754133557190656-j1e3

And on the GitHub repo: https://github.com/AstraBert/SenTrEv

Have fun!🍕

posted an update 2 months ago

Hi HuggingFacers!🤗

If you're into biomedical sciences, you will know the pain that, sometimes, searching PubMed can be🙇‍♀️

For this purpose, I built a bot that scrapes PubMed for you, starting from the exact title of a publication or a keyword search - all beautifully rendered through Gradio✅

Find it here: as-cle-bert/BioMedicalPapersBot

And here's the GitHub repository🐱: https://github.com/AstraBert/BioMedicalPapersBot

It's also available as a Docker image!🐳

docker pull ghcr.io/astrabert/biomedicalpapersbot:main

Best of luck with your research!

PS: in the very near future some AI summarization features will be included!

posted an update 2 months ago

Hi there HuggingFacers!🤗

Are you working with Streamlit on Spaces and struggling with authentication and user management?🧐

Well, you can check out my latest community article (https://huggingface.co/blog/as-cle-bert/streamlit-supabase-auth-ui) on a new Python package I've been working on that connects Supabase to the Streamlit UI, to create seamless authentication for your Streamlit apps!🚀

You can find a demo of it on Spaces: as-cle-bert/streamlit-supabase-auth-ui

Have fun!🍕

posted an update 3 months ago

Hi HuggingFacers!🤗

As you have probably heard, in the past weeks three Tech Giants (Microsoft, Amazon and Google) announced that they would bet on nuclear reactors to feed the surging energy demand of data centers, driven by increasing AI data and computational flows.

I try to explain the state of AI energy consumption, its environmental impact and the key points of "turning AI nuclear" in my latest article on the HF community blog: https://huggingface.co/blog/as-cle-bert/ai-is-turning-nuclear-a-review

Enjoy the read!🌱

posted an update 3 months ago

Hi there HuggingFacers!

Have you ever dreamt of an improbable book crossover, like Frodo from 𝘓𝘰𝘳𝘥 𝘰𝘧 𝘵𝘩𝘦 𝘙𝘪𝘯𝘨𝘴 becoming the main character of the 𝘖𝘥𝘺𝘴𝘴𝘦𝘺 or Emma Bovary from 𝘔𝘢𝘥𝘢𝘮𝘦 𝘉𝘰𝘷𝘢𝘳𝘺 acting as a modern-day Shakespearean Juliet?

Well, all of this is now possible! I'm thrilled to introduce my latest open-source product for storytelling: 𝐛𝐨𝐨𝐤𝐬-𝐦𝐢𝐱𝐞𝐫-𝐚𝐢 𝐯𝟎.𝟎.𝟎 !

Built with ReactJS and shipped directly to you on Spaces thanks to Docker, this webapp combines the power of two AI tools:

- gpt-4o-mini by OpenAI, which takes care of cooking new and intriguing plots starting from the user's instructions, the titles and the summaries of the two books to mix (summaries are scraped through Wikipedia)
- text2img realtime API by ModelsLab, which provides a Stable Diffusion pipeline to create a thumbnail for your newly-generated story

Everything is provided under a simple and intuitive UI, which uses chatscope's React template kit.
Curious to try it? The app is already live at:

as-cle-bert/books-mixer-ai

And you can also have a tour of the GitHub repo (and leave a little ⭐ while you're there):

https://github.com/AstraBert/books-mixer-ai

The documentation is still under construction, but will become available soon😊

Have fun!📚📚

posted an update 5 months ago

Hi HF Community!🤗

In the past few days, OpenAI announced their search engine, SearchGPT. Today, I'm glad to introduce you to SearchPhi, an AI-powered and open-source web search tool that aims to reproduce features similar to SearchGPT's, built upon microsoft/Phi-3-mini-4k-instruct, llama.cpp🦙 and Streamlit.
Although not as capable as SearchGPT, SearchPhi v0.0-beta.0 is a first step toward a fully functional and multimodal search engine :)
If you want to know more, head over to the GitHub repository (https://github.com/AstraBert/SearchPhi) and, to test it out, use this HF space: as-cle-bert/SearchPhi
Have fun!🐱

posted an update 6 months ago

Hi HF community!🤗
Hope y'all are as excited as me for the release of Llama 3.1! 🦙
Following the release, I built a space that uses the HF Inference API, thanks to a recipe you can find in this awesome GitHub repo (https://github.com/huggingface/huggingface-llama-recipes/): you can now run Llama-3.1-405B, customizing its system instructions and other parameters, for free! 😇
Follow this link: as-cle-bert/Llama-3.1-405B-FP8 and let the fun begin!🍕

1 reply

posted an update 6 months ago

Hi HuggingFacers!🤗

Good news concerning as-cle-bert/smolLM-arena, the chat arena where you can compare some of the Small Language Models (<1.7B) on the Hub and cast your vote to choose the best!📱
The space now has a new interface with chatbots instead of textboxes, it runs faster and it also comes with usage instructions :)
Have fun!🍕

replied to their post 6 months ago

The SmolLM series is specifically designed to run on devices like smartphones, yes :) And, concerning the arena for models 7 to 20B, I didn't want to spoil it, but it's coming soon! ;)

Articles

Debate Championship for LLMs

Building an AI-powered search engine from scratch

streamlit_supabase_auth_ui

AI is turning nuclear: a review

Is AI carbon footprint worrisome?

_Repetita iuvant_: how to improve AI code generation

BrAIn: next generation neurons?

What is going on with AlphaFold3?
