Voxel51

company
Verified

AI & ML interests

Visual AI, Computer vision, Multimodal AI, Data Centric AI

Recent Activity

Voxel51's activity

abhishek
posted an update about 1 month ago
🎉 SUPER BLACK FRIDAY DEAL 🎉

Train almost any model on a variety of tasks, such as LLM fine-tuning, text classification/regression, summarization, question answering, image classification/regression, object detection, tabular data, etc., for FREE using AutoTrain locally. 🔥
https://github.com/huggingface/autotrain-advanced
abhishek
posted an update about 2 months ago
INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on the Hugging Face Hub using Python on Hugging Face servers. Choose from a number of GPU flavors, millions of model and dataset pairs, and 10+ tasks 🤗

To try it, install autotrain-advanced using pip. You can skip dependency resolution by installing with --no-deps, but then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

GitHub repo: https://github.com/huggingface/autotrain-advanced
abhishek
posted an update 3 months ago
abhishek
posted an update 4 months ago
abhishek
posted an update 5 months ago
🚨 NEW TASK ALERT 🚨
Extractive Question Answering: because sometimes generative is not all you need 😉
AutoTrain is the only open-source, no-code solution to offer so many tasks across different modalities. Current task count: 23 🚀
Check out the blog post on getting started with this task: https://huggingface.co/blog/abhishek/extractive-qa-autotrain
harpreetsahota
posted an update 7 months ago
The Coachella of Computer Vision, CVPR, is right around the corner. In anticipation of the conference, I curated a dataset of the papers.

I'll have a technical blog post out tomorrow doing some analysis on the dataset, but I'm so hyped that I wanted to get it out to the community ASAP.

The dataset consists of the following fields:

- An image of the first page of the paper
- title: The title of the paper
- authors_list: The list of authors
- abstract: The abstract of the paper
- arxiv_link: Link to the paper on arXiv
- other_link: Link to the project page, if found
- category_name: The primary category of this paper, according to the [arXiv taxonomy](https://arxiv.org/category_taxonomy)
- all_categories: All categories this paper falls into, according to arXiv taxonomy
- keywords: Extracted using GPT-4o
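
To make the fields above concrete, here is a minimal sketch of one record as a Python dataclass. This is an illustration, not the dataset's actual loader: the class name `PaperRecord` and the `image_path` field name are hypothetical (the post doesn't name the image field), and the example values are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaperRecord:
    """One paper entry; field names follow the list in the post."""

    image_path: str             # hypothetical name: image of the paper's first page
    title: str
    authors_list: List[str]
    abstract: str
    arxiv_link: str
    other_link: Optional[str]   # project page, None if not found
    category_name: str          # primary arXiv category
    all_categories: List[str]   # every arXiv category the paper falls into
    keywords: List[str] = field(default_factory=list)


# Placeholder values only; this is not a real paper from the dataset.
example = PaperRecord(
    image_path="images/example_paper.png",
    title="An Example Paper",
    authors_list=["A. Author", "B. Author"],
    abstract="A short placeholder abstract.",
    arxiv_link="https://arxiv.org/abs/0000.00000",
    other_link=None,
    category_name="cs.CV",
    all_categories=["cs.CV", "cs.LG"],
    keywords=["placeholder"],
)
```

Keeping `other_link` optional mirrors the "if found" wording in the field list.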

Here's how I created the dataset 👇🏼

Generic code for building this dataset can be found [here](https://github.com/harpreetsahota204/CVPR-2024-Papers).

This dataset was built using the following steps:

- Scrape the CVPR 2024 website for accepted papers
- Use DuckDuckGo to search for a link to the paper's abstract on arXiv
- Use arXiv.py (Python wrapper for the arXiv API) to extract the abstract and categories, and download the PDF for each paper
- Use pdf2image to save an image of the paper's first page
- Use GPT-4o to extract keywords from the abstract
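
As a rough sketch of the search step above: once a web search returns candidate URLs, something has to pick out the arXiv link and its identifier. The helper names below (`arxiv_id_from_url`, `first_arxiv_id`) are hypothetical and not taken from the linked repo, the URLs are placeholders, and the pattern only covers new-style identifiers such as 2401.01234.

```python
import re
from typing import Iterable, Optional

# Matches new-style arXiv IDs in /abs/ or /pdf/ URLs, with an optional version suffix.
_ARXIV_URL = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE
)


def arxiv_id_from_url(url: str) -> Optional[str]:
    """Return the arXiv ID (without version) found in url, or None."""
    match = _ARXIV_URL.search(url)
    return match.group(1) if match else None


def first_arxiv_id(urls: Iterable[str]) -> Optional[str]:
    """Return the ID from the first URL that points at arXiv, or None."""
    for url in urls:
        arxiv_id = arxiv_id_from_url(url)
        if arxiv_id is not None:
            return arxiv_id
    return None


# Placeholder search results; only the second one is an arXiv link.
results = ["https://example.com/some-page", "https://arxiv.org/abs/2401.01234v2"]
found = first_arxiv_id(results)
```

The resulting ID is what you would then hand to an arXiv API wrapper to fetch the abstract and categories.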

Voxel51/CVPR_2024_Papers
abhishek
posted an update 7 months ago
abhishek
posted an update 8 months ago
🚨 NEW TASK ALERT 🚨
🎉 AutoTrain now supports Object Detection! 🎉
Transform your projects with these powerful new features:
🔹 Fine-tune any supported model from the Hugging Face Hub
🔹 Seamless logging with TensorBoard or W&B
🔹 Support for local and hub datasets
🔹 Configurable training for tailored results
🔹 Train locally or leverage Hugging Face Spaces
🔹 Deployment-ready with API inference or Hugging Face endpoints
AutoTrain: https://hf.co/autotrain
abhishek
posted an update 8 months ago
🚀🚀🚀🚀 Introducing AutoTrain Configs! 🚀🚀🚀🚀
Now you can train models using YAML config files! 💥 These configs are easy to understand and are not at all overwhelming. So, even a person with almost zero knowledge of machine learning can train state-of-the-art models without writing any code. Check out the example configs in the config directory of the autotrain-advanced GitHub repo and feel free to share configs by creating a pull request 🤗
GitHub repo: https://github.com/huggingface/autotrain-advanced
abhishek
posted an update 9 months ago
abhishek
posted an update 9 months ago
Trained another version of llama3-8b-instruct which beats the base model. This time without losing too many points on the gsm8k benchmark. Again, using AutoTrain 💥 pip install autotrain-advanced
Trained model: abhishek/autotrain-llama3-orpo-v2
abhishek
posted an update 9 months ago
With AutoTrain, you can already fine-tune the latest llama3 models without writing a single line of code. Here's an example fine-tune of the llama3 8b model: abhishek/autotrain-llama3-no-robots
jamarks
posted an update 9 months ago
FiftyOne Datasets <> Hugging Face Hub Integration!

As of yesterday's release of FiftyOne 0.23.8, the FiftyOne open source library for dataset curation and visualization is now integrated with the Hugging Face Hub!

You can now load Parquet datasets from the hub and have them converted directly into FiftyOne datasets. To load MNIST, for example:

pip install -U fiftyone


import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub(
    "mnist",
    format="ParquetFilesDataset",
    classification_fields="label",
)
session = fo.launch_app(dataset)


You can also load FiftyOne datasets directly from the hub. Here's how you load the first 1000 samples from the VisDrone dataset:

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub("jamarks/VisDrone2019-DET", max_samples=1000)

# Launch the App
session = fo.launch_app(dataset)


And tying it all together, you can push your FiftyOne datasets directly to the hub:

import fiftyone.zoo as foz
import fiftyone.utils.huggingface as fouh

dataset = foz.load_zoo_dataset("quickstart")
fouh.push_to_hub(dataset, "my-dataset")


Major thanks to @tomaarsen @davanstrien @severo @osanseviero and @julien-c for helping to make this happen!!!

Full documentation and details here: https://docs.voxel51.com/integrations/huggingface.html#huggingface-hub
harpreetsahota
posted an update 10 months ago
google/gemma-7b-it is super good!

I wasn't convinced at first, but after vibe-checking it...I'm quite impressed.

I've got a notebook here, which is kind of a framework for vibe-checking LLMs.

In this notebook, I take Gemma for a spin on a variety of prompts:
• nonsensical tokens (harpreetsahota/diverse-token-sampler)
• conversations where I try to get some PII (harpreetsahota/red-team-prompts-questions)
• summarization ability (lighteval/summarization)
• instruction following (harpreetsahota/Instruction-Following-Evaluation-for-Large-Language-Models)
• chain of thought reasoning (ssbuild/alaca_chain-of-thought)

I then used LangChain evaluators (GPT-4 as judge) and tracked everything in LangSmith. I made public links to the traces where you can inspect the runs.

I hope you find this helpful, and I am certainly open to feedback, criticisms, or ways to improve.

Cheers:

You can find the notebook here: https://colab.research.google.com/drive/1RHzg0FD46kKbiGfTdZw9Fo-DqWzajuoi?usp=sharing
harpreetsahota
posted an update 12 months ago
✌🏼 Two new models dropped today 👇🏽

1) 👩🏾‍💻 DeciCoder-6B

👉🏽 Supports 8 languages: C, C#, C++, Go, Rust, Python, Java, and JavaScript.

👉🏽 Released under the Apache 2.0 license

🥊 Punches above its weight class on HumanEval: Beats out CodeGen 2.5 7B and StarCoder 7B on most supported languages. Has a 3-point lead over StarCoderBase 15.5B for Python

💻 Try it out:

🃏 Model Card: Deci/DeciCoder-6B

📓 Notebook: https://colab.research.google.com/drive/1QRbuser0rfUiFmQbesQJLXVtBYZOlKpB

🪧 Hugging Face Space: Deci/DeciCoder-6B-Demo

2) 🎨 DeciDiffusion v2.0

👉🏽 Produces quality images on par with Stable Diffusion v1.5, but 2.6 times faster in 40% fewer iterations

👉🏽 Employs a smaller and faster U-Net component which has 860 million parameters.

👉🏽 Uses an optimized scheduler, SqueezedDPM++, which cuts down the number of steps needed to generate a quality image from 16 to 10.

👉🏽 Released under the CreativeML Open RAIL++-M License.

💻 Try it out:

🃏 Model Card: Deci/DeciDiffusion-v2-0

📓 Notebook: https://colab.research.google.com/drive/11Ui_KRtK2DkLHLrW0aa11MiDciW4dTuB

🪧 Hugging Face Space: Deci/DeciDiffusion-v2-0

Help support the projects by liking the model cards and the spaces!

Cheers and happy hacking!
abhishek
posted an update 12 months ago
Happy to announce the brand new, open-source Hugging Face Competitions platform 🚀 Now you can create a machine learning competition for your friends, colleagues, or the world for FREE* and host it on Hugging Face: the AI community building the future. Creating a competition requires only two steps: pip install competitions, then run competitions create and create a competition by answering a few questions 💥 Check out the GitHub repo: https://github.com/huggingface/competitions and docs: https://hf.co/docs/competitions
abhishek
posted an update about 1 year ago