OuteAI

company

Verified

https://www.outeai.com/

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

edwko new activity 14 days ago

OuteAI/OuteTTS-0.2-500M:Commercial use

edwko new activity 14 days ago

OuteAI/OuteTTS-0.2-500M:Audio prompt?

edwko new activity 18 days ago

OuteAI/OuteTTS-0.2-500M-GGUF:Truncated audios and Latency in generation of speech

OuteAI's activity

edwko

in OuteAI/OuteTTS-0.2-500M 14 days ago

Commercial use

#4 opened 21 days ago by

hans00

Audio prompt?

#5 opened 14 days ago by

apepkuss79

edwko

in OuteAI/OuteTTS-0.2-500M-GGUF 18 days ago

Truncated audios and Latency in generation of speech

#1 opened about 1 month ago by

tushar310

edwko

updated a model 23 days ago

OuteAI/wavtokenizer-large-75token-interface

Updated 23 days ago • 3

edwko

in OuteAI/OuteTTS-0.2-500M-GGUF 28 days ago

More language - French

#2 opened 28 days ago by

Hayate72

edwko

in OuteAI/OuteTTS-0.2-500M-Demo about 1 month ago

What GPU are you using for the Gradio demo?

#3 opened about 1 month ago by

Arete7

reach-vb

posted an update about 1 month ago

Post

3613

VLMs are going through quite an open revolution, AND in on-device friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLab w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥

edwko

updated 2 models about 1 month ago

OuteAI/OuteTTS-0.2-500M

Text-to-Speech • Updated Dec 3, 2024 • 6.96k • 275

OuteAI/OuteTTS-0.2-500M-GGUF

Text-to-Speech • Updated Dec 3, 2024 • 2.25k • 70

edwko

in OuteAI/OuteTTS-0.2-500M-Demo about 1 month ago

added Zero-GPU merge and switch in Zero-Spaces

#2 opened about 1 month ago by

ameerazam08

edwko

in OuteAI/OuteTTS-0.2-500M about 1 month ago

Add new language

#2 opened about 1 month ago by

HassanStar

When I run the example code, it automatically downloads the file. Can I save the downloaded file in a specified directory?

#3 opened about 1 month ago by

DenisDing

edwko

updated 2 models about 1 month ago

OuteAI/OuteTTS-0.1-350M-GGUF

Text-to-Speech • Updated Nov 27, 2024 • 247 • 34

OuteAI/OuteTTS-0.1-350M

Text-to-Speech • Updated Nov 27, 2024 • 5.51k • 297

edwko

updated a collection about 1 month ago

OuteTTS

Collection

4 items • Updated Nov 25, 2024 • 12

edwko

updated a Space about 1 month ago

Running

🐠

OuteTTS 0.2 500M Demo

reach-vb

posted an update about 1 month ago

Post

3502

Massive week for Open AI/ ML:

Mistral Pixtral & Instruct Large - ~123B, 128K context, multilingual, json + function calling & open weights
mistralai/Pixtral-Large-Instruct-2411
mistralai/Mistral-Large-Instruct-2411

Allen AI Tülu 70B & 8B - competitive with claude 3.5 haiku, beats all major open models like llama 3.1 70B, qwen 2.5 and nemotron
allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
allenai/tulu-3-datasets-673b8df14442393f7213f372

Llava o1 - vlm capable of spontaneous, systematic reasoning, similar to GPT-o1, 11B model outperforms gemini-1.5-pro, gpt-4o-mini, and llama-3.2-90B-vision
Xkev/Llama-3.2V-11B-cot

Black Forest Labs Flux.1 tools - four new state of the art model checkpoints & 2 adapters for fill, depth, canny & redux, open weights
reach-vb/black-forest-labs-flux1-6743847bde9997dd26609817

Jina AI Jina CLIP v2 - general purpose multilingual and multimodal (text & image) embedding model, 900M params, 512 x 512 resolution, Matryoshka representations (1024 to 64)
jinaai/jina-clip-v2

Apple AIM v2 & CoreML MobileCLIP - large scale vision encoders outperform CLIP and SigLIP. CoreML optimised MobileCLIP models
apple/aimv2-6720fe1558d94c7805f7688c
apple/coreml-mobileclip

A lot more got released, like OpenScholar (OpenScholar/openscholar-v1-67376a89f6a80f448da411a6), smoltalk (HuggingFaceTB/smoltalk), Hymba (nvidia/hymba-673c35516c12c4b98b5e845f), Open ASR Leaderboard (hf-audio/open_asr_leaderboard) and much more.

Can't wait for the next week! 🤗

reach-vb

posted an update about 2 months ago

Post

4362

What a brilliant week for Open Source AI!

Qwen 2.5 Coder by Alibaba - 0.5B / 1.5B / 3B / 7B / 14B / 32B (Base + Instruct) Code generation LLMs, with 32B tackling giants like Gemini 1.5 Pro, Claude Sonnet
Qwen/qwen25-coder-66eaa22e6f99801bf65b0c2f

LLM2CLIP from Microsoft - Leverage LLMs to train ultra-powerful CLIP models! Boosts performance over the previous SOTA by ~17%
microsoft/llm2clip-672323a266173cfa40b32d4c

Athene v2 Chat & Agent by NexusFlow - SoTA general LLM fine-tuned from Qwen 2.5 72B excels at Chat + Function Calling/ JSON/ Agents
Nexusflow/athene-v2-6735b85e505981a794fb02cc

Orca Agent Instruct by Microsoft - 1 million instruct pairs covering text editing, creative writing, coding, reading comprehension, etc - permissively licensed
microsoft/orca-agentinstruct-1M-v1

Ultravox by FixieAI - 70B/ 8B model approaching GPT4o level, pick any LLM, train an adapter with Whisper as Audio Encoder
reach-vb/ultravox-audio-language-model-release-67373b602af0a52b2a88ae71

JanusFlow 1.3 by DeepSeek - Next iteration of their Unified MultiModal LLM Janus with RectifiedFlow
deepseek-ai/JanusFlow-1.3B

Common Corpus by PleIAs - 2,003,039,184,047 multilingual, commercially permissive and high quality tokens!
PleIAs/common_corpus

I'm sure I missed a lot, can't wait for the next week!

Put down in comments what I missed! 🤗

reach-vb

posted an update 2 months ago

Post

1617

Smol TTS models are here! OuteTTS-0.1-350M - Zero shot voice cloning, built on LLaMa architecture, CC-BY license! 🔥

> Pure language modeling approach to TTS
> Zero-shot voice cloning
> LLaMa architecture w/ Audio tokens (WavTokenizer)
> BONUS: Works on-device w/ llama.cpp ⚡

Three-step approach to TTS:

> Audio tokenization using WavTokenizer (75 tok per second)
> CTC forced alignment for word-to-audio token mapping
> Structured prompt creation w/ transcription, duration, audio tokens
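
The three steps above can be sketched roughly as follows. This is a minimal illustration, not OuteTTS's actual code: the `WordSpan` type, the helper names, and the prompt markers (`<text>`, `<t_…>`, `<|id|>`) are hypothetical, the word timings stand in for what a CTC forced aligner would report, and the token list stands in for WavTokenizer output. Only the 75 tokens per second rate comes from the post.

```python
from dataclasses import dataclass

TOKENS_PER_SECOND = 75  # audio token rate quoted in the post


@dataclass
class WordSpan:
    """One aligned word (hypothetical type; times in seconds)."""
    word: str
    start: float
    end: float


def tokens_for_span(codes, span):
    """Slice the flat audio-token list down to the tokens covering one word."""
    lo = round(span.start * TOKENS_PER_SECOND)
    hi = round(span.end * TOKENS_PER_SECOND)
    return codes[lo:hi]


def build_prompt(codes, spans):
    """Build an illustrative prompt: transcription first, then one line per
    word carrying its duration and its audio tokens (format is assumed)."""
    lines = ["<text>" + " ".join(s.word for s in spans) + "</text>"]
    for s in spans:
        tok_str = "".join(f"<|{t}|>" for t in tokens_for_span(codes, s))
        lines.append(f"{s.word}<t_{s.end - s.start:.2f}>{tok_str}")
    return "\n".join(lines)


# Stand-in data: 2 seconds of audio tokens and two aligned words.
codes = list(range(150))
spans = [WordSpan("hello", 0.0, 0.4), WordSpan("world", 0.4, 1.0)]
prompt = build_prompt(codes, spans)
```

At 75 tokens per second, a 0.4 second word maps to 30 audio tokens, so the prompt length grows linearly with the audio duration.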

The model is extremely impressive for 350M parameters! Kudos to the OuteAI team on such a brilliant feat - I'd love to see this applied to larger data and smarter backbones like SmolLM 🤗

Check out the models here: OuteAI/outetts-6728aa71a53a076e4ba4817c

Team members 2
