s-emanuilov posted an update 5 days ago
Hey HF community! 👋

Excited to share Monkt - a tool I built to solve the eternal headache of processing documents for ML/AI pipelines.

What it does: Converts PDF, Word, PowerPoint, and Excel files, web pages, or raw HTML into clean Markdown or structured JSON.
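The post does not describe how Monkt works internally or what its API looks like. To show what the HTML-to-Markdown step involves for tables (the pain point raised in the replies), here is a minimal sketch that uses only Python's standard-library `html.parser`. It assumes simple tables without `rowspan`/`colspan` and treats the first row as the header. The class and function names (`TableToMarkdown`, `html_table_to_markdown`) are hypothetical and are not part of Monkt.

```python
from html.parser import HTMLParser


class TableToMarkdown(HTMLParser):
    """Hypothetical helper: collects cells of simple HTML tables (no rowspan/colspan)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def _end_cell(self):
        # Close the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        # Close the open row (if any), flushing its last cell first.
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_markdown(html: str) -> str:
    """Hypothetical function: convert the rows of a simple HTML table to a Markdown table."""
    parser = TableToMarkdown()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush a row left open at end of input
    if not parser.rows:
        return ""
    width = max(len(r) for r in parser.rows)
    # Pad short rows so every line has the same number of columns.
    rows = [r + [""] * (width - len(r)) for r in parser.rows]
    # Escape pipes so cell content cannot break the table layout.
    rows = [[c.replace("|", "\\|") for c in r] for r in rows]
    header, *body = rows
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


if __name__ == "__main__":
    html = "<table><tr><th>Model</th><th>Params</th></tr><tr><td>A</td><td>7B</td></tr></table>"
    print(html_table_to_markdown(html))
```

Merged cells, nested tables, and tables recovered from PDFs or scanned images need layout analysis that goes well beyond this, which is likely why those inputs are harder than office formats.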

Great for:
✔ LLM training dataset preparation;
✔ Knowledge base construction;
✔ Research paper processing;
✔ Technical documentation management.

It has API access for integration into ML pipelines.

Check it out at https://monkt.com/ if you want to save time on document processing infrastructure.

Looking forward to your feedback!

It's not handling tables etc. properly.


Yeah, there are issues with the tables.

For office formats, it's mostly fine. Have you tried it with PDFs or images?

I will work on improving this.