In a Training Loop 🔄

John Smith PRO

John6666

John6666cat

AI & ML interests

None yet

Recent Activity

liked a model about 3 hours ago

LLMOX/FiscalOxLLM-base

liked a model about 3 hours ago

mradermacher/FiscalOxLLM-base-GGUF

reacted to aufklarer's post with 👍 about 7 hours ago

Speaker Diarization and VAD on Apple Silicon: MLX-Native Models

Three MLX-optimized models for on-device speaker diarization and voice activity detection, running natively on Apple Silicon via https://github.com/ivan-digital/qwen3-asr-swift:

- https://huggingface.co/aufklarer/Silero-VAD-v5-MLX: Streaming VAD, 309K params, ~1.2 MB. Processes 32 ms chunks at 23× real-time on M2 Max.
- https://huggingface.co/aufklarer/Pyannote-Segmentation-MLX: Multi-speaker segmentation, ~1.49M params, ~5.7 MB. 7-class powerset output for up to 3 simultaneous speakers.
- https://huggingface.co/aufklarer/WeSpeaker-ResNet34-LM-MLX: Speaker embedding, ~6.6M params, ~25 MB. 256-dim L2-normalized vectors with BatchNorm fused into Conv2d.

Together they form a diarization pipeline: pyannote segments → WeSpeaker embeds → agglomerative clustering links speakers across the recording. ~32 MB total.

```bash
git clone https://github.com/ivan-digital/qwen3-asr-swift
cd qwen3-asr-swift && swift build -c release

.build/release/audio diarize meeting.wav --max-speakers 4 --json
.build/release/audio vad-stream recording.wav
```

The library also includes ASR, TTS, multilingual synthesis, forced alignment, and speech-to-speech (PersonaPlex 7B). Apache 2.0.

Full architecture details: https://blog.ivan.digital/speaker-diarization-and-voice-activity-detection-on-apple-silicon-native-swift-with-mlx

Library: https://github.com/ivan-digital/qwen3-asr-swift
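The last stage of the diarization pipeline described in this post, agglomerative clustering over speaker embeddings, can be sketched roughly as below. This is an illustration only, not the library's implementation: the function names, the average-linkage rule, the 0.5 distance threshold, and the toy 3-dimensional vectors are all assumptions (the post only states that the embeddings are 256-dim and L2-normalized).

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_distance(a, b):
    # For unit vectors, cosine distance is 1 minus the dot product.
    return 1.0 - sum(x * y for x, y in zip(a, b))

def agglomerative_cluster(embeddings, threshold=0.5, max_speakers=None):
    """Average-linkage clustering; returns one speaker label per embedding.

    threshold and max_speakers are hypothetical knobs for this sketch.
    """
    vecs = [l2_normalize(e) for e in embeddings]
    clusters = [[i] for i in range(len(vecs))]

    def linkage(c1, c2):
        dists = [cosine_distance(vecs[i], vecs[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, a, b = min(
            ((linkage(clusters[a], clusters[b]), a, b)
             for a in range(len(clusters))
             for b in range(a + 1, len(clusters))),
            key=lambda t: t[0],
        )
        over_limit = max_speakers is not None and len(clusters) > max_speakers
        # Stop once the closest pair is too far apart and the cap is respected.
        if dist > threshold and not over_limit:
            break
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(vecs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy embeddings for four segments: two from one voice, two from another.
segments = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
labels = agglomerative_cluster(segments, threshold=0.5, max_speakers=4)
print(labels)
```

The `max_speakers` argument mirrors the idea behind the `--max-speakers` flag shown in the post: merging continues past the threshold only when more clusters remain than the cap allows.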


Organizations


reacted to ajibawa-2023's post with 🔥 about 7 hours ago

Post

266

Python-Code-Large
Dataset: ajibawa-2023/Python-Code-Large

Python-Code-Large is a large-scale corpus of Python source code comprising more than 2 million rows of Python code. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, software engineering automation, and program analysis for the Python ecosystem.

By providing a high-volume, language-specific corpus, Python-Code-Large enables systematic experimentation in Python-focused model training, domain adaptation, and downstream code understanding tasks.

Python-Code-Large addresses the need for a dedicated Python-only dataset at substantial scale, enabling focused research across data science, backend systems, automation, scientific computing, and AI-driven Python environments.

reacted to BibbyResearch's post with 👀 about 7 hours ago

Post

677

Announcement:

BibbyResearch/China-Egocentric-Dataset-Robotics

Bibby AI, the AI LaTeX editor for research writing, has launched the Chinese Egocentric Dataset for robotics research, linked above!

1 reply

reacted to imnotkitty's post with 🚀 about 7 hours ago

Post

340

In the Text-to-Video arena, Seedance 2.0 has secured a spot in the LMArena Top 10 for the first time, while Kling 3.0 has topped the Artificial Analysis leaderboard, with the Kling family claiming 7 spots in the top 15.

Which one do you prefer?

reacted to projectlosangeles's post with ❤️🔥 about 7 hours ago

Post

1091

🔥 Check out the brand-new and greatly improved Orpheus model and space! 🔥

asigalov61/Orpheus-Music-Transformer

Please ❤️ the space and the model repo if you enjoyed it! It really helps!

Thank you 🙏

Alex

Project Los Angeles
Tegridy Code 2026

victor-mir
@not-lain
@victor
@John6666
@Csplk
@alexkuz
@mimbres
@Timzoid

1 reply

reacted to SeaWolf-AI's post with 👍 about 7 hours ago

Post

1512

AI Is Training on Your Content Without Permission — Fight Back with Invisible Watermarks

FINAL-Bench/security-scan

Most generative AI training data is crawled without consent. Your text gets summarized, images reprocessed, videos clipped — with no way to prove you're the original creator. Existing watermarks are either visible or wiped out by a single AI preprocessing pass.

Detect Before, Track After

Pre-embed — Detect theft without any watermark. Text plagiarism check, image similarity analysis (perceptual hash, SSIM, color histogram, feature matching), and video temporal matching catch copies, edits, and excerpts.

Post-embed — Embed invisible multi-layer watermarks. If one layer is destroyed, others survive independently. Even full removal leaves forensic traces as evidence.

Text: 4 Independent Layers

Four mechanisms work simultaneously: zero-width Unicode characters at morpheme/word boundaries (Korean Kiwi + English NLP), style fingerprinting via synonym-ending-connective substitution, SHA-256 timestamped evidence packages, and punctuation-anchored micro-marks. Each layer uses a different Unicode category, so attacks on one cannot eliminate the others. Full bilingual support, zero readability impact.
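The first of these layers, zero-width characters placed at word boundaries, can be sketched in a few lines. This is a simplified illustration and not the tool's actual encoding: the bit mapping (U+200B for 0, U+200C for 1), the choice of plain spaces as boundaries, and the function names are all assumptions.

```python
ZERO = "\u200b"  # zero-width space, used here to encode bit 0 (assumed mapping)
ONE = "\u200c"   # zero-width non-joiner, used here to encode bit 1 (assumed mapping)

def embed(text, payload):
    """Hide payload bits as invisible characters, one bit after each space."""
    bits = "".join(f"{byte:08b}" for byte in payload.encode("utf-8"))
    out, i = [], 0
    for ch in text:
        out.append(ch)
        if ch == " " and i < len(bits):
            out.append(ONE if bits[i] == "1" else ZERO)
            i += 1
    if i < len(bits):
        raise ValueError("text has too few word boundaries for this payload")
    return "".join(out)

def extract(text):
    """Collect the invisible characters and decode them back into a string."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8
    data = bytes(int(bits[k:k + 8], 2) for k in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")

def strip_marks(text):
    """Simulate an 'invisible character removal' attack."""
    return text.replace(ZERO, "").replace(ONE, "")

cover = " ".join(["word"] * 20)  # 19 word boundaries
marked = embed(cover, "hf")      # 2 bytes = 16 bits
print(extract(marked))
print(strip_marks(marked) == cover)
```

A single pass of `strip_marks` wipes this layer completely, which shows why the post pairs it with other, independent layers.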

34-Attack Defense

7 categories, 34 attacks simulated: Unicode normalization, invisible character removal, homoglyph substitution (9,619 confusables), and AI rewriting. Each scored on Signal (watermark survival) + Trace (forensic evidence of attack) — proving deliberate removal even when watermarks are destroyed.

Image & Video

Images: DCT frequency-domain watermarks surviving JPEG compression and resize. Videos: keyframe watermarking with temporal propagation and majority-vote extraction. Both support pre-embed similarity detection.

Who Is This For

Creators, rights holders needing legal evidence, media companies, and organizations tracking document leaks. Korean/English bilingual, open source, Gradio-based.

1 reply

reacted to YatharthS's post with 🔥 about 7 hours ago

Post

1372

Just open-sourced LavaSR v2: a model that can enhance 5000 seconds of audio in 1 second while delivering higher quality than giant, slow 6 GB diffusion models!

It works with any sampling rate from 8 to 48 kHz and is nearly 5000x faster than the competition while being superior in objective benchmarks.

LavaSR v2 is perfect for:
- Enhancing TTS models.
- Fixing old audio datasets.
- Restoring low quality recordings.

You can check out the examples and run it locally or online:

Repo: https://github.com/ysharma3501/LavaSR.git
Demo: YatharthS/LavaSR
Model: YatharthS/LavaSR

reacted to scthornton's post with 👀 about 7 hours ago

Post

1221

# SecureCode Dataset Family Update: 2,185 Security Examples, Framework-Specific Patterns, Clean Parquet Loading

Hey y'all,

Quick update on the SecureCode dataset family. We've restructured things and fixed several issues:

**What changed:**

- The datasets are now properly split into three repos: unified (scthornton/securecode, 2,185), web (scthornton/securecode-web, 1,378), and AI/ML (scthornton/securecode-aiml, 750)
- All repos now use Parquet format -- load_dataset() just works, no deprecated loading scripts
- SecureCode Web now includes 219 framework-specific examples (Express, Django, Spring Boot, Flask, Rails, Laravel, ASP.NET Core, FastAPI, NestJS)
- Data cards have been corrected and split sizes fixed

**Why it matters:**

With AI-generated code accounting for 60%+ of some codebases (Checkmarx 2025), security training data is more important than ever. Every example in SecureCode is grounded in a real CVE with 4-turn conversations that mirror actual developer-AI workflows.

If you're working on code generation models, I'd love to hear how you're approaching the security angle. Are there vulnerability categories or frameworks you'd like to see covered?

Paper: [arxiv.org/abs/2512.18542](https://arxiv.org/abs/2512.18542)

reacted to nyuuzyou's post with 👍 1 day ago

Post

1364

🌍 Street-Level Imagery Dataset nyuuzyou/streetview

934,191 image records index Eastern Europe and Northern Asia. Temporal links map historical views at identical coordinates across nine years.

Key Stats:

- 905,940 unique images
- Coverage spanning 2016 to 2025
- Average 14.3 historical links per location

Geographic bounds span 20.49° E to 152.32° E. Urban centers show higher data density.

reacted to sergiopaniego's post with 🚀 1 day ago

Post

1637

What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py

reacted to AbstractPhil's post with 👀 1 day ago

Post

1233

GLIP - Geometric Linear Interpolative Patchwork aka geolip.
https://github.com/AbstractEyes/glip-autoencoder
This is the repo that will contain the next experimental stage, which is based entirely on the research and structural boundaries applied by said research. It'll be a little rigid while I get Claude set up.

In order to directly train these layered topological response patchworks you must install and use the geovocab2, geofractal, and wide_compiler repos.

This is due to the wide_compiler's wide_linear high-speed efficiency for ensemble processing, the geovocab2 factory structure with multiple formulas including highly efficient designs meant for kernel compilation, and a series of reusable utilities in geofractal including some of the more complex losses and difficult to optimally tune gate structures surrounding them.

Many of the underlying formulas are outlined here:
AbstractPhil/geometric-experiment-history

Utilization and training USING the pretrained or untrained geolip patchwork will be as simple as loading the model in pytorch and will not require external dependencies of the geolip package, numpy, or pytorch depending on the task. It will come packaged with recommended losses but I encourage experimentation because I simply cannot cover all spectrums.

Experiments show you can train the patchwork directly with task losses and it retains some useful cohesion, but it will lose all identity without the correct losses making it difficult to task-orient the geometric behavior down the chain.

More details to come as development progresses. The system is coming together and the state of the utilizable autoencoder will be ready within a couple weeks. The entire system is built for convenience and reusability, so the structure will be built similarly to autoencoder systems that currently exist, with a few tweaks here and there for important elements - so the interface will be familiar to those who use it.

1 reply

reacted to OzTianlu's post with 🔥 1 day ago

Post

1344

Scaling UP in Kai! 🌊
NoesisLab/Kai-3B-Instruct

Introducing NoesisLab/Kai-3B-Instruct. What happens when you force a 3B model to reason entirely in its latent space?
Meet Kai-3B, our latest industrial-grade reasoning model fine-tuned using the Adaptive Dual Search (ADS) algorithm.
GSM8K (0-shot, Direct Answer): 39.27% 🤯 (Llama-2-7B is ~14.6%)
HumanEval (Pass@1): 39.02% 💻 (Overtakes Gemma-2-2B's 30%)
MMLU (5-shot): 53.62% 📚 (Crushing the 50% barrier)
ARC-Challenge: 51.88%🎯
PIQA: 77.53%
HellaSwag: 69.53%
Kai-3B proves that reasoning density doesn't strictly require parameter bloat or verbose generation. It acts as a perfect, cold-blooded Agent action-engine—ideal for JSON routing, SWE-bench patch generation, and anywhere you need absolute structured certainty without token waste.

2 replies

reacted to albertvillanova's post with 🤗 1 day ago

Post

1298

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0

reacted to GVA21q2's post with 👍 2 days ago

Post

1269

# π.Guy.AI — AI-Powered Neuropedagogy Math Lessons

Students with math anxiety, ADHD, dyslexia, or low working memory need different learning experiences — but teachers can't create individualized materials for every student.

**π.Guy.AI** generates interactive HTML math lessons adapted to 7 cognitive profiles, using a multi-agent AI pipeline:

1. **Neuro-Interpreter** — enriches prompts with profile-specific adaptations
2. **Creative Agent** — generates a 12-slide lesson with SVG visualizations
3. **Quality Control** — validates against 8 neuropedagogy principles

Each lesson is a standalone HTML file with inline CSS/JS/SVG — works offline, no dependencies.

## The Model

Fine-tuned **Qwen2.5-7B-Instruct** with LoRA on 313 curated Hebrew math lessons.

- Model: [GVA21q2/piguyai-lessons-v2-enhanced](https://huggingface.co/GVA21q2/piguyai-lessons-v2-enhanced)
- Dataset: [GVA21q2/pi-guy-ai-lessons](https://huggingface.co/datasets/GVA21q2/pi-guy-ai-lessons)
- Demo: GVA21q2/pi-guy-ai-demo
- Web app: [gva21q2.github.io/pi.guy.ai](https://gva21q2.github.io/pi.guy.ai/)

7 profiles: math anxiety, ADHD, dyslexia, dysgraphia, low working memory, visual processing, weak inhibition.

Built by [Guy Assal](https://www.guyassal.education)

reacted to branikita's post with 🚀 2 days ago

Post

1301

Our engineer Alan Subin from Robonine has started preparations for testing the manipulator on the mobile two-wheeled platform.

2 replies

reacted to abusyed's post with 👀 2 days ago

Post

112

I use multiple AI coding agents daily: Claude Code, Cursor, and Codex (one is good at design, one at problem solving, and one at keeping an overall plan). I kept running into two problems that were driving me insane:

Context loss on every switch. Every time I moved from Cursor to Claude Code (or vice versa), I'd have to reexplain the entire project philosophy, past decisions, why I chose X architecture over Y. Half my prompts became "here's what the last agent did and why."

Agent drift — technically correct but philosophically wrong code. This is the sneaky one. I build AI tutors that force students to reason through problems instead of getting answers handed to them. One agent literally added a "Skip Reasoning" button to the UI. Technically valid code. Completely violates the entire product philosophy. And the agent had no way of knowing that because it couldn't see the design intent.

So I built LedgerSync - a file-based shared context protocol that solves both problems.

How it works:

- An append-only ledger (.ledgersync/ledger.jsonl) logs every agent decision with full reasoning traces - not just what happened, but WHY
- Agents read grounding documents (product philosophy, design constraints, user research) before making decisions
- When you switch tools, the new agent reads the ledger and picks up where the last one left off - with full context
- Auto-generates agent-specific instruction files (CLAUDE.md, .cursorrules, etc.)

No server, no accounts, no setup. Just files that live in your repo. Your agents already know how to read files - LedgerSync just gives them the right ones.
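To make the append-only ledger idea concrete, here is a minimal sketch of writing and reading a JSONL ledger. Only the path `.ledgersync/ledger.jsonl` comes from the post; the field names (`timestamp`, `agent`, `decision`, `reasoning`) and the helper functions are hypothetical and may not match LedgerSync's actual schema.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def append_entry(ledger_path, agent, decision, reasoning):
    """Append one decision record as a single JSON line (never rewrites old lines)."""
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,          # hypothetical field name
        "decision": decision,    # hypothetical field name
        "reasoning": reasoning,  # hypothetical field name
    }
    with ledger_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry

def read_ledger(ledger_path):
    """Return all entries in the order they were written."""
    if not ledger_path.exists():
        return []
    with ledger_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo in a throwaway directory standing in for a repository root.
repo = Path(tempfile.mkdtemp())
ledger = repo / ".ledgersync" / "ledger.jsonl"

append_entry(ledger, "cursor", "Kept the reasoning step mandatory",
             "Product philosophy: students must reason through problems")
append_entry(ledger, "claude-code", "Did not add a 'Skip Reasoning' button",
             "It would violate the design intent recorded earlier")

history = read_ledger(ledger)
for e in history:
    print(f"{e['agent']}: {e['decision']} (why: {e['reasoning']})")
```

Opening the file in append mode keeps earlier entries untouched, so an agent that takes over can replay the full history in order before making its own decision.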

The key insight: the problem isn't that agents are bad at coding. It's that they have no memory and no product awareness. LedgerSync gives them both.

MIT licensed, early stage: https://github.com/Metacog-AI/ledgersync

Has anyone else dealt with the agent drift problem?

2 replies

reacted to tpwang199655's post with 👀 2 days ago

Post

118

Hi, I subscribed to Hugging Face PRO and pay the monthly fee, but I still can't post on the Blog. Am I missing a required step? Any help would be highly appreciated! Thanks in advance.

2 replies

replied to tpwang199655's post 2 days ago

Maybe you need to join https://huggingface.co/blog-explorers

reacted to JonnaMat's post with 🔥 2 days ago

Post

1556

🤯 Edge-Grade Vision Reasoning. Now Practically Lossless. 🤯

Introducing
👉 embedl/Cosmos-Reason2-2B-W4A16-Edge2
Optimized for NVIDIA Jetson Orin Nano Super and AGX Orin.

🚄 Try it out on Jetson (image+video+text):

docker run --rm -it \
  --network host \
  --shm-size=8g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm-serve \
  -e HF_TOKEN=hf_*** \
  -e HF_HOME=/root/.cache/huggingface \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve "embedl/Cosmos-Reason2-2B-W4A16-Edge2" \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.75 \
    --max-num-seqs 2

🤓 What is Edge2? Most weights → INT4 | Activations → FP16 | Select sensitive layers → kept in FP16.
Edge2 preserves precision where it matters most while keeping the model small and fast enough for edge GPUs. 😎
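As a rough illustration of what "weights → INT4" means, here is a toy sketch of symmetric group-wise 4-bit weight quantization. The group size, the symmetric scheme, and the function names are assumptions made for this example; the post does not describe how Edge2 actually quantizes or which layers it keeps in FP16.

```python
def quantize_int4(weights, group_size=4):
    """Map floats to 4-bit integer codes in [-8, 7], with one scale per group."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Symmetric scheme (assumed): the largest magnitude maps to code 7.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales

def dequantize(codes, scales, group_size=4):
    """Recover approximate float weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

weights = [0.7, -0.35, 0.1, 0.0, 2.1, -1.4, 0.3, 0.05]
codes, scales = quantize_int4(weights)
restored = dequantize(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, max_error)
```

The rounding error per weight is bounded by half the group's scale, which is why a layer with a few large outlier weights loses more precision; keeping such sensitive layers in FP16, as the post describes, avoids that loss at the cost of some extra memory.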
