In a Training Loop 🔄

John Smith PRO

John6666

AI & ML interests

None yet

Recent Activity

liked a model about 3 hours ago
LLMOX/FiscalOxLLM-base
liked a model about 3 hours ago
mradermacher/FiscalOxLLM-base-GGUF
reacted to aufklarer's post with 👍 about 7 hours ago
Speaker Diarization and VAD on Apple Silicon: MLX-Native Models

Three MLX-optimized models for on-device speaker diarization and voice activity detection, running natively on Apple Silicon via https://github.com/ivan-digital/qwen3-asr-swift:

- https://huggingface.co/aufklarer/Silero-VAD-v5-MLX - Streaming VAD, 309K params, ~1.2 MB. Processes 32 ms chunks at 23× real-time on M2 Max.
- https://huggingface.co/aufklarer/Pyannote-Segmentation-MLX - Multi-speaker segmentation, ~1.49M params, ~5.7 MB. 7-class powerset output for up to 3 simultaneous speakers.
- https://huggingface.co/aufklarer/WeSpeaker-ResNet34-LM-MLX - Speaker embedding, ~6.6M params, ~25 MB. 256-dim L2-normalized vectors with BatchNorm fused into Conv2d.

Together they form a diarization pipeline: pyannote segments → WeSpeaker embeds → agglomerative clustering links speakers across the recording. ~32 MB total.

```bash
git clone https://github.com/ivan-digital/qwen3-asr-swift
cd qwen3-asr-swift && swift build -c release
.build/release/audio diarize meeting.wav --max-speakers 4 --json
.build/release/audio vad-stream recording.wav
```

The library also includes ASR, TTS, multilingual synthesis, forced alignment, and speech-to-speech (PersonaPlex 7B). Apache 2.0.

Full architecture details: https://blog.ivan.digital/speaker-diarization-and-voice-activity-detection-on-apple-silicon-native-swift-with-mlx
Library: https://github.com/ivan-digital/qwen3-asr-swift
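The last stage of the pipeline described in the post, agglomerative clustering over speaker embeddings, can be illustrated with a small sketch. This is not the library's implementation: the post does not say which linkage or distance it uses, so average linkage over cosine distance is an assumption here, and the function name `agglomerative_cluster`, the `threshold` value of 0.5, and the `max_speakers` parameter (loosely modeled on the `--max-speakers` flag) are hypothetical choices for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """Cosine distance for vectors that are already L2-normalized."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def agglomerative_cluster(embeddings, threshold=0.5, max_speakers=None):
    """Return one cluster label per embedding.

    Assumptions (not taken from the library): average linkage,
    cosine distance, and an illustrative stopping threshold.
    """
    embs = [l2_normalize(e) for e in embeddings]
    clusters = [[i] for i in range(len(embs))]  # start with one cluster per segment

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(embs[i], embs[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        over_cap = max_speakers is not None and len(clusters) > max_speakers
        # Stop once the closest pair is too far apart, unless the speaker cap
        # still forces further merging.
        if d > threshold and not over_cap:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embs)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 2-dim "embeddings": two segments per speaker.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative_cluster(segments))  # [0, 0, 1, 1]
```

In a real pipeline the inputs would be the 256-dim embeddings, one per segment; the toy 2-dim vectors above only show how nearby embeddings end up sharing a speaker label.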

Organizations

Glide, open/ acc, Solving Real World Problems, FashionStash Group meeting, No More Copyright, SAGEA, XORTRON - Criminal Computing