SecureCode: security-aware code models (3B–20B), trained for review + remediation
I’ve been frustrated by how often code assistants recommend patterns that pass tests but fail security review (e.g., string-built SQL, brittle auth logic, unsafe parsing, insecure defaults). So I built **SecureCode**: a collection of **8 code models (3B → 20B)** trained to behave more like a security reviewer.
What you should expect from SecureCode:
- identify likely vuln patterns and explain *why* they’re risky
- outline plausible abuse paths (defensive framing)
- propose a secure rewrite (drop-in where possible)
- include defense-in-depth guidance + regression tests/checks
> You are a senior application security engineer. Review the code below.
> Output: (1) findings with severity, (2) likely exploit scenarios (high level), (3) secure rewrite,
> (4) defense-in-depth recommendations, (5) regression tests/checks.
> Code: `...`
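To make the workflow concrete, here is an illustrative finding of the kind the prompt above asks for: the string-built SQL pattern from the intro, its secure rewrite, and a regression check. This is my own minimal sketch, not model output; the `users` table and column names are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # VULNERABLE: string-built SQL. A crafted username like
    # "' OR '1'='1" changes the query's meaning (SQL injection).
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Secure rewrite: parameterized query. The driver treats the
    # input strictly as data, never as SQL syntax.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()

# Regression check: a classic injection payload must return nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "' OR '1'='1"
assert find_user_unsafe(conn, payload) == [(1, "alice")]  # injection succeeds
assert find_user_safe(conn, payload) == []                # injection blocked
```

The regression check is the part I most want the models to emit: a test that fails on the vulnerable version and passes on the fix.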
**I’m looking for real-world feedback**
- Your “this slipped through review once” snippets (sanitized is fine)
- False positives / false negatives you observe
- Contributions of new CVE-grounded examples
If you drop a snippet, please include language/framework + what the *correct* remediation looks like in your environment. If you have any contributions or suggestions for the dataset, I'd be happy to hear them. I have some new features and enhancements planned for v3 that are already underway, but for now, I'm focused on testing as many use cases as possible. Appreciate you all!
We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.
Dataset 1: Processed / curated subset (otoearth/otoSpeech-full-duplex-processed-141h)
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models
Dataset 2: Larger raw(er) release (otoearth/otoSpeech-full-duplex-280h)
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments
We intentionally split the release to support different research workflows: a clean, ready-to-use subset versus a larger raw release for more exploratory work.
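For the exploratory workflow, a typical first step is filtering the raw release down to a custom subset. A minimal sketch of that idea is below; the records are synthetic and the field names (`duration_s`, `num_speakers`, `snr_db`) are assumptions for illustration, not the dataset's actual schema.

```python
def keep(example, min_duration=5.0, min_snr_db=15.0):
    """Keep clips that are long enough and clean enough."""
    return (
        example["duration_s"] >= min_duration
        and example["num_speakers"] == 2      # strict full-duplex pairs
        and example["snr_db"] >= min_snr_db   # drop noisy recordings
    )

# Synthetic stand-ins for raw-release records (hypothetical fields).
raw = [
    {"id": "a", "duration_s": 42.0, "num_speakers": 2, "snr_db": 22.5},
    {"id": "b", "duration_s": 3.1,  "num_speakers": 2, "snr_db": 30.0},  # too short
    {"id": "c", "duration_s": 60.0, "num_speakers": 3, "snr_db": 18.0},  # not a pair
]

curated = [ex for ex in raw if keep(ex)]
print([ex["id"] for ex in curated])  # ['a']
```

In practice the same predicate would be applied to the real records once access is granted; the point is only that the raw release leaves these thresholds up to you.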
The datasets are currently private, but we’re happy to approve access requests — feel free to request access if you’re interested.
If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas together.