Llama.cpp MTP on Negi AI Lab

https://ai.negi-lab.com/tags/llama.cpp-mtp/
Recent content in Llama.cpp MTP on Negi AI Lab
Sun, 10 May 2026 15:18:33 +0900

Speed Up Qwen 35B A3B on 12GB VRAM: How to Use llama.cpp MTP
https://ai.negi-lab.com/posts/llamacpp-mtp-qwen-35b-high-speed-tutorial/
Sun, 10 May 2026 00:00:00 +0900
<p><strong>Time required:</strong> about 40 minutes | <strong>Difficulty:</strong> ★★★★☆</p>
<h2 id="この記事で作るもの">What you will build in this article</h2>
<ul>
<li>An environment that runs Qwen3.6 35B A3B (an MoE model) at more than 80 tokens per second on a mid-range GPU with 12GB of VRAM</li>
<li>A llama.cpp MTP configuration that keeps a long 128K context without sacrificing inference speed</li>
<li>An inference script that calls this high-speed inference environment from Python so it can be used in real work</li>
</ul>

How to Speed Up Qwen 3.6 27B with llama.cpp and Reach 50 t/s
https://ai.negi-lab.com/posts/qwen-3-6-27b-mtp-llamacpp-speedup-guide/
Thu, 07 May 2026 00:00:00 +0900
<p><strong>Time required:</strong> about 40 minutes | <strong>Difficulty:</strong> ★★★★☆</p>
<h2 id="この記事で作るもの">What you will build in this article</h2>
<ul>
<li>An environment that runs Qwen 3.6 27B (an MTP-capable model) at high speed by applying a specific patch to llama.cpp</li>
<li>A setup that uses a large 100k context while still achieving an inference speed of more than 50 tokens per second</li>
<li>A foundation for operating the model with a lightweight C++ binary, without depending on Python or complex libraries</li>
</ul>