For the LLM model, Alibaba’s Qwen3 Coder 30B A3B with different quantizations is used to show the difference in tokens generated per second and…
Author: neoX
llama-bench the Llama 3.1 8B and AMD Radeon Instinct MI50 32GB
This article shows GPU-only inference with a relatively old GPU from 2018 – the AMD Radeon Instinct MI50 32GB. For the LLM model, Meta…
llama-bench the Llama 4 Scout 17B 16E and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, Meta’s Llama 4 Scout 17B…
llama-bench the Phi-4 14B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, Microsoft’s Phi-4 14B with different…
llama-bench the Qwen3 Coder 30B A3B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, Alibaba’s Qwen3 Coder 30B A3B…
llama-bench the Qwen3 32B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, Alibaba’s Qwen3 32B with different…
llama-bench the Gemma 3 27B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, Google’s Gemma 3 27B Instruct…
llama-bench the Mistral Large 123B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, the Mistral Large Instruct 123B 2411…
llama-bench the Qwen 2.5 Coder 32B and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, the Qwen 2.5 Coder 32B with…
llama-bench the Qwen2 32B (QwQ-32B) and AMD EPYC 9554 CPU
This article shows CPU-only inference with a modern server processor – the AMD EPYC 9554. For the LLM model, the Qwen2 32B (QwQ-32B) with different…
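All of the articles above use the llama-bench tool that ships with llama.cpp. As a rough sketch (the model file name and thread count here are placeholders, not the exact values used in the articles), a CPU-only run of one quantization looks like:

```shell
# Benchmark prompt processing and token generation on the CPU only.
# -m   path to the GGUF model file (placeholder name here)
# -t   number of CPU threads to use
# -ngl 0 keeps all layers on the CPU (no GPU offload)
# -p   prompt length in tokens, -n number of tokens to generate
llama-bench \
  -m Qwen3-Coder-30B-A3B-Q4_K_M.gguf \
  -t 64 -ngl 0 \
  -p 512 -n 128
```

Repeating the run with differently quantized GGUF files (Q4_K_M, Q8_0, and so on) produces the tokens-per-second comparisons these articles report.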