LLMs, or large language models, are really popular these days, and many people and organizations have begun to rely on them. This article continues in the spirit of the CPU-only benchmarks in the realm of LLM inference. Check out the other article on the subject with a much better and more expensive processor – LLM inference benchmarks with llamacpp and AMD EPYC 9554 CPU. The setup presented in this article is several times cheaper.
“Run it yourself” in your home or within a business organization is always way more secure and privacy-safe than cloud-based AI chat bots/assistants. There are many open-source LLMs that hold their own in accuracy (intelligence?) and performance against the big proprietary ones such as OpenAI's models, Google Gemini and more.
The purpose of this article is to show the performance of the 2nd-generation AMD EPYC 7282 (Rome) processor with 16 cores, in a dual-socket board utilizing 2 x 8 memory channels of DDR4 3200 MHz. The main testing software is llama.cpp with llama-bench.
Benchmark Results
Here are the benchmark results, which are summarized from the tests below.
N | model | parameters | quantization | tokens per second |
---|---|---|---|---|
1 | DeepSeek R1 Distill Llama | 8B | Q4_K_M | 22.17 |
2 | DeepSeek R1 Distill Llama | 70B | Q4_K_M | 2.84 |
3 | Qwen – QwQ-32B | 32B | Q4_K_M | 5.81 |
4 | Llama 3.1 8B Instruct | 8B | Q4_K_M | 22.24 |
5 | Llama 3.3 70B Instruct | 70B | Q4_K_M | 2.83 |
Below the llama-bench outputs from the benchmarks, there are pure memory bandwidth tests for this setup with the mlc tool (Intel® Memory Latency Checker v3.11b) and sysbench memory.
Hardware – what to expect from the AMD EPYC 7282
- 2 x AMD EPYC 7282 – AMD 16-core / 32-thread CPU – an 8-memory-channel processor with a theoretical memory bandwidth of 80 GB/s per socket (according to the official documents from AMD)
- 128G total RAM; all memory channels are populated on both processors, i.e. 2 x 8 channels.
- Supermicro – H11DSU-iN dual socket board with 32 memory slots.
- 16 slots populated with 16 x 8G DDR4 Samsung 3200MHz (M393A1K43DB2-CWE)
- AMD K17 (Zen2) architecture
- CPU dies: 1 per CPU
- A similar dual-socket motherboard setup (with 128G RAM) costs around $1200 ~ $1500 USD on eBay as of Q1 2025.
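A quick back-of-the-envelope check of the hardware figures above. Note that the raw DIMM peak of 16 DDR4-3200 modules is far above the 80 GB/s per socket AMD quotes for the 7282; the CPU itself, not the DIMMs, is the limit here. The numbers below are simple arithmetic, not measurements:

```shell
# DDR4-3200 moves 3200 MT/s x 8 bytes per channel.
per_channel_gbs=$(awk 'BEGIN { printf "%.1f", 3200 * 8 / 1000 }')
# 8 channels per socket, if the CPU could drive them all at full rate.
eight_channel_gbs=$(awk 'BEGIN { printf "%.1f", 3200 * 8 * 8 / 1000 }')
echo "one DDR4-3200 channel:       ${per_channel_gbs} GB/s"
echo "8 channels (raw DIMM peak):  ${eight_channel_gbs} GB/s"
```

So the 80 GB/s per-socket figure from the AMD documents is well below the 204.8 GB/s the populated DIMMs could theoretically deliver.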
Software
All tests are made under Linux – Gentoo Linux.
- Gentoo Linux, everything built with “-march=native”
- Linux kernel – 6.13.3 (gentoo-kernel package)
- GNU GCC – gcc version 14.2.1 20241221
- Glibc – 2.41
Testing
Three main groups of tests are presented here, using llama.cpp for LLM inference.
- Deepseek R1 – Distill Llama 70B and Distill Llama 8B – Q4
- Qwen – QwQ-32B – Q4
- meta-llama – Llama 3.3 70B Instruct and Llama 3.1 8B Instruct – Q4
Testing benchmark with Deepseek R1 Distill Llama-70B
1. Deepseek R1 Distill Llama 70B
The first test uses quantization 4 (Q4_K_M) – the quantization used by ollama by default – with the 70B DeepSeek R1 Distill Llama; the files were downloaded from https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-70B-GGUF.
srv ~/llama.cpp/build/bin $ ./llama-bench --numa distribute -m /root/models/bartowski/DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf -t 32 -p 0 -n 128,256,512,1024,2048
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg128 |          2.88 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg256 |          2.87 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg512 |          2.86 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |        tg1024 |          2.82 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |        tg2048 |          2.78 ± 0.00 |

build: 51f311e0 (4753)
The generation speed is 2.8~2.9 tokens per second, which is not fast for a CPU-only setup. It is really questionable whether it is usable.
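These low numbers line up with a simple memory-bandwidth argument. Token generation streams the whole quantized model from RAM once per token, so the measured bandwidth divided by the model size gives a rough upper bound; this is a sketch of that estimate, not part of the benchmark itself, using the ~130 GB/s aggregate read bandwidth measured with mlc further below:

```shell
# Upper-bound estimate: t/s ~= memory read bandwidth / model size.
model_gib=39.59   # DeepSeek R1 Distill Llama 70B Q4_K_M size, from llama-bench
bw_gbs=130        # aggregate read bandwidth of both sockets, from mlc
est_tps=$(awk -v bw="$bw_gbs" -v sz="$model_gib" \
    'BEGIN { printf "%.2f", bw / (sz * 1.073741824) }')   # GiB -> GB
echo "bandwidth-bound upper bound: ${est_tps} tokens/s"
```

The estimate of about 3 tokens per second is close to the measured 2.8~2.9, which suggests the 70B model is memory-bandwidth bound on this machine.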
2. Deepseek R1 Distill Llama 8B
When using Q4_K_M with the 8B DeepSeek R1 Distill Llama, generation is almost 8 times faster than with the 70B model. So roughly 8.8 times fewer model parameters yield roughly 7.9 times faster token generation. 22 tokens per second is fast enough for generating text for daily routines. The file was downloaded from here – huggingface.co.
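The scaling claim can be cross-checked directly from the llama-bench tg128 numbers (70.55B parameters at ~2.88 t/s vs 8.03B parameters at ~22.65 t/s):

```shell
# Parameter-count ratio vs token-generation speed ratio between the two distills.
param_ratio=$(awk 'BEGIN { printf "%.1f", 70.55 / 8.03 }')
speed_ratio=$(awk 'BEGIN { printf "%.1f", 22.65 / 2.88 }')
echo "${param_ratio}x fewer parameters -> ${speed_ratio}x faster generation"
```

The speed-up is slightly below the parameter ratio, which is expected: per-token overhead that does not shrink with model size eats a little of the gain.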
srv ~/llama.cpp/build/bin $ ./llama-bench --numa distribute -m /root/models/bartowski/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf -t 32 -p 0 -n 128,256,512,1024,2048
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg128 |         22.65 ± 0.03 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg256 |         22.58 ± 0.01 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg512 |         22.58 ± 0.02 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |        tg1024 |         22.06 ± 0.00 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |        tg2048 |         21.00 ± 0.03 |

build: 51f311e0 (4753)
3. The Qwen model QwQ-32B developed by Alibaba Cloud
The Qwen models' GGUFs can be downloaded from https://huggingface.co/Qwen.
srv ~/llama.cpp/build/bin $ ./llama-bench --numa distribute -m /root/models/Qwen/qwq-32b-q4_k_m.gguf -t 32 -p 0 -n 128,256,512,1024,2048
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| qwen2 32B Q4_K - Medium        |  18.48 GiB |    32.76 B | BLAS,RPC   |      32 |         tg128 |          5.85 ± 0.01 |
| qwen2 32B Q4_K - Medium        |  18.48 GiB |    32.76 B | BLAS,RPC   |      32 |         tg256 |          5.82 ± 0.00 |
| qwen2 32B Q4_K - Medium        |  18.48 GiB |    32.76 B | BLAS,RPC   |      32 |         tg512 |          5.85 ± 0.00 |
| qwen2 32B Q4_K - Medium        |  18.48 GiB |    32.76 B | BLAS,RPC   |      32 |        tg1024 |          5.83 ± 0.00 |
| qwen2 32B Q4_K - Medium        |  18.48 GiB |    32.76 B | BLAS,RPC   |      32 |        tg2048 |          5.70 ± 0.00 |

build: 51f311e0 (4753)
5-6 tokens per second for the new QwQ-32B model, so the model is probably usable on this processor for daily routines.
4. Meta Llama 3.3 70B Instruct
The Meta open-source Llama models are widely used; here is the benchmark with Llama 3.3 70B Instruct Q4_K_M.
srv ~/llama.cpp/build/bin $ ./llama-bench --numa distribute -m /root/models/MaziyarPanahi/Llama-3.3-70B-Instruct.Q4_K_M.gguf -t 32 -p 0 -n 128,256,512,1024,2048
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg128 |          2.87 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg256 |          2.86 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |         tg512 |          2.85 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |        tg1024 |          2.81 ± 0.00 |
| llama 70B Q4_K - Medium        |  39.59 GiB |    70.55 B | BLAS,RPC   |      32 |        tg2048 |          2.77 ± 0.00 |

build: 51f311e0 (4753)
5. Meta Llama 3.1 8B Instruct
A smaller model of the Meta Llama family. The results are around 22 tokens per second, which is fast enough for code generation. The GGUF file was downloaded from here – huggingface.co.
srv ~/llama.cpp/build/bin $ ./llama-bench --numa distribute -m /root/models/bartowski/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -t 32 -p 0 -n 128,256,512,1024,2048
| model                          |       size |     params | backend    | threads |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ------: | ------------: | -------------------: |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg128 |         22.72 ± 0.06 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg256 |         22.67 ± 0.01 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |         tg512 |         22.61 ± 0.01 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |        tg1024 |         22.11 ± 0.01 |
| llama 8B Q4_K - Medium         |   4.58 GiB |     8.03 B | BLAS,RPC   |      32 |        tg2048 |         21.10 ± 0.01 |

build: 51f311e0 (4753)
Memory bandwidth tests
According to the AMD documentation, the theoretical memory bandwidth is 80 GB/s per socket.
The first tests are with the Intel® Memory Latency Checker v3.11b: the mlc command-line tool manages to read almost 66 GB/s from a single NUMA node, and around 130 GB/s in total for the two CPUs.
srv ~/mlc_v3/Linux $ ./mlc --memory_bandwidth_scan
Intel(R) Memory Latency Checker - v3.11b
Command line parameters: --memory_bandwidth_scan

Running memory bandwidth scan using 32 threads on numa node 0 accessing memory on numa node 0
Reserved 33 1GB pages
Now allocating 33 1GB pages. This may take several minutes..
1GB page allocation completed
Allocating remaining 30809672 KB memory in 4KB pages
Totally 61 GB memory allocated in 4KB+1GB pages on NUMA node 0
Measuring memory bandwidth for each of those 1GB memory regions..

Histogram report of BW in MB/sec across each 1GB region on NUMA node 0
BW_range(MB/sec)    #_of_1GB_regions
----------------    ----------------
[65000-69999]       61

Detailed BW report for each 1GB region allocated as contiguous 1GB page on NUMA node 0
phys_addr     GBaligned_page#   MB/sec
---------     ---------------   ------
0x140000000   5                 66033
0x180000000   6                 66363
0x1c0000000   7                 66177
0x200000000   8                 65846
0x240000000   9                 65674
0x280000000   10                66303
0x2c0000000   11                65940
0x380000000   14                65692
0x4c0000000   19                65691
0x540000000   21                65739
0x6c0000000   27                65741
0x840000000   33                66066
0x880000000   34                66001
0x940000000   37                65762
0xb00000000   44                65871
0xb40000000   45                67632
0xb80000000   46                66186
0xbc0000000   47                65758
0xc00000000   48                65780
0xc40000000   49                65905
0xcc0000000   51                65830
0xd00000000   52                65779
0xd40000000   53                65721
0xd80000000   54                66023
0xdc0000000   55                65996
0xe00000000   56                65558
0xe40000000   57                65855
0xe80000000   58                66048
0xec0000000   59                66050
0xf00000000   60                65896
0xf40000000   61                66353
0xf80000000   62                66171
0xfc0000000   63                65889

Detailed BW report for each 1GB region allocated as 4KB page on NUMA node 0
phys_addr      MB/sec
---------      ------
0x6bdbd4000    65942
0x697ef7000    65964
0x622301000    65943
0x77170a000    65869
0x7ccf13000    65910
0x83e31d000    65958
0x90f326000    65936
0x9a032f000    66037
0xa15b39000    65814
0xa77742000    66043
0x4bdf4c000    65898
0x354355000    66206
0x3e0f5e000    65982
0x5bfb68000    65933
0x727f71000    65915
0x5db77b000    66004
0xc88784000    65942
0x100c38d000   66241
0x404b97000    65653
0x4817a0000    65965
0x51ffaa000    65673
0x5babb3000    65830
0x7013bc000    65843
0x81cbc6000    65786
0x9d0fcf000    65974
0xa413d8000    66212
0xae27e2000    65908
0x18c29000     66060
And just for the record, mlc executed without arguments.
srv ~/mlc_v3/Linux $ ./mlc
Intel(R) Memory Latency Checker - v3.11b
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0       1
       0         119.9   261.7
       1         261.7   119.7

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :  131319.2
3:1 Reads-Writes :  132123.5
2:1 Reads-Writes :  134047.2
1:1 Reads-Writes :  143820.7
Stream-triad like:  143737.8

Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0       1
       0         65452.7 30597.0
       1         30466.6 65363.4

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  550.75   131339.0
 00002  553.37   131099.9
 00008  550.86   131628.0
 00015  550.84   131525.3
 00050  547.06   131630.8
 00100  517.39   132043.8
 00200  156.93   111832.4
 00300  145.16    78346.4
 00400  140.68    60374.9
 00500  138.92    49102.7
 00700  131.48    35717.7
 01000  129.31    25379.9
 01300  125.52    19746.9
 01700  125.54    15262.5
 02500  126.38    10571.3
 03500  126.99     7708.2
 05000  127.94     5549.0
 09000  128.91     3307.1
 20000  130.10     1758.0

Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT  latency        23.1
Local Socket L2->L2 HITM latency        23.3
Remote Socket L2->L2 HITM latency (data address homed in writer socket)
                        Reader Numa Node
Writer Numa Node     0       1
            0        -   278.7
            1    277.6       -
Remote Socket L2->L2 HITM latency (data address homed in reader socket)
                        Reader Numa Node
Writer Numa Node     0       1
            0        -   283.1
            1    284.9       -
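The node-to-node matrices in the mlc output quantify the NUMA penalty directly. A quick calculation on those numbers (65452.7 vs 30597.0 MB/s bandwidth, 119.9 vs 261.7 ns latency):

```shell
# Remote (cross-socket) access compared to local access, from the mlc output.
bw_ratio=$(awk 'BEGIN { printf "%.2f", 30597.0 / 65452.7 }')
lat_ratio=$(awk 'BEGIN { printf "%.2f", 261.7 / 119.9 }')
echo "remote bandwidth is ${bw_ratio}x of local; remote latency is ${lat_ratio}x of local"
```

Remote reads run at less than half the local rate and latency roughly doubles, which is why `--numa distribute` and NUMA-aware placement matter so much on this board.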
The second set of tests uses sysbench: with 64 threads it reaches 162450.34 MiB/sec – the full potential of the dual-processor system. The first run uses all 64 threads and the second uses half of them, i.e. the 32 physical cores. With 32 threads the result is just 90666.83 MiB/sec, which suggests sysbench is not NUMA-aware and did not spread its work evenly across both nodes. Note also that sysbench reads 1KiB blocks, so part of the traffic is likely served from the CPU caches rather than RAM, which would explain why the 64-thread figure can exceed the mlc peak.
srv ~ $ sysbench memory --memory-oper=read --memory-block-size=1K --memory-total-size=2000G --threads=64 run
sysbench 1.0.20 (using system LuaJIT 2.1.1731601260)

Running the test with following options:
Number of threads: 64
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 2048000MiB
  operation: read
  scope: global

Initializing worker threads...

Threads started!

Total operations: 1663878927 (166349143.55 per second)

1624881.76 MiB transferred (162450.34 MiB/sec)

General statistics:
    total time:                          10.0011s
    total number of events:              1663878927

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                   10.01
         95th percentile:                         0.00
         sum:                                96569.21

Threads fairness:
    events (avg/stddev):           25998108.2344/710279.75
    execution time (avg/stddev):   1.5089/0.06

srv ~ $ sysbench memory --memory-oper=read --memory-block-size=1K --memory-total-size=2000G --threads=32 run
sysbench 1.0.20 (using system LuaJIT 2.1.1731601260)

Running the test with following options:
Number of threads: 32
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 2048000MiB
  operation: read
  scope: global

Initializing worker threads...

Threads started!

Total operations: 928597066 (92842833.78 per second)

906833.07 MiB transferred (90666.83 MiB/sec)

General statistics:
    total time:                          10.0006s
    total number of events:              928597066

Latency (ms):
         min:                                    0.00
         avg:                                    0.00
         max:                                    0.70
         95th percentile:                         0.00
         sum:                                42355.25

Threads fairness:
    events (avg/stddev):           29018658.3125/142227.06
    execution time (avg/stddev):   1.3236/0.01

Notes

System cache

Before all tests, the system cache was dropped and automatic NUMA balancing was disabled:

[code lang="bash" highlight="1,2"]
echo 0 > /proc/sys/kernel/numa_balancing
echo 3 > /proc/sys/vm/drop_caches
[/code]
Whether the cache is dropped or not may have a significant impact on the test results.
NUMA and hardware topology
The NUMA configuration is 2 nodes:
srv ~ $ numactl -s
policy: default
preferred node: current
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
cpubind: 0 1
nodebind: 0 1
membind: 0 1
preferred:
The hardware CPU, cores, threads, cache and NUMA topology, as reported by likwid-topology:
srv ~ $ /usr/local/likwid/bin/likwid-topology
--------------------------------------------------------------------------------
CPU name:       AMD EPYC 7282 16-Core Processor
CPU type:       AMD K17 (Zen2) architecture
CPU stepping:   0
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:                2
CPU dies:               2
Cores per socket:       16
Threads per core:       2
--------------------------------------------------------------------------------
HWThread        Thread        Core        Die        Socket        Available
0               0             0           0          0             *
1               0             1           0          0             *
2               0             2           0          0             *
3               0             3           0          0             *
4               0             4           0          0             *
5               0             5           0          0             *
6               0             6           0          0             *
7               0             7           0          0             *
8               0             8           0          0             *
9               0             9           0          0             *
10              0             10          0          0             *
11              0             11          0          0             *
12              0             12          0          0             *
13              0             13          0          0             *
14              0             14          0          0             *
15              0             15          0          0             *
16              0             16          0          1             *
17              0             17          0          1             *
18              0             18          0          1             *
19              0             19          0          1             *
20              0             20          0          1             *
21              0             21          0          1             *
22              0             22          0          1             *
23              0             23          0          1             *
24              0             24          0          1             *
25              0             25          0          1             *
26              0             26          0          1             *
27              0             27          0          1             *
28              0             28          0          1             *
29              0             29          0          1             *
30              0             30          0          1             *
31              0             31          0          1             *
32              1             0           0          0             *
33              1             1           0          0             *
34              1             2           0          0             *
35              1             3           0          0             *
36              1             4           0          0             *
37              1             5           0          0             *
38              1             6           0          0             *
39              1             7           0          0             *
40              1             8           0          0             *
41              1             9           0          0             *
42              1             10          0          0             *
43              1             11          0          0             *
44              1             12          0          0             *
45              1             13          0          0             *
46              1             14          0          0             *
47              1             15          0          0             *
48              1             16          0          1             *
49              1             17          0          1             *
50              1             18          0          1             *
51              1             19          0          1             *
52              1             20          0          1             *
53              1             21          0          1             *
54              1             22          0          1             *
55              1             23          0          1             *
56              1             24          0          1             *
57              1             25          0          1             *
58              1             26          0          1             *
59              1             27          0          1             *
60              1             28          0          1             *
61              1             29          0          1             *
62              1             30          0          1             *
63              1             31          0          1             *
--------------------------------------------------------------------------------
Socket 0:               ( 0 32 1 33 2 34 3 35 4 36 5 37 6 38 7 39 8 40 9 41 10 42 11 43 12 44 13 45 14 46 15 47 )
Socket 1:               ( 16 48 17 49 18 50 19 51 20 52 21 53 22 54 23 55 24 56 25 57 26 58 27 59 28 60 29 61 30 62 31 63 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:                  1
Size:                   32 kB
Cache groups:           ( 0 32 ) ( 1 33 ) ( 2 34 ) ( 3 35 ) ( 4 36 ) ( 5 37 ) ( 6 38 ) ( 7 39 ) ( 8 40 ) ( 9 41 ) ( 10 42 ) ( 11 43 ) ( 12 44 ) ( 13 45 ) ( 14 46 ) ( 15 47 ) ( 16 48 ) ( 17 49 ) ( 18 50 ) ( 19 51 ) ( 20 52 ) ( 21 53 ) ( 22 54 ) ( 23 55 ) ( 24 56 ) ( 25 57 ) ( 26 58 ) ( 27 59 ) ( 28 60 ) ( 29 61 ) ( 30 62 ) ( 31 63 )
--------------------------------------------------------------------------------
Level:                  2
Size:                   512 kB
Cache groups:           ( 0 32 ) ( 1 33 ) ( 2 34 ) ( 3 35 ) ( 4 36 ) ( 5 37 ) ( 6 38 ) ( 7 39 ) ( 8 40 ) ( 9 41 ) ( 10 42 ) ( 11 43 ) ( 12 44 ) ( 13 45 ) ( 14 46 ) ( 15 47 ) ( 16 48 ) ( 17 49 ) ( 18 50 ) ( 19 51 ) ( 20 52 ) ( 21 53 ) ( 22 54 ) ( 23 55 ) ( 24 56 ) ( 25 57 ) ( 26 58 ) ( 27 59 ) ( 28 60 ) ( 29 61 ) ( 30 62 ) ( 31 63 )
--------------------------------------------------------------------------------
Level:                  3
Size:                   16 MB
Cache groups:           ( 0 32 1 33 2 34 3 35 ) ( 4 36 5 37 6 38 7 39 ) ( 8 40 9 41 10 42 11 43 ) ( 12 44 13 45 14 46 15 47 ) ( 16 48 17 49 18 50 19 51 ) ( 20 52 21 53 22 54 23 55 ) ( 24 56 25 57 26 58 27 59 ) ( 28 60 29 61 30 62 31 63 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:           2
--------------------------------------------------------------------------------
Domain:                 0
Processors:             ( 0 32 1 33 2 34 3 35 4 36 5 37 6 38 7 39 8 40 9 41 10 42 11 43 12 44 13 45 14 46 15 47 )
Distances:              10 32
Free memory:            39754.6 MB
Total memory:           64246.9 MB
--------------------------------------------------------------------------------
Domain:                 1
Processors:             ( 16 48 17 49 18 50 19 51 20 52 21 53 22 54 23 55 24 56 25 57 26 58 27 59 28 60 29 61 30 62 31 63 )
Distances:              32 10
Free memory:            39940 MB
Total memory:           64493.9 MB
--------------------------------------------------------------------------------