New M1 Chipset, SIMD
Hello *,
I wonder whether the new M1 chipset supports SIMD intrinsics or provides anything similar?
Best Regards,
Ramin
MacBook
The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark where scalar C code is compared with explicitly-vectorized Neon code. No difference is observed, either reflecting that the test is constrained by the memory wall or that the Clang/LLVM compiler is automatically vectorizing scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.
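To make the scalar-versus-Neon comparison concrete, here is a minimal sketch (not the benchmark code itself) of the same element-wise addition written once as a plain loop and once with Neon intrinsics from arm_neon.h. The function names add_scalar and add_simd are made up for illustration, and the Neon path is guarded so the file still builds on non-ARM machines.

```c
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Plain scalar loop. At -O2/-O3 Clang may auto-vectorize this,
   which is one possible reason a benchmark shows no difference. */
static void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Explicitly vectorized version (hypothetical helper name).
   Processes four floats per iteration in a 128-bit Neon register. */
static void add_simd(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats */
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store */
    }
#endif
    /* Scalar tail; also the whole loop when Neon is unavailable. */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions should produce identical results; for a simple loop like this, any speed difference would depend on whether the compiler already vectorized the scalar version and on memory bandwidth.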
For the example I provided, I used sse2neon, which maps the x86-64 SIMD intrinsics (MMX, SSE, AES) to their Neon counterparts. Therefore, the only change to the C code needed to compile on the M1 was this conditional:
#ifdef __x86_64__
#include <immintrin.h>
#else
#include "sse2neon.h"
#endif
This allows you to use the same intrinsics for both architectures. Intel provides a great guide for using the x86-64 intrinsics.
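As a rough illustration of what "the same intrinsics for both architectures" looks like in practice, the sketch below computes a dot product with SSE intrinsics only. It assumes sse2neon.h is on the include path when building on the M1; the function name dot_sse is hypothetical and not taken from the benchmark.

```c
#include <stddef.h>

#ifdef __x86_64__
#include <immintrin.h>
#else
#include "sse2neon.h"   /* assumption: header is available on the include path */
#endif

/* Dot product written once with SSE intrinsics (hypothetical helper).
   On x86-64 these compile to SSE instructions; with sse2neon they are
   translated to Neon equivalents. */
static float dot_sse(const float *a, const float *b, size_t n) {
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);           /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb)); /* accumulate products per lane */
    }
    /* Horizontal sum of the four lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    /* Scalar tail for the remaining elements. */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The calling code does not need to know which architecture it is running on; only the include at the top changes.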
ramin-raeisi wrote:
Hello *,
I wonder whether the new M1 chipset supports SIMD intrinsics or provides anything similar?
Best Regards,
Ramin
What replaces x86 intrinsics for C when Apple ditches Intel ...
rorden wrote:
The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark where scalar C code is compared with explicitly-vectorized Neon code. No difference is observed, either reflecting that the test is constrained by the memory wall or that the Clang/LLVM compiler is automatically vectorizing scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.
Nice.
Thanks for your post rorden.