New M1 Chipset, SIMD

Hello *,


I wonder whether the new M1 chipset supports SIMD intrinsics or provides anything similar.


Best Regards,

Ramin

MacBook

Posted on Nov 21, 2020 5:25 AM

Question marked as Top-ranking reply

Posted on Nov 25, 2020 5:10 AM

The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark where scalar C code is compared with explicitly vectorized Neon code. No difference is observed, which suggests either that the test is constrained by the memory wall or that the Clang/LLVM compiler is automatically vectorizing the scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.
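To make the scalar versus explicit Neon comparison concrete, here is a minimal sketch of the two styles. It is not the benchmark code referred to above; the function names add_scalar and add_neon are hypothetical, and the element-wise add is only an illustration. The Neon path is compiled only where the compiler defines __ARM_NEON, and a scalar fallback is used elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>   /* Neon intrinsics, available on the M1 */
#endif

/* Plain scalar loop. Clang/LLVM may auto-vectorize this at -O2 or higher. */
static void add_scalar(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Explicit Neon: four floats per 128-bit register.
   Falls back to the scalar loop when Neon is not available. */
static void add_neon(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
    size_t i = 0;
#if defined(__ARM_NEON)
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats */
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vaddq_f32(va, vb));  /* add and store 4 floats */
    }
#endif
    for (; i < n; i++) {                        /* remaining elements */
        out[i] = a[i] + b[i];
    }
}
```

For a loop this simple, both versions can end up as similar machine code once the compiler vectorizes the scalar one, which would be consistent with seeing no difference in timing.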

Nov 25, 2020 6:58 AM in response to leroydouglas

For the example I provided, I used sse2neon, which maps the x86-64 SIMD intrinsics (MMX, SSE, AES) to their Neon counterparts. Therefore, the only change to the C code needed to allow compilation on the M1 was this conditional:


#ifdef __x86_64__
#include <immintrin.h>   /* native x86-64 intrinsics */
#else
#include "sse2neon.h"    /* same intrinsics, implemented with Neon */
#endif


This allows you to use the same intrinsics for both architectures. Intel provides a great guide for using the x86-64 intrinsics.
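As a rough illustration of what "the same intrinsics for both architectures" looks like in practice, here is a small sketch. The function name add_sse and the HAVE_SSE_API macro are hypothetical, and the __has_include check is an addition of this sketch so that it still compiles (using a scalar loop) on a machine where sse2neon.h has not been downloaded; the conditional in the original post assumes the header is present.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h>          /* native SSE intrinsics */
#define HAVE_SSE_API 1
#elif defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"           /* SSE intrinsics implemented with Neon */
#define HAVE_SSE_API 1
#endif
#endif

/* Element-wise add written once with SSE intrinsics.
   On x86-64 these compile to SSE; with sse2neon they compile to Neon. */
static void add_sse(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
    size_t i = 0;
#ifdef HAVE_SSE_API
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);            /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb)); /* add and store 4 floats */
    }
#endif
    for (; i < n; i++) {                            /* remaining elements */
        out[i] = a[i] + b[i];
    }
}
```

The unaligned load and store variants are used here so the sketch does not depend on how the caller allocated the arrays.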

Nov 25, 2020 6:21 AM in response to rorden

rorden wrote:

The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark where scalar C code is compared with explicitly vectorized Neon code. No difference is observed, which suggests either that the test is constrained by the memory wall or that the Clang/LLVM compiler is automatically vectorizing the scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.


Nice.


Thanks for your post rorden.
