New M1 Chipset, SIMD

Hello *,


I wonder whether the new M1 chipset supports SIMD intrinsics, or provides anything similar?


Best Regards,

Ramin

MacBook

Posted on Nov 21, 2020 5:25 AM

4 replies
Question marked as Best reply

Nov 25, 2020 5:10 AM in response to ramin-raeisi

The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark that compares scalar C code with explicitly vectorized Neon code. No difference is observed, which suggests either that the test is limited by the memory wall or that the Clang/LLVM compiler is automatically vectorizing the scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.
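To make the scalar versus Neon comparison concrete, here is a minimal sketch of the kind of loop pair such a benchmark might contain. It is not the benchmark linked above: the function names saxpy_scalar and saxpy_neon are hypothetical, and the Neon path is guarded so that the file still compiles (falling back to the scalar loop) on machines without AArch64 Neon.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON 1
#endif

/* Hypothetical scalar kernel: y[i] = a * x[i] + y[i].
   Clang/LLVM may auto-vectorize this loop at -O2 or higher. */
void saxpy_scalar(float a, const float *x, float *y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical explicit Neon kernel: four floats per 128-bit register. */
void saxpy_neon(float a, const float *x, float *y, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
#ifdef USE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);           /* load 4 floats from x */
        float32x4_t vy = vld1q_f32(y + i);           /* load 4 floats from y */
        vy = vaddq_f32(vy, vmulq_n_f32(vx, a));      /* y + a * x */
        vst1q_f32(y + i, vy);                        /* store 4 results */
    }
#endif
    /* Scalar tail (or the whole array when Neon is unavailable). */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A kernel like this does very little arithmetic per byte loaded, so both versions can end up waiting on memory, which is one of the two explanations offered above for seeing no difference.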

Nov 25, 2020 6:58 AM in response to leroydouglas

For the example I provided, I used sse2neon, which maps the x86-64 SIMD intrinsics (MMX, SSE, AES) onto their Neon counterparts. Therefore, the only change to the C code needed to compile on the M1 was this conditional:


#ifdef __x86_64__
#include <immintrin.h>
#else
#include "sse2neon.h"
#endif


This allows you to use the same intrinsics for both architectures. Intel provides a great guide for using the x86-64 intrinsics.
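As an illustration of writing the intrinsics once for both architectures, here is a minimal sketch. The function name add_arrays and the HAVE_SSE_INTRINSICS macro are hypothetical, and sse2neon.h is assumed to sit next to the source file. The extra __has_include check and scalar fallback are additions that go beyond the conditional above, so that the sketch still compiles where the header is missing.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h>
#define HAVE_SSE_INTRINSICS 1
#elif defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"   /* assumed to be in the same directory */
#define HAVE_SSE_INTRINSICS 1
#endif
#endif

/* Hypothetical helper: out[i] = a[i] + b[i], written once with SSE intrinsics.
   On x86-64 these compile to SSE instructions; with sse2neon they map to Neon. */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#ifdef HAVE_SSE_INTRINSICS
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* add and store 4 results */
    }
#endif
    /* Scalar tail (or the whole array when no SSE-style API is available). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

The unaligned load and store variants are used so that the caller does not have to guarantee 16-byte alignment.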

Nov 25, 2020 6:21 AM in response to rorden

rorden wrote:

The M1 supports Neon (128-bit) SIMD instructions. It does not support SVE SIMD instructions. Here is a benchmark that compares scalar C code with explicitly vectorized Neon code. No difference is observed, which suggests either that the test is limited by the memory wall or that the Clang/LLVM compiler is automatically vectorizing the scalar code. Regardless, the floating-point performance and memory bandwidth of the M1 are not to be denied.


Nice.


Thanks for your post, rorden.
