The AVX Instruction Set

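This is a brief introduction to some basic instructions of the AVX instruction set. They are used for SIMD programming by calling the intrinsic functions that C and C++ compilers expose through <immintrin.h>.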

History

Parallel programming

  • MMX, 1996
  • SSE, 1999
  • AVX, 2008
  • AVX2, 2011

Data types

Data Type    Description
__m128       128-bit vector containing 4 floats
__m128d      128-bit vector containing 2 doubles
__m128i      128-bit vector containing integers
__m256       256-bit vector containing 8 floats
__m256d      256-bit vector containing 4 doubles
__m256i      256-bit vector containing integers

  • integers can be chars, shorts, ints, or longs

Function naming conventions

_mm<bit_width>_<name>_<data_type>

  • <bit_width>: the size of the returned vector; empty for 128-bit, 256 for 256-bit
  • <name>: describes the operation performed by the intrinsic
  • <data_type>: the data type of the function's primary arguments

Data type suffix           Description
ps                         packed single-precision
pd                         packed double-precision
epi8/epi16/epi32/epi64     signed integers
epu8/epu16/epu32/epu64     unsigned integers
si128/si256                unspecified vector
m128/m128i/m128d           input vector types
m256/m256i/m256d           input vector types

Example: _mm256_srlv_epi64 operates on 64-bit integers (epi64) and returns a 256-bit vector (_mm256). It shifts each 64-bit element right by its own count.

Complete example

#include <immintrin.h>
#include <stdio.h>

int main() {
    /* Initialize the two argument vectors */
    __m256 evens = _mm256_set_ps(2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0);
    __m256 odds = _mm256_set_ps(1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0);

    /* Compute the difference between the two vectors */
    __m256 result = _mm256_sub_ps(evens, odds);

    /* Display the elements of the result vector */
    float *f = (float *)&result; // type conversion
    printf("%f %f %f %f %f %f %f %f\n",
           f[0], f[1], f[2], f[3], f[4], f[5], f[6], f[7]);

    return 0;
}

Compile with the flags -mavx -mavx2.

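Common intrinsics

Initialization

  • _mm256_setzero_ps: all elements set to 0
  • _mm256_set1_ps: all elements set to the same value
  • _mm256_set_ps: predefined values, listed from the highest element down to element 0
  • _mm256_setr_ps: predefined values in reversed order, that is, listed from element 0 upward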

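Memory access

  • _mm256_load_ps: load 8 floats from a 32-byte-aligned address
  • _mm256_maskload_ps(address, integer vector): for each element, read it if the highest bit of the matching mask element is 1, otherwise set it to zero
  • aligned_alloc(32, 64 * sizeof(float)): allocate 64 floats on a 32-byte boundary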

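Arithmetic and logic

  • _mm256_add/sub/mul_ps
  • _mm256_and_ps, _mm256_cmp_ps (equality uses the _CMP_EQ_OQ predicate; the 128-bit _mm_cmpeq_ps has no 256-bit counterpart)
  • _mm256_hadd/hsub_ps
  • _mm256_mullo_epi32 (AVX2)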

    March 3, 2019 - The AVX Instruction Set

Fused multiply-add (FMA)

Compile with -mfma.

  • _mm_fmadd_ps: res = a * b + c for every element
  • _mm_fmadd_ss: res[0] = a[0] * b[0] + c[0]; the upper three elements are copied from a

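Permuting and shuffling

  • _mm256_permute_ps: reorder the elements within each 128-bit lane based on an 8-bit control value
  • _mm256_shuffle_ps: within each 128-bit lane, the first 2 output elements are selected from the first source and the second 2 from the second source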

