SSE vector programming with GCC

GCC supports vector programming against the SSE units, but the documentation is scattered and a little tedious to piece together. This post will hopefully get you up to speed more quickly.

First off, you need to make a typedef to represent the vector you’d like to operate on:

typedef double v2df __attribute__ ((vector_size (16)));

The above says that we want a 16-byte vector of doubles, which gives us a vector of 2 doubles: the most that fits in a 128-bit xmm register, the widest available before AVX.
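As a quick sanity check of the sizes involved, here is a minimal sketch. The helper name v2df_lanes is made up for illustration and is not part of GCC.

```c
#include <stddef.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Number of double lanes in a v2df: the 16-byte vector size divided by
   sizeof(double). (Hypothetical helper, for illustration only.) */
static size_t v2df_lanes(void)
{
    return sizeof(v2df) / sizeof(double);
}
```

With 8-byte doubles, v2df_lanes() comes out as 2, matching the two lanes of an xmm register.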

The Variable Attributes documentation on vector_size provides useful clues as to usage:

This attribute is only applicable to integral and float scalars, although arrays, pointers, and function return values are allowed in conjunction with this construct.

Aggregates with this attribute are invalid, even if they are of the same size as a corresponding scalar. For example, the declaration:

          struct S { int a; };
          struct S  __attribute__ ((vector_size (16))) foo;

is invalid even if the size of the structure is the same as the size of the int.

What it doesn’t mention is that the newly created type will be aligned, in the case of our above example, on a 16-byte boundary. That means if you use it in a struct you’ll need to treat it differently than an array of doubles to avoid extra padding:

struct state {
        uint64_t        id;
        v2df            pair;
};

Here the compiler inserts 8 bytes of padding between id and pair (and warns about it if you compile with -Wpadded, which I highly suggest).
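To see the padding for yourself, a small sketch along these lines measures it with offsetof. The helper name state_padding is an assumption for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef double v2df __attribute__ ((vector_size (16)));

struct state {
        uint64_t        id;
        v2df            pair;
};

/* Bytes the compiler inserts between id and pair so that pair starts
   on a 16-byte boundary. (Hypothetical helper, for illustration only.) */
static size_t state_padding(void)
{
        return offsetof(struct state, pair) - sizeof(uint64_t);
}
```

The struct ends up 32 bytes long even though its members only need 24. Putting a second 8-byte member right after id would reuse that gap instead of wasting it.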

GCC provides some nice syntax for working with vectors. For starters, you can use initializer and array syntax (at least with GCC 4.6, possibly earlier):

v2df n = {3.1415926, 2.71828183};
printf("e: %.4f\n", n[1]);

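You can also use the familiar scalar operators for addition, multiplication, comparison, etc.; they are applied element by element. Bit-shifting and the other bitwise operators only apply to integer vectors, not to a vector of doubles like v2df.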
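A minimal sketch of both features together, assuming GCC 4.6 or later for the subscript syntax. The function names mul_add and lane_sum are made up for illustration.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise a * b + c, written with the ordinary scalar operators. */
static v2df mul_add(v2df a, v2df b, v2df c)
{
        return a * b + c;
}

/* Add the two lanes together using array-style subscripts. */
static double lane_sum(v2df v)
{
        return v[0] + v[1];
}
```

For a = {1.0, 2.0}, b = {3.0, 4.0} and c = {0.5, 0.5}, mul_add gives {3.5, 8.5}, and lane_sum of that result is 12.0.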

GCC 4.7 looks set to allow scalars to be used directly:

v2df n = {...};
v2df m = n * 2.5;

Until then you have to build a vector out of the scalar yourself:

v2df n = {...};
v2df f = {2.5, 2.5};
v2df m = n * f;
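If you do this a lot, a tiny helper keeps the call sites readable. This is only a sketch: v2df_splat and scale are hypothetical names, not something GCC provides.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Hypothetical helper: copy one scalar into both lanes. */
static inline v2df v2df_splat(double x)
{
        v2df v = {x, x};
        return v;
}

/* Multiply every lane of n by the same scalar factor. */
static v2df scale(v2df n, double factor)
{
        return n * v2df_splat(factor);
}
```

Calling scale(n, 2.5) then behaves like the n * 2.5 form, without depending on a newer compiler.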

SSE has a number of richer operations that you can’t reach through operator syntax. Instead you can use them as x86 built-in functions. For instance, the following will take the element-wise maximum of two vectors:

v2df c = __builtin_ia32_maxpd(a, b);

There’s little documentation for these beyond a list of prototypes, so I’d use an opcode reference to find what you want and how it works, and then use the corresponding built-in. There are some built-ins that will load and store individual values into and out of the vector. I wouldn’t advise using these, as GCC will do things like copy your value into a register, then onto the stack, and then call the operator on the stack value. If you can use the GCC syntax to get what you need, you’ll be better off using that.
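As a sketch of how a built-in can sit next to plain vector syntax, the wrapper below uses __builtin_ia32_maxpd when SSE2 is enabled and falls back to subscripts otherwise. The name v2df_max is made up, and the fallback is my own assumption of an equivalent for ordinary (non-NaN) inputs.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Element-wise maximum of two vectors. (Hypothetical wrapper.) */
static v2df v2df_max(v2df a, v2df b)
{
#if defined(__SSE2__)
        /* Maps onto the MAXPD instruction. */
        return __builtin_ia32_maxpd(a, b);
#else
        /* Plain-syntax fallback for targets without SSE2. */
        v2df r = { a[0] > b[0] ? a[0] : b[0],
                   a[1] > b[1] ? a[1] : b[1] };
        return r;
#endif
}
```

For a = {1.0, 5.0} and b = {3.0, 2.0} either path returns {3.0, 5.0}.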

You can read more about the syntax in the GCC vector programming documentation.

Finally, you may need to conditionally use built-ins. For instance, roundpd was only introduced in SSE4.1; if you’re compiling for a target that only has up to SSE3, GCC will give you an undefined function error. However, GCC defines macros such as __SSE4_1__ that you can use to #ifdef around any such cases.
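A minimal sketch of that pattern, assuming you want truncation toward zero. The function name v2df_trunc and the scalar fallback are my own illustration; the fallback only works when each lane fits in a long long.

```c
typedef double v2df __attribute__ ((vector_size (16)));

/* Truncate both lanes toward zero. (Hypothetical wrapper.) */
static v2df v2df_trunc(v2df v)
{
#ifdef __SSE4_1__
        /* Immediate 0x3 selects round-toward-zero for ROUNDPD. */
        return __builtin_ia32_roundpd(v, 0x3);
#else
        /* Pre-SSE4.1 fallback: convert each lane to an integer and back.
           Assumes the values fit in a long long. */
        v2df r = { (double)(long long)v[0], (double)(long long)v[1] };
        return r;
#endif
}
```

Compiled with -msse4.1 the built-in path is taken; without it the fallback is used, and the callers don’t need to know which.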


~ by pulotka on 2012/01/06.
