public inbox for gcc@gcc.gnu.org
* Limits and feature test macros for vector extension
@ 2023-01-01 14:53 Nikita Zlobin
From: Nikita Zlobin @ 2023-01-01 14:53 UTC
  To: gcc

The vector extension is great because it allows controllable
vectorization without dealing with each SIMD ISA separately. When
properly used, it gives better performance than
auto-vectorization. However, there is one issue.

While it is possible to check whether the specific SIMD ISAs used as
backends for the vector extension are supported, there is no similar
facility for the vector extension itself. The only way to make it
configurable without manually checking each ISA is, e.g., to add a
configure parameter --vector-size=<bytes>, with a good enough comment
for the user to understand what should go there (the value to be
specified in __attribute__((vector_size(bytes)))).
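
A minimal sketch of how such a configured size could be consumed, assuming
the build system passes it as a macro. VEC_BYTES, vfloat, VEC_LANES and
saxpy are hypothetical names chosen for illustration; only
__attribute__((vector_size(N))) comes from the GCC vector extension.

```c
#include <stddef.h>
#include <string.h>

/* VEC_BYTES is a hypothetical macro the build could set, e.g. with
   -DVEC_BYTES=32 derived from a --vector-size=32 configure option. */
#ifndef VEC_BYTES
#define VEC_BYTES 16
#endif

/* GCC vector extension: a vector of floats occupying VEC_BYTES bytes. */
typedef float vfloat __attribute__((vector_size(VEC_BYTES)));

enum { VEC_LANES = VEC_BYTES / sizeof(float) };

/* Computes y[i] += a * x[i]; n must be a multiple of VEC_LANES. */
void saxpy(float a, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; i += VEC_LANES) {
        vfloat vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load without alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy += a * vx;                   /* the scalar a is broadcast to all lanes */
        memcpy(y + i, &vy, sizeof vy);
    }
}
```

The code itself never names an ISA; whether the chosen size maps to native
registers still depends on the target flags used for the build.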

My first approach was to see whether the configuration could be
autodetected, e.g. with autoconf, in this way (not ideal, just a start):

gcc -march=native -E -v - < /dev/null 2>&1 | awk '/cc1/ {
  for (i = 1; i <= NF; i++) {
    if ($i ~ /-mno-/) continue          # skip disabled features
    switch ($i) {                       # switch is a gawk extension
    case /-m(mmx|3dnow|vis)/: arr[8] = 1;  break
    case /-m(sse|altivec)/:   arr[16] = 1; break
    case /^-mavx2?$/:         arr[32] = 1; break
    case /-mavx512/:          arr[64] = 1; break
    }
  }
  for (j in arr) print j                # vector sizes in bytes
}'

However, I discovered that I have no idea how to detect the NEON vector
size this way (or even its presence). One answer suggested checking
feature test macros. After trying this command:

gcc -march=native -dM -E - </dev/null | less

I discovered that other ISAs, like MMX, SSE and AVX, have similar
feature test macros, e.g. __MMX__, __SSE2__, __AVX__. This means
that a simple C header testing these predefined macros would be enough
to check for each ISA without calling functions from the Target
Builtins extension.
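
Such a header could look roughly like the sketch below. NATIVE_VEC_BYTES
and native_vec_bytes are hypothetical names; the tested macros are the
ones the compiler predefines for the selected target (__ARM_NEON covers
the NEON case, __ALTIVEC__ the AltiVec one).

```c
/* Pick the widest native vector size, in bytes, from predefined target
   macros. NATIVE_VEC_BYTES is a hypothetical name, not a GCC macro.
   Note: __AVX__ alone gives 256-bit floating point only; 256-bit
   integer operations need __AVX2__. */
#if defined(__AVX512F__)
#  define NATIVE_VEC_BYTES 64
#elif defined(__AVX__)
#  define NATIVE_VEC_BYTES 32
#elif defined(__SSE2__) || defined(__ARM_NEON) || defined(__ALTIVEC__)
#  define NATIVE_VEC_BYTES 16
#elif defined(__MMX__)
#  define NATIVE_VEC_BYTES 8
#else
#  define NATIVE_VEC_BYTES 0   /* no known SIMD ISA detected */
#endif

/* Runtime view of the compile-time choice, handy for logging. */
int native_vec_bytes(void)
{
    return NATIVE_VEC_BYTES;
}
```

This still enumerates ISAs by hand, which is exactly what a generic
facility would avoid, but it needs no configure-time probing.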

However, that is not the end. Some ISAs support only a limited set of
element types in vectors. E.g., MMX supports only integer elements, and
SSE without SSE2 supports only single-precision float in its 128-bit
registers.
This may be an issue if an integer implementation of some code performs
better than a floating-point one (even with the same data
width). This calls for real feature test macros representing
the data types supported by the available SIMD ISAs.

E.g., for plain vector sizes this could be done with an array (example):

#define __EXT_VECTOR_SIZEV (int[]){64, 128, 256, 512}

with the array length determined as sizeof(vec) / sizeof(vec[0]).
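
A sketch of how code might consume such an array, assuming a user-defined
stand-in for the proposed macro. EXT_VECTOR_SIZEV, EXT_VECTOR_SIZEC and
widest_vector_bits are hypothetical; GCC defines nothing like them.

```c
#include <stddef.h>

/* Stand-in for the proposed macro (sizes in bits, as in the example). */
#define EXT_VECTOR_SIZEV ((const int[]){64, 128, 256, 512})
#define EXT_VECTOR_SIZEC \
    (sizeof(EXT_VECTOR_SIZEV) / sizeof(EXT_VECTOR_SIZEV[0]))

/* Return the largest listed size that does not exceed max_bits, or 0. */
int widest_vector_bits(int max_bits)
{
    int best = 0;
    for (size_t i = 0; i < EXT_VECTOR_SIZEC; i++) {
        int bits = EXT_VECTOR_SIZEV[i];
        if (bits <= max_bits && bits > best)
            best = bits;
    }
    return best;
}
```

One limitation of this form: a compound literal cannot be evaluated in
#if, so it can drive run-time or constant-folded choices but not
preprocessor-level selection of a vector typedef.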

But for an exact check of supported data types, there are several options:

1. Use per-type feature test macros: __V8SI16__, __V8UI16__,
__V8F16__, __V4SI32__, __V4UI32__, __V4F32__, __V2SI64__,
__V2F64__....
(I read on Wikipedia that some ISAs restrict the underlying int size to
32 bits, without 64-bit support.)

2. Extend the array of supported lengths into a 2D matrix of supported
vector size + underlying element type combinations. This could use a
NULL-terminated array to mark the end of the real value sequence. The
first subarray lists the vector sizes, while each following subarray
corresponds to one value from the first. Their elements are int fields
combining a bit width with bit flags that indicate float/int and (for
int) signed/unsigned.

Though who knows whether complex numbers might eventually appear in
this list :D . Well, even without them this could be a tricky
approach.

3. A variation of the 2nd option: provide per-type lists of vector
sizes rather than per-vector-size lists of data types. This could be
more practical, since algorithms usually need the vector sizes
available for the specific data types they use.
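
To make options 2 and 3 more concrete, here is one possible encoding of
the per-type variant. Every name and the flag layout are hypothetical,
and the table contents are only illustrative (they resemble a target
with 256-bit float but only 128-bit integer vectors).

```c
#include <stddef.h>

/* Hypothetical encoding: low 16 bits hold the element width in bits,
   higher bits are flags. None of these names exist in GCC. */
#define VT_WIDTH_MASK 0xFFFFu
#define VT_FLOAT      0x10000u   /* element is floating point */
#define VT_UNSIGNED   0x20000u   /* element is an unsigned integer */

struct vec_type_sizes {
    unsigned elem;       /* element width combined with flags */
    const int *sizes;    /* vector sizes in bits, terminated by 0 */
};

static const int f32_sizes[] = {128, 256, 0};
static const int s32_sizes[] = {128, 0};

/* Illustrative data only, not queried from any compiler. */
static const struct vec_type_sizes table[] = {
    {32 | VT_FLOAT, f32_sizes},
    {32,            s32_sizes},
};

/* Return 1 if the element type has a vector of vec_bits bits, else 0. */
int vec_supported(unsigned elem, int vec_bits)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].elem != elem)
            continue;
        for (const int *s = table[i].sizes; *s != 0; s++)
            if (*s == vec_bits)
                return 1;
    }
    return 0;
}
```

An algorithm working on 32-bit floats would then look up its own element
type and pick from the sizes listed for it.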

As for relying on subdivision of a vector size that has no
corresponding ISA support: I only got worse performance that way.
Although I am not sure that this is not a gcc bug: if the 2
subvectors exist at the same time, then it could simply be that too
many SIMD registers are used. If they were processed in sequence, this
probably should not worsen performance (I have never tried hand-written
intrinsics).

* Re: Limits and feature test macros for vector extension
@ 2023-01-09  7:55 ` Richard Biener
From: Richard Biener @ 2023-01-09  7:55 UTC
  To: Nikita Zlobin; +Cc: gcc

On Sun, Jan 1, 2023 at 3:54 PM Nikita Zlobin via Gcc <gcc@gcc.gnu.org> wrote:
>
> The vector extension is great because it allows controllable
> vectorization without dealing with each SIMD ISA separately. When
> properly used, it gives better performance than
> auto-vectorization. However, there is one issue.
>
> While it is possible to check whether the specific SIMD ISAs used as
> backends for the vector extension are supported, there is no similar
> facility for the vector extension itself. The only way to make it
> configurable without manually checking each ISA is, e.g., to add a
> configure parameter --vector-size=<bytes>, with a good enough comment
> for the user to understand what should go there (the value to be
> specified in __attribute__((vector_size(bytes)))).
>
> My first approach was to see whether the configuration could be
> autodetected, e.g. with autoconf, in this way (not ideal, just a start):
>
> gcc -march=native -E -v - < /dev/null 2>&1 | awk '/cc1/ {
>   for (i = 1; i <= NF; i++) {
>     if ($i ~ /-mno-/) continue          # skip disabled features
>     switch ($i) {                       # switch is a gawk extension
>     case /-m(mmx|3dnow|vis)/: arr[8] = 1;  break
>     case /-m(sse|altivec)/:   arr[16] = 1; break
>     case /^-mavx2?$/:         arr[32] = 1; break
>     case /-mavx512/:          arr[64] = 1; break
>     }
>   }
>   for (j in arr) print j                # vector sizes in bytes
> }'

There's -Wvector-operation-performance which will diagnose cases
where GCC decomposes larger into smaller vectors or even to scalar
operations.  That might be of some help here as well.
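
As a small illustration of that warning, assuming an x86-64 target built
without AVX, the sketch below uses a 32-byte vector that does not fit a
single SSE register. The names v8si and add8 are made up for the example.

```c
/* A 32-byte vector of eight ints: wider than one 16-byte SSE register. */
typedef int v8si __attribute__((vector_size(32)));

/* Pointers are used so that no wide vector is passed by value. */
void add8(const v8si *a, const v8si *b, v8si *out)
{
    *out = *a + *b;   /* may be split into narrower operations */
}
```

Compiling this with something like
gcc -O2 -mno-avx -Wvector-operation-performance -c add8.c
should make GCC report the addition if it has to decompose it, while the
same file built with -mavx2 should stay quiet.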

> However, I discovered that I have no idea how to detect the NEON vector
> size this way (or even its presence). One answer suggested checking
> feature test macros. After trying this command:
>
> gcc -march=native -dM -E - </dev/null | less
>
> I discovered that other ISAs, like MMX, SSE and AVX, have similar
> feature test macros, e.g. __MMX__, __SSE2__, __AVX__. This means
> that a simple C header testing these predefined macros would be enough
> to check for each ISA without calling functions from the Target
> Builtins extension.
>
> However, that is not the end. Some ISAs support only a limited set of
> element types in vectors. E.g., MMX supports only integer elements, and
> SSE without SSE2 supports only single-precision float in its 128-bit
> registers.
> This may be an issue if an integer implementation of some code performs
> better than a floating-point one (even with the same data
> width). This calls for real feature test macros representing
> the data types supported by the available SIMD ISAs.
>
> E.g., for plain vector sizes this could be done with an array (example):
>
> #define __EXT_VECTOR_SIZEV (int[]){64, 128, 256, 512}
>
> with the array length determined as sizeof(vec) / sizeof(vec[0]).
>
> But for an exact check of supported data types, there are several options:
>
> 1. Use per-type feature test macros: __V8SI16__, __V8UI16__,
> __V8F16__, __V4SI32__, __V4UI32__, __V4F32__, __V2SI64__,
> __V2F64__....
> (I read on Wikipedia that some ISAs restrict the underlying int size to
> 32 bits, without 64-bit support.)
>
> 2. Extend the array of supported lengths into a 2D matrix of supported
> vector size + underlying element type combinations. This could use a
> NULL-terminated array to mark the end of the real value sequence. The
> first subarray lists the vector sizes, while each following subarray
> corresponds to one value from the first. Their elements are int fields
> combining a bit width with bit flags that indicate float/int and (for
> int) signed/unsigned.
>
> Though who knows whether complex numbers might eventually appear in
> this list :D . Well, even without them this could be a tricky
> approach.
>
> 3. A variation of the 2nd option: provide per-type lists of vector
> sizes rather than per-vector-size lists of data types. This could be
> more practical, since algorithms usually need the vector sizes
> available for the specific data types they use.
>
> As for relying on subdivision of a vector size that has no
> corresponding ISA support: I only got worse performance that way.
> Although I am not sure that this is not a gcc bug: if the 2
> subvectors exist at the same time, then it could simply be that too
> many SIMD registers are used. If they were processed in sequence, this
> probably should not worsen performance (I have never tried hand-written
> intrinsics).

In general you'll figure that writing generic vector code is as hard
as autovectorizing scalar code...

Richard.
