From: Richard Biener <richard.guenther@gmail.com>
Date: Mon, 9 Jan 2023 08:55:35 +0100
Message-ID: <CAFiYyc3Eg_S5kk473H8n_D5V2YXOOVJnZBnV8NJ+=-HD5bnSdQ@mail.gmail.com>
Subject: Re: Limits and feature test macroses for vector extension
To: Nikita Zlobin <nick87720z@gmail.com>
Cc: gcc@gcc.gnu.org

On Sun, Jan 1, 2023 at 3:54 PM Nikita Zlobin via Gcc <gcc@gcc.gnu.org> wrote:
>
> The vector extension is great because it allows controllable
> vectorization without dealing with each SIMD ISA separately. When
> properly used, it allows better performance than
> auto-vectorization. However, there's just one issue.
>
> While it's possible to check whether the specific SIMD ISAs used as
> backends for the vector extension are supported, there's no similar
> feature for the vector extension itself. The only way to make it
> configurable without manually checking each ISA is to e.g. add a
> configure parameter --vector-size=<bytes>, with good enough commentary
> for the user to understand what should go there (to be specified in
> __attribute__((vector_size(bytes)))).
>
> My first approach was to check whether an autodetected config is
> possible, e.g. with autoconf, in this way (not ideal, just a start):
>
> gcc -march=native -E -v - < /dev/null 2>&1 | awk 'BEGIN{ arr[0]=0;
> delete arr[0]; } /cc1/{ for (i=1; i<=NF; i++){ if ($i ~ /-mno-/)
> continue; switch ($i){ case /-m(mmx|3dnow|vis)/: arr[8]=1; break; case
> /-m(sse|altivec)/: arr[16]=1; break; case /^-mavx[2]?$/: arr[32]=1;
> break; case /-mavx512/: arr[64]=1; break; } }; for (j in arr) print
> j; }'

There's -Wvector-operation-performance, which will diagnose cases
where GCC decomposes larger vectors into smaller ones or even into
scalar operations.  That might be of some help here as well.
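
As a minimal sketch of a test case for that flag (the names v16f and
add16 are made up here, and the 64-byte width is only an assumption,
chosen to exceed the registers of most targets without AVX-512):

```c
/* Hypothetical 64-byte vector of 16 floats, declared with the
   GCC vector extension.  Wider than SSE, AVX or NEON registers. */
typedef float v16f __attribute__((vector_size(64)));

/* Element-wise add.  Operands are passed by pointer so that no
   wide vector crosses a function-call boundary by value. */
void add16(const v16f *a, const v16f *b, v16f *out)
{
    *out = *a + *b;
}
```

Compiling this with something like
"gcc -O2 -Wvector-operation-performance -c" on a target without
64-byte vector registers should produce a diagnostic on the addition;
the exact wording depends on the GCC version and target.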

> However, I discovered that I have no idea how to detect the NEON
> vector size in this way (or even its presence). There was an answer
> suggesting to check feature test macros. After trying this command:
>
> gcc -march=native -dM -E - </dev/null | less
>
> I discovered that other ISAs, like MMX, SSE and AVX, have similar
> feature test macros, e.g. __MMX__, __SSE2__, __AVX__. This means
> that a simple C header with __GNU_SOURCE would be enough to check for
> each ISA without calling functions from the Target Builtins extension.
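
A header along those lines might look like the sketch below. The names
NATIVE_VEC_BYTES, vfloat and VFLOAT_LANES are hypothetical, not
something GCC provides; the tested macros are the ones GCC predefines
for the respective targets (__ARM_NEON being the ACLE name for NEON).
The widths are register sizes only and say nothing about which element
types are supported at that width.

```c
/* Pick a vector width in bytes from predefined target macros.
   NATIVE_VEC_BYTES is a made-up name for this sketch. */
#if defined(__AVX512F__)
#  define NATIVE_VEC_BYTES 64
#elif defined(__AVX__)
#  define NATIVE_VEC_BYTES 32   /* float/double; 256-bit integer ops need AVX2 */
#elif defined(__SSE2__) || defined(__ARM_NEON) || defined(__ALTIVEC__)
#  define NATIVE_VEC_BYTES 16
#elif defined(__MMX__)
#  define NATIVE_VEC_BYTES 8
#else
#  define NATIVE_VEC_BYTES 16   /* nothing detected: arbitrary fallback,
                                   GCC will lower the operations */
#endif

/* A float vector of the selected width, and its lane count. */
typedef float vfloat __attribute__((vector_size(NATIVE_VEC_BYTES)));
enum { VFLOAT_LANES = NATIVE_VEC_BYTES / sizeof(float) };
```
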
>
> However, that's not the end. Some ISAs have a limited set of element
> types that can be used in vectors. E.g., MMX and 3DNow! don't support
> integer. This may be an issue if an integer implementation of some
> code has better performance than a floating-point one (even with the
> same data width). This necessitates real feature test macros
> representing the data types supported by the available SIMD ISAs.
>
> E.g., for simple vector sizes, it could be done with an array (example):
>
> #define __EXT_VECTOR_SIZEV (int[]){64, 128, 256, 512}
>
> with the array length determined as sizeof(vec) / sizeof(vec[0])
>
> But for an exact check of supported data types, there could be variants:
>
> 1. Using per-type feature test macros: __V8SI16__, __V8UI16__,
> __V8F16__, __V4SI32__, __V4UI32__, __V4F32__, __V2SI64__,
> __V2F64__....
> (I discovered on Wikipedia that some ISAs restrict the underlying int
> size to 32 bits, without 64-bit support).
>
> 2. Extend the array of supported lengths into a 2D matrix of supported
> vector size + underlying element type combinations. This could use a
> NULL-terminated array to mark the end of the real value sequence. The
> first subarray represents vector sizes, while each following subarray
> corresponds to a value from the first. Their elements are int fields
> combining a bit-width value with bit flags indicating whether it's
> float/int and (for int) signed/unsigned.
>
> Though who knows whether complex numbers could eventually appear in
> this list :D . Well, even without this, it could be a tricky
> approach.
>
> 3. There could be a variation of the 2nd way, representing per-type
> vector size lists rather than per-vector-size data types. This could
> be more practical, since algorithms would rather need the available
> vector sizes for the specific data types used inside.
>
> As for relying on vector size subdivision when there is no
> corresponding ISA support - I only got worse performance this way.
> Although I'm not sure that it's not a gcc bug: if 2 subvectors exist
> at the same time, then it could just be too many SIMD registers in
> use. Whereas if they are processed in sequence, this probably should
> not worsen performance (I never tried manual intrinsics code).

In general you'll figure that writing generic vector code is as hard
as autovectorizing scalar code...
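
To illustrate with a rough sketch (v4f and saxpy are names made up for
this example, and the fixed 16-byte width is an assumption): even a
trivial loop needs a hand-picked width, explicit handling of unaligned
data, and a scalar tail once it is written with generic vectors.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte vector of 4 floats. */
typedef float v4f __attribute__((vector_size(16)));

/* y[i] += a * x[i] for i in [0, n). */
void saxpy(float a, const float *x, float *y, size_t n)
{
    const size_t lanes = sizeof(v4f) / sizeof(float);
    size_t i = 0;

    /* Main loop: one full vector per iteration. */
    for (; i + lanes <= n; i += lanes) {
        v4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load without assuming alignment */
        memcpy(&vy, y + i, sizeof vy);
        vy += a * vx;                   /* the scalar a is applied to every lane */
        memcpy(y + i, &vy, sizeof vy);
    }

    /* Scalar tail for the remaining n % lanes elements. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

Choosing 16 here is exactly the kind of decision the vectorizer would
otherwise make per target.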

Richard.