From: Richard Biener
Date: Mon, 9 Jan 2023 08:55:35 +0100
Subject: Re: Limits and feature test macroses for vector extension
To: Nikita Zlobin
Cc: gcc@gcc.gnu.org

On Sun, Jan 1, 2023 at 3:54 PM Nikita Zlobin via Gcc wrote:
>
> The vector extension is great because it allows controllable
> vectorization without dealing with each SIMD ISA separately. When
> properly used, it gives better performance than
> auto-vectorization. However, there is one issue.
>
> While it's possible to check whether the specific SIMD ISAs used as
> backends for the vector extension are supported, there is no similar
> feature for the vector extension itself. The only way to make it
> configurable without manually checking each ISA is to, e.g., add a
> configure parameter --vector-size=, with good enough commentary for
> the user to understand what should be there (to be specified in
> __attribute__((vector_size(bytes))) ).
>
> My first approach was to check whether the config could be
> autodetected, e.g.
> with autoconf, in such a way (not ideal, just for a start):
>
> gcc -march=native -E -v - < /dev/null 2>&1 | awk '
>   BEGIN { arr[0] = 0; delete arr[0] }   # start with an empty array
>   /cc1/ {
>     for (i = 1; i <= NF; i++) {
>       if ($i ~ /-mno-/) continue        # skip disabled features
>       switch ($i) {                     # gawk-only switch statement
>         case /-m(mmx|3dnow|vis)/: arr[8] = 1; break
>         case /-m(sse|altivec)/:   arr[16] = 1; break
>         case /^-mavx[2]?$/:       arr[32] = 1; break
>         case /-mavx512/:          arr[64] = 1; break
>       }
>     }
>     for (j in arr) print j
>   }'

There's -Wvector-operation-performance, which diagnoses cases where
GCC decomposes larger vectors into smaller ones or even into scalar
operations. That might be of some help here as well.

> However, I discovered that I have no idea how to detect the NEON vector
> size this way (or even its presence). There was an answer suggesting
> to check feature test macros. After trying this command:
>
> gcc -march=native -dM -E -
>
> I discovered that other ISAs, like MMX, SSE and AVX, have similar
> feature test macros, e.g. __MMX__, __SSE2__, __AVX__. This means
> that a simple C header with __GNU_SOURCE would be enough to check for
> each ISA without calling functions from the Target Builtins extension.
>
> However, that is not the end. Some ISAs support only a limited set of
> element types in vectors. E.g., MMX and 3DNow! don't support integer.
> This may be an issue if an integer implementation of some code performs
> better than a floating-point one (even with the same data
> width). This necessitates real feature test macros representing the
> data types supported by each supported SIMD ISA.
>
> E.g., for simple vector sizes, it could be done with an array (example):
>
> #define __EXT_VECTOR_SIZEV (int[]){64, 128, 256, 512}
>
> with the array length determined as sizeof(vec) / sizeof(vec[0])
>
> But for an exact check of supported data types, there could be variants:
>
> 1. Using per-type feature test macros: __V8SI16__, __V8UI16__,
> __V8F16__, __V4SI32__, __V4UI32__, __V4F32__, __V2SI64__,
> __V2F64__....
> (I discovered on Wikipedia that some ISAs restrict the underlying int
> size to 32 bits, without 64-bit support).
>
> 2. Extend the array of supported lengths into a 2D matrix of supported
> vector size + underlying element type combinations. This could use a
> NULL-terminated array to mark the end of the real value sequence. The
> first subarray represents vector sizes, while each following subarray
> corresponds to a value from the first. Their elements are int fields
> combining a bit-width value with bit flags indicating whether it's
> float/int and (for int) signed/unsigned.
>
> Though who knows whether complex numbers could eventually get a chance
> to appear in this list :D . Well, even without this, it could be a
> tricky approach.
>
> 3. There could be a variation of the 2nd way, representing per-type
> vector size lists rather than per-vector-size data types. This could be
> more practical, since algorithms would rather need the available vector
> sizes for the specific data types used inside.
>
> As for relying on vector size subdivision when there is no
> corresponding ISA support: I got only worse performance this way.
> Although I'm not sure that it's not a gcc bug: if 2 subvectors exist
> at the same time, then it could be that just too many SIMD registers
> are used. While if they are processed in sequence, this probably
> should not worsen performance (I never tried hand-written intrinsics).

In general you'll figure that writing generic vector code is as hard
as autovectorizing scalar code...

Richard.
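
For illustration, the feature-test-macro idea from the quoted message could be sketched in C roughly as below. This is only a sketch under stated assumptions: the mapping from ISA macros to widths is a guess and far from exhaustive, and the names VEC_BYTES, VEC_LANES, vfloat and vec_sum are hypothetical, not anything GCC provides. Only the predefined ISA macros and the vector_size attribute come from GCC itself.

```c
#include <stddef.h>

/* Widest vector width in bytes, guessed from the ISA macros GCC
   predefines for the current -march/-m flags.  This mapping is an
   assumption made for illustration, not something GCC exposes. */
#if defined(__AVX512F__)
#  define VEC_BYTES 64
#elif defined(__AVX__)
#  define VEC_BYTES 32
#elif defined(__SSE2__) || defined(__ARM_NEON) || defined(__ALTIVEC__)
#  define VEC_BYTES 16
#else
#  define VEC_BYTES 16   /* no known SIMD ISA: GCC lowers the operations */
#endif

/* Generic vector of floats using the GCC vector extension. */
typedef float vfloat __attribute__((vector_size(VEC_BYTES)));
#define VEC_LANES (VEC_BYTES / sizeof(float))

/* Sum x[0..n-1]: vector adds for the bulk, scalar loop for the tail. */
static float vec_sum(const float *x, size_t n)
{
    vfloat acc = {0};                 /* all lanes start at zero */
    size_t i = 0;

    for (; i + VEC_LANES <= n; i += VEC_LANES) {
        vfloat v;
        __builtin_memcpy(&v, x + i, sizeof v);   /* alignment-safe load */
        acc += v;
    }

    float total = 0.0f;
    for (size_t lane = 0; lane < VEC_LANES; lane++)
        total += acc[lane];           /* horizontal sum of the lanes */
    for (; i < n; i++)
        total += x[i];                /* leftover elements */
    return total;
}
```

Building such code with -O2 -march=native -Wvector-operation-performance should make GCC report any place where the chosen width had to be split into smaller vectors or scalar operations, which is the situation described above as giving worse performance.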