From: Andrew Pinski
Date: Fri, 17 Jun 2022 13:33:39 -0700
Subject: Re: [PATCH] middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
To: Tamar Christina
Cc: GCC Patches, nd, Richard Guenther

On Thu, Jun 16, 2022 at 3:59 AM Tamar Christina via Gcc-patches wrote:
>
> Hi All,
>
> For IEEE 754 floating point formats we can replace a sequence of
> alternating +/- with an fneg of a wider type followed by an fadd.
> This eliminates the need for a permutation.  This patch adds a
> match.pd rule to recognize and do this rewriting.

I don't think this is correct.  You don't check the format of the
floating point type to make sure the transform is valid (e.g. the
signbit_rw/signbit_ro fields of REAL_MODE_FORMAT).

Also, wouldn't it be better to do the xor in an integer mode (using
the signbit_rw field for the correct bit), and then make sure the
target optimizes the xor to the neg instruction when needed?
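
Something along these lines is the shape I mean.  This is an untested
sketch only: signbit_rw/signbit_ro are the real_format fields from
real.h, elem_mode is the variable from your patch, and how exactly it
folds into the match.pd pattern is up to you:

  /* Sketch: reject floating point formats where the sign bit position
     is unknown, or where reading and writing the sign bit disagree,
     before doing the transform.  */
  const struct real_format *fmt = REAL_MODE_FORMAT (elem_mode);
  if (!fmt
      || fmt->signbit_rw < 0
      || fmt->signbit_rw != fmt->signbit_ro)
    return NULL_TREE;

  /* The negate itself could then be expressed as a bit_xor of
     (HOST_WIDE_INT_1U << fmt->signbit_rw) on the lanes being negated,
     done in the corresponding integer vector type, with the target
     folding that xor back into fneg where that is the better choice.  */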
Thanks,
Andrew Pinski

> For
>
> void f (float *restrict a, float *restrict b, float *res, int n)
> {
>    for (int i = 0; i < (n & -4); i+=2)
>     {
>       res[i+0] = a[i+0] + b[i+0];
>       res[i+1] = a[i+1] - b[i+1];
>     }
> }
>
> we generate:
>
> .L3:
>         ldr     q1, [x1, x3]
>         ldr     q0, [x0, x3]
>         fneg    v1.2d, v1.2d
>         fadd    v0.4s, v0.4s, v1.4s
>         str     q0, [x2, x3]
>         add     x3, x3, 16
>         cmp     x3, x4
>         bne     .L3
>
> now instead of:
>
> .L3:
>         ldr     q1, [x0, x3]
>         ldr     q2, [x1, x3]
>         fadd    v0.4s, v1.4s, v2.4s
>         fsub    v1.4s, v1.4s, v2.4s
>         tbl     v0.16b, {v0.16b - v1.16b}, v3.16b
>         str     q0, [x2, x3]
>         add     x3, x3, 16
>         cmp     x3, x4
>         bne     .L3
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Thanks to George Steed for the idea.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * match.pd: Add fneg/fadd rule.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/simd/addsub_1.c: New test.
>         * gcc.target/aarch64/sve/addsub_1.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51b0a1b562409af535e53828a10c30b8a3e1ae2e..af1c98d4a2831f38258d6fc1bbe811c8ee6c7c6e 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7612,6 +7612,49 @@ and,
>   (simplify (reduc (op @0 VECTOR_CST@1))
>     (op (reduc:type @0) (reduc:type @1))))
>
> +/* Simplify vector floating point operations of alternating sub/add pairs
> +   into using an fneg of a wider element type followed by a normal add.
> +   under IEEE 754 the fneg of the wider type will negate every even entry
> +   and when doing an add we get a sub of the even and add of every odd
> +   elements.  */
> +(simplify
> + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)
> +  (with
> +   {
> +     /* Build a vector of integers from the tree mask.  */
> +     vec_perm_builder builder;
> +     if (!tree_to_vec_perm_builder (&builder, @2))
> +       return NULL_TREE;
> +
> +     /* Create a vec_perm_indices for the integer vector.  */
> +     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> +     vec_perm_indices sel (builder, 2, nelts);
> +   }
> +   (if (sel.series_p (0, 2, 0, 2))
> +    (with
> +     {
> +       machine_mode vec_mode = TYPE_MODE (type);
> +       auto elem_mode = GET_MODE_INNER (vec_mode);
> +       auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> +       tree stype;
> +       switch (elem_mode)
> +         {
> +         case E_HFmode:
> +           stype = float_type_node;
> +           break;
> +         case E_SFmode:
> +           stype = double_type_node;
> +           break;
> +         default:
> +           return NULL_TREE;
> +         }
> +       tree ntype = build_vector_type (stype, nunits);
> +       if (!ntype)
> +         return NULL_TREE;
> +     }
> +     (plus (view_convert:type (negate (view_convert:ntype @1))) @0))))))
> +
>  (simplify
>   (vec_perm @0 @1 VECTOR_CST@2)
>   (with
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-add-options arm_v8_2a_fp16_neon } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +/*
> +** f1:
> +** ...
> +**      fneg    v[0-9]+.2d, v[0-9]+.2d
> +**      fadd    v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +** ...
> +*/
> +void f1 (float *restrict a, float *restrict b, float *res, int n)
> +{
> +   for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** d1:
> +** ...
> +**      fneg    v[0-9]+.4s, v[0-9]+.4s
> +**      fadd    v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> +** ...
> +*/
> +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> +{
> +   for (int i = 0; i < (n & -8); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** e1:
> +** ...
> +**      fadd    v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> +**      fsub    v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> +**      ins     v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> +** ...
> +*/
> +void e1 (double *restrict a, double *restrict b, double *res, int n)
> +{
> +   for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a314a29b7a7a922
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** f1:
> +** ...
> +**      fneg    z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> +**      fadd    z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> +** ...
> +*/
> +void f1 (float *restrict a, float *restrict b, float *res, int n)
> +{
> +   for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** d1:
> +** ...
> +**      fneg    z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> +**      fadd    z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> +** ...
> +*/
> +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> +{
> +   for (int i = 0; i < (n & -8); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** e1:
> +** ...
> +**      fsub    z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> +**      movprfx z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> +**      fadd    z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> +** ...
> +*/
> +void e1 (double *restrict a, double *restrict b, double *res, int n)
> +{
> +   for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
>
>
>
> --
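
P.S. A tiny self-contained demonstration (mine, not part of the patch)
of the lane behaviour the rule depends on, and of why it is gated on
!BYTES_BIG_ENDIAN: on a little-endian target the sign bit of the wide
double sits in the high-addressed half of each float pair, so negating
the wide view flips the sign of the second float only, which is exactly
the lane the loop subtracts:

#include <cstdio>
#include <cstring>

int main ()
{
  float f[2] = { 1.0f, 2.0f };
  double d;
  std::memcpy (&d, f, sizeof d);  /* VIEW_CONVERT float[2] -> double.  */
  d = -d;                         /* The fneg of the wider type: flips
                                     only the double's sign bit.  */
  std::memcpy (f, &d, sizeof d);  /* VIEW_CONVERT back to float[2].  */
  /* On little-endian this prints: 1.000000 -2.000000  */
  std::printf ("%f %f\n", f[0], f[1]);
  return 0;
}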