Date: Fri, 23 Sep 2022 12:54:21 +0000 (UTC)
From: Richard Biener
To: Tamar Christina
Cc: Richard Sandiford, Tamar Christina via Gcc-patches, nd, juzhe.zhong@rivai.ai
Subject: RE: [PATCH] middle-end Add optimized float addsub without needing VEC_PERM_EXPR.
References: <1C4185AB-6EE6-4B8B-838C-465098DAFD3B@suse.de> <997q6no-qsqp-1oro-52sp-899sr075p4po@fhfr.qr>

On Fri, 23 Sep 2022, Tamar Christina wrote:

> Hi,
>
> Attached is the respun version of the patch.
>
> > >>
> > >> Wouldn't a target need to re-check if lanes are NaN or denormal if
> > >> after a SFmode lane operation a DFmode lane operation follows?  IIRC
> > >> that is what usually makes punning "integer" vectors as FP vectors costly.
>
> I don't believe this is a problem, because NaNs are not a single value and,
> according to the standard, the sign bit doesn't change the meaning of a NaN.
>
> That's why, specifically for negates, generally no check is performed and it's
> assumed that if a value is a NaN going in, it's a NaN coming out, and this
> optimization doesn't change that.  Also, under fast-math we don't guarantee
> a stable representation for NaN (or zeros, etc.) afaik.
>
> So if that is still a concern I could add && !HONOR_NANS () to the constraints.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.
>
> Ok for master?
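
A minimal, standalone illustration of the sign-bit argument above (not part of
the patch; it assumes little-endian lane order and IEEE 754 binary32/binary64
layout, and every name in it is made up for the example):

  #include <stdio.h>
  #include <string.h>
  #include <stdint.h>

  int main (void)
  {
    /* Two float lanes viewed as one double-sized chunk, as the transform does.  */
    float lanes[2] = { 1.0f, 2.0f };
    uint64_t bits;
    memcpy (&bits, lanes, sizeof bits);

    /* A negate of the wider 64-bit element only flips its top bit...  */
    bits ^= (uint64_t) 1 << 63;

    /* ...which on little-endian is the sign bit of the odd float lane, so a
       following vector add yields a+b in the even lane and a-b in the odd lane.
       A NaN in either lane stays a NaN; only its sign bit can change.  */
    memcpy (lanes, &bits, sizeof bits);
    printf ("%f %f\n", lanes[0], lanes[1]);  /* prints 1.000000 -2.000000 */
    return 0;
  }
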
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* match.pd: Add fneg/fadd rule.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/aarch64/simd/addsub_1.c: New test.
> 	* gcc.target/aarch64/sve/addsub_1.c: New test.
> 
> --- inline version of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 1bb936fc4010f98f24bb97671350e8432c55b347..2617d56091dfbd41ae49f980ee0af3757f5ec1cf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7916,6 +7916,59 @@ and,
>   (simplify (reduc (op @0 VECTOR_CST@1))
>     (op (reduc:type @0) (reduc:type @1))))
> 
> +/* Simplify vector floating point operations of alternating sub/add pairs
> +   into using an fneg of a wider element type followed by a normal add.
> +   Under IEEE 754 the fneg of the wider type will negate every even entry
> +   and when doing an add we get a sub of the even and an add of every odd
> +   element.  */
> +(simplify
> + (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
> + (if (!VECTOR_INTEGER_TYPE_P (type) && !BYTES_BIG_ENDIAN)

Shouldn't this be FLOAT_WORDS_BIG_ENDIAN instead?

I'm still concerned what (neg:V2DF (subreg:V2DF (reg:V4SF) 0)) means for
architectures like RISC-V.  Can one "reformat" FP values in vector
registers so that two floats overlap a double (and then back)?

I suppose you rely on target_can_change_mode_class to tell you that.

> +  (with
> +   {
> +     /* Build a vector of integers from the tree mask.  */
> +     vec_perm_builder builder;
> +     if (!tree_to_vec_perm_builder (&builder, @2))
> +       return NULL_TREE;
> +
> +     /* Create a vec_perm_indices for the integer vector.  */
> +     poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
> +     vec_perm_indices sel (builder, 2, nelts);
> +   }
> +   (if (sel.series_p (0, 2, 0, 2))
> +    (with
> +     {
> +       machine_mode vec_mode = TYPE_MODE (type);
> +       auto elem_mode = GET_MODE_INNER (vec_mode);
> +       auto nunits = exact_div (GET_MODE_NUNITS (vec_mode), 2);
> +       tree stype;
> +       switch (elem_mode)
> +         {
> +         case E_HFmode:
> +           stype = float_type_node;
> +           break;
> +         case E_SFmode:
> +           stype = double_type_node;
> +           break;
> +         default:
> +           return NULL_TREE;
> +         }

Can't you use GET_MODE_WIDER_MODE and double-check that the mode size doubles?
I mean you obviously miss DFmode -> TFmode.

> +       tree ntype = build_vector_type (stype, nunits);
> +       if (!ntype)

You want to check that the above results in a vector mode.

> +         return NULL_TREE;
> +
> +       /* The format has to have a simple sign bit.  */
> +       const struct real_format *fmt = FLOAT_MODE_FORMAT (vec_mode);
> +       if (fmt == NULL)
> +         return NULL_TREE;
> +     }
> +     (if (fmt->signbit_rw == GET_MODE_UNIT_BITSIZE (vec_mode) - 1

Shouldn't this be a check on the component mode?  I think you'd want to
check that the bigger format's signbit_rw is equal to the smaller format's
mode size plus its signbit_rw, or so?
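
For concreteness, an untested sketch of the checks suggested above, written
against the (with { ... }) block of the patch; the exact interfaces used here
(GET_MODE_WIDER_MODE, REAL_MODE_FORMAT, VECTOR_MODE_P) and the field names
should be re-verified against the current tree rather than taken as a drop-in
replacement:

  /* Untested sketch only: derive the wider element mode instead of
     hard-coding HFmode/SFmode, check that its size really doubles,
     check the sign-bit layout of both formats, and make sure the new
     vector type actually has a vector mode.  */
  scalar_float_mode elem_mode, wide_mode;
  if (!is_a <scalar_float_mode> (GET_MODE_INNER (vec_mode), &elem_mode)
      || !GET_MODE_WIDER_MODE (elem_mode).exists (&wide_mode)
      || GET_MODE_BITSIZE (wide_mode) != 2 * GET_MODE_BITSIZE (elem_mode))
    return NULL_TREE;

  /* The wider format's sign bit has to sit directly on top of the odd
     narrower lane, i.e. at element bitsize + element sign-bit position,
     and be the same for read and write.  */
  const real_format *efmt = REAL_MODE_FORMAT (elem_mode);
  const real_format *wfmt = REAL_MODE_FORMAT (wide_mode);
  if (!efmt || !wfmt
      || wfmt->signbit_rw != wfmt->signbit_ro
      || wfmt->signbit_rw != (int) GET_MODE_BITSIZE (elem_mode) + efmt->signbit_rw)
    return NULL_TREE;

  /* With stype/nunits chosen as before, reject the case where the wider
     vector type does not map to a vector mode on this target.  */
  tree ntype = build_vector_type (stype, nunits);
  if (!ntype || !VECTOR_MODE_P (TYPE_MODE (ntype)))
    return NULL_TREE;
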
> +          && fmt->signbit_rw == fmt->signbit_ro
> +          && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), ALL_REGS)
> +          && (optimize_vectors_before_lowering_p ()
> +              || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
> +      (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))))))
> +
>  (simplify
>   (vec_perm @0 @1 VECTOR_CST@2)
>   (with
> diff --git a/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..1fb91a34c421bbd2894faa0dbbf1b47ad43310c4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/simd/addsub_1.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-add-options arm_v8_2a_fp16_neon } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#pragma GCC target "+nosve"
> +
> +/*
> +** f1:
> +** ...
> +**   fneg   v[0-9]+.2d, v[0-9]+.2d
> +**   fadd   v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
> +** ...
> +*/
> +void f1 (float *restrict a, float *restrict b, float *res, int n)
> +{
> +  for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** d1:
> +** ...
> +**   fneg   v[0-9]+.4s, v[0-9]+.4s
> +**   fadd   v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
> +** ...
> +*/
> +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> +{
> +  for (int i = 0; i < (n & -8); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** e1:
> +** ...
> +**   fadd   v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> +**   fsub   v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
> +**   ins    v[0-9]+.d\[1\], v[0-9]+.d\[1\]
> +** ...
> +*/
> +void e1 (double *restrict a, double *restrict b, double *res, int n)
> +{
> +  for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..ea7f9d9db2c8c9a3efe5c7951a314a29b7a7a922
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/addsub_1.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +/*
> +** f1:
> +** ...
> +**   fneg   z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> +**   fadd   z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
> +** ...
> +*/
> +void f1 (float *restrict a, float *restrict b, float *res, int n)
> +{
> +  for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** d1:
> +** ...
> +**   fneg   z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
> +**   fadd   z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
> +** ...
> +*/
> +void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
> +{
> +  for (int i = 0; i < (n & -8); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> +
> +/*
> +** e1:
> +** ...
> +**   fsub   z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
> +**   movprfx z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
> +**   fadd   z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
> +** ...
> +*/
> +void e1 (double *restrict a, double *restrict b, double *res, int n)
> +{
> +  for (int i = 0; i < (n & -4); i+=2)
> +    {
> +      res[i+0] = a[i+0] + b[i+0];
> +      res[i+1] = a[i+1] - b[i+1];
> +    }
> +}
> 

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)