Date: Fri, 10 Mar 2023 07:46:52 +0000 (UTC)
From: Richard Biener
To: Hongtao Liu
cc: Tamar Christina, gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com, richard.sandiford@arm.com
Subject: Re: [PATCH]middle-end: don't form FMAs when multiplication is not single use. [PR108583]

On Fri, 10 Mar 2023, Hongtao Liu wrote:

> On Fri, Mar 10, 2023 at 3:37 AM Tamar Christina via Gcc-patches
> wrote:
> >
> > Hi All,
> >
> > The testcase
> >
> > typedef unsigned int vec __attribute__((vector_size(32)));
> > vec
> > f3 (vec a, vec b, vec c)
> > {
> >   vec d = a * b;
> >   return d + ((c + d) >> 1);
> > }
> >
> > shows a case where we don't want to form an FMA due to the MUL not being single
> > use.  In this case to form an FMA we have to redo the MUL as well as we no
> > longer have it to share.
> >
> > As such making an FMA here would be a de-optimization.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >         PR target/108583
> >         * tree-ssa-math-opts.cc (convert_mult_to_fma): Inhibit FMA in case not
> >         single use.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         PR target/108583
> >         * gcc.dg/mla_1.c: New test.
> >
> > Co-Authored-By: Richard Sandiford
> >
> > --- inline copy of patch --
> > diff --git a/gcc/testsuite/gcc.dg/mla_1.c b/gcc/testsuite/gcc.dg/mla_1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..a92ecf248116d89b1bc4207a907ea5ed95728a28
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/mla_1.c
> > @@ -0,0 +1,40 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target vect_int } */
> > +/* { dg-options "-O2 -msve-vector-bits=256 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > +
> > +unsigned int
> > +f1 (unsigned int a, unsigned int b, unsigned int c) {
> > +  unsigned int d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +unsigned int
> > +g1 (unsigned int a, unsigned int b, unsigned int c) {
> > +  return a * b + c;
> > +}
> > +
> > +__Uint32x4_t
> > +f2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> > +  __Uint32x4_t d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +__Uint32x4_t
> > +g2 (__Uint32x4_t a, __Uint32x4_t b, __Uint32x4_t c) {
> > +  return a * b + c;
> > +}
> > +
> > +typedef unsigned int vec __attribute__((vector_size(32))); vec
> > +f3 (vec a, vec b, vec c)
> > +{
> > +  vec d = a * b;
> > +  return d + ((c + d) >> 1);
> > +}
> > +
> > +vec
> > +g3 (vec a, vec b, vec c)
> > +{
> > +  return a * b + c;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times {\.FMA } 1 "optimized" { target aarch64*-*-* } } } */
> > diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> > index 5ab5b944a573ad24ce8427aff24fc5215bf05dac..26ed91d58fa4709a67c903ad446d267a3113c172 100644
> > --- a/gcc/tree-ssa-math-opts.cc
> > +++ b/gcc/tree-ssa-math-opts.cc
> > @@ -3346,6 +3346,20 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2,
> >                                     param_avoid_fma_max_bits));
> >    bool defer = check_defer;
> >    bool seen_negate_p = false;
> > +
> > +  /* There is no numerical difference between fused and unfused integer FMAs,
> > +     and the assumption below that FMA is as cheap as addition is unlikely
> > +     to be true, especially if the multiplication occurs multiple times on
> > +     the same chain.  E.g., for something like:
> > +
> > +        (((a * b) + c) >> 1) + (a * b)
> > +
> > +     we do not want to duplicate the a * b into two additions, not least
> > +     because the result is not a natural FMA chain.  */
> > +  if (ANY_INTEGRAL_TYPE_P (type)
> > +      && !has_single_use (mul_result))
>
> What about floating point?

I think for a case like above, i.e. ((a * b) + c) + (a * b), it's
profitable to handle this as fma (a, b, fma (a, b, c)) as this
saves one add and has one op less latency?  For the case where
the second use is not part of the dependence chain it's less obvious,
but since FMA is usually not (very much more) expensive than an add,
erring on the optimization side didn't look wrong (IIRC the
FMA forming analysis isn't "global", aka counts untransformed mults
left in the end).

Richard.

> > +    return false;
> > +
> >    /* Make sure that the multiplication statement becomes dead after
> >       the transformation, thus that all uses are transformed to FMAs.
> >       This means we assume that an FMA operation has the same cost
> >
> > --

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
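
For reference, the integer trade-off discussed in this thread can be sketched in plain C.  This is only an illustration under stated assumptions: the names shared_mul, mla and duplicated_mul are hypothetical and appear nowhere in the patch, mla merely stands in for one integer multiply-add instruction, and the sketch says nothing about what GCC actually emits for either shape.

```c
/* Sketch only: hypothetical helper names, not taken from the patch.  */

/* Shape of the testcase as written: the product is computed once
   and used twice (one multiply, two adds, one shift).  */
unsigned int
shared_mul (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int d = a * b;   /* single multiply, two uses */
  return d + ((c + d) >> 1);
}

/* Stand-in for one integer multiply-add operation.  */
static unsigned int
mla (unsigned int a, unsigned int b, unsigned int acc)
{
  return a * b + acc;
}

/* Shape after folding every use of a * b into a multiply-add:
   the multiplication is now performed twice.  */
unsigned int
duplicated_mul (unsigned int a, unsigned int b, unsigned int c)
{
  unsigned int t = mla (a, b, c) >> 1;   /* first a * b */
  return mla (a, b, t);                  /* second a * b */
}
```

Both functions return the same value for every input, since unsigned arithmetic wraps identically in either form; the only difference is that the second shape redoes the multiplication instead of sharing it.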