Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
From: Richard Biener
To: "Thomas Preud'homme"
Cc: Jakub Jelinek, Yuri Rumyantsev, GCC Patches
Subject: Re: [PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations
Date: Fri, 06 Mar 2015 12:13:00 -0000
In-Reply-To: <000401d057f8$c2205870$46610950$@arm.com>
X-SW-Source: 2015-03/txt/msg00357.txt.bz2

On Fri, Mar 6, 2015 at 11:31 AM, Thomas Preud'homme wrote:
> Hi,
>
> Improved canonicalization after r216728 causes GCC to more often generate
> poor code due to suboptimal ordering of the operands of commutative
> libcalls.
> Indeed, if one of the operands of a commutative operation is the result
> of a previous operation, both being implemented by libcall, the wrong
> ordering of the operands in the second operation can lead to an extra mov.
> Consider the following case on softfloat targets:
>
> double
> test1 (double x, double y)
> {
>   return x * (x + y);
> }
>
> If x + y is put in the operand using the same registers as the result of
> the libcall for x + y, then no mov is generated; otherwise movs are
> needed. The following happens on arm softfloat with the right ordering:
>
>         bl      __aeabi_dadd
>         ldr     r2, [sp]
>         ldr     r3, [sp, #4]
>         /* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
>         bl      __aeabi_dmul
>
> With the wrong ordering one gets:
>
>         bl      __aeabi_dadd
>         mov     r2, r0
>         mov     r3, r1
>         ldr     r0, [sp]
>         ldr     r1, [sp, #4]
>         bl      __aeabi_dmul
>
> This patch extends the patch written by Yuri Rumyantsev in r219646 to
> also deal with the case of only one of the operands being the result of
> an operation. ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme
>
>         PR tree-optimization/63743
>         * cfgexpand.c (reorder_operands): Also reorder if only second operand
>         had its definition forwarded by TER.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme
>
>         PR tree-optimization/63743
>         * gcc.dg/pr63743.c: New test.
>
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 7dfe1f6..4fbc037 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
>          continue;
>        /* Swap operands if the second one is more expensive.  */
>        def0 = get_gimple_for_ssa_name (op0);
> -      if (!def0)
> -        continue;
>        def1 = get_gimple_for_ssa_name (op1);
>        if (!def1)
>          continue;
>        swap = false;
> -      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
> +      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
>          swap = true;
>        if (swap)
>          {
> @@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
>                fprintf (dump_file, "Swap operands in stmt:\n");
>                print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>                fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
> -                      lattice[gimple_uid (def0)],
> +                      def0 ? lattice[gimple_uid (def0)] : 0,
>                        lattice[gimple_uid (def1)]);
>            }
>          swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
> diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
> new file mode 100644
> index 0000000..87254ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr63743.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-rtl-expand-details" } */
> +
> +double
> +libcall_dep (double x, double y)
> +{
> +  return x * (x + y);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
>
>
> The testsuite was run in QEMU when compiled by an arm-none-eabi GCC
> cross-compiler targeting Cortex-M3, and with a bootstrapped native x86_64
> GCC, without any regression.
> CSiBE sees a 0.5034% code size decrease on arm-none-eabi and a 0.0058%
> code size increase on x86_64-linux-gnu.
>
> Is it ok for trunk (since it fixes a code size regression in 5.0)?

Ok.

Thanks,
Richard.

> Best regards,
>
> Thomas