[PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations
@ 2015-03-06 10:32 Thomas Preud'homme
  2015-03-06 12:13 ` Richard Biener
  0 siblings, 1 reply; 2+ messages in thread
From: Thomas Preud'homme @ 2015-03-06 10:32 UTC (permalink / raw)
  To: Jakub Jelinek, 'Yuri Rumyantsev', gcc-patches

Hi,

Improved canonization after r216728 causes GCC to more often generate poor code due to suboptimal ordering of operand of commutative libcalls. Indeed, if one of the operands of a commutative operation is the result of a previous operation, both being implemented by libcall, the wrong ordering of the operands in the second operation can lead to extra mov. Consider the following case on softfloat targets:

double
test1 (double x, double y)
{
  return x * (x + y);
}

If x + y is put in the operand using the same register as the result of the libcall for x + y then no mov is generated, otherwise mov is needed. The following happens on arm softfloat with the right ordering:

bl      __aeabi_dadd
ldr     r2, [sp]
ldr     r3, [sp, #4]
/* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
bl      __aeabi_dmul

With the wrong ordering one gets:

bl      __aeabi_dadd
mov     r2, r0
mov     r3, r1
ldr     r0, [sp]
ldr     r1, [sp, #4]
bl      __aeabi_dmul

This patch extend the patch written by Yuri Rumyantsev in r219646 to also deal with the case of only one of the operand being the result of an operation. ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63743
        * cfgexpand.c (reorder_operands): Also reorder if only second operand
        had its definition forwarded by TER.

*** gcc/testsuite/ChangeLog ***

2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>

        PR tree-optimization/63743
        * gcc.dg/pr63743.c: New test.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 7dfe1f6..4fbc037 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
 	continue;
       /* Swap operands if the second one is more expensive.  */
       def0 = get_gimple_for_ssa_name (op0);
-      if (!def0)
-	continue;
       def1 = get_gimple_for_ssa_name (op1);
       if (!def1)
 	continue;
       swap = false;
-      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
+      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
 	swap = true;
       if (swap)
 	{
@@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
 	      fprintf (dump_file, "Swap operands in stmt:\n");
 	      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 	      fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
-		       lattice[gimple_uid (def0)],
+		       def0 ? lattice[gimple_uid (def0)] : 0,
 		       lattice[gimple_uid (def1)]);
 	    }
 	  swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
new file mode 100644
index 0000000..87254ed
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr63743.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-rtl-expand-details" } */
+
+double
+libcall_dep (double x, double y)
+{
+  return x * (x + y);
+}
+
+/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */

Testsuite was run in QEMU when compiled by an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC without any regression. 
CSiBE sees a -0.5034% code size decrease on arm-none-eabi and a 0.0058% code size increase on x86_64-linux-gnu.

Is it ok for trunk (since it fixes a code size regression in 5.0)?

Best regards,

Thomas

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations
  2015-03-06 10:32 [PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations Thomas Preud'homme
@ 2015-03-06 12:13 ` Richard Biener
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Biener @ 2015-03-06 12:13 UTC (permalink / raw)
  To: Thomas Preud'homme; +Cc: Jakub Jelinek, Yuri Rumyantsev, GCC Patches

On Fri, Mar 6, 2015 at 11:31 AM, Thomas Preud'homme
<thomas.preudhomme@arm.com> wrote:
> Hi,
>
> Improved canonization after r216728 causes GCC to more often generate poor code due to suboptimal ordering of operand of commutative libcalls. Indeed, if one of the operands of a commutative operation is the result of a previous operation, both being implemented by libcall, the wrong ordering of the operands in the second operation can lead to extra mov. Consider the following case on softfloat targets:
>
> double
> test1 (double x, double y)
> {
>   return x * (x + y);
> }
>
> If x + y is put in the operand using the same register as the result of the libcall for x + y then no mov is generated, otherwise mov is needed. The following happens on arm softfloat with the right ordering:
>
> bl      __aeabi_dadd
> ldr     r2, [sp]
> ldr     r3, [sp, #4]
> /* r0, r1 are reused from the return values of the __aeabi_dadd libcall. */
> bl      __aeabi_dmul
>
> With the wrong ordering one gets:
>
> bl      __aeabi_dadd
> mov     r2, r0
> mov     r3, r1
> ldr     r0, [sp]
> ldr     r1, [sp, #4]
> bl      __aeabi_dmul
>
> This patch extend the patch written by Yuri Rumyantsev in r219646 to also deal with the case of only one of the operand being the result of an operation. ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/63743
>         * cfgexpand.c (reorder_operands): Also reorder if only second operand
>         had its definition forwarded by TER.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2015-03-05  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>
>         PR tree-optimization/63743
>         * gcc.dg/pr63743.c: New test.
>
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 7dfe1f6..4fbc037 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -5117,13 +5117,11 @@ reorder_operands (basic_block bb)
>         continue;
>        /* Swap operands if the second one is more expensive.  */
>        def0 = get_gimple_for_ssa_name (op0);
> -      if (!def0)
> -       continue;
>        def1 = get_gimple_for_ssa_name (op1);
>        if (!def1)
>         continue;
>        swap = false;
> -      if (lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
> +      if (!def0 || lattice[gimple_uid (def1)] > lattice[gimple_uid (def0)])
>         swap = true;
>        if (swap)
>         {
> @@ -5132,7 +5130,7 @@ reorder_operands (basic_block bb)
>               fprintf (dump_file, "Swap operands in stmt:\n");
>               print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>               fprintf (dump_file, "Cost left opnd=%d, right opnd=%d\n",
> -                      lattice[gimple_uid (def0)],
> +                      def0 ? lattice[gimple_uid (def0)] : 0,
>                        lattice[gimple_uid (def1)]);
>             }
>           swap_ssa_operands (stmt, gimple_assign_rhs1_ptr (stmt),
> diff --git a/gcc/testsuite/gcc.dg/pr63743.c b/gcc/testsuite/gcc.dg/pr63743.c
> new file mode 100644
> index 0000000..87254ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr63743.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-rtl-expand-details" } */
> +
> +double
> +libcall_dep (double x, double y)
> +{
> +  return x * (x + y);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "Swap operands" 1 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
>
>
> Testsuite was run in QEMU when compiled by an arm-none-eabi GCC cross-compiler targeting Cortex-M3 and a bootstrapped x86_64 native GCC without any regression.
> CSiBE sees a -0.5034% code size decrease on arm-none-eabi and a 0.0058% code size increase on x86_64-linux-gnu.
>
> Is it ok for trunk (since it fixes a code size regression in 5.0)?

Ok.

Thanks,
Richard.

> Best regards,
>
> Thomas
>
>
>

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2015-03-06 12:13 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-06 10:32 [PATCH] Fix PR63743: Incorrect ordering of operands in sequence of commutative operations Thomas Preud'homme
2015-03-06 12:13 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).