public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] PR rtl-optimization/111267: Improved forward propagation.
@ 2024-01-22  7:05 juzhe.zhong
  2024-01-22  7:14 ` Andrew Pinski
  0 siblings, 1 reply; 5+ messages in thread
From: juzhe.zhong @ 2024-01-22  7:05 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kito.cheng, kito.cheng, jeffreyalaw, vineetg, Robin Dapp, palmer,
	Patrick O'Neill

[-- Attachment #1: Type: text/plain, Size: 4134 bytes --]

Hi, this patch causes these regressions (all ICEs) in RISC-V backend:

FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors) 



juzhe.zhong@rivai.ai

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
  2024-01-22  7:05 [PATCH] PR rtl-optimization/111267: Improved forward propagation juzhe.zhong
@ 2024-01-22  7:14 ` Andrew Pinski
  2024-01-22  7:45   ` juzhe.zhong
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Pinski @ 2024-01-22  7:14 UTC (permalink / raw)
  To: juzhe.zhong
  Cc: gcc-patches, Kito.cheng, kito.cheng, jeffreyalaw, vineetg,
	Robin Dapp, palmer, Patrick O'Neill

On Sun, Jan 21, 2024 at 11:06 PM juzhe.zhong@rivai.ai
<juzhe.zhong@rivai.ai> wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:

Note this is related to the fix for PR 109092.

But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.

What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...

Thanks,
Andrew Pinski

>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)
>
> ________________________________
> juzhe.zhong@rivai.ai

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
  2024-01-22  7:14 ` Andrew Pinski
@ 2024-01-22  7:45   ` juzhe.zhong
  0 siblings, 0 replies; 5+ messages in thread
From: juzhe.zhong @ 2024-01-22  7:45 UTC (permalink / raw)
  To: pinskia
  Cc: gcc-patches, Kito.cheng, kito.cheng, jeffreyalaw, vineetg,
	Robin Dapp, palmer, Patrick O'Neill

[-- Attachment #1: Type: text/plain, Size: 5226 bytes --]

OK I guess change register_operand into REG_P should fix those ICEs.

I will have a try and send a patch. 

Thanks.



juzhe.zhong@rivai.ai
 
From: Andrew Pinski
Date: 2024-01-22 15:14
To: juzhe.zhong@rivai.ai
CC: gcc-patches; Kito.cheng; kito.cheng; jeffreyalaw; vineetg; Robin Dapp; palmer; Patrick O'Neill
Subject: Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
On Sun, Jan 21, 2024 at 11:06 PM juzhe.zhong@rivai.ai
<juzhe.zhong@rivai.ai> wrote:
>
> Hi, this patch causes these regressions (all ICEs) in RISC-V backend:
 
Note this is related to the fix for PR 109092.
 
But right now register_operand still accepts `(SUBREG (MEM...))`
before reload ...
So basically there is a check against register_operand to then
reg_or_subregno is called
but since register_operand allows `(SUBREG (MEM())` but
reg_or_subregno  asserts on only having `REG` or `SUBREG(REG)`, things
go downhill.
 
What a mess.  I really wish `(SUBREG (MEM...))` handling would go away ...
 
Thanks,
Andrew Pinski
 
>
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg-2.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/float32-tg.c   -Os  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O1  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -O3 -g  (test for excess errors)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (internal compiler error: in reg_or_subregno, at jump.cc:1895)
> FAIL: gcc.dg/torture/pr48124-4.c   -Os  (test for excess errors)
>
> ________________________________
> juzhe.zhong@rivai.ai
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] PR rtl-optimization/111267: Improved forward propagation.
  2024-01-16  1:12 Roger Sayle
@ 2024-01-16  8:36 ` Richard Biener
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2024-01-16  8:36 UTC (permalink / raw)
  To: Roger Sayle, Richard Sandiford; +Cc: gcc-patches

On Tue, Jan 16, 2024 at 2:13 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch resolves PR rtl-optimization/111267 by improving RTL-level
> forward propagation.  This x86_64 code quality regression was caused
> (exposed) by my changes to improve how x86's (TImode) argument passing
> is represented at the RTL-level (reducing the use of SUBREGs to catch
> more optimization opportunities in combine).  The pitfall is that the
> more complex RTL representations expose a limitation in RTL's fwprop
> pass.
>
> At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
> be summarized as three steps.  Step 1 is a heuristic that rejects the
> propagation attempt if the expression is too complex, step 2 calls
> the backend's recog to see if the propagated/simplified instruction
> is recognizable/valid, and step 3 then calls src_cost to compare
> the rtx costs of the replacement vs. the original, and accepts the
> transformation if the final cost is the same of better.
>
> The logic error (or missed optimization opportunity) is that the
> step 1 heuristic that attempts to predict (second guess) the
> process is flawed.  Ultimately the decision on whether to fwprop
> or not should depend solely on actual improvement, as measured
> by RTX costs.  Hence the prototype fix in the bugzilla PR removes
> the heuristic of calling prop.profitable_p entirely, relying
> entirely on the cost comparison in step 3.
>
> Unfortunately, things are a tiny bit more complicated.  The cost
> comparison in fwprop uses the older set_src_cost API and not the
> newer (preffered) insn_cost API as currently used in combine.
> This means that the cost improvement comparisons are only done
> for single_set instructions (more complex PARALLELs etc. aren't
> supported).  Hence we can only rely on skipping step 1 for that
> subset of instructions actually evaluated by step 3.
>
> The other subtlety is that to avoid potential infinite loops
> in fwprop we should only reply purely on rtx costs when the
> transformation is obviously an improvement.  If the replacement
> has the same cost as the original, we can use the prop.profitable_p
> test to preserve the current behavior.
>
> Finally, to answer Richard Biener's remaining question about this
> approach: yes, there is an asymmetry between how patterns are
> handled and how REG_EQUAL notes are handled.  For example, at
> the moment propagation into notes doesn't use rtx costs at all,
> and ultimately when fwprop is updated to use insn_cost, this
> (and recog) obviously isn't applicable to notes.  There's no reason
> the logic need be identical between patterns and notes, and during
> stage4 we only need update propagation into patterns to fix this
> P1 regression (notes and use of cost_insn can be done for GCC 15).
>
> For Jakub's reduced testcase:
>
> struct S { float a, b, c, d; };
> int bar (struct S x, struct S y) {
>   return x.b <= y.d && x.c >= y.a;
> }
>
> On x86_64-pc-linux-gnu with -O2 gcc currently generates:
>
> bar:    movq    %xmm2, %rdx
>         movq    %xmm3, %rax
>         movq    %xmm0, %rsi
>         xchgq   %rdx, %rax
>         movq    %rsi, %rcx
>         movq    %rax, %rsi
>         movq    %rdx, %rax
>         shrq    $32, %rcx
>         shrq    $32, %rax
>         movd    %ecx, %xmm4
>         movd    %eax, %xmm0
>         comiss  %xmm4, %xmm0
>         jb      .L6
>         movd    %esi, %xmm0
>         xorl    %eax, %eax
>         comiss  %xmm0, %xmm1
>         setnb   %al
>         ret
> .L6:    xorl    %eax, %eax
>         ret
>
> with this simple patch to fwprop, we now generate:
>
> bar:    shufps  $85, %xmm0, %xmm0
>         shufps  $85, %xmm3, %xmm3
>         comiss  %xmm0, %xmm3
>         jb      .L6
>         xorl    %eax, %eax
>         comiss  %xmm2, %xmm1
>         setnb   %al
>         ret
> .L6:    xorl    %eax, %eax
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Additionally, it also resolves the FAIL for
> gcc.target/i386/pr82580.c.  Ok for mainline?

Thanks for the detailed explanation.  I think this deserves a comment
about the later costing before

-  if (!prop.profitable_p ())
+  if (!prop.profitable_p ()
+      && (prop.changed_mem_p ()
+         || use_insn->is_asm ()
+         || !single_set (use_rtl)))

(and generally the profitable_p method should better be called
sth like likely_profitable_p?)

I'd say OK with the additional comment added but please give Richard S.
a chance to comment.

Thanks,
Richard.

>
> 2024-01-16  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR rtl-optimization/111267
>         * fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when
>         !prop.profitable_p for instructions that are not single sets.
>         When comparing costs, bail-out if the cost is unchanged and
>         !prop.profitable_p.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/111267
>         * gcc.target/i386/pr111267.c: New test case.
>
>
> Thanks in advance (and to Jeff Law for his guidance/help),
> Roger
> --
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] PR rtl-optimization/111267: Improved forward propagation.
@ 2024-01-16  1:12 Roger Sayle
  2024-01-16  8:36 ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Roger Sayle @ 2024-01-16  1:12 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4401 bytes --]


This patch resolves PR rtl-optimization/111267 by improving RTL-level
forward propagation.  This x86_64 code quality regression was caused
(exposed) by my changes to improve how x86's (TImode) argument passing
is represented at the RTL-level (reducing the use of SUBREGs to catch
more optimization opportunities in combine).  The pitfall is that the
more complex RTL representations expose a limitation in RTL's fwprop
pass.

At the heart of fwprop, in try_fwprop_subst_pattern, the logic can
be summarized as three steps.  Step 1 is a heuristic that rejects the
propagation attempt if the expression is too complex, step 2 calls
the backend's recog to see if the propagated/simplified instruction
is recognizable/valid, and step 3 then calls src_cost to compare
the rtx costs of the replacement vs. the original, and accepts the
transformation if the final cost is the same of better.

The logic error (or missed optimization opportunity) is that the
step 1 heuristic that attempts to predict (second guess) the
process is flawed.  Ultimately the decision on whether to fwprop
or not should depend solely on actual improvement, as measured
by RTX costs.  Hence the prototype fix in the bugzilla PR removes
the heuristic of calling prop.profitable_p entirely, relying
entirely on the cost comparison in step 3.

Unfortunately, things are a tiny bit more complicated.  The cost
comparison in fwprop uses the older set_src_cost API and not the
newer (preffered) insn_cost API as currently used in combine.
This means that the cost improvement comparisons are only done
for single_set instructions (more complex PARALLELs etc. aren't
supported).  Hence we can only rely on skipping step 1 for that
subset of instructions actually evaluated by step 3.

The other subtlety is that to avoid potential infinite loops
in fwprop we should only reply purely on rtx costs when the
transformation is obviously an improvement.  If the replacement
has the same cost as the original, we can use the prop.profitable_p
test to preserve the current behavior.

Finally, to answer Richard Biener's remaining question about this
approach: yes, there is an asymmetry between how patterns are
handled and how REG_EQUAL notes are handled.  For example, at
the moment propagation into notes doesn't use rtx costs at all,
and ultimately when fwprop is updated to use insn_cost, this
(and recog) obviously isn't applicable to notes.  There's no reason
the logic need be identical between patterns and notes, and during
stage4 we only need update propagation into patterns to fix this
P1 regression (notes and use of cost_insn can be done for GCC 15).

For Jakub's reduced testcase:

struct S { float a, b, c, d; };
int bar (struct S x, struct S y) {
  return x.b <= y.d && x.c >= y.a;
}

On x86_64-pc-linux-gnu with -O2 gcc currently generates:

bar:    movq    %xmm2, %rdx
        movq    %xmm3, %rax
        movq    %xmm0, %rsi
        xchgq   %rdx, %rax
        movq    %rsi, %rcx
        movq    %rax, %rsi
        movq    %rdx, %rax
        shrq    $32, %rcx
        shrq    $32, %rax
        movd    %ecx, %xmm4
        movd    %eax, %xmm0
        comiss  %xmm4, %xmm0
        jb      .L6
        movd    %esi, %xmm0
        xorl    %eax, %eax
        comiss  %xmm0, %xmm1
        setnb   %al
        ret
.L6:    xorl    %eax, %eax
        ret

with this simple patch to fwprop, we now generate:

bar:    shufps  $85, %xmm0, %xmm0
        shufps  $85, %xmm3, %xmm3
        comiss  %xmm0, %xmm3
        jb      .L6
        xorl    %eax, %eax
        comiss  %xmm2, %xmm1
        setnb   %al
        ret
.L6:    xorl    %eax, %eax
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Additionally, it also resolves the FAIL for
gcc.target/i386/pr82580.c.  Ok for mainline?


2024-01-16  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR rtl-optimization/111267
        * fwprop.cc (try_fwprop_subst_pattern): Only bail-out early when
        !prop.profitable_p for instructions that are not single sets.
        When comparing costs, bail-out if the cost is unchanged and
        !prop.profitable_p.

gcc/testsuite/ChangeLog
        PR rtl-optimization/111267
        * gcc.target/i386/pr111267.c: New test case.


Thanks in advance (and to Jeff Law for his guidance/help),
Roger
--


[-- Attachment #2: patchfw.txt --]
[-- Type: text/plain, Size: 1587 bytes --]

diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0c588f8..f06225a 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -449,7 +449,10 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
   if (prop.num_replacements == 0)
     return false;
 
-  if (!prop.profitable_p ())
+  if (!prop.profitable_p ()
+      && (prop.changed_mem_p ()
+	  || use_insn->is_asm ()
+	  || !single_set (use_rtl)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "cannot propagate from insn %d into"
@@ -481,7 +484,8 @@ try_fwprop_subst_pattern (obstack_watermark &attempt, insn_change &use_change,
 	redo_changes (0);
 	auto new_cost = set_src_cost (SET_SRC (use_set),
 				      GET_MODE (SET_DEST (use_set)), speed);
-	if (new_cost > old_cost)
+	if (new_cost > old_cost
+            || (new_cost == old_cost && !prop.profitable_p ()))
 	  {
 	    if (dump_file)
 	      fprintf (dump_file, "change not profitable"
diff --git a/gcc/testsuite/gcc.target/i386/pr111267.c b/gcc/testsuite/gcc.target/i386/pr111267.c
new file mode 100644
index 0000000..e3d549d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr111267.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+struct S { float a, b, c, d; };
+
+int
+bar (struct S x, struct S y)
+{
+  return x.b <= y.d && x.c >= y.a;
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler-not "xchgq" } } */
+/* { dg-final { scan-assembler-not "shrq" } } */
+/* { dg-final { scan-assembler-not "movd" } } */

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-22  7:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22  7:05 [PATCH] PR rtl-optimization/111267: Improved forward propagation juzhe.zhong
2024-01-22  7:14 ` Andrew Pinski
2024-01-22  7:45   ` juzhe.zhong
  -- strict thread matches above, loose matches on Subject: below --
2024-01-16  1:12 Roger Sayle
2024-01-16  8:36 ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).