[Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12
@ 2024-04-04 18:57 absoler at smail dot nju.edu.cn
  2024-04-04 19:03 ` [Bug target/114591] " pinskia at gcc dot gnu.org
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: absoler at smail dot nju.edu.cn @ 2024-04-04 18:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

            Bug ID: 114591
           Summary: rtl-reload introduce an extra load operation since
                    gcc-12
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: absoler at smail dot nju.edu.cn
  Target Milestone: ---

Hi, I found such a case:
```
unsigned v1;
long long v2;

short func_1() {
    v2 = v1;
    return v2;
}
```
At -O2, gcc-12 and gcc-13 produce:
```
func_1:
 mov    0x0(%rip),%eax        # v1
 mov    %rax,0x0(%rip)        # v2
 movzwl 0x0(%rip),%eax        # v1

```
while gcc-11 produces:

```
func_1:
 mov    0x0(%rip),%edx        # v1
 mov    %rdx,0x0(%rip)        # v2
 mov    %rdx,%rax
```

I guess the latter is better? The second load of `v1` is introduced in the
RTL reload pass, which may hurt performance.
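
For reference, here is a minimal sketch showing that the two sequences above are equivalent at the source level. `func_1_single_load` and `variants_agree` are hypothetical names introduced only for this illustration; the narrowing to `short` relies on GCC's modulo behaviour for out-of-range conversions, which is implementation-defined in ISO C.

```c
#include <assert.h>

unsigned v1;
long long v2;

/* Test case from the report. */
short func_1(void) {
    v2 = v1;
    return v2;
}

/* Hypothetical hand-written equivalent of the gcc-11 sequence:
   load v1 once, store the zero-extended value, return its low 16 bits. */
short func_1_single_load(void) {
    unsigned long long tmp = v1;  /* one zero-extending load of v1 */
    v2 = (long long)tmp;          /* store the 64-bit value to v2 */
    return (short)tmp;            /* reuse the register, no second load */
}

/* Returns 1 when both variants agree on the return value and on v2. */
int variants_agree(unsigned value) {
    v1 = value;
    v2 = -1;
    short a = func_1();
    long long stored_a = v2;
    v2 = -1;
    short b = func_1_single_load();
    long long stored_b = v2;
    return a == b && stored_a == stored_b && stored_a == (long long)value;
}

/* Small self-check over a few representative values. */
void check_variants(void) {
    assert(variants_agree(0u));
    assert(variants_agree(0x12345678u));
    assert(variants_agree(0xFFFFFFFFu));
}
```

Nothing here forces the compiler to emit one load or two; it only illustrates that the single-load form is a valid result for this test case.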


* [Bug target/114591] rtl-reload introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
@ 2024-04-04 19:03 ` pinskia at gcc dot gnu.org
  2024-04-04 19:07 ` [Bug target/114591] [12/13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-04 19:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-04-04
             Target|                            |x86_64
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
          Component|rtl-optimization            |target

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
IRA in GCC 12+ has:
```
  Loop 0 (parent -1, header bb2, depth 0)
    bbs: 2
    all: 0r85 1r82
    modified regnos: 82 85
    border:
    Pressure: GENERAL_REGS=1
    Hard reg set forest:
      0:( 0-6 8-15 20-51)@0
        1:( 0-6 36-43)@26000
      Spill a1(r82,l0)
      Allocno a0r85 of GENERAL_REGS(15) has 15 avail. regs  0-6 36-43, node: 0-6 36-43 (confl regs =  7-35 44-75)
      Forming thread from colorable bucket:
        Forming thread by copy 0:a0r85-a1r82 (freq=1000):
          Result (freq=5000): a0r85(2000) a1r82(3000)
      Pushing a0(r85,l0)(cost 0)
      Popping a0(r85,l0)  --         assign reg 0
Disposition:
    1:r82  l0   mem    0:r85  l0     0
New iteration of spill/restore move
+++Costs: overall 1000, reg -1000, mem 2000, ld 0, st 0, move 0
+++       move loops 0, new jumps 0
```

While before it was:
```
  Loop 0 (parent -1, header bb2, depth 0)
    bbs: 2
    all: 0r85 1r82
    modified regnos: 82 85
    border:
    Pressure: GENERAL_REGS=1
    Hard reg set forest:
      0:( 0-6 8-15 20-51)@0
        1:( 0-6 36-43)@46000
      Allocno a0r85 of GENERAL_REGS(15) has 15 avail. regs  0-6 36-43, node: 0-6 36-43 (confl regs =  7-35 44-75)
      Allocno a1r82 of GENERAL_REGS(15) has 15 avail. regs  0-6 36-43, node: 0-6 36-43 (confl regs =  7-35 44-75)
      Forming thread from colorable bucket:
        Forming thread by copy 0:a0r85-a1r82 (freq=1000):
          Result (freq=5000): a0r85(2000) a1r82(3000)
      Pushing a0(r85,l0)(cost 0)
      Pushing a1(r82,l0)(cost 0)
      Popping a1(r82,l0)  --         assign reg 0
      Popping a0(r85,l0)  --         assign reg 0
Disposition:
    1:r82  l0     0    0:r85  l0     0
New iteration of spill/restore move
+++Costs: overall 5000, reg 5000, mem 0, ld 0, st 0, move 0
+++       move loops 0, new jumps 0
```

Notice: `      Spill a1(r82,l0)` in GCC 12+


* [Bug target/114591] [12/13/14 Regression] rtl-reload introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
  2024-04-04 19:03 ` [Bug target/114591] " pinskia at gcc dot gnu.org
@ 2024-04-04 19:07 ` pinskia at gcc dot gnu.org
  2024-04-05  2:32 ` [Bug target/114591] [12/13/14 Regression] register allocators " law at gcc dot gnu.org
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-04 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.4
            Summary|rtl-reload introduce an     |[12/13/14 Regression]
                   |extra load operation since  |rtl-reload introduce an
                   |gcc-12                      |extra load operation since
                   |                            |gcc-12


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
  2024-04-04 19:03 ` [Bug target/114591] " pinskia at gcc dot gnu.org
  2024-04-04 19:07 ` [Bug target/114591] [12/13/14 Regression] " pinskia at gcc dot gnu.org
@ 2024-04-05  2:32 ` law at gcc dot gnu.org
  2024-04-08 15:02 ` jakub at gcc dot gnu.org
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: law at gcc dot gnu.org @ 2024-04-05  2:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
           Priority|P3                          |P2


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (2 preceding siblings ...)
  2024-04-05  2:32 ` [Bug target/114591] [12/13/14 Regression] register allocators " law at gcc dot gnu.org
@ 2024-04-08 15:02 ` jakub at gcc dot gnu.org
  2024-04-10  7:51 ` ubizjak at gmail dot com
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-08 15:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This changed with r12-5584-gca5667e867252db3c8642ee90f55427149cd92b6


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (3 preceding siblings ...)
  2024-04-08 15:02 ` jakub at gcc dot gnu.org
@ 2024-04-10  7:51 ` ubizjak at gmail dot com
  2024-04-10  8:17 ` liuhongt at gcc dot gnu.org
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-10  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #3 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #2)
> This changed with r12-5584-gca5667e867252db3c8642ee90f55427149cd92b6

Strange, if I revert the constraints to the previous setting with: 

--cut here--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 10ae3113ae8..262dd25a8e0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2870,9 +2870,9 @@ (define_peephole2

 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
-    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*Yv,*v,*v,jm,m")
+    "=r,r,r,m ,*k,*k ,*r ,*m ,*k ,?r,?v,*Yv,*v,*v,*jm,*m")
        (match_operand:HI 1 "general_operand"
-    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C  ,*v,m ,*x,*v"))]
+    "r ,n,m,rn,*r ,*km,*k,*k,CBC,v,r  ,C  ,v,m ,x,v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
--cut here--

I still get:

        movl    v1(%rip), %eax  # 6     [c=6 l=6]  *zero_extendsidi2/3
        movq    %rax, v2(%rip)  # 16    [c=4 l=7]  *movdi_internal/5
        movzwl  v1(%rip), %eax  # 7     [c=5 l=7]  *movhi_internal/2


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (4 preceding siblings ...)
  2024-04-10  7:51 ` ubizjak at gmail dot com
@ 2024-04-10  8:17 ` liuhongt at gcc dot gnu.org
  2024-04-10  8:30 ` liuhongt at gcc dot gnu.org
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-10  8:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |liuhongt at gcc dot gnu.org

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #3)
> (In reply to Jakub Jelinek from comment #2)
> > This changed with r12-5584-gca5667e867252db3c8642ee90f55427149cd92b6
> 
> Strange, if I revert the constraints to the previous setting with: 
> 
> --cut here--
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 10ae3113ae8..262dd25a8e0 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -2870,9 +2870,9 @@ (define_peephole2
>  
>  (define_insn "*movhi_internal"
>    [(set (match_operand:HI 0 "nonimmediate_operand"
> -    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*Yv,*v,*v,jm,m")
> +    "=r,r,r,m ,*k,*k ,*r ,*m ,*k ,?r,?v,*Yv,*v,*v,*jm,*m")
>         (match_operand:HI 1 "general_operand"
> -    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C  ,*v,m ,*x,*v"))]
> +    "r ,n,m,rn,*r ,*km,*k,*k,CBC,v,r  ,C  ,v,m ,x,v"))]
>    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
>     && ix86_hardreg_mov_ok (operands[0], operands[1])"
>  {
> --cut here--
> 
> I still get:
> 
>         movl    v1(%rip), %eax  # 6     [c=6 l=6]  *zero_extendsidi2/3
>         movq    %rax, v2(%rip)  # 16    [c=4 l=7]  *movdi_internal/5
>         movzwl  v1(%rip), %eax  # 7     [c=5 l=7]  *movhi_internal/2

In my experience, the memory cost of an operand differs between a combined rm
constraint and separate r, m alternatives, and this affects the RA decision.

https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (5 preceding siblings ...)
  2024-04-10  8:17 ` liuhongt at gcc dot gnu.org
@ 2024-04-10  8:30 ` liuhongt at gcc dot gnu.org
  2024-04-10  8:36 ` ubizjak at gmail dot com
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-10  8:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> My experience is memory cost for the operand with rm or separate r, m is
> different which impacts RA decision.
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html

If alternative 2 of operands[1] is changed from m to rm, then RA makes the
perfect decision.


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (6 preceding siblings ...)
  2024-04-10  8:30 ` liuhongt at gcc dot gnu.org
@ 2024-04-10  8:36 ` ubizjak at gmail dot com
  2024-04-10  8:40 ` ubizjak at gmail dot com
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-10  8:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
LRA starts with this:

    5: r98:SI=[`v1']
      REG_EQUIV [`v1']
    6: [`v2']=zero_extend(r98:SI)
    7: r101:HI=r98:SI#0
      REG_DEAD r98:SI
   12: ax:HI=r101:HI
      REG_DEAD r101:HI
   13: use ax:HI

then decides that:

      Removing equiv init insn 5 (freq=1000)
    5: r98:SI=[`v1']
      REG_EQUIV [`v1']

and substitutes all follow-up usages of r98 with a memory access. In insn 6, we
have:

(mem/c:SI (symbol_ref:DI ("v1")))

while in insn 7 we have:

(mem/c:HI (symbol_ref:DI ("v1")))

It looks like the different modes of the memory reads prevent LRA from CSEing
the read.

IMO, if the preloaded value is later accessed in different modes, LRA should
leave it. Alternatively, LRA should CSE memory accesses in different modes.

Cc LRA expert ... oh, he already is in the loop.
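
To make the mode issue concrete, the following is a rough sketch (not LRA code) of why the HImode read of `v1` carries the same value as the low part of the SImode register on a little-endian target such as x86_64. The helper names `read_si`, `read_hi`, `is_little_endian` and `narrow_read_matches_lowpart` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of (mem:SI v1): a 32-bit read of the object. */
uint32_t read_si(const void *p) {
    uint32_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* Model of (mem:HI v1): a 16-bit read at the same address. */
uint16_t read_hi(const void *p) {
    uint16_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
int is_little_endian(void) {
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Returns 1 when the narrow memory read equals the low 16 bits of the
   value already loaded in the wider mode, i.e. the second load is
   redundant. Expected to hold on little-endian hosts only. */
int narrow_read_matches_lowpart(uint32_t value) {
    uint32_t v1 = value;
    uint32_t wide = read_si(&v1);        /* like insn 5: r98:SI = [v1] */
    uint16_t lowpart = (uint16_t)wide;   /* like r98:SI#0 in insn 7 */
    uint16_t narrow = read_hi(&v1);      /* like (mem/c:HI v1) after substitution */
    return lowpart == narrow;
}
```

On a little-endian host, `narrow_read_matches_lowpart` should return 1 for any input, which is the property a CSE of the two reads would rely on.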


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (7 preceding siblings ...)
  2024-04-10  8:36 ` ubizjak at gmail dot com
@ 2024-04-10  8:40 ` ubizjak at gmail dot com
  2024-04-10  8:47 ` ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-10  8:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #5)
> > My experience is memory cost for the operand with rm or separate r, m is
> > different which impacts RA decision.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html
> 
> Change operands[1] alternative 2 from m -> rm, then RA makes perfect
> decision.

Oh, you are also the author of the above patch ;)

Can you please take the issue from here and perhaps review other x86 patterns
for suboptimal constraints? I was always under the impression that rm and
separate "r,m" are treated in the same way...


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (8 preceding siblings ...)
  2024-04-10  8:40 ` ubizjak at gmail dot com
@ 2024-04-10  8:47 ` ubizjak at gmail dot com
  2024-04-10  8:52 ` liuhongt at gcc dot gnu.org
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-10  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #8 from Uroš Bizjak <ubizjak at gmail dot com> ---
BTW: The reason for the original change:

 (define_insn "*movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=r,r ,r ,m ,*k,*k ,*r,*m,*k,?r,?v,*v,*v,*m")
-       (match_operand:HI 1 "general_operand"      "r ,rn,rm,rn,*r,*km,*k,*k,CBC,v, r, v, m, v"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand"
+    "=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+       (match_operand:HI 1 "general_operand"
+    "r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*v"))]

was that (r,r) overrides (r,rn) and (r,rm), so the latter two can be changed
(without introducing any side effect) to (r,n) and (r,m), since (reg,reg) is
always matched by the (r,r) constraint. The different treatment of the changed
latter two patterns is confusing, to say the least.
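
The matching argument above can be sketched with a toy model. This is not how GCC's constraint matching is implemented; it only treats each constraint string as a set of operand kinds ('r' register, 'n' immediate, 'm' memory) and ignores modifiers and costs, which is exactly where the thread suggests the behaviour diverges. All names in the block are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One alternative: constraint letters for destination and source. */
struct alternative {
    const char *dst;
    const char *src;
};

/* Returns 1 if operand kind `kind` is accepted by constraint string `c`. */
int accepts(const char *c, char kind) {
    return strchr(c, kind) != NULL;
}

/* Returns 1 if any of the `n` alternatives accepts the (dst, src) pair. */
int any_alternative_matches(const struct alternative *alts, int n,
                            char dst, char src) {
    for (int i = 0; i < n; i++)
        if (accepts(alts[i].dst, dst) && accepts(alts[i].src, src))
            return 1;
    return 0;
}

/* First three alternatives of *movhi_internal before and after the change. */
const struct alternative old_alts[] = { {"r", "r"}, {"r", "rn"}, {"r", "rm"} };
const struct alternative new_alts[] = { {"r", "r"}, {"r", "n"},  {"r", "m"} };

/* Returns 1 if both sets accept exactly the same operand-kind pairs. */
int same_operand_pairs_accepted(void) {
    const char kinds[] = { 'r', 'n', 'm' };
    for (int d = 0; d < 3; d++)
        for (int s = 0; s < 3; s++) {
            int before = any_alternative_matches(old_alts, 3, kinds[d], kinds[s]);
            int after = any_alternative_matches(new_alts, 3, kinds[d], kinds[s]);
            if (before != after)
                return 0;
        }
    return 1;
}
```

In this model the two sets accept the same operand pairs, so any difference in the generated code would have to come from how the allocator costs the alternatives rather than from which operands they match.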


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (9 preceding siblings ...)
  2024-04-10  8:47 ` ubizjak at gmail dot com
@ 2024-04-10  8:52 ` liuhongt at gcc dot gnu.org
  2024-04-10  9:07 ` ubizjak at gmail dot com
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-10  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #9 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> 
> It looks that different modes of memory read confuse LRA to not CSE the read.
> 
> IMO, if the preloaded value is later accessed in different modes, LRA should
> leave it. Alternatively, LRA should CSE memory accesses in different modes.

(insn 7 6 12 2 (set (reg:HI 101 [ _5 ])
        (subreg:HI (reg:SI 98 [ v1.0_1 ]) 0)) "test.c":6:12 86 {*movhi_internal}
     (expr_list:REG_DEAD (reg:SI 98 [ v1.0_1 ])
        (nil)))

Maybe we should reduce the cost of a simple move instruction (with subreg?)
when calculating total_cost, since it will probably be eliminated by later RTL
optimization.


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (10 preceding siblings ...)
  2024-04-10  8:52 ` liuhongt at gcc dot gnu.org
@ 2024-04-10  9:07 ` ubizjak at gmail dot com
  2024-04-10  9:12 ` liuhongt at gcc dot gnu.org
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-10  9:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #10 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #5)
> > My experience is memory cost for the operand with rm or separate r, m is
> > different which impacts RA decision.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595573.html
> 
> Change operands[1] alternative 2 from m -> rm, then RA makes perfect
> decision.

Yes, I can confirm this oddity:

        movl    v1(%rip), %edx  # 5     [c=6 l=6]  *zero_extendsidi2/3
        movq    %rdx, v2(%rip)  # 16    [c=4 l=7]  *movdi_internal/5
        movq    %rdx, %rax      # 18    [c=4 l=3]  *movdi_internal/3
        ret             # 21    [c=0 l=1]  simple_return_internal

But even here there is room for improvement: the last move can be eliminated
by allocating %eax in the first instruction.


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (11 preceding siblings ...)
  2024-04-10  9:07 ` ubizjak at gmail dot com
@ 2024-04-10  9:12 ` liuhongt at gcc dot gnu.org
  2024-04-11  6:33 ` liuhongt at gcc dot gnu.org
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-10  9:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #11 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
unsigned v;
long long v2;
char foo ()
{
    v2 = v;
    return v;
}

This one is related to *movqi_internal, and codegen has been worse since gcc8.1:

foo:
        movl    v(%rip), %eax
        movq    %rax, v2(%rip)
        movzbl  v(%rip), %eax
        ret


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (12 preceding siblings ...)
  2024-04-10  9:12 ` liuhongt at gcc dot gnu.org
@ 2024-04-11  6:33 ` liuhongt at gcc dot gnu.org
  2024-04-11  6:54 ` ubizjak at gmail dot com
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-11  6:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #12 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
short a;
short c;
short d;
void
foo (short b, short f)
{
  c = b + a;
  d = f + a;
}

foo(short, short):
        addw    a(%rip), %di
        addw    a(%rip), %si
        movw    %di, c(%rip)
        movw    %si, d(%rip)
        ret

This one has been bad since gcc10.1, and there is no subreg. The problem is
that if the operand is used by more than one insn, and they all support a
separate m constraint, mem_cost is quite small (just 1, while the reg move cost
is 2), which makes RA more inclined to propagate memory across insns. I guess
RA assumes that a separate m means the insn only supports memory_operand?

                 if (op_class == NO_REGS)
                   /* Although we don't need insn to reload from
                      memory, still accessing memory is usually more
                      expensive than a register.  */
                   pp->mem_cost = frequency;
                 else

* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (13 preceding siblings ...)
  2024-04-11  6:33 ` liuhongt at gcc dot gnu.org
@ 2024-04-11  6:54 ` ubizjak at gmail dot com
  2024-04-11  7:08 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: ubizjak at gmail dot com @ 2024-04-11  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #12)
> short a;
> short c;
> short d;
> void
> foo (short b, short f)
> {
>   c = b + a;
>   d = f + a;
> }
> 
> foo(short, short):
>         addw    a(%rip), %di
>         addw    a(%rip), %si
>         movw    %di, c(%rip)
>         movw    %si, d(%rip)
>         ret
> 
> this one is bad since gcc10.1 and there's no subreg, The problem is if the
> operand is used by more than 1 insn, and they all support separate m
> constraint, mem_cost is quite small(just 1, reg move cost is 2), and this
> makes RA more inclined to propagate memory across insns. I guess RA assumes
> the separate m means the insn only support memory_operand?

I don't see this as problematic. IIRC, there was a discussion in the past that
a couple (two?) memory accesses from the same location close to each other can
be faster (so, -O2, not -Os) than preloading the value to the register first.

In contrast, the example from Comment #11 already has the correct value in
%eax, so there is no need to reload it again from memory, even in a narrower
mode.


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (14 preceding siblings ...)
  2024-04-11  6:54 ` ubizjak at gmail dot com
@ 2024-04-11  7:08 ` pinskia at gcc dot gnu.org
  2024-04-11  7:28 ` liuhongt at gcc dot gnu.org
  2024-04-11  7:37 ` liuhongt at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-04-11  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #14 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Hongtao Liu from comment #12)
> > short a;
> > short c;
> > short d;
> > void
> > foo (short b, short f)
> > {
> >   c = b + a;
> >   d = f + a;
> > }
> > 
> > foo(short, short):
> >         addw    a(%rip), %di
> >         addw    a(%rip), %si
> >         movw    %di, c(%rip)
> >         movw    %si, d(%rip)
> >         ret
> > 
> > this one is bad since gcc10.1 and there's no subreg, The problem is if the
> > operand is used by more than 1 insn, and they all support separate m
> > constraint, mem_cost is quite small(just 1, reg move cost is 2), and this
> > makes RA more inclined to propagate memory across insns. I guess RA assumes
> > the separate m means the insn only support memory_operand?
> 
> I don't see this as problematic. IIRC, there was a discussion in the past
> that a couple (two?) memory accesses from the same location close to each
> other can be faster (so, -O2, not -Os) than preloading the value to the
> register first.

Someone just filed a similar issue to the above testcase (the one in comment
#12) as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114688 :).


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (15 preceding siblings ...)
  2024-04-11  7:08 ` pinskia at gcc dot gnu.org
@ 2024-04-11  7:28 ` liuhongt at gcc dot gnu.org
  2024-04-11  7:37 ` liuhongt at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-11  7:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #15 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I don't see this as problematic. IIRC, there was a discussion in the past
> that a couple (two?) memory accesses from the same location close to each
> other can be faster (so, -O2, not -Os) than preloading the value to the
> register first.
At least for memory with vector mode, it's better to preload the value into a
register first.
> 
> In contrast, the example from the Comment #11 already has the correct value
> in %eax, so there is no need to reload it again from memory, even in a
> narrower mode.

So the question is why cse can't handle the same memory in a narrower mode;
maybe it's because there's a zero_extend in the first load. It looks like cse
can handle simple wider-mode memory.

      /* See if a MEM has already been loaded with a widening operation;
         if it has, we can use a subreg of that.  Many CISC machines
         also have such operations, but this is only likely to be
         beneficial on these machines.  */


* [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
  2024-04-04 18:57 [Bug rtl-optimization/114591] New: rtl-reload introduce an extra load operation since gcc-12 absoler at smail dot nju.edu.cn
                   ` (16 preceding siblings ...)
  2024-04-11  7:28 ` liuhongt at gcc dot gnu.org
@ 2024-04-11  7:37 ` liuhongt at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2024-04-11  7:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #16 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---

> 
>       /* See if a MEM has already been loaded with a widening operation;
>          if it has, we can use a subreg of that.  Many CISC machines
>          also have such operations, but this is only likely to be
>          beneficial on these machines.  */

Oh, it's pre_reload cse_insn, not postreload gcse

