[Bug rtl-optimization/15792] missed subreg optimization

public inbox for gcc-bugs@sourceware.org

* [Bug rtl-optimization/15792] missed subreg optimization
From: pinskia at gcc dot gnu dot org @ 2006-01-18  4:45 UTC
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2006-01-18 04:45 -------
The problem here is that we don't split up the subregister early, before
register allocation.
If we split it up before combine, we would be able to combine the or and get
more optimal results.

A patch like
http://gcc.gnu.org/ml/gcc-patches/2005-05/msg00554.html
should help.
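For illustration, here is a minimal C sketch of the identity that splitting the subreg exposes, assuming a 32-bit target such as i386 where a 64-bit value lives in a pair of 32-bit registers. The function names are hypothetical and are not part of GCC or of the original test case. A 64-bit value is nonzero exactly when the OR of its two 32-bit halves is nonzero, which is what lets combine reduce the test to a single orl feeding the branch once the halves are visible as separate registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: test the whole 64-bit value, as test1 does in C. */
bool nonzero_wide(uint64_t x)
{
    return x != 0;
}

/* Hypothetical helper: test the same value through its two 32-bit halves,
   the form that becomes available once the DImode subreg has been split.
   Equivalent to nonzero_wide because x is zero only when both halves are. */
bool nonzero_split(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low word of the register pair  */
    uint32_t hi = (uint32_t)(x >> 32);  /* high word of the register pair */
    return (lo | hi) != 0;
}
```

For example, both helpers return true for UINT64_C(1) << 32 (only the high half is set) and both return false for 0.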


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15792



* [Bug rtl-optimization/15792] missed subreg optimization
From: tony dot linthicum at amd dot com @ 2006-01-20 15:48 UTC
  To: gcc-bugs



------- Comment #4 from tony dot linthicum at amd dot com  2006-01-20 15:48 -------
I've been looking at this a bit, and tried the patch.  It does indeed fix the
problem in test1 above, but it does not appear to be the complete solution.
The load of 'x' in test1 is actually split fairly early, and from what I can
tell, the superfluous move is actually the result of the register allocator
doing a poor job of live range analysis when confronted with subregs.  I
suspect this is why most things (i.e. everything other than branches) are not
split into subregs until after reload.  Unfortunately, the subreg lowering
won't touch a subreg if it has seen a reference to the "inner" register, so we
get the same unnecessary move if the code looks like:

void gh(void);

void foo(long long y, long long z)
{
  unsigned long long x;

  x = y + z;
  if (x) gh();
}

I'm going to experiment with moving where the subreg lowering code occurs and
moving up the splitting into subregs and see if I can get the desired results. 
I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
then please let me know.





* [Bug rtl-optimization/15792] missed subreg optimization
From: pinskia at gcc dot gnu dot org @ 2006-01-20 15:52 UTC
  To: gcc-bugs



------- Comment #5 from pinskia at gcc dot gnu dot org  2006-01-20 15:52 -------
(In reply to comment #4)
> I'm going to experiment with moving where the subreg lowering code occurs and
> moving up the splitting into subregs and see if I can get the desired results. 
> I'm pretty new to GCC, so if any of the above seems like I'm off in the weeds
> then please let me know.

This seems right, but the other issue is that the register allocator allocates
a DImode value as one unit occupying two consecutive registers (that might be
only part of the cause).


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tony dot linthicum at amd
                   |                            |dot com



* [Bug rtl-optimization/15792] missed subreg optimization
From: ian at airs dot com @ 2006-02-02 18:18 UTC
  To: gcc-bugs



------- Comment #6 from ian at airs dot com  2006-02-02 18:18 -------
With the version of RTH's subreg lowering pass which I am working on, I get
identical code for both functions:

test1:
        movl    8(%esp), %eax
        orl     4(%esp), %eax
        jne     .L7
        ret
        .p2align 4,,7
.L7:
        jmp     gh



* [Bug rtl-optimization/15792] missed subreg optimization
From: tony dot linthicum at amd dot com @ 2006-02-06 17:13 UTC
  To: gcc-bugs



------- Comment #7 from tony dot linthicum at amd dot com  2006-02-06 17:13 -------
So do I, at least for the original code (i.e. test and test1).  I'm curious,
though, whether you've tried the example that I listed above (foo).  I still get
subregs with that one, though I honestly don't recall at the moment whether it
makes the register allocator screw up (I *think* it does, but I'd have to
check).  Either way, the presence of the subregs provides the needed fodder for
RA badness, so I'm curious whether it's present in what you're working on.



* [Bug rtl-optimization/15792] missed subreg optimization
From: ian at airs dot com @ 2006-02-07  0:30 UTC
  To: gcc-bugs

------- Comment #8 from ian at airs dot com  2006-02-07 00:30 -------
Yes, I still get an unnecessary move in your test case which uses addition.

One reason this happens is that the addition cannot be split until after
the reload pass is complete.  That is because the add relies on the condition
code registers, but reload can clobber the condition code registers between any
arbitrary pair of instructions.
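The condition-code dependency can be made concrete with a small C model of how a 64-bit add is carried out on 32-bit halves. This is a sketch under stated assumptions, not GCC code: add64_split is a hypothetical name, and the carry variable stands in for the CF flag that addl sets and adcl reads. Because on the real machine the carry exists only in the flags register, an instruction inserted between the two halves that clobbers the flags would corrupt the high half, which is why the split has to wait until reload is done.

```c
#include <stdint.h>

/* Hypothetical model of a DImode add lowered to two SImode adds.
   The low halves are added first (addl); the carry out of that add is
   then consumed by the high-half add (adcl).  On i386 that carry lives
   only in the flags register between the two instructions. */
uint64_t add64_split(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;          /* addl: wraps modulo 2^32       */
    uint32_t carry = lo < a_lo;         /* the CF bit addl would set     */
    uint32_t hi = a_hi + b_hi + carry;  /* adcl: adds CF into high half  */

    return ((uint64_t)hi << 32) | lo;
}
```

For example, add64_split(0xFFFFFFFF, 1) yields 0x100000000: the low add wraps to 0 and only the carry moves the 1 into the high half.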

Another reason this happens is that the compiler knows how to set the condition
flags using a bitwise or, but it does so using a scratch register to hold the
destination of the bitwise or.  The register allocator is not clever enough to
see that if it has a DImode pair of registers which dies in the insn, that it
can use the second register in the DImode pair as the scratch register.  If the
register allocator saw that, then it could use that register as the scratch
register and avoid allocating a new scratch register and copying the value into
it.


* [Bug rtl-optimization/15792] missed subreg optimization
From: ian at airs dot com @ 2006-02-07  8:23 UTC
  To: gcc-bugs



------- Comment #9 from ian at airs dot com  2006-02-07 08:23 -------
I now have a reasonably simple reload patch which eliminates the unnecessary
move.  For the test case in comment #4, I get this code with -O2
-momit-leaf-frame-pointer:

foo:
        movl    12(%esp), %eax
        movl    16(%esp), %edx
        addl    4(%esp), %eax
        adcl    8(%esp), %edx
        orl     %eax, %edx
        jne     .L7
        rep ; ret
        .p2align 4,,7
.L7:
        jmp     gh
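To see why this sequence is correct, here is a hypothetical C model of it, one statement per instruction. It assumes the usual i386 stack-argument layout with no frame pointer, in which y occupies 4(%esp) (low word) and 8(%esp) (high word) and z occupies 12(%esp) and 16(%esp); the function and parameter names are invented for illustration. Note that the orl writes into %edx, the high half of the sum, which is dead after the test, so no separate scratch register and no extra move are needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the code above for foo.  Returns true when the
   jne to .L7 (the tail call to gh) would be taken. */
bool foo_would_call_gh(uint32_t y_lo, uint32_t y_hi,
                       uint32_t z_lo, uint32_t z_hi)
{
    uint32_t eax = z_lo;        /* movl 12(%esp), %eax                    */
    uint32_t edx = z_hi;        /* movl 16(%esp), %edx                    */
    uint32_t old = eax;
    eax += y_lo;                /* addl 4(%esp), %eax                     */
    uint32_t cf = eax < old;    /* carry flag produced by addl            */
    edx += y_hi + cf;           /* adcl 8(%esp), %edx                     */
    edx |= eax;                 /* orl %eax, %edx: dead high half reused  */
    return edx != 0;            /* jne .L7                                */
}
```

For example, with y = -1 (both words 0xFFFFFFFF) and z = 1, the sum is zero and the model reports that gh is not called.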



* [Bug rtl-optimization/15792] missed subreg optimization
From: rask at gcc dot gnu dot org @ 2007-11-10  0:15 UTC
  To: gcc-bugs



------- Comment #10 from rask at gcc dot gnu dot org  2007-11-10 00:15 -------
This was fixed in 4.3.0.


-- 

rask at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
           Keywords|                            |ra
      Known to fail|                            |4.1.2 4.2.0 4.2.1 4.2.2
      Known to work|                            |4.3.0
         Resolution|                            |FIXED



* [Bug rtl-optimization/15792] missed subreg optimization
From: pinskia at gcc dot gnu.org @ 2023-05-15  5:34 UTC
  To: gcc-bugs


Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Gabriel Ravier from comment #11)
> Seems like the issue is present again, except it's test1 that gets the
> better asm now. Perhaps this should be re-opened?

This bug was about 32-bit x86, and the code looks good in GCC 9, 10, 11, and 12
and the trunk.
If you were testing on x86_64, you need to use __int128_t to see what the
original issue was about:
void gh();
void test(__int128_t x) {
  long g = (long)x | ((long)(x >> 64));
  if (g) gh();
}
void test1(__int128_t x) {
  if (x) gh();
}

GCC 4.8+ produces:
test1:
        .cfi_startproc
        orq     %rdi, %rsi
        jne     .L7
        rep ret

That is the code for both functions. There was an extra mov for test in GCC
4.5.0-4.7.0, though. In GCC 4.4.0, test1 was two compare-and-jumps (ok). GCC
4.1.2 had the bad code generation mentioned in comment #0.


* [Bug rtl-optimization/15792] missed subreg optimization
From: gabravier at gmail dot com @ 2021-10-15  3:00 UTC
  To: gcc-bugs


Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gabravier at gmail dot com

--- Comment #11 from Gabriel Ravier <gabravier at gmail dot com> ---
Seems like the issue is present again, except it's test1 that gets the better
asm now. Perhaps this should be re-opened?


* [Bug rtl-optimization/15792] missed subreg optimization
From: dann at godzilla dot ics dot uci dot edu @ 2004-08-20 18:47 UTC
  To: gcc-bugs


------- Additional Comments From dann at godzilla dot ics dot uci dot edu  2004-08-20 18:47 -------
(In reply to comment #1)
> Indeed. In test1, we get a completely bogus sequence: 
> 	movl	12(%ebp), %edx 
> 	movl	8(%ebp), %eax 
> 	movl	%edx, %ecx 
> 	orl	%eax, %ecx 
> What is the compiler thinking, moving data first into edx just to move 
> it further into ecx the next moment? 

This is a regression from gcc-3.0; the mov is not generated there:

        movl    16(%esp), %eax
        movl    20(%esp), %edx
        orl     %edx, %eax




* [Bug rtl-optimization/15792] missed subreg optimization
From: bangerth at dealii dot org @ 2004-06-15 20:07 UTC
  To: gcc-bugs


------- Additional Comments From bangerth at dealii dot org  2004-06-15 20:06 -------
Indeed. In test1, we get a completely bogus sequence: 
	movl	12(%ebp), %edx 
	movl	8(%ebp), %eax 
	movl	%edx, %ecx 
	orl	%eax, %ecx 
What is the compiler thinking, moving data first into edx just to move 
it further into ecx the next moment? 
 
W. 

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-06-15 20:07:00
               date|                            |




Thread overview: 12+ messages
     [not found] <bug-15792-6528@http.gcc.gnu.org/bugzilla/>
2006-01-18  4:45 ` [Bug rtl-optimization/15792] missed subreg optimization pinskia at gcc dot gnu dot org
2006-01-20 15:48 ` tony dot linthicum at amd dot com
2006-01-20 15:52 ` pinskia at gcc dot gnu dot org
2006-02-02 18:18 ` ian at airs dot com
2006-02-06 17:13 ` tony dot linthicum at amd dot com
2006-02-07  0:30 ` ian at airs dot com
2006-02-07  8:23 ` ian at airs dot com
2007-11-10  0:15 ` rask at gcc dot gnu dot org
     [not found] <bug-15792-4@http.gcc.gnu.org/bugzilla/>
2021-10-15  3:00 ` gabravier at gmail dot com
2023-05-15  5:34 ` pinskia at gcc dot gnu.org
2004-06-03  4:27 [Bug rtl-optimization/15792] New: " pinskia at gcc dot gnu dot org
2004-06-15 20:07 ` [Bug rtl-optimization/15792] " bangerth at dealii dot org
2004-08-20 18:47 ` dann at godzilla dot ics dot uci dot edu
