public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t)
@ 2011-09-09 12:12 wouter.vermaelen at scarlet dot be
  2012-01-23  0:24 ` [Bug rtl-optimization/50339] " svfuerst at gmail dot com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: wouter.vermaelen at scarlet dot be @ 2011-09-09 12:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

             Bug #: 50339
           Summary: suboptimal register allocation for abs(__int128_t)
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: wouter.vermaelen@scarlet.be


This function:

__int128_t abs128(__int128_t a)
{
    return (a >= 0) ? a : -a;
}

Currently generates the following code (with -O3):
(linux x86_64, g++-4.7.0, SVN revision 178692)

   49 89 f9                mov    %rdi,%r9
   48 89 f7                mov    %rsi,%rdi
   49 89 f2                mov    %rsi,%r10
   48 c1 ff 3f             sar    $0x3f,%rdi
   48 89 f8                mov    %rdi,%rax
   48 89 fa                mov    %rdi,%rdx
   4c 31 c8                xor    %r9,%rax
   4c 31 d2                xor    %r10,%rdx
   48 29 f8                sub    %rdi,%rax
   48 19 fa                sbb    %rdi,%rdx
   c3                      retq   

But the following sequence needs two fewer 'mov' instructions:

   48 89 f8                mov    %rdi,%rax
   48 89 f2                mov    %rsi,%rdx
   48 89 d1                mov    %rdx,%rcx
   48 c1 f9 3f             sar    $0x3f,%rcx
   48 31 c8                xor    %rcx,%rax
   48 31 ca                xor    %rcx,%rdx
   48 29 c8                sub    %rcx,%rax
   48 19 ca                sbb    %rcx,%rdx
   c3                      retq
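
For readers less used to the sar/xor/sub/sbb idiom, here is a minimal C sketch
of what both listings compute, word by word. The helper name abs128_mask is
hypothetical and not part of the report. It assumes GCC's behaviour of shifting
negative signed values arithmetically and wrapping on unsigned-to-signed
conversion.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): branchless 128-bit abs that
   mirrors the sar / xor / sub / sbb sequence on two 64-bit halves. */
static __int128_t abs128_mask(__int128_t a)
{
    uint64_t lo = (uint64_t)a;
    uint64_t hi = (uint64_t)((__uint128_t)a >> 64);

    /* sar $63: all ones if a is negative, zero otherwise
       (assumes arithmetic right shift of signed values, as GCC does). */
    uint64_t mask = (uint64_t)((int64_t)hi >> 63);

    /* xor with the mask: one's complement when negative, no-op otherwise. */
    lo ^= mask;
    hi ^= mask;

    /* sub on the low word, then sbb on the high word with the borrow. */
    uint64_t rlo = lo - mask;
    uint64_t borrow = lo < mask;
    uint64_t rhi = hi - mask - borrow;

    return (__int128_t)(((__uint128_t)rhi << 64) | rlo);
}
```

The mask is needed as an operand four times, so the number of moves depends on
which register holds it and whether the inputs can be overwritten in place.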


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
@ 2012-01-23  0:24 ` svfuerst at gmail dot com
  2012-11-02 14:34 ` glisse at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: svfuerst at gmail dot com @ 2012-01-23  0:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Steven Fuerst <svfuerst at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |svfuerst at gmail dot com

--- Comment #1 from Steven Fuerst <svfuerst at gmail dot com> 2012-01-22 23:44:55 UTC ---
It is possible to remove yet another mov instruction:

mov    %rdi,%rax
mov    %rsi,%rdx
sar    $0x3f,%rsi
xor    %rsi,%rax
xor    %rsi,%rdx
sub    %rsi,%rax
sbb    %rsi,%rdx
retq



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
  2012-01-23  0:24 ` [Bug rtl-optimization/50339] " svfuerst at gmail dot com
@ 2012-11-02 14:34 ` glisse at gcc dot gnu.org
  2013-02-15  8:58 ` [Bug rtl-optimization/50339] [4.8 Regression] " ubizjak at gmail dot com
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2012-11-02 14:34 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

--- Comment #2 from Marc Glisse <glisse at gcc dot gnu.org> 2012-11-02 14:33:27 UTC ---
It looks even worse in 4.8:

    movq    %rdi, %r9
    movq    %rsi, %rdi
    movq    %rsi, %r10
    sarq    $63, %rdi
    movq    %rdi, %rcx
    xorq    %r9, %rcx
    movq    %rcx, %rax
    movq    %r10, %rcx
    xorq    %rdi, %rcx
    subq    %rdi, %rax
    movq    %rcx, %rdx
    sbbq    %rdi, %rdx
    ret



* [Bug rtl-optimization/50339] [4.8 Regression] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
  2012-01-23  0:24 ` [Bug rtl-optimization/50339] " svfuerst at gmail dot com
  2012-11-02 14:34 ` glisse at gcc dot gnu.org
@ 2013-02-15  8:58 ` ubizjak at gmail dot com
  2013-02-15  9:20 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ubizjak at gmail dot com @ 2013-02-15  8:58 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ra
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-02-15
                 CC|                            |vmakarov at gcc dot gnu.org
   Target Milestone|---                         |4.8.0
            Summary|suboptimal register         |[4.8 Regression] suboptimal
                   |allocation for              |register allocation for
                   |abs(__int128_t)             |abs(__int128_t)
     Ever Confirmed|0                           |1

--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2013-02-15 08:57:41 UTC ---
(In reply to comment #2)
> It looks even worse in 4.8:

So, a regression from 4.7.



* [Bug rtl-optimization/50339] [4.8 Regression] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (2 preceding siblings ...)
  2013-02-15  8:58 ` [Bug rtl-optimization/50339] [4.8 Regression] " ubizjak at gmail dot com
@ 2013-02-15  9:20 ` rguenth at gcc dot gnu.org
  2013-02-21 12:58 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-02-15  9:20 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|enhancement                 |normal



* [Bug rtl-optimization/50339] [4.8 Regression] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (3 preceding siblings ...)
  2013-02-15  9:20 ` rguenth at gcc dot gnu.org
@ 2013-02-21 12:58 ` jakub at gcc dot gnu.org
  2013-02-21 21:28 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-02-21 12:58 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-21 12:57:40 UTC ---
Created attachment 29517
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29517
gcc48-pr50339.patch

This patch makes the code generated by 4.8 one insn shorter than what 4.7
produced, by lowering ASHIFTRT in lower-subreg the same way it already lowers
ASHIFT and LSHIFTRT.

On this testcase, the difference between unpatched trunk and patched trunk is:
-        movq    %rdi, %r9
-        movq    %rsi, %rdi
-        movq    %rsi, %r10
-        sarq    $63, %rdi
-        movq    %rdi, %rcx
-        xorq    %r9, %rcx
-        movq    %rcx, %rax
-        movq    %r10, %rcx
-        xorq    %rdi, %rcx
-        subq    %rdi, %rax
-        movq    %rcx, %rdx
-        sbbq    %rdi, %rdx
+        movq    %rsi, %rax
+        sarq    $63, %rax
+        movq    %rax, %r9
+        xorq    %rax, %rdi
+        xorq    %r9, %rsi
+        movq    %rdi, %rax
+        movq    %rsi, %rdx
+        subq    %r9, %rax
+        sbbq    %r9, %rdx

i.e. 4 moves instead of the former 7.  I have no idea why RA chooses to do the
shift in %rax (first moving %rsi to %rax, then shifting %rax, then moving %rax
to %r9) instead of copying %rsi to %r9 and shifting %r9, which would save one
more move.
Even smaller code would probably need a different expansion or much smarter
register allocation.

Anyway, also tested:
__int128_t
f1 (__int128_t a)
{
  return a >> 67;
}

__int128_t
f2 (__int128_t a)
{
  return a >> 64;
}

__int128_t
f3 (__int128_t a)
{
  return a >> 127;
}

__uint128_t
f4 (__uint128_t a)
{
  return a >> 67;
}

__uint128_t
f5 (__uint128_t a)
{
  return a >> 64;
}

__uint128_t
f6 (__uint128_t a)
{
  return a >> 127;
}

on x86_64 and the difference at -O2 is:
-        movq    %rsi, %rax
         movq    %rsi, %rdx
+        movq    %rsi, %rax
         sarq    $63, %rdx
         sarq    $3, %rax
for f1,
-        movq    %rsi, %rdx
         movq    %rsi, %rax
-        sarq    $63, %rdx
+        cqto
for f2 and
+        sarq    $63, %rsi
         movq    %rsi, %rdx
-        sarq    $63, %rdx
-        movq    %rdx, %rax
+        movq    %rsi, %rax
for f3, so either no pessimization or a small improvement.  On:
long long int
f1 (long long int a)
{
  return a >> 35;
}

long long int
f2 (long long int a)
{
  return a >> 32;
}

long long int
f3 (long long int a)
{
  return a >> 63;
}

unsigned long long int
f4 (unsigned long long int a)
{
  return a >> 35;
}

unsigned long long int
f5 (unsigned long long int a)
{
  return a >> 32;
}

unsigned long long int
f6 (unsigned long long int a)
{
  return a >> 63;
}

with -O2 -m32 the improvements are even larger.  For f1:
-        movl    8(%esp), %edx
-        movl    %edx, %eax
-        movl    %eax, %edx
-        sarl    $31, %edx
+        movl    8(%esp), %eax
+        cltd
         sarl    $3, %eax
and for f2:
-        movl    8(%esp), %edx
-        movl    %edx, %eax
-        movl    %eax, %edx
-        sarl    $31, %edx
+        movl    8(%esp), %eax
+        cltd
(no difference for f3).
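
As a rough illustration of the ASHIFTRT lowering discussed in this comment,
the sketch below splits a double-word arithmetic right shift by a constant of
at least one word into two single-word shifts. It is written at the C level
with a hypothetical helper name, sar128_wide; it is not the lower-subreg code
itself, and it assumes GCC's arithmetic right shift of signed values.

```c
#include <stdint.h>

/* Hypothetical helper (not GCC code): arithmetic right shift of a 128-bit
   value by n, where 64 <= n <= 127, using only 64-bit operations. */
static __int128_t sar128_wide(__int128_t a, unsigned n)
{
    /* Only the high input word contributes once n >= 64. */
    int64_t hi = (int64_t)((__uint128_t)a >> 64);

    /* Low result word: high input word shifted by the remaining count
       (sarq $3 for f1, a plain copy for f2, sarq $63 for f3). */
    uint64_t rlo = (uint64_t)(hi >> (n - 64));

    /* High result word: 64 copies of the sign bit (sarq $63 or cqto). */
    uint64_t rhi = (uint64_t)(hi >> 63);

    return (__int128_t)(((__uint128_t)rhi << 64) | rlo);
}
```

With n = 67, 64 and 127 this corresponds to f1, f2 and f3 above. The same
split with 32-bit words would correspond to the long long functions compiled
with -m32.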



* [Bug rtl-optimization/50339] [4.8 Regression] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (4 preceding siblings ...)
  2013-02-21 12:58 ` jakub at gcc dot gnu.org
@ 2013-02-21 21:28 ` jakub at gcc dot gnu.org
  2013-02-21 21:36 ` [Bug rtl-optimization/50339] " jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-02-21 21:28 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-21 21:28:07 UTC ---
Author: jakub
Date: Thu Feb 21 21:28:03 2013
New Revision: 196214

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=196214
Log:
    PR rtl-optimization/50339
    * lower-subreg.h (struct lower_subreg_choices): Add splitting_ashiftrt
    field.
    * lower-subreg.c (compute_splitting_shift): Handle ASHIFTRT.
    (compute_costs): Call compute_splitting_shift also for ASHIFTRT
    into splitting_ashiftrt field.
    (find_decomposable_shift_zext, resolve_shift_zext): Handle also
    ASHIFTRT.
    (dump_choices): Fix up printing LSHIFTRT choices, print ASHIFTRT
    choices.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lower-subreg.c
    trunk/gcc/lower-subreg.h



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (5 preceding siblings ...)
  2013-02-21 21:28 ` jakub at gcc dot gnu.org
@ 2013-02-21 21:36 ` jakub at gcc dot gnu.org
  2013-03-22 14:48 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-02-21 21:36 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[4.8 Regression] suboptimal |suboptimal register
                   |register allocation for     |allocation for
                   |abs(__int128_t)             |abs(__int128_t)

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-02-21 21:36:17 UTC ---
Removing the regression tag: while the RA should still be improved, this is no
longer a regression.



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (6 preceding siblings ...)
  2013-02-21 21:36 ` [Bug rtl-optimization/50339] " jakub at gcc dot gnu.org
@ 2013-03-22 14:48 ` jakub at gcc dot gnu.org
  2013-05-31 11:00 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-03-22 14:48 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.0                       |4.8.1

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2013-03-22 14:45:36 UTC ---
GCC 4.8.0 is being released, adjusting target milestone.



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (7 preceding siblings ...)
  2013-03-22 14:48 ` jakub at gcc dot gnu.org
@ 2013-05-31 11:00 ` jakub at gcc dot gnu.org
  2013-10-16  9:51 ` jakub at gcc dot gnu.org
  2015-06-22 14:26 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-05-31 11:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.1                       |4.8.2

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.1 has been released.



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (8 preceding siblings ...)
  2013-05-31 11:00 ` jakub at gcc dot gnu.org
@ 2013-10-16  9:51 ` jakub at gcc dot gnu.org
  2015-06-22 14:26 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2013-10-16  9:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.2                       |4.8.3

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.8.2 has been released.



* [Bug rtl-optimization/50339] suboptimal register allocation for abs(__int128_t)
  2011-09-09 12:12 [Bug rtl-optimization/50339] New: suboptimal register allocation for abs(__int128_t) wouter.vermaelen at scarlet dot be
                   ` (9 preceding siblings ...)
  2013-10-16  9:51 ` jakub at gcc dot gnu.org
@ 2015-06-22 14:26 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-06-22 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50339

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.8.3                       |---

