[Bug rtl-optimization/46514] New: 128-bit shifts on x86_64 generate silly code unless the shift amount is constant

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/46514] New: 128-bit shifts on x86_64 generate silly code unless the shift amount is constant
@ 2010-11-17  1:41 luto at mit dot edu
  2010-11-17 20:08 ` [Bug rtl-optimization/46514] " ubizjak at gmail dot com
  0 siblings, 1 reply; 2+ messages in thread
From: luto at mit dot edu @ 2010-11-17  1:41 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46514

           Summary: 128-bit shifts on x86_64 generate silly code unless
                    the shift amount is constant
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: luto@mit.edu


Created attachment 22428
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=22428
Preprocessed source

I'm using 4.5.1 (Fedora 14) with -O3, but -O2 does the same thing.

This really easy case:

uint64_t shift_test_31(__uint128_t x, uint32_t shift)
{
  if (shift != 31)
    __builtin_unreachable();

  return (uint64_t)(x >> shift);
}

generates:

0000000000000050 <shift_test_31>:
  50:   48 89 f8                mov    %rdi,%rax
  53:   48 0f ac f0 1f          shrd   $0x1f,%rsi,%rax
  58:   c3                      retq   
  59:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

which is entirely sensible.  But this:

uint64_t shift_test_le_31(__uint128_t x, uint32_t shift)
{
  if (shift >= 32)
    __builtin_unreachable();

  return (uint64_t)(x >> shift);
}

generates this:

0000000000000060 <shift_test_le_31>:
  60:   89 d1                   mov    %edx,%ecx
  62:   48 89 6c 24 f8          mov    %rbp,-0x8(%rsp)
  67:   48 89 f5                mov    %rsi,%rbp
  6a:   48 0f ad f7             shrd   %cl,%rsi,%rdi
  6e:   48 d3 ed                shr    %cl,%rbp
  71:   f6 c2 40                test   $0x40,%dl
  74:   48 89 5c 24 f0          mov    %rbx,-0x10(%rsp)
  79:   48 0f 45 fd             cmovne %rbp,%rdi
  7d:   48 8b 5c 24 f0          mov    -0x10(%rsp),%rbx
  82:   48 8b 6c 24 f8          mov    -0x8(%rsp),%rbp
  87:   48 89 f8                mov    %rdi,%rax
  8a:   c3                      retq   

which contains a pointless shr, test, and cmovne.  (Even if I change the
__builtin_unreachable() into a real branch, I get the same code.)


^ permalink raw reply	[flat|nested] 2+ messages in thread

* [Bug rtl-optimization/46514] 128-bit shifts on x86_64 generate silly code unless the shift amount is constant
  2010-11-17  1:41 [Bug rtl-optimization/46514] New: 128-bit shifts on x86_64 generate silly code unless the shift amount is constant luto at mit dot edu
@ 2010-11-17 20:08 ` ubizjak at gmail dot com
  0 siblings, 0 replies; 2+ messages in thread
From: ubizjak at gmail dot com @ 2010-11-17 20:08 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46514

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.11.17 19:51:24
     Ever Confirmed|0                           |1

--- Comment #1 from Uros Bizjak <ubizjak at gmail dot com> 2010-11-17 19:51:24 UTC ---
This is how doubleword (TImode on x86_64 and DImode on x86_32 targets) shifts
are handled. Doubleword instructions are expanded to final instruction sequence
late after register allocation pass, so earlier optimization passes know that
they are processing SHIFT expressions and optimize them as shifts.

The expansion detects constant count operand and emits special sequence, but
for sure it can't detect limited set of possible count operands, and emits
universal sequence in this case.

That said, doubleword operation code is not the most optimized code around, on
the grounds that it is usually not used in performance critical part of the
application. Just try to avoid it as much as possible.


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2010-11-17 19:51 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-17  1:41 [Bug rtl-optimization/46514] New: 128-bit shifts on x86_64 generate silly code unless the shift amount is constant luto at mit dot edu
2010-11-17 20:08 ` [Bug rtl-optimization/46514] " ubizjak at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).