[Bug rtl-optimization/55583] New: Extended shift instruction on x86-64 is not used, producing unoptimal code

* [Bug rtl-optimization/55583] New: Extended shift instruction on x86-64 is not used, producing unoptimal code
@ 2012-12-04  0:06 mtkilpailut at torni dot org
  2012-12-04  0:08 ` [Bug rtl-optimization/55583] " mtkilpailut at torni dot org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: mtkilpailut at torni dot org @ 2012-12-04  0:06 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

             Bug #: 55583
           Summary: Extended shift instruction on x86-64 is not used,
                    producing unoptimal code
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: mtkilpailut@torni.org


Created attachment 28866
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28866
Source code demonstrating bad code generation

On x86-64, the extended shift instructions (shld/shrd) are not generated for
this idiom. Combined with other problems, this produces very poor code.

Test functions are included for signed and unsigned 16-, 32-, and 64-bit types,
for both left and right shifts, and for both a constant n and a function
parameter n.

Code of this form:
  unsigned int a, b; const int n = 2;
  void test32l (void) { b = (b << n) | (a >> (32 - n)); }

expected code:
  mov     a(%rip),%eax
  shld    $0x2,%eax,b(%rip)
  ret

produced code:
  mov    b(%rip), %edx   ; Size of register used here depends on gcc version
  mov    a(%rip), %eax   ; Size of register used here depends on gcc version
  sal    $2, %edx        ; Size of register used here depends on gcc version
  shr    $30, %eax
  or     %edx, %eax
  mov    %eax, b(%rip)
  ret
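
To make the expected instruction concrete: shld shifts the destination left and fills the vacated low bits from the high bits of the source. The sketch below compares the idiom from the report against a 64-bit reference model. The function names are hypothetical and chosen only for illustration, and the model assumes 0 < n < 32, because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* The idiom from the report. Valid only for 0 < n < 32: for n == 0 the
   right shift would be by 32, which is undefined for a 32-bit type. */
static uint32_t funnel_left_idiom(uint32_t b, uint32_t a, unsigned n)
{
    return (b << n) | (a >> (32 - n));
}

/* Reference model (hypothetical helper): concatenate b:a into 64 bits,
   shift the whole value left by n, and keep the upper 32 bits. */
static uint32_t funnel_left_model(uint32_t b, uint32_t a, unsigned n)
{
    uint64_t wide = ((uint64_t)b << 32) | a;
    return (uint32_t)((wide << n) >> 32);
}
```

For any n in 1..31 the two functions should agree, which is why a single shld can replace the sal/shr/or sequence.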


Tested with:
COLLECT_GCC_OPTIONS='-v' '-c' '-save-temps' '-O2' '-Wall' '-W' '-o'
'gcc_shld_not_used' '-mtune=generic'

I tried gcc versions:
GNU C (Debian 4.7.2-4) version 4.7.2 (x86_64-linux-gnu)
GNU C (Debian 4.6.3-11) version 4.6.3 (x86_64-linux-gnu)
GNU C (Debian 4.5.3-9) version 4.5.3 (x86_64-linux-gnu)
GNU C (Debian 4.4.7-2) version 4.4.7 (x86_64-linux-gnu)
GNU C (GCC) version 4.8.0 20121203 (experimental) [trunk revision 194106]
(x86_64-unknown-linux-gnu)

All produce the same code, modulo the register-size differences mentioned
above, except that gcc HEAD replaces the sal with leal (,%rcx,4),%eax.



* [Bug rtl-optimization/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: mtkilpailut at torni dot org @ 2012-12-04  0:08 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

--- Comment #1 from Mikko Markus Torni <mtkilpailut at torni dot org> 2012-12-04 00:08:21 UTC ---
Created attachment 28867
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28867
gcc-HEAD compiler output



* [Bug rtl-optimization/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: hjl.tools at gmail dot com @ 2012-12-04  0:21 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-12-04
                 CC|                            |areg.melikadamyan at gmail
                   |                            |dot com, hjl.tools at gmail
                   |                            |dot com, ubizjak at gmail
                   |                            |dot com
     Ever Confirmed|0                           |1

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> 2012-12-04 00:21:02 UTC ---
Clang generates:

    movl    a(%rip), %eax
    shldl    $2, %eax, b(%rip)
    ret

at -O2.



* [Bug rtl-optimization/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: mikko.markus.torni at iki dot fi @ 2012-12-04  1:04 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

Mikko Markus Torni <mikko.markus.torni at iki dot fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #28866|0                           |1
        is obsolete|                            |

--- Comment #3 from Mikko Markus Torni <mikko.markus.torni at iki dot fi> 2012-12-04 01:03:44 UTC ---
Created attachment 28868
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28868
Source code demonstrating code generated (updated)

Bug fixes in signed integer testcases.

Clang 3.0 seems to produce optimal looking code in the following test cases:
  test32rn testu64l testu32l testu16l testu64ln testu64rn testu32ln testu32rn

Clang 3.0 manages to use shld/shrd, but generates extra moves in the following
test cases:
  test64r test32r test16r test64rn test32rn testu64rn testu32r testu16r

Clang 3.0 fails to use shld/shrd in the following test cases:
  test64l test32l test16l test64ln test32ln test16ln test16rn testu16ln
testu16rn

Tested with clang:
 "/usr/bin/clang" -cc1 -triple x86_64-pc-linux-gnu -S -disable-free
-disable-llvm-verifier -main-file-name gcc_shld_not_used.c -mrelocation-model
static -mdisable-fp-elim -masm-verbose -mconstructor-aliases -munwind-tables
-target-cpu x86-64 -target-linker-version 2.22 -momit-leaf-frame-pointer -v
-coverage-file gcc_shld_not_used.s -resource-dir /usr/bin/../lib/clang/3.0 -O2
-Wall -W -ferror-limit 19 -fmessage-length 0 -fgnu-runtime
-fobjc-runtime-has-arc -fobjc-runtime-has-weak -fobjc-fragile-abi
-fdiagnostics-show-option -o gcc_shld_not_used.s -x cpp-output
gcc_shld_not_used.i
clang -cc1 version 3.0 based upon llvm 3.0 hosted on x86_64-pc-linux-gnu



* [Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: glisse at gcc dot gnu.org @ 2012-12-04 10:16 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |glisse at gcc dot gnu.org

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> 2012-12-04 10:15:27 UTC ---
It looks like the existing patterns all look for 32-i as the second shift
amount. Writing an additional version that takes a constant (with an extra
check that the two constants sum to 32; we then have to specify
length_immediate manually) and replacing (match_dup 0) with an extra operand
that has the constraint "0" seems to work. (It breaks again if I swap the two
sides of operator|.)
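
As a rough illustration of the two points above, here is a minimal C sketch. It is not GCC's machine-description code. The predicate name is hypothetical and only models the "constants sum to the operand width" check, and the two small functions show the same computation with the operands of | in either order, which is the case the comment reports as breaking the match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the extra check: two constant shift counts can
   form one double shift only if both are non-zero and they add up to
   the operand width. */
static bool counts_form_double_shift(unsigned left, unsigned right, unsigned bits)
{
    return left > 0 && right > 0 && left + right == bits;
}

/* Same value, written with the operands of | in either order. */
static uint32_t shld_order_a(uint32_t b, uint32_t a) { return (b << 2) | (a >> 30); }
static uint32_t shld_order_b(uint32_t b, uint32_t a) { return (a >> 30) | (b << 2); }
```

Since | is commutative, both functions compute the same value, so ideally both would end up as the same instruction.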



* [Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: glisse at gcc dot gnu.org @ 2013-04-01 13:45 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> 2013-04-01 13:45:33 UTC ---
Created attachment 29764
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29764
Patch from comment #4

I apparently forgot to attach a patch when I posted comment #4. This is just to
show the idea: it doesn't handle many cases, and the length_immediate value was
filled in arbitrarily just to let it compile.



* [Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: glisse at gcc dot gnu.org @ 2014-06-07  9:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

Marc Glisse <glisse at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2012-12-04 00:00:00         |2014-6-7

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
Several things:

1) https://gcc.gnu.org/ml/gcc/2014-06/msg00063.html points out that our shrd
patterns wrongly use ashiftrt instead of lshiftrt

2) We can convince the current compiler to generate shrd by constructing
((((unsigned long long)a)<<32) | b) >> n. (Take care not to use '+' in place of
'|': gcc is unable to realize that x+0 has no carry and thus leaves plenty of
unneeded code in that case.) For a constant shift, it manages to clean up all
the useless code. This works at least for the 32-bit version with -m32 and for
the 64-bit version (using unsigned __int128) with -m64; it does not work for
the 32-bit version with -m64.

3) With extra patterns as attached here, combine can handle the case where the
shift amount is constant. However, the non-constant pattern is too big for
combine. The closest it gets to matching is (b<<n)|(a>>(l-n)), but replacing l
with 32 is one more substitution than it is willing to try (it also ignores
the REG_EQUAL note that would give (32-n) with one substitution fewer).
Improving combine would be nice. I am not sure what (not too artificial)
intermediate pattern we could introduce to help it. Maybe a>>(32-n), though I
don't even know whether it is better to implement that as a subtraction and a
shift or by generating zero and then using sh[lr]d.
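
The widened-shift construction from point 2 can be sketched in C as follows, next to the plain two-shift form for comparison. The function names are hypothetical. Whether a particular compiler version turns either one into shrd is not guaranteed; the comment above reports it only for certain -m32/-m64 combinations.

```c
#include <stdint.h>

/* Widened form from point 2: build the 64-bit value a:b and shift it
   right. Uses '|' rather than '+', as the comment advises. Well defined
   for 0 <= n < 32 (and in fact for any n < 64). */
static uint32_t funnel_right_wide(uint32_t a, uint32_t b, unsigned n)
{
    uint64_t wide = ((uint64_t)a << 32) | b;
    return (uint32_t)(wide >> n);
}

/* Plain two-shift form. Valid only for 0 < n < 32, because a shift by
   32 on a 32-bit type is undefined in C. */
static uint32_t funnel_right_idiom(uint32_t a, uint32_t b, unsigned n)
{
    return (b >> n) | (a << (32 - n));
}
```

One side benefit of the widened form is that n == 0 is well defined, which the two-shift form cannot offer without a special case.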



* [Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: crazylht at gmail dot com @ 2022-05-30  2:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
i386 already has

(define_insn_and_split "*x86_shrd_2"
  [(set (match_operand:SI 0 "nonimmediate_operand")
        (ior:SI (lshiftrt:SI (match_dup 0)
                             (match_operand:QI 2 "nonmemory_operand"))
                (ashift:SI (match_operand:SI 1 "register_operand")
                           (minus:QI (const_int 32) (match_dup 2)))))

It needs to be extended (or to get new pre_reload splitters) to handle:
1. op2 being a constant, in which case the minus is not necessary.
2. op2 and (minus:QI (const_int 32) (match_dup 2)) being swapped between
lshiftrt and ashift.
3. match_dup 0 being too restrictive; we can emit an extra emit_move_insn to
set DEST.
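
Point 3 is perhaps easiest to see at the source level. In the sketch below (hypothetical function names; the globals a and b follow the original report, c is added for illustration), the first function updates b in place, so the destination is also an input, which is the shape (match_dup 0) requires. The second writes the same value to a different variable, which would need the extra move mentioned above.

```c
#include <stdint.h>

uint32_t a, b, c;

/* Destination is also the shifted input: the shape (match_dup 0) matches. */
void shrd_in_place(void)
{
    b = (b >> 3) | (a << 29);
}

/* Destination differs from both inputs: matching this needs a pattern
   that first copies b into the destination. */
void shrd_to_other(void)
{
    c = (b >> 3) | (a << 29);
}
```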


* [Bug target/55583] Extended shift instruction on x86-64 is not used, producing unoptimal code
From: cvs-commit at gcc dot gnu.org @ 2022-11-01  3:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55583

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:5c5ef2f9ab545b680cd4bb6c540a9dadb12ead86

commit r13-3586-g5c5ef2f9ab545b680cd4bb6c540a9dadb12ead86
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Oct 27 18:48:41 2022 +0800

    Enable more optimization for 32-bit/64-bit shrd/shld with imm shift count.

    This patch doesn't handle variable count since it requires 5 insns to
    be combined to get the wanted pattern, but the current pass_combine only
    supports at most 4.
    This patch doesn't handle 16-bit shrd/shld either.

    gcc/ChangeLog:

            PR target/55583
            * config/i386/i386.md (*x86_64_shld_1): Rename to ..
            (x86_64_shld_1): .. this.
            (*x86_shld_1): Rename to ..
            (x86_shld_1): .. this.
            (*x86_64_shrd_1): Rename to ..
            (x86_64_shrd_1): .. this.
            (*x86_shrd_1): Rename to ..
            (x86_shrd_1): .. this.
            (*x86_64_shld_shrd_1_nozext): New pre_reload splitter.
            (*x86_shld_shrd_1_nozext): Ditto.
            (*x86_64_shrd_shld_1_nozext): Ditto.
            (*x86_shrd_shld_1_nozext): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr55583.c: New test.
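
For reference, the two source shapes that the commit message says remain unhandled might look like this in C. The function names are hypothetical, and this is not the content of pr55583.c. It only illustrates a variable shift count and a 16-bit operand.

```c
#include <stdint.h>

/* Variable count: the caller must keep 0 < n < 32, since a shift by 32
   on a 32-bit type is undefined in C. */
static uint32_t shld32_var(uint32_t b, uint32_t a, unsigned n)
{
    return (b << n) | (a >> (32 - n));
}

/* 16-bit operands: both promote to int before shifting, so the result
   is truncated back to 16 bits explicitly. */
static uint16_t shld16_const(uint16_t b, uint16_t a)
{
    return (uint16_t)((b << 2) | (a >> 14));
}
```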
