public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/43883]  New: missed optimization of constant __int128_t modulus
@ 2010-04-25  5:06 svfuerst at gmail dot com
  2010-04-25 20:09 ` [Bug target/43883] " rguenth at gcc dot gnu dot org
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: svfuerst at gmail dot com @ 2010-04-25  5:06 UTC
  To: gcc-bugs

The following function gets optimized at -O3 to:

long long tmod2(long long x)
{
        return x % 2;
}


mov    %rdi,%rdx
shr    $0x3f,%rdx
lea    (%rdi,%rdx,1),%rax
and    $0x1,%eax
sub    %rdx,%rax
retq

This is very good code.  Unfortunately, the 128-bit version doesn't get
optimized nearly as well.
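
For reference, the 64-bit sequence above can be modelled in C as a minimal sketch (not GCC's implementation; `tmod2_model` and `tmod2_model_check` are hypothetical names). The bias is 1 for negative x, so adding it before masking makes the remainder round toward zero, as C's % requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branch-free x % 2 for signed 64-bit x, following the shr/lea/and/sub
   sequence. Unsigned arithmetic keeps the addition from overflowing. */
int64_t tmod2_model(int64_t x)
{
    uint64_t ux = (uint64_t)x;
    uint64_t bias = ux >> 63;            /* shr $0x3f: 1 if x < 0, else 0 */
    uint64_t low = (ux + bias) & 1;      /* lea (%rdi,%rdx,1) + and $0x1 */
    return (int64_t)low - (int64_t)bias; /* sub: -1, 0 or 1 */
}

/* Compares the model with the compiler's own % on some edge values. */
void tmod2_model_check(void)
{
    const int64_t samples[] = { INT64_MIN, INT64_MIN + 1, -3, -2, -1,
                                0, 1, 2, 3, INT64_MAX };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(tmod2_model(samples[i]) == samples[i] % 2);
}
```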

__int128_t tmod2(__int128_t x)
{
        return x % 2;
}

mov    %rsi,%rdx
mov    %rdi,%r8
xor    %ecx,%ecx
shr    $0x3f,%rdx
push   %rbx
add    %rdx,%r8
xor    %edi,%edi
mov    %r8,%rsi
mov    %rdi,%r9
and    $0x1,%esi
mov    %rsi,%r8
sub    %rdx,%r8
sbb    %rcx,%r9
mov    %r8,%rax
mov    %r9,%rdx
pop    %rbx
retq

It looks like this simple variation of the 64-bit algorithm will work for the
128-bit version:

mov    %rsi,%rdx    # only change: %rdi became %rsi
shr    $0x3f,%rdx   # conveniently, already computes the high half in %rdx
lea    (%rdi,%rdx,1),%rax
and    $0x1,%eax
sub    %rdx,%rax
retq
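
A literal C model of the proposed sequence makes its corner cases easy to check. This is a sketch: `u128_halves`, `proposed_tmod2` and `proposed_tmod2_demo` are hypothetical names, and the struct stands in for the two 64-bit registers that hold an `__int128` under the x86-64 SysV ABI (%rdi/%rsi on entry, %rax/%rdx on return). Because shr leaves only 0 or 1 in %rdx, the high half of the result is wrong for negative inputs, which is the issue raised in comment #2 and conceded in comment #3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: low and high 64-bit halves of an __int128. */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Literal model of the proposed sequence. */
u128_halves proposed_tmod2(uint64_t lo, uint64_t hi)
{
    uint64_t rdx = hi >> 63;        /* mov %rsi,%rdx; shr $0x3f,%rdx */
    uint64_t rax = (lo + rdx) & 1;  /* lea (%rdi,%rdx,1),%rax; and $0x1,%eax */
    rax -= rdx;                     /* sub %rdx,%rax */
    u128_halves out = { rax, rdx }; /* high half: just the sign bit, 0 or 1 */
    return out;
}

/* Correct for x = 5, but not for x = -1. */
void proposed_tmod2_demo(void)
{
    u128_halves pos = proposed_tmod2(5, 0);
    assert(pos.lo == 1 && pos.hi == 0);                       /* 5 % 2 == 1 */

    u128_halves neg = proposed_tmod2(UINT64_MAX, UINT64_MAX); /* x = -1 */
    /* -1 % 2 == -1 needs hi == UINT64_MAX; the sequence leaves hi == 1. */
    assert(neg.lo == UINT64_MAX && neg.hi == 1);
}
```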


-- 
           Summary: missed optimization of constant __int128_t modulus
           Product: gcc
           Version: 4.5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: svfuerst at gmail dot com
 GCC build triplet: x86_64-linux
  GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug target/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
@ 2010-04-25 20:09 ` rguenth at gcc dot gnu dot org
  2010-04-30  9:25 ` ubizjak at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-04-25 20:09 UTC
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2010-04-25 20:09 -------
There isn't any pattern for the TImode variant.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |target


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug target/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
  2010-04-25 20:09 ` [Bug target/43883] " rguenth at gcc dot gnu dot org
@ 2010-04-30  9:25 ` ubizjak at gmail dot com
  2010-04-30 16:13 ` svfuerst at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2010-04-30  9:25 UTC
  To: gcc-bugs



------- Comment #2 from ubizjak at gmail dot com  2010-04-30 09:12 -------
(In reply to comment #1)
> There isn't any pattern for the TImode variant.

Huh? Expansion uses TImode where appropriate:

(insn 10 9 11 ttt.c:3 (parallel [
            (set (reg:DI 66)
                (ashiftrt:DI (subreg:DI (reg/v:TI 60 [ x ]) 8)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 11 10 12 ttt.c:3 (set (subreg:DI (reg:TI 65) 0)
        (reg:DI 66)) -1 (nil))

(insn 12 11 13 ttt.c:3 (parallel [
            (set (reg:DI 67)
                (ashiftrt:DI (reg:DI 66)
                    (const_int 63 [0x3f])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 13 12 14 ttt.c:3 (set (subreg:DI (reg:TI 65) 8)
        (reg:DI 67)) -1 (nil))

(insn 14 13 15 ttt.c:3 (parallel [
            (set (reg:TI 68)
                (lshiftrt:TI (reg:TI 65)
                    (const_int 127 [0x7f])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 15 14 16 ttt.c:3 (parallel [
            (set (reg:TI 69)
                (plus:TI (reg/v:TI 60 [ x ])
                    (reg:TI 68)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 16 15 17 ttt.c:3 (parallel [
            (set (subreg:DI (reg:TI 70) 0)
                (and:DI (subreg:DI (reg:TI 69) 0)
                    (const_int 1 [0x1])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 17 16 18 ttt.c:3 (parallel [
            (set (subreg:DI (reg:TI 70) 8)
                (and:DI (subreg:DI (reg:TI 69) 8)
                    (const_int 0 [0x0])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 18 17 19 ttt.c:3 (parallel [
            (set (reg:TI 71)
                (minus:TI (reg:TI 70)
                    (reg:TI 68)))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil))

(insn 19 18 20 ttt.c:2 (set (reg:TI 59 [ <retval> ])
        (reg:TI 71)) -1 (nil))

Are you sure that the proposed solution covers all corner cases?
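
Reading the expansion above as C may help with that question. The sketch below follows insns 10-18 step by step using the GCC/Clang `__int128` extension; `expanded_tmod2` and `expanded_tmod2_check` are hypothetical names, variable names such as `r65` follow the pseudo-register numbers in the dump, and the check only samples a few values rather than proving equivalence with `%`.

```c
#include <assert.h>
#include <stdint.h>

/* C reading of insns 10-18 of the RTL dump. */
__int128 expanded_tmod2(__int128 x)
{
    unsigned __int128 ux = (unsigned __int128)x;
    /* insn 10: high half >> 63 (arithmetic) gives 0 or all ones;
       insn 12 shifts that again, which leaves it unchanged. */
    uint64_t sign = (x < 0) ? UINT64_MAX : 0;
    /* insns 11 and 13: both halves of reg 65 receive the sign mask. */
    unsigned __int128 r65 = ((unsigned __int128)sign << 64) | sign;
    unsigned __int128 r68 = r65 >> 127;  /* insn 14: 1 if x < 0 */
    unsigned __int128 r69 = ux + r68;    /* insn 15 */
    unsigned __int128 r70 = r69 & 1;     /* insns 16-17: low & 1, high & 0 */
    unsigned __int128 r71 = r70 - r68;   /* insn 18 */
    return (__int128)r71;                /* modulo conversion, as in GCC */
}

/* Compares the model with __int128 % on a few sample values. */
void expanded_tmod2_check(void)
{
    const __int128 one = 1;
    const __int128 samples[] = { -(one << 100) - 1, -(one << 64), -3, -2, -1,
                                 0, 1, 2, (one << 64) + 1, one << 126 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(expanded_tmod2(samples[i]) == samples[i] % 2);
}
```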


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug target/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
  2010-04-25 20:09 ` [Bug target/43883] " rguenth at gcc dot gnu dot org
  2010-04-30  9:25 ` ubizjak at gmail dot com
@ 2010-04-30 16:13 ` svfuerst at gmail dot com
  2010-04-30 16:33 ` svfuerst at gmail dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: svfuerst at gmail dot com @ 2010-04-30 16:13 UTC
  To: gcc-bugs



------- Comment #3 from svfuerst at gmail dot com  2010-04-30 16:12 -------
Oops, you are right.  With that code, the 128-bit version needs an extra sbb
at the end (for some reason I was misreading the shr as a sar):

mov    %rsi,%rdx
shr    $0x3f,%rdx
lea    (%rdi,%rdx,1),%rax
and    $0x1,%eax
sub    %rdx,%rax
sbb    %rdx,%rdx
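
The corrected sequence can be modelled in C by tracking the borrow from the sub explicitly, since sbb %rdx,%rdx computes rdx - rdx - CF and so turns that borrow into a high half of 0 or all ones. A sketch with hypothetical names (`u128_halves`, `fixed_tmod2`, `fixed_tmod2_check`), compared against `__int128` `%` on sample values only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: low/high halves (%rax/%rdx on return). */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Model of the corrected sequence; the carry flag is tracked explicitly. */
u128_halves fixed_tmod2(uint64_t lo, uint64_t hi)
{
    uint64_t rdx = hi >> 63;         /* shr $0x3f: 1 if x < 0 */
    uint64_t rax = (lo + rdx) & 1;   /* lea + and $0x1 */
    uint64_t cf = rax < rdx;         /* borrow out of sub %rdx,%rax */
    rax -= rdx;                      /* sub %rdx,%rax */
    rdx = 0 - cf;                    /* sbb %rdx,%rdx: rdx - rdx - CF */
    u128_halves out = { rax, rdx };
    return out;
}

/* Compares the model with __int128 % on a few values. */
void fixed_tmod2_check(void)
{
    const __int128 one = 1;
    const __int128 samples[] = { -5, -4, -1, 0, 1, 4, 5,
                                 -(one << 64), (one << 64) + 3 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        unsigned __int128 u = (unsigned __int128)samples[i];
        u128_halves r = fixed_tmod2((uint64_t)u, (uint64_t)(u >> 64));
        unsigned __int128 want = (unsigned __int128)(samples[i] % 2);
        assert(r.lo == (uint64_t)want && r.hi == (uint64_t)(want >> 64));
    }
}
```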

However, if you use sar + add instead of shr + sub + sbb, the sequence is one
instruction shorter:

mov    %rsi,%rdx
sar    $0x3f,%rdx
lea    (%rdi,%rdx,1),%rax
and    $0x1,%eax
add    %rdx,%rax
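
A literal C model of the sar + add variant, as a sketch with hypothetical names (`u128_halves`, `sar_tmod2`, `sar_tmod2_demo`). Here %rdx holds a 0 or all-ones mask, which is the right high half for every input except negative even ones; the demo contrasts x = -1 with x = -2, the case comment #4 points out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: low/high halves (%rax/%rdx on return). */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Literal model of the sar + add variant. */
u128_halves sar_tmod2(uint64_t lo, uint64_t hi)
{
    uint64_t rdx = (hi >> 63) ? UINT64_MAX : 0; /* sar $0x3f: 0 or all ones */
    uint64_t rax = (lo + rdx) & 1;              /* lea + and $0x1 */
    rax += rdx;                                 /* add %rdx,%rax */
    u128_halves out = { rax, rdx };             /* high half = sign mask */
    return out;
}

/* Works for x = -1 but not for x = -2 (negative and even). */
void sar_tmod2_demo(void)
{
    u128_halves odd = sar_tmod2(UINT64_MAX, UINT64_MAX);      /* x = -1 */
    assert(odd.lo == UINT64_MAX && odd.hi == UINT64_MAX);     /* -1: correct */

    u128_halves even = sar_tmod2(UINT64_MAX - 1, UINT64_MAX); /* x = -2 */
    /* -2 % 2 == 0, but the high half is still all ones: -2^64. */
    assert(even.lo == 0 && even.hi == UINT64_MAX);
}
```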


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug target/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
                   ` (2 preceding siblings ...)
  2010-04-30 16:13 ` svfuerst at gmail dot com
@ 2010-04-30 16:33 ` svfuerst at gmail dot com
  2010-04-30 19:00 ` [Bug middle-end/43883] " ubizjak at gmail dot com
  2010-04-30 20:31 ` svfuerst at gmail dot com
  5 siblings, 0 replies; 9+ messages in thread
From: svfuerst at gmail dot com @ 2010-04-30 16:33 UTC
  To: gcc-bugs



------- Comment #4 from svfuerst at gmail dot com  2010-04-30 16:33 -------
Argh, the sar trick doesn't work when the number is negative and even.  Sorry
about the extra noise.

This leaves the following as the best code:
mov    %rsi,%rdx
shr    $0x3f,%rdx
lea    (%rdi,%rdx,1),%rax
and    $0x1,%eax
sub    %rdx,%rax
sbb    %rdx,%rdx

This is still better than the current version.  Of course, changing the and
instruction will allow faster versions of x%4, x%8, x%16, etc.
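
A sketch of that generalisation to x % 2^k, assuming 1 <= k <= 63 so that the remainder fits in the low 64 bits and only needs sign-extending into the high half. In this version the bias also grows, from 1 to 2^k - 1 for negative x, not just the mask in the and: with a bias of 1, x = -2 would give ((lo + 1) & 3) - 1 = 2 for x % 4 instead of -2. `tmod_pow2` and `tmod_pow2_check` are hypothetical names, and the code uses the GCC/Clang `__int128` extension.

```c
#include <assert.h>
#include <stdint.h>

/* x % 2^k for 1 <= k <= 63, computed on the low 64 bits only. */
__int128 tmod_pow2(__int128 x, unsigned k)
{
    assert(k >= 1 && k <= 63);
    uint64_t mask = ((uint64_t)1 << k) - 1;
    uint64_t lo = (uint64_t)x;              /* low half of x */
    uint64_t bias = (x < 0) ? mask : 0;     /* rounds toward zero for x < 0 */
    int64_t r = (int64_t)((lo + bias) & mask) - (int64_t)bias;
    return (__int128)r;                     /* sign-extend into the high half */
}

/* Compares against __int128 % for every k and a few sample values. */
void tmod_pow2_check(void)
{
    const __int128 one = 1;
    const __int128 samples[] = { -(one << 90) - 7, -100, -9, -8, -1,
                                 0, 1, 7, 100, (one << 90) + 7 };
    for (unsigned k = 1; k <= 63; k++)
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            assert(tmod_pow2(samples[i], k) == samples[i] % (one << k));
}
```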


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
                   ` (3 preceding siblings ...)
  2010-04-30 16:33 ` svfuerst at gmail dot com
@ 2010-04-30 19:00 ` ubizjak at gmail dot com
  2010-04-30 20:31 ` svfuerst at gmail dot com
  5 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2010-04-30 19:00 UTC
  To: gcc-bugs



------- Comment #5 from ubizjak at gmail dot com  2010-04-30 19:00 -------
(In reply to comment #4)
> Argh, the sar trick doesn't work when the number is negative and even.  Sorry
> about the extra noise.
> 
> This leaves the following as the best code:
> mov    %rsi,%rdx
> shr    $0x3f,%rdx
> lea    (%rdi,%rdx,1),%rax
> and    $0x1,%eax
> sub    %rdx,%rax
> sbb    %rdx,%rdx
> 
> This is still better than the current version.  Of course, changing the and
> instruction will allow faster versions of x%4, x%8, x%16, etc.

Believe it or not, the version that you show in the description is how gcc
handles subregs: it starts out OK, but then the register allocator comes into play...

Confirmed as a register allocation (RA) problem; the same thing happens with "long long" and -m32.


-- 

ubizjak at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ubizjak at gmail dot com
             Status|UNCONFIRMED                 |NEW
          Component|target                      |middle-end
     Ever Confirmed|0                           |1
           Keywords|                            |ra
   Last reconfirmed|0000-00-00 00:00:00         |2010-04-30 19:00:41
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
  2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
                   ` (4 preceding siblings ...)
  2010-04-30 19:00 ` [Bug middle-end/43883] " ubizjak at gmail dot com
@ 2010-04-30 20:31 ` svfuerst at gmail dot com
  5 siblings, 0 replies; 9+ messages in thread
From: svfuerst at gmail dot com @ 2010-04-30 20:31 UTC
  To: gcc-bugs



------- Comment #6 from svfuerst at gmail dot com  2010-04-30 20:30 -------
For posterity, I should note that with the sbb added at the end, we don't
need the initial mov instruction if we rename some registers.  This leaves
the following five-instruction fragment, hopefully optimal this time, as the goal:

shr    $0x3f,%rsi
lea    (%rdi,%rsi,1),%rax
and    $0x1,%eax
sub    %rsi,%rax
sbb    %rdx,%rdx


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883



* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
       [not found] <bug-43883-4@http.gcc.gnu.org/bugzilla/>
  2021-12-23  0:46 ` pinskia at gcc dot gnu.org
@ 2021-12-23  0:51 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-23  0:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that LLVM produces:
        mov     rdx, rsi
        mov     rax, rdi
        mov     rcx, rsi
        shr     rcx, 63
        add     rcx, rdi
        adc     rsi, 0
        and     rcx, -2
        sub     rax, rcx
        sbb     rdx, rsi
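
LLVM's sequence can be read as x - ((x + bias) & ~1) in full 128-bit arithmetic (add/adc, then sub/sbb), where bias is 1 for negative x. A C sketch of that reading, with hypothetical names (`llvm_style_tmod2`, `llvm_style_tmod2_check`) and the GCC/Clang `__int128` extension:

```c
#include <assert.h>

/* C reading of the LLVM sequence. */
__int128 llvm_style_tmod2(__int128 x)
{
    unsigned __int128 ux = (unsigned __int128)x;
    unsigned __int128 bias = ux >> 127;  /* shr rcx, 63 on the high half */
    /* add rcx, rdi / adc rsi, 0, then and rcx, -2 (high half unmasked) */
    unsigned __int128 rounded = (ux + bias) & ~(unsigned __int128)1;
    return (__int128)(ux - rounded);     /* sub rax, rcx / sbb rdx, rsi */
}

/* Compares the model with __int128 % on a few sample values. */
void llvm_style_tmod2_check(void)
{
    const __int128 one = 1;
    const __int128 samples[] = { -(one << 100) - 1, -(one << 64), -3, -2, -1,
                                 0, 1, 2, (one << 64) + 1, one << 126 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(llvm_style_tmod2(samples[i]) == samples[i] % 2);
}
```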


* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
       [not found] <bug-43883-4@http.gcc.gnu.org/bugzilla/>
@ 2021-12-23  0:46 ` pinskia at gcc dot gnu.org
  2021-12-23  0:51 ` pinskia at gcc dot gnu.org
  1 sibling, 0 replies; 9+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-23  0:46 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 4.7 produces:
        shr     rsi, 63
        mov     rax, rdi
        xor     edi, edi
        add     rax, rsi
        xor     r10d, r10d
        mov     r9, rax
        mov     rdx, r10
        and     r9d, 1
        mov     rax, r9
        sub     rax, rsi
        sbb     rdx, rdi
        ret

GCC 4.9:
        mov     rcx, rsi
        sar     rcx, 63
        xor     rdi, rcx
        mov     rdx, rcx
        mov     rax, rdi
        sub     rax, rcx
        and     eax, 1
        xor     rax, rcx
        sub     rax, rcx
        sbb     rdx, rcx
        ret

GCC 7.1.0-7.3.0 and 8.1.0-8.2.0 decide not to inline it and call __modti3 instead:

        sub     rsp, 8
        .cfi_def_cfa_offset 16
        mov     edx, 2
        xor     ecx, ecx
        call    __modti3
        add     rsp, 8
        .cfi_def_cfa_offset 8
        ret

GCC 7.4, 8.3 and 9.x then go back to the 4.9 code generation.

GCC 10.x produces:

        mov     r10, rsi
        mov     r8, rdi
        sar     r10, 63
        xor     r8, r10
        mov     rdx, r10
        mov     rax, r8
        sub     rax, r10
        and     eax, 1
        xor     rax, r10
        sub     rax, r10
        sbb     rdx, r10
        ret

GCC 11 and trunk are back to the 4.9 code generation.
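
The GCC 4.9 sequence (also used by GCC 7.4, 8.3, 9.x, 11 and trunk per this comment) instead takes the low bit of |x| from the low half only, then reapplies the sign with xor/sub, and sbb turns the final borrow into the high half. A C sketch of that reading, with hypothetical names (`gcc49_style_tmod2`, `gcc49_style_tmod2_check`) and the GCC/Clang `__int128` extension:

```c
#include <assert.h>
#include <stdint.h>

/* C reading of the GCC 4.9 / GCC 11 sequence. */
__int128 gcc49_style_tmod2(__int128 x)
{
    uint64_t lo = (uint64_t)x;
    uint64_t m = (x < 0) ? UINT64_MAX : 0;  /* mov rcx, rsi; sar rcx, 63 */
    uint64_t mag = ((lo ^ m) - m) & 1;      /* xor, sub, and eax, 1: low bit of |x| */
    uint64_t t = mag ^ m;                   /* xor rax, rcx */
    uint64_t cf = t < m;                    /* borrow out of sub rax, rcx */
    uint64_t rlo = t - m;                   /* sub rax, rcx: negates if x < 0 */
    uint64_t rhi = m - m - cf;              /* sbb rdx, rcx, where rdx == rcx */
    unsigned __int128 r = ((unsigned __int128)rhi << 64) | rlo;
    return (__int128)r;                     /* modulo conversion, as in GCC */
}

/* Compares the model with __int128 % on a few sample values. */
void gcc49_style_tmod2_check(void)
{
    const __int128 one = 1;
    const __int128 samples[] = { -(one << 100) - 1, -(one << 64), -3, -2, -1,
                                 0, 1, 2, (one << 64) + 1, one << 126 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(gcc49_style_tmod2(samples[i]) == samples[i] % 2);
}
```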


Thread overview: 9+ messages
2010-04-25  5:06 [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus svfuerst at gmail dot com
2010-04-25 20:09 ` [Bug target/43883] " rguenth at gcc dot gnu dot org
2010-04-30  9:25 ` ubizjak at gmail dot com
2010-04-30 16:13 ` svfuerst at gmail dot com
2010-04-30 16:33 ` svfuerst at gmail dot com
2010-04-30 19:00 ` [Bug middle-end/43883] " ubizjak at gmail dot com
2010-04-30 20:31 ` svfuerst at gmail dot com
     [not found] <bug-43883-4@http.gcc.gnu.org/bugzilla/>
2021-12-23  0:46 ` pinskia at gcc dot gnu.org
2021-12-23  0:51 ` pinskia at gcc dot gnu.org
