public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/43883] New: missed optimization of constant __int128_t modulus
@ 2010-04-25 5:06 svfuerst at gmail dot com
2010-04-25 20:09 ` [Bug target/43883] " rguenth at gcc dot gnu dot org
` (5 more replies)
0 siblings, 6 replies; 9+ messages in thread
From: svfuerst at gmail dot com @ 2010-04-25 5:06 UTC (permalink / raw)
To: gcc-bugs
The following function gets optimized at -O3 to:
long long tmod2(long long x)
{
return x % 2;
}
mov %rdi,%rdx
shr $0x3f,%rdx
lea (%rdi,%rdx,1),%rax
and $0x1,%eax
sub %rdx,%rax
retq
This is very good code. Unfortunately, the 128-bit version doesn't get
optimized nearly so well.
__int128_t tmod2(__int128_t x)
{
return x % 2;
}
mov %rsi,%rdx
mov %rdi,%r8
xor %ecx,%ecx
shr $0x3f,%rdx
push %rbx
add %rdx,%r8
xor %edi,%edi
mov %r8,%rsi
mov %rdi,%r9
and $0x1,%esi
mov %rsi,%r8
sub %rdx,%r8
sbb %rcx,%r9
mov %r8,%rax
mov %r9,%rdx
pop %rbx
retq
It looks like this simple variation of the 64-bit algorithm will work for the
128-bit version:
mov %rsi,%rdx <--- Just changed rdi into rsi
shr $0x3f,%rdx <--- nicely already calculates high bytes in rdx
lea (%rdi,%rdx,1),%rax
and $0x1,%eax
sub %rdx,%rax
retq
--
Summary: missed optimization of constant __int128_t modulus
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: svfuerst at gmail dot com
GCC build triplet: x86_64-linux
GCC host triplet: x86_64-linux
GCC target triplet: x86_64-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug target/43883] missed optimization of constant __int128_t modulus
From: rguenth at gcc dot gnu dot org @ 2010-04-25 20:09 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2010-04-25 20:09 -------
There isn't any pattern for the TImode variant.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|middle-end |target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug target/43883] missed optimization of constant __int128_t modulus
From: ubizjak at gmail dot com @ 2010-04-30 9:25 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ubizjak at gmail dot com 2010-04-30 09:12 -------
(In reply to comment #1)
> There isn't any pattern for the TImode variant.
Huh? Expansion uses TImode where appropriate:
(insn 10 9 11 ttt.c:3 (parallel [
(set (reg:DI 66)
(ashiftrt:DI (subreg:DI (reg/v:TI 60 [ x ]) 8)
(const_int 63 [0x3f])))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 11 10 12 ttt.c:3 (set (subreg:DI (reg:TI 65) 0)
(reg:DI 66)) -1 (nil))
(insn 12 11 13 ttt.c:3 (parallel [
(set (reg:DI 67)
(ashiftrt:DI (reg:DI 66)
(const_int 63 [0x3f])))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 13 12 14 ttt.c:3 (set (subreg:DI (reg:TI 65) 8)
(reg:DI 67)) -1 (nil))
(insn 14 13 15 ttt.c:3 (parallel [
(set (reg:TI 68)
(lshiftrt:TI (reg:TI 65)
(const_int 127 [0x7f])))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 15 14 16 ttt.c:3 (parallel [
(set (reg:TI 69)
(plus:TI (reg/v:TI 60 [ x ])
(reg:TI 68)))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 16 15 17 ttt.c:3 (parallel [
(set (subreg:DI (reg:TI 70) 0)
(and:DI (subreg:DI (reg:TI 69) 0)
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 17 16 18 ttt.c:3 (parallel [
(set (subreg:DI (reg:TI 70) 8)
(and:DI (subreg:DI (reg:TI 69) 8)
(const_int 0 [0x0])))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 18 17 19 ttt.c:3 (parallel [
(set (reg:TI 71)
(minus:TI (reg:TI 70)
(reg:TI 68)))
(clobber (reg:CC 17 flags))
]) -1 (nil))
(insn 19 18 20 ttt.c:2 (set (reg:TI 59 [ <retval> ])
(reg:TI 71)) -1 (nil))
Are you sure that the proposed solution will cover all corner cases?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug target/43883] missed optimization of constant __int128_t modulus
From: svfuerst at gmail dot com @ 2010-04-30 16:13 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from svfuerst at gmail dot com 2010-04-30 16:12 -------
Oops, you are right. The 128-bit version needs an extra sbb on the end with
that code. (For some reason I was misreading the shr as a sar.):
mov %rsi,%rdx
shr $0x3f,%rdx
lea (%rdi,%rdx,1),%rax
and $0x1,%eax
sub %rdx,%rax
sbb %rdx,%rdx
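To make the role of the trailing sbb concrete, the sequence above can be modelled in C on the two 64-bit halves of the argument. This is only a sketch: the function names are hypothetical, it assumes a compiler that provides __int128 (GCC or Clang on x86-64), and it assumes the usual SysV convention of rdi = low half, rsi = high half on entry and rax:rdx on return.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shr + lea + and + sub + sbb sequence. */
__int128 tmod2_model(__int128 x)
{
    uint64_t lo = (uint64_t)(unsigned __int128)x;          /* rdi */
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64);  /* rsi */

    uint64_t rdx = hi >> 63;         /* mov + shr: 1 if x < 0, else 0 */
    uint64_t rax = (lo + rdx) & 1;   /* lea + and: low bit after biasing */
    uint64_t borrow = rax < rdx;     /* carry flag produced by the sub */
    rax -= rdx;                      /* sub %rdx,%rax */
    rdx = (uint64_t)0 - borrow;      /* sbb %rdx,%rdx: 0 or all ones */

    return (__int128)(((unsigned __int128)rdx << 64) | rax);
}

/* Compare the model with the compiler's own x % 2 on a few samples. */
void tmod2_self_check(void)
{
    const __int128 samples[] = { 0, 1, 2, 3, -1, -2, -3,
                                 ((__int128)1 << 100) + 1,
                                 -(((__int128)1 << 100) + 1) };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(tmod2_model(samples[i]) == samples[i] % 2);
}
```

Without the sbb, rdx would still hold the 0/1 sign bit from the shr, so a negative odd input would come back with a high half of 1 instead of all ones.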
However, if you use sar + add, instead of shr + sub + sbb, it is one
instruction fewer:
mov %rsi,%rdx
sar $0x3f,%rdx
lea (%rdi,%rdx,1),%rax
and $0x1,%eax
add %rdx,%rax
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug target/43883] missed optimization of constant __int128_t modulus
From: svfuerst at gmail dot com @ 2010-04-30 16:33 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from svfuerst at gmail dot com 2010-04-30 16:33 -------
Argh, the sar trick doesn't work when the number is negative and even. Sorry
about the extra noise.
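The failure can be reproduced with a small C model of the sar + add variant. As before this is a sketch with hypothetical names, assuming __int128 support and rdi = low half, rsi = high half; the arithmetic shift is written as a negated logical shift so the C does not depend on how signed right shifts behave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sar + lea + and + add sequence. */
__int128 tmod2_sar_model(__int128 x)
{
    uint64_t lo = (uint64_t)(unsigned __int128)x;          /* rdi */
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64);  /* rsi */

    uint64_t rdx = (uint64_t)0 - (hi >> 63);  /* sar $63: 0 or all ones */
    uint64_t rax = (lo + rdx) & 1;            /* lea + and */
    rax += rdx;                               /* add %rdx,%rax */

    return (__int128)(((unsigned __int128)rdx << 64) | rax);
}

void tmod2_sar_self_check(void)
{
    /* Non-negative and negative odd inputs come out right ... */
    assert(tmod2_sar_model(4) == 0);
    assert(tmod2_sar_model(7) == 1);
    assert(tmod2_sar_model(-1) == -1);
    assert(tmod2_sar_model(-3) == -1);
    /* ... but a negative even input keeps an all-ones high half. */
    assert(tmod2_sar_model(-2) != 0);
}
```

For x = -2 the low half ends up 0 as expected, but rdx is never cleared, so the 128-bit result is -2^64 rather than 0.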
This leaves as the best code:
mov %rsi,%rdx
shr $0x3f,%rdx
lea (%rdi,%rdx,1),%rax
and $0x1,%eax
sub %rdx,%rax
sbb %rdx,%rdx
This is still better than the current version. Of course, changing the and
instruction will allow faster versions of x%4, x%8, x%16 etc.
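One way to read the remark about x%4, x%8 and so on is sketched below in C. The function name and the parameter k are hypothetical. Note one assumption that the thread does not spell out: for divisors larger than 2 the value added before the and (and subtracted afterwards) has to be 2^k - 1 for negative inputs, not just the sign bit. For k = 1 that value is 1, which is what the shr $0x3f produces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generalisation: x % 2^k for 1 <= k <= 63, computed on the
   two 64-bit halves in the same bias / and / sub / sbb shape. */
__int128 mod_pow2_model(__int128 x, unsigned k)
{
    uint64_t lo = (uint64_t)(unsigned __int128)x;
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64);

    uint64_t mask = ((uint64_t)1 << k) - 1;   /* immediate for the and */
    uint64_t bias = (hi >> 63) ? mask : 0;    /* 2^k - 1 when x < 0 */
    uint64_t rax = (lo + bias) & mask;        /* lea + and */
    uint64_t borrow = rax < bias;             /* carry flag from the sub */
    rax -= bias;                              /* sub */
    uint64_t rdx = (uint64_t)0 - borrow;      /* sbb %rdx,%rdx */

    return (__int128)(((unsigned __int128)rdx << 64) | rax);
}

/* Compare against the compiler's % for every k and a few samples. */
void mod_pow2_self_check(void)
{
    const __int128 samples[] = { 0, 1, 2, 3, 7, 8, -1, -2, -3, -4, -7, -8,
                                 ((__int128)1 << 100) + 5,
                                 -(((__int128)1 << 100) + 5) };
    for (unsigned k = 1; k <= 63; k++)
        for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
            assert(mod_pow2_model(samples[i], k)
                   == samples[i] % ((__int128)1 << k));
}
```

Only the low k bits of the biased sum matter, which is why the model never needs the carry out of the 64-bit addition.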
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
From: ubizjak at gmail dot com @ 2010-04-30 19:00 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from ubizjak at gmail dot com 2010-04-30 19:00 -------
(In reply to comment #4)
> Argh, the sar trick doesn't work when the number is negative and even. Sorry
> about the extra noise.
>
> This leaves as the best code:
> mov %rsi,%rdx
> shr $0x3f,%rdx
> lea (%rdi,%rdx,1),%rax
> and $0x1,%eax
> sub %rdx,%rax
> sbb %rdx,%rdx
>
> This is still better than current version. Of course, changing the and
> instruction will allow faster versions of x%4, x%8, x%16 etc.
Believe it or not, the version that you show in the description is how gcc
handles subregs... it starts OK, but when the register allocator comes into play...
Confirmed as an RA problem; the same thing happens with "long long" and -m32.
--
ubizjak at gmail dot com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ubizjak at gmail dot com
Status|UNCONFIRMED |NEW
Component|target |middle-end
Ever Confirmed|0 |1
Keywords| |ra
Last reconfirmed|0000-00-00 00:00:00 |2010-04-30 19:00:41
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
From: svfuerst at gmail dot com @ 2010-04-30 20:31 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from svfuerst at gmail dot com 2010-04-30 20:30 -------
For posterity, I might as well note that with the sbb added on the end we don't
need the initial mov instruction if we do some register renaming. This leaves
the, hopefully optimal this time, five-instruction fragment as the goal:
shr $0x3f,%rsi
lea (%rdi,%rsi,1),%rax
and $0x1,%eax
sub %rsi,%rax
sbb %rdx,%rdx
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
From: pinskia at gcc dot gnu.org @ 2021-12-23 0:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
4.7 produces:
shr rsi, 63
mov rax, rdi
xor edi, edi
add rax, rsi
xor r10d, r10d
mov r9, rax
mov rdx, r10
and r9d, 1
mov rax, r9
sub rax, rsi
sbb rdx, rdi
ret
4.9:
mov rcx, rsi
sar rcx, 63
xor rdi, rcx
mov rdx, rcx
mov rax, rdi
sub rax, rcx
and eax, 1
xor rax, rcx
sub rax, rcx
sbb rdx, rcx
ret
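The 4.9 sequence takes a different route from the one proposed earlier in the thread: it appears to take the low bit of |x| and then reapply the sign. A C sketch of that reading follows; the names are hypothetical, and it assumes __int128 support with rdi = low half and rsi = high half on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the GCC 4.9 sequence: sign * (|x| & 1). */
__int128 tmod2_gcc49_model(__int128 x)
{
    uint64_t lo = (uint64_t)(unsigned __int128)x;          /* rdi */
    uint64_t hi = (uint64_t)((unsigned __int128)x >> 64);  /* rsi */

    uint64_t m = (uint64_t)0 - (hi >> 63);  /* sar rcx, 63: 0 or all ones */
    uint64_t rax = ((lo ^ m) - m) & 1;      /* xor, sub, and: low bit of |x| */
    rax ^= m;                               /* xor rax, rcx */
    uint64_t borrow = rax < m;              /* carry flag from the sub */
    rax -= m;                               /* sub rax, rcx: negate if x < 0 */
    uint64_t rdx = m - m - borrow;          /* sbb rdx, rcx with rdx == rcx */

    return (__int128)(((unsigned __int128)rdx << 64) | rax);
}

/* Compare the model with the compiler's own x % 2 on a few samples. */
void tmod2_gcc49_self_check(void)
{
    const __int128 samples[] = { 0, 1, 2, 3, -1, -2, -3,
                                 ((__int128)1 << 100) + 1,
                                 -(((__int128)1 << 100) + 1) };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(tmod2_gcc49_model(samples[i]) == samples[i] % 2);
}
```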
7.1.0-7.3.0 and 8.1.0-8.2.0 decide not to expand it inline and call __modti3 instead:
sub rsp, 8
.cfi_def_cfa_offset 16
mov edx, 2
xor ecx, ecx
call __modti3
add rsp, 8
.cfi_def_cfa_offset 8
ret
And then we are back to the 4.9 code gen for 7.4, 8.3 and 9.x
10.x produces:
mov r10, rsi
mov r8, rdi
sar r10, 63
xor r8, r10
mov rdx, r10
mov rax, r8
sub rax, r10
and eax, 1
xor rax, r10
sub rax, r10
sbb rdx, r10
ret
GCC 11 and trunk are back to the 4.9 code gen.
* [Bug middle-end/43883] missed optimization of constant __int128_t modulus
From: pinskia at gcc dot gnu.org @ 2021-12-23 0:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43883
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
--- Comment #8 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note LLVM produces:
mov rdx, rsi
mov rax, rdi
mov rcx, rsi
shr rcx, 63
add rcx, rdi
adc rsi, 0
and rcx, -2
sub rax, rcx
sbb rdx, rsi
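The LLVM sequence above reads as "subtract x rounded toward zero to a multiple of 2 from x". A C sketch of that interpretation, written directly in 128-bit arithmetic rather than on register halves, is given here. The names are hypothetical and __int128 support is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the LLVM sequence: x - ((x + sign) & ~1). */
__int128 tmod2_llvm_model(__int128 x)
{
    unsigned __int128 ux = (unsigned __int128)x;
    unsigned __int128 sign = ux >> 127;        /* shr rcx, 63 on the high half */
    unsigned __int128 rounded =
        (ux + sign) & ~(unsigned __int128)1;   /* add/adc, then and rcx, -2 */
    return (__int128)(ux - rounded);           /* sub/sbb */
}

/* Compare with the compiler's own x % 2, including the most negative value. */
void tmod2_llvm_self_check(void)
{
    const __int128 int128_min = (__int128)((unsigned __int128)1 << 127);
    const __int128 samples[] = { 0, 1, 2, 3, -1, -2, -3, int128_min,
                                 ((__int128)1 << 100) + 1,
                                 -(((__int128)1 << 100) + 1) };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(tmod2_llvm_model(samples[i]) == samples[i] % 2);
}
```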