* Missed optimisation in __udivmoddi4 of libgcc2
@ 2020-09-13 13:39 Stefan Kanthak
0 siblings, 0 replies; only message in thread
From: Stefan Kanthak @ 2020-09-13 13:39 UTC (permalink / raw)
To: gcc
libgcc2 provides "double-word" division as __udivmoddi4()
The following part of its source
| UWtype d0, d1, n0, n1, n2;
| UWtype b, bm;
...
| count_leading_zeros (bm, d1);
| if (bm == 0)
...
| else
| {
| UWtype m1, m0;
| /* Normalize. */
|
| b = W_TYPE_SIZE - bm;
|
| d1 = (d1 << bm) | (d0 >> b);
| d0 = d0 << bm;
| n2 = n1 >> b;
| n1 = (n1 << bm) | (n0 >> b);
| n0 = n0 << bm;
compiles for AMD64 to this rather clumsy code:
| b0: 4c 0f bd da bsr %rdx,%r11
| b4: 49 83 f3 3f xor $0x3f,%r11
| b8: 45 85 db test %r11d,%r11d
| bb: 75 33 jne f0 <__udivmodti4+0xf0>
...
| f0: 49 63 c3 movslq %r11d,%rax
| f3: bd 40 00 00 00 mov $0x40,%ebp
| f8: 44 89 d9 mov %r11d,%ecx
| fb: 4d 89 ec mov %r13,%r12
| fe: 48 29 c5 sub %rax,%rbp
| 101: 48 d3 e2 shl %cl,%rdx
| 104: 49 89 f2 mov %rsi,%r10
| 107: 48 89 f8 mov %rdi,%rax
| 10a: 89 e9 mov %ebp,%ecx
| 10c: 44 89 db mov %r11d,%ebx
| 10f: 49 d3 ec shr %cl,%r12
| 112: 44 89 d9 mov %r11d,%ecx
| 115: 49 d3 e5 shl %cl,%r13
| 118: 89 e9 mov %ebp,%ecx
| 11a: 49 09 d4 or %rdx,%r12
| 11d: 49 d3 ea shr %cl,%r10
| 120: 44 89 d9 mov %r11d,%ecx
| 123: 48 d3 e6 shl %cl,%rsi
| 126: 89 e9 mov %ebp,%ecx
| 128: 4c 89 d2 mov %r10,%rdx
| 12b: 48 d3 e8 shr %cl,%rax
| 12e: 44 89 d9 mov %r11d,%ecx
| 131: 48 09 c6 or %rax,%rsi
| 134: 48 d3 e7 shl %cl,%rdi
| 137: 48 89 f0 mov %rsi,%rax
| 13a: 49 89 f9 mov %rdi,%r9
Why does the optimiser not recognize this pattern of alternating shifts
and generate MUCH shorter and of course faster code like the following?
(I use the variable names from the C source instead of register names
here)
mov %r11d, %ecx
shld %cl, d0, d1
xor n2, n2
shld %cl, n1, n2
shld %cl, n0, n1
JFTR: the test at b8 is superfluous.
regards
Stefan
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2020-09-13 13:41 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-13 13:39 Missed optimisation in __udivmoddi4 of libgcc2 Stefan Kanthak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).