[Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t
@ 2015-06-11 14:49 matthijs at stdin dot nl
  2015-06-27 17:52 ` [Bug target/66511] " gjl at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: matthijs at stdin dot nl @ 2015-06-11 14:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

            Bug ID: 66511
           Summary: [avr] whole-byte shifts not optimized away for
                    uint64_t
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

When doing whole-byte shifts, gcc usually optimizes away the shifts and
ends up moving data between registers instead. However, it seems this
doesn't happen when uint64_t is used.

Here's an example (assembler output slightly trimmed of unrelated
comments and annotations etc.):

matthijs@grubby:~$ cat test.cpp
#include <stdint.h>

uint8_t foo64_8(uint64_t a) {
        return a >> 8;
}

uint16_t foo64_16(uint64_t a) {
        return a >> 8;
}

uint8_t foo32_8(uint32_t a) {
        return a >> 8;
}

uint16_t foo32_16(uint32_t a) {
        return (a >> 8);
}

matthijs@grubby:~$ avr-gcc -Os test.cpp -S -o -
_Z7foo64_8y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        pop r16
        ret

_Z8foo64_16y:
        push r16
        ldi r16,lo8(8)
        rcall __lshrdi3
        mov r24,r18
        mov r25,r19
        pop r16
        ret

_Z7foo32_8m:
        mov r24,r23
        ret

_Z8foo32_16m:
        clr r27
        mov r26,r25
        mov r25,r24
        mov r24,r23
        ret

        .ident  "GCC: (GNU) 4.9.2 20141224 (prerelease)"

The output is identical for 4.8.1 on Debian, and the above 4.9.2 on
Arch. I haven't found a readily available 5.x package yet to test.

As you can see, the versions operating on 64 bit values preserve the
8-bit shift (which is very inefficient on AVR), while the versions
running on 32 bit values simply copy the right registers.

The foo32_16 function still has some useless instructions (r27 and r26
are not part of the return value, not sure why these are set) but that
is probably an unrelated problem.

I've marked this with component "target", since I think these
optimizations are avr-specific (or at least not applicable to bigger
architectures).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
@ 2015-06-27 17:52 ` gjl at gcc dot gnu.org
  2015-06-27 18:07 ` gjl at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: gjl at gcc dot gnu.org @ 2015-06-27 17:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Target|                            |avr
           Priority|P3                          |P5
                 CC|                            |gjl at gcc dot gnu.org
           Severity|normal                      |enhancement


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
  2015-06-27 17:52 ` [Bug target/66511] " gjl at gcc dot gnu.org
@ 2015-06-27 18:07 ` gjl at gcc dot gnu.org
  2015-08-02 11:42 ` matthijs at stdin dot nl
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: gjl at gcc dot gnu.org @ 2015-06-27 18:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #1 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
(In reply to Matthijs Kooijman from comment #0)
> I haven't found a readily available 5.x package yet to test.

It's the same.

> As you can see, the versions operating on 64 bit values preserve the
> 8-bit shift (which is very inefficient on AVR), while the versions
> running on 32 bit values simply copy the right registers.

Lib functions are used because users complained about bloated 64-bit
arithmetic.  

Notice that indide these 64-bit shift functions byte-shifts are used.

> The foo32_16 function still has some useless instructions (r27 and r26
> are not part of the return value, not sure why these are set) but that
> is probably an unrelated problem.

Yes.

> I've marked this with component "target", since I think these
> optimizations are avr-specific (or at least not applicable to bigger
> architectures).


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
  2015-06-27 17:52 ` [Bug target/66511] " gjl at gcc dot gnu.org
  2015-06-27 18:07 ` gjl at gcc dot gnu.org
@ 2015-08-02 11:42 ` matthijs at stdin dot nl
  2015-08-13 20:56 ` gjl at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: matthijs at stdin dot nl @ 2015-08-02 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #2 from Matthijs Kooijman <matthijs at stdin dot nl> ---
So, IIUC, this is quite hard to fix? Either you use lib functions, which
prevents the optimizer from just relabeling or coyping registers to apply
shifting, or you don't and then more complex operations will become very
verbose and messy?

Would it make sense (and be possible) to add a special case to not use lib
functions for shifts by a constant number of bits that is also a multiple of 8?
At first glance, that would make a lot of common cases (where an integer is
decomposed into separate bytes or other parts) a lot faster, while still
keeping the lib functions for more complex operations?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
                   ` (2 preceding siblings ...)
  2015-08-02 11:42 ` matthijs at stdin dot nl
@ 2015-08-13 20:56 ` gjl at gcc dot gnu.org
  2023-04-16 12:51 ` roger at nextmovesoftware dot com
  2023-04-16 17:27 ` klaus.doldinger64 at googlemail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: gjl at gcc dot gnu.org @ 2015-08-13 20:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

--- Comment #3 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
(In reply to Matthijs Kooijman from comment #2)
> So, IIUC, this is quite hard to fix? Either you use lib functions, which
> prevents the optimizer from just relabeling or coyping registers to apply
> shifting, or you don't and then more complex operations will become very
> verbose and messy?

avr-gcc is using lib functions, but not as libcall but as transparent call
instead.  You'll find the implementation in avr-dimode.md.  avr-gcc has no
proper DImode support.  For reasoning cf. the comments in the head of that
file.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
                   ` (3 preceding siblings ...)
  2015-08-13 20:56 ` gjl at gcc dot gnu.org
@ 2023-04-16 12:51 ` roger at nextmovesoftware dot com
  2023-04-16 17:27 ` klaus.doldinger64 at googlemail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-04-16 12:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Created attachment 54871
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54871&action=edit
proposed patch

Proposed patch, using a peephole2 in avr-dimode.md to inline calls to __lshrdi3
that require only a single instruction or two (due to truncation).  For
truncations to char, this is smaller and faster, and for truncations to
unsigned short this is the same size, but faster.  The drawback is that
performing this late (in peephole2) doesn't eliminate the stack frame
prolog/epilog.  Thoughts?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug target/66511] [avr] whole-byte shifts not optimized away for uint64_t
  2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
                   ` (4 preceding siblings ...)
  2023-04-16 12:51 ` roger at nextmovesoftware dot com
@ 2023-04-16 17:27 ` klaus.doldinger64 at googlemail dot com
  5 siblings, 0 replies; 7+ messages in thread
From: klaus.doldinger64 at googlemail dot com @ 2023-04-16 17:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66511

Wilhelm M <klaus.doldinger64 at googlemail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |klaus.doldinger64@googlemai
                   |                            |l.com

--- Comment #5 from Wilhelm M <klaus.doldinger64 at googlemail dot com> ---
(In reply to Roger Sayle from comment #4)
> Created attachment 54871 [details]
> proposed patch
> 
> Proposed patch, using a peephole2 in avr-dimode.md to inline calls to
> __lshrdi3 that require only a single instruction or two (due to truncation).
> For truncations to char, this is smaller and faster, and for truncations to
> unsigned short this is the same size, but faster.  The drawback is that
> performing this late (in peephole2) doesn't eliminate the stack frame
> prolog/epilog.  Thoughts?

Looks good to me.
Many thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-04-16 17:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-11 14:49 [Bug target/66511] New: [avr] whole-byte shifts not optimized away for uint64_t matthijs at stdin dot nl
2015-06-27 17:52 ` [Bug target/66511] " gjl at gcc dot gnu.org
2015-06-27 18:07 ` gjl at gcc dot gnu.org
2015-08-02 11:42 ` matthijs at stdin dot nl
2015-08-13 20:56 ` gjl at gcc dot gnu.org
2023-04-16 12:51 ` roger at nextmovesoftware dot com
2023-04-16 17:27 ` klaus.doldinger64 at googlemail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).