public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed
@ 2013-10-04 16:38 b.grayson at samsung dot com
  2013-10-05  9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: b.grayson at samsung dot com @ 2013-10-04 16:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621

            Bug ID: 58621
           Summary: With -fsection-anchors, a superfluous 'add' is
                    performed
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: b.grayson at samsung dot com
            Target: AArch64

In some scenarios, the code emitted with -fsection-anchors is suboptimal.  Here
is the generated assembly for some code, compiled with -O3 -fsection-anchors
-fno-common:

int a;
char b;

int foo() {
    return a+b;
}


0000000000000000 <foo>:
   0:   90000000        adrp    x0, 0 <foo>
   4:   91000002        add     x2, x0, #0x0
   8:   39400001        ldrb    w1, [x0]
   c:   b9400440        ldr     w0, [x2,#4]
  10:   0b000020        add     w0, w1, w0
  14:   d65f03c0        ret

Here is the source assembly:

foo:
        adrp    x0, .LANCHOR0
        add     x2, x0, :lo12:.LANCHOR0
        ldrb    w1, [x0,#:lo12:.LANCHOR0]
        ldr     w0, [x2,4]
        add     w0, w1, w0
        ret

Note the add.  It is computing the address of variable b, and using that as the
section anchor, so effectively the ldr is using the address x0 + #0 + #4.

If the ldr instead used #:lo12:.LANCHOR0 + 4, it would eliminate the extra add
instruction:

foo:
        adrp    x0, .LANCHOR0
        ldrb    w1, [x0,#:lo12:.LANCHOR0]
        ldr     w0, [x2,#:lo12:.LANCHOR0+4]
        add     w0, w1, w0
        ret

Of course, the offset must be small enough to not cross a page boundary from
.LANCHOR0's page address, but that restriction is already in there that the
offset from address of the chosen section anchor to address of 'a' must be less
than a page.

(Yes, in this case the add looks amazingly redundant since it is adding 0, but
I've seen other cases where the ANCHOR is not at offset 0, in which case we
have an add plus an offsetted load, where it could be done with just an
offsetted load.)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
  2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
@ 2013-10-05  9:23 ` rearnsha at gcc dot gnu.org
  2013-10-06 15:00 ` b.grayson at samsung dot com
  2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-10-05  9:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Your suggested optimization would be invalid if .LANCHOR0 had the address
0x?????????????FFC, whereupon adding 4 would wrap the offset component but not
the base component in x0.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
  2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
  2013-10-05  9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
@ 2013-10-06 15:00 ` b.grayson at samsung dot com
  2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: b.grayson at samsung dot com @ 2013-10-06 15:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621

Brian Grayson <b.grayson at samsung dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |---

--- Comment #2 from Brian Grayson <b.grayson at samsung dot com> ---
My understanding of the relocation for the ldr instruction is that the 'pimm'
field takes a number in the range 0 ... 16380, divides it by 4, and places it
in the 'imm12' field;  thus, this case could be handled:  the worst-case
situation is a symbol 4K-4 away from the section anchor, which could have an
offset of 4K-4, resulting in a total offset of 8K-8, and 8K-8 still fits within
the legal 0 ... 16K-4 range.

I verified the limits by changing the constant in the instruction to be 16380,
and assembling by hand;  neither gcc nor ld choked, and the resulting
disassembly was:

  400118:       b97ffc40        ldr     w0, [x2,#16380]

So for the ldr (64-bit), ldr (32-bit), and ldrh (16-bit) (and the sign-extend
variations thereof, and the equivalent store instructions), such an
optimization would be possible without concern for the page-crossing situation.

Accesses via ldrb would still need the pure computation of the full
section-anchor access to preserve correctness, but I highly suspect that the
dominant use of section anchors is not for byte-sized variables, and thus this
optimization would usually be feasible as long as the size was checked.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
  2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
  2013-10-05  9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
  2013-10-06 15:00 ` b.grayson at samsung dot com
@ 2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-10-06 22:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621

Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
No, the LO12 relocations (see the AArch64 ELF bindings in infocenter) are
defined to insert (S + A) & 0xff<n> (n = 0xf, 0xe, 0xc, 0x8 for byte, halfword,
word and dword respectively), so never insert more than 12 bits into the offset
field, even when there is space for more.  Note that the addend is inside the
mask operation

This is needed because the ADRP and LDR have to work as a pair when a symbol
has an offset and the offset could be of arbitrary size.  For example,

    adrp  x0, A+102400
    ldr   w1, [x0, #:lo12:A+102400]

still needs to work.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-10-06 22:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
2013-10-05  9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
2013-10-06 15:00 ` b.grayson at samsung dot com
2013-10-06 22:18 ` rearnsha at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).