public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed
@ 2013-10-04 16:38 b.grayson at samsung dot com
2013-10-05 9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: b.grayson at samsung dot com @ 2013-10-04 16:38 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621
Bug ID: 58621
Summary: With -fsection-anchors, a superfluous 'add' is
performed
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: b.grayson at samsung dot com
Target: AArch64
In some scenarios, the code emitted with -fsection-anchors is suboptimal. Here
is the generated assembly for some code, compiled with -O3 -fsection-anchors
-fno-common:
int a;
char b;
int foo() {
return a+b;
}
0000000000000000 <foo>:
0: 90000000 adrp x0, 0 <foo>
4: 91000002 add x2, x0, #0x0
8: 39400001 ldrb w1, [x0]
c: b9400440 ldr w0, [x2,#4]
10: 0b000020 add w0, w1, w0
14: d65f03c0 ret
Here is the source assembly:
foo:
adrp x0, .LANCHOR0
add x2, x0, :lo12:.LANCHOR0
ldrb w1, [x0,#:lo12:.LANCHOR0]
ldr w0, [x2,4]
add w0, w1, w0
ret
Note the add. It is computing the address of variable b, and using that as the
section anchor, so effectively the ldr is using the address x0 + #0 + #4.
If the ldr instead used #:lo12:.LANCHOR0 + 4, it would eliminate the extra add
instruction:
foo:
adrp x0, .LANCHOR0
ldrb w1, [x0,#:lo12:.LANCHOR0]
ldr w0, [x2,#:lo12:.LANCHOR0+4]
add w0, w1, w0
ret
Of course, the offset must be small enough to not cross a page boundary from
.LANCHOR0's page address, but that restriction is already in there that the
offset from address of the chosen section anchor to address of 'a' must be less
than a page.
(Yes, in this case the add looks amazingly redundant since it is adding 0, but
I've seen other cases where the ANCHOR is not at offset 0, in which case we
have an add plus an offsetted load, where it could be done with just an
offsetted load.)
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
@ 2013-10-05 9:23 ` rearnsha at gcc dot gnu.org
2013-10-06 15:00 ` b.grayson at samsung dot com
2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-10-05 9:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621
Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |INVALID
--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Your suggested optimization would be invalid if .LANCHOR0 had the address
0x?????????????FFC, whereupon adding 4 would wrap the offset component but not
the base component in x0.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
2013-10-05 9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
@ 2013-10-06 15:00 ` b.grayson at samsung dot com
2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: b.grayson at samsung dot com @ 2013-10-06 15:00 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621
Brian Grayson <b.grayson at samsung dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |UNCONFIRMED
Resolution|INVALID |---
--- Comment #2 from Brian Grayson <b.grayson at samsung dot com> ---
My understanding of the relocation for the ldr instruction is that the 'pimm'
field takes a number in the range 0 ... 16380, divides it by 4, and places it
in the 'imm12' field; thus, this case could be handled: the worst-case
situation is a symbol 4K-4 away from the section anchor, which could have an
offset of 4K-4, resulting in a total offset of 8K-8, and 8K-8 still fits within
the legal 0 ... 16K-4 range.
I verified the limits by changing the constant in the instruction to be 16380,
and assembling by hand; neither gcc nor ld choked, and the resulting
disassembly was:
400118: b97ffc40 ldr w0, [x2,#16380]
So for the ldr (64-bit), ldr (32-bit), and ldrh (16-bit) (and the sign-extend
variations thereof, and the equivalent store instructions), such an
optimization would be possible without concern for the page-crossing situation.
Accesses via ldrb would still need the pure computation of the full
section-anchor access to preserve correctness, but I highly suspect that the
dominant use of section anchors is not for byte-sized variables, and thus this
optimization would usually be feasible as long as the size was checked.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/58621] With -fsection-anchors, a superfluous 'add' is performed
2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
2013-10-05 9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
2013-10-06 15:00 ` b.grayson at samsung dot com
@ 2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-10-06 22:18 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58621
Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |INVALID
--- Comment #3 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
No, the LO12 relocations (see the AArch64 ELF bindings in infocenter) are
defined to insert (S + A) & 0xff<n> (n = 0xf, 0xe, 0xc, 0x8 for byte, halfword,
word and dword respectively), so never insert more than 12 bits into the offset
field, even when there is space for more. Note that the addend is inside the
mask operation
This is needed because the ADRP and LDR have to work as a pair when a symbol
has an offset and the offset could be of arbitrary size. For example,
adrp x0, A+102400
ldr w1, [x0, #:lo12:A+102400]
still needs to work.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2013-10-06 22:18 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-04 16:38 [Bug target/58621] New: With -fsection-anchors, a superfluous 'add' is performed b.grayson at samsung dot com
2013-10-05 9:23 ` [Bug target/58621] " rearnsha at gcc dot gnu.org
2013-10-06 15:00 ` b.grayson at samsung dot com
2013-10-06 22:18 ` rearnsha at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).