public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
From: Palmer Dabbelt <palmer@rivosinc.com>
To: Joseph.Faulls@imgtec.com
Cc: nelson@rivosinc.com, binutils@sourceware.org
Subject: RE: [EXTERNAL] Re: [PATCH] RISC-V: Do not gp relax against an ABS symbol if it is far away.
Date: Fri, 28 Jul 2023 09:12:32 -0700 (PDT)	[thread overview]
Message-ID: <mhng-031337c1-3f82-41cb-a6c2-40e231a5fb8b@palmer-ri-x1c9> (raw)
In-Reply-To: <LO4P265MB591421EBA3B59627721B032B8006A@LO4P265MB5914.GBRP265.PROD.OUTLOOK.COM>

On Fri, 28 Jul 2023 08:52:41 PDT (-0700), Joseph.Faulls@imgtec.com wrote:
>> I think Nelson's point about cross-section GP relaxation is probably the root cause here, we've had some other bugs like that crop up recently.  
> Assuming that's the case then SHN_ABS is just a red herring: maybe it makes triggering the bug easier, but we can probably come up with a similar failure in other ways.
> 
> Ah, I think you're right. My thought process was that any data not in the ABS section would move position if the sections before it shrunk / changed size, so that the offset to the gp would remain the same as instructions were relaxed. But you could place sections at particular addresses, so this bug would reveal itself just the same. I was able to hit the truncation issue with a very contrived test case.

Yep, these test cases always end up pretty contrived because you're 
trying to reliably get symbols at specific offsets to trigger the bug.  
I don't think there's really any way around it.

>> Can you post a reduced test case with a custom linker script that triggers the problem?  That should help us a bit more concrete.
> 
> Sure, here's an example I managed to create. I'm sure there's a way to reduce it further, without the weird C things to prevent optimising the buffer away. (etiquette question: should I be attaching these or just pasting them?)

I don't think it really matters, we just need something concrete so 
we're all talking about the same thing.  At some point it'll end up in 
there as a real test case with all the DG bits, until then it's just 
going to be us talking through things.

> Command: riscv64-unknown-elf-gcc test.c -c -o test.o; riscv64-unknown-elf-ld -melf64lriscv test.o -o test.elf -Ttest.ld
> 
> test.c:
> extern char __bss_end[];
> static char data[0x800];
> 
> void _start() {
>   *__bss_end = 1;
> }
> 
> char foo(int i) {
>   data[i+1] = 1;
>   return data[i];
> }

FWIW, it can frequently be easier to expose these sorts of things by 
just writing assembly instead of trying to get the compiler to do it -- 
that way you don't need to worry about nonsensical C code getting 
optimized away.  I frequently just grab the output of the compiler for a 
case that looks similar and then massage the assembly to get things to 
break.

> test.ld:
> ENTRY(_start)
> MEMORY
> {
>         ram (!rx) : ORIGIN = 0x800000000, LENGTH = 0x1000
> }
> SECTIONS {
>         .text : {
>                 *(.text*)
>         } >ram
> 
>         .data : {
>                 *(.data*)
>         } >ram
> 
>         __global_pointer$ = .;

So I think that's it: .text shrinks so .data moves, and goes out of 
range of __global_pointer$.  We could probaby get similar breakages if 
__global_pointer$ is in any section that doesn't end up in the same 
segment as .data (or any symbol referenced via GP), and also from some 
of the alignment-related stuff like we recently ran into.

> 
>         .bss : {
>                 *(.bss*)
>         } >ram
>         PROVIDE(__bss_end = .);
> }

IIRC Nelson has already started looking into some of the alignment-based 
cross-section relaxation issues.  Probably best to let him chime in, 
it's the weekend in Taiwan already so that might just be next week.

Given that we've found two bugs here in the last few months we've 
probably got a few more floating around, so it's probably worth thinking 
through this a bit more.  That might take a little while, but sounds 
like it's worth doing.

Obviously happy to see a patch that fixes it if you feel like hacking on 
this stuff a bit ;)

> 
> -----Original Message-----
> From: Palmer Dabbelt <palmer@rivosinc.com> 
> Sent: Friday, July 28, 2023 4:03 PM
> To: Joseph Faulls <Joseph.Faulls@imgtec.com>
> Cc: nelson@rivosinc.com; binutils@sourceware.org
> Subject: RE: [EXTERNAL] Re: [PATCH] RISC-V: Do not gp relax against an ABS symbol if it is far away.
> 
> *** CAUTION: This email originates from a source not known to Imagination Technologies. Think before you click a link or open an attachment ***
> 
> On Fri, 28 Jul 2023 02:18:49 PDT (-0700), Joseph.Faulls@imgtec.com wrote:
>> Thanks for the replies!
>> 
>> 
>>   *   Do you have a reduced test that shows we still have a chance to get the relax error by using the default linker script?
>> Yes, you can re-produce this with the default linkerscript, but it’s a little harder. If you take the same assembly files as the test cases in this patch (gp-relax-abs-sym.s and gp-relax-abs.s), assemble and link an elf and see the value __global_pointer$. Then adjust the absolute symbol to be this + 0x800. Re-assemble + link, and you should see a truncation.
>> 
>> I came across this issue when using a custom linker script that provided some symbols (which are absolute) and linking a program with quite a few relaxations.
>> 
>>   *   I'm not quite sure what you mean by GP moving -- GP's the register, so it doesn't have a value until runtime.  GP is meant to equal __global_pointer$, but that's usually a data symbol and thus doesn't move as a result of relaxation.
>> Perhaps I’m using the terms incorrectly, but if the __global_pointer is set in the linkerscript at the end of some section (I think this is most cases), then this value will change if the size of any section before it changes, as would be in the case of relaxation. Could we not describe this as the GP moving?
> 
> I wouldn't, but I think it's just different meanings of the words here
> -- it shouldn't really matter for the purposes of fixing the bug, I think I understand what you're saying ;)
> 
> I think Nelson's point about cross-section GP relaxation is probably the root cause here, we've had some other bugs like that crop up recently.  
> Assuming that's the case then SHN_ABS is just a red herring: maybe it makes triggering the bug easier, but we can probably come up with a similar failure in other ways.
> 
> Assuming that's the case, both SHN_ABS and /2 are just a hueristic here.  
> We've got some tricky cases floating around for the cross-section stuff where we don't have a great answer yet, I wouldn't be surprised if this is another case where we just can't relax safely.
> 
> Can you post a reduced test case with a custom linker script that triggers the problem?  That should help us a bit more concrete.
> 
>> From: Nelson Chu <nelson@rivosinc.com>
>> Sent: Friday, July 28, 2023 2:16 AM
>> To: Palmer Dabbelt <palmer@rivosinc.com>
>> Cc: Joseph Faulls <Joseph.Faulls@imgtec.com>; binutils@sourceware.org
>> Subject: [EXTERNAL] Re: [PATCH] RISC-V: Do not gp relax against an ABS symbol if it is far away.
>> 
>> Maybe, if the shrink code is more than the data segment alignment in the default linker script (0x1000), then probably still have a chance to get the truncated gp by using default linker script...  I will suggest if that is so, then we just have a linker option to disallow/allow the gp relax for any abs symbol, that should be more intuitive.
>> 
>> On Fri, Jul 28, 2023 at 9:02 AM Nelson Chu <nelson@rivosinc.com<mailto:nelson@rivosinc.com>> wrote:
>> Relaxation should shrink the text, but since the default linker script has huge data segment alignment, data after the segment alignment shouldn't be affected and shrinked, so ideally  it should be safe by using default linker script.
>> 
>> Do you have a reduced test that shows we still have a chance to get the relax error by using the default linker script?
>> 
>> On Fri, Jul 28, 2023 at 5:08 AM Palmer Dabbelt <palmer@rivosinc.com<mailto:palmer@rivosinc.com>> wrote:
>> On Thu, 27 Jul 2023 09:48:24 PDT (-0700), Joseph.Faulls@imgtec.com<mailto:Joseph.Faulls@imgtec.com> wrote:
>>> Relaxations can cause the gp to move after it has been decided to gp 
>>> relax. Against an absolute symbol, the distance may change such that 
>>> the offset can no longer fit in the 12-bit immediate field.
>>>
>>> bfd/
>>>         * elfnn-riscv.c (_bfd_riscv_relax_pc) Do not gp relax against and ABS
>>>           symbol if it is far away.
>>> ld/
>>>         * ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp: Updated.
>>>         * ld/testsuite/ld-riscv-elf/gp-relax-abs*: New testcases.
>>> ---
>>> bfd/elfnn-riscv.c                            | 12 ++++++++++++
>>> ld/testsuite/ld-riscv-elf/gp-relax-abs-sym.s |  3 +++
>>> ld/testsuite/ld-riscv-elf/gp-relax-abs.d     | 14 ++++++++++++++
>>> ld/testsuite/ld-riscv-elf/gp-relax-abs.s     |  5 +++++
>>> ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp   |  1 +
>>> 5 files changed, 35 insertions(+)
>>> create mode 100644 ld/testsuite/ld-riscv-elf/gp-relax-abs-sym.s
>>> create mode 100644 ld/testsuite/ld-riscv-elf/gp-relax-abs.d
>>> create mode 100644 ld/testsuite/ld-riscv-elf/gp-relax-abs.s
>>>
>>> diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c index 
>>> 09aa7be225e..79e29e8b272 100644
>>> --- a/bfd/elfnn-riscv.c
>>> +++ b/bfd/elfnn-riscv.c
>>> @@ -4885,6 +4885,18 @@ _bfd_riscv_relax_pc (bfd *abfd ATTRIBUTE_UNUSED,
>>>               max_alignment = _bfd_riscv_get_max_alignment (sec, gp);
>>>               htab->max_alignment_for_gp = max_alignment;
>>>             }
>>> +         /* If the symbol is in the abs section, relaxation could cause the gp
>>> +          * to move such that the gp relocation is no longer possible.
>> 
>> I'm not quite sure what you mean by GP moving -- GP's the register, so 
>> it doesn't have a value until runtime.  GP is meant to equal 
>> __global_pointer$, but that's usually a data symbol and thus doesn't 
>> move as a result of relaxation.
>> 
>> We do have some other issues related to SHN_ABS (for example position 
>> independent vs PC relative stuff), so it's possible there's some other 
>> bug here?
>> 
>>> +          * Conservatively half the allowed distance, as it cannot be that
>>> +          * gp moves more than this, i.e. more than half the instructions be
>>> +          * deleted due to relaxation.  Do this by adjusting reserve_size.  */
>>> +         if (sym_sec->output_section == bfd_abs_section_ptr)
>>> +           {
>>> +             if (symval >= gp)
>>> +               reserve_size += (symval - gp) / 2;
>>> +             else
>>> +               reserve_size += (gp - symval) / 2;
>> 
>> We can have more than a factor of two from relaxation, for example 
>> from alignment.  If a hueristic is all we can do then I guess we'll 
>> have to live with it, but I think I'd want to understand the movement
>> 
>> An extreme case is that all the instruction can be relaxed without any huge alignments between text and data, so I guess that's probably why the half of distance comes from.  If the default linker script may cause the problem, then we will need to know more details and try to resolve it.  But if this only occurs without the default linker script, then I will suggest users disable that relax patterns by the .norelax directive, or related options of compiler/assembler/linker.  Or maybe we can have an option for users to reject gp relax when meeting abs symbols, if without using default linker script.  However, please give a reduced case by using default linker script that will be broken in detailed, thanks a lot.
>> 
>> Nelson
>> 
>>> +           }
>>>         }
>>>      }
>>>
>>> diff --git a/ld/testsuite/ld-riscv-elf/gp-relax-abs-sym.s 
>>> b/ld/testsuite/ld-riscv-elf/gp-relax-abs-sym.s
>>> new file mode 100644
>>> index 00000000000..a018bb3a50a
>>> --- /dev/null
>>> +++ b/ld/testsuite/ld-riscv-elf/gp-relax-abs-sym.s
>>> @@ -0,0 +1,3 @@
>>> +.section .data
>>> +.globl sym
>>> +.set sym,0x10804
>>> diff --git a/ld/testsuite/ld-riscv-elf/gp-relax-abs.d 
>>> b/ld/testsuite/ld-riscv-elf/gp-relax-abs.d
>>> new file mode 100644
>>> index 00000000000..2c7ab3a2579
>>> --- /dev/null
>>> +++ b/ld/testsuite/ld-riscv-elf/gp-relax-abs.d
>>> @@ -0,0 +1,14 @@
>>> +#source: gp-relax-abs.s
>>> +#source: gp-relax-abs-sym.s
>>> +#as: -march=rv64ic -mabi=lp64
>>> +#ld: -Tcode-model-01.ld -melf64lriscv
>>> +#objdump: -d
>>> +
>>> +.*:[   ]+file format .*
>>> +
>>> +Disassembly of section \.text:
>>> +
>>> +[0-9a-f]+ <_start>:
>>> +.*auipc.*
>>> +.*lw.*# [0-9a-f]* <sym>
>>> +#pass
>>> diff --git a/ld/testsuite/ld-riscv-elf/gp-relax-abs.s 
>>> b/ld/testsuite/ld-riscv-elf/gp-relax-abs.s
>>> new file mode 100644
>>> index 00000000000..db2103bafd1
>>> --- /dev/null
>>> +++ b/ld/testsuite/ld-riscv-elf/gp-relax-abs.s
>>> @@ -0,0 +1,5 @@
>>> +.text
>>> +.global _start
>>> +_start:
>>> +    auipc t0, %pcrel_hi(sym)
>>> +    lw t0, %pcrel_lo(_start)(t0)
>>> diff --git a/ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp 
>>> b/ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp
>>> index 947a266ba72..a53a2758991 100644
>>> --- a/ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp
>>> +++ b/ld/testsuite/ld-riscv-elf/ld-riscv-elf.exp
>>> @@ -172,6 +172,7 @@ if [istarget "riscv*-*-*"] {
>>>      run_dump_test "attr-merge-priv-spec-failed-06"
>>>      run_dump_test "attr-phdr"
>>>      run_dump_test "relax-max-align-gp"
>>> +    run_dump_test "gp-relax-abs"
>>>      run_dump_test "uleb128"
>>>      run_ld_link_tests [list \
>>>         [list "Weak reference 32" "-T weakref.ld 
>>> -m[riscv_choose_ilp32_emul]" "" \

  reply	other threads:[~2023-07-28 16:12 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-27 16:48 Joseph Faulls
2023-07-27 17:02 ` Andreas Schwab
2023-07-27 21:08 ` Palmer Dabbelt
2023-07-28  1:02   ` Nelson Chu
2023-07-28  1:15     ` Nelson Chu
2023-07-28  9:18       ` [EXTERNAL] " Joseph Faulls
2023-07-28 15:03         ` Palmer Dabbelt
2023-07-28 15:52           ` Joseph Faulls
2023-07-28 16:12             ` Palmer Dabbelt [this message]
2023-08-08  9:49 ` Maciej W. Rozycki
2023-08-08 10:38   ` [EXTERNAL] " Joseph Faulls
2023-08-08 12:04     ` Maciej W. Rozycki
2023-08-23 11:16     ` Joseph Faulls

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=mhng-031337c1-3f82-41cb-a6c2-40e231a5fb8b@palmer-ri-x1c9 \
    --to=palmer@rivosinc.com \
    --cc=Joseph.Faulls@imgtec.com \
    --cc=binutils@sourceware.org \
    --cc=nelson@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).