From: Palmer Dabbelt <palmer@dabbelt.com>
To: chigot@adacore.com
Cc: binutils@sourceware.org
Subject: Re: Risc - V failures inside ld-undefined
Date: Mon, 03 Oct 2022 10:10:41 -0700 (PDT) [thread overview]
Message-ID: <mhng-2152513e-ffd9-46f7-bdcd-e3adb98690e9@palmer-ri-x1c9> (raw)
In-Reply-To: <CAJ307EjHyWU7AgHPTtFH2ySjT3-fph52A5Lwmk2vmnKiBd3YTQ@mail.gmail.com>
On Mon, 03 Oct 2022 05:09:42 PDT (-0700), chigot@adacore.com wrote:
> On Fri, Sep 30, 2022 at 11:40 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>> On Fri, 30 Sep 2022 05:29:36 PDT (-0700), binutils@sourceware.org wrote:
>> > Hi all,
>> >
>> > I've been running the ld testsuite for Risc-V targets using a cross
>> > compiler and I'm seeing some failures in "undefined".
>> > PASS: undefined
>> > FAIL: undefined function
>> > FAIL: undefined line
>> >
>> > As I'm using a custom gcc, I just want confirmation from the Risc-V
>> > maintainers that these two tests are either failing or being skipped
>> > on their side.
>>
>> They're not failing for me in a local run, unless I'm somehow
>> misunderstanding the test results. My "make check-ld" shows
>>
>> === ld Summary ===
>>
>> # of expected passes 797
>> # of expected failures 8
>> # of untested testcases 26
>> # of unsupported tests 175
>> ./ld-new 2.39.50.20220930
>
> I have less untested testcases but less tests overall.
>
> === ld Summary ===
>
> # of expected passes 493
> # of unexpected failures 5 (3 expected + 2 with undefined)
> # of unexpected successes 9
> # of expected failures 7
> # of untested testcases 7
> # of unsupported tests 193
>
> You can probably check through ld.sum what the status of "undefined",
> "undefined function" and "undefined line".
Yep, untested on my end
Running /home/palmer/life/binutils-gdb/ld/testsuite/ld-undefined/undefined.exp ...
UNTESTED: undefined
UNTESTED: undefined function
UNTESTED: undefined line
PASS: undefined symbols in shared lib (dyn sym)
PASS: undefined symbols in shared lib (dyn reloc)
>> > I've a patch fixing the testsuite for cross-compiler which might
>> > explain why it has always been skipped until now and why this error
>> > has been masked.
>>
>> OK, probably best to send the patch, then? There's certainly been cases
>> before where we've missed failures because they're being skipped due to
>> environment issues, if it's possible to get the test running for more
>> people then it'll be more likely to get fixed -- at least with it
>> showing up in the tests as an unexpected failure it'll be harder to
>> forget about.
>
> Yes, this is done (see "[PATCH] ld/testsuite: consistently add
> board_ldflags when linking with GCC").
> It aims to resolve some incoherency between "check_compiler_available"
> and how the compiler is later used.
Thanks, I replied over there.
>> > Note that I've investigated a bit and it seems to be related to
>> > R_RISCV_ALIGN and the associated linker relaxation.
>> > Instead of showing
>> > | ld: tmpdir/undefined.o: in function `function':
>> > | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
>> > the error is:
>> > | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
>> > | undefined.c:(.text+0x0): undefined reference to
>> > `this_function_is_not_defined'
>> >
>> > The idea is that "function" has an alignment of 2 and thus gas will
>> > add NOP instructions and a R_RISCV_ALIGN relocation before it (within
>> > tc-riscv.c:riscv_frag_align_code).
>> > | $ riscv64-elf-objdump -D tmpdir/undefined.o
>> > | 0000000000000000 <.Ltext0>:
>> > | 0: 0001 nop
>> > |
>> > | 0000000000000002 <function>:
>> > | 2: 00000317 auipc t1,0x0
>> > | 6: 00030067 jr t1 # 2 <function>
>> >
>> > When the linker will relax these NOPs (in
>> > elfnn-riscv.c:riscv_relax_delete_bytes), it will correctly change the
>> > relocation offset but not the DWARF information. Thus, when the error
>> > is trying to get the nearest line using _bfd_dwarf2_find_nearest_line,
>> > it ends up using a modified offset with non-modified DWARF information
>> > and targets the wrong symbol.
>> > I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
>> > part too. But I'm not familiar with this part of bfd and I'm not sure
>> > it's even possible.
>> > Otherwise, I guess XFAIL these tests should be fine as they are
>> > similar to other XFAILs related to DWARF-2 (on arm-elf). Even if not
>> > having a "function" being shown in the error might be problematic.
>>
>> That seems like a plausible issue, and it's certainly not surprising
>> that there's a bug in debug info handling related to relaxation -- both
>> because we've had those before, and because it's the combination of some
>> complex and hard to test topics. Probably best to just start with
>> getting this running for everyone so we can chew on it a bit, if it is a
>> XFAIL then we should at least have some sort of error here rather than
>> just producing bad binaries.
>
> IIUC, the binaries are fined at the end. At least, I've quickly
> checked the DWARF infos and they don't look wrong to me.
> The problem is more that when we are raising the issue for the
> "undefined" symbol we are in the middle of the relocation phase. The
> DWARF infos are then not yet corrected, resulting in the error message
> being wrong.
OK, that's a lot less scary.
>> > Anyway, the fact that I'm using a custom gcc shouldn't matter here but
>> > I want to confirm that first.
>>
>> That also seems reasonable, but there's quite a bit of coupling between
>> the compiler, assembler, and linker when it comes to DWARF on RISC-V as
>> we don't support all the flavors of DWARF-relate directives. We should
>> have errors for all those, but it's entirely possible something has been
>> missed. That's even more likely if your custom compiler that's doing
>> something different than we normally test against, but IMO that's still
>> a bug that we should fix (at least with an error message).
>
> I shouldn't assume things without checking.
> This seems indeed related to a change in our gcc. We are adding a
> ".align 2" before "function" which results in the relaxation part
> being triggered. But as you said and I agree with that, there is still
> a bug behind it.
You can probably trigger that with a regular GCC, the non-C targets all
align functions to 4 bytes (as the non-C ISAs don't allow 2-byte aligned
instructions).
> I'll see what I can do. Either add a warning/error or try to fix it.
Thanks. Sounds like this one would be simple to produce a directed
assembly test for, so if you don't want to sort that out then just
sending along a patch that adds the failing test case would be great.
That makes these way easier to reproduce, as rather than relying on a
specific flavor of toolchain we can just sort out the bug directly.
prev parent reply other threads:[~2022-10-03 17:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-30 12:29 Clément Chigot
2022-09-30 21:40 ` Palmer Dabbelt
2022-10-03 12:09 ` Clément Chigot
2022-10-03 17:10 ` Palmer Dabbelt [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mhng-2152513e-ffd9-46f7-bdcd-e3adb98690e9@palmer-ri-x1c9 \
--to=palmer@dabbelt.com \
--cc=binutils@sourceware.org \
--cc=chigot@adacore.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).