public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* Risc - V failures inside ld-undefined
@ 2022-09-30 12:29 Clément Chigot
  2022-09-30 21:40 ` Palmer Dabbelt
  0 siblings, 1 reply; 4+ messages in thread
From: Clément Chigot @ 2022-09-30 12:29 UTC (permalink / raw)
  To: binutils

Hi all,

I've been running the ld testsuite for Risc-V targets using a cross
compiler and I'm seeing some failures in "undefined".
PASS: undefined
FAIL: undefined function
FAIL: undefined line

As I'm using a custom gcc, I just want confirmation from the Risc-V
maintainers that these two tests are either failing or being skipped
on their side.
I've a patch fixing the testsuite for cross-compiler which might
explain why it has always been skipped until now and why this error
has been masked.

Note that I've investigated a bit and it seems to be related to
R_RISCV_ALIGN and the associated linker relaxation.
Instead of showing
  | ld: tmpdir/undefined.o: in function `function':
  | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
the error is:
  | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
  | undefined.c:(.text+0x0): undefined reference to
`this_function_is_not_defined'

The idea is that "function" has an alignment of 2 and thus gas will
add NOP instructions and a R_RISCV_ALIGN relocation before it (within
tc-riscv.c:riscv_frag_align_code).
  | $ riscv64-elf-objdump -D tmpdir/undefined.o
  | 0000000000000000 <.Ltext0>:
  |   0:   0001                    nop
  |
  | 0000000000000002 <function>:
  |   2:   00000317                auipc   t1,0x0
  |   6:   00030067                jr      t1 # 2 <function>

When the linker will relax these NOPs (in
elfnn-riscv.c:riscv_relax_delete_bytes), it will correctly change the
relocation offset but not the DWARF information. Thus, when the error
is trying to get the nearest line using _bfd_dwarf2_find_nearest_line,
it ends up using a modified offset with non-modified DWARF information
and targets the wrong symbol.
I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
part too. But I'm not familiar with this part of bfd and I'm not sure
it's even possible.
Otherwise, I guess XFAIL these tests should be fine as they are
similar to other XFAILs related to DWARF-2 (on arm-elf). Even if not
having a "function" being shown in the error might be problematic.

Anyway, the fact that I'm using a custom gcc shouldn't matter here but
I want to confirm that first.

Thanks,
Clément

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Risc - V failures inside ld-undefined
  2022-09-30 12:29 Risc - V failures inside ld-undefined Clément Chigot
@ 2022-09-30 21:40 ` Palmer Dabbelt
  2022-10-03 12:09   ` Clément Chigot
  0 siblings, 1 reply; 4+ messages in thread
From: Palmer Dabbelt @ 2022-09-30 21:40 UTC (permalink / raw)
  To: binutils

On Fri, 30 Sep 2022 05:29:36 PDT (-0700), binutils@sourceware.org wrote:
> Hi all,
>
> I've been running the ld testsuite for Risc-V targets using a cross
> compiler and I'm seeing some failures in "undefined".
> PASS: undefined
> FAIL: undefined function
> FAIL: undefined line
>
> As I'm using a custom gcc, I just want confirmation from the Risc-V
> maintainers that these two tests are either failing or being skipped
> on their side.

They're not failing for me in a local run, unless I'm somehow 
misunderstanding the test results.  My "make check-ld" shows

		=== ld Summary ===

# of expected passes		797
# of expected failures		8
# of untested testcases		26
# of unsupported tests		175
./ld-new 2.39.50.20220930

> I've a patch fixing the testsuite for cross-compiler which might
> explain why it has always been skipped until now and why this error
> has been masked.

OK, probably best to send the patch, then?  There's certainly been cases 
before where we've missed failures because they're being skipped due to 
environment issues, if it's possible to get the test running for more 
people then it'll be more likely to get fixed -- at least with it 
showing up in the tests as an unexpected failure it'll be harder to 
forget about.

> Note that I've investigated a bit and it seems to be related to
> R_RISCV_ALIGN and the associated linker relaxation.
> Instead of showing
>   | ld: tmpdir/undefined.o: in function `function':
>   | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
> the error is:
>   | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
>   | undefined.c:(.text+0x0): undefined reference to
> `this_function_is_not_defined'
>
> The idea is that "function" has an alignment of 2 and thus gas will
> add NOP instructions and a R_RISCV_ALIGN relocation before it (within
> tc-riscv.c:riscv_frag_align_code).
>   | $ riscv64-elf-objdump -D tmpdir/undefined.o
>   | 0000000000000000 <.Ltext0>:
>   |   0:   0001                    nop
>   |
>   | 0000000000000002 <function>:
>   |   2:   00000317                auipc   t1,0x0
>   |   6:   00030067                jr      t1 # 2 <function>
>
> When the linker will relax these NOPs (in
> elfnn-riscv.c:riscv_relax_delete_bytes), it will correctly change the
> relocation offset but not the DWARF information. Thus, when the error
> is trying to get the nearest line using _bfd_dwarf2_find_nearest_line,
> it ends up using a modified offset with non-modified DWARF information
> and targets the wrong symbol.
> I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
> part too. But I'm not familiar with this part of bfd and I'm not sure
> it's even possible.
> Otherwise, I guess XFAIL these tests should be fine as they are
> similar to other XFAILs related to DWARF-2 (on arm-elf). Even if not
> having a "function" being shown in the error might be problematic.

That seems like a plausible issue, and it's certainly not surprising 
that there's a bug in debug info handling related to relaxation -- both 
because we've had those before, and because it's the combination of some 
complex and hard to test topics.  Probably best to just start with 
getting this running for everyone so we can chew on it a bit, if it is a 
XFAIL then we should at least have some sort of error here rather than 
just producing bad binaries.

> Anyway, the fact that I'm using a custom gcc shouldn't matter here but
> I want to confirm that first.

That also seems reasonable, but there's quite a bit of coupling between 
the compiler, assembler, and linker when it comes to DWARF on RISC-V as 
we don't support all the flavors of DWARF-relate directives.  We should 
have errors for all those, but it's entirely possible something has been 
missed.  That's even more likely if your custom compiler that's doing 
something different than we normally test against, but IMO that's still 
a bug that we should fix (at least with an error message).

>
> Thanks,
> Clément

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Risc - V failures inside ld-undefined
  2022-09-30 21:40 ` Palmer Dabbelt
@ 2022-10-03 12:09   ` Clément Chigot
  2022-10-03 17:10     ` Palmer Dabbelt
  0 siblings, 1 reply; 4+ messages in thread
From: Clément Chigot @ 2022-10-03 12:09 UTC (permalink / raw)
  To: Palmer Dabbelt; +Cc: binutils

On Fri, Sep 30, 2022 at 11:40 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Fri, 30 Sep 2022 05:29:36 PDT (-0700), binutils@sourceware.org wrote:
> > Hi all,
> >
> > I've been running the ld testsuite for Risc-V targets using a cross
> > compiler and I'm seeing some failures in "undefined".
> > PASS: undefined
> > FAIL: undefined function
> > FAIL: undefined line
> >
> > As I'm using a custom gcc, I just want confirmation from the Risc-V
> > maintainers that these two tests are either failing or being skipped
> > on their side.
>
> They're not failing for me in a local run, unless I'm somehow
> misunderstanding the test results.  My "make check-ld" shows
>
>                 === ld Summary ===
>
> # of expected passes            797
> # of expected failures          8
> # of untested testcases         26
> # of unsupported tests          175
> ./ld-new 2.39.50.20220930

I have less untested testcases but less tests overall.

                === ld Summary ===

# of expected passes            493
# of unexpected failures        5       (3 expected + 2 with undefined)
# of unexpected successes       9
# of expected failures          7
# of untested testcases         7
# of unsupported tests          193

You can probably check through ld.sum what the status of "undefined",
"undefined function" and "undefined line".

> > I've a patch fixing the testsuite for cross-compiler which might
> > explain why it has always been skipped until now and why this error
> > has been masked.
>
> OK, probably best to send the patch, then?  There's certainly been cases
> before where we've missed failures because they're being skipped due to
> environment issues, if it's possible to get the test running for more
> people then it'll be more likely to get fixed -- at least with it
> showing up in the tests as an unexpected failure it'll be harder to
> forget about.

Yes, this is done (see "[PATCH] ld/testsuite: consistently add
board_ldflags when linking with GCC").
It aims to resolve some incoherency between "check_compiler_available"
and how the compiler is later used.

> > Note that I've investigated a bit and it seems to be related to
> > R_RISCV_ALIGN and the associated linker relaxation.
> > Instead of showing
> >   | ld: tmpdir/undefined.o: in function `function':
> >   | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
> > the error is:
> >   | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
> >   | undefined.c:(.text+0x0): undefined reference to
> > `this_function_is_not_defined'
> >
> > The idea is that "function" has an alignment of 2 and thus gas will
> > add NOP instructions and a R_RISCV_ALIGN relocation before it (within
> > tc-riscv.c:riscv_frag_align_code).
> >   | $ riscv64-elf-objdump -D tmpdir/undefined.o
> >   | 0000000000000000 <.Ltext0>:
> >   |   0:   0001                    nop
> >   |
> >   | 0000000000000002 <function>:
> >   |   2:   00000317                auipc   t1,0x0
> >   |   6:   00030067                jr      t1 # 2 <function>
> >
> > When the linker will relax these NOPs (in
> > elfnn-riscv.c:riscv_relax_delete_bytes), it will correctly change the
> > relocation offset but not the DWARF information. Thus, when the error
> > is trying to get the nearest line using _bfd_dwarf2_find_nearest_line,
> > it ends up using a modified offset with non-modified DWARF information
> > and targets the wrong symbol.
> > I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
> > part too. But I'm not familiar with this part of bfd and I'm not sure
> > it's even possible.
> > Otherwise, I guess XFAIL these tests should be fine as they are
> > similar to other XFAILs related to DWARF-2 (on arm-elf). Even if not
> > having a "function" being shown in the error might be problematic.
>
> That seems like a plausible issue, and it's certainly not surprising
> that there's a bug in debug info handling related to relaxation -- both
> because we've had those before, and because it's the combination of some
> complex and hard to test topics.  Probably best to just start with
> getting this running for everyone so we can chew on it a bit, if it is a
> XFAIL then we should at least have some sort of error here rather than
> just producing bad binaries.

IIUC, the binaries are fined at the end. At least, I've quickly
checked the DWARF infos and they don't look wrong to me.
The problem is more that when we are raising the issue for the
"undefined" symbol we are in the middle of the relocation phase. The
DWARF infos are then not yet corrected, resulting in the error message
being wrong.

> > Anyway, the fact that I'm using a custom gcc shouldn't matter here but
> > I want to confirm that first.
>
> That also seems reasonable, but there's quite a bit of coupling between
> the compiler, assembler, and linker when it comes to DWARF on RISC-V as
> we don't support all the flavors of DWARF-relate directives.  We should
> have errors for all those, but it's entirely possible something has been
> missed.  That's even more likely if your custom compiler that's doing
> something different than we normally test against, but IMO that's still
> a bug that we should fix (at least with an error message).

I shouldn't assume things without checking.
This seems indeed related to a change in our gcc. We are adding a
".align 2" before "function" which results in the relaxation part
being triggered. But as you said and I agree with that, there is still
a bug behind it.
I'll see what I can do. Either add a warning/error or try to fix it.

Thanks,
Clément

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Risc - V failures inside ld-undefined
  2022-10-03 12:09   ` Clément Chigot
@ 2022-10-03 17:10     ` Palmer Dabbelt
  0 siblings, 0 replies; 4+ messages in thread
From: Palmer Dabbelt @ 2022-10-03 17:10 UTC (permalink / raw)
  To: chigot; +Cc: binutils

On Mon, 03 Oct 2022 05:09:42 PDT (-0700), chigot@adacore.com wrote:
> On Fri, Sep 30, 2022 at 11:40 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>>
>> On Fri, 30 Sep 2022 05:29:36 PDT (-0700), binutils@sourceware.org wrote:
>> > Hi all,
>> >
>> > I've been running the ld testsuite for Risc-V targets using a cross
>> > compiler and I'm seeing some failures in "undefined".
>> > PASS: undefined
>> > FAIL: undefined function
>> > FAIL: undefined line
>> >
>> > As I'm using a custom gcc, I just want confirmation from the Risc-V
>> > maintainers that these two tests are either failing or being skipped
>> > on their side.
>>
>> They're not failing for me in a local run, unless I'm somehow
>> misunderstanding the test results.  My "make check-ld" shows
>>
>>                 === ld Summary ===
>>
>> # of expected passes            797
>> # of expected failures          8
>> # of untested testcases         26
>> # of unsupported tests          175
>> ./ld-new 2.39.50.20220930
>
> I have less untested testcases but less tests overall.
>
>                 === ld Summary ===
>
> # of expected passes            493
> # of unexpected failures        5       (3 expected + 2 with undefined)
> # of unexpected successes       9
> # of expected failures          7
> # of untested testcases         7
> # of unsupported tests          193
>
> You can probably check through ld.sum what the status of "undefined",
> "undefined function" and "undefined line".

Yep, untested on my end

Running /home/palmer/life/binutils-gdb/ld/testsuite/ld-undefined/undefined.exp ...
UNTESTED: undefined
UNTESTED: undefined function
UNTESTED: undefined line
PASS: undefined symbols in shared lib (dyn sym)
PASS: undefined symbols in shared lib (dyn reloc)

>> > I've a patch fixing the testsuite for cross-compiler which might
>> > explain why it has always been skipped until now and why this error
>> > has been masked.
>>
>> OK, probably best to send the patch, then?  There's certainly been cases
>> before where we've missed failures because they're being skipped due to
>> environment issues, if it's possible to get the test running for more
>> people then it'll be more likely to get fixed -- at least with it
>> showing up in the tests as an unexpected failure it'll be harder to
>> forget about.
>
> Yes, this is done (see "[PATCH] ld/testsuite: consistently add
> board_ldflags when linking with GCC").
> It aims to resolve some incoherency between "check_compiler_available"
> and how the compiler is later used.

Thanks, I replied over there.

>> > Note that I've investigated a bit and it seems to be related to
>> > R_RISCV_ALIGN and the associated linker relaxation.
>> > Instead of showing
>> >   | ld: tmpdir/undefined.o: in function `function':
>> >   | .../undefined.c:9: undefined reference to `this_function_is_not_defined'
>> > the error is:
>> >   | riscv64-elf-ld: tmpdir/undefined.o: in function `.Ltext0':
>> >   | undefined.c:(.text+0x0): undefined reference to
>> > `this_function_is_not_defined'
>> >
>> > The idea is that "function" has an alignment of 2 and thus gas will
>> > add NOP instructions and a R_RISCV_ALIGN relocation before it (within
>> > tc-riscv.c:riscv_frag_align_code).
>> >   | $ riscv64-elf-objdump -D tmpdir/undefined.o
>> >   | 0000000000000000 <.Ltext0>:
>> >   |   0:   0001                    nop
>> >   |
>> >   | 0000000000000002 <function>:
>> >   |   2:   00000317                auipc   t1,0x0
>> >   |   6:   00030067                jr      t1 # 2 <function>
>> >
>> > When the linker will relax these NOPs (in
>> > elfnn-riscv.c:riscv_relax_delete_bytes), it will correctly change the
>> > relocation offset but not the DWARF information. Thus, when the error
>> > is trying to get the nearest line using _bfd_dwarf2_find_nearest_line,
>> > it ends up using a modified offset with non-modified DWARF information
>> > and targets the wrong symbol.
>> > I guess riscv_relax_delete_bytes can be improved to adjust the DWARF
>> > part too. But I'm not familiar with this part of bfd and I'm not sure
>> > it's even possible.
>> > Otherwise, I guess XFAIL these tests should be fine as they are
>> > similar to other XFAILs related to DWARF-2 (on arm-elf). Even if not
>> > having a "function" being shown in the error might be problematic.
>>
>> That seems like a plausible issue, and it's certainly not surprising
>> that there's a bug in debug info handling related to relaxation -- both
>> because we've had those before, and because it's the combination of some
>> complex and hard to test topics.  Probably best to just start with
>> getting this running for everyone so we can chew on it a bit, if it is a
>> XFAIL then we should at least have some sort of error here rather than
>> just producing bad binaries.
>
> IIUC, the binaries are fined at the end. At least, I've quickly
> checked the DWARF infos and they don't look wrong to me.
> The problem is more that when we are raising the issue for the
> "undefined" symbol we are in the middle of the relocation phase. The
> DWARF infos are then not yet corrected, resulting in the error message
> being wrong.

OK, that's a lot less scary.

>> > Anyway, the fact that I'm using a custom gcc shouldn't matter here but
>> > I want to confirm that first.
>>
>> That also seems reasonable, but there's quite a bit of coupling between
>> the compiler, assembler, and linker when it comes to DWARF on RISC-V as
>> we don't support all the flavors of DWARF-relate directives.  We should
>> have errors for all those, but it's entirely possible something has been
>> missed.  That's even more likely if your custom compiler that's doing
>> something different than we normally test against, but IMO that's still
>> a bug that we should fix (at least with an error message).
>
> I shouldn't assume things without checking.
> This seems indeed related to a change in our gcc. We are adding a
> ".align 2" before "function" which results in the relaxation part
> being triggered. But as you said and I agree with that, there is still
> a bug behind it.

You can probably trigger that with a regular GCC, the non-C targets all 
align functions to 4 bytes (as the non-C ISAs don't allow 2-byte aligned 
instructions).

> I'll see what I can do. Either add a warning/error or try to fix it.

Thanks.  Sounds like this one would be simple to produce a directed 
assembly test for, so if you don't want to sort that out then just 
sending along a patch that adds the failing test case would be great.  
That makes these way easier to reproduce, as rather than relying on a 
specific flavor of toolchain we can just sort out the bug directly.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-10-03 17:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-30 12:29 Risc - V failures inside ld-undefined Clément Chigot
2022-09-30 21:40 ` Palmer Dabbelt
2022-10-03 12:09   ` Clément Chigot
2022-10-03 17:10     ` Palmer Dabbelt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).