* [PR24444] speed up locview resolution wiht relaxable frags @ 2019-04-13 8:57 Alexandre Oliva 2019-04-14 13:22 ` Alan Modra 0 siblings, 1 reply; 10+ messages in thread From: Alexandre Oliva @ 2019-04-13 8:57 UTC (permalink / raw) To: binutils Targets such as xtensa incur a much higher overhead to resolve location view numbers than e.g. x86, because the expressions used to compute view numbers cannot be resolved soon enough. Each view number is computed by incrementing the previous view, if they are both at the same address, or by resetting it to zero otherwise. If PV is the previous view number, PL is its location, and NL is the location of the next view, its number is computed by evaluating NV = !(NL > PL) * (PV + 1). set_or_check_view uses resolve_expression to decide whether portions of this expression can be simplified to constants. The (NL > PL) subexpression is one that can often be resolved to a constant, breaking chains of view number computations at instructions of nonzero length, but not after alignment that might be unnecessary. Alas, when nearly every frag ends with a relaxable instruction, frag_offset_fixed_p will correctly fail to determine a known offset between two unresolved addresses in neighboring frags, so the unresolved symbolic operation will be constructed and used in the computation of most view numbers. This results in very deep expressions. As view numbers get referenced in location view lists, each operand in the list goes through symbol_clone_if_forward_ref, which recurses on every subexpression. If each view number were to be referenced, this would exhibit O(n^2) behavior, where n is the depth of the view number expressions, i.e., the length of view number sequences without an early resolution that cuts the expression short. I see two ways to go about this: - make forward_ref a tri-state, so that symbol_clone_if_forward_ref can avoid revisiting the same nodes over and over when they have already been visited and the conclusion was that their expression does not contain forward refs. - enable address compares used by view numbering to be resolved even when exact offsets are not known, using new logic to determine when the location either remained the same or changed for sure, even with the possibility of relaxation. This patch implements the latter, assuming fr_fix to be the minimum size the frag will end up with, even for machine-dependent frags, but not rs_org: for those, or for negative fr_fix, it conservatively refrains from resolving the expression. This enables most view number expressions to be resolved with a small, reasonable depth. Now, is the assumption on fr_fix a reasonable one to make across all supported targets? Tested on x86_64-linux-gnu, native and cross to xtensa-elf. No new testcase; the issue was just performance, not output correctness. Ok to install? for gas/ChangeLog PR gas/24444 * frags.c (frag_gtoffset_p): New. * frags.h (frag_gtoffset_p): Declare it. * expr.c (resolve_expression): Use it. --- gas/expr.c | 5 +++- gas/frags.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ gas/frags.h | 2 ++ 3 files changed, 80 insertions(+), 1 deletion(-) diff --git a/gas/expr.c b/gas/expr.c index ee85bda1cc60..3efde88cc0a2 100644 --- a/gas/expr.c +++ b/gas/expr.c @@ -2179,7 +2179,10 @@ resolve_expression (expressionS *expressionP) || op == O_lt || op == O_le || op == O_ge || op == O_gt) && seg_left == seg_right && (finalize_syms - || frag_offset_fixed_p (frag_left, frag_right, &frag_off)) + || frag_offset_fixed_p (frag_left, frag_right, &frag_off) + || (op == O_gt + && frag_gtoffset_p (left, frag_left, + right, frag_right, &frag_off))) && (seg_left != reg_section || left == right) && (seg_left != undefined_section || add_symbol == op_symbol))) { diff --git a/gas/frags.c b/gas/frags.c index 90096ff696d7..e2ff79fbcc04 100644 --- a/gas/frags.c +++ b/gas/frags.c @@ -460,3 +460,77 @@ frag_offset_fixed_p (const fragS *frag1, const fragS *frag2, offsetT *offset) return FALSE; } + +/* Return TRUE if we can determine whether FRAG1 OFF1 appears after + (strict >, not >=) FRAG2 OFF2, assuming it is not before. Set + *OFFSET so that resolve_expression will resolve an O_gt operation + between them to false (0) if they are guaranteed to be at the same + location, or to true (-1) if they are guaranteed to be at different + locations. Return FALSE conservatively, e.g. if neither result can + be guaranteed (yet). + + They are known to be in the same segment, and not the same frag + (this is a fallback for frag_offset_fixed_p, that always takes care + of this case), and it is expected (from the uses this is designed + to simplify, namely location view increments) that frag1 is + reachable from frag2 following the fr_next links, rather than the + other way round. */ + +bfd_boolean +frag_gtoffset_p (valueT off1, const fragS *frag1, + valueT off2, const fragS *frag2, offsetT *offset) +{ + if (frag1 == frag2 || off2 > (valueT)frag2->fr_fix) + return FALSE; + + /* If any frag from frag2, inclusive, to frag1, exclusive, has a + nonzero fixed length, the addresses won't be the same as long as + we reach frag1. */ + bfd_boolean maybe_same_addr = off1 == 0 && off2 == (valueT)frag2->fr_fix; + bfd_boolean found_varlength = FALSE; + const fragS *frag = frag2; + for (;;) + { + switch (frag->fr_type) + { + case rs_dummy: + break; + + case rs_fill: + case rs_align: + case rs_align_code: + case rs_align_test: + case rs_space: + case rs_space_nop: + case rs_fill_nop: + case rs_leb128: + case rs_cfa: + case rs_dwarf2dbg: + case rs_machine_dependent: + found_varlength = TRUE; + break; + + default: + /* case rs_broken_word: */ + case rs_org: + return FALSE; + } + + frag = frag->fr_next; + if (frag == frag1 || !frag) + break; + if (maybe_same_addr && frag->fr_fix > 0) + maybe_same_addr = FALSE; + if (frag->fr_fix < 0) + return FALSE; + } + + /* If both are TRUE, we don't know whether or not it will resolve to + the same address. */ + if (maybe_same_addr && found_varlength) + return FALSE; + + if (frag) + *offset = (off1 - off2 - !maybe_same_addr) * OCTETS_PER_BYTE; + return !!frag; +} diff --git a/gas/frags.h b/gas/frags.h index 6ac5d4426c79..9eb066229085 100644 --- a/gas/frags.h +++ b/gas/frags.h @@ -154,6 +154,8 @@ char *frag_var (relax_stateT type, char *opcode); bfd_boolean frag_offset_fixed_p (const fragS *, const fragS *, offsetT *); +bfd_boolean frag_gtoffset_p (valueT, const fragS *, valueT, const fragS *, + offsetT *); int get_frag_count (void); void clear_frag_count (void); -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain Engineer Free Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-13 8:57 [PR24444] speed up locview resolution wiht relaxable frags Alexandre Oliva @ 2019-04-14 13:22 ` Alan Modra 2019-04-16 13:24 ` Alexandre Oliva 0 siblings, 1 reply; 10+ messages in thread From: Alan Modra @ 2019-04-14 13:22 UTC (permalink / raw) To: Alexandre Oliva; +Cc: binutils On Sat, Apr 13, 2019 at 05:55:34AM -0300, Alexandre Oliva wrote: > > Targets such as xtensa incur a much higher overhead to resolve > location view numbers than e.g. x86, because the expressions used to > compute view numbers cannot be resolved soon enough. > > Each view number is computed by incrementing the previous view, if > they are both at the same address, or by resetting it to zero > otherwise. If PV is the previous view number, PL is its location, and > NL is the location of the next view, its number is computed by > evaluating NV = !(NL > PL) * (PV + 1). > > set_or_check_view uses resolve_expression to decide whether portions > of this expression can be simplified to constants. The (NL > PL) > subexpression is one that can often be resolved to a constant, > breaking chains of view number computations at instructions of nonzero > length, but not after alignment that might be unnecessary. > > Alas, when nearly every frag ends with a relaxable instruction, > frag_offset_fixed_p will correctly fail to determine a known offset > between two unresolved addresses in neighboring frags, so the > unresolved symbolic operation will be constructed and used in the > computation of most view numbers. This results in very deep > expressions. > > As view numbers get referenced in location view lists, each operand in > the list goes through symbol_clone_if_forward_ref, which recurses on > every subexpression. If each view number were to be referenced, this > would exhibit O(n^2) behavior, where n is the depth of the view number > expressions, i.e., the length of view number sequences without an > early resolution that cuts the expression short. > > I see two ways to go about this: > > - make forward_ref a tri-state, so that symbol_clone_if_forward_ref > can avoid revisiting the same nodes over and over when they have > already been visited and the conclusion was that their expression does > not contain forward refs. This doesn't work. I threw together a quick patch when I saw this bug report and the result was a reduction in gas run-time from 89 to 80 minutes on the testcase in the PR. The problem is simply the depth of the expressions.. > - enable address compares used by view numbering to be resolved even > when exact offsets are not known, using new logic to determine when > the location either remained the same or changed for sure, even with > the possibility of relaxation. Whereas this approach does in fact result in shorter expressions, reducing run-time to 3 seconds. > This patch implements the latter, assuming fr_fix to be the minimum > size the frag will end up with, even for machine-dependent frags, but > not rs_org: for those, or for negative fr_fix, it conservatively > refrains from resolving the expression. This enables most view number > expressions to be resolved with a small, reasonable depth. > > Now, is the assumption on fr_fix a reasonable one to make across all > supported targets? Yes, I believe so. In fact, fr_fix is really unsigned. Also, it is an error for rs_org frags to go backwards. > Tested on x86_64-linux-gnu, native and cross to xtensa-elf. No new > testcase; the issue was just performance, not output correctness. Ok to > install? Um, the testcase object file after this patch differs from the original. First readelf -wi difference shown below. @@ -109941,7 +109941,7 @@ <2><3e4f6>: Abbrev Number: 8 (DW_TAG_inlined_subroutine) <3e4f7> DW_AT_abstract_origin: <0x2e531> <3e4fb> DW_AT_entry_pc : 0x2a - <3e4ff> DW_AT_GNU_entry_view: 9 + <3e4ff> DW_AT_GNU_entry_view: 2 <3e501> DW_AT_low_pc : 0x2a <3e505> DW_AT_high_pc : 0x0 <3e509> DW_AT_call_file : 3 > > > for gas/ChangeLog > > PR gas/24444 > * frags.c (frag_gtoffset_p): New. > * frags.h (frag_gtoffset_p): Declare it. > * expr.c (resolve_expression): Use it. > --- > gas/expr.c | 5 +++- > gas/frags.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > gas/frags.h | 2 ++ > 3 files changed, 80 insertions(+), 1 deletion(-) > > diff --git a/gas/expr.c b/gas/expr.c > index ee85bda1cc60..3efde88cc0a2 100644 > --- a/gas/expr.c > +++ b/gas/expr.c > @@ -2179,7 +2179,10 @@ resolve_expression (expressionS *expressionP) > || op == O_lt || op == O_le || op == O_ge || op == O_gt) > && seg_left == seg_right > && (finalize_syms > - || frag_offset_fixed_p (frag_left, frag_right, &frag_off)) > + || frag_offset_fixed_p (frag_left, frag_right, &frag_off) > + || (op == O_gt > + && frag_gtoffset_p (left, frag_left, > + right, frag_right, &frag_off))) > && (seg_left != reg_section || left == right) > && (seg_left != undefined_section || add_symbol == op_symbol))) > { > diff --git a/gas/frags.c b/gas/frags.c > index 90096ff696d7..e2ff79fbcc04 100644 > --- a/gas/frags.c > +++ b/gas/frags.c > @@ -460,3 +460,77 @@ frag_offset_fixed_p (const fragS *frag1, const fragS *frag2, offsetT *offset) > > return FALSE; > } > + > +/* Return TRUE if we can determine whether FRAG1 OFF1 appears after > + (strict >, not >=) FRAG2 OFF2, assuming it is not before. Set > + *OFFSET so that resolve_expression will resolve an O_gt operation > + between them to false (0) if they are guaranteed to be at the same > + location, or to true (-1) if they are guaranteed to be at different > + locations. Return FALSE conservatively, e.g. if neither result can > + be guaranteed (yet). > + > + They are known to be in the same segment, and not the same frag > + (this is a fallback for frag_offset_fixed_p, that always takes care > + of this case), and it is expected (from the uses this is designed > + to simplify, namely location view increments) that frag1 is > + reachable from frag2 following the fr_next links, rather than the > + other way round. */ For the sake of anyone reading this code, please swap the param names, with frag1/off1 being the first in the frag chain. > + > +bfd_boolean > +frag_gtoffset_p (valueT off1, const fragS *frag1, > + valueT off2, const fragS *frag2, offsetT *offset) > +{ > + if (frag1 == frag2 || off2 > (valueT)frag2->fr_fix) > + return FALSE; > + > + /* If any frag from frag2, inclusive, to frag1, exclusive, has a > + nonzero fixed length, the addresses won't be the same as long as > + we reach frag1. */ > + bfd_boolean maybe_same_addr = off1 == 0 && off2 == (valueT)frag2->fr_fix; Ins't it true that if the symbol offsets are not at exactly the above values, then you already know the result and there's no need to traverse the frags? This is assuming you can't .org backwards, which is the case. > + bfd_boolean found_varlength = FALSE; > + const fragS *frag = frag2; > + for (;;) > + { > + switch (frag->fr_type) > + { > + case rs_dummy: > + break; > + > + case rs_fill: > + case rs_align: > + case rs_align_code: > + case rs_align_test: > + case rs_space: > + case rs_space_nop: > + case rs_fill_nop: > + case rs_leb128: > + case rs_cfa: > + case rs_dwarf2dbg: > + case rs_machine_dependent: > + found_varlength = TRUE; > + break; > + > + default: > + /* case rs_broken_word: */ > + case rs_org: > + return FALSE; > + } > + > + frag = frag->fr_next; > + if (frag == frag1 || !frag) > + break; > + if (maybe_same_addr && frag->fr_fix > 0) > + maybe_same_addr = FALSE; > + if (frag->fr_fix < 0) > + return FALSE; > + } > + > + /* If both are TRUE, we don't know whether or not it will resolve to > + the same address. */ > + if (maybe_same_addr && found_varlength) > + return FALSE; > + > + if (frag) > + *offset = (off1 - off2 - !maybe_same_addr) * OCTETS_PER_BYTE; > + return !!frag; > +} > diff --git a/gas/frags.h b/gas/frags.h > index 6ac5d4426c79..9eb066229085 100644 > --- a/gas/frags.h > +++ b/gas/frags.h > @@ -154,6 +154,8 @@ char *frag_var (relax_stateT type, > char *opcode); > > bfd_boolean frag_offset_fixed_p (const fragS *, const fragS *, offsetT *); > +bfd_boolean frag_gtoffset_p (valueT, const fragS *, valueT, const fragS *, > + offsetT *); > > int get_frag_count (void); > void clear_frag_count (void); The following makes the changes I'm suggesting, and simplifies a few more things. The object file for the testcase is the same as with your patch. /* Return TRUE if we can determine whether FRAG2 OFF2 appears after (strict >, not >=) FRAG1 OFF1, assuming it is not before. Set *OFFSET so that resolve_expression will resolve an O_gt operation between them to false (0) if they are guaranteed to be at the same location, or to true (-1) if they are guaranteed to be at different locations. Return FALSE conservatively, e.g. if neither result can be guaranteed (yet). They are known to be in the same segment, and not the same frag (this is a fallback for frag_offset_fixed_p, that always takes care of this case), and it is expected (from the uses this is designed to simplify, namely location view increments) that frag2 is reachable from frag1 following the fr_next links, rather than the other way round. */ bfd_boolean frag_gtoffset_p (valueT off2, const fragS *frag2, valueT off1, const fragS *frag1, offsetT *offset) { /* Insanity check. */ if (frag2 == frag1 || off1 > (valueT) frag1->fr_fix) return FALSE; /* If the first symbol offset is at the end of the first frag and the second symbol offset at the beginning of the second frag then it is possible they are at the same address. Go looking for a non-zero fr_fix in any frag between these frags. If found then we can say the O_gt result will be true. If no such frag is found we assume that frag1 or any of the following frags might have a variable tail and thus the answer is unknown. This isn't strictly true; some frags don't have a variable tail, but it doesn't seem worth optimizing for those cases. We could omit the loop on off1 != frag1->fr_fix || off2 != 0 and exit the loop on finding a non-zero fr_fix, but loop until we hit frag2 to check our assumptions. */ const fragS *frag = frag1; offsetT delta = off2 - off1; for (;;) { delta += frag->fr_fix; frag = frag->fr_next; if (frag == frag2) { if (delta == 0) return FALSE; break; } /* Insanity check. */ if (frag == NULL) return FALSE; } *offset = (off2 - off1 - 1) * OCTETS_PER_BYTE; return TRUE; } -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-14 13:22 ` Alan Modra @ 2019-04-16 13:24 ` Alexandre Oliva 2019-04-17 1:32 ` Alan Modra 2019-04-24 23:18 ` Alan Modra 0 siblings, 2 replies; 10+ messages in thread From: Alexandre Oliva @ 2019-04-16 13:24 UTC (permalink / raw) To: Alan Modra; +Cc: binutils On Apr 14, 2019, Alan Modra <amodra@gmail.com> wrote: > Yes, I believe so. In fact, fr_fix is really unsigned. Also, it is > an error for rs_org frags to go backwards. Nice, that makes things simpler. >> Tested on x86_64-linux-gnu, native and cross to xtensa-elf. No new >> testcase; the issue was just performance, not output correctness. Ok to >> install? > Um, the testcase object file after this patch differs from the > original. First readelf -wi difference shown below. Which testcase was that? I didn't catch that one. >> + /* If any frag from frag2, inclusive, to frag1, exclusive, has a >> + nonzero fixed length, the addresses won't be the same as long as >> + we reach frag1. */ >> + bfd_boolean maybe_same_addr = off1 == 0 && off2 == (valueT)frag2->fr_fix; > Ins't it true that if the symbol offsets are not at exactly the above > values, then you already know the result and there's no need to > traverse the frags? This is assuming you can't .org backwards, which > is the case. Not entirely, no. Although the comments state we assume a certain ordering of frags, the assumption is only valid for O_gt operations arising from the location view machinery. It doesn't hurt if we resolve other O_gt operations if we can find the result, but those could have frags in the opposite order. Now, O_gt is not exactly the operation the location view machinery needs; O_ne (or, after negation, O_eq) would be more like it. IIRC the reason I went for O_gt was just that it was resolved early in a lot more cases than O_eq. Now, with this kind of fallback, we could actually introduce another operation kind that (i) resolves to zero if frags are not in the same (sub)section, and (ii) can safely make the optimization you suggested for frags in the same (sub)section, with the extra benefit that we know we won't ever search all the way to the end of the frag linked list without finding the other frag: we could assert-check that, and stop the search at the first nonzero-fr_fix. Would a new op with this semantics (O_incview?, vs reset view) be preferred? > The following makes the changes I'm suggesting, and simplifies a few > more things. Thanks, I'll wait for a response to the question above before taking further action on this, but I'll likely integrate your change (credited in the ChangeLog) in the patch regardless. -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain Engineer Free Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-16 13:24 ` Alexandre Oliva @ 2019-04-17 1:32 ` Alan Modra 2019-04-24 23:18 ` Alan Modra 1 sibling, 0 replies; 10+ messages in thread From: Alan Modra @ 2019-04-17 1:32 UTC (permalink / raw) To: Alexandre Oliva; +Cc: binutils On Tue, Apr 16, 2019 at 10:24:04AM -0300, Alexandre Oliva wrote: > On Apr 14, 2019, Alan Modra <amodra@gmail.com> wrote: > > > Yes, I believe so. In fact, fr_fix is really unsigned. Also, it is > > an error for rs_org frags to go backwards. > > Nice, that makes things simpler. > > >> Tested on x86_64-linux-gnu, native and cross to xtensa-elf. No new > >> testcase; the issue was just performance, not output correctness. Ok to > >> install? > > > Um, the testcase object file after this patch differs from the > > original. First readelf -wi difference shown below. > > Which testcase was that? I didn't catch that one. The https://sourceware.org/bugzilla/show_bug.cgi?id=24444 reproducer attachment. I see differences in DW_AT_GNU_entry_view, in all cases substituting 3 for 10, or 2 for 9 (lower number after your patch). I don't know enough to say whether this is a reasonable result. > > >> + /* If any frag from frag2, inclusive, to frag1, exclusive, has a > >> + nonzero fixed length, the addresses won't be the same as long as > >> + we reach frag1. */ > >> + bfd_boolean maybe_same_addr = off1 == 0 && off2 == (valueT)frag2->fr_fix; > > > Ins't it true that if the symbol offsets are not at exactly the above > > values, then you already know the result and there's no need to > > traverse the frags? This is assuming you can't .org backwards, which > > is the case. > > Not entirely, no. Although the comments state we assume a certain > ordering of frags, the assumption is only valid for O_gt operations > arising from the location view machinery. It doesn't hurt if we resolve > other O_gt operations if we can find the result, but those could have > frags in the opposite order. Thanks for reminding me. I wrote that comment before writing the simplification of your patch, and there you'll note I did chase down the second frag.. > Now, O_gt is not exactly the operation the location view machinery > needs; O_ne (or, after negation, O_eq) would be more like it. IIRC the > reason I went for O_gt was just that it was resolved early in a lot more > cases than O_eq. Now, with this kind of fallback, we could actually > introduce another operation kind that (i) resolves to zero if frags are > not in the same (sub)section, and (ii) can safely make the optimization > you suggested for frags in the same (sub)section, with the extra benefit > that we know we won't ever search all the way to the end of the frag > linked list without finding the other frag: we could assert-check that, > and stop the search at the first nonzero-fr_fix. > > Would a new op with this semantics (O_incview?, vs reset view) be > preferred? No, if O_gt works let's go with that. > > The following makes the changes I'm suggesting, and simplifies a few > > more things. > > Thanks, I'll wait for a response to the question above before taking > further action on this, but I'll likely integrate your change (credited > in the ChangeLog) in the patch regardless. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-16 13:24 ` Alexandre Oliva 2019-04-17 1:32 ` Alan Modra @ 2019-04-24 23:18 ` Alan Modra 2019-04-30 5:49 ` Alexandre Oliva 1 sibling, 1 reply; 10+ messages in thread From: Alan Modra @ 2019-04-24 23:18 UTC (permalink / raw) To: Alexandre Oliva; +Cc: binutils I pushed the following for you. ---- Speed up locview resolution with relaxable frags Targets such as xtensa incur a much higher overhead to resolve location view numbers than e.g. x86, because the expressions used to compute view numbers cannot be resolved soon enough. Each view number is computed by incrementing the previous view, if they are both at the same address, or by resetting it to zero otherwise. If PV is the previous view number, PL is its location, and NL is the location of the next view, its number is computed by evaluating NV = !(NL > PL) * (PV + 1). set_or_check_view uses resolve_expression to decide whether portions of this expression can be simplified to constants. The (NL > PL) subexpression is one that can often be resolved to a constant, breaking chains of view number computations at instructions of nonzero length, but not after alignment that might be unnecessary. Alas, when nearly every frag ends with a relaxable instruction, frag_offset_fixed_p will correctly fail to determine a known offset between two unresolved addresses in neighboring frags, so the unresolved symbolic operation will be constructed and used in the computation of most view numbers. This results in very deep expressions. As view numbers get referenced in location view lists, each operand in the list goes through symbol_clone_if_forward_ref, which recurses on every subexpression. If each view number were to be referenced, this would exhibit O(n^2) behavior, where n is the depth of the view number expressions, i.e., the length of view number sequences without an early resolution that cuts the expression short. This patch enables address compares used by view numbering to be resolved even when exact offsets are not known, using new logic to determine when the location either remained the same or changed for sure, even with the possibility of relaxation. This enables most view number expressions to be resolved with a small, reasonable depth. PR gas/24444 * frags.c (frag_gtoffset_p): New. * frags.h (frag_gtoffset_p): Declare it. * expr.c (resolve_expression): Use it. diff --git a/gas/ChangeLog b/gas/ChangeLog index eeabe0ccae..8558ecd7b4 100644 --- a/gas/ChangeLog +++ b/gas/ChangeLog @@ -1,3 +1,11 @@ +2019-04-25 Alexandre Oliva <aoliva@redhat.com> + Alan Modra <amodra@gmail.com> + + PR gas/24444 + * frags.c (frag_gtoffset_p): New. + * frags.h (frag_gtoffset_p): Declare it. + * expr.c (resolve_expression): Use it. + 2019-04-24 Alan Modra <amodra@gmail.com> PR 24444 diff --git a/gas/expr.c b/gas/expr.c index ee85bda1cc..3efde88cc0 100644 --- a/gas/expr.c +++ b/gas/expr.c @@ -2179,7 +2179,10 @@ resolve_expression (expressionS *expressionP) || op == O_lt || op == O_le || op == O_ge || op == O_gt) && seg_left == seg_right && (finalize_syms - || frag_offset_fixed_p (frag_left, frag_right, &frag_off)) + || frag_offset_fixed_p (frag_left, frag_right, &frag_off) + || (op == O_gt + && frag_gtoffset_p (left, frag_left, + right, frag_right, &frag_off))) && (seg_left != reg_section || left == right) && (seg_left != undefined_section || add_symbol == op_symbol))) { diff --git a/gas/frags.c b/gas/frags.c index a9fc003fbd..2f21b9db20 100644 --- a/gas/frags.c +++ b/gas/frags.c @@ -462,3 +462,58 @@ frag_offset_fixed_p (const fragS *frag1, const fragS *frag2, offsetT *offset) return FALSE; } + +/* Return TRUE if we can determine whether FRAG2 OFF2 appears after + (strict >, not >=) FRAG1 OFF1, assuming it is not before. Set + *OFFSET so that resolve_expression will resolve an O_gt operation + between them to false (0) if they are guaranteed to be at the same + location, or to true (-1) if they are guaranteed to be at different + locations. Return FALSE conservatively, e.g. if neither result can + be guaranteed (yet). + + They are known to be in the same segment, and not the same frag + (this is a fallback for frag_offset_fixed_p, that always takes care + of this case), and it is expected (from the uses this is designed + to simplify, namely location view increments) that frag2 is + reachable from frag1 following the fr_next links, rather than the + other way round. */ + +bfd_boolean +frag_gtoffset_p (valueT off2, const fragS *frag2, + valueT off1, const fragS *frag1, offsetT *offset) +{ + /* Insanity check. */ + if (frag2 == frag1 || off1 > frag1->fr_fix) + return FALSE; + + /* If the first symbol offset is at the end of the first frag and + the second symbol offset at the beginning of the second frag then + it is possible they are at the same address. Go looking for a + non-zero fr_fix in any frag between these frags. If found then + we can say the O_gt result will be true. If no such frag is + found we assume that frag1 or any of the following frags might + have a variable tail and thus the answer is unknown. This isn't + strictly true; some frags don't have a variable tail, but it + doesn't seem worth optimizing for those cases. */ + const fragS *frag = frag1; + offsetT delta = off2 - off1; + for (;;) + { + delta += frag->fr_fix; + frag = frag->fr_next; + if (frag == frag2) + { + if (delta == 0) + return FALSE; + break; + } + /* If we run off the end of the frag chain then we have a case + where frag2 is not after frag1, ie. an O_gt expression not + created for .loc view. */ + if (frag == NULL) + return FALSE; + } + + *offset = (off2 - off1 - delta) * OCTETS_PER_BYTE; + return TRUE; +} diff --git a/gas/frags.h b/gas/frags.h index 741ae2ba22..6d84aaab3f 100644 --- a/gas/frags.h +++ b/gas/frags.h @@ -154,6 +154,8 @@ char *frag_var (relax_stateT type, char *opcode); bfd_boolean frag_offset_fixed_p (const fragS *, const fragS *, offsetT *); +bfd_boolean frag_gtoffset_p (valueT, const fragS *, valueT, const fragS *, + offsetT *); int get_frag_count (void); void clear_frag_count (void); -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-24 23:18 ` Alan Modra @ 2019-04-30 5:49 ` Alexandre Oliva 2019-05-03 1:14 ` Alan Modra 2019-05-04 3:22 ` Alexandre Oliva 0 siblings, 2 replies; 10+ messages in thread From: Alexandre Oliva @ 2019-04-30 5:49 UTC (permalink / raw) To: Alan Modra; +Cc: binutils On Apr 24, 2019, Alan Modra <amodra@gmail.com> wrote: > I pushed the following for you. Thanks for taking care of this while I convalesced. I'm afraid we may still have a problem I hadn't realized before. Consider two text subsections, say 0 and 1. We switch from 0 to 1 and output a loc with a view. Later we switch back, and also output a loc with a view. Once addresses are resolved, one of these will have necessarily gone from a higher offset to a lower offset within the text section. In rare cases, the subsection boundary may actually have locviews in both subsections, at the same address. O_gt would give us an answer that does not match the expectation that we should reset view when PC changes. However, it may be worse than that. Loc view expressions are chained back to the previous loc view present in the asm input, even if it's in a different section. This means O_gt on loc addresses will give us the wrong result when we switch between subsections from higher to lower address, and it will also likely give us the wrong result at the boundary. Consider: .text 0 .loc 1 1 view -0 nop .L2: .loc 1 2 view .LV2 .text 1 .L3: .loc 1 3 view .LV3 nop .L4: .loc 1 4 view .LV4 .text 0 .L5: .loc 1 5 view .LV5 We'll compute: .LV3 = (!(.L3 > .L2)) * (.Lv2 + 1) .Lv5 = (!(.L5 > .L4)) * (.LV4 + 1) while we should really compute: .Lv3 = (.L3 == .L5) * (.LV5 + 1) .LV5 = (.L5 == .L2) * (.LV2 + 1) because the reordered text section, based on which we output the line number program, is: .text .loc 1 1 view -0 nop .L2: .loc 1 2 view .LV2 # 0 .L5: .loc 1 5 view .LV5 # 1 # --- transition from .text 0 to .text 1 .L3: .loc 1 3 view .LV3 # 2 nop .L4: .loc 1 4 view .LV4 # 0 AFAICT fixing this will require introducing artificial undefined symbols to be used as previous view label and previous view number for each subsection, and then, when ordering subsections and sections, defining the initial previous symbols of each subsection as equal to the final previous symbols of the previous subsection in the same section. Maybe with this arrangement we can keep on using O_gt, though reducing the depth of the loc view expressions feels appealing in the context of this PR. Any better ideas? -- Alexandre Oliva, freedom fighter https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain Engineer Free Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-30 5:49 ` Alexandre Oliva @ 2019-05-03 1:14 ` Alan Modra 2019-05-04 0:56 ` Alexandre Oliva 2019-05-04 3:22 ` Alexandre Oliva 1 sibling, 1 reply; 10+ messages in thread From: Alan Modra @ 2019-05-03 1:14 UTC (permalink / raw) To: Alexandre Oliva; +Cc: binutils On Tue, Apr 30, 2019 at 02:49:38AM -0300, Alexandre Oliva wrote: > AFAICT fixing this will require introducing artificial undefined symbols > to be used as previous view label and previous view number for each > subsection, and then, when ordering subsections and sections, defining > the initial previous symbols of each subsection as equal to the final > previous symbols of the previous subsection in the same section. Don't you already keep struct line_entry on per-subseg lists? Frags too are kept on per-subseg lists until chain_frchains_together. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-05-03 1:14 ` Alan Modra @ 2019-05-04 0:56 ` Alexandre Oliva 0 siblings, 0 replies; 10+ messages in thread From: Alexandre Oliva @ 2019-05-04 0:56 UTC (permalink / raw) To: Alan Modra; +Cc: binutils On May 2, 2019, Alan Modra <amodra@gmail.com> wrote: > On Tue, Apr 30, 2019 at 02:49:38AM -0300, Alexandre Oliva wrote: >> AFAICT fixing this will require introducing artificial undefined symbols >> to be used as previous view label and previous view number for each >> subsection, and then, when ordering subsections and sections, defining >> the initial previous symbols of each subsection as equal to the final >> previous symbols of the previous subsection in the same section. > Don't you already keep struct line_entry on per-subseg lists? Yeah, sorry. I had indeed started out worried about that, but I failed to remove some traces of that suspicion from my message even after I narrowed the problem down to the boundaries. This testcase demonstrates the problem: the generated line number program has nonzero views at subsection boundaries, but the view numbers computed and assigned by the view expressions do not match those implied by the line number program, because we assume a view reset when first entering a subsection. This would only be appropriate (and even then somewhat problematic) if we generated a separate line number program, or forced a view reset in the line number program at subsection boundaries. /* Test view numbering continuity at subsection borders. Copyright (C) 2017-2019 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>. */ .file "dwarf2-19.c" .text 0 .balign 4 .globl _start _start: .file 1 "dwarf2-19.c" .loc 1 1 view 0 .section .rodata .uleb128 .L1 .uleb128 .L3 .uleb128 .L4 .uleb128 .L2 .text 1 .loc 1 2 view .L1 /* same address as view 0 above -> view 1 */ .text 2 .loc 1 3 view .L2 /* same address as .L4 below -> view 2 */ .text 1 .dc.l 0 .loc 1 4 view .L3 /* bumped address from .L1's, view 0 */ .loc 1 5 view .L4 /* same address, view 1 */ #as: #readelf: -x.rodata -wL #name: DWARF2 19 # The am33 avr cr16 crx ft32 mn10 msp430 nds32 pru rl78 and xtensa targets do not evaluate the subtraction of symbols at assembly time. # The riscv targets do not support the subtraction of symbols. # The tile targets require 8-byte instructions, but the test only simulates 4-byte instructions. #notarget: am3*-* avr-* cr16-* crx-* ft32*-* mn10*-* msp430-* nds32*-* pru-* riscv*-* rl78-* tile*-* xtensa-* Hex dump of section '\.rodata': 0x00000000 01000102 *.* Contents of the \.debug_line section: CU: dwarf2-19\.c: File name *Line number *Starting address *View +Stmt dwarf2-19\.c *1 *0 +x dwarf2-19\.c *2 *0 *1 +x dwarf2-19\.c *4 *0x4 +x dwarf2-19\.c *5 *0x4 *1 +x dwarf2-19\.c *3 *0x4 *2 +x -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain Engineer Free Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-04-30 5:49 ` Alexandre Oliva 2019-05-03 1:14 ` Alan Modra @ 2019-05-04 3:22 ` Alexandre Oliva 2019-05-04 5:44 ` Alan Modra 1 sibling, 1 reply; 10+ messages in thread From: Alexandre Oliva @ 2019-05-04 3:22 UTC (permalink / raw) To: Alan Modra; +Cc: binutils On Apr 30, 2019, Alexandre Oliva <aoliva@redhat.com> wrote: > .L5: .loc 1 5 view .LV5 # 1 > # --- transition from .text 0 to .text 1 > .L3: .loc 1 3 view .LV3 # 2 > AFAICT fixing this will require introducing artificial undefined symbols > to be used as previous view label and previous view number for each > subsection, and then, when ordering subsections and sections, defining > the initial previous symbols of each subsection as equal to the final > previous symbols of the previous subsection in the same section. I ended up delaying the view assignment for heads of subsegments. Here's the patch. Tested on x86_64-linux-gnu native. Ok to install? [LVu] base subseg head view on prev subseg's tail Location views at borders between subsegments/subsections in the same segment/section are computed as if each new subsegment/subsection started with a forced view reset to zero, but the line number program does not introduce resets that are not explicitly requested, so if a subsegment ends at the same address another starts, the line number program will have a continuity of views at the border address, whereas the initial view number label in the latter subsegment will be miscomputed as zero. This patch delays the assignment of view expressions at subsegment heads to the time of chaining the frags of subsegments into a single segment, so that they are set based on the view at the end of the previous subsegment in the same segment. The line number program created for the test program had an unnecessary DW_LNS_advance_pc at the end. This patch also arranges for us not to emit it. for gas/ChangeLog * dwarf2dbg.c (set_or_check_view): Skip heads when assigning views of prior locs. (dwarf2_gen_line_info_1): Skip heads. (size_inc_line_addr, emit_inc_line_addr): Drop DW_LNS_advance_pc for zero addr delta. (dwarf2_finish): Assign views for heads of segments. * testsuite/gas/elf/dwarf2-19.d: New. * testsuite/gas/elf/dwarf2-19.s: New. * testsuite/gas/elf/elf.exp: Test it. --- gas/dwarf2dbg.c | 37 +++++++++++++++++++++++++++++---- gas/testsuite/gas/elf/dwarf2-19.d | 21 +++++++++++++++++++ gas/testsuite/gas/elf/dwarf2-19.s | 41 +++++++++++++++++++++++++++++++++++++ gas/testsuite/gas/elf/elf.exp | 1 + 4 files changed, 95 insertions(+), 5 deletions(-) create mode 100644 gas/testsuite/gas/elf/dwarf2-19.d create mode 100644 gas/testsuite/gas/elf/dwarf2-19.s diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c index 2d316ddcb9ca..b77751d8af3d 100644 --- a/gas/dwarf2dbg.c +++ b/gas/dwarf2dbg.c @@ -442,7 +442,16 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, gas_assert (r == p); /* Set or check views until we find a defined or absent view. */ do - set_or_check_view (r, r->next, NULL); + { + /* Do not define the head of a (sub?)segment view while + handling others. It would be defined too early, without + regard to the last view of other subsegments. + set_or_check_view will be called for every head segment + that needs it. */ + if (r == h) + break; + set_or_check_view (r, r->next, NULL); + } while (r->next && r->next->loc.view && !S_IS_DEFINED (r->next->loc.view) && (r = r->next)); @@ -454,6 +463,11 @@ set_or_check_view (struct line_entry *e, struct line_entry *p, simplify the view expressions, until we do so to P. */ do { + /* The head view of a subsegment may remain undefined while + handling other elements, before it is linked to the last + view of the previous subsegment. */ + if (r == h) + continue; gas_assert (S_IS_DEFINED (r->loc.view)); resolve_expression (symbol_get_value_expression (r->loc.view)); } @@ -480,9 +494,11 @@ dwarf2_gen_line_info_1 (symbolS *label, struct dwarf2_line_info *loc) lss = get_line_subseg (now_seg, now_subseg, TRUE); - if (loc->view) + /* Subseg heads are chained to previous subsegs in + dwarf2_finish. */ + if (loc->view && lss->head) set_or_check_view (e, - !lss->head ? NULL : (struct line_entry *)lss->ptail, + (struct line_entry *)lss->ptail, lss->head); *lss->ptail = e; @@ -1176,7 +1192,7 @@ size_inc_line_addr (int line_delta, addressT addr_delta) { if (addr_delta == MAX_SPECIAL_ADDR_DELTA) len = 1; - else + else if (addr_delta) len = 1 + sizeof_leb128 (addr_delta, 0); return len + 3; } @@ -1240,7 +1256,7 @@ emit_inc_line_addr (int line_delta, addressT addr_delta, char *p, int len) { if (addr_delta == MAX_SPECIAL_ADDR_DELTA) *p++ = DW_LNS_const_add_pc; - else + else if (addr_delta) { *p++ = DW_LNS_advance_pc; p += output_leb128 (p, addr_delta, 0); @@ -2218,8 +2234,19 @@ dwarf2_finish (void) struct line_subseg *lss = s->head; struct line_entry **ptail = lss->ptail; + /* Reset the initial view of the first subsection of the + section. */ + if (lss->head && lss->head->loc.view) + set_or_check_view (lss->head, NULL, NULL); + while ((lss = lss->next) != NULL) { + /* Link the first view of subsequent subsections to the + previous view. */ + if (lss->head && lss->head->loc.view) + set_or_check_view (lss->head, + !s->head ? NULL : (struct line_entry *)ptail, + s->head ? s->head->head : NULL); *ptail = lss->head; ptail = lss->ptail; } diff --git a/gas/testsuite/gas/elf/dwarf2-19.d b/gas/testsuite/gas/elf/dwarf2-19.d new file mode 100644 index 000000000000..ebb3bf984b1c --- /dev/null +++ b/gas/testsuite/gas/elf/dwarf2-19.d @@ -0,0 +1,21 @@ +#as: +#readelf: -x.rodata -wL +#name: DWARF2 19 +# The am33 avr cr16 crx ft32 mn10 msp430 nds32 pru rl78 and xtensa targets do not evaluate the subtraction of symbols at assembly time. +# The riscv targets do not support the subtraction of symbols. +# The tile targets require 8-byte instructions, but the test only simulates 4-byte instructions. +#notarget: am3*-* avr-* cr16-* crx-* ft32*-* mn10*-* msp430-* nds32*-* pru-* riscv*-* rl78-* tile*-* xtensa-* + +Hex dump of section '\.rodata': + 0x00000000 01000102 *.* + +Contents of the \.debug_line section: + +CU: dwarf2-19\.c: +File name *Line number *Starting address *View +Stmt +dwarf2-19\.c *1 *0 +x +dwarf2-19\.c *2 *0 *1 +x +dwarf2-19\.c *4 *0x4 +x +dwarf2-19\.c *5 *0x4 *1 +x +dwarf2-19\.c *3 *0x4 *2 +x +dwarf2-19\.c *3 *0x4 *3 +x diff --git a/gas/testsuite/gas/elf/dwarf2-19.s b/gas/testsuite/gas/elf/dwarf2-19.s new file mode 100644 index 000000000000..dd87be8fc9c1 --- /dev/null +++ b/gas/testsuite/gas/elf/dwarf2-19.s @@ -0,0 +1,41 @@ +/* Test view numbering continuity at subsection borders. + + Copyright (C) 2017-2019 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + + .file "dwarf2-19.c" + .text 0 + .balign 4 + .globl _start +_start: + .file 1 "dwarf2-19.c" + .loc 1 1 view 0 + + .section .rodata + .uleb128 .L1 + .uleb128 .L3 + .uleb128 .L4 + .uleb128 .L2 + + .text 1 + .loc 1 2 view .L1 /* same address as view 0 above -> view 1 */ + + .text 2 + .loc 1 3 view .L2 /* same address as .L4 below -> view 2 */ + + .text 1 + .dc.l 0 + .loc 1 4 view .L3 /* bumped address from .L1's, view 0 */ + .loc 1 5 view .L4 /* same address, view 1 */ diff --git a/gas/testsuite/gas/elf/elf.exp b/gas/testsuite/gas/elf/elf.exp index d616d5de7662..01d8b9dcb5f2 100644 --- a/gas/testsuite/gas/elf/elf.exp +++ b/gas/testsuite/gas/elf/elf.exp @@ -251,6 +251,7 @@ if { [is_elf_format] } then { run_dump_test "dwarf2-16" run_dump_test "dwarf2-17" run_dump_test "dwarf2-18" + run_dump_test "dwarf2-19" run_dump_test "bss" run_dump_test "bad-bss" run_dump_test "bad-section-flag" -- Alexandre Oliva, freedom fighter he/him https://FSFLA.org/blogs/lxo Be the change, be Free! FSF Latin America board member GNU Toolchain Engineer Free Software Evangelist Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PR24444] speed up locview resolution wiht relaxable frags 2019-05-04 3:22 ` Alexandre Oliva @ 2019-05-04 5:44 ` Alan Modra 0 siblings, 0 replies; 10+ messages in thread From: Alan Modra @ 2019-05-04 5:44 UTC (permalink / raw) To: Alexandre Oliva; +Cc: binutils On Sat, May 04, 2019 at 12:22:22AM -0300, Alexandre Oliva wrote: > for gas/ChangeLog > > * dwarf2dbg.c (set_or_check_view): Skip heads when assigning > views of prior locs. > (dwarf2_gen_line_info_1): Skip heads. > (size_inc_line_addr, emit_inc_line_addr): Drop > DW_LNS_advance_pc for zero addr delta. > (dwarf2_finish): Assign views for heads of segments. > * testsuite/gas/elf/dwarf2-19.d: New. > * testsuite/gas/elf/dwarf2-19.s: New. > * testsuite/gas/elf/elf.exp: Test it. Looks good. -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-05-04 5:44 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-04-13 8:57 [PR24444] speed up locview resolution wiht relaxable frags Alexandre Oliva 2019-04-14 13:22 ` Alan Modra 2019-04-16 13:24 ` Alexandre Oliva 2019-04-17 1:32 ` Alan Modra 2019-04-24 23:18 ` Alan Modra 2019-04-30 5:49 ` Alexandre Oliva 2019-05-03 1:14 ` Alan Modra 2019-05-04 0:56 ` Alexandre Oliva 2019-05-04 3:22 ` Alexandre Oliva 2019-05-04 5:44 ` Alan Modra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).