* [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization @ 2022-12-08 10:59 Jose E. Marchesi 2022-12-08 12:20 ` Jakub Jelinek ` (2 more replies) 0 siblings, 3 replies; 15+ messages in thread From: Jose E. Marchesi @ 2022-12-08 10:59 UTC (permalink / raw) To: gcc-patches The expand_expr_divmod function in expr.cc attempts to optimize cases where both arguments of a division/modulus are known to be positive when interpreted as signed. In these cases, both signed division and unsigned division will raise the same value, and therefore the cheapest option can be used. In order to determine what is the cheaper option in the current target, expand_expr_divmod actually expands both a signed divmod and an unsigned divmod using local "sequences": start_sequence (); ... expand_divmod (... signed ...); ... end_sequence (); start_sequence (); ... expand_divmod (... unsigned ...); ... end_sequence (); And then compares the cost of each generated sequence, choosing the best one. Finally, it emits the selected expanded sequence and returns the rtx with the result. This approach has a caveat. Some targets do not provide instructions for division/modulus instructions. In the case of BPF, it provides unsigned division/modulus, but not signed division/modulus. In these cases, the expand_divmod tries can contain calls to funcalls. For example, in BPF: start_sequence (); ... expand_divmod (... signed ...); -> This generates funcall to __divdi3 ... end_sequence (); start_sequence (); ... expand_divmod (... unsigned ...); -> This generates direct `div' insn. ... end_sequence (); The problem is that when a funcall is expanded, an accompanying global symbol definition is written in the output stream: .global __divdi3 And this symbol definition remains in the compiled assembly file, even if the sequence using the direct `div' instruction above is used. This is particularly bad in BPF, because the kernel bpf loader chokes on the spurious symbol __divdi3 and makes the resulting BPF object unloadable (note that BPF objects are not linked before processed by the kernel.) In order to fix this, this patch modifies expand_expr_divmod in the following way: - When trying each sequence (signed, unsigned) the expand_divmod calls are told to _not_ use libcalls if everything else fails. This is done by passing OPTAB_WIDEN as the `methods' argument. (Before it was using the default value OPTAB_LIB_WIDEN.) - If any of the tried expanded sequences contain a funcall, then the optimization is not attempted. A couple of BPF tests are also added to make sure this doesn't break at any point in the future. Tested in bpf-unknown-none and x86_64-linux-gnu. Regtested in x86_64-linux-gnu. No regressions. gcc/ChangeLog * expr.cc (expand_expr_divmod): Avoid side-effects of trying sequences involving funcalls in optimization. gcc/testsuite/ChangeLog: * gcc.target/bpf/divmod-funcall-1.c: New test. * gcc.target/bpf/divmod-funcall-2.c: Likewise. --- gcc/expr.cc | 44 +++++++++++-------- .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ 3 files changed, 41 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c diff --git a/gcc/expr.cc b/gcc/expr.cc index d9407432ea5..4d4be5d7bda 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, do_pending_stack_adjust (); start_sequence (); rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, - op0, op1, target, 1); + op0, op1, target, 1, OPTAB_WIDEN); rtx_insn *uns_insns = get_insns (); end_sequence (); start_sequence (); rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, - op0, op1, target, 0); + op0, op1, target, 0, OPTAB_WIDEN); rtx_insn *sgn_insns = get_insns (); end_sequence (); - unsigned uns_cost = seq_cost (uns_insns, speed_p); - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); - /* If costs are the same then use as tie breaker the other other - factor. */ - if (uns_cost == sgn_cost) - { - uns_cost = seq_cost (uns_insns, !speed_p); - sgn_cost = seq_cost (sgn_insns, !speed_p); - } - - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) - { - emit_insn (uns_insns); - return uns_ret; - } - emit_insn (sgn_insns); - return sgn_ret; + /* Do not try to optimize if any of the sequences tried above + resulted in a funcall. */ + if (uns_ret && sgn_ret) + { + unsigned uns_cost = seq_cost (uns_insns, speed_p); + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); + + /* If costs are the same then use as tie breaker the other + other factor. */ + if (uns_cost == sgn_cost) + { + uns_cost = seq_cost (uns_insns, !speed_p); + sgn_cost = seq_cost (sgn_insns, !speed_p); + } + + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) + { + emit_insn (uns_insns); + return uns_ret; + } + emit_insn (sgn_insns); + return sgn_ret; + } } return expand_divmod (mod_p, code, mode, treeop0, treeop1, op0, op1, target, unsignedp); diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c new file mode 100644 index 00000000000..dffb1506f06 --- /dev/null +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-final { scan-assembler-not "__divdi3" } } */ + +int +foo (unsigned int len) +{ + return ((unsigned long)len) * 234 / 5; +} diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c new file mode 100644 index 00000000000..41e8e40c35c --- /dev/null +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-final { scan-assembler-not "__moddi3" } } */ + +int +foo (unsigned int len) +{ + return ((unsigned long)len) * 234 % 5; +} -- 2.30.2 ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 10:59 [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization Jose E. Marchesi @ 2022-12-08 12:20 ` Jakub Jelinek 2022-12-08 13:02 ` Jose E. Marchesi 2022-12-08 13:42 ` Richard Biener 2023-01-30 18:45 ` Andrew Pinski 2 siblings, 1 reply; 15+ messages in thread From: Jakub Jelinek @ 2022-12-08 12:20 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: gcc-patches On Thu, Dec 08, 2022 at 11:59:44AM +0100, Jose E. Marchesi via Gcc-patches wrote: > gcc/ChangeLog > > * expr.cc (expand_expr_divmod): Avoid side-effects of trying > sequences involving funcalls in optimization. That looks wrong. The globals for mentioned calls just shouldn't be emitted during expansion, especially if it is bigger annoyance than just having some extra symbols in the symbol table. expand_expr_divmod is definitely not the only place where something is expanded and later not used, lots of other places in the expander do that, and more importantly, there are over 80 optimization passes after expansion, many of them can remove code determined to be dead, and while lots of dead code is removed in GIMPLE optimizations already, definitely not all. So, rather than add hacks for this in a single spot, much better is to emit the globals only for stuff that is actually needed (so during final or immediately before it). Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 12:20 ` Jakub Jelinek @ 2022-12-08 13:02 ` Jose E. Marchesi 2022-12-08 13:19 ` Jakub Jelinek 0 siblings, 1 reply; 15+ messages in thread From: Jose E. Marchesi @ 2022-12-08 13:02 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches Hi Jakub. > On Thu, Dec 08, 2022 at 11:59:44AM +0100, Jose E. Marchesi via Gcc-patches wrote: >> gcc/ChangeLog >> >> * expr.cc (expand_expr_divmod): Avoid side-effects of trying >> sequences involving funcalls in optimization. > > That looks wrong. > The globals for mentioned calls just shouldn't be emitted during expansion, > especially if it is bigger annoyance than just having some extra symbols > in the symbol table. > expand_expr_divmod is definitely not the only place where something is > expanded and later not used, lots of other places in the expander do that, > and more importantly, there are over 80 optimization passes after expansion, > many of them can remove code determined to be dead, and while lots of dead > code is removed in GIMPLE optimizations already, definitely not all. > So, rather than add hacks for this in a single spot, much better is to emit > the globals only for stuff that is actually needed (so during final or > immediately before it). Yeah I see the point. The culprit of the leadked .global seems to be a call to assemble_external_libcall in emit_library_call_value_1: expand_expr_divmod expand_divmod -> This will result in libcall sign_expand_divmod emit_library_call_value emit_library_call_value_1 ... /* If this machine requires an external definition for library functions, write one out. */ assemble_external_libcall (fun); ... The documented purpose of assemble_external_libcall is, as stated in output.h, to "Assemble a string constant". So, it seems to me that emit_library_call_value should not assemble anything, since it is used by expand functions whose expansions may be eventually discarded. However, simply removing that call to assemble_external_libcall makes .global declarations to not be emitted even when the funcall is actually emitted in final: For: int foo(unsigned int len) { return ((long)len) * 234 / 5; } we get: .file "foo.c" .text <------------- NO .global __divdi3 .align 3 .global foo .type foo, @function foo: mov32 %r1,%r1 mov %r2,5 mul %r1,234 call __divdi3 exit .size foo, .-foo .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" Note that BPF lacks signed division instructions. So, I guess the right fix would be to call assemble_external_libcall during final? The `.global FOO' directive would be generated immediately before the call sequence, but I guess that would be ok. WDYT? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 13:02 ` Jose E. Marchesi @ 2022-12-08 13:19 ` Jakub Jelinek 2022-12-08 22:40 ` Jose E. Marchesi 0 siblings, 1 reply; 15+ messages in thread From: Jakub Jelinek @ 2022-12-08 13:19 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: gcc-patches On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: > So, I guess the right fix would be to call assemble_external_libcall > during final? The `.global FOO' directive would be generated > immediately before the call sequence, but I guess that would be ok. During final only if all the targets can deal with the effects of assemble_external_libcall being done in the middle of emitting assembly for the function. Otherwise, it could be e.g. done in the first loop of shorten_branches. Note, in calls.cc it is done only for emit_library_call_value_1 and not for emit_call_1, so if we do it late, we need to be able to find out what call is to a libcall and what is to a normal call. If there is no way to differentiate it right now, perhaps we need some flag somewhere, say on a SYMBOL_REF. And then assemble_external_libcall either only if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or perhaps anywhere in the function and its constant pool. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 13:19 ` Jakub Jelinek @ 2022-12-08 22:40 ` Jose E. Marchesi 2023-01-04 8:58 ` Jose E. Marchesi 0 siblings, 1 reply; 15+ messages in thread From: Jose E. Marchesi @ 2022-12-08 22:40 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, david.faust Hi Jakub. > On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: >> So, I guess the right fix would be to call assemble_external_libcall >> during final? The `.global FOO' directive would be generated >> immediately before the call sequence, but I guess that would be ok. > > During final only if all the targets can deal with the effects of > assemble_external_libcall being done in the middle of emitting assembly > for the function. > > Otherwise, it could be e.g. done in the first loop of shorten_branches. > > Note, in calls.cc it is done only for emit_library_call_value_1 > and not for emit_call_1, so if we do it late, we need to be able to find > out what call is to a libcall and what is to a normal call. If there is > no way to differentiate it right now, perhaps we need some flag somewhere, > say on a SYMBOL_REF. And then assemble_external_libcall either only > if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or > perhaps anywhere in the function and its constant pool. Allright, the quick-and-dirty patch below seems to DTRT with simple examples. First, when libcalls are generated. Note only one .global is generated for all calls, and actually it is around the same position than before: $ cat foo.c int foo(unsigned int len, int flag) { if (flag) return (((long)len) * 234 / 5); return (((long)len) * 2 / 5); } $ cc1 -O2 foo.c $ cat foo.c .file "foo.c" .text .global __divdi3 .align 3 .global foo .type foo, @function foo: mov32 %r1,%r1 lsh %r2,32 jne %r2,0,.L5 mov %r2,5 lsh %r1,1 call __divdi3 lsh %r0,32 arsh %r0,32 exit .L5: mov %r2,5 mul %r1,234 call __divdi3 lsh %r0,32 arsh %r0,32 exit .size foo, .-foo .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" Second, when libcalls are tried by expand_moddiv in a sequence, but then discarded and not linked in the main sequence: $ cat foo.c int foo(unsigned int len, int flag) { if (flag) return (((long)len) * 234 / 5); return (((long)len) * 2 / 5); } $ cc1 -O2 foo.c $ cat foo.c .file "foo.c" .text .align 3 .global foo .type foo, @function foo: mov32 %r0,%r1 lsh %r2,32 jne %r2,0,.L5 add %r0,%r0 div %r0,5 lsh %r0,32 arsh %r0,32 exit .L5: mul %r0,234 div %r0,5 lsh %r0,32 arsh %r0,32 exit .size foo, .-foo .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" Note the .global now is not generated, as desired. As you can see below, I am adding a new RTX flag `is_libcall', with written form "/l". Before I get into serious testing etc, can you please confirm whether this is the right approach or not? In particular, I am a little bit concerned about the expectation I am using that the target of the `call' instruction emitted by emit_call_1 is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the first argument (`fun' in emit_library_call_value_1). Thanks. diff --git a/gcc/calls.cc b/gcc/calls.cc index 6dd6f73e978..6c4a3725272 100644 --- a/gcc/calls.cc +++ b/gcc/calls.cc @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, || argvec[i].partial != 0) update_stack_alignment_for_call (&argvec[i].locate); - /* If this machine requires an external definition for library - functions, write one out. */ - assemble_external_libcall (fun); - original_args_size = args_size; args_size.constant = (aligned_upper_bound (args_size.constant + stack_pointer_delta, @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, valreg, old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); + /* Mark the emitted call as a libcall with the new flag. */ + RTL_LIBCALL_P (last_call_insn ()) = 1; + if (flag_ipa_ra) { rtx datum = orgfun; diff --git a/gcc/final.cc b/gcc/final.cc index eea572238f6..df57de5afd0 100644 --- a/gcc/final.cc +++ b/gcc/final.cc @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) reorg.cc, since the branch splitting exposes new instructions with delay slots. */ +static rtx call_from_call_insn (rtx_call_insn *insn); + void shorten_branches (rtx_insn *first) { @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) { INSN_SHUID (insn) = i++; + + /* If this is a `call' instruction that implements a libcall, + and this machine requires an external definition for library + functions, write one out. */ + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) + { + rtx nested_call = call_from_call_insn (safe_as_a <rtx_call_insn*> (insn)); + rtx mem = XEXP (nested_call, 0); + gcc_assert (GET_CODE (mem) == MEM); + rtx fun = XEXP (mem, 0); + gcc_assert (GET_CODE (fun) == SYMBOL_REF); + assemble_external_libcall (fun); + + /* Clear the LIBCALL flag to make sure we don't assemble the + external definition more than once. */ + RTL_LIBCALL_P (insn) = 0; + } + if (INSN_P (insn)) continue; diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc index e115f987173..26a06511619 100644 --- a/gcc/print-rtl.cc +++ b/gcc/print-rtl.cc @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) if (RTX_FLAG (in_rtx, return_val)) fputs ("/i", m_outfile); + if (RTX_FLAG (in_rtx, is_libcall)) + fputs ("/l", m_outfile); + /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ if ((GET_CODE (in_rtx) == EXPR_LIST || GET_CODE (in_rtx) == INSN_LIST diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc index 62c7895af60..eb0ae150a5b 100644 --- a/gcc/read-rtl.cc +++ b/gcc/read-rtl.cc @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) case 'i': RTX_FLAG (return_rtx, return_val) = 1; break; + case 'l': + RTX_FLAG (return_rtx, is_libcall) = 1; + break; default: fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); } diff --git a/gcc/rtl.h b/gcc/rtl.h index 7a8c4709257..92c802d3876 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), Dumped as "/i" in RTL dumps. */ unsigned return_val : 1; + /* 1 in a CALL_INSN if it is a libcall. + Dumped as "/l" in RTL dumps. */ + unsigned is_libcall : 1; + union { /* The final union field is aligned to 64 bits on LP64 hosts, giving a 32-bit gap after the fields above. We optimize the @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) #define RTL_PURE_CALL_P(RTX) \ (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) +/* 1 if RTX is a libcall. */ +#define RTL_LIBCALL_P(RTX) \ + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) + /* 1 if RTX is a call to a const or pure function. */ #define RTL_CONST_OR_PURE_CALL_P(RTX) \ (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX)) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 22:40 ` Jose E. Marchesi @ 2023-01-04 8:58 ` Jose E. Marchesi 2023-01-09 8:05 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Jose E. Marchesi @ 2023-01-04 8:58 UTC (permalink / raw) To: Jakub Jelinek; +Cc: gcc-patches, david.faust ping. Would this be a good approach for fixing the issue? > Hi Jakub. > >> On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: >>> So, I guess the right fix would be to call assemble_external_libcall >>> during final? The `.global FOO' directive would be generated >>> immediately before the call sequence, but I guess that would be ok. >> >> During final only if all the targets can deal with the effects of >> assemble_external_libcall being done in the middle of emitting assembly >> for the function. >> >> Otherwise, it could be e.g. done in the first loop of shorten_branches. >> >> Note, in calls.cc it is done only for emit_library_call_value_1 >> and not for emit_call_1, so if we do it late, we need to be able to find >> out what call is to a libcall and what is to a normal call. If there is >> no way to differentiate it right now, perhaps we need some flag somewhere, >> say on a SYMBOL_REF. And then assemble_external_libcall either only >> if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or >> perhaps anywhere in the function and its constant pool. > > Allright, the quick-and-dirty patch below seems to DTRT with simple > examples. > > First, when libcalls are generated. Note only one .global is generated > for all calls, and actually it is around the same position than before: > > $ cat foo.c > int foo(unsigned int len, int flag) > { > if (flag) > return (((long)len) * 234 / 5); > return (((long)len) * 2 / 5); > } > $ cc1 -O2 foo.c > $ cat foo.c > .file "foo.c" > .text > .global __divdi3 > .align 3 > .global foo > .type foo, @function > foo: > mov32 %r1,%r1 > lsh %r2,32 > jne %r2,0,.L5 > mov %r2,5 > lsh %r1,1 > call __divdi3 > lsh %r0,32 > arsh %r0,32 > exit > .L5: > mov %r2,5 > mul %r1,234 > call __divdi3 > lsh %r0,32 > arsh %r0,32 > exit > .size foo, .-foo > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > Second, when libcalls are tried by expand_moddiv in a sequence, but then > discarded and not linked in the main sequence: > > $ cat foo.c > int foo(unsigned int len, int flag) > { > if (flag) > return (((long)len) * 234 / 5); > return (((long)len) * 2 / 5); > } > $ cc1 -O2 foo.c > $ cat foo.c > .file "foo.c" > .text > .align 3 > .global foo > .type foo, @function > foo: > mov32 %r0,%r1 > lsh %r2,32 > jne %r2,0,.L5 > add %r0,%r0 > div %r0,5 > lsh %r0,32 > arsh %r0,32 > exit > .L5: > mul %r0,234 > div %r0,5 > lsh %r0,32 > arsh %r0,32 > exit > .size foo, .-foo > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > Note the .global now is not generated, as desired. > > As you can see below, I am adding a new RTX flag `is_libcall', with > written form "/l". > > Before I get into serious testing etc, can you please confirm whether > this is the right approach or not? > > In particular, I am a little bit concerned about the expectation I am > using that the target of the `call' instruction emitted by emit_call_1 > is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the > first argument (`fun' in emit_library_call_value_1). > > Thanks. > > diff --git a/gcc/calls.cc b/gcc/calls.cc > index 6dd6f73e978..6c4a3725272 100644 > --- a/gcc/calls.cc > +++ b/gcc/calls.cc > @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > || argvec[i].partial != 0) > update_stack_alignment_for_call (&argvec[i].locate); > > - /* If this machine requires an external definition for library > - functions, write one out. */ > - assemble_external_libcall (fun); > - > original_args_size = args_size; > args_size.constant = (aligned_upper_bound (args_size.constant > + stack_pointer_delta, > @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > valreg, > old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); > > + /* Mark the emitted call as a libcall with the new flag. */ > + RTL_LIBCALL_P (last_call_insn ()) = 1; > + > if (flag_ipa_ra) > { > rtx datum = orgfun; > diff --git a/gcc/final.cc b/gcc/final.cc > index eea572238f6..df57de5afd0 100644 > --- a/gcc/final.cc > +++ b/gcc/final.cc > @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) > reorg.cc, since the branch splitting exposes new instructions with delay > slots. */ > > +static rtx call_from_call_insn (rtx_call_insn *insn); > + > void > shorten_branches (rtx_insn *first) > { > @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) > for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) > { > INSN_SHUID (insn) = i++; > + > + /* If this is a `call' instruction that implements a libcall, > + and this machine requires an external definition for library > + functions, write one out. */ > + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) > + { > + rtx nested_call = call_from_call_insn (safe_as_a <rtx_call_insn*> (insn)); > + rtx mem = XEXP (nested_call, 0); > + gcc_assert (GET_CODE (mem) == MEM); > + rtx fun = XEXP (mem, 0); > + gcc_assert (GET_CODE (fun) == SYMBOL_REF); > + assemble_external_libcall (fun); > + > + /* Clear the LIBCALL flag to make sure we don't assemble the > + external definition more than once. */ > + RTL_LIBCALL_P (insn) = 0; > + } > + > if (INSN_P (insn)) > continue; > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc > index e115f987173..26a06511619 100644 > --- a/gcc/print-rtl.cc > +++ b/gcc/print-rtl.cc > @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) > if (RTX_FLAG (in_rtx, return_val)) > fputs ("/i", m_outfile); > > + if (RTX_FLAG (in_rtx, is_libcall)) > + fputs ("/l", m_outfile); > + > /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ > if ((GET_CODE (in_rtx) == EXPR_LIST > || GET_CODE (in_rtx) == INSN_LIST > diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc > index 62c7895af60..eb0ae150a5b 100644 > --- a/gcc/read-rtl.cc > +++ b/gcc/read-rtl.cc > @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) > case 'i': > RTX_FLAG (return_rtx, return_val) = 1; > break; > + case 'l': > + RTX_FLAG (return_rtx, is_libcall) = 1; > + break; > default: > fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); > } > diff --git a/gcc/rtl.h b/gcc/rtl.h > index 7a8c4709257..92c802d3876 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), > Dumped as "/i" in RTL dumps. */ > unsigned return_val : 1; > > + /* 1 in a CALL_INSN if it is a libcall. > + Dumped as "/l" in RTL dumps. */ > + unsigned is_libcall : 1; > + > union { > /* The final union field is aligned to 64 bits on LP64 hosts, > giving a 32-bit gap after the fields above. We optimize the > @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) > #define RTL_PURE_CALL_P(RTX) \ > (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) > > +/* 1 if RTX is a libcall. */ > +#define RTL_LIBCALL_P(RTX) \ > + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) > + > /* 1 if RTX is a call to a const or pure function. */ > #define RTL_CONST_OR_PURE_CALL_P(RTX) \ > (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX)) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-04 8:58 ` Jose E. Marchesi @ 2023-01-09 8:05 ` Richard Biener 2023-01-09 9:57 ` Jakub Jelinek 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2023-01-09 8:05 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: Jakub Jelinek, gcc-patches, david.faust On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > > ping. > Would this be a good approach for fixing the issue? adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more bits here. I really wonder how other targets avoid the issue you are pointing out? Do their assemblers prune unused (extern) .global? > > Hi Jakub. > > > >> On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: > >>> So, I guess the right fix would be to call assemble_external_libcall > >>> during final? The `.global FOO' directive would be generated > >>> immediately before the call sequence, but I guess that would be ok. > >> > >> During final only if all the targets can deal with the effects of > >> assemble_external_libcall being done in the middle of emitting assembly > >> for the function. > >> > >> Otherwise, it could be e.g. done in the first loop of shorten_branches. > >> > >> Note, in calls.cc it is done only for emit_library_call_value_1 > >> and not for emit_call_1, so if we do it late, we need to be able to find > >> out what call is to a libcall and what is to a normal call. If there is > >> no way to differentiate it right now, perhaps we need some flag somewhere, > >> say on a SYMBOL_REF. And then assemble_external_libcall either only > >> if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or > >> perhaps anywhere in the function and its constant pool. > > > > Allright, the quick-and-dirty patch below seems to DTRT with simple > > examples. > > > > First, when libcalls are generated. Note only one .global is generated > > for all calls, and actually it is around the same position than before: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .global __divdi3 > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r1,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > mov %r2,5 > > lsh %r1,1 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mov %r2,5 > > mul %r1,234 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Second, when libcalls are tried by expand_moddiv in a sequence, but then > > discarded and not linked in the main sequence: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r0,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > add %r0,%r0 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mul %r0,234 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Note the .global now is not generated, as desired. > > > > As you can see below, I am adding a new RTX flag `is_libcall', with > > written form "/l". > > > > Before I get into serious testing etc, can you please confirm whether > > this is the right approach or not? > > > > In particular, I am a little bit concerned about the expectation I am > > using that the target of the `call' instruction emitted by emit_call_1 > > is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the > > first argument (`fun' in emit_library_call_value_1). > > > > Thanks. > > > > diff --git a/gcc/calls.cc b/gcc/calls.cc > > index 6dd6f73e978..6c4a3725272 100644 > > --- a/gcc/calls.cc > > +++ b/gcc/calls.cc > > @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > > || argvec[i].partial != 0) > > update_stack_alignment_for_call (&argvec[i].locate); > > > > - /* If this machine requires an external definition for library > > - functions, write one out. */ > > - assemble_external_libcall (fun); > > - > > original_args_size = args_size; > > args_size.constant = (aligned_upper_bound (args_size.constant > > + stack_pointer_delta, > > @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > > valreg, > > old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); > > > > + /* Mark the emitted call as a libcall with the new flag. */ > > + RTL_LIBCALL_P (last_call_insn ()) = 1; > > + > > if (flag_ipa_ra) > > { > > rtx datum = orgfun; > > diff --git a/gcc/final.cc b/gcc/final.cc > > index eea572238f6..df57de5afd0 100644 > > --- a/gcc/final.cc > > +++ b/gcc/final.cc > > @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) > > reorg.cc, since the branch splitting exposes new instructions with delay > > slots. */ > > > > +static rtx call_from_call_insn (rtx_call_insn *insn); > > + > > void > > shorten_branches (rtx_insn *first) > > { > > @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) > > for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) > > { > > INSN_SHUID (insn) = i++; > > + > > + /* If this is a `call' instruction that implements a libcall, > > + and this machine requires an external definition for library > > + functions, write one out. */ > > + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) > > + { > > + rtx nested_call = call_from_call_insn (safe_as_a <rtx_call_insn*> (insn)); > > + rtx mem = XEXP (nested_call, 0); > > + gcc_assert (GET_CODE (mem) == MEM); > > + rtx fun = XEXP (mem, 0); > > + gcc_assert (GET_CODE (fun) == SYMBOL_REF); > > + assemble_external_libcall (fun); > > + > > + /* Clear the LIBCALL flag to make sure we don't assemble the > > + external definition more than once. */ > > + RTL_LIBCALL_P (insn) = 0; > > + } > > + > > if (INSN_P (insn)) > > continue; > > > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc > > index e115f987173..26a06511619 100644 > > --- a/gcc/print-rtl.cc > > +++ b/gcc/print-rtl.cc > > @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) > > if (RTX_FLAG (in_rtx, return_val)) > > fputs ("/i", m_outfile); > > > > + if (RTX_FLAG (in_rtx, is_libcall)) > > + fputs ("/l", m_outfile); > > + > > /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ > > if ((GET_CODE (in_rtx) == EXPR_LIST > > || GET_CODE (in_rtx) == INSN_LIST > > diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc > > index 62c7895af60..eb0ae150a5b 100644 > > --- a/gcc/read-rtl.cc > > +++ b/gcc/read-rtl.cc > > @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) > > case 'i': > > RTX_FLAG (return_rtx, return_val) = 1; > > break; > > + case 'l': > > + RTX_FLAG (return_rtx, is_libcall) = 1; > > + break; > > default: > > fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); > > } > > diff --git a/gcc/rtl.h b/gcc/rtl.h > > index 7a8c4709257..92c802d3876 100644 > > --- a/gcc/rtl.h > > +++ b/gcc/rtl.h > > @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), > > Dumped as "/i" in RTL dumps. */ > > unsigned return_val : 1; > > > > + /* 1 in a CALL_INSN if it is a libcall. > > + Dumped as "/l" in RTL dumps. */ > > + unsigned is_libcall : 1; > > + > > union { > > /* The final union field is aligned to 64 bits on LP64 hosts, > > giving a 32-bit gap after the fields above. We optimize the > > @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) > > #define RTL_PURE_CALL_P(RTX) \ > > (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) > > > > +/* 1 if RTX is a libcall. */ > > +#define RTL_LIBCALL_P(RTX) \ > > + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) > > + > > /* 1 if RTX is a call to a const or pure function. */ > > #define RTL_CONST_OR_PURE_CALL_P(RTX) \ > > (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX)) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-09 8:05 ` Richard Biener @ 2023-01-09 9:57 ` Jakub Jelinek 2023-01-09 13:04 ` Richard Biener 2023-01-09 14:01 ` Jeff Law 0 siblings, 2 replies; 15+ messages in thread From: Jakub Jelinek @ 2023-01-09 9:57 UTC (permalink / raw) To: Richard Biener; +Cc: Jose E. Marchesi, gcc-patches, david.faust On Mon, Jan 09, 2023 at 09:05:26AM +0100, Richard Biener wrote: > On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > > > ping. > > Would this be a good approach for fixing the issue? > > adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more > bits here. That is obviously not the way to go, sure. > I really wonder how other targets avoid the issue you are pointing out? > Do their assemblers prune unused (extern) .global? I think no target solves this, if they see an extern call during expansion and emit some directive for those, they emit the global or whatever directive which remains there. If all bits for CALL_INSN are taken, can't we add a flag on the CALL rtx inside of the CALL_INSN pattern? Or a flag on the SYMBOL_REF inside of it (libcalls are always direct calls, aren't they) or SYMBOL_REF_FLAGS ? Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-09 9:57 ` Jakub Jelinek @ 2023-01-09 13:04 ` Richard Biener 2023-01-09 13:25 ` Jakub Jelinek 2023-01-09 14:01 ` Jeff Law 1 sibling, 1 reply; 15+ messages in thread From: Richard Biener @ 2023-01-09 13:04 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jose E. Marchesi, gcc-patches, david.faust On Mon, Jan 9, 2023 at 10:58 AM Jakub Jelinek <jakub@redhat.com> wrote: > > On Mon, Jan 09, 2023 at 09:05:26AM +0100, Richard Biener wrote: > > On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > ping. > > > Would this be a good approach for fixing the issue? > > > > adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more > > bits here. > > That is obviously not the way to go, sure. > > > I really wonder how other targets avoid the issue you are pointing out? > > Do their assemblers prune unused (extern) .global? > > I think no target solves this, if they see an extern call during expansion > and emit some directive for those, they emit the global or whatever directive > which remains there. > > If all bits for CALL_INSN are taken, can't we add a flag on the CALL > rtx inside of the CALL_INSN pattern? Or a flag on the SYMBOL_REF inside of > it (libcalls are always direct calls, aren't they) or SYMBOL_REF_FLAGS ? I suppose the SYMBOL_REF would be what I'd target here. Note we already have /* 1 if RTX is a symbol_ref that has been the library function in emit_library_call. */ #define SYMBOL_REF_USED(RTX) \ (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used) so can't we just use that during the final scan for the delayed assembling? > > Jakub > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-09 13:04 ` Richard Biener @ 2023-01-09 13:25 ` Jakub Jelinek 0 siblings, 0 replies; 15+ messages in thread From: Jakub Jelinek @ 2023-01-09 13:25 UTC (permalink / raw) To: Richard Biener; +Cc: Jose E. Marchesi, gcc-patches, david.faust On Mon, Jan 09, 2023 at 02:04:48PM +0100, Richard Biener via Gcc-patches wrote: > On Mon, Jan 9, 2023 at 10:58 AM Jakub Jelinek <jakub@redhat.com> wrote: > > > > On Mon, Jan 09, 2023 at 09:05:26AM +0100, Richard Biener wrote: > > > On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > > > > > ping. > > > > Would this be a good approach for fixing the issue? > > > > > > adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more > > > bits here. > > > > That is obviously not the way to go, sure. > > > > > I really wonder how other targets avoid the issue you are pointing out? > > > Do their assemblers prune unused (extern) .global? > > > > I think no target solves this, if they see an extern call during expansion > > and emit some directive for those, they emit the global or whatever directive > > which remains there. > > > > If all bits for CALL_INSN are taken, can't we add a flag on the CALL > > rtx inside of the CALL_INSN pattern? Or a flag on the SYMBOL_REF inside of > > it (libcalls are always direct calls, aren't they) or SYMBOL_REF_FLAGS ? > > I suppose the SYMBOL_REF would be what I'd target here. Note we already > have > > /* 1 if RTX is a symbol_ref that has been the library function in > emit_library_call. */ > #define SYMBOL_REF_USED(RTX) \ > (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used) > > so can't we just use that during the final scan for the delayed assembling? No, this one can't, it is used to avoid emitting the external directive multiple times. We need something next to it to identify for which symbols that should be done. Or of course if we are really out of bits that could be used for it, the above could be repurposed for SYMBOL_REF_LIBCALL and the current SYMBOL_REF_USED could be handled with a hash set. Jakub ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-09 9:57 ` Jakub Jelinek 2023-01-09 13:04 ` Richard Biener @ 2023-01-09 14:01 ` Jeff Law 1 sibling, 0 replies; 15+ messages in thread From: Jeff Law @ 2023-01-09 14:01 UTC (permalink / raw) To: Jakub Jelinek, Richard Biener; +Cc: Jose E. Marchesi, gcc-patches, david.faust On 1/9/23 02:57, Jakub Jelinek via Gcc-patches wrote: > On Mon, Jan 09, 2023 at 09:05:26AM +0100, Richard Biener wrote: >> On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches >> <gcc-patches@gcc.gnu.org> wrote: >>> >>> >>> ping. >>> Would this be a good approach for fixing the issue? >> >> adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more >> bits here. > > That is obviously not the way to go, sure. > >> I really wonder how other targets avoid the issue you are pointing out? >> Do their assemblers prune unused (extern) .global? > > I think no target solves this, if they see an extern call during expansion > and emit some directive for those, they emit the global or whatever directive > which remains there. > > If all bits for CALL_INSN are taken, can't we add a flag on the CALL > rtx inside of the CALL_INSN pattern? Or a flag on the SYMBOL_REF inside of > it (libcalls are always direct calls, aren't they) or SYMBOL_REF_FLAGS ? You might look at 32bit PA SOM. It was always a bit odd in this respect. You had to import every external symbol explicitly and it disliked importing something that wasn't used. I recall some special handling for libcalls as well. Jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 10:59 [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization Jose E. Marchesi 2022-12-08 12:20 ` Jakub Jelinek @ 2022-12-08 13:42 ` Richard Biener 2022-12-08 16:03 ` Jose E. Marchesi 2023-01-30 18:45 ` Andrew Pinski 2 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2022-12-08 13:42 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: gcc-patches > Am 08.12.2022 um 11:56 schrieb Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org>: > > The expand_expr_divmod function in expr.cc attempts to optimize cases > where both arguments of a division/modulus are known to be positive > when interpreted as signed. In these cases, both signed division and > unsigned division will raise the same value, and therefore the > cheapest option can be used. > > In order to determine what is the cheaper option in the current > target, expand_expr_divmod actually expands both a signed divmod and > an unsigned divmod using local "sequences": > > start_sequence (); > ... > expand_divmod (... signed ...); > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); > ... > end_sequence (); > > And then compares the cost of each generated sequence, choosing the > best one. Finally, it emits the selected expanded sequence and > returns the rtx with the result. > > This approach has a caveat. Some targets do not provide instructions > for division/modulus instructions. In the case of BPF, it provides > unsigned division/modulus, but not signed division/modulus. > > In these cases, the expand_divmod tries can contain calls to funcalls. > For example, in BPF: > > start_sequence (); > ... > expand_divmod (... signed ...); -> This generates funcall to __divdi3 > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); -> This generates direct `div' insn. > ... > end_sequence (); > > The problem is that when a funcall is expanded, an accompanying global > symbol definition is written in the output stream: > > .global __divdi3 > > And this symbol definition remains in the compiled assembly file, even > if the sequence using the direct `div' instruction above is used. > > This is particularly bad in BPF, because the kernel bpf loader chokes > on the spurious symbol __divdi3 and makes the resulting BPF object > unloadable (note that BPF objects are not linked before processed by > the kernel.) > > In order to fix this, this patch modifies expand_expr_divmod in the > following way: > > - When trying each sequence (signed, unsigned) the expand_divmod calls > are told to _not_ use libcalls if everything else fails. This is > done by passing OPTAB_WIDEN as the `methods' argument. (Before it > was using the default value OPTAB_LIB_WIDEN.) > > - If any of the tried expanded sequences contain a funcall, then the > optimization is not attempted. How do libcalls appear in iff you specify OPTABS_WIDEN only? Doesn’t that allow to simplify this and also use the sequence without a libcall? Richard > > A couple of BPF tests are also added to make sure this doesn't break > at any point in the future. > > Tested in bpf-unknown-none and x86_64-linux-gnu. > Regtested in x86_64-linux-gnu. No regressions. > > gcc/ChangeLog > > * expr.cc (expand_expr_divmod): Avoid side-effects of trying > sequences involving funcalls in optimization. > > gcc/testsuite/ChangeLog: > > * gcc.target/bpf/divmod-funcall-1.c: New test. > * gcc.target/bpf/divmod-funcall-2.c: Likewise. > --- > gcc/expr.cc | 44 +++++++++++-------- > .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ > .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ > 3 files changed, 41 insertions(+), 19 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index d9407432ea5..4d4be5d7bda 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, > do_pending_stack_adjust (); > start_sequence (); > rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 1); > + op0, op1, target, 1, OPTAB_WIDEN); > rtx_insn *uns_insns = get_insns (); > end_sequence (); > start_sequence (); > rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 0); > + op0, op1, target, 0, OPTAB_WIDEN); > rtx_insn *sgn_insns = get_insns (); > end_sequence (); > - unsigned uns_cost = seq_cost (uns_insns, speed_p); > - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > > - /* If costs are the same then use as tie breaker the other other > - factor. */ > - if (uns_cost == sgn_cost) > - { > - uns_cost = seq_cost (uns_insns, !speed_p); > - sgn_cost = seq_cost (sgn_insns, !speed_p); > - } > - > - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > - { > - emit_insn (uns_insns); > - return uns_ret; > - } > - emit_insn (sgn_insns); > - return sgn_ret; > + /* Do not try to optimize if any of the sequences tried above > + resulted in a funcall. */ > + if (uns_ret && sgn_ret) > + { > + unsigned uns_cost = seq_cost (uns_insns, speed_p); > + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > + > + /* If costs are the same then use as tie breaker the other > + other factor. */ > + if (uns_cost == sgn_cost) > + { > + uns_cost = seq_cost (uns_insns, !speed_p); > + sgn_cost = seq_cost (sgn_insns, !speed_p); > + } > + > + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > + { > + emit_insn (uns_insns); > + return uns_ret; > + } > + emit_insn (sgn_insns); > + return sgn_ret; > + } > } > return expand_divmod (mod_p, code, mode, treeop0, treeop1, > op0, op1, target, unsignedp); > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > new file mode 100644 > index 00000000000..dffb1506f06 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__divdi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 / 5; > +} > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > new file mode 100644 > index 00000000000..41e8e40c35c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__moddi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 % 5; > +} > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 13:42 ` Richard Biener @ 2022-12-08 16:03 ` Jose E. Marchesi 0 siblings, 0 replies; 15+ messages in thread From: Jose E. Marchesi @ 2022-12-08 16:03 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches >> Am 08.12.2022 um 11:56 schrieb Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org>: >> >> The expand_expr_divmod function in expr.cc attempts to optimize cases >> where both arguments of a division/modulus are known to be positive >> when interpreted as signed. In these cases, both signed division and >> unsigned division will raise the same value, and therefore the >> cheapest option can be used. >> >> In order to determine what is the cheaper option in the current >> target, expand_expr_divmod actually expands both a signed divmod and >> an unsigned divmod using local "sequences": >> >> start_sequence (); >> ... >> expand_divmod (... signed ...); >> ... >> end_sequence (); >> >> start_sequence (); >> ... >> expand_divmod (... unsigned ...); >> ... >> end_sequence (); >> >> And then compares the cost of each generated sequence, choosing the >> best one. Finally, it emits the selected expanded sequence and >> returns the rtx with the result. >> >> This approach has a caveat. Some targets do not provide instructions >> for division/modulus instructions. In the case of BPF, it provides >> unsigned division/modulus, but not signed division/modulus. >> >> In these cases, the expand_divmod tries can contain calls to funcalls. >> For example, in BPF: >> >> start_sequence (); >> ... >> expand_divmod (... signed ...); -> This generates funcall to __divdi3 >> ... >> end_sequence (); >> >> start_sequence (); >> ... >> expand_divmod (... unsigned ...); -> This generates direct `div' insn. >> ... >> end_sequence (); >> >> The problem is that when a funcall is expanded, an accompanying global >> symbol definition is written in the output stream: >> >> .global __divdi3 >> >> And this symbol definition remains in the compiled assembly file, even >> if the sequence using the direct `div' instruction above is used. >> >> This is particularly bad in BPF, because the kernel bpf loader chokes >> on the spurious symbol __divdi3 and makes the resulting BPF object >> unloadable (note that BPF objects are not linked before processed by >> the kernel.) >> >> In order to fix this, this patch modifies expand_expr_divmod in the >> following way: >> >> - When trying each sequence (signed, unsigned) the expand_divmod calls >> are told to _not_ use libcalls if everything else fails. This is >> done by passing OPTAB_WIDEN as the `methods' argument. (Before it >> was using the default value OPTAB_LIB_WIDEN.) >> >> - If any of the tried expanded sequences contain a funcall, then the >> optimization is not attempted. > > How do libcalls appear in iff you specify OPTABS_WIDEN only? Doesn’t > that allow to simplify this and also use the sequence without a > libcall? If you pass OPTABS_WIDEN only then libcalls are not an option and (as far as I can tell) expand_divmod returns NULL if a libcall is the only possibility. > Richard > >> >> A couple of BPF tests are also added to make sure this doesn't break >> at any point in the future. >> >> Tested in bpf-unknown-none and x86_64-linux-gnu. >> Regtested in x86_64-linux-gnu. No regressions. >> >> gcc/ChangeLog >> >> * expr.cc (expand_expr_divmod): Avoid side-effects of trying >> sequences involving funcalls in optimization. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/bpf/divmod-funcall-1.c: New test. >> * gcc.target/bpf/divmod-funcall-2.c: Likewise. >> --- >> gcc/expr.cc | 44 +++++++++++-------- >> .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ >> .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ >> 3 files changed, 41 insertions(+), 19 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> index d9407432ea5..4d4be5d7bda 100644 >> --- a/gcc/expr.cc >> +++ b/gcc/expr.cc >> @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, >> do_pending_stack_adjust (); >> start_sequence (); >> rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, >> - op0, op1, target, 1); >> + op0, op1, target, 1, OPTAB_WIDEN); >> rtx_insn *uns_insns = get_insns (); >> end_sequence (); >> start_sequence (); >> rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, >> - op0, op1, target, 0); >> + op0, op1, target, 0, OPTAB_WIDEN); >> rtx_insn *sgn_insns = get_insns (); >> end_sequence (); >> - unsigned uns_cost = seq_cost (uns_insns, speed_p); >> - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); >> >> - /* If costs are the same then use as tie breaker the other other >> - factor. */ >> - if (uns_cost == sgn_cost) >> - { >> - uns_cost = seq_cost (uns_insns, !speed_p); >> - sgn_cost = seq_cost (sgn_insns, !speed_p); >> - } >> - >> - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) >> - { >> - emit_insn (uns_insns); >> - return uns_ret; >> - } >> - emit_insn (sgn_insns); >> - return sgn_ret; >> + /* Do not try to optimize if any of the sequences tried above >> + resulted in a funcall. */ >> + if (uns_ret && sgn_ret) >> + { >> + unsigned uns_cost = seq_cost (uns_insns, speed_p); >> + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); >> + >> + /* If costs are the same then use as tie breaker the other >> + other factor. */ >> + if (uns_cost == sgn_cost) >> + { >> + uns_cost = seq_cost (uns_insns, !speed_p); >> + sgn_cost = seq_cost (sgn_insns, !speed_p); >> + } >> + >> + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) >> + { >> + emit_insn (uns_insns); >> + return uns_ret; >> + } >> + emit_insn (sgn_insns); >> + return sgn_ret; >> + } >> } >> return expand_divmod (mod_p, code, mode, treeop0, treeop1, >> op0, op1, target, unsignedp); >> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> new file mode 100644 >> index 00000000000..dffb1506f06 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-final { scan-assembler-not "__divdi3" } } */ >> + >> +int >> +foo (unsigned int len) >> +{ >> + return ((unsigned long)len) * 234 / 5; >> +} >> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> new file mode 100644 >> index 00000000000..41e8e40c35c >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-final { scan-assembler-not "__moddi3" } } */ >> + >> +int >> +foo (unsigned int len) >> +{ >> + return ((unsigned long)len) * 234 % 5; >> +} >> -- >> 2.30.2 >> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2022-12-08 10:59 [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization Jose E. Marchesi 2022-12-08 12:20 ` Jakub Jelinek 2022-12-08 13:42 ` Richard Biener @ 2023-01-30 18:45 ` Andrew Pinski 2023-01-30 18:55 ` Jose E. Marchesi 2 siblings, 1 reply; 15+ messages in thread From: Andrew Pinski @ 2023-01-30 18:45 UTC (permalink / raw) To: Jose E. Marchesi; +Cc: gcc-patches On Thu, Dec 8, 2022 at 2:56 AM Jose E. Marchesi via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > The expand_expr_divmod function in expr.cc attempts to optimize cases > where both arguments of a division/modulus are known to be positive > when interpreted as signed. In these cases, both signed division and > unsigned division will raise the same value, and therefore the > cheapest option can be used. I suspect this issue is also the same as PR 48783 . Thanks, Andrew > > In order to determine what is the cheaper option in the current > target, expand_expr_divmod actually expands both a signed divmod and > an unsigned divmod using local "sequences": > > start_sequence (); > ... > expand_divmod (... signed ...); > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); > ... > end_sequence (); > > And then compares the cost of each generated sequence, choosing the > best one. Finally, it emits the selected expanded sequence and > returns the rtx with the result. > > This approach has a caveat. Some targets do not provide instructions > for division/modulus instructions. In the case of BPF, it provides > unsigned division/modulus, but not signed division/modulus. > > In these cases, the expand_divmod tries can contain calls to funcalls. > For example, in BPF: > > start_sequence (); > ... > expand_divmod (... signed ...); -> This generates funcall to __divdi3 > ... > end_sequence (); > > start_sequence (); > ... > expand_divmod (... unsigned ...); -> This generates direct `div' insn. > ... > end_sequence (); > > The problem is that when a funcall is expanded, an accompanying global > symbol definition is written in the output stream: > > .global __divdi3 > > And this symbol definition remains in the compiled assembly file, even > if the sequence using the direct `div' instruction above is used. > > This is particularly bad in BPF, because the kernel bpf loader chokes > on the spurious symbol __divdi3 and makes the resulting BPF object > unloadable (note that BPF objects are not linked before processed by > the kernel.) > > In order to fix this, this patch modifies expand_expr_divmod in the > following way: > > - When trying each sequence (signed, unsigned) the expand_divmod calls > are told to _not_ use libcalls if everything else fails. This is > done by passing OPTAB_WIDEN as the `methods' argument. (Before it > was using the default value OPTAB_LIB_WIDEN.) > > - If any of the tried expanded sequences contain a funcall, then the > optimization is not attempted. > > A couple of BPF tests are also added to make sure this doesn't break > at any point in the future. > > Tested in bpf-unknown-none and x86_64-linux-gnu. > Regtested in x86_64-linux-gnu. No regressions. > > gcc/ChangeLog > > * expr.cc (expand_expr_divmod): Avoid side-effects of trying > sequences involving funcalls in optimization. > > gcc/testsuite/ChangeLog: > > * gcc.target/bpf/divmod-funcall-1.c: New test. > * gcc.target/bpf/divmod-funcall-2.c: Likewise. > --- > gcc/expr.cc | 44 +++++++++++-------- > .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ > .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ > 3 files changed, 41 insertions(+), 19 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > > diff --git a/gcc/expr.cc b/gcc/expr.cc > index d9407432ea5..4d4be5d7bda 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, > do_pending_stack_adjust (); > start_sequence (); > rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 1); > + op0, op1, target, 1, OPTAB_WIDEN); > rtx_insn *uns_insns = get_insns (); > end_sequence (); > start_sequence (); > rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, > - op0, op1, target, 0); > + op0, op1, target, 0, OPTAB_WIDEN); > rtx_insn *sgn_insns = get_insns (); > end_sequence (); > - unsigned uns_cost = seq_cost (uns_insns, speed_p); > - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > > - /* If costs are the same then use as tie breaker the other other > - factor. */ > - if (uns_cost == sgn_cost) > - { > - uns_cost = seq_cost (uns_insns, !speed_p); > - sgn_cost = seq_cost (sgn_insns, !speed_p); > - } > - > - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > - { > - emit_insn (uns_insns); > - return uns_ret; > - } > - emit_insn (sgn_insns); > - return sgn_ret; > + /* Do not try to optimize if any of the sequences tried above > + resulted in a funcall. */ > + if (uns_ret && sgn_ret) > + { > + unsigned uns_cost = seq_cost (uns_insns, speed_p); > + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); > + > + /* If costs are the same then use as tie breaker the other > + other factor. */ > + if (uns_cost == sgn_cost) > + { > + uns_cost = seq_cost (uns_insns, !speed_p); > + sgn_cost = seq_cost (sgn_insns, !speed_p); > + } > + > + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) > + { > + emit_insn (uns_insns); > + return uns_ret; > + } > + emit_insn (sgn_insns); > + return sgn_ret; > + } > } > return expand_divmod (mod_p, code, mode, treeop0, treeop1, > op0, op1, target, unsignedp); > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > new file mode 100644 > index 00000000000..dffb1506f06 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__divdi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 / 5; > +} > diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > new file mode 100644 > index 00000000000..41e8e40c35c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c > @@ -0,0 +1,8 @@ > +/* { dg-do compile } */ > +/* { dg-final { scan-assembler-not "__moddi3" } } */ > + > +int > +foo (unsigned int len) > +{ > + return ((unsigned long)len) * 234 % 5; > +} > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization 2023-01-30 18:45 ` Andrew Pinski @ 2023-01-30 18:55 ` Jose E. Marchesi 0 siblings, 0 replies; 15+ messages in thread From: Jose E. Marchesi @ 2023-01-30 18:55 UTC (permalink / raw) To: Andrew Pinski; +Cc: gcc-patches > On Thu, Dec 8, 2022 at 2:56 AM Jose E. Marchesi via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: >> >> The expand_expr_divmod function in expr.cc attempts to optimize cases >> where both arguments of a division/modulus are known to be positive >> when interpreted as signed. In these cases, both signed division and >> unsigned division will raise the same value, and therefore the >> cheapest option can be used. > > I suspect this issue is also the same as PR 48783 . Yeah I kinda dropped the ball here. Will look into Jeff's suggestions on how to fix this without exhausting the CALL_INSN bits. > Thanks, > Andrew > >> >> In order to determine what is the cheaper option in the current >> target, expand_expr_divmod actually expands both a signed divmod and >> an unsigned divmod using local "sequences": >> >> start_sequence (); >> ... >> expand_divmod (... signed ...); >> ... >> end_sequence (); >> >> start_sequence (); >> ... >> expand_divmod (... unsigned ...); >> ... >> end_sequence (); >> >> And then compares the cost of each generated sequence, choosing the >> best one. Finally, it emits the selected expanded sequence and >> returns the rtx with the result. >> >> This approach has a caveat. Some targets do not provide instructions >> for division/modulus instructions. In the case of BPF, it provides >> unsigned division/modulus, but not signed division/modulus. >> >> In these cases, the expand_divmod tries can contain calls to funcalls. >> For example, in BPF: >> >> start_sequence (); >> ... >> expand_divmod (... signed ...); -> This generates funcall to __divdi3 >> ... >> end_sequence (); >> >> start_sequence (); >> ... >> expand_divmod (... unsigned ...); -> This generates direct `div' insn. >> ... >> end_sequence (); >> >> The problem is that when a funcall is expanded, an accompanying global >> symbol definition is written in the output stream: >> >> .global __divdi3 >> >> And this symbol definition remains in the compiled assembly file, even >> if the sequence using the direct `div' instruction above is used. >> >> This is particularly bad in BPF, because the kernel bpf loader chokes >> on the spurious symbol __divdi3 and makes the resulting BPF object >> unloadable (note that BPF objects are not linked before processed by >> the kernel.) >> >> In order to fix this, this patch modifies expand_expr_divmod in the >> following way: >> >> - When trying each sequence (signed, unsigned) the expand_divmod calls >> are told to _not_ use libcalls if everything else fails. This is >> done by passing OPTAB_WIDEN as the `methods' argument. (Before it >> was using the default value OPTAB_LIB_WIDEN.) >> >> - If any of the tried expanded sequences contain a funcall, then the >> optimization is not attempted. >> >> A couple of BPF tests are also added to make sure this doesn't break >> at any point in the future. >> >> Tested in bpf-unknown-none and x86_64-linux-gnu. >> Regtested in x86_64-linux-gnu. No regressions. >> >> gcc/ChangeLog >> >> * expr.cc (expand_expr_divmod): Avoid side-effects of trying >> sequences involving funcalls in optimization. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/bpf/divmod-funcall-1.c: New test. >> * gcc.target/bpf/divmod-funcall-2.c: Likewise. >> --- >> gcc/expr.cc | 44 +++++++++++-------- >> .../gcc.target/bpf/divmod-funcall-1.c | 8 ++++ >> .../gcc.target/bpf/divmod-funcall-2.c | 8 ++++ >> 3 files changed, 41 insertions(+), 19 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> >> diff --git a/gcc/expr.cc b/gcc/expr.cc >> index d9407432ea5..4d4be5d7bda 100644 >> --- a/gcc/expr.cc >> +++ b/gcc/expr.cc >> @@ -9168,32 +9168,38 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, >> do_pending_stack_adjust (); >> start_sequence (); >> rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, >> - op0, op1, target, 1); >> + op0, op1, target, 1, OPTAB_WIDEN); >> rtx_insn *uns_insns = get_insns (); >> end_sequence (); >> start_sequence (); >> rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, >> - op0, op1, target, 0); >> + op0, op1, target, 0, OPTAB_WIDEN); >> rtx_insn *sgn_insns = get_insns (); >> end_sequence (); >> - unsigned uns_cost = seq_cost (uns_insns, speed_p); >> - unsigned sgn_cost = seq_cost (sgn_insns, speed_p); >> >> - /* If costs are the same then use as tie breaker the other other >> - factor. */ >> - if (uns_cost == sgn_cost) >> - { >> - uns_cost = seq_cost (uns_insns, !speed_p); >> - sgn_cost = seq_cost (sgn_insns, !speed_p); >> - } >> - >> - if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) >> - { >> - emit_insn (uns_insns); >> - return uns_ret; >> - } >> - emit_insn (sgn_insns); >> - return sgn_ret; >> + /* Do not try to optimize if any of the sequences tried above >> + resulted in a funcall. */ >> + if (uns_ret && sgn_ret) >> + { >> + unsigned uns_cost = seq_cost (uns_insns, speed_p); >> + unsigned sgn_cost = seq_cost (sgn_insns, speed_p); >> + >> + /* If costs are the same then use as tie breaker the other >> + other factor. */ >> + if (uns_cost == sgn_cost) >> + { >> + uns_cost = seq_cost (uns_insns, !speed_p); >> + sgn_cost = seq_cost (sgn_insns, !speed_p); >> + } >> + >> + if (uns_cost < sgn_cost || (uns_cost == sgn_cost && unsignedp)) >> + { >> + emit_insn (uns_insns); >> + return uns_ret; >> + } >> + emit_insn (sgn_insns); >> + return sgn_ret; >> + } >> } >> return expand_divmod (mod_p, code, mode, treeop0, treeop1, >> op0, op1, target, unsignedp); >> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> new file mode 100644 >> index 00000000000..dffb1506f06 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-1.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-final { scan-assembler-not "__divdi3" } } */ >> + >> +int >> +foo (unsigned int len) >> +{ >> + return ((unsigned long)len) * 234 / 5; >> +} >> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> new file mode 100644 >> index 00000000000..41e8e40c35c >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/bpf/divmod-funcall-2.c >> @@ -0,0 +1,8 @@ >> +/* { dg-do compile } */ >> +/* { dg-final { scan-assembler-not "__moddi3" } } */ >> + >> +int >> +foo (unsigned int len) >> +{ >> + return ((unsigned long)len) * 234 % 5; >> +} >> -- >> 2.30.2 >> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-01-30 18:56 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-12-08 10:59 [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization Jose E. Marchesi 2022-12-08 12:20 ` Jakub Jelinek 2022-12-08 13:02 ` Jose E. Marchesi 2022-12-08 13:19 ` Jakub Jelinek 2022-12-08 22:40 ` Jose E. Marchesi 2023-01-04 8:58 ` Jose E. Marchesi 2023-01-09 8:05 ` Richard Biener 2023-01-09 9:57 ` Jakub Jelinek 2023-01-09 13:04 ` Richard Biener 2023-01-09 13:25 ` Jakub Jelinek 2023-01-09 14:01 ` Jeff Law 2022-12-08 13:42 ` Richard Biener 2022-12-08 16:03 ` Jose E. Marchesi 2023-01-30 18:45 ` Andrew Pinski 2023-01-30 18:55 ` Jose E. Marchesi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).