Hi Richard,

On 02/05/18 11:15, Richard Biener wrote:
> On Mon, Apr 30, 2018 at 7:41 PM, Kyrill Tkachov
> wrote:
>> Hi all,
>>
>> We can improve the performance of complex floating-point multiplications by
>> inlining the expansion a bit more aggressively.
>> We can inline complex x = a * b as:
>> x = (ar*br - ai*bi) + i(ar*bi + br*ai);
>> if (isunordered (__real__ x, __imag__ x))
>>   x = __muldc3 (a, b); //Or __mulsc3 for single-precision
>>
>> That way, in the common case where no NaNs are produced, we avoid the
>> libgcc call and only fall back to the NaN handling code in libgcc if
>> either component of the expansion is NaN.
>>
>> The implementation is done in expand_complex_multiplication in
>> tree-complex.c and the above expansion is used when optimising at -O1
>> and greater and when not optimising for size.
>> At -O0 and -Os a single call to libgcc will be emitted.
>>
>> For the code:
>> __complex double
>> foo (__complex double a, __complex double b)
>> {
>>   return a * b;
>> }
>>
>> we will now emit at -O2 for aarch64:
>> foo:
>>         fmul    d16, d1, d3
>>         fmul    d6, d1, d2
>>         fnmsub  d5, d0, d2, d16
>>         fmadd   d4, d0, d3, d6
>>         fcmp    d5, d4
>>         bvs     .L8
>>         fmov    d1, d4
>>         fmov    d0, d5
>>         ret
>> .L8:
>>         stp     x29, x30, [sp, -16]!
>>         mov     x29, sp
>>         bl      __muldc3
>>         ldp     x29, x30, [sp], 16
>>         ret
>>
>> instead of just a branch to __muldc3.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
>> x86_64-unknown-linux-gnu.
>>
>> Ok for trunk? (GCC 9)
>
> +  /* If optimizing for size or not at all just do a libcall.  */
> +  if (optimize == 0 || optimize_function_for_size_p (cfun))
> +    {
> +      expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR);
> +      return;
> +    }
>
> use optimize_bb_for_size_p instead please (get BB from the mult stmt).

Done.

> /* Expand a complex multiplication or division to a libcall to the c99
> +   compliant routines.  Unlike expand_complex_libcall create and insert
> +   the call, assign it to an output variable and return that rather than
> +   modifying existing statements in place.  */
> +
> +static tree
> +insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar,
> +                             tree ai, tree br, tree bi)
> +{
>
> can you please try merging both functions instead?

I tried that initially; it looked a bit awkward because we sometimes want to
modify an existing assignment in place and other times we want to insert the
libcall as a new statement.  This latest version keeps only one function and
adds an inplace_p parameter.

> Also it shows a possible issue with -fnon-call-exceptions if the original
> multiplication has EH edges.  I think you want to side-step that
> by doing the libcall-only way in that case as well (stmt_can_throw_internal).

Ok, done.

> +  tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED);
> +  tree isunordered_res = create_tmp_var (integer_type_node);
> +  gimple *tmpr_unord_check
> +    = gimple_build_call (isunordered_decl, 2, tmpr, tmpi);
> +  gimple_call_set_lhs (tmpr_unord_check, isunordered_res);
> +
> +  gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT);
> +  gimple *check
> +    = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node,
> +                         NULL_TREE, NULL_TREE);
>
> why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with
> UNORDERED_EXPR?  Note again that might trap/throw with -fsignalling-nans
> so better avoid this transform for flag_signalling_nans as well...

No reason, I was just not familiar enough with the tree-level expressions.
Using UNORDERED_EXPR does look much better.
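For reference, a minimal (untested) sketch of that variant, reusing the
tmpr/tmpi temporaries from the snippet above:

  /* Branch to the libcall fallback if either component of the inline
     expansion is unordered, i.e. a NaN.  */
  gcond *check
    = gimple_build_cond (UNORDERED_EXPR, tmpr, tmpi, NULL_TREE, NULL_TREE);
  gsi_insert_before (gsi, check, GSI_SAME_STMT);

That drops the BUILT_IN_ISUNORDERED call and the integer temporary entirely.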
>
> +  /* We have a conditional block with some assignments in cond_bb.
> +     Wire up the PHIs to wrap up.  */
> +  if (gimple_in_ssa_p (cfun))
> +    {
>
> we are always in SSA form(?) (probably tree-complex.c can use some TLC here).

I think we're always in SSA form; I suspect this was an artifact from the
past.  I've removed these conditionals in this latest version and testing
doesn't show any problems.

>
> +  /* If we are not worrying about NaNs expand to
> +     (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
> +  expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai,
> +                                               br, bi, &rr, &ri);
>
> I think the function is badly worded - this isn't about limited
> ranges, no?  Which
> also means that we can dispatch to this simple variant not only for
> flag_complex_method != 2 but for !HONOR_NANS && !HONOR_INFINITIES?
> Maybe that should be done as followup.
>

Hmm, I've struggled a bit to come up with a good name.  I've renamed it to
expand_complex_multiplication_components.  Is that more descriptive?
I'm happy to enable this expansion path for !HONOR_NANS && !HONOR_INFINITIES
as a separate patch (a rough sketch of what that dispatch could look like is
at the end of this mail).

Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
x86_64-unknown-linux-gnu.

Thanks,
Kyrill

2018-05-02  Kyrylo Tkachov

        PR tree-optimization/70291
        * tree-complex.c (expand_complex_libcall): Add type, inplace_p
        arguments.  Change return type to tree.  Emit libcall as a new
        statement rather than replacing existing one when inplace_p is true.
        (expand_complex_multiplication_components): New function.
        (expand_complex_multiplication): Expand floating-point complex
        multiplication using the above.
        (expand_complex_division): Rename inner_type parameter to type.
        Update expand_complex_libcall call-site.
        (expand_complex_operations_1): Update expand_complex_multiplication
        and expand_complex_division call-sites.

2018-05-02  Kyrylo Tkachov

        PR tree-optimization/70291
        * gcc.dg/complex-6.c: New test.
        * gcc.dg/complex-7.c: Likewise.
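As an aside, a rough (untested) sketch of the follow-up dispatch mentioned
above, assuming the surrounding rr/ri wiring in expand_complex_multiplication
stays as in this patch:

  /* Use the simple component-wise expansion not only for
     flag_complex_method != 2 but also whenever neither NaNs nor
     infinities need to be honoured for this type.  */
  if (flag_complex_method != 2
      || (!HONOR_NANS (inner_type) && !HONOR_INFINITIES (inner_type)))
    expand_complex_multiplication_components (gsi, inner_type, ar, ai,
                                              br, bi, &rr, &ri);

That way e.g. -ffinite-math-only code would take the simple path even with
flag_complex_method == 2.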