Hi all,

We can improve the performance of complex floating-point multiplication by expanding it inline more aggressively. The complex multiplication x = a * b can be inlined as:

  x = (ar*br - ai*bi) + i(ar*bi + br*ai);
  if (isunordered (__real__ x, __imag__ x))
    x = __muldc3 (a, b); /* Or __mulsc3 for single-precision.  */

That way, in the common case where no NaNs are produced, we avoid the libgcc call, and we fall back to the NaN-handling code in libgcc only if either component of the expansion is NaN.

The implementation is done in expand_complex_multiplication in tree-complex.c. The above expansion is used when optimising at -O1 and higher and not optimising for size; at -O0 and -Os the single call to libgcc is emitted, as before.

For the code:

__complex double
foo (__complex double a, __complex double b)
{
  return a * b;
}

we will now emit at -O2 for aarch64:

foo:
        fmul    d16, d1, d3
        fmul    d6, d1, d2
        fnmsub  d5, d0, d2, d16
        fmadd   d4, d0, d3, d6
        fcmp    d5, d4
        bvs     .L8
        fmov    d1, d4
        fmov    d0, d5
        ret
.L8:
        stp     x29, x30, [sp, -16]!
        mov     x29, sp
        bl      __muldc3
        ldp     x29, x30, [sp], 16
        ret

instead of just a branch to __muldc3.

Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf and x86_64-unknown-linux-gnu.

Ok for trunk? (GCC 9)

Thanks,
Kyrill

2018-04-30  Kyrylo Tkachov

	PR tree-optimization/70291
	* tree-complex.c (insert_complex_mult_libcall): New function.
	(expand_complex_multiplication_limited_range): Likewise.
	(expand_complex_multiplication): Expand floating-point complex
	multiplication using the above.

2018-04-30  Kyrylo Tkachov

	PR tree-optimization/70291
	* gcc.dg/complex-6.c: New test.
	* gcc.dg/complex-7.c: Likewise.