public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/98986] New: Try matching both orders of commutative RTX operations when there is no canonical order
@ 2021-02-07 18:07 ktkachov at gcc dot gnu.org
  2021-02-08  9:20 ` [Bug rtl-optimization/98986] " rguenth at gcc dot gnu.org
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2021-02-07 18:07 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98986

            Bug ID: 98986
           Summary: Try matching both orders of commutative RTX operations
                    when there is no canonical order
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

The motivating aarch64 testcase is this:

#include <arm_neon.h>
int32x4_t
foo (int16x4_t a, int16x4_t b)
{
  int16x4_t tmp = vdup_n_s16 (vget_lane_s16 (b, 3));

  return vmull_s16 (tmp, a);
}

int32x4_t
foo2 (int16x4_t a, int16x4_t b)
{
  int16x4_t tmp = vdup_n_s16 (vget_lane_s16 (b, 3));

  return vmull_s16 (a, tmp);
}

Both functions should generate the widening-mult-by-lane form:
        smull   v0.4s, v0.4h, v1.h[3]   // 13   [c=16 l=4]  aarch64_vec_smult_lane_v4hi

However, only the second function, foo2, manages to match it.
We have a pattern for this in aarch64-simd.md:
(define_insn "aarch64_vec_<su>mult_lane<Qlane>"
  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
        (mult:<VWIDE>
          (ANY_EXTEND:<VWIDE>
            (match_operand:<VCOND> 1 "register_operand" "w"))
          (ANY_EXTEND:<VWIDE>
            (vec_duplicate:<VCOND>
              (vec_select:<VEL>
                (match_operand:VDQHS 2 "register_operand" "<vwx>")
                (parallel [(match_operand:SI 3 "immediate_operand" "i")]))))))]
  "TARGET_SIMD"
  {
    operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
    return "<su>mull\\t%0.<Vwtype>, %1.<Vcondtype>, %2.<Vetype>[%3]";
  }
  [(set_attr "type" "neon_mul_<Vetype>_scalar_long")]
)

For foo, combine tries and fails to match this insn, in which the vec_select
sits in the first arm of the mult:
(set (reg:V4SI 93 [ <retval> ])
    (mult:V4SI (sign_extend:V4SI (vec_duplicate:V4HI (vec_select:HI (reg:V4HI 99)
                    (parallel:V4HI [
                            (const_int 3 [0x3])
                        ]))))
        (sign_extend:V4SI (reg:V4HI 98))))

Unfortunately, due to the sign_extends on both arms of the mult, there is no
canonical order for these expressions: both arms of the MULT are RTX_UNARY
expressions, and swap_commutative_operands_p doesn't try to swap them around.
I guess we can work around this by adding more patterns in the backend to match
the two different orders we can get in this situation, but we've already got
so many similar patterns in the backend...
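
To make the "no canonical order" point concrete, here is a toy model in C of a
precedence-based swap decision. It is only a sketch: the toy_* names and the
precedence numbers are made up for illustration and are not GCC's actual
values. The one assumption it relies on is that a swap happens only when the
second operand ranks strictly higher than the first.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy operand classes: a simplified, hypothetical stand-in for RTX classes.  */
enum toy_class { TOY_CONST, TOY_OBJ, TOY_UNARY, TOY_BINARY };

/* Illustrative precedence values only, not GCC's real numbers.  */
static int
toy_precedence (enum toy_class c)
{
  switch (c)
    {
    case TOY_CONST:  return -4;
    case TOY_OBJ:    return -1;
    case TOY_UNARY:  return 0;
    case TOY_BINARY: return 2;
    }
  assert (!"unknown operand class");
  return 0;
}

/* Swap only when the second operand strictly outranks the first.
   Two operands of the same class tie, so their order is left alone.  */
static bool
toy_should_swap (enum toy_class first, enum toy_class second)
{
  return toy_precedence (first) < toy_precedence (second);
}
```

With two TOY_UNARY operands the precedences tie, so toy_should_swap returns
false for either order. In this model that is why both foo and foo2 keep
whatever operand order they arrived with.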

Do you think it's feasible to get recog or combine to try out both permutations
of such commutative operations when matching, without blowing up compile time?
Any other ideas for resolving this are welcome.
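
As a rough illustration of what "try both permutations" could mean, here is a
small self-contained C sketch of a structural matcher that retries with
swapped operands for commutative codes. Everything in it (struct toy_rtx, the
toy_* codes, toy_match) is hypothetical and unrelated to how recog or combine
are actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression codes for the sketch.  */
enum toy_code { TOY_REG, TOY_EXTEND, TOY_DUP_LANE, TOY_MULT, TOY_MINUS };

struct toy_rtx
{
  enum toy_code code;
  const struct toy_rtx *op0;   /* NULL for leaves.  */
  const struct toy_rtx *op1;   /* NULL for leaves and unary codes.  */
};

static bool
toy_commutative_p (enum toy_code code)
{
  return code == TOY_MULT;
}

/* Return true if X has the same shape as PAT.  A TOY_REG in PAT matches
   any TOY_REG in X.  For commutative codes, both operand orders are tried.  */
static bool
toy_match (const struct toy_rtx *pat, const struct toy_rtx *x)
{
  if (pat == NULL || x == NULL)
    return pat == x;

  /* Well-formedness: a node with a second operand must have a first one.  */
  assert (pat->op0 != NULL || pat->op1 == NULL);

  if (pat->code != x->code)
    return false;

  /* First try the operands in the order they appear.  */
  if (toy_match (pat->op0, x->op0) && toy_match (pat->op1, x->op1))
    return true;

  /* Retry with the operands swapped, but only for commutative codes.  */
  return (toy_commutative_p (pat->code)
          && toy_match (pat->op0, x->op1)
          && toy_match (pat->op1, x->op0));
}
```

Given a pattern shaped like mult (extend (reg), extend (dup_lane (reg))),
toy_match accepts an input with the two extends in either order, while a
TOY_MINUS with swapped operands is still rejected. Note the cost in this
sketch: each commutative node can trigger a second attempt, so nested
commutative operations may multiply the number of attempts, which is exactly
the compile-time concern raised above.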


Thread overview: 7+ messages
2021-02-07 18:07 [Bug rtl-optimization/98986] New: Try matching both orders of commutative RTX operations when there is no canonical order ktkachov at gcc dot gnu.org
2021-02-08  9:20 ` [Bug rtl-optimization/98986] " rguenth at gcc dot gnu.org
2021-02-08 11:00 ` segher at gcc dot gnu.org
2021-02-10 12:23 ` rsandifo at gcc dot gnu.org
2021-02-10 12:27 ` rguenther at suse dot de
2021-02-10 16:53 ` segher at gcc dot gnu.org
2021-02-10 17:02 ` segher at gcc dot gnu.org
