public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/104049] New: [12 Regression] vec_select to subreg lowering causes superfluous moves
@ 2022-01-16 12:19 tnfchris at gcc dot gnu.org
  2022-01-17  2:20 ` [Bug rtl-optimization/104049] " pinskia at gcc dot gnu.org
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-01-16 12:19 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104049

            Bug ID: 104049
           Summary: [12 Regression] vec_select to subreg lowering causes
                    superfluous moves
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*

Consider:

#include <stdint.h>

int test (uint8_t *p, uint32_t t[1][1], int n) {

  int sum = 0;
  uint32_t a0;
  for (int i = 0; i < 4; i++, p++)
    t[i][0] = p[0];

  for (int i = 0; i < 4; i++) {
    {
      int t0 = t[0][i] + t[0][i];
      a0 = t0;
    };
    sum += a0;
  }
  return (((uint16_t)sum) + ((uint32_t)sum >> 16)) >> 1;
}

After the reduction gets SLP vectorized, this used to generate the following
at -O3:

        addv    s0, v0.4s
        fmov    w0, s0
        lsr     w1, w0, 16
        add     w0, w1, w0, uxth
        lsr     w0, w0, 1

which was pretty good. However, in GCC 12 we now generate worse code:

        addv    s0, v0.4s
        fmov    w0, s0
        fmov    w1, s0
        and     w0, w0, 65535
        add     w0, w0, w1, lsr 16
        lsr     w0, w0, 1

Notice the double transfer of the same value.
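
As a rough illustration (this is not GCC output), the two scalar tails above
can be modelled in portable C.  The names tail_gcc11, tail_gcc12 and
check_tails_agree are hypothetical and only mirror the instruction sequences
quoted above; the sketch assumes the lane-0 value is passed in as a plain
uint32_t.  Both tails compute the same value, so the regression is only the
extra w -> r transfer, not a change in the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the GCC 11 tail: a single w -> r move.  */
static uint32_t tail_gcc11 (uint32_t s0)
{
  uint32_t w0 = s0;            /* fmov w0, s0 */
  uint32_t w1 = w0 >> 16;      /* lsr  w1, w0, 16 */
  w0 = w1 + (uint16_t) w0;     /* add  w0, w1, w0, uxth */
  return w0 >> 1;              /* lsr  w0, w0, 1 */
}

/* Hypothetical model of the GCC 12 tail: the same value is moved twice.  */
static uint32_t tail_gcc12 (uint32_t s0)
{
  uint32_t w0 = s0;            /* fmov w0, s0 */
  uint32_t w1 = s0;            /* fmov w1, s0 (the duplicate transfer) */
  w0 &= 65535;                 /* and  w0, w0, 65535 */
  w0 += w1 >> 16;              /* add  w0, w0, w1, lsr 16 */
  return w0 >> 1;              /* lsr  w0, w0, 1 */
}

/* Compare both models on a handful of sample inputs.  */
static void check_tails_agree (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 0xffffu, 0x10000u, 0x12345678u, 0xffffffffu
  };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (tail_gcc11 (samples[i]) == tail_gcc12 (samples[i]));
}
```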

This is because at the RTL level the original mov becomes a vec_select:

(insn 19 18 20 2 (set (reg:SI 102 [ _43 ])
        (vec_select:SI (reg:V4SI 117)
            (parallel [
                    (const_int 0 [0])
                ]))) -1
     (nil))

which previously stayed as a vec_select and the RA would use this pattern for
the w -> r move.

Now, however, this vec_select gets transformed into a subreg at byte offset 0,
which causes combine to push the subreg into each instruction using reg 102:

(insn 21 18 22 2 (set (reg:SI 120)
        (and:SI (subreg:SI (reg:V4SI 117) 0)
            (const_int 65535 [0xffff]))) "/app/example.c":30:27 492 {andsi3}
     (nil))
(insn 22 21 28 2 (set (reg:SI 121)
        (plus:SI (lshiftrt:SI (subreg:SI (reg:V4SI 117) 0)
                (const_int 16 [0x10]))
            (reg:SI 120))) "/app/example.c":30:27 211 {*add_lsr_si}
     (expr_list:REG_DEAD (reg:SI 120)
        (expr_list:REG_DEAD (reg:V4SI 117)
            (nil))))

and because these operations don't exist on the w side, reload is forced to
materialize many duplicate moves from w -> r.  So every operation that gets
the subreg pushed into it, and for which we have no equivalent on the w side,
gets an extra move.

Aside from that, we seem to lose the fact that the & can be folded into the
subreg by simply truncating the subreg from SI to HI and zero-extending the
result.
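
The folding mentioned here rests on a simple identity: masking a 32-bit value
with 0xffff gives the same result as truncating it to 16 bits and
zero-extending it back.  A minimal C sketch of that identity follows; the
helper names are hypothetical and have nothing to do with GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits via an explicit mask, as in the (and:SI ... 65535) form.  */
static uint32_t low_half_by_mask (uint32_t x)
{
  return x & 0xffff;
}

/* Low 16 bits via truncation to a 16-bit type followed by zero extension,
   roughly the SI -> HI truncation plus zero_extend described above.  */
static uint32_t low_half_by_trunc (uint32_t x)
{
  return (uint32_t) (uint16_t) x;
}

/* Returns 1 if both forms agree on every sample input, 0 otherwise.  */
static int low_half_forms_agree (void)
{
  static const uint32_t samples[] = {
    0u, 0xffffu, 0x10000u, 0x12345678u, 0xffff0000u, 0xffffffffu
  };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (low_half_by_mask (samples[i]) != low_half_by_trunc (samples[i]))
      return 0;
  return 1;
}
```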

A different reproducer is:

#include <arm_neon.h>

typedef int v4si __attribute__ ((vector_size (16)));

int bar (v4si x)
{
  unsigned int sum = vaddvq_s32 (x);
  return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16));
}

Note that using -frename-registers does get us to an optimal sequence here,
one that is better than what GCC 11 generates.


end of thread, other threads:[~2024-05-21  9:11 UTC | newest]

Thread overview: 22+ messages
2022-01-16 12:19 [Bug rtl-optimization/104049] New: [12 Regression] vec_select to subreg lowering causes superfluous moves tnfchris at gcc dot gnu.org
2022-01-17  2:20 ` [Bug rtl-optimization/104049] " pinskia at gcc dot gnu.org
2022-01-18 13:48 ` rguenth at gcc dot gnu.org
2022-01-18 17:03 ` vmakarov at gcc dot gnu.org
2022-01-18 17:39 ` tnfchris at gcc dot gnu.org
2022-01-19  3:28 ` pinskia at gcc dot gnu.org
2022-01-19  3:30 ` pinskia at gcc dot gnu.org
2022-01-19  3:39 ` [Bug target/104049] " pinskia at gcc dot gnu.org
2022-02-01 11:26 ` tnfchris at gcc dot gnu.org
2022-03-25 12:00 ` jakub at gcc dot gnu.org
2022-03-25 14:00 ` jakub at gcc dot gnu.org
2022-03-25 14:42 ` tnfchris at gcc dot gnu.org
2022-03-25 14:50 ` tnfchris at gcc dot gnu.org
2022-03-25 14:52 ` tnfchris at gcc dot gnu.org
2022-04-01 11:29 ` rsandifo at gcc dot gnu.org
2022-04-04 14:06 ` tnfchris at gcc dot gnu.org
2022-04-07  7:29 ` cvs-commit at gcc dot gnu.org
2022-04-07  7:32 ` tnfchris at gcc dot gnu.org
2023-04-26  6:55 ` [Bug target/104049] [12/13/14 " rguenth at gcc dot gnu.org
2023-07-27  9:22 ` rguenth at gcc dot gnu.org
2024-02-21  6:37 ` pinskia at gcc dot gnu.org
2024-05-21  9:10 ` [Bug target/104049] [12/13/14/15 " jakub at gcc dot gnu.org
