public inbox for gcc-bugs@sourceware.org
* [Bug target/110780] New: aarch64 NEON redundant displaced ld3
@ 2023-07-23 20:58 nate at thatsmathematics dot com
  2023-07-23 21:05 ` [Bug tree-optimization/110780] " pinskia at gcc dot gnu.org
  2023-07-24 19:12 ` rsandifo at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: nate at thatsmathematics dot com @ 2023-07-23 20:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

            Bug ID: 110780
           Summary: aarch64 NEON redundant displaced ld3
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nate at thatsmathematics dot com
  Target Milestone: ---

Compile the following with gcc 14.0.0 20230723 on aarch64 with -O3:

#include <stdint.h>
void CSI2toBE12(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}

Godbolt: https://godbolt.org/z/WshTPKzY5

In the inner loop (.L5 of the godbolt asm) we have

        ld3     {v25.16b - v27.16b}, [x3]
        add     x6, x3, 1
        // no intervening stores
        ld3     {v25.16b - v27.16b}, [x6]

The second load is redundant.  After it, v25 and v26 hold the same values that
the first load had already placed in v26 and v27 respectively.  The value
loaded into v27 is new, but it is not used in the subsequent code.

This might also account for some extra later complexity, because it means that
the last 48 bytes of the input can't be handled by this loop (or else the
second load would be out of bounds by one byte) and so must be handled
specially.
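
To make the lane correspondence concrete, here is a small scalar model of the two loads.  It is only a sketch: model_ld3, displaced_ld3_is_redundant and check_displaced_ld3 are hypothetical names made up for illustration, and the model assumes the usual ld3 de-interleaving (register r, lane i receives byte 3*i + r).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 16  /* bytes per 128-bit NEON register */

/* Scalar model of "ld3 {v0.16b - v2.16b}, [src]": reads 48 bytes and
   de-interleaves them so that out[r][i] = src[3*i + r]. */
void model_ld3(uint8_t out[3][LANES], const uint8_t *src)
{
    for (int i = 0; i < LANES; i++)
        for (int r = 0; r < 3; r++)
            out[r][i] = src[3 * i + r];
}

/* Returns 1 if the load displaced by one byte yields, in its first two
   registers, exactly the last two registers of the undisplaced load.
   src must point to at least 49 readable bytes. */
int displaced_ld3_is_redundant(const uint8_t *src)
{
    uint8_t a[3][LANES], b[3][LANES];
    model_ld3(a, src);      /* ld3 {...}, [x3]     */
    model_ld3(b, src + 1);  /* ld3 {...}, [x3 + 1] */
    return memcmp(b[0], a[1], LANES) == 0 &&
           memcmp(b[1], a[2], LANES) == 0;
}

/* Self-check on an arbitrary 49-byte pattern. */
void check_displaced_ld3(void)
{
    uint8_t buf[3 * LANES + 1];
    for (int i = 0; i < (int)sizeof buf; i++)
        buf[i] = (uint8_t)(7 * i + 3);
    assert(displaced_ld3_is_redundant(buf));
}
```

Note that the model has to read 49 bytes, not 48: the third register of the displaced load picks up src[48], which is the extra byte mentioned above.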

* [Bug tree-optimization/110780] aarch64 NEON redundant displaced ld3
  2023-07-23 20:58 [Bug target/110780] New: aarch64 NEON redundant displaced ld3 nate at thatsmathematics dot com
@ 2023-07-23 21:05 ` pinskia at gcc dot gnu.org
  2023-07-24 19:12 ` rsandifo at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-07-23 21:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|target                      |tree-optimization
   Last reconfirmed|                            |2023-07-23
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The vectorizer produces:
  vectp_pBE.28_115 = (unsigned char[48] *) ivtmp.73_338;
  _218 = ivtmp.68_335 + 1;
  vectp_pCSI2.19_107 = (unsigned char[48] *) _218;
  vectp_pCSI2.10_98 = (unsigned char[48] *) ivtmp.68_335;
  vect_array.12 = .LOAD_LANES (MEM <unsigned char[48]> [(uint8_t *)vectp_pCSI2.10_98]);
  vect__1.13_100 = vect_array.12[0];
  vect__1.14_101 = vect_array.12[1];
  vect__1.15_102 = vect_array.12[2];
  vect_array.12 ={v} {CLOBBER};
  vect__4.16_103 = vect__1.14_101 >> 4;
  vect__22.17_104 = vect__1.15_102 << 4;
  vect__5.18_105 = vect__4.16_103 | vect__22.17_104;
  vect_array.21 = .LOAD_LANES (MEM <unsigned char[48]> [(uint8_t *)vectp_pCSI2.19_107]);
  vect__6.22_109 = vect_array.21[0];
  vect__6.23_110 = vect_array.21[1];
  vect_array.21 ={v} {CLOBBER};

Here vect__6.22_109 is the same as vect__1.14_101 and vect__6.23_110 is the
same as vect__1.15_102 (if I did this correctly).

* [Bug tree-optimization/110780] aarch64 NEON redundant displaced ld3
  2023-07-23 20:58 [Bug target/110780] New: aarch64 NEON redundant displaced ld3 nate at thatsmathematics dot com
  2023-07-23 21:05 ` [Bug tree-optimization/110780] " pinskia at gcc dot gnu.org
@ 2023-07-24 19:12 ` rsandifo at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2023-07-24 19:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I guess the problem is that we can only remove the “redundant” loads once we've
added the versioning check between pBE and pCSI2.  We could remove the
redundancy after vectorisation, but it would be nice to do it during, not least
because it might improve costing.
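
As an illustration of why the loads only become redundant once pBE and pCSI2 are known not to overlap, the sketch below compares the reported loop with a variant that reads each 3-byte group once.  This is one reading of the comment above, not something taken from the vectorizer; CSI2toBE12_load_once and check_in_place_differs are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* The loop from the report, unchanged. */
void CSI2toBE12(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}

/* Hypothetical variant: loads each group once, then stores.  It matches
   the loop above only when the stores cannot modify the source bytes. */
void CSI2toBE12_load_once(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        uint8_t b0 = pCSI2[0], b1 = pCSI2[1], b2 = pCSI2[2];
        pBE[0] = b0;
        pBE[1] = (uint8_t)(((b2 & 0xf) << 4) | (b1 >> 4));
        pBE[2] = (uint8_t)(((b1 & 0xf) << 4) | (b2 >> 4));
        pCSI2 += 3;
        pBE += 3;
    }
}

/* With pBE == pCSI2, the store to pBE[1] changes the pCSI2[1] that the
   next statement reloads, so the two functions disagree. */
void check_in_place_differs(void)
{
    uint8_t a[3] = {0x12, 0x34, 0x56};
    uint8_t b[3] = {0x12, 0x34, 0x56};
    CSI2toBE12(a, a, a + 3);
    CSI2toBE12_load_once(b, b, b + 3);
    assert(a[2] == 0x35);
    assert(b[2] == 0x45);
}
```

With separate source and destination buffers the two functions give identical output, which is the case the versioning check selects at run time.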

Richard, do you remember an earlier PR for something similar?

We do remove the redundancy if the pointers are marked restrict.
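
For reference, a restrict-qualified form of the test case might look like the sketch below.  The exact source used for that experiment is not shown in this thread, so the name CSI2toBE12_restrict and the choice of which parameters to qualify are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed variant of the reported function: restrict promises that the
   bytes written through pBE are never read through pCSI2.  pCSI2LineEnd
   is only compared, never dereferenced, so it is left unqualified. */
void CSI2toBE12_restrict(uint8_t* restrict pCSI2, uint8_t* restrict pBE,
                         uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}

/* Self-check on two 3-byte groups with non-overlapping buffers. */
void check_restrict_variant(void)
{
    uint8_t in[6] = {0x12, 0x34, 0x56, 0xab, 0xcd, 0xef};
    uint8_t out[6] = {0};
    CSI2toBE12_restrict(in, out, in + 6);
    assert(out[0] == 0x12 && out[1] == 0x63 && out[2] == 0x45);
    assert(out[3] == 0xab && out[4] == 0xfc && out[5] == 0xde);
}
```

Callers of such a variant must guarantee that the two buffers do not overlap; in-place conversion would be undefined behaviour.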
