public inbox for gcc-bugs@sourceware.org
* [Bug target/110780] New: aarch64 NEON redundant displaced ld3
@ 2023-07-23 20:58 nate at thatsmathematics dot com
2023-07-23 21:05 ` [Bug tree-optimization/110780] " pinskia at gcc dot gnu.org
2023-07-24 19:12 ` rsandifo at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: nate at thatsmathematics dot com @ 2023-07-23 20:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780
Bug ID: 110780
Summary: aarch64 NEON redundant displaced ld3
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: nate at thatsmathematics dot com
Target Milestone: ---
Compile the following with gcc 14.0.0 20230723 on aarch64 with -O3:
#include <stdint.h>

void CSI2toBE12(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}
Godbolt: https://godbolt.org/z/WshTPKzY5
In the inner loop (.L5 of the godbolt asm) we have
    ld3     {v25.16b - v27.16b}, [x3]
    add     x6, x3, 1
    // no intervening stores
    ld3     {v25.16b - v27.16b}, [x6]
The second load is redundant: it leaves in v25 and v26 exactly the values that
the first load had already placed in v26 and v27, respectively. The value it
loads into v27 is new, but the subsequent code never uses it.
This might also account for some extra later complexity, because it means that
the last 48 bytes of the input can't be handled by this loop (or else the
second load would be out of bounds by one byte) and so must be handled
specially.
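To make the lane argument concrete, here is a small scalar model of the two
loads. It is only a sketch: model_ld3 and displaced_load_is_redundant are
hypothetical helper names, and the model assumes ld3 on .16b registers
de-interleaves 48 consecutive bytes so that register k receives bytes
p[3*i + k]. Note that the displaced load touches p[48], one byte past the
48-byte block, which is the out-of-bounds concern mentioned above.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 3-way de-interleaving load of 48 bytes, in the spirit of
   ld3 {v0.16b - v2.16b}, [p]: lane[k][i] = p[3*i + k].
   Hypothetical helper, not part of the original report. */
static void model_ld3(const uint8_t *p, uint8_t lane[3][16])
{
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 3; k++)
            lane[k][i] = p[3 * i + k];
}

/* Returns 1 if the load displaced by one byte yields, in its first two
   registers, exactly what the load at p put in its last two registers.
   Reads p[0] .. p[48], i.e. 49 bytes. */
int displaced_load_is_redundant(const uint8_t *p)
{
    uint8_t a[3][16], b[3][16];

    model_ld3(p, a);      /* first ld3, at [x3]                  */
    model_ld3(p + 1, b);  /* second ld3, at [x6], x6 = x3 + 1    */

    /* b[0][i] = p[3*i + 1] = a[1][i] and b[1][i] = p[3*i + 2] = a[2][i] */
    return memcmp(b[0], a[1], 16) == 0 && memcmp(b[1], a[2], 16) == 0;
}
```

Under that model the function returns 1 for any readable 49-byte buffer, since
the equalities in the final comment hold index by index.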
* [Bug tree-optimization/110780] aarch64 NEON redundant displaced ld3
From: pinskia at gcc dot gnu.org @ 2023-07-23 21:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
Component|target |tree-optimization
Last reconfirmed| |2023-07-23
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The vectorizer produces:
  vectp_pBE.28_115 = (unsigned char[48] *) ivtmp.73_338;
  _218 = ivtmp.68_335 + 1;
  vectp_pCSI2.19_107 = (unsigned char[48] *) _218;
  vectp_pCSI2.10_98 = (unsigned char[48] *) ivtmp.68_335;
  vect_array.12 = .LOAD_LANES (MEM <unsigned char[48]> [(uint8_t *)vectp_pCSI2.10_98]);
  vect__1.13_100 = vect_array.12[0];
  vect__1.14_101 = vect_array.12[1];
  vect__1.15_102 = vect_array.12[2];
  vect_array.12 ={v} {CLOBBER};
  vect__4.16_103 = vect__1.14_101 >> 4;
  vect__22.17_104 = vect__1.15_102 << 4;
  vect__5.18_105 = vect__4.16_103 | vect__22.17_104;
  vect_array.21 = .LOAD_LANES (MEM <unsigned char[48]> [(uint8_t *)vectp_pCSI2.19_107]);
  vect__6.22_109 = vect_array.21[0];
  vect__6.23_110 = vect_array.21[1];
  vect_array.21 ={v} {CLOBBER};
Here vect__6.22_109 is the same as vect__1.14_101 and vect__6.23_110 is the
same as vect__1.15_102 (if I did this correctly).
* [Bug tree-optimization/110780] aarch64 NEON redundant displaced ld3
From: rsandifo at gcc dot gnu.org @ 2023-07-24 19:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rsandifo at gcc dot gnu.org
--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I guess the problem is that we can only remove the “redundant” loads once we've
added the versioning check between pBE and pCSI2. We could remove the
redundancy after vectorisation, but it would be nice to do it during, not least
because it might improve costing.
Richard, do you remember an earlier PR for something similar?
We do remove the redundancy if the pointers are marked restrict.
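For reference, a restrict-qualified variant of the test case might look like
the sketch below. The name CSI2toBE12_restrict and the added const qualifiers
are assumptions for illustration, not taken from the report, and the generated
code for this exact signature has not been checked here. The point is that in
the original, each iteration re-reads pCSI2[1] and pCSI2[2] after the store to
pBE[1]; since uint8_t pointers may alias, those second reads are only provably
redundant once the two pointers are known not to overlap.

```c
#include <stdint.h>

/* Same loop body as the reported test case, with C99 restrict on the source
   and destination pointers. Function name is hypothetical. The caller
   promises that the bytes written through pBE are not read through pCSI2. */
void CSI2toBE12_restrict(const uint8_t *restrict pCSI2,
                         uint8_t *restrict pBE,
                         const uint8_t *pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}
```

As a quick sanity check of the arithmetic, the input bytes 0x12, 0x34, 0x56
produce the output bytes 0x12, 0x63, 0x45. Calling this variant with
overlapping buffers is undefined behaviour, so it is not a drop-in replacement
for callers that convert in place.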