public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily
@ 2021-07-24 8:17 glisse at gcc dot gnu.org
2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-24 8:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
Bug ID: 101611
Summary: AVX2 vector arithmetic shift lowered to scalar
unnecessarily
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: glisse at gcc dot gnu.org
Target Milestone: ---
Target: x86_64-*-*
Stealing the example from PR 56873
#define SIZE 32
typedef long long veci __attribute__((vector_size(SIZE)));
veci f(veci a, veci b){
return a>>b;
}
but compiling with -O3 -mavx2 this time, gcc produces scalar code
vmovq %xmm1, %rcx
vmovq %xmm0, %rax
vpextrq $1, %xmm0, %rsi
sarq %cl, %rax
vextracti128 $0x1, %ymm0, %xmm0
vpextrq $1, %xmm1, %rcx
vextracti128 $0x1, %ymm1, %xmm1
movq %rax, %rdx
sarq %cl, %rsi
vmovq %xmm0, %rax
vmovq %xmm1, %rcx
vmovq %rdx, %xmm5
sarq %cl, %rax
vpextrq $1, %xmm1, %rcx
movq %rax, %rdi
vpextrq $1, %xmm0, %rax
vpinsrq $1, %rsi, %xmm5, %xmm0
sarq %cl, %rax
vmovq %rdi, %xmm4
vpinsrq $1, %rax, %xmm4, %xmm1
vinserti128 $0x1, %xmm1, %ymm0, %ymm0
ret
while clang outputs much shorter vector code
vpbroadcastq .LCPI0_0(%rip), %ymm2 # ymm2 =
[9223372036854775808,9223372036854775808,9223372036854775808,9223372036854775808]
vpsrlvq %ymm1, %ymm2, %ymm2
vpsrlvq %ymm1, %ymm0, %ymm0
vpxor %ymm2, %ymm0, %ymm0
vpsubq %ymm2, %ymm0, %ymm0
retq
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
@ 2021-07-24 14:45 ` hjl.tools at gmail dot com
2021-07-24 15:43 ` jakub at gcc dot gnu.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: hjl.tools at gmail dot com @ 2021-07-24 14:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2021-07-24
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
This may be related to PR 101579.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
@ 2021-07-24 15:43 ` jakub at gcc dot gnu.org
2021-07-24 16:25 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-24 15:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
For arithmetic V[24]DImode >> scalar we have PR98856 in GCC 12, but indeed, for
arithmetic V[24]DImode >> V[24]DImode
logical ((x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL >>
y)
can be used.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
2021-07-24 15:43 ` jakub at gcc dot gnu.org
@ 2021-07-24 16:25 ` jakub at gcc dot gnu.org
2021-07-26 12:53 ` jakub at gcc dot gnu.org
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-24 16:25 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
--- gcc/config/i386/sse.md.jj 2021-07-22 12:37:20.439532859 +0200
+++ gcc/config/i386/sse.md 2021-07-24 18:03:07.328126900 +0200
@@ -20499,13 +20499,34 @@ (define_expand "vlshr<mode>3"
(match_operand:VI48_256 2 "nonimmediate_operand")))]
"TARGET_AVX2")
-(define_expand "vashr<mode>3"
- [(set (match_operand:VI8_256_512 0 "register_operand")
- (ashiftrt:VI8_256_512
- (match_operand:VI8_256_512 1 "register_operand")
- (match_operand:VI8_256_512 2 "nonimmediate_operand")))]
+(define_expand "vashrv8di3"
+ [(set (match_operand:V8DI 0 "register_operand")
+ (ashiftrt:V8DI
+ (match_operand:V8DI 1 "register_operand")
+ (match_operand:V8DI 2 "nonimmediate_operand")))]
"TARGET_AVX512F")
+(define_expand "vashrv4di3"
+ [(set (match_operand:V4DI 0 "register_operand")
+ (ashiftrt:V4DI
+ (match_operand:V4DI 1 "register_operand")
+ (match_operand:V4DI 2 "nonimmediate_operand")))]
+ "TARGET_AVX2"
+{
+ if (!TARGET_AVX512VL)
+ {
+ rtx mask = ix86_build_signbit_mask (V4DImode, 1, 0);
+ rtx t1 = gen_reg_rtx (V4DImode);
+ rtx t2 = gen_reg_rtx (V4DImode);
+ rtx t3 = gen_reg_rtx (V4DImode);
+ emit_insn (gen_vlshrv4di3 (t1, operands[1], operands[2]));
+ emit_insn (gen_vlshrv4di3 (t2, mask, operands[2]));
+ emit_insn (gen_xorv4di3 (t3, t1, t2));
+ emit_insn (gen_subv4di3 (operands[0], t3, t2));
+ DONE;
+ }
+})
+
(define_expand "vashr<mode>3"
[(set (match_operand:VI12_128 0 "register_operand")
(ashiftrt:VI12_128
@@ -20527,12 +20548,12 @@ (define_expand "vashr<mode>3"
}
})
-(define_expand "vashrv2di3<mask_name>"
+(define_expand "vashrv2di3"
[(set (match_operand:V2DI 0 "register_operand")
(ashiftrt:V2DI
(match_operand:V2DI 1 "register_operand")
(match_operand:V2DI 2 "nonimmediate_operand")))]
- "TARGET_XOP || TARGET_AVX512VL"
+ "TARGET_XOP || TARGET_AVX2"
{
if (TARGET_XOP)
{
@@ -20541,6 +20562,18 @@ (define_expand "vashrv2di3<mask_name>"
emit_insn (gen_xop_shav2di3 (operands[0], operands[1], neg));
DONE;
}
+ if (!TARGET_AVX512VL)
+ {
+ rtx mask = ix86_build_signbit_mask (V2DImode, 1, 0);
+ rtx t1 = gen_reg_rtx (V2DImode);
+ rtx t2 = gen_reg_rtx (V2DImode);
+ rtx t3 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_vlshrv2di3 (t1, operands[1], operands[2]));
+ emit_insn (gen_vlshrv2di3 (t2, mask, operands[2]));
+ emit_insn (gen_xorv2di3 (t3, t1, t2));
+ emit_insn (gen_subv2di3 (operands[0], t3, t2));
+ DONE;
+ }
})
(define_expand "vashrv4si3"
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (2 preceding siblings ...)
2021-07-24 16:25 ` jakub at gcc dot gnu.org
@ 2021-07-26 12:53 ` jakub at gcc dot gnu.org
2021-07-26 13:10 ` glisse at gcc dot gnu.org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 12:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 51205
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51205&action=edit
gcc12-pr101611.patch
Full untested fix.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (3 preceding siblings ...)
2021-07-26 12:53 ` jakub at gcc dot gnu.org
@ 2021-07-26 13:10 ` glisse at gcc dot gnu.org
2021-07-26 13:26 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-26 13:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #5 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #2)
> for arithmetic V[24]DImode >> V[24]DImode
> logical ((x >> y) ^ (0x8000000000000000ULL >> y)) - (0x8000000000000000ULL
> >> y)
> can be used.
I guess it would be complicated to try and implement this fallback strategy in
a generic way so other modes/targets could benefit.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (4 preceding siblings ...)
2021-07-26 13:10 ` glisse at gcc dot gnu.org
@ 2021-07-26 13:26 ` jakub at gcc dot gnu.org
2021-07-26 14:07 ` glisse at gcc dot gnu.org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 13:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think except for x86 it is very unusual to support logical but not arithmetic
vector right shifts, are you aware of any other target that suffers from these?
Even vector by vector shifts are rare, if my grep doesn't miss anything, only
aarch64, arm, x86, mips, rs6000 and s390 has them. I've grepped tmp-mddump.md
for each of them and except for x86 where we have the known issues I only see
some weird vlshrti3 pattern that doesn't have vashrti3 counterpart, but the
vlshr<mode>3 and vashr<mode>3 optabs AFAIK should be used solely for vector
modes and nothing else.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (5 preceding siblings ...)
2021-07-26 13:26 ` jakub at gcc dot gnu.org
@ 2021-07-26 14:07 ` glisse at gcc dot gnu.org
2021-07-26 14:33 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2021-07-26 14:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #7 from Marc Glisse <glisse at gcc dot gnu.org> ---
The same strategy to implement arithmetic shift in terms of logical shift works
not just for vector>>vector but also vector>>scalar and scalar>>scalar. But it
is probably not worth the trouble indeed, especially since your target patch is
ready :-)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (6 preceding siblings ...)
2021-07-26 14:07 ` glisse at gcc dot gnu.org
@ 2021-07-26 14:33 ` jakub at gcc dot gnu.org
2021-07-28 8:53 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-26 14:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
That is true, but I think even for vector >> scalar and scalar >> scalar shifts
it will be quite rare to support logical and not support arithmetic shifts.
And on x86, as can be seen in the PR98856 changes, yes, this way of expressing
it is possible, but not always the shortest.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (7 preceding siblings ...)
2021-07-26 14:33 ` jakub at gcc dot gnu.org
@ 2021-07-28 8:53 ` cvs-commit at gcc dot gnu.org
2021-07-28 8:54 ` jakub at gcc dot gnu.org
2021-08-04 22:01 ` pinskia at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-28 8:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:88d0f70a326eeb42b479aa537f8a81bf5a199346
commit r12-2557-g88d0f70a326eeb42b479aa537f8a81bf5a199346
Author: Jakub Jelinek <jakub@redhat.com>
Date: Wed Jul 28 10:52:51 2021 +0200
i386: Improve AVX2 expansion of vector >> vector DImode arithm. shifts
[PR101611]
AVX2 introduced vector >> vector shifts, but unfortunately for V{2,4}DImode
it only supports logical and not arithmetic shifts, only AVX512F for
V8DImode or AVX512VL for V{2,4}DImode fixed that omission.
Earlier in GCC12 cycle I've committed vector >> scalar arithmetic shift
emulation using various sequences, this patch handles the vector >> vector
case. No need to adjust costs, the previous cost adjustment actually
covers even the vector by vector shifts.
The patch emits the right arithmetic V{2,4}DImode shifts using 2 logical
right
V{2,4}DImode shifts (once of the original operands, once of sign mask
constant by the vector shift count), xor and subtraction, on each element
(long long) x >> y is done as
(((unsigned long long) x >> y) ^ (0x8000000000000000ULL >> y))
- (0x8000000000000000ULL >> y)
i.e. if x doesn't have in some element the MSB set, it is just the logical
shift, if it does, then the xor and subtraction cause also all higher bits
to be set.
2021-07-28 Jakub Jelinek <jakub@redhat.com>
PR target/101611
* config/i386/sse.md (vashr<mode>3): Split into vashrv8di3 expander
and vashrv4di3 expander, where the latter requires just TARGET_AVX2
and has special !TARGET_AVX512VL expansion.
(vashrv2di3<mask_name>): Rename to ...
(vashrv2di3): ... this. Change condition to TARGET_XOP ||
TARGET_AVX2
and add special !TARGET_XOP && !TARGET_AVX512VL expansion.
* gcc.target/i386/avx2-pr101611-1.c: New test.
* gcc.target/i386/avx2-pr101611-2.c: New test.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (8 preceding siblings ...)
2021-07-28 8:53 ` cvs-commit at gcc dot gnu.org
@ 2021-07-28 8:54 ` jakub at gcc dot gnu.org
2021-08-04 22:01 ` pinskia at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2021-07-28 8:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/101611] AVX2 vector arithmetic shift lowered to scalar unnecessarily
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
` (9 preceding siblings ...)
2021-07-28 8:54 ` jakub at gcc dot gnu.org
@ 2021-08-04 22:01 ` pinskia at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-04 22:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101611
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-08-04 22:01 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-24 8:17 [Bug target/101611] New: AVX2 vector arithmetic shift lowered to scalar unnecessarily glisse at gcc dot gnu.org
2021-07-24 14:45 ` [Bug target/101611] " hjl.tools at gmail dot com
2021-07-24 15:43 ` jakub at gcc dot gnu.org
2021-07-24 16:25 ` jakub at gcc dot gnu.org
2021-07-26 12:53 ` jakub at gcc dot gnu.org
2021-07-26 13:10 ` glisse at gcc dot gnu.org
2021-07-26 13:26 ` jakub at gcc dot gnu.org
2021-07-26 14:07 ` glisse at gcc dot gnu.org
2021-07-26 14:33 ` jakub at gcc dot gnu.org
2021-07-28 8:53 ` cvs-commit at gcc dot gnu.org
2021-07-28 8:54 ` jakub at gcc dot gnu.org
2021-08-04 22:01 ` pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).