* [PATCH 0/3][x86] Enable pass_late_combine for x86.
@ 2024-06-28 5:27 liuhongt
2024-06-28 5:27 ` [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables liuhongt
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: liuhongt @ 2024-06-28 5:27 UTC (permalink / raw)
To: gcc-patches; +Cc: ubizjak
Because of the issue described in PR115610, late_combine is disabled by
default.The series try to solve the regressions and enable late_combine.
There're 4 regressions observed.
1. The first one is related to pass_stv2, because late_combine will restore
transformation did in the pass. Move the pass after pass_late_combine can
solve the issue.
2. The second one is related to pass_rpad, both pre_reload and post_reload
late_combine would restore the transformation. So besides moving pass_rpad
after pre_reload late_combine, target_insn_cost is defined to prevent
post_reload pass_late_combine to revert the optimziation did in pass_rpad.
3. The third one is related to avx512 kmask, lshirt + zero_extend are combined
into *<insn>si3_zext which doesn't support k alternative, and an extra move
between GPR and KMASK and regressed
gcc.target/i386/zero_extendkmask.c scan-assembler-not (?n)shr[bwl],
the solution is extending the pattern to ?k alternative just like what we did
before for other patterns.
4. The fourth one is fake, it's because pass_late_combine generates better code but
break scan assembly.
.i.e
Under 32-bit target, gcc used to generate broadcast from stack and
then do the real operation.
After enabling flate_combine, they're combined into embeded broadcast
operations.
Tested with SPEC2017, flate_combine reduces codesize by ~0.6%, which means
there're lots of small improvements.
Bootstrapped and regtested on x86_64-pc-linu-gnu{-m32,}.
Ok for trunk?
liuhongt (3):
[avx512 testsuite] Define mask as extern instead of uninitialized
local variables.
Extend lshifrtsi3_1_zext to ?k alternative.
[x86] Enable flate-combine.
gcc/config/i386/i386-features.cc | 16 +++++++----
gcc/config/i386/i386-options.cc | 4 ---
gcc/config/i386/i386-passes.def | 4 +--
gcc/config/i386/i386-protos.h | 1 +
gcc/config/i386/i386.cc | 18 ++++++++++++
gcc/config/i386/i386.md | 19 +++++++++----
gcc/config/i386/sse.md | 28 +++++++++++++++++++
.../gcc.target/i386/avx512bitalg-vpopcntb.c | 3 +-
.../gcc.target/i386/avx512bitalg-vpopcntbvl.c | 4 +--
.../gcc.target/i386/avx512bitalg-vpopcntw.c | 2 +-
.../gcc.target/i386/avx512bitalg-vpopcntwvl.c | 4 +--
.../i386/avx512f-broadcast-pr87767-1.c | 4 +--
.../i386/avx512f-broadcast-pr87767-5.c | 1 -
.../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c | 2 +-
.../i386/avx512vl-broadcast-pr87767-1.c | 4 +--
.../i386/avx512vl-broadcast-pr87767-5.c | 2 --
.../i386/avx512vpopcntdq-vpopcntd.c | 5 ++--
.../i386/avx512vpopcntdq-vpopcntq.c | 2 +-
gcc/testsuite/gcc.target/i386/pr91333.c | 2 +-
.../gcc.target/i386/vect-strided-4.c | 2 +-
23 files changed, 93 insertions(+), 40 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables.
2024-06-28 5:27 [PATCH 0/3][x86] Enable pass_late_combine for x86 liuhongt
@ 2024-06-28 5:27 ` liuhongt
2024-06-28 5:27 ` [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative liuhongt
2024-06-28 5:27 ` [PATCH 3/3] [x86] Enable flate-combine liuhongt
2 siblings, 0 replies; 6+ messages in thread
From: liuhongt @ 2024-06-28 5:27 UTC (permalink / raw)
To: gcc-patches; +Cc: ubizjak
The testcases are supposed to scan for vpopcnt{b,w,d,q} operations
with k mask, but mask is defined as uninitialized local variable which
will be set as 0 at rtl expand phase.
And it's further simplified off by late_combine which caused scan assembly failure.
Move the definition of mask outside to make the testcases more stable.
gcc/testsuite/ChangeLog:
PR target/115610
* gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as
extern instead of uninitialized local variables.
* gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
* gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto.
* gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.
---
gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c | 3 +--
gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c | 4 ++--
gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c | 2 +-
gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c | 4 ++--
gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c | 5 +++--
gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c | 2 +-
6 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
index 44b82c0519d..66d24107c26 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c
@@ -7,10 +7,9 @@
#include <x86intrin.h>
extern __m512i z, z1;
-
+extern __mmask16 msk;
int foo ()
{
- __mmask16 msk;
__m512i c = _mm512_popcnt_epi8 (z);
asm volatile ("" : "+v" (c));
c = _mm512_mask_popcnt_epi8 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
index 8c2dfaba9c6..8ab05653f7c 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c
@@ -11,11 +11,11 @@
extern __m256i y, y_1;
extern __m128i x, x_1;
+extern __mmask32 msk32;
+extern __mmask16 msk16;
int foo ()
{
- __mmask32 msk32;
- __mmask16 msk16;
__m256i c256 = _mm256_popcnt_epi8 (y);
asm volatile ("" : "+v" (c256));
c256 = _mm256_mask_popcnt_epi8 (y_1, msk32, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
index 2ef8589f6c1..c741bf48a51 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c
@@ -7,10 +7,10 @@
#include <x86intrin.h>
extern __m512i z, z1;
+extern __mmask16 msk;
int foo ()
{
- __mmask16 msk;
__m512i c = _mm512_popcnt_epi16 (z);
asm volatile ("" : "+v" (c));
c = _mm512_mask_popcnt_epi16 (z1, msk, z);
diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
index c976461b12e..79bb3c31e85 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c
@@ -11,11 +11,11 @@
extern __m256i y, y_1;
extern __m128i x, x_1;
+extern __mmask16 msk16;
+extern __mmask8 msk8;
int foo ()
{
- __mmask16 msk16;
- __mmask8 msk8;
__m256i c256 = _mm256_popcnt_epi16 (y);
asm volatile ("" : "+v" (c256));
c256 = _mm256_mask_popcnt_epi16 (y_1, msk16, y);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
index b4d82f97032..776a4753d8e 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c
@@ -15,11 +15,12 @@
extern __m128i x, x_1;
extern __m256i y, y_1;
extern __m512i z, z_1;
+extern __mmask16 msk;
+extern __mmask8 msk8;
+
int foo ()
{
- __mmask16 msk;
- __mmask8 msk8;
__m128i a = _mm_popcnt_epi32 (x);
asm volatile ("" : "+v" (a));
a = _mm_mask_popcnt_epi32 (x_1, msk8, x);
diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
index e87d6c999b6..c6314ac5deb 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c
@@ -15,10 +15,10 @@
extern __m128i x, x_1;
extern __m256i y, y_1;
extern __m512i z, z_1;
+extern __mmask8 msk;
int foo ()
{
- __mmask8 msk;
__m128i a = _mm_popcnt_epi64 (x);
asm volatile ("" : "+v" (a));
a = _mm_mask_popcnt_epi64 (x_1, msk, x);
--
2.31.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.
2024-06-28 5:27 [PATCH 0/3][x86] Enable pass_late_combine for x86 liuhongt
2024-06-28 5:27 ` [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables liuhongt
@ 2024-06-28 5:27 ` liuhongt
2024-06-28 5:59 ` Uros Bizjak
2024-06-28 5:27 ` [PATCH 3/3] [x86] Enable flate-combine liuhongt
2 siblings, 1 reply; 6+ messages in thread
From: liuhongt @ 2024-06-28 5:27 UTC (permalink / raw)
To: gcc-patches; +Cc: ubizjak
late_combine will combine lshift + zero into *lshifrtsi3_1_zext which
cause extra mov between gpr and kmask, add ?k to the pattern.
gcc/ChangeLog:
PR target/115610
* config/i386/i386.md (<*insnsi3_zext): Add alternative ?k,
enable it only for lshiftrt and under avx512bw.
* config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
add corresponding define_split after it.
---
gcc/config/i386/i386.md | 19 +++++++++++++------
gcc/config/i386/sse.md | 28 ++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index fd48e764469..57a10c1af48 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -16836,10 +16836,10 @@ (define_insn "*bmi2_<insn>si3_1_zext"
(set_attr "mode" "SI")])
(define_insn "*<insn>si3_1_zext"
- [(set (match_operand:DI 0 "register_operand" "=r,r,r")
+ [(set (match_operand:DI 0 "register_operand" "=r,r,r,?k")
(zero_extend:DI
- (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
- (match_operand:QI 2 "nonmemory_operand" "cI,r,cI"))))
+ (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm,k")
+ (match_operand:QI 2 "nonmemory_operand" "cI,r,cI,I"))))
(clobber (reg:CC FLAGS_REG))]
"TARGET_64BIT
&& ix86_binary_operator_ok (<CODE>, SImode, operands, TARGET_APX_NDD)"
@@ -16850,6 +16850,8 @@ (define_insn "*<insn>si3_1_zext"
case TYPE_ISHIFTX:
return "#";
+ case TYPE_MSKLOG:
+ return "#";
default:
if (operands[2] == const1_rtx
&& (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
@@ -16860,8 +16862,8 @@ (define_insn "*<insn>si3_1_zext"
: "<shift>{l}\t{%2, %k0|%k0, %2}";
}
}
- [(set_attr "isa" "*,bmi2,apx_ndd")
- (set_attr "type" "ishift,ishiftx,ishift")
+ [(set_attr "isa" "*,bmi2,apx_ndd,avx512bw")
+ (set_attr "type" "ishift,ishiftx,ishift,msklog")
(set (attr "length_immediate")
(if_then_else
(and (match_operand 2 "const1_operand")
@@ -16869,7 +16871,12 @@ (define_insn "*<insn>si3_1_zext"
(match_test "optimize_function_for_size_p (cfun)")))
(const_string "0")
(const_string "*")))
- (set_attr "mode" "SI")])
+ (set_attr "mode" "SI")
+ (set (attr "enabled")
+ (if_then_else
+ (eq_attr "alternative" "3")
+ (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512BW")
+ (const_string "*")))])
;; Convert shift to the shiftx pattern to avoid flags dependency.
(define_split
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0be2dcd8891..20665a6f097 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2179,6 +2179,34 @@ (define_split
(match_dup 2)))
(unspec [(const_int 0)] UNSPEC_MASKOP)])])
+(define_insn "*klshrsi3_1_zext"
+ [(set (match_operand:DI 0 "register_operand" "=k")
+ (zero_extend:DI
+ (lshiftrt:SI (match_operand:SI 1 "register_operand" "k")
+ (match_operand 2 "const_0_to_31_operand" "I"))))
+ (unspec [(const_int 0)] UNSPEC_MASKOP)]
+ "TARGET_AVX512BW"
+ "kshiftrd\t{%2, %1, %0|%0, %1, %2}"
+ [(set_attr "type" "msklog")
+ (set_attr "prefix" "vex")
+ (set_attr "mode" "SI")])
+
+(define_split
+ [(set (match_operand:DI 0 "mask_reg_operand")
+ (zero_extend:DI
+ (lshiftrt:SI
+ (match_operand:SI 1 "mask_reg_operand")
+ (match_operand 2 "const_0_to_31_operand"))))
+ (clobber (reg:CC FLAGS_REG))]
+ "TARGET_AVX512BW && reload_completed"
+ [(parallel
+ [(set (match_dup 0)
+ (zero_extend:DI
+ (lshiftrt:SI
+ (match_dup 1)
+ (match_dup 2))))
+ (unspec [(const_int 0)] UNSPEC_MASKOP)])])
+
(define_insn "ktest<mode>"
[(set (reg:CC FLAGS_REG)
(unspec:CC
--
2.31.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 3/3] [x86] Enable flate-combine.
2024-06-28 5:27 [PATCH 0/3][x86] Enable pass_late_combine for x86 liuhongt
2024-06-28 5:27 ` [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables liuhongt
2024-06-28 5:27 ` [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative liuhongt
@ 2024-06-28 5:27 ` liuhongt
2024-06-28 6:03 ` Uros Bizjak
2 siblings, 1 reply; 6+ messages in thread
From: liuhongt @ 2024-06-28 5:27 UTC (permalink / raw)
To: gcc-patches; +Cc: ubizjak
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
define target_insn_cost to prevent post_reload pass_late_combine to
revert the optimziation did in pass_rpad.
Adjust testcases since pass_late_combine generates better code but
break scan assembly.
.i.e
Under 32-bit target, gcc used to generate broadcast from stack and
then do the real operation.
After flate_combine, they're combined into embeded broadcast
operations.
gcc/ChangeLog:
* config/i386/i386-features.cc (ix86_rpad_gate): New function.
* config/i386/i386-options.cc (ix86_override_options_after_change):
Don't disable flate_combine.
* config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
after pre_reload pas_late_combine.
* config/i386/i386-protos.h (ix86_rpad_gate): New declare.
* config/i386/i386.cc (ix86_insn_cost): New function.
(TARGET_INSN_COST): Define.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus
testcase.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
* gcc.target/i386/pr91333.c: Ditto.
* gcc.target/i386/vect-strided-4.c: Ditto.
---
gcc/config/i386/i386-features.cc | 16 +++++++++++-----
gcc/config/i386/i386-options.cc | 4 ----
gcc/config/i386/i386-passes.def | 4 ++--
gcc/config/i386/i386-protos.h | 1 +
gcc/config/i386/i386.cc | 18 ++++++++++++++++++
.../i386/avx512f-broadcast-pr87767-1.c | 4 ++--
.../i386/avx512f-broadcast-pr87767-5.c | 1 -
.../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c | 2 +-
.../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c | 2 +-
.../i386/avx512vl-broadcast-pr87767-1.c | 4 ++--
.../i386/avx512vl-broadcast-pr87767-5.c | 2 --
gcc/testsuite/gcc.target/i386/pr91333.c | 2 +-
gcc/testsuite/gcc.target/i386/vect-strided-4.c | 2 +-
15 files changed, 42 insertions(+), 24 deletions(-)
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 607d1991460..fc224ed06b0 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -2995,6 +2995,16 @@ make_pass_insert_endbr_and_patchable_area (gcc::context *ctxt)
return new pass_insert_endbr_and_patchable_area (ctxt);
}
+bool
+ix86_rpad_gate ()
+{
+ return (TARGET_AVX
+ && TARGET_SSE_PARTIAL_REG_DEPENDENCY
+ && TARGET_SSE_MATH
+ && optimize
+ && optimize_function_for_speed_p (cfun));
+}
+
/* At entry of the nearest common dominator for basic blocks with
conversions/rcp/sqrt/rsqrt/round, generate a single
vxorps %xmmN, %xmmN, %xmmN
@@ -3232,11 +3242,7 @@ public:
/* opt_pass methods: */
bool gate (function *) final override
{
- return (TARGET_AVX
- && TARGET_SSE_PARTIAL_REG_DEPENDENCY
- && TARGET_SSE_MATH
- && optimize
- && optimize_function_for_speed_p (cfun));
+ return ix86_rpad_gate ();
}
unsigned int execute (function *) final override
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 9c12d498928..1ef2c71a7a2 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -1944,10 +1944,6 @@ ix86_override_options_after_change (void)
flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
}
- /* Late combine tends to undo some of the effects of STV and RPAD,
- by combining instructions back to their original form. */
- if (!OPTION_SET_P (flag_late_combine_instructions))
- flag_late_combine_instructions = 0;
}
/* Clear stack slot assignments remembered from previous functions.
diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
index 7d96766f7b9..2d29f65da88 100644
--- a/gcc/config/i386/i386-passes.def
+++ b/gcc/config/i386/i386-passes.def
@@ -25,11 +25,11 @@ along with GCC; see the file COPYING3. If not see
*/
INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
- INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
+ INSERT_PASS_AFTER (pass_late_combine, 1, pass_stv, false /* timode_p */);
/* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
CONSTM1_RTX generated by the STV pass can be CSEed. */
INSERT_PASS_BEFORE (pass_cse2, 1, pass_stv, true /* timode_p */);
INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area);
- INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency);
+ INSERT_PASS_AFTER (pass_late_combine, 1, pass_remove_partial_avx_dependency);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 4f48dc0bf75..3dbd18dc70b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -422,6 +422,7 @@ extern rtl_opt_pass *make_pass_remove_partial_avx_dependency
(gcc::context *);
extern bool ix86_has_no_direct_extern_access;
+extern bool ix86_rpad_gate ();
/* In i386-expand.cc. */
bool ix86_check_builtin_isa_match (unsigned int, HOST_WIDE_INT*,
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 1f71ed04be6..9d2b7d1f174 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21371,6 +21371,22 @@ ix86_shift_rotate_cost (const struct processor_costs *cost,
}
}
+static int
+ix86_insn_cost (rtx_insn *insn, bool speed)
+{
+ int insn_cost = 0;
+ /* Add extra cost to avoid post_reload late_combine revert
+ the optimization did in pass_rpad. */
+ if (reload_completed
+ && ix86_rpad_gate ()
+ && recog_memoized (insn) >= 0
+ && get_attr_avx_partial_xmm_update (insn)
+ == AVX_PARTIAL_XMM_UPDATE_TRUE)
+ insn_cost += COSTS_N_INSNS (3);
+
+ return insn_cost + pattern_cost (PATTERN (insn), speed);
+}
+
/* Compute a (partial) cost for rtx X. Return true if the complete
cost has been computed, and false if subexpressions should be
scanned. In either case, *TOTAL contains the cost result. */
@@ -26514,6 +26530,8 @@ static const scoped_attribute_specs *const ix86_attribute_table[] =
#define TARGET_MEMORY_MOVE_COST ix86_memory_move_cost
#undef TARGET_RTX_COSTS
#define TARGET_RTX_COSTS ix86_rtx_costs
+#undef TARGET_INSN_COST
+#define TARGET_INSN_COST ix86_insn_cost
#undef TARGET_ADDRESS_COST
#define TARGET_ADDRESS_COST ix86_address_cost
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
index 138dbb4c973..3a50749e610 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
@@ -3,8 +3,8 @@
/* { dg-options "-O2 -mavx512f -mavx512dq" } */
/* { dg-additional-options "-fno-PIE" { target ia32 } } */
/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */
/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
index d22251bc2a3..ea2f64861d0 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
@@ -3,7 +3,6 @@
/* { dg-options "-O2 -mavx512f" } */
/* { dg-additional-options "-fno-PIE" { target ia32 } } */
/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to8\\\}" { target ia32 } } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */
/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
index 8c117207efa..bbcc5ed0bec 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
#define type __m512
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
index cc705af8ea5..fc72dd6e557 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
#define type __m512
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
index db5c34678c0..342de482da8 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfnmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
#define type __m512
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
index 7815251b82d..f56a3f8acc4 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vfnmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
#define type __m512
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
index e6df4d25f36..08898445be5 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
@@ -3,8 +3,8 @@
/* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
/* { dg-additional-options "-fno-PIE" { target ia32 } } */
/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 } } */
-/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
index ebdc3619d8e..c57a2e29767 100644
--- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
@@ -3,8 +3,6 @@
/* { dg-options "-O2 -mavx512f -mavx512vl" } */
/* { dg-additional-options "-fno-PIE" { target ia32 } } */
/* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
-/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to2\\\}" { target ia32 } } } */
-/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to4\\\}" { target ia32 } } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */
/* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c b/gcc/testsuite/gcc.target/i386/pr91333.c
index 2bdff871024..b4940b5c9ec 100644
--- a/gcc/testsuite/gcc.target/i386/pr91333.c
+++ b/gcc/testsuite/gcc.target/i386/pr91333.c
@@ -1,6 +1,6 @@
/* { dg-do compile { target { ! ia32 } } } */
/* { dg-options "-O2 -mavx" } */
-/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 3 } } */
+/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 2 } } */
static inline double g (double x){
asm volatile ("" : "+x" (x));
diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-4.c b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
index dd922926a2a..3fb9f07886e 100644
--- a/gcc/testsuite/gcc.target/i386/vect-strided-4.c
+++ b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
@@ -15,6 +15,6 @@ void foo (int * __restrict a, int * __restrict b, int *c, int s)
/* Vectorization factor two, two two-element stores to a using movq
and two two-element stores to b via pextrq/movhps of the high part. */
-/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-times "movq\[\t ]+%xmm\[0-9]" 2 } } */
/* { dg-final { scan-assembler-times "pextrq" 2 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "movhps" 2 { target { ia32 } } } } */
--
2.31.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative.
2024-06-28 5:27 ` [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative liuhongt
@ 2024-06-28 5:59 ` Uros Bizjak
0 siblings, 0 replies; 6+ messages in thread
From: Uros Bizjak @ 2024-06-28 5:59 UTC (permalink / raw)
To: liuhongt; +Cc: gcc-patches
On Fri, Jun 28, 2024 at 7:29 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> late_combine will combine lshift + zero into *lshifrtsi3_1_zext which
> cause extra mov between gpr and kmask, add ?k to the pattern.
>
> gcc/ChangeLog:
>
> PR target/115610
> * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k,
> enable it only for lshiftrt and under avx512bw.
> * config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and
> add corresponding define_split after it.
OK.
Thanks,
Uros.
> ---
> gcc/config/i386/i386.md | 19 +++++++++++++------
> gcc/config/i386/sse.md | 28 ++++++++++++++++++++++++++++
> 2 files changed, 41 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index fd48e764469..57a10c1af48 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -16836,10 +16836,10 @@ (define_insn "*bmi2_<insn>si3_1_zext"
> (set_attr "mode" "SI")])
>
> (define_insn "*<insn>si3_1_zext"
> - [(set (match_operand:DI 0 "register_operand" "=r,r,r")
> + [(set (match_operand:DI 0 "register_operand" "=r,r,r,?k")
> (zero_extend:DI
> - (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm")
> - (match_operand:QI 2 "nonmemory_operand" "cI,r,cI"))))
> + (any_shiftrt:SI (match_operand:SI 1 "nonimmediate_operand" "0,rm,rm,k")
> + (match_operand:QI 2 "nonmemory_operand" "cI,r,cI,I"))))
> (clobber (reg:CC FLAGS_REG))]
> "TARGET_64BIT
> && ix86_binary_operator_ok (<CODE>, SImode, operands, TARGET_APX_NDD)"
> @@ -16850,6 +16850,8 @@ (define_insn "*<insn>si3_1_zext"
> case TYPE_ISHIFTX:
> return "#";
>
> + case TYPE_MSKLOG:
> + return "#";
> default:
> if (operands[2] == const1_rtx
> && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun))
> @@ -16860,8 +16862,8 @@ (define_insn "*<insn>si3_1_zext"
> : "<shift>{l}\t{%2, %k0|%k0, %2}";
> }
> }
> - [(set_attr "isa" "*,bmi2,apx_ndd")
> - (set_attr "type" "ishift,ishiftx,ishift")
> + [(set_attr "isa" "*,bmi2,apx_ndd,avx512bw")
> + (set_attr "type" "ishift,ishiftx,ishift,msklog")
> (set (attr "length_immediate")
> (if_then_else
> (and (match_operand 2 "const1_operand")
> @@ -16869,7 +16871,12 @@ (define_insn "*<insn>si3_1_zext"
> (match_test "optimize_function_for_size_p (cfun)")))
> (const_string "0")
> (const_string "*")))
> - (set_attr "mode" "SI")])
> + (set_attr "mode" "SI")
> + (set (attr "enabled")
> + (if_then_else
> + (eq_attr "alternative" "3")
> + (symbol_ref "<CODE> == LSHIFTRT && TARGET_AVX512BW")
> + (const_string "*")))])
>
> ;; Convert shift to the shiftx pattern to avoid flags dependency.
> (define_split
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 0be2dcd8891..20665a6f097 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -2179,6 +2179,34 @@ (define_split
> (match_dup 2)))
> (unspec [(const_int 0)] UNSPEC_MASKOP)])])
>
> +(define_insn "*klshrsi3_1_zext"
> + [(set (match_operand:DI 0 "register_operand" "=k")
> + (zero_extend:DI
> + (lshiftrt:SI (match_operand:SI 1 "register_operand" "k")
> + (match_operand 2 "const_0_to_31_operand" "I"))))
> + (unspec [(const_int 0)] UNSPEC_MASKOP)]
> + "TARGET_AVX512BW"
> + "kshiftrd\t{%2, %1, %0|%0, %1, %2}"
> + [(set_attr "type" "msklog")
> + (set_attr "prefix" "vex")
> + (set_attr "mode" "SI")])
> +
> +(define_split
> + [(set (match_operand:DI 0 "mask_reg_operand")
> + (zero_extend:DI
> + (lshiftrt:SI
> + (match_operand:SI 1 "mask_reg_operand")
> + (match_operand 2 "const_0_to_31_operand"))))
> + (clobber (reg:CC FLAGS_REG))]
> + "TARGET_AVX512BW && reload_completed"
> + [(parallel
> + [(set (match_dup 0)
> + (zero_extend:DI
> + (lshiftrt:SI
> + (match_dup 1)
> + (match_dup 2))))
> + (unspec [(const_int 0)] UNSPEC_MASKOP)])])
> +
> (define_insn "ktest<mode>"
> [(set (reg:CC FLAGS_REG)
> (unspec:CC
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 3/3] [x86] Enable flate-combine.
2024-06-28 5:27 ` [PATCH 3/3] [x86] Enable flate-combine liuhongt
@ 2024-06-28 6:03 ` Uros Bizjak
0 siblings, 0 replies; 6+ messages in thread
From: Uros Bizjak @ 2024-06-28 6:03 UTC (permalink / raw)
To: liuhongt; +Cc: gcc-patches
On Fri, Jun 28, 2024 at 7:29 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also
> define target_insn_cost to prevent post_reload pass_late_combine to
> revert the optimziation did in pass_rpad.
>
> Adjust testcases since pass_late_combine generates better code but
> break scan assembly.
>
> .i.e
> Under 32-bit target, gcc used to generate broadcast from stack and
> then do the real operation.
> After flate_combine, they're combined into embeded broadcast
> operations.
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc (ix86_rpad_gate): New function.
> * config/i386/i386-options.cc (ix86_override_options_after_change):
> Don't disable flate_combine.
> * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad
> after pre_reload pas_late_combine.
> * config/i386/i386-protos.h (ix86_rpad_gate): New declare.
> * config/i386/i386.cc (ix86_insn_cost): New function.
> (TARGET_INSN_COST): Define.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjus
> testcase.
> * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
> * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
> * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
> * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
> * gcc.target/i386/pr91333.c: Ditto.
> * gcc.target/i386/vect-strided-4.c: Ditto.
LGTM.
Thanks,
Uros.
> ---
> gcc/config/i386/i386-features.cc | 16 +++++++++++-----
> gcc/config/i386/i386-options.cc | 4 ----
> gcc/config/i386/i386-passes.def | 4 ++--
> gcc/config/i386/i386-protos.h | 1 +
> gcc/config/i386/i386.cc | 18 ++++++++++++++++++
> .../i386/avx512f-broadcast-pr87767-1.c | 4 ++--
> .../i386/avx512f-broadcast-pr87767-5.c | 1 -
> .../gcc.target/i386/avx512f-fmadd-sf-zmm-7.c | 2 +-
> .../gcc.target/i386/avx512f-fmsub-sf-zmm-7.c | 2 +-
> .../gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c | 2 +-
> .../gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c | 2 +-
> .../i386/avx512vl-broadcast-pr87767-1.c | 4 ++--
> .../i386/avx512vl-broadcast-pr87767-5.c | 2 --
> gcc/testsuite/gcc.target/i386/pr91333.c | 2 +-
> gcc/testsuite/gcc.target/i386/vect-strided-4.c | 2 +-
> 15 files changed, 42 insertions(+), 24 deletions(-)
>
> diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> index 607d1991460..fc224ed06b0 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -2995,6 +2995,16 @@ make_pass_insert_endbr_and_patchable_area (gcc::context *ctxt)
> return new pass_insert_endbr_and_patchable_area (ctxt);
> }
>
> +bool
> +ix86_rpad_gate ()
> +{
> + return (TARGET_AVX
> + && TARGET_SSE_PARTIAL_REG_DEPENDENCY
> + && TARGET_SSE_MATH
> + && optimize
> + && optimize_function_for_speed_p (cfun));
> +}
> +
> /* At entry of the nearest common dominator for basic blocks with
> conversions/rcp/sqrt/rsqrt/round, generate a single
> vxorps %xmmN, %xmmN, %xmmN
> @@ -3232,11 +3242,7 @@ public:
> /* opt_pass methods: */
> bool gate (function *) final override
> {
> - return (TARGET_AVX
> - && TARGET_SSE_PARTIAL_REG_DEPENDENCY
> - && TARGET_SSE_MATH
> - && optimize
> - && optimize_function_for_speed_p (cfun));
> + return ix86_rpad_gate ();
> }
>
> unsigned int execute (function *) final override
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 9c12d498928..1ef2c71a7a2 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1944,10 +1944,6 @@ ix86_override_options_after_change (void)
> flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
> }
>
> - /* Late combine tends to undo some of the effects of STV and RPAD,
> - by combining instructions back to their original form. */
> - if (!OPTION_SET_P (flag_late_combine_instructions))
> - flag_late_combine_instructions = 0;
> }
>
> /* Clear stack slot assignments remembered from previous functions.
> diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
> index 7d96766f7b9..2d29f65da88 100644
> --- a/gcc/config/i386/i386-passes.def
> +++ b/gcc/config/i386/i386-passes.def
> @@ -25,11 +25,11 @@ along with GCC; see the file COPYING3. If not see
> */
>
> INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
> - INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
> + INSERT_PASS_AFTER (pass_late_combine, 1, pass_stv, false /* timode_p */);
> /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
> CONSTM1_RTX generated by the STV pass can be CSEed. */
> INSERT_PASS_BEFORE (pass_cse2, 1, pass_stv, true /* timode_p */);
>
> INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbr_and_patchable_area);
>
> - INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency);
> + INSERT_PASS_AFTER (pass_late_combine, 1, pass_remove_partial_avx_dependency);
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 4f48dc0bf75..3dbd18dc70b 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -422,6 +422,7 @@ extern rtl_opt_pass *make_pass_remove_partial_avx_dependency
> (gcc::context *);
>
> extern bool ix86_has_no_direct_extern_access;
> +extern bool ix86_rpad_gate ();
>
> /* In i386-expand.cc. */
> bool ix86_check_builtin_isa_match (unsigned int, HOST_WIDE_INT*,
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1f71ed04be6..9d2b7d1f174 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21371,6 +21371,22 @@ ix86_shift_rotate_cost (const struct processor_costs *cost,
> }
> }
>
> +static int
> +ix86_insn_cost (rtx_insn *insn, bool speed)
> +{
> + int insn_cost = 0;
> + /* Add extra cost to avoid post_reload late_combine revert
> + the optimization did in pass_rpad. */
> + if (reload_completed
> + && ix86_rpad_gate ()
> + && recog_memoized (insn) >= 0
> + && get_attr_avx_partial_xmm_update (insn)
> + == AVX_PARTIAL_XMM_UPDATE_TRUE)
> + insn_cost += COSTS_N_INSNS (3);
> +
> + return insn_cost + pattern_cost (PATTERN (insn), speed);
> +}
> +
> /* Compute a (partial) cost for rtx X. Return true if the complete
> cost has been computed, and false if subexpressions should be
> scanned. In either case, *TOTAL contains the cost result. */
> @@ -26514,6 +26530,8 @@ static const scoped_attribute_specs *const ix86_attribute_table[] =
> #define TARGET_MEMORY_MOVE_COST ix86_memory_move_cost
> #undef TARGET_RTX_COSTS
> #define TARGET_RTX_COSTS ix86_rtx_costs
> +#undef TARGET_INSN_COST
> +#define TARGET_INSN_COST ix86_insn_cost
> #undef TARGET_ADDRESS_COST
> #define TARGET_ADDRESS_COST ix86_address_cost
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
> index 138dbb4c973..3a50749e610 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-1.c
> @@ -3,8 +3,8 @@
> /* { dg-options "-O2 -mavx512f -mavx512dq" } */
> /* { dg-additional-options "-fno-PIE" { target ia32 } } */
> /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
> -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */
> -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */
> +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to16\\\}" 2 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 3 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 3 { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
> index d22251bc2a3..ea2f64861d0 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-broadcast-pr87767-5.c
> @@ -3,7 +3,6 @@
> /* { dg-options "-O2 -mavx512f" } */
> /* { dg-additional-options "-fno-PIE" { target ia32 } } */
> /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
> -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to8\\\}" { target ia32 } } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %zmm\[0-9\]+" 4 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %zmm\[0-9\]+" 4 { target { ! ia32 } } } } */
>
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
> index 8c117207efa..bbcc5ed0bec 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmadd-sf-zmm-7.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile } */
> /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "vfmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
>
> #define type __m512
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
> index cc705af8ea5..fc72dd6e557 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fmsub-sf-zmm-7.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile } */
> /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "vfmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
>
> #define type __m512
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
> index db5c34678c0..342de482da8 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile } */
> /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "vfnmadd...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
>
> #define type __m512
> diff --git a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
> index 7815251b82d..f56a3f8acc4 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile } */
> /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vbroadcastss\[^\n\]*%zmm\[0-9\]+" 1 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "vfnmsub...ps\[^\n\]*%zmm\[0-9\]+" 1 } } */
>
> #define type __m512
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
> index e6df4d25f36..08898445be5 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-1.c
> @@ -3,8 +3,8 @@
> /* { dg-options "-O2 -mavx512f -mavx512vl -mavx512dq" } */
> /* { dg-additional-options "-fno-PIE" { target ia32 } } */
> /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
> -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 } } */
> -/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 } } */
> +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to2\\\}" 2 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to4\\\}" 4 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "\[^\n\]*\\\{1to8\\\}" 2 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 3 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 3 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
> index ebdc3619d8e..c57a2e29767 100644
> --- a/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512vl-broadcast-pr87767-5.c
> @@ -3,8 +3,6 @@
> /* { dg-options "-O2 -mavx512f -mavx512vl" } */
> /* { dg-additional-options "-fno-PIE" { target ia32 } } */
> /* { dg-additional-options "-mdynamic-no-pic" { target { *-*-darwin* && ia32 } } }
> -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to2\\\}" { target ia32 } } } */
> -/* { dg-final { scan-assembler-not "\[^\n\]*\\\{1to4\\\}" { target ia32 } } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %xmm\[0-9\]+" 4 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%(?:r|e)\[^\n\]*, %ymm\[0-9\]+" 4 } } */
> /* { dg-final { scan-assembler-times "vpbroadcastq\[\\t \]+%r\[^\n\]*, %xmm\[0-9\]+" 4 { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr91333.c b/gcc/testsuite/gcc.target/i386/pr91333.c
> index 2bdff871024..b4940b5c9ec 100644
> --- a/gcc/testsuite/gcc.target/i386/pr91333.c
> +++ b/gcc/testsuite/gcc.target/i386/pr91333.c
> @@ -1,6 +1,6 @@
> /* { dg-do compile { target { ! ia32 } } } */
> /* { dg-options "-O2 -mavx" } */
> -/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 3 } } */
> +/* { dg-final { scan-assembler-times "vmovapd|vmovsd" 2 } } */
>
> static inline double g (double x){
> asm volatile ("" : "+x" (x));
> diff --git a/gcc/testsuite/gcc.target/i386/vect-strided-4.c b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
> index dd922926a2a..3fb9f07886e 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-strided-4.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-strided-4.c
> @@ -15,6 +15,6 @@ void foo (int * __restrict a, int * __restrict b, int *c, int s)
>
> /* Vectorization factor two, two two-element stores to a using movq
> and two two-element stores to b via pextrq/movhps of the high part. */
> -/* { dg-final { scan-assembler-times "movq" 2 } } */
> +/* { dg-final { scan-assembler-times "movq\[\t ]+%xmm\[0-9]" 2 } } */
> /* { dg-final { scan-assembler-times "pextrq" 2 { target { ! ia32 } } } } */
> /* { dg-final { scan-assembler-times "movhps" 2 { target { ia32 } } } } */
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-06-28 6:03 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28 5:27 [PATCH 0/3][x86] Enable pass_late_combine for x86 liuhongt
2024-06-28 5:27 ` [PATCH 1/3] [avx512 testsuite] Define mask as extern instead of uninitialized local variables liuhongt
2024-06-28 5:27 ` [PATCH 2/3] Extend lshifrtsi3_1_zext to ?k alternative liuhongt
2024-06-28 5:59 ` Uros Bizjak
2024-06-28 5:27 ` [PATCH 3/3] [x86] Enable flate-combine liuhongt
2024-06-28 6:03 ` Uros Bizjak
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).