From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 1005)
    id E6FDD3858D1E; Wed, 13 Mar 2024 01:03:45 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E6FDD3858D1E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
    s=default; t=1710291825;
    bh=cVtteQH5qMMskBHDBC0THScQhlMYc9Tooh34jLu1Dvg=;
    h=From:To:Subject:Date:From;
    b=WY+K5bii87PSA6Hwr3xfTUwdNSmN8/+EOtco7yj2SdT0oL7xsNAR+QAX00RtSWtTF
     DQo8Zn51LVclKHEfxqyq/ymkOGQ1RKATYQLATq+qOt3ksDsGChP2n6BGHJ4Df02e+H
     8aQfTJwYKj5TtvbFQL23fKommgmhx2ROmG8H9K50=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work162-vpair)] Power10: Add options to disable load and store vector pair.
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work162-vpair
X-Git-Oldrev: 3ca2a9f1c968d61a4de44f410f89b4f98fefd2f2
X-Git-Newrev: 8135a35053e1bf1723ef225a3d75c19a0684f6f2
Message-Id: <20240313010345.E6FDD3858D1E@sourceware.org>
Date: Wed, 13 Mar 2024 01:03:45 +0000 (GMT)
List-Id:

https://gcc.gnu.org/g:8135a35053e1bf1723ef225a3d75c19a0684f6f2

commit 8135a35053e1bf1723ef225a3d75c19a0684f6f2
Author: Michael Meissner
Date:   Tue Mar 12 20:09:21 2024 -0400

    Power10: Add options to disable load and store vector pair.

    In working on some future patches that involve utilizing vector pair
    instructions, I wanted to be able to tune my program to enable or disable
    using the vector pair load or store operations while still keeping the
    other operations on the vector pair.

    This patch adds two undocumented tuning options.  The -mno-load-vector-pair
    option would tell GCC to generate two load vector instructions instead of a
    single load vector pair.  The -mno-store-vector-pair option would tell GCC
    to generate two store vector instructions instead of a single store vector
    pair.

    If either -mno-load-vector-pair or -mno-store-vector-pair is used, GCC will
    generate neither the indexed lxvpx nor the indexed stxvpx instruction.  The
    reason for this is to enable splitting the {,p}lxvp or {,p}stxvp
    instructions after reload without needing a scratch GPR register.

    The default for -mcpu=power10 is that both load vector pair and store
    vector pair are enabled.

    I added code so that user code can modify these settings using either a
    '#pragma GCC target' directive or __attribute__((__target__(...))) in the
    function declaration.

    I added tests for the switches, #pragma, and attribute options.

    I have built this on both little endian power10 systems and big endian
    power9 systems doing the normal bootstrap and test.  There were no
    regressions in any of the tests, and the new tests passed.  Can I check
    this patch into the master branch?

    2024-03-12  Michael Meissner

    gcc/

            * config/rs6000/mma.md (movoo): Add support for
            -mno-load-vector-pair and -mno-store-vector-pair.
            * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add support
            for -mload-vector-pair and -mstore-vector-pair.
            (POWERPC_MASKS): Likewise.
            * config/rs6000/rs6000.cc (rs6000_setup_reg_addr_masks): Only allow
            indexed mode for OOmode if we are generating both load vector pair
            and store vector pair instructions.
            (rs6000_option_override_internal): Add support for
            -mno-load-vector-pair and -mno-store-vector-pair.
            (rs6000_opt_masks): Likewise.
            * config/rs6000/rs6000.md (isa attribute): Add lxvp and stxvp
            attributes.
            (enabled attribute): Likewise.
            * config/rs6000/rs6000.opt (-mload-vector-pair): New option.
            (-mstore-vector-pair): Likewise.

    gcc/testsuite/

            * gcc.target/powerpc/vector-pair-attribute.c: New test.
            * gcc.target/powerpc/vector-pair-pragma.c: New test.
            * gcc.target/powerpc/vector-pair-switch1.c: New test.
            * gcc.target/powerpc/vector-pair-switch2.c: New test.
            * gcc.target/powerpc/vector-pair-switch3.c: New test.
            * gcc.target/powerpc/vector-pair-switch4.c: New test.

Diff:
---
 gcc/config/rs6000/mma.md                           | 19 +++++---
 gcc/config/rs6000/rs6000-cpus.def                  |  8 +++-
 gcc/config/rs6000/rs6000.cc                        | 30 +++++++++++-
 gcc/config/rs6000/rs6000.md                        | 10 +++-
 gcc/config/rs6000/rs6000.opt                       |  8 ++++
 .../gcc.target/powerpc/vector-pair-attribute.c     | 39 +++++++++++++++
 .../gcc.target/powerpc/vector-pair-pragma.c        | 55 ++++++++++++++++++++++
 .../gcc.target/powerpc/vector-pair-switch1.c       | 16 +++++++
 .../gcc.target/powerpc/vector-pair-switch2.c       | 17 +++++++
 .../gcc.target/powerpc/vector-pair-switch3.c       | 17 +++++++
 .../gcc.target/powerpc/vector-pair-switch4.c       | 17 +++++++
 11 files changed, 225 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index 04e2d0066df..6a7d8a836db 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -292,27 +292,34 @@
     gcc_assert (false);
 })
 
+;; If the user used -mno-store-vector-pair or -mno-load-vector-pair, use an
+;; alternative that does not allow indexed addresses so we can split the load
+;; or store.
 (define_insn_and_split "*movoo"
-  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,ZwO,wa")
-        (match_operand:OO 1 "input_operand" "ZwO,wa,wa"))]
+  [(set (match_operand:OO 0 "nonimmediate_operand" "=wa,wa,ZwO,QwO,wa")
+        (match_operand:OO 1 "input_operand" "ZwO,QwO,wa,wa,wa"))]
   "TARGET_MMA
    && (gpc_reg_operand (operands[0], OOmode)
        || gpc_reg_operand (operands[1], OOmode))"
   "@
    lxvp%X1 %x0,%1
+   #
    stxvp%X0 %x1,%0
+   #
    #"
   "&& reload_completed
-   && (!MEM_P (operands[0]) && !MEM_P (operands[1]))"
+   && ((MEM_P (operands[0]) && !TARGET_STORE_VECTOR_PAIR)
+       || (MEM_P (operands[1]) && !TARGET_LOAD_VECTOR_PAIR)
+       || (!MEM_P (operands[0]) && !MEM_P (operands[1])))"
   [(const_int 0)]
 {
   rs6000_split_multireg_move (operands[0], operands[1]);
   DONE;
 }
-  [(set_attr "type" "vecload,vecstore,veclogical")
+  [(set_attr "type" "vecload,vecload,vecstore,vecstore,veclogical")
    (set_attr "size" "256")
-   (set_attr "length" "*,*,8")])
-
+   (set_attr "length" "*,8,*,8,8")
+   (set_attr "isa" "lxvp,*,stxvp,*,*")])
 
 ;; Vector quad support.  XOmode can only live in FPRs.
 (define_expand "movxo"
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 47365534af8..6e0b2449b18 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -77,10 +77,12 @@
 /* Flags that need to be turned off if -mno-power10.  */
 /* We comment out PCREL_OPT here to disable it by default because SPEC2017
    performance was degraded by it.  */
-#define OTHER_POWER10_MASKS     (OPTION_MASK_MMA                        \
+#define OTHER_POWER10_MASKS     (OPTION_MASK_LOAD_VECTOR_PAIR           \
+                                 | OPTION_MASK_MMA                      \
                                  | OPTION_MASK_PCREL                    \
                                  /* | OPTION_MASK_PCREL_OPT */          \
-                                 | OPTION_MASK_PREFIXED)
+                                 | OPTION_MASK_PREFIXED                 \
+                                 | OPTION_MASK_STORE_VECTOR_PAIR)
 
 #define ISA_3_1_MASKS_SERVER    (ISA_3_0_MASKS_SERVER                   \
                                  | OPTION_MASK_POWER10                  \
@@ -131,6 +133,7 @@
                                  | OPTION_MASK_FLOAT128_KEYWORD         \
                                  | OPTION_MASK_FPRND                    \
                                  | OPTION_MASK_FUTURE                   \
+                                 | OPTION_MASK_LOAD_VECTOR_PAIR         \
                                  | OPTION_MASK_POWER10                  \
                                  | OPTION_MASK_POWER11                  \
                                  | OPTION_MASK_P10_FUSION               \
@@ -158,6 +161,7 @@
                                  | OPTION_MASK_QUAD_MEMORY_ATOMIC       \
                                  | OPTION_MASK_RECIP_PRECISION          \
                                  | OPTION_MASK_SOFT_FLOAT               \
+                                 | OPTION_MASK_STORE_VECTOR_PAIR        \
                                  | OPTION_MASK_STRICT_ALIGN_OPTIONAL    \
                                  | OPTION_MASK_VSX)
 
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 84d918ef7b8..08198fa9fdf 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -2722,7 +2722,9 @@ rs6000_setup_reg_addr_masks (void)
               /* Vector pairs can do both indexed and offset loads if the
                  instructions are enabled, otherwise they can only do offset loads
                  since it will be broken into two vector moves.  Vector quads can
-                 only do offset loads.  */
+                 only do offset loads.  If the user restricted generation of either
+                 of the LXVP or STXVP instructions, do not allow indexed mode so
+                 that we can split the load/store.  */
               else if ((addr_mask != 0) && TARGET_MMA
                        && (m2 == OOmode || m2 == XOmode))
                 {
@@ -2730,7 +2732,9 @@ rs6000_setup_reg_addr_masks (void)
                   if (rc == RELOAD_REG_FPR || rc == RELOAD_REG_VMX)
                     {
                       addr_mask |= RELOAD_REG_QUAD_OFFSET;
-                      if (m2 == OOmode)
+                      if (m2 == OOmode
+                          && TARGET_LOAD_VECTOR_PAIR
+                          && TARGET_STORE_VECTOR_PAIR)
                         addr_mask |= RELOAD_REG_INDEXED;
                     }
                 }
@@ -4375,6 +4379,26 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= ~OPTION_MASK_MMA;
     }
 
+  /* Warn if -mload-vector-pair or -mstore-vector-pair are used and MMA is
+     not set.  */
+  if (!TARGET_MMA && TARGET_LOAD_VECTOR_PAIR)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_LOAD_VECTOR_PAIR) != 0)
+        warning (0, "%qs should not be used unless you use %qs",
+                 "-mload-vector-pair", "-mmma");
+
+      rs6000_isa_flags &= ~OPTION_MASK_LOAD_VECTOR_PAIR;
+    }
+
+  if (!TARGET_MMA && TARGET_STORE_VECTOR_PAIR)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_STORE_VECTOR_PAIR) != 0)
+        warning (0, "%qs should not be used unless you use %qs",
+                 "-mstore-vector-pair", "-mmma");
+
+      rs6000_isa_flags &= ~OPTION_MASK_STORE_VECTOR_PAIR;
+    }
+
   /* Enable power10 fusion if we are tuning for power10, even if we aren't
      generating power10 instructions.  */
   if (!(rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION))
@@ -24469,6 +24493,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "hard-dfp",                 OPTION_MASK_DFP,                false, true  },
   { "htm",                      OPTION_MASK_HTM,                false, true  },
   { "isel",                     OPTION_MASK_ISEL,               false, true  },
+  { "load-vector-pair",         OPTION_MASK_LOAD_VECTOR_PAIR,   false, true  },
   { "mfcrf",                    OPTION_MASK_MFCRF,              false, true  },
   { "mfpgpr",                   0,                              false, true  },
   { "mma",                      OPTION_MASK_MMA,                false, true  },
@@ -24493,6 +24518,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "quad-memory-atomic",       OPTION_MASK_QUAD_MEMORY_ATOMIC, false, true  },
   { "recip-precision",          OPTION_MASK_RECIP_PRECISION,    false, true  },
   { "save-toc-indirect",        OPTION_MASK_SAVE_TOC_INDIRECT,  false, true  },
+  { "store-vector-pair",        OPTION_MASK_STORE_VECTOR_PAIR,  false, true  },
   { "string",                   0,                              false, true  },
   { "update",                   OPTION_MASK_NO_UPDATE,          true , true  },
   { "vsx",                      OPTION_MASK_VSX,                false, true  },
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index abc809448ad..66f35b4dbe9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -355,7 +355,7 @@
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,lxvp,stxvp"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -403,6 +403,14 @@
      (and (eq_attr "isa" "p10")
           (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "lxvp")
+          (match_test "TARGET_LOAD_VECTOR_PAIR"))
+     (const_int 1)
+
+     (and (eq_attr "isa" "stxvp")
+          (match_test "TARGET_STORE_VECTOR_PAIR"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 621ebd65a88..86a53fd3023 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -603,6 +603,14 @@ mmma
 Target Mask(MMA) Var(rs6000_isa_flags)
 Generate (do not generate) MMA instructions.
 
+mload-vector-pair
+Target Undocumented Mask(LOAD_VECTOR_PAIR) Var(rs6000_isa_flags)
+Generate (do not generate) load vector pair instructions.
+
+mstore-vector-pair
+Target Undocumented Mask(STORE_VECTOR_PAIR) Var(rs6000_isa_flags)
+Generate (do not generate) store vector pair instructions.
+
 mrelative-jumptables
 Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
new file mode 100644
index 00000000000..985a44aca85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-attribute.c
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we can control generating load and store vector pair via the target
+   attribute.  */
+
+__attribute__((__target__("load-vector-pair,store-vector-pair")))
+void
+test_load_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 1 stxvp.  */
+}
+
+__attribute__((__target__("load-vector-pair,no-store-vector-pair")))
+void
+test_load_no_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 2 stxv.  */
+}
+
+__attribute__((__target__("no-load-vector-pair,store-vector-pair")))
+void
+test_store_no_load (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 1 stxvp.  */
+}
+
+__attribute__((__target__("no-load-vector-pair,no-store-vector-pair")))
+void
+test_no_load_or_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
new file mode 100644
index 00000000000..74c6baf8185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-pragma.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we can control generating load and store vector pair via the #pragma
+   directive.  */
+
+#pragma GCC push_options
+#pragma GCC target("load-vector-pair,store-vector-pair")
+
+void
+test_load_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 1 stxvp.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("load-vector-pair,no-store-vector-pair")
+
+void
+test_load_no_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 2 stxv.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("no-load-vector-pair,store-vector-pair")
+
+void
+test_store_no_load (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 1 stxvp.  */
+}
+
+#pragma GCC pop_options
+
+#pragma GCC push_options
+#pragma GCC target("no-load-vector-pair,no-store-vector-pair")
+
+void
+test_no_load_or_store (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 2 stxv.  */
+}
+
+#pragma GCC pop_options
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
new file mode 100644
index 00000000000..48e433b378e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test if we generate load and store vector pair by default on power 10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 1 stxvp.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mp?lxvx?\M}     } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvx?\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
new file mode 100644
index 00000000000..2a38c2f2aae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-store-vector-pair" } */
+
+/* Test if we generate load vector pair but not store vector pair if
+   -mno-store-vector-pair is used on power10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 1 lxvp, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-times {\mp?lxvpx?\M}  1 } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvpx?\M}   } } */
+/* { dg-final { scan-assembler-not   {\mp?lxvx?\M}     } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
new file mode 100644
index 00000000000..fd273056b8f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch3.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-load-vector-pair" } */
+
+/* Test if we do not generate load vector pair but generate store vector pair
+   if -mno-load-vector-pair is used on power10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 1 stxvp.  */
+}
+
+/* { dg-final { scan-assembler-not   {\mp?lxvpx?\M}    } } */
+/* { dg-final { scan-assembler-times {\mp?stxvpx?\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   2 } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvx?\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c
new file mode 100644
index 00000000000..01686e073fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-switch4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mno-load-vector-pair -mno-store-vector-pair" } */
+
+/* Test if we do not generate load and store vector pair if directed to on
+   power 10.  */
+
+void
+test (__vector_pair *p, __vector_pair *q)
+{
+  *p = *q;      /* 2 lxv, 2 stxv.  */
+}
+
+/* { dg-final { scan-assembler-not   {\mp?lxvpx?\M}    } } */
+/* { dg-final { scan-assembler-not   {\mp?stxvpx?\M}   } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mp?stxvx?\M}  2 } } */
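
For readers following the option handling described in this patch, here is a minimal, self-contained C sketch of the two rules involved: the vector pair bits are dropped when MMA is off, and indexed addressing for a vector pair is allowed only when both the load and the store pair forms are enabled. The MASK_* names, their bit positions, and the helpers resolve_flags, indexed_allowed and check_model are hypothetical stand-ins chosen for illustration; they are not GCC's identifiers or values, and the sketch models only the flag logic, not code generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OPTION_MASK_* bits.  The real values are
   generated from rs6000.opt and are not reproduced here.  */
#define MASK_MMA               (UINT64_C (1) << 0)
#define MASK_LOAD_VECTOR_PAIR  (UINT64_C (1) << 1)
#define MASK_STORE_VECTOR_PAIR (UINT64_C (1) << 2)

/* Model of the override step: without MMA, both vector pair bits are
   cleared.  Note the complement: `flags &= ~MASK' clears the named bits,
   whereas `flags &= MASK' would clear every other bit instead.  */
static uint64_t
resolve_flags (uint64_t flags)
{
  if ((flags & MASK_MMA) == 0)
    flags &= ~(MASK_LOAD_VECTOR_PAIR | MASK_STORE_VECTOR_PAIR);
  return flags;
}

/* Model of the address-mask rule: indexed addressing for a vector pair is
   allowed only when MMA is on and both the load and the store pair forms
   are enabled, so that a disabled form can be split without a scratch
   register.  */
static bool
indexed_allowed (uint64_t flags)
{
  const uint64_t both = MASK_LOAD_VECTOR_PAIR | MASK_STORE_VECTOR_PAIR;

  flags = resolve_flags (flags);
  return (flags & MASK_MMA) != 0 && (flags & both) == both;
}

/* Small self-check of the two rules above.  */
static void
check_model (void)
{
  const uint64_t all = (MASK_MMA | MASK_LOAD_VECTOR_PAIR
                        | MASK_STORE_VECTOR_PAIR);

  assert (resolve_flags (all) == all);
  assert (resolve_flags (MASK_LOAD_VECTOR_PAIR | MASK_STORE_VECTOR_PAIR) == 0);
  assert (indexed_allowed (all));
  assert (!indexed_allowed (MASK_MMA | MASK_LOAD_VECTOR_PAIR));
  assert (!indexed_allowed (MASK_MMA | MASK_STORE_VECTOR_PAIR));
}
```

Calling check_model () from a small driver exercises all four combinations that the new tests cover at the flag level.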