From: Richard Biener <richard.guenther@gmail.com>
To: Ajit Agarwal <aagarwa1@linux.ibm.com>
Cc: "Kewen.Lin" <linkw@linux.ibm.com>,
Vladimir Makarov <vmakarov.gcc@gmail.com>,
Michael Meissner <meissner@linux.ibm.com>,
Segher Boessenkool <segher@kernel.crashing.org>,
Peter Bergner <bergner@linux.ibm.com>,
David Edelsohn <dje.gcc@gmail.com>,
gcc-patches <gcc-patches@gcc.gnu.org>,
Richard Sandiford <richard.sandiford@arm.com>
Subject: Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp
Date: Mon, 15 Jan 2024 10:33:32 +0100 [thread overview]
Message-ID: <CAFiYyc2hUn=MjrsrCExCcZJRp2=AAAidCzx_4YHdCVB20RFm5g@mail.gmail.com> (raw)
In-Reply-To: <f4131a17-61dd-4720-8667-b20ec240d1f0@linux.ibm.com>
On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>
> Hello All:
>
> This patch adds the vecload pass, which replaces adjacent lxv (load) memory
> accesses with a single lxvp instruction. The pass runs before the ira pass.
>
> The vecload pass removes one of the two adjacent lxv loads and replaces the
> other with lxvp. Because one of the defining loads is removed, its allocno
> has only uses but no defs.
>
> Because of this, the IRA pass does not assign the registers as a sequential
> pair. The IRA register allocator is therefore changed to assign sequential
> registers to the adjacent loads.
>
> Some registers were being cleared and not marked as profitable because a
> zero cost compares greater than a negative cost; checks are added so that
> only positive costs are compared.
>
> LRA is changed not to reassign these pseudos to different registers, so the
> sequential register pairs stay intact.
>
>
> contrib/check_GNU_style.sh was run on the patch and reports no issues.
>
> Bootstrapped and regtested for powerpc64-linux-gnu.
>
> SPEC2017 benchmarks were run and I see impressive benefits for some of the
> FP benchmarks.
I want to point out that the aarch64 target recently gained a load/store
fusion pass which sounds related. It would be nice to have at least common
infrastructure for this (the aarch64 pass also looks quite a bit more
powerful).
> Thanks & Regards
> Ajit
>
>
> rs6000: New pass for replacement of adjacent lxv with lxvp.
>
> New pass to replace adjacent lxv memory loads with a single lxvp.
> The pass is registered before the ira rtl pass.
>
> 2024-01-14 Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-passes.def: Register the new vecload pass.
> * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
> * config.gcc: Add new object file.
> * config/rs6000/rs6000-protos.h: Add new prototype for vecload
> pass.
> * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
> * config/rs6000/t-rs6000: Add new rule.
> * ira-color.cc: Form register pair with adjacent loads.
> * lra-assigns.cc: Skip modifying register pair assignment.
> * lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
> * lra.cc: Initialize pseudo_conflict field.
> * ira-build.cc: Use REG_FREQ.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/powerpc/vecload.C: New test.
> * g++.target/powerpc/vecload1.C: New test.
> * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
> gcc/config.gcc | 4 +-
> gcc/config/rs6000/rs6000-passes.def | 4 +
> gcc/config/rs6000/rs6000-protos.h | 5 +-
> gcc/config/rs6000/rs6000-vecload-opt.cc | 432 ++++++++++++++++++
> gcc/config/rs6000/rs6000.cc | 8 +-
> gcc/config/rs6000/t-rs6000 | 5 +
> gcc/ira-color.cc | 220 ++++++++-
> gcc/lra-assigns.cc | 118 ++++-
> gcc/lra-int.h | 2 +
> gcc/lra.cc | 1 +
> gcc/testsuite/g++.target/powerpc/vecload.C | 15 +
> gcc/testsuite/g++.target/powerpc/vecload1.C | 22 +
> .../gcc.target/powerpc/mma-builtin-1.c | 4 +-
> 13 files changed, 816 insertions(+), 24 deletions(-)
> create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
> create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
> create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
> ;;
> powerpc*-*-*)
> cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
> extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
> extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
> extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
> ;;
> rs6000*-*-*)
> extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
> extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..8bd172dd779 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -29,6 +29,10 @@ along with GCC; see the file COPYING3. If not see
> for loads and stores. */
> INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
>
> + /* Pass to replace lxv instructions that load from adjacent memory
> + addresses with a single lxvp instruction. */
> + INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);
> +
> /* Pass to do the PCREL_OPT optimization that combines the load of an
> external symbol's address along with a single load or store using that
> address as a base register. */
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index f70118ea40f..83ee773a6f8 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -343,12 +343,15 @@ namespace gcc { class context; }
> class rtl_opt_pass;
>
> extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
> +extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
> extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
> extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
> extern bool rs6000_quadword_masked_address_p (const_rtx exp);
> extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
> extern rtx rs6000_gen_stvx (enum machine_mode, rtx, rtx);
> -
> +extern bool mode_supports_dq_form (machine_mode);
> +extern bool get_memref_parts (rtx, rtx *, HOST_WIDE_INT *, HOST_WIDE_INT *);
> +extern rtx adjacent_mem_locations (rtx, rtx);
> extern void rs6000_emit_xxspltidp_v2df (rtx, long value);
> extern gimple *currently_expanding_gimple_stmt;
> extern bool rs6000_opaque_type_invalid_use_p (gimple *);
> diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc b/gcc/config/rs6000/rs6000-vecload-opt.cc
> new file mode 100644
> index 00000000000..d9c11a6caf1
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
> @@ -0,0 +1,432 @@
> +/* Subroutines used to replace lxv with lxvp
> + for TARGET_POWER10 and TARGET_VSX.
> +
> + Copyright (C) 2020-2023 Free Software Foundation, Inc.
> + Contributed by Ajit Kumar Agarwal <aagarwa1@linux.ibm.com>.
> +
> + This file is part of GCC.
> +
> + GCC is free software; you can redistribute it and/or modify it
> + under the terms of the GNU General Public License as published
> + by the Free Software Foundation; either version 3, or (at your
> + option) any later version.
> +
> + GCC is distributed in the hope that it will be useful, but WITHOUT
> + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
> + License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with GCC; see the file COPYING3. If not see
> + <http://www.gnu.org/licenses/>. */
> +
> +#define IN_TARGET_CODE 1
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "rtl.h"
> +#include "tree-pass.h"
> +#include "df.h"
> +#include "dumpfile.h"
> +#include "rs6000-internal.h"
> +#include "rs6000-protos.h"
> +
> +/* Return true if no use reached from INSN's defs is a SUBREG. */
> +static bool
> +is_feasible (rtx_insn *insn)
> +{
> + df_ref use;
> + df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> + continue;
> + while (def_link && def_link->ref)
> + {
> + rtx *loc = DF_REF_LOC (def_link->ref);
> + if (!loc || *loc == NULL_RTX)
> + return false;
> + if (GET_CODE (*loc) == SUBREG)
> + return false;
> + def_link = def_link->next;
> + }
> + }
> + return true;
> +}
> +
> +/* Rescan the dependent instructions of INSN whose source is an
> + UNSPEC, so deferred df changes are picked up. */
> +void set_rescan_for_unspec (rtx_insn *insn)
> +{
> + df_ref use;
> + df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> + rtx_insn *select_insn2;
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + while (def_link && def_link->ref)
> + {
> + select_insn2 = DF_REF_INSN (def_link->ref);
> + rtx set = single_set (select_insn2);
> +
> + if (set == NULL_RTX)
> + return;
> +
> + if (set != NULL_RTX)
> + {
> + rtx op0 = SET_SRC (set);
> + if (GET_CODE (op0) != UNSPEC)
> + return;
> +
> + if (GET_CODE (op0) == VEC_SELECT
> + && GET_CODE (XEXP (op0, 1)) == PARALLEL)
> + return;
> +
> + if (GET_CODE (op0) == UNSPEC)
> + df_insn_rescan (select_insn2);
> + }
> + def_link = def_link->next;
> + }
> + }
> +}
> +
> +/* Return the dependent instruction whose source is an UNSPEC, or NULL. */
> +rtx_insn *get_rtx_UNSPEC (rtx_insn *insn)
> +{
> + df_ref use;
> + df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> + rtx_insn *select_insn2;
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + while (def_link && def_link->ref)
> + {
> + select_insn2 = DF_REF_INSN (def_link->ref);
> + rtx set = single_set (select_insn2);
> +
> + if (set == NULL_RTX)
> + return 0;
> +
> + if (set != NULL_RTX)
> + {
> + rtx op0 = SET_SRC (set);
> +
> + if (GET_CODE (op0) == UNSPEC)
> + return select_insn2;
> + }
> + def_link = def_link->next;
> + }
> + }
> + return 0;
> +}
> +
> +/* Replace the identified pair of lxv instructions with one lxvp.
> + Bail out if any of the following conditions are true:
> +
> + - the dependent instruction of the load is a vec_select,
> +
> + - the machine mode of the unspec is not the same as the machine
> + mode of the lxv instruction,
> +
> + - the dependent instruction is not an unspec,
> +
> + - a source operand of the unspec is an eq comparison. */
> +
> +static bool
> +replace_lxv_with_lxvp (rtx_insn *insn1, rtx_insn *insn2)
> +{
> + rtx body = PATTERN (insn1);
> + rtx src_exp = SET_SRC (body);
> + rtx dest_exp = SET_DEST (body);
> + rtx lxv;
> + rtx insn2_body = PATTERN (insn2);
> + rtx insn2_dest_exp = SET_DEST (insn2_body);
> +
> + if (GET_MODE (src_exp) != GET_MODE (SET_SRC (insn2_body)))
> + return false;
> +
> + if (GET_MODE (dest_exp) == TImode)
> + return false;
> +
> + if (!ALTIVEC_OR_VSX_VECTOR_MODE (GET_MODE (dest_exp)))
> + return false;
> +
> + if (!is_feasible (insn1))
> + return false;
> +
> + if (!is_feasible (insn2))
> + return false;
> +
> + for (rtx note = REG_NOTES (insn1); note; note = XEXP (note, 1))
> + if (REG_NOTE_KIND (note) == REG_EQUAL
> + || REG_NOTE_KIND (note) == REG_EQUIV)
> + return false;
> +
> + int no_dep = 0;
> + df_ref use;
> + df_insn_info *insn_info = DF_INSN_INFO_GET (insn1);
> + rtx_insn *select_insn2;
> +
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + while (def_link && def_link->ref)
> + {
> + select_insn2 = DF_REF_INSN (def_link->ref);
> + rtx set = single_set (select_insn2);
> +
> + if (set == NULL_RTX)
> + return false;
> +
> + if (set != NULL_RTX)
> + {
> + rtx op0 = SET_SRC (set);
> +
> + if (GET_CODE (op0) != UNSPEC)
> + return false;
> +
> + if (GET_CODE (op0) == VEC_SELECT
> + && GET_CODE (XEXP (op0, 1)) == PARALLEL)
> + return false;
> +
> + if (GET_CODE (op0) == UNSPEC)
> + {
> + if (GET_MODE (op0) != XOmode
> + && GET_MODE (op0) != GET_MODE (dest_exp))
> + return false;
> +
> + int nvecs = XVECLEN (op0, 0);
> + for (int i = 0; i < nvecs; i++)
> + {
> + rtx op;
> + op = XVECEXP (op0, 0, i);
> +
> + if (GET_MODE (op) == OOmode)
> + return false;
> + if (GET_CODE (op) == EQ)
> + return false;
> + }
> + }
> + ++no_dep;
> + }
> + def_link = def_link->next;
> + }
> + }
> +
> + rtx_insn *insn = get_rtx_UNSPEC (insn1);
> +
> + if (insn && insn == get_rtx_UNSPEC (insn2) && no_dep == 1)
> + return false;
> +
> +
> + insn_info = DF_INSN_INFO_GET (insn2);
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> + continue;
> + while (def_link && def_link->ref)
> + {
> + rtx *loc = DF_REF_LOC (def_link->ref);
> + *loc = dest_exp;
> + def_link = def_link->next;
> + }
> + }
> +
> + insn_info = DF_INSN_INFO_GET (insn1);
> + FOR_EACH_INSN_INFO_DEF (use, insn_info)
> + {
> + struct df_link *def_link = DF_REF_CHAIN (use);
> + if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> + continue;
> + while (def_link && def_link->ref)
> + {
> + rtx *loc = DF_REF_LOC (def_link->ref);
> + PUT_MODE_RAW (*loc, OOmode);
> + *loc = insn2_dest_exp;
> + def_link = def_link->next;
> + }
> + }
> +
> + set_rescan_for_unspec (insn1);
> + set_rescan_for_unspec (insn2);
> + df_insn_rescan (insn1);
> + df_insn_rescan (insn2);
> +
> + PUT_MODE_RAW (src_exp, OOmode);
> + PUT_MODE_RAW (dest_exp, OOmode);
> + lxv = gen_movoo (dest_exp, src_exp);
> + rtx_insn *new_insn = emit_insn_before (lxv, insn1);
> + set_block_for_insn (new_insn, BLOCK_FOR_INSN (insn1));
> + df_insn_rescan (new_insn);
> +
> + if (dump_file)
> + {
> + unsigned int new_uid = INSN_UID (new_insn);
> + fprintf (dump_file, "Replacing lxv %d with lxvp %d\n",
> + INSN_UID (insn1), new_uid);
> + print_rtl_single (dump_file, new_insn);
> + print_rtl_single (dump_file, insn1);
> + print_rtl_single (dump_file, insn2);
> +
> + }
> +
> + df_insn_delete (insn1);
> + remove_insn (insn1);
> + df_insn_delete (insn2);
> + remove_insn (insn2);
> + insn1->set_deleted ();
> + insn2->set_deleted ();
> + return true;
> +}
> +
> +/* Identify lxv instructions that load from adjacent memory addresses
> + and replace them with a single lxvp instruction. */
> +unsigned int
> +rs6000_analyze_vecload (function *fun)
> +{
> + df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> + df_analyze ();
> + df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> + /* Rebuild ud- and du-chains. */
> + df_remove_problem (df_chain);
> + df_process_deferred_rescans ();
> + df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> + df_analyze ();
> + df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> + basic_block bb;
> + bool changed = false;
> + rtx_insn *insn, *curr_insn = 0;
> + rtx_insn *insn1 = 0, *insn2 = 0;
> + bool first_vec_insn = false;
> + unsigned int regno = 0;
> + int index = -1;
> + FOR_ALL_BB_FN (bb, fun)
> + {
> + index = bb->index;
> + FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> + {
> + if (LABEL_P (insn))
> + continue;
> +
> + if (NONDEBUG_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
> + {
> + rtx set = single_set (insn);
> + rtx src = SET_SRC (set);
> + machine_mode mode = GET_MODE (SET_DEST (set));
> +
> + if (MEM_P (src))
> + {
> + if (mem_operand_ds_form (src, mode)
> + || (mode_supports_dq_form (mode)
> + && quad_address_p (XEXP (src, 0), mode, false)))
> + {
> + if (first_vec_insn)
> + {
> + first_vec_insn = false;
> + rtx addr = XEXP (src, 0);
> +
> + if (REG_P (addr))
> + continue;
> +
> + insn2 = insn;
> + rtx insn1_src = SET_SRC (PATTERN (insn1));
> +
> + int offset = 0;
> +
> + if (GET_CODE (addr) == PLUS
> + && XEXP (addr, 1)
> + && !REG_P (XEXP (addr, 1))
> + && CONST_INT_P (XEXP (addr, 1)))
> + {
> + rtx off = XEXP (addr, 1);
> + offset = INTVAL (off);
> + }
> +
> + if ((offset % 2 == 0)
> + && adjacent_mem_locations (insn1_src, src)
> + == insn1_src)
> + {
> + rtx op0 = XEXP (addr, 0);
> +
> + if (regno == REGNO (op0)
> + && index == bb->index)
> + {
> + index = -1;
> + changed
> + = replace_lxv_with_lxvp (insn1, insn2);
> + }
> + }
> + }
> +
> + else if (REG_P (XEXP (src, 0))
> + && GET_CODE (XEXP (src, 0)) != PLUS)
> + {
> + regno = REGNO (XEXP (src, 0));
> + first_vec_insn = true;
> + insn1 = insn;
> + }
> + else if (GET_CODE (XEXP (src, 0)) == PLUS)
> + {
> + rtx addr = XEXP (src, 0);
> + rtx op0 = XEXP (addr, 0);
> +
> + if (REG_P (op0))
> + regno = REGNO (op0);
> +
> + first_vec_insn = true;
> + insn1 = insn;
> + }
> + }
> + }
> + }
> + }
> + }
> + return changed;
> +}
> +
> +const pass_data pass_data_analyze_vecload =
> +{
> + RTL_PASS, /* type */
> + "vecload", /* name */
> + OPTGROUP_NONE, /* optinfo_flags */
> + TV_NONE, /* tv_id */
> + 0, /* properties_required */
> + 0, /* properties_provided */
> + 0, /* properties_destroyed */
> + 0, /* todo_flags_start */
> + TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_analyze_vecload : public rtl_opt_pass
> +{
> +public:
> + pass_analyze_vecload (gcc::context *ctxt)
> + : rtl_opt_pass (pass_data_analyze_vecload, ctxt)
> + {}
> +
> + /* opt_pass methods: */
> + virtual bool gate (function *)
> + {
> + return (optimize > 0 && TARGET_VSX && TARGET_POWER10);
> + }
> +
> + virtual unsigned int execute (function *fun)
> + {
> + return rs6000_analyze_vecload (fun);
> + }
> +}; // class pass_analyze_vecload
> +
> +rtl_opt_pass *
> +make_pass_analyze_vecload (gcc::context *ctxt)
> +{
> + return new pass_analyze_vecload (ctxt);
> +}
> +
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..5f0ec8239c1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -387,7 +387,7 @@ mode_supports_vmx_dform (machine_mode mode)
> /* Return true if we have D-form addressing in VSX registers. This addressing
> is more limited than normal d-form addressing in that the offset must be
> aligned on a 16-byte boundary. */
> -static inline bool
> +bool
> mode_supports_dq_form (machine_mode mode)
> {
> return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
> @@ -1178,6 +1178,8 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
> secondary_reload_info *,
> bool);
> rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
> +rtl_opt_pass *make_pass_analyze_vecload (gcc::context*);
> +
>
> /* Hash table stuff for keeping track of TOC entries. */
>
> @@ -18644,7 +18646,7 @@ set_to_load_agen (rtx_insn *out_insn, rtx_insn *in_insn)
> This function only looks for REG or REG+CONST address forms.
> REG+REG address form will return false. */
>
> -static bool
> +bool
> get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
> HOST_WIDE_INT *size)
> {
> @@ -18676,7 +18678,7 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
> adjacent, then return the argument that has the lower address.
> Otherwise, return NULL_RTX. */
>
> -static rtx
> +rtx
> adjacent_mem_locations (rtx mem1, rtx mem2)
> {
> rtx reg1, reg2;
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index f183b42ce1d..0b6852f2d38 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -35,6 +35,11 @@ rs6000-p8swap.o: $(srcdir)/config/rs6000/rs6000-p8swap.cc
> $(COMPILE) $<
> $(POSTCOMPILE)
>
> +rs6000-vecload-opt.o: $(srcdir)/config/rs6000/rs6000-vecload-opt.cc
> + $(COMPILE) $<
> + $(POSTCOMPILE)
> +
> +
> rs6000-d.o: $(srcdir)/config/rs6000/rs6000-d.cc
> $(COMPILE) $<
> $(POSTCOMPILE)
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 214a4f16d3c..73e9891a529 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1047,6 +1047,8 @@ setup_profitable_hard_regs (void)
> continue;
> data = ALLOCNO_COLOR_DATA (a);
> if (ALLOCNO_UPDATED_HARD_REG_COSTS (a) == NULL
> + && ALLOCNO_CLASS_COST (a) > 0
> + && ALLOCNO_MEMORY_COST (a) > 0
> && ALLOCNO_CLASS_COST (a) > ALLOCNO_MEMORY_COST (a)
> /* Do not empty profitable regs for static chain pointer
> pseudo when non-local goto is used. */
> @@ -1131,6 +1133,8 @@ setup_profitable_hard_regs (void)
> hard_regno))
> continue;
> if (ALLOCNO_UPDATED_MEMORY_COST (a) < costs[j]
> + && ALLOCNO_UPDATED_MEMORY_COST (a) > 0
> + && costs[j] > 0
> /* Do not remove HARD_REGNO for static chain pointer
> pseudo when non-local goto is used. */
> && ! non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a)))
> @@ -1919,6 +1923,181 @@ spill_soft_conflicts (ira_allocno_t a, bitmap allocnos_to_spill,
> }
> }
>
> +/* Form a sequential register pair for adjacent-load allocno A; return the hard regno or -1. */
> +static int
> +form_register_pairs (ira_allocno_t a, int regno, HARD_REG_SET *conflicting_regs)
> +{
> + int n = ALLOCNO_NUM_OBJECTS (a);
> + int best_hard_regno = -1;
> + for (int i = 0; i < n; i++)
> + {
> + ira_object_t obj = ALLOCNO_OBJECT (a, i);
> + ira_object_t conflict_obj;
> + ira_object_conflict_iterator oci;
> +
> + if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
> + {
> + continue;
> + }
> + FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
> + {
> + ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
> +
> + machine_mode mode = ALLOCNO_MODE (a);
> + machine_mode confl_mode = ALLOCNO_MODE (conflict_a);
> + int a_nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS (a)][mode];
> + int cl = ALLOCNO_CLASS (conflict_a);
> + int conf_nregs = ira_reg_class_max_nregs[cl][confl_mode];
> + HARD_REG_SET profitable_regs
> + = ALLOCNO_COLOR_DATA (a)->profitable_hard_regs;
> +
> + if (mode != confl_mode && a_nregs < conf_nregs)
> + {
> + if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> + {
> + enum reg_class aclass = ALLOCNO_CLASS (a);
> +
> + if (regno < ira_class_hard_regs[aclass][0])
> + regno = ira_class_hard_regs[aclass][0];
> +
> + if (ALLOCNO_HARD_REGNO (conflict_a) > 0)
> + best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 1;
> + else
> + best_hard_regno = regno + 1;
> +
> + if (ALLOCNO_HARD_REGNO (conflict_a) < 0)
> + {
> + if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> + profitable_regs))
> + {
> + if (best_hard_regno % 2 == 0)
> + {
> + int hard_reg = ira_class_hard_regs[aclass][0];
> + if (best_hard_regno - 1 < hard_reg)
> + return best_hard_regno + 1;
> + else
> + return best_hard_regno - 1;
> + }
> + return best_hard_regno;
> + }
> + else return -1;
> + }
> + else return best_hard_regno;
> + }
> +
> + if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0
> + && DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> + {
> + best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) - 1;
> + if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> + profitable_regs))
> + {
> + return best_hard_regno;
> + }
> + }
> + else if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0)
> + {
> + best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 2;
> +
> + if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> + profitable_regs))
> + {
> + return best_hard_regno;
> + }
> + else if (ira_class_hard_regs[ALLOCNO_CLASS (a)][0] <= (regno + 1)
> + && check_hard_reg_p (a, regno + 1, conflicting_regs,
> + profitable_regs))
> + return regno + 1;
> +
> + else return -1;
> + }
> + }
> + else if (mode != confl_mode && a_nregs > conf_nregs)
> + {
> + if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> + {
> + enum reg_class aclass = ALLOCNO_CLASS (a);
> +
> + if (regno < ira_class_hard_regs[aclass][0])
> + regno = ira_class_hard_regs[aclass][0];
> + if (ALLOCNO_ASSIGNED_P (conflict_a)
> + && ALLOCNO_HARD_REGNO (conflict_a) > 0)
> + {
> + best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) - 1;
> + return best_hard_regno;
> + }
> + else
> + best_hard_regno = regno;
> +
> + if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> + profitable_regs))
> + {
> + if (best_hard_regno % 2 != 0)
> + {
> + return best_hard_regno;
> + }
> + return best_hard_regno;
> + }
> + }
> + }
> + else
> + {
> + if (ALLOCNO_HARD_REGNO (conflict_a) > 0
> + && DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0
> + && DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> + {
> + if (ALLOCNO_ASSIGNED_P (conflict_a))
> + best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 1;
> + else
> + best_hard_regno = regno;
> +
> + if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> + profitable_regs))
> + {
> + if (best_hard_regno % 2 != 0)
> + {
> + return best_hard_regno;
> + }
> + return best_hard_regno;
> + }
> +
> + int i = 0;
> + enum reg_class aclass = ALLOCNO_CLASS (a);
> + int class_size = ira_class_hard_regs_num[aclass];
> + while (i < best_hard_regno)
> + {
> + int last_hard_regno
> + = ira_class_hard_regs[aclass][class_size - 1];
> + if ((i + best_hard_regno) <= last_hard_regno
> + && check_hard_reg_p (a, best_hard_regno + i, conflicting_regs,
> + profitable_regs))
> + return best_hard_regno + i;
> + ++i;
> + }
> +
> + best_hard_regno -= 3;
> + i = 0;
> +
> + while (i < best_hard_regno)
> + {
> + int hard_reg
> + = ira_class_hard_regs[aclass][0];
> + if ((best_hard_regno - i) >= hard_reg
> + && check_hard_reg_p (a, best_hard_regno - i, conflicting_regs,
> + profitable_regs))
> + return best_hard_regno - i;
> + ++i;
> + }
> +
> + return -1;
> +
> + }
> + }
> + }
> + }
> + return -1;
> +}
> +
> /* Choose a hard register for allocno A. If RETRY_P is TRUE, it means
> that the function called from function
> `ira_reassign_conflict_allocnos' and `allocno_reload_assign'. In
> @@ -1974,6 +2153,13 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> #ifdef STACK_REGS
> no_stack_reg_p = false;
> #endif
> + int maxim_regno = 0;
> + for (i = 0; i < class_size; i++)
> + {
> + if (ira_class_hard_regs[aclass][i] > maxim_regno)
> + maxim_regno = ira_class_hard_regs[aclass][i];
> + }
> +
> if (! retry_p)
> start_update_cost ();
> mem_cost += ALLOCNO_UPDATED_MEMORY_COST (a);
> @@ -2078,7 +2264,9 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> }
> else
> {
> - if (conflict_nregs == n_objects && conflict_nregs > 1)
> + int num = OBJECT_SUBWORD (conflict_obj);
> +
> + if (conflict_nregs == n_objects)
> {
> int num = OBJECT_SUBWORD (conflict_obj);
>
> @@ -2090,8 +2278,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> hard_regno + num);
> }
> else
> - conflicting_regs[word]
> - |= ira_reg_mode_hard_regset[hard_regno][mode];
> + {
> + SET_HARD_REG_BIT (conflicting_regs[word],
> + hard_regno + num);
> + conflicting_regs[word]
> + |= ira_reg_mode_hard_regset[hard_regno][mode];
> + }
> if (hard_reg_set_subset_p (profitable_hard_regs,
> conflicting_regs[word]))
> goto fail;
> @@ -2185,6 +2377,20 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> }
> if (min_cost > cost)
> min_cost = cost;
> +
> + int reg_pair = form_register_pairs (a, hard_regno, conflicting_regs);
> +
> + if (reg_pair > 0)
> + {
> + if (reg_pair >= ira_class_hard_regs[aclass][0]
> + && reg_pair < maxim_regno)
> + {
> + min_full_cost = full_cost;
> + best_hard_regno = reg_pair;
> + break;
> + }
> + }
> +
> if (min_full_cost > full_cost)
> {
> min_full_cost = full_cost;
> @@ -2196,7 +2402,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
> }
> if (internal_flag_ira_verbose > 5 && ira_dump_file != NULL)
> fprintf (ira_dump_file, "\n");
> - if (min_full_cost > mem_cost
> + if (best_hard_regno < 0 && min_full_cost > mem_cost
> /* Do not spill static chain pointer pseudo when non-local goto
> is used. */
> && ! non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a)))
> @@ -2473,6 +2679,8 @@ init_allocno_threads (void)
> /* Set up initial thread data: */
> ALLOCNO_COLOR_DATA (a)->first_thread_allocno
> = ALLOCNO_COLOR_DATA (a)->next_thread_allocno = a;
> + if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> + ALLOCNO_FREQ (a) += ALLOCNO_FREQ (a);
> ALLOCNO_COLOR_DATA (a)->thread_freq = ALLOCNO_FREQ (a);
> ALLOCNO_COLOR_DATA (a)->hard_reg_prefs = 0;
> for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
> @@ -3315,6 +3523,10 @@ improve_allocation (void)
> }
> min_cost = INT_MAX;
> best = -1;
> +
> + if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> + continue;
> +
> /* Now we choose hard register for A which results in highest
> allocation cost improvement. */
> for (j = 0; j < class_size; j++)
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index 7aa210e986f..332508044f2 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -1131,6 +1131,95 @@ assign_hard_regno (int hard_regno, int regno)
> /* Array used for sorting different pseudos. */
> static int *sorted_pseudos;
>
> +/* Return false if REGNO is part of an adjacent-access register pair
> + whose assignment must not be changed. */
> +static bool
> +can_reassign (HARD_REG_SET conflict_set, int hard_regno,
> + machine_mode mode, int regno, int max_regno)
> +{
> + int end_regno = end_hard_regno (mode, hard_regno);
> + int reg = hard_regno;
> +
> + while (++reg < end_regno)
> + {
> + if (TEST_HARD_REG_BIT (conflict_set, reg))
> + {
> + for (int k = FIRST_PSEUDO_REGISTER ; k < max_regno; k++)
> + {
> + machine_mode mode = lra_reg_info[regno].biggest_mode;
> + machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> + if (reg == reg_renumber[k] && mode != confl_mode)
> + {
> + int nregs = hard_regno_nregs (hard_regno, mode);
> + int conf_nregs = hard_regno_nregs (hard_regno, confl_mode);
> + enum reg_class cl1 = lra_get_allocno_class (regno);
> + enum reg_class cl2 = lra_get_allocno_class (k);
> + int cl1_num = ira_class_hard_regs_num[cl1];
> + int cl2_num = ira_class_hard_regs_num[cl2];
> +
> + if (cl1 == cl2 && cl1_num == cl2_num
> + && nregs > conf_nregs)
> + {
> + lra_reg_info[regno].pseudo_conflict = true;
> + return false;
> + }
> + }
> + }
> + }
> + }
> +
> + reg = hard_regno;
> +
> + if ((reg - 1) >= ira_class_hard_regs[lra_get_allocno_class (regno)][0])
> + if (TEST_HARD_REG_BIT (conflict_set, reg-1))
> + {
> + for (int k = FIRST_PSEUDO_REGISTER ; k < max_regno; k++)
> + {
> + machine_mode mode = lra_reg_info[regno].biggest_mode;
> + machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> +
> + if ((reg - 1) == reg_renumber[k] && mode != confl_mode)
> + {
> + machine_mode mode = lra_reg_info[regno].biggest_mode;
> + machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> + int nregs = hard_regno_nregs (hard_regno, mode);
> + int conf_nregs = hard_regno_nregs (hard_regno, confl_mode);
> + enum reg_class cl1 = lra_get_allocno_class (regno);
> + enum reg_class cl2 = lra_get_allocno_class (k);
> + int cl1_num = ira_class_hard_regs_num[cl1];
> + int cl2_num = ira_class_hard_regs_num[cl2];
> +
> + if (cl1 == cl2 && cl1 != GENERAL_REGS
> + && cl1_num == cl2_num
> + && nregs < conf_nregs)
> + {
> + bitmap_iterator bi;
> + unsigned int uid;
> + EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap,
> + 0, uid, bi)
> + {
> + struct lra_insn_reg *ir;
> +
> + for (ir = lra_get_insn_regs (uid); ir != NULL;
> + ir = ir->next)
> + if (ir->regno >= FIRST_PSEUDO_REGISTER)
> + if (ir->regno == k)
> + {
> + if (lra_reg_info[k].pseudo_conflict)
> + return false;
> +
> + lra_reg_info[k].pseudo_conflict = true;
> + return false;
> + }
> + }
> + }
> + }
> + }
> + }
> +
> + return true;
> +}
> +
> /* The constraints pass is allowed to create equivalences between
> pseudos that make the current allocation "incorrect" (in the sense
> that pseudos are assigned to hard registers from their own conflict
> @@ -1221,13 +1310,13 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
> val = lra_reg_info[regno].val;
> offset = lra_reg_info[regno].offset;
> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
> + {
> if (!lra_reg_val_equal_p (conflict_regno, val, offset)
> /* If it is multi-register pseudos they should start on
> the same hard register. */
> || hard_regno != reg_renumber[conflict_regno])
> {
> int conflict_hard_regno = reg_renumber[conflict_regno];
> -
> biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
> biggest_nregs = hard_regno_nregs (conflict_hard_regno,
> biggest_mode);
> @@ -1240,6 +1329,12 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
> conflict_hard_regno
> - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
> }
> + }
> + bool reassign = can_reassign (conflict_set, hard_regno,
> + mode, regno, max_regno);
> + if (!reassign)
> + continue;
> +
> if (! overlaps_hard_reg_set_p (conflict_set, mode, hard_regno))
> {
> update_lives (regno, false);
> @@ -1393,7 +1488,9 @@ assign_by_spills (void)
> for (n = 0, i = lra_constraint_new_regno_start; i < max_regno; i++)
> if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
> && regno_allocno_class_array[i] != NO_REGS)
> + {
> sorted_pseudos[n++] = i;
> + }
> bitmap_initialize (&insn_conflict_pseudos, &reg_obstack);
> bitmap_initialize (&spill_pseudos_bitmap, &reg_obstack);
> bitmap_initialize (&best_spill_pseudos_bitmap, &reg_obstack);
> @@ -1415,6 +1512,10 @@ assign_by_spills (void)
> for (i = 0; i < n; i++)
> {
> regno = sorted_pseudos[i];
> +
> + if (lra_reg_info[regno].pseudo_conflict)
> + continue;
> +
> if (reg_renumber[regno] >= 0)
> continue;
> if (lra_dump_file != NULL)
> @@ -1541,7 +1642,11 @@ assign_by_spills (void)
> || bitmap_bit_p (&lra_optional_reload_pseudos, i))
> && reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
> && regno_allocno_class_array[i] != NO_REGS)
> + {
> + if (lra_reg_info[i].pseudo_conflict)
> + continue;
> sorted_pseudos[n++] = i;
> + }
> bitmap_clear (&do_not_assign_nonreload_pseudos);
> if (n != 0 && lra_dump_file != NULL)
> fprintf (lra_dump_file, " Reassigning non-reload pseudos\n");
> @@ -1638,17 +1743,6 @@ lra_assign (bool &fails_p)
> bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
> create_live_range_start_chains ();
> setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
> - if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
> - /* Check correctness of allocation but only when there are no hard reg
> - splits and asm errors as in the case of errors explicit insns involving
> - hard regs are added or the asm is removed and this can result in
> - incorrect allocation. */
> - for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
> - if (lra_reg_info[i].nrefs != 0
> - && reg_renumber[i] >= 0
> - && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
> - PSEUDO_REGNO_MODE (i), reg_renumber[i]))
> - gcc_unreachable ();
> /* Setup insns to process on the next constraint pass. */
> bitmap_initialize (&changed_pseudo_bitmap, &reg_obstack);
> init_live_reload_and_inheritance_pseudos ();
> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 5cdf92be7fc..9e590d8fb74 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -95,6 +95,8 @@ public:
> *non-debug* insns. */
> int nrefs, freq;
> int last_reload;
> + /* Skip reassigning an allocno whose register pair was formed from
> + adjacent memory accesses. */
> + bool pseudo_conflict;
> /* rtx used to undo the inheritance. It can be non-null only
> between subsequent inheritance and undo inheritance passes. */
> rtx restore_rtx;
> diff --git a/gcc/lra.cc b/gcc/lra.cc
> index 69081a8e025..5cc97ce7506 100644
> --- a/gcc/lra.cc
> +++ b/gcc/lra.cc
> @@ -1359,6 +1359,7 @@ initialize_lra_reg_info_element (int i)
> lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
> lra_reg_info[i].last_reload = 0;
> lra_reg_info[i].restore_rtx = NULL_RTX;
> + lra_reg_info[i].pseudo_conflict = false;
> lra_reg_info[i].val = get_new_reg_value ();
> lra_reg_info[i].offset = 0;
> lra_reg_info[i].copies = NULL;
> diff --git a/gcc/testsuite/g++.target/powerpc/vecload.C b/gcc/testsuite/g++.target/powerpc/vecload.C
> new file mode 100644
> index 00000000000..c523572cf3c
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/vecload.C
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <altivec.h>
> +
> +void
> +foo (__vector_quad *dst, vector unsigned char *ptr, vector unsigned char src)
> +{
> + __vector_quad acc;
> + __builtin_mma_xvf32ger(&acc, src, ptr[0]);
> + __builtin_mma_xvf32gerpp(&acc, src, ptr[1]);
> + *dst = acc;
> +}
> +/* { dg-final { scan-assembler {\mlxvp\M} } } */
> diff --git a/gcc/testsuite/g++.target/powerpc/vecload1.C b/gcc/testsuite/g++.target/powerpc/vecload1.C
> new file mode 100644
> index 00000000000..d10ff0cdf36
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/vecload1.C
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <altivec.h>
> +
> +void
> +foo2 ()
> +{
> + __vector_quad *dst1;
> + __vector_quad *dst2;
> + vector unsigned char src;
> + __vector_quad acc;
> + vector unsigned char *ptr;
> + __builtin_mma_xvf32ger(&acc, src, ptr[0]);
> + __builtin_mma_xvf32gerpp(&acc, src, ptr[1]);
> + *dst1 = acc;
> + __builtin_mma_xvf32ger(&acc, src, ptr[2]);
> + __builtin_mma_xvf32gerpp(&acc, src, ptr[3]);
> + *dst2 = acc;
> +}
> +/* { dg-final { scan-assembler {\mlxvp\M} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> index 69ee826e1be..02590216320 100644
> --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -258,8 +258,8 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
> dst[13] = acc;
> }
>
> -/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
> -/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mlxv\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mlxvp\M} 26 } } */
> /* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
> /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
> /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
> --
> 2.39.3
Thread overview: 11+ messages
2024-01-14 15:25 Ajit Agarwal
2024-01-15 9:03 ` Ajit Agarwal
2024-01-15 9:33 ` Richard Biener [this message]
2024-01-15 12:44 ` Ajit Agarwal
2024-01-15 12:55 ` Ajit Agarwal
2024-01-15 22:22 ` Ajit Agarwal
2024-01-17 7:02 ` Kewen.Lin
2024-01-17 9:34 ` Ajit Agarwal
2024-01-17 14:28 ` Michael Matz
2024-01-18 12:17 ` Ajit Agarwal
2024-01-19 4:19 ` Michael Meissner