From: Richard Biener
Date: Mon, 15 Jan 2024 10:33:32 +0100
Subject: Re: [PATCH V1] rs6000: New pass for replacement of adjacent (load) lxv with lxvp
To: Ajit Agarwal
Cc: "Kewen.Lin", Vladimir Makarov, Michael Meissner, Segher Boessenkool, Peter Bergner, David Edelsohn, gcc-patches, Richard Sandiford

On Sun, Jan 14, 2024 at 4:29 PM Ajit Agarwal wrote:
>
> Hello All:
>
> This patch add the vecload pass to replace adjacent memory accesses lxv with lxvp
> instructions. This pass is added before ira pass.
>
> vecload pass removes one of the defined adjacent lxv (load) and replace with lxvp.
> Due to removal of one of the defined loads the allocno is has only uses but
> not defs.
>
> Due to this IRA pass doesn't assign register pairs like registers in sequence.
> Changes are made in IRA register allocator to assign sequential registers to
> adjacent loads.
>
> Some of the registers are cleared and are not set as profitable registers due
> to zero cost is greater than negative costs and checks are added to compare
> positive costs.
>
> LRA register is changed not to reassign them to different register and form
> the sequential register pairs intact.
>
>
> contrib/check_GNU_style.sh run on patch looks good.
>
> Bootstrapped and regtested for powerpc64-linux-gnu.
>
> Spec2017 benchmarks are run and I get impressive benefits for some of the FP
> benchmarks.

I want to point out that the aarch64 target recently got a ld/st fusion
pass which sounds related.  It would be nice to have at least common
infrastructure for this (the aarch64 one also looks quite a bit more
powerful).

> Thanks & Regards
> Ajit
>
>
> rs6000: New pass for replacement of adjacent lxv with lxvp.
>
> New pass to replace adjacent memory addresses lxv with lxvp.
> This pass is registered before ira rtl pass.
>
> 2024-01-14  Ajit Kumar Agarwal
>
> gcc/ChangeLog:
>
>         * config/rs6000/rs6000-passes.def: Registered vecload pass.
>         * config/rs6000/rs6000-vecload-opt.cc: Add new pass.
>         * config.gcc: Add new executable.
>         * config/rs6000/rs6000-protos.h: Add new prototype for vecload
>         pass.
>         * config/rs6000/rs6000.cc: Add new prototype for vecload pass.
>         * config/rs6000/t-rs6000: Add new rule.
>         * ira-color.cc: Form register pair with adjacent loads.
>         * lra-assigns.cc: Skip modifying register pair assignment.
>         * lra-int.h: Add pseudo_conflict field in lra_reg_p structure.
>         * lra.cc: Initialize pseudo_conflict field.
>         * ira-build.cc: Use of REG_FREQ.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.target/powerpc/vecload.C: New test.
>         * g++.target/powerpc/vecload1.C: New test.
>         * gcc.target/powerpc/mma-builtin-1.c: Modify test.
> ---
>  gcc/config.gcc                                |   4 +-
>  gcc/config/rs6000/rs6000-passes.def           |   4 +
>  gcc/config/rs6000/rs6000-protos.h             |   5 +-
>  gcc/config/rs6000/rs6000-vecload-opt.cc       | 432 ++++++++++++++++++
>  gcc/config/rs6000/rs6000.cc                   |   8 +-
>  gcc/config/rs6000/t-rs6000                    |   5 +
>  gcc/ira-color.cc                              | 220 ++++++++-
>  gcc/lra-assigns.cc                            | 118 ++++-
>  gcc/lra-int.h                                 |   2 +
>  gcc/lra.cc                                    |   1 +
>  gcc/testsuite/g++.target/powerpc/vecload.C    |  15 +
>  gcc/testsuite/g++.target/powerpc/vecload1.C   |  22 +
>  .../gcc.target/powerpc/mma-builtin-1.c        |   4 +-
>  13 files changed, 816 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/config/rs6000/rs6000-vecload-opt.cc
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload.C
>  create mode 100644 gcc/testsuite/g++.target/powerpc/vecload1.C
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index f0676c830e8..4cf15e807de 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>         ;;
>  powerpc*-*-*)
>         cpu_type=rs6000
> -       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>         extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>         extra_objs="${extra_objs} rs6000-builtins.o rs6000-builtin.o"
>         extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
> @@ -555,7 +555,7 @@ riscv*)
>         ;;
>  rs6000*-*-*)
>         extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
> -       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> +       extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-vecload-opt.o"
>         extra_objs="${extra_objs} rs6000-call.o rs6000-pcrel-opt.o"
>         target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.cc \$(srcdir)/config/rs6000/rs6000-call.cc"
>         target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-pcrel-opt.cc"
> diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
> index ca899d5f7af..8bd172dd779 100644
> --- a/gcc/config/rs6000/rs6000-passes.def
> +++ b/gcc/config/rs6000/rs6000-passes.def
> @@ -29,6 +29,10 @@ along with GCC; see the file COPYING3.  If not see
>       for loads and stores.  */
>    INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
>
> +  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
> +     instruction.  */
> +  INSERT_PASS_BEFORE (pass_ira, 1, pass_analyze_vecload);
> +
>    /* Pass to do the PCREL_OPT optimization that combines the load of an
>       external symbol's address along with a single load or store using that
>       address as a base register.  */
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index f70118ea40f..83ee773a6f8 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -343,12 +343,15 @@ namespace gcc { class context; }
>  class rtl_opt_pass;
>
>  extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
> +extern rtl_opt_pass *make_pass_analyze_vecload (gcc::context *);
>  extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
>  extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
>  extern bool rs6000_quadword_masked_address_p (const_rtx exp);
>  extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
>  extern rtx rs6000_gen_stvx (enum machine_mode, rtx, rtx);
> -
> +extern bool mode_supports_dq_form (machine_mode);
> +extern bool get_memref_parts (rtx, rtx *, HOST_WIDE_INT *, HOST_WIDE_INT *);
> +extern rtx adjacent_mem_locations (rtx, rtx);
>  extern void rs6000_emit_xxspltidp_v2df (rtx, long value);
>  extern gimple *currently_expanding_gimple_stmt;
>  extern bool rs6000_opaque_type_invalid_use_p (gimple *);
> diff --git a/gcc/config/rs6000/rs6000-vecload-opt.cc b/gcc/config/rs6000/rs6000-vecload-opt.cc
> new file mode 100644
> index 00000000000..d9c11a6caf1
> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-opt.cc
> @@ -0,0 +1,432 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,
> +
> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.
> +   Contributed by Ajit Kumar Agarwal .
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with GCC; see the file COPYING3.  If not see
> +   .  */
> +
> +#define IN_TARGET_CODE 1
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "rtl.h"
> +#include "tree-pass.h"
> +#include "df.h"
> +#include "dumpfile.h"
> +#include "rs6000-internal.h"
> +#include "rs6000-protos.h"
> +
> +/* Return false if dependent rtx LOC is SUBREG.  */
> +static bool
> +is_feasible (rtx_insn *insn)
> +{
> +  df_ref use;
> +  df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> +        continue;
> +      while (def_link && def_link->ref)
> +        {
> +          rtx *loc = DF_REF_LOC (def_link->ref);
> +          if (!loc || *loc == NULL_RTX)
> +            return false;
> +          if (GET_CODE (*loc) == SUBREG)
> +            return false;
> +          def_link = def_link->next;
> +        }
> +    }
> +  return true;
> +}
> +
> +/* df_scan_rescan the unspec instruction where operands
> +   are reversed.  */
> +void set_rescan_for_unspec (rtx_insn *insn)
> +{
> +  df_ref use;
> +  df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +  rtx_insn *select_insn2;
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      while (def_link && def_link->ref)
> +        {
> +          select_insn2 = DF_REF_INSN (def_link->ref);
> +          rtx set = single_set (select_insn2);
> +
> +          if (set == NULL_RTX)
> +            return;
> +
> +          if (set != NULL_RTX)
> +            {
> +              rtx op0 = SET_SRC (set);
> +              if (GET_CODE (op0) != UNSPEC)
> +                return;
> +
> +              if (GET_CODE (op0) == VEC_SELECT
> +                  && GET_CODE (XEXP (op0, 1)) == PARALLEL)
> +                return;
> +
> +              if (GET_CODE (op0) == UNSPEC)
> +                df_insn_rescan (select_insn2);
> +            }
> +          def_link = def_link->next;
> +        }
> +    }
> +}
> +
> +/* Return dependent UNSPEC instruction.
> +   */
> +rtx_insn *get_rtx_UNSPEC (rtx_insn *insn)
> +{
> +  df_ref use;
> +  df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
> +  rtx_insn *select_insn2;
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      while (def_link && def_link->ref)
> +        {
> +          select_insn2 = DF_REF_INSN (def_link->ref);
> +          rtx set = single_set (select_insn2);
> +
> +          if (set == NULL_RTX)
> +            return 0;
> +
> +          if (set != NULL_RTX)
> +            {
> +              rtx op0 = SET_SRC (set);
> +
> +              if (GET_CODE (op0) == UNSPEC)
> +                return select_insn2;
> +            }
> +          def_link = def_link->next;
> +        }
> +    }
> +  return 0;
> +}
> +
> +/* Replace identified lxv with lxvp.
> +   Bail out if following condition are true:
> +
> +   - dependent instruction of load is vec_select instruction,
> +
> +   - machine mode of unspec is not same as machine mode
> +     of lxv instruction.
> +
> +   - dependent instruction is not unspec.
> +
> +   - Source operand of unspec is eq instruction.  */
> +
> +static bool
> +replace_lxv_with_lxvp (rtx_insn *insn1, rtx_insn *insn2)
> +{
> +  rtx body = PATTERN (insn1);
> +  rtx src_exp = SET_SRC (body);
> +  rtx dest_exp = SET_DEST (body);
> +  rtx lxv;
> +  rtx insn2_body = PATTERN (insn2);
> +  rtx insn2_dest_exp = SET_DEST (insn2_body);
> +
> +  if (GET_MODE (src_exp) != GET_MODE (SET_SRC (insn2_body)))
> +    return false;
> +
> +  if (GET_MODE (dest_exp) == TImode)
> +    return false;
> +
> +  if (!ALTIVEC_OR_VSX_VECTOR_MODE (GET_MODE (dest_exp)))
> +    return false;
> +
> +  if (!is_feasible (insn1))
> +    return false;
> +
> +  if (!is_feasible (insn2))
> +    return false;
> +
> +  for (rtx note = REG_NOTES (insn1); note; note = XEXP (note, 1))
> +    if (REG_NOTE_KIND (note) == REG_EQUAL
> +        || REG_NOTE_KIND (note) == REG_EQUIV)
> +      return false;
> +
> +  int no_dep = 0;
> +  df_ref use;
> +  df_insn_info *insn_info = DF_INSN_INFO_GET (insn1);
> +  rtx_insn *select_insn2;
> +
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      while (def_link && def_link->ref)
> +        {
> +          select_insn2 = DF_REF_INSN (def_link->ref);
> +          rtx set = single_set (select_insn2);
> +
> +          if (set == NULL_RTX)
> +            return false;
> +
> +          if (set != NULL_RTX)
> +            {
> +              rtx op0 = SET_SRC (set);
> +
> +              if (GET_CODE (op0) != UNSPEC)
> +                return false;
> +
> +              if (GET_CODE (op0) == VEC_SELECT
> +                  && GET_CODE (XEXP (op0, 1)) == PARALLEL)
> +                return false;
> +
> +              if (GET_CODE (op0) == UNSPEC)
> +                {
> +                  if (GET_MODE (op0) != XOmode
> +                      && GET_MODE (op0) != GET_MODE (dest_exp))
> +                    return false;
> +
> +                  int nvecs = XVECLEN (op0, 0);
> +                  for (int i = 0; i < nvecs; i++)
> +                    {
> +                      rtx op;
> +                      op = XVECEXP (op0, 0, i);
> +
> +                      if (GET_MODE (op) == OOmode)
> +                        return false;
> +                      if (GET_CODE (op) == EQ)
> +                        return false;
> +                    }
> +                }
> +              ++no_dep;
> +            }
> +          def_link = def_link->next;
> +        }
> +    }
> +
> +  rtx_insn *insn = get_rtx_UNSPEC (insn1);
> +
> +  if (insn && insn == get_rtx_UNSPEC (insn2) && no_dep == 1)
> +    return false;
> +
> +
> +  insn_info = DF_INSN_INFO_GET (insn2);
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> +        continue;
> +      while (def_link && def_link->ref)
> +        {
> +          rtx *loc = DF_REF_LOC (def_link->ref);
> +          *loc = dest_exp;
> +          def_link = def_link->next;
> +        }
> +    }
> +
> +  insn_info = DF_INSN_INFO_GET (insn1);
> +  FOR_EACH_INSN_INFO_DEF (use, insn_info)
> +    {
> +      struct df_link *def_link = DF_REF_CHAIN (use);
> +      if (!def_link || !def_link->ref || DF_REF_IS_ARTIFICIAL (def_link->ref))
> +        continue;
> +      while (def_link && def_link->ref)
> +        {
> +          rtx *loc = DF_REF_LOC (def_link->ref);
> +          PUT_MODE_RAW (*loc, OOmode);
> +          *loc = insn2_dest_exp;
> +          def_link = def_link->next;
> +        }
> +    }
> +
> +  set_rescan_for_unspec (insn1);
> +  set_rescan_for_unspec (insn2);
> +  df_insn_rescan (insn1);
> +  df_insn_rescan (insn2);
> +
> +  PUT_MODE_RAW (src_exp, OOmode);
> +  PUT_MODE_RAW (dest_exp, OOmode);
> +  lxv = gen_movoo (dest_exp, src_exp);
> +  rtx_insn *new_insn = emit_insn_before (lxv, insn1);
> +  set_block_for_insn (new_insn, BLOCK_FOR_INSN (insn1));
> +  df_insn_rescan (new_insn);
> +
> +  if (dump_file)
> +    {
> +      unsigned int new_uid = INSN_UID (new_insn);
> +      fprintf (dump_file, "Replacing lxv %d with lxvp %d\n",
> +               INSN_UID (insn1), new_uid);
> +      print_rtl_single (dump_file, new_insn);
> +      print_rtl_single (dump_file, insn1);
> +      print_rtl_single (dump_file, insn2);
> +
> +    }
> +
> +  df_insn_delete (insn1);
> +  remove_insn (insn1);
> +  df_insn_delete (insn2);
> +  remove_insn (insn2);
> +  insn1->set_deleted ();
> +  insn2->set_deleted ();
> +  return true;
> +}
> +
> +/* Identify adjacent memory address lxv instruction and
> +   replace them with lxvp instruction.  */
> +unsigned int
> +rs6000_analyze_vecload (function *fun)
> +{
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  /* Rebuild ud- and du-chains.
> +     */
> +  df_remove_problem (df_chain);
> +  df_process_deferred_rescans ();
> +  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
> +  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
> +  df_analyze ();
> +  df_set_flags (DF_DEFER_INSN_RESCAN);
> +
> +  basic_block bb;
> +  bool changed = false;
> +  rtx_insn *insn, *curr_insn = 0;
> +  rtx_insn *insn1 = 0, *insn2 = 0;
> +  bool first_vec_insn = false;
> +  unsigned int regno = 0;
> +  int index = -1;
> +  FOR_ALL_BB_FN (bb, fun)
> +    {
> +      index = bb->index;
> +      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
> +        {
> +          if (LABEL_P (insn))
> +            continue;
> +
> +          if (NONDEBUG_INSN_P (insn) && GET_CODE (PATTERN (insn)) == SET)
> +            {
> +              rtx set = single_set (insn);
> +              rtx src = SET_SRC (set);
> +              machine_mode mode = GET_MODE (SET_DEST (set));
> +
> +              if (MEM_P (src))
> +                {
> +                  if (mem_operand_ds_form (src, mode)
> +                      || (mode_supports_dq_form (mode)
> +                          && quad_address_p (XEXP (src, 0), mode, false)))
> +                    {
> +                      if (first_vec_insn)
> +                        {
> +                          first_vec_insn = false;
> +                          rtx addr = XEXP (src, 0);
> +
> +                          if (REG_P (addr))
> +                            continue;
> +
> +                          insn2 = insn;
> +                          rtx insn1_src = SET_SRC (PATTERN (insn1));
> +
> +                          int offset = 0;
> +
> +                          if (GET_CODE (addr) == PLUS
> +                              && XEXP (addr, 1)
> +                              && !REG_P (XEXP (addr, 1))
> +                              && CONST_INT_P (XEXP (addr, 1)))
> +                            {
> +                              rtx off = XEXP (addr, 1);
> +                              offset = INTVAL (off);
> +                            }
> +
> +                          if ((offset % 2 == 0)
> +                              && adjacent_mem_locations (insn1_src, src)
> +                                 == insn1_src)
> +                            {
> +                              rtx op0 = XEXP (addr, 0);
> +
> +                              if (regno == REGNO (op0)
> +                                  && index == bb->index)
> +                                {
> +                                  index = -1;
> +                                  changed
> +                                    = replace_lxv_with_lxvp (insn1, insn2);
> +                                }
> +                            }
> +                        }
> +
> +                      else if (REG_P (XEXP (src, 0))
> +                               && GET_CODE (XEXP (src, 0)) != PLUS)
> +                        {
> +                          regno = REGNO (XEXP (src,0));
> +                          first_vec_insn = true;
> +                          insn1 = insn;
> +                        }
> +                      else if (GET_CODE (XEXP (src, 0)) == PLUS)
> +                        {
> +                          rtx addr = XEXP (src, 0);
> +                          rtx op0 = XEXP (addr, 0);
> +
> +                          if (REG_P (op0))
> +                            regno = REGNO (op0);
> +
> +                          first_vec_insn = true;
> +                          insn1 = insn;
> +                        }
> +                    }
> +                }
> +            }
> +        }
> +    }
> +  return changed;
> +}
> +
> +const pass_data pass_data_analyze_vecload =
> +{
> +  RTL_PASS, /* type */
> +  "vecload", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_df_finish, /* todo_flags_finish */
> +};
> +
> +class pass_analyze_vecload : public rtl_opt_pass
> +{
> +public:
> +  pass_analyze_vecload(gcc::context *ctxt)
> +    : rtl_opt_pass(pass_data_analyze_vecload, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return (optimize > 0 && TARGET_VSX && TARGET_POWER10);
> +    }
> +
> +  virtual unsigned int execute (function *fun)
> +    {
> +      return rs6000_analyze_vecload (fun);
> +    }
> +}; // class pass_analyze_vecload
> +
> +rtl_opt_pass *
> +make_pass_analyze_vecload (gcc::context *ctxt)
> +{
> +  return new pass_analyze_vecload (ctxt);
> +}
> +
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 6b9a40fcc66..5f0ec8239c1 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -387,7 +387,7 @@ mode_supports_vmx_dform (machine_mode mode)
>  /* Return true if we have D-form addressing in VSX registers.  This addressing
>     is more limited than normal d-form addressing in that the offset must be
>     aligned on a 16-byte boundary.  */
> -static inline bool
> +bool
>  mode_supports_dq_form (machine_mode mode)
>  {
>    return ((reg_addr[mode].addr_mask[RELOAD_REG_ANY] & RELOAD_REG_QUAD_OFFSET)
> @@ -1178,6 +1178,8 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
>                                            secondary_reload_info *,
>                                            bool);
>  rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
> +rtl_opt_pass *make_pass_analyze_vecload (gcc::context*);
> +
>
>  /* Hash table stuff for keeping track of TOC entries.  */
>
> @@ -18644,7 +18646,7 @@ set_to_load_agen (rtx_insn *out_insn, rtx_insn *in_insn)
>     This function only looks for REG or REG+CONST address forms.
>     REG+REG address form will return false. */
>
> -static bool
> +bool
>  get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
>                    HOST_WIDE_INT *size)
>  {
> @@ -18676,7 +18678,7 @@ get_memref_parts (rtx mem, rtx *base, HOST_WIDE_INT *offset,
>     adjacent, then return the argument that has the lower address.
>     Otherwise, return NULL_RTX.  */
>
> -static rtx
> +rtx
>  adjacent_mem_locations (rtx mem1, rtx mem2)
>  {
>    rtx reg1, reg2;
> diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
> index f183b42ce1d..0b6852f2d38 100644
> --- a/gcc/config/rs6000/t-rs6000
> +++ b/gcc/config/rs6000/t-rs6000
> @@ -35,6 +35,11 @@ rs6000-p8swap.o: $(srcdir)/config/rs6000/rs6000-p8swap.cc
>         $(COMPILE) $<
>         $(POSTCOMPILE)
>
> +rs6000-vecload-opt.o: $(srcdir)/config/rs6000/rs6000-vecload-opt.cc
> +       $(COMPILE) $<
> +       $(POSTCOMPILE)
> +
> +
>  rs6000-d.o: $(srcdir)/config/rs6000/rs6000-d.cc
>         $(COMPILE) $<
>         $(POSTCOMPILE)
> diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
> index 214a4f16d3c..73e9891a529 100644
> --- a/gcc/ira-color.cc
> +++ b/gcc/ira-color.cc
> @@ -1047,6 +1047,8 @@ setup_profitable_hard_regs (void)
>         continue;
>        data = ALLOCNO_COLOR_DATA (a);
>        if (ALLOCNO_UPDATED_HARD_REG_COSTS (a) == NULL
> +         && ALLOCNO_CLASS_COST (a) > 0
> +         && ALLOCNO_MEMORY_COST (a) > 0
>           && ALLOCNO_CLASS_COST (a) > ALLOCNO_MEMORY_COST (a)
>           /* Do not empty profitable regs for static chain pointer
>              pseudo when non-local goto is used.  */
> @@ -1131,6 +1133,8 @@ setup_profitable_hard_regs (void)
>                                        hard_regno))
>                 continue;
>               if (ALLOCNO_UPDATED_MEMORY_COST (a) < costs[j]
> +                 && ALLOCNO_UPDATED_MEMORY_COST (a) > 0
> +                 && costs[j] > 0
>                   /* Do not remove HARD_REGNO for static chain pointer
>                      pseudo when non-local goto is used.  */
>                   && ! non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a)))
> @@ -1919,6 +1923,181 @@ spill_soft_conflicts (ira_allocno_t a, bitmap allocnos_to_spill,
>      }
>  }
>
> +/* Form register pair for adjacent memory addresses access allocno.  */
> +static int
> +form_register_pairs (ira_allocno_t a, int regno, HARD_REG_SET *conflicting_regs)
> +{
> +  int n = ALLOCNO_NUM_OBJECTS (a);
> +  int best_hard_regno = -1;
> +  for (int i = 0; i < n; i++)
> +    {
> +      ira_object_t obj = ALLOCNO_OBJECT (a, i);
> +      ira_object_t conflict_obj;
> +      ira_object_conflict_iterator oci;
> +
> +      if (OBJECT_CONFLICT_ARRAY (obj) == NULL)
> +        {
> +          continue;
> +        }
> +      FOR_EACH_OBJECT_CONFLICT (obj, conflict_obj, oci)
> +        {
> +          ira_allocno_t conflict_a = OBJECT_ALLOCNO (conflict_obj);
> +
> +          machine_mode mode = ALLOCNO_MODE (a);
> +          machine_mode confl_mode = ALLOCNO_MODE (conflict_a);
> +          int a_nregs = ira_reg_class_max_nregs[ALLOCNO_CLASS(a)][mode];
> +          int cl = ALLOCNO_CLASS (conflict_a);
> +          int conf_nregs = ira_reg_class_max_nregs[cl][confl_mode];
> +          HARD_REG_SET profitable_regs
> +            = ALLOCNO_COLOR_DATA (a)->profitable_hard_regs;
> +
> +          if (mode != confl_mode && a_nregs < conf_nregs)
> +            {
> +              if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> +                {
> +                  enum reg_class aclass = ALLOCNO_CLASS (a);
> +
> +                  if (regno < ira_class_hard_regs[aclass][0])
> +                    regno = ira_class_hard_regs[aclass][0];
> +
> +                  if (ALLOCNO_HARD_REGNO (conflict_a) > 0)
> +                    best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 1;
> +                  else
> +                    best_hard_regno = regno + 1;
> +
> +                  if (ALLOCNO_HARD_REGNO (conflict_a) < 0)
> +                    {
> +                      if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> +                                            profitable_regs))
> +                        {
> +                          if (best_hard_regno % 2 == 0)
> +                            {
> +                              int hard_reg = ira_class_hard_regs[aclass][0];
> +                              if (best_hard_regno - 1 < hard_reg)
> +                                return best_hard_regno + 1;
> +                              else
> +                                return best_hard_regno - 1;
> +                            }
> +                          return best_hard_regno;
> +                        }
> +                      else return -1;
> +                    }
> +                  else return best_hard_regno;
> +                }
> +
> +              if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0
> +                  && DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> +                {
> +                  best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) - 1;
> +                  if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> +                                        profitable_regs))
> +                    {
> +                      return best_hard_regno;
> +                    }
> +                }
> +              else if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0)
> +                {
> +                  best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 2;
> +
> +                  if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> +                                        profitable_regs))
> +                    {
> +                      return best_hard_regno;
> +                    }
> +                  else if (ira_class_hard_regs[ALLOCNO_CLASS (a)][0] <= (regno + 1)
> +                           && check_hard_reg_p (a, regno + 1, conflicting_regs,
> +                                                profitable_regs))
> +                    return regno+1;
> +
> +                  else return -1;
> +                }
> +            }
> +          else if (mode != confl_mode && a_nregs > conf_nregs)
> +            {
> +              if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> +                {
> +                  enum reg_class aclass = ALLOCNO_CLASS (a);
> +
> +                  if (regno < ira_class_hard_regs[aclass][0])
> +                    regno = ira_class_hard_regs[aclass][0];
> +                  if (ALLOCNO_ASSIGNED_P (conflict_a)
> +                      && ALLOCNO_HARD_REGNO (conflict_a) > 0)
> +                    {
> +                      best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) - 1;
> +                      return best_hard_regno;
> +                    }
> +                  else
> +                    best_hard_regno = regno;
> +
> +                  if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> +                                        profitable_regs))
> +                    {
> +                      if (best_hard_regno % 2 != 0)
> +                        {
> +                          return best_hard_regno;
> +                        }
> +                      return best_hard_regno;
> +                    }
> +                }
> +            }
> +          else
> +            {
> +              if (ALLOCNO_HARD_REGNO (conflict_a) > 0
> +                  && DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) != 0
> +                  && DF_REG_DEF_COUNT (ALLOCNO_REGNO (conflict_a)) == 0)
> +                {
> +                  if (ALLOCNO_ASSIGNED_P (conflict_a))
> +                    best_hard_regno = ALLOCNO_HARD_REGNO (conflict_a) + 1;
> +                  else
> +                    best_hard_regno = regno;
> +
> +                  if (check_hard_reg_p (a, best_hard_regno, conflicting_regs,
> +                                        profitable_regs))
> +                    {
> +                      if (best_hard_regno % 2 != 0)
> +                        {
> +                          return best_hard_regno ;
> +                        }
> +                      return best_hard_regno;
> +                    }
> +
> +                  int i = 0;
> +                  enum reg_class aclass = ALLOCNO_CLASS (a);
> +                  int class_size = ira_class_hard_regs_num[aclass];
> +                  while (i < best_hard_regno)
> +                    {
> +                      int last_hard_regno
> +                        = ira_class_hard_regs[aclass][class_size - 1];
> +                      if ((i + best_hard_regno) <= last_hard_regno
> +                          && check_hard_reg_p (a, best_hard_regno + i, conflicting_regs,
> +                                               profitable_regs))
> +                        return best_hard_regno + i;
> +                      ++i;
> +                    }
> +
> +                  best_hard_regno -= 3;
> +                  i = 0;
> +
> +                  while (i < best_hard_regno)
> +                    {
> +                      int hard_reg
> +                        = ira_class_hard_regs[aclass][0];
> +                      if ((best_hard_regno - i) >= hard_reg
> +                          && check_hard_reg_p (a, best_hard_regno - i, conflicting_regs,
> +                                               profitable_regs))
> +                        return best_hard_regno - i;
> +                      ++i;
> +                    }
> +
> +                  return -1;
> +
> +                }
> +            }
> +        }
> +    }
> +  return -1;
> +}
> +
>  /* Choose a hard register for allocno A.  If RETRY_P is TRUE, it means
>     that the function called from function
>     `ira_reassign_conflict_allocnos' and `allocno_reload_assign'.  In
> @@ -1974,6 +2153,13 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>  #ifdef STACK_REGS
>    no_stack_reg_p = false;
>  #endif
> +  int maxim_regno = 0;
> +  for (i = 0; i < class_size; i++)
> +    {
> +      if (ira_class_hard_regs[aclass][i] > maxim_regno)
> +        maxim_regno = ira_class_hard_regs[aclass][i];
> +    }
> +
>    if (! retry_p)
>      start_update_cost ();
>    mem_cost += ALLOCNO_UPDATED_MEMORY_COST (a);
> @@ -2078,7 +2264,9 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>                      }
>                    else
>                      {
> -                      if (conflict_nregs == n_objects && conflict_nregs > 1)
> +                      int num = OBJECT_SUBWORD (conflict_obj);
> +
> +                      if (conflict_nregs == n_objects)
>                          {
>                            int num = OBJECT_SUBWORD (conflict_obj);
>
> @@ -2090,8 +2278,12 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>                                                hard_regno + num);
>                          }
>                        else
> -                        conflicting_regs[word]
> -                          |= ira_reg_mode_hard_regset[hard_regno][mode];
> +                        {
> +                          SET_HARD_REG_BIT (conflicting_regs[word],
> +                                            hard_regno + num);
> +                          conflicting_regs[word]
> +                            |= ira_reg_mode_hard_regset[hard_regno][mode];
> +                        }
>                        if (hard_reg_set_subset_p (profitable_hard_regs,
>                                                   conflicting_regs[word]))
>                          goto fail;
> @@ -2185,6 +2377,20 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>         }
>        if (min_cost > cost)
>         min_cost = cost;
> +
> +      int reg_pair = form_register_pairs (a, hard_regno, conflicting_regs);
> +
> +      if (reg_pair > 0)
> +        {
> +          if (reg_pair >= ira_class_hard_regs[aclass][0]
> +              && reg_pair < maxim_regno)
> +            {
> +              min_full_cost = full_cost;
> +              best_hard_regno = reg_pair;
> +              break;
> +            }
> +        }
> +
>        if (min_full_cost > full_cost)
>         {
>           min_full_cost = full_cost;
> @@ -2196,7 +2402,7 @@ assign_hard_reg (ira_allocno_t a, bool retry_p)
>      }
>    if (internal_flag_ira_verbose > 5 && ira_dump_file != NULL)
>      fprintf (ira_dump_file, "\n");
> -  if (min_full_cost > mem_cost
> +  if (best_hard_regno < 0 && min_full_cost > mem_cost
>        /* Do not spill static chain pointer pseudo when non-local goto
>          is used.  */
>        && ! non_spilled_static_chain_regno_p (ALLOCNO_REGNO (a)))
> @@ -2473,6 +2679,8 @@ init_allocno_threads (void)
>        /* Set up initial thread data: */
>        ALLOCNO_COLOR_DATA (a)->first_thread_allocno
>         = ALLOCNO_COLOR_DATA (a)->next_thread_allocno = a;
> +      if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> +        ALLOCNO_FREQ (a) += ALLOCNO_FREQ (a);
>        ALLOCNO_COLOR_DATA (a)->thread_freq = ALLOCNO_FREQ (a);
>        ALLOCNO_COLOR_DATA (a)->hard_reg_prefs = 0;
>        for (pref = ALLOCNO_PREFS (a); pref != NULL; pref = pref->next_pref)
> @@ -3315,6 +3523,10 @@ improve_allocation (void)
>         }
>        min_cost = INT_MAX;
>        best = -1;
> +
> +      if (DF_REG_DEF_COUNT (ALLOCNO_REGNO (a)) == 0)
> +        continue;
> +
>        /* Now we choose hard register for A which results in highest
>          allocation cost improvement.  */
>        for (j = 0; j < class_size; j++)
> diff --git a/gcc/lra-assigns.cc b/gcc/lra-assigns.cc
> index 7aa210e986f..332508044f2 100644
> --- a/gcc/lra-assigns.cc
> +++ b/gcc/lra-assigns.cc
> @@ -1131,6 +1131,95 @@ assign_hard_regno (int hard_regno, int regno)
>  /* Array used for sorting different pseudos.  */
>  static int *sorted_pseudos;
>
> +/* Skip reasign the register assignment with register pair adjacent
> +   memory access allocno.
*/
> +static bool
> +can_reassign (HARD_REG_SET conflict_set, int hard_regno,
> +              machine_mode mode, int regno, int max_regno)
> +{
> +  int end_regno = end_hard_regno (mode, hard_regno);
> +  int reg = hard_regno;
> +
> +  while (++reg < end_regno)
> +    {
> +      if (TEST_HARD_REG_BIT (conflict_set, reg))
> +        {
> +          for (int k = FIRST_PSEUDO_REGISTER ; k < max_regno; k++)
> +            {
> +              machine_mode mode = lra_reg_info[regno].biggest_mode;
> +              machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> +              if (reg == reg_renumber[k] && mode != confl_mode)
> +                {
> +                  int nregs = hard_regno_nregs (hard_regno, mode);
> +                  int conf_nregs = hard_regno_nregs (hard_regno, confl_mode);
> +                  enum reg_class cl1 = lra_get_allocno_class (regno);
> +                  enum reg_class cl2 = lra_get_allocno_class (k);
> +                  int cl1_num = ira_class_hard_regs_num[cl1];
> +                  int cl2_num = ira_class_hard_regs_num[cl2];
> +
> +                  if (cl1 == cl2 && cl1_num == cl2_num
> +                      && nregs > conf_nregs)
> +                    {
> +                      lra_reg_info[regno].pseudo_conflict = true;;
> +                      return false;
> +                    }
> +                }
> +            }
> +        }
> +    }
> +
> +  reg = hard_regno;
> +
> +  if ((reg - 1) >= ira_class_hard_regs[lra_get_allocno_class (regno)][0])
> +    if (TEST_HARD_REG_BIT (conflict_set, reg-1))
> +      {
> +        for (int k = FIRST_PSEUDO_REGISTER ; k < max_regno; k++)
> +          {
> +            machine_mode mode = lra_reg_info[regno].biggest_mode;
> +            machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> +
> +            if ((reg - 1) == reg_renumber[k] && mode != confl_mode)
> +              {
> +                machine_mode mode = lra_reg_info[regno].biggest_mode;
> +                machine_mode confl_mode = lra_reg_info[k].biggest_mode;
> +                int nregs = hard_regno_nregs (hard_regno, mode);
> +                int conf_nregs = hard_regno_nregs (hard_regno, confl_mode);
> +                enum reg_class cl1 = lra_get_allocno_class (regno);
> +                enum reg_class cl2 = lra_get_allocno_class (k);
> +                int cl1_num = ira_class_hard_regs_num[cl1];
> +                int cl2_num = ira_class_hard_regs_num[cl2];
> +
> +                if (cl1
== cl2 && cl1 != GENERAL_REGS
> +                    && cl1_num == cl2_num
> +                    && nregs < conf_nregs)
> +                  {
> +                    bitmap_iterator bi;
> +                    unsigned int uid;
> +                    EXECUTE_IF_SET_IN_BITMAP (&lra_reg_info[regno].insn_bitmap,
> +                                              0, uid, bi)
> +                      {
> +                        struct lra_insn_reg *ir;
> +
> +                        for (ir = lra_get_insn_regs (uid); ir != NULL;
> +                             ir = ir->next)
> +                          if (ir->regno >= FIRST_PSEUDO_REGISTER)
> +                            if (ir->regno == k)
> +                              {
> +                                if (lra_reg_info[k].pseudo_conflict)
> +                                  return false;
> +
> +                                lra_reg_info[k].pseudo_conflict = true;;
> +                                return false;
> +                              }
> +                      }
> +                  }
> +              }
> +          }
> +      }
> +
> +  return true;
> +}
> +
>  /* The constraints pass is allowed to create equivalences between
>     pseudos that make the current allocation "incorrect" (in the sense
>     that pseudos are assigned to hard registers from their own conflict
> @@ -1221,13 +1310,13 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
>        val = lra_reg_info[regno].val;
>        offset = lra_reg_info[regno].offset;
>        EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
> +      {
>          if (!lra_reg_val_equal_p (conflict_regno, val, offset)
>              /* If it is multi-register pseudos they should start on
>                 the same hard register. */
>              || hard_regno != reg_renumber[conflict_regno])
>            {
>              int conflict_hard_regno = reg_renumber[conflict_regno];
> -
>              biggest_mode = lra_reg_info[conflict_regno].biggest_mode;
>              biggest_nregs = hard_regno_nregs (conflict_hard_regno,
>                                                biggest_mode);
> @@ -1240,6 +1329,12 @@ setup_live_pseudos_and_spill_after_risky_transforms (bitmap
>                 conflict_hard_regno
>                 - (WORDS_BIG_ENDIAN ? nregs_diff : 0));
>            }
> +      }
> +      bool reassign = can_reassign (conflict_set, hard_regno,
> +                                    mode, regno, max_regno);
> +      if (!reassign)
> +        continue;
> +
>        if (!
overlaps_hard_reg_set_p (conflict_set, mode, hard_regno))
>          {
>            update_lives (regno, false);
> @@ -1393,7 +1488,9 @@ assign_by_spills (void)
>    for (n = 0, i = lra_constraint_new_regno_start; i < max_regno; i++)
>      if (reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
>          && regno_allocno_class_array[i] != NO_REGS)
> +      {
>        sorted_pseudos[n++] = i;
> +      }
>    bitmap_initialize (&insn_conflict_pseudos, &reg_obstack);
>    bitmap_initialize (&spill_pseudos_bitmap, &reg_obstack);
>    bitmap_initialize (&best_spill_pseudos_bitmap, &reg_obstack);
> @@ -1415,6 +1512,10 @@ assign_by_spills (void)
>        for (i = 0; i < n; i++)
>          {
>            regno = sorted_pseudos[i];
> +
> +          if (lra_reg_info[i].pseudo_conflict)
> +            continue;
> +
>            if (reg_renumber[regno] >= 0)
>              continue;
>            if (lra_dump_file != NULL)
> @@ -1541,7 +1642,11 @@ assign_by_spills (void)
>                 || bitmap_bit_p (&lra_optional_reload_pseudos, i))
>                && reg_renumber[i] < 0 && lra_reg_info[i].nrefs != 0
>                && regno_allocno_class_array[i] != NO_REGS)
> +          {
> +            if (lra_reg_info[i].pseudo_conflict)
> +              continue;
>            sorted_pseudos[n++] = i;
> +          }
>        bitmap_clear (&do_not_assign_nonreload_pseudos);
>        if (n != 0 && lra_dump_file != NULL)
>          fprintf (lra_dump_file, " Reassigning non-reload pseudos\n");
> @@ -1638,17 +1743,6 @@ lra_assign (bool &fails_p)
>    bitmap_initialize (&all_spilled_pseudos, &reg_obstack);
>    create_live_range_start_chains ();
>    setup_live_pseudos_and_spill_after_risky_transforms (&all_spilled_pseudos);
> -  if (! lra_hard_reg_split_p && ! lra_asm_error_p && flag_checking)
> -    /* Check correctness of allocation but only when there are no hard reg
> -       splits and asm errors as in the case of errors explicit insns involving
> -       hard regs are added or the asm is removed and this can result in
> -       incorrect allocation.
*/
> -    for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
> -      if (lra_reg_info[i].nrefs != 0
> -          && reg_renumber[i] >= 0
> -          && overlaps_hard_reg_set_p (lra_reg_info[i].conflict_hard_regs,
> -                                      PSEUDO_REGNO_MODE (i), reg_renumber[i]))
> -        gcc_unreachable ();
>    /* Setup insns to process on the next constraint pass. */
>    bitmap_initialize (&changed_pseudo_bitmap, &reg_obstack);
>    init_live_reload_and_inheritance_pseudos ();
> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 5cdf92be7fc..9e590d8fb74 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -95,6 +95,8 @@ public:
>       *non-debug* insns. */
>    int nrefs, freq;
>    int last_reload;
> +  /* Skip reasign register pair with adjacent memory access allocno. */
> +  bool pseudo_conflict;
>    /* rtx used to undo the inheritance. It can be non-null only
>       between subsequent inheritance and undo inheritance passes. */
>    rtx restore_rtx;
> diff --git a/gcc/lra.cc b/gcc/lra.cc
> index 69081a8e025..5cc97ce7506 100644
> --- a/gcc/lra.cc
> +++ b/gcc/lra.cc
> @@ -1359,6 +1359,7 @@ initialize_lra_reg_info_element (int i)
>    lra_reg_info[i].nrefs = lra_reg_info[i].freq = 0;
>    lra_reg_info[i].last_reload = 0;
>    lra_reg_info[i].restore_rtx = NULL_RTX;
> +  lra_reg_info[i].pseudo_conflict = false;
>    lra_reg_info[i].val = get_new_reg_value ();
>    lra_reg_info[i].offset = 0;
>    lra_reg_info[i].copies = NULL;
> diff --git a/gcc/testsuite/g++.target/powerpc/vecload.C b/gcc/testsuite/g++.target/powerpc/vecload.C
> new file mode 100644
> index 00000000000..c523572cf3c
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/vecload.C
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include
> +
> +void
> +foo (__vector_quad *dst, vector unsigned char *ptr, vector unsigned char src)
> +{
> +  __vector_quad acc;
> +  __builtin_mma_xvf32ger(&acc, src, ptr[0]);
> +  __builtin_mma_xvf32gerpp(&acc, src, ptr[1]);
> +  *dst
= acc;
> +}
> +/* { dg-final { scan-assembler {\mlxvp\M} } } */
> diff --git a/gcc/testsuite/g++.target/powerpc/vecload1.C b/gcc/testsuite/g++.target/powerpc/vecload1.C
> new file mode 100644
> index 00000000000..d10ff0cdf36
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/vecload1.C
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include
> +
> +void
> +foo2 ()
> +{
> +  __vector_quad *dst1;
> +  __vector_quad *dst2;
> +  vector unsigned char src;
> +  __vector_quad acc;
> +  vector unsigned char *ptr;
> +  __builtin_mma_xvf32ger(&acc, src, ptr[0]);
> +  __builtin_mma_xvf32gerpp(&acc, src, ptr[1]);
> +  *dst1 = acc;
> +  __builtin_mma_xvf32ger(&acc, src, ptr[2]);
> +  __builtin_mma_xvf32gerpp(&acc, src, ptr[3]);
> +  *dst2 = acc;
> +}
> +/* { dg-final { scan-assembler {\mlxvp\M} } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> index 69ee826e1be..02590216320 100644
> --- a/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/mma-builtin-1.c
> @@ -258,8 +258,8 @@ foo13b (__vector_quad *dst, __vector_quad *src, vec_t *vec)
>    dst[13] = acc;
>  }
>
> -/* { dg-final { scan-assembler-times {\mlxv\M} 40 } } */
> -/* { dg-final { scan-assembler-times {\mlxvp\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mlxv\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mlxvp\M} 26 } } */
>  /* { dg-final { scan-assembler-times {\mstxvp\M} 40 } } */
>  /* { dg-final { scan-assembler-times {\mxxmfacc\M} 20 } } */
>  /* { dg-final { scan-assembler-times {\mxxmtacc\M} 6 } } */
> --
> 2.39.3