From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Pinski
Date: Mon, 29 May 2023 23:54:25 -0700
Subject: Re: [RFC] light expander sra for parameters and returns
To: Jiufu Guo
Cc: gcc-patches@gcc.gnu.org, rguenther@suse.de, jeffreyalaw@gmail.com,
 segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org
References: <20230529035217.2126346-1-guojiufu@linux.ibm.com>
In-Reply-To: <20230529035217.2126346-1-guojiufu@linux.ibm.com>

On Sun, May 28, 2023 at 8:53 PM Jiufu Guo via Gcc-patches wrote:
>
> Hi,
>
> Previously, I was investigating some struct parameter and return
> related PRs: 69143/65421/108073.
>
> I investigated the issues case by case and drafted patches for each of
> them, one by one.  This would help us enhance the code incrementally.
> However, done this way, the patches would interact with each other and
> implement different code for similar issues (because of the different
> paths in gimple/rtl).  A common fix for those issues may be possible.
>
> We know a few other related PRs exist (such as meta-bug PR101926).
> Those PRs are on different targets, with different symptoms (and also
> different root causes); I would expect one method to help some of them,
> but it may be hard to handle all of them in one fix.

First, thanks for working on this; this is something which will help
improve GCC a lot.  What you implemented is similar to some ideas I had.

The meta-bug is there more to make it easier to find similar cases of
the same issue; I never expected someone to fix all of them in one go.
Once the patch finally gets approved, I am willing to help out by going
through the bug reports and seeing whether the patch fixes each one or
whether more is needed.

Thanks,
Andrew

>
> While investigating and discussing the issues, I recalled a suggestion
> from Richard: it would be nice to perform some SRA-like analysis for
> the accesses on the structs (parameters/returns).
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
> This may be a 'fairly common method' for those issues.  With this idea,
> I drafted the patch below.
>
> I also thought about directly using tree-sra.cc, e.g. enhancing it and
> rerunning it at the end of the GIMPLE passes.  But since some of the
> issues are introduced inside the expander, the patch below also
> cooperates with other parts of the expander.  And since we already run
> tree-sra as a GIMPLE pass, this patch only needs to take extra care of
> parameters and returns; other decls are already handled well by
> tree-sra.
>
> The steps of this patch are:
> 1. Collect struct-type parameters and returns, then scan the function
>    for the accesses on them, and figure out which accesses would be
>    profitable to scalarize (using the registers of the parameter or
>    return).  Currently, the patch checks reads of parameters and
>    writes to returns.
> 2. When/after the scalar registers are determined/expanded for the
>    return or parameters, compute the corresponding scalar register(s)
>    for each access of the return/parameter and prepare the scalar RTLs
>    for those accesses.
> 3. When using/expanding the access expressions, use the
>    computed/prepared scalars directly.
>
> This patch is tested on ppc64, both LE and BE.
> To continue, I would first ask for comments and suggestions, and then
> update/enhance the patch accordingly.  Thanks in advance!
>
>
> BR,
> Jeff (Jiufu)
>
>
> ---
>  gcc/cfgexpand.cc                             | 567 ++++++++++++++++++-
>  gcc/expr.cc                                  |  15 +-
>  gcc/function.cc                              |  26 +-
>  gcc/opts.cc                                  |   8 +-
>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>  8 files changed, 675 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 85a93a547c0..95c29b6b6fe 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
>
>  static void record_alignment_for_reg_var (unsigned int);
>
> +/* For light SRA in expander about parameters and returns.
*/ > +namespace { > + > +struct access > +{ > + /* Each accessing on the aggragate is about OFFSET/SIZE and BASE. */ > + HOST_WIDE_INT offset; > + HOST_WIDE_INT size; > + tree base; > + bool writing; > + > + /* The context expression of this access. */ > + tree expr; > + > + /* The rtx for the access: link to incoming/returning register(s). */ > + rtx rtx_val; > +}; > + > +typedef struct access *access_p; > + > +/* Expr (tree) -> Acess (access_p) map. */ > +static hash_map *expr_access_vec; > + > +/* Base (tree) -> Vector (vec *) map. */ > +static hash_map > *base_access_vec; > + > +/* Return a vector of pointers to accesses for the variable given in BAS= E or > + NULL if there is none. */ > + > +static vec * > +get_base_access_vector (tree base) > +{ > + return base_access_vec->get (base); > +} > + > +/* Remove DECL from candidates for SRA. */ > +static void > +disqualify_candidate (tree decl) > +{ > + decl =3D get_base_address (decl); > + base_access_vec->remove (decl); > +} > + > +/* Create and insert access for EXPR. Return created access, or NULL if = it is > + not possible. */ > +static struct access * > +create_access (tree expr, bool write) > +{ > + poly_int64 poffset, psize, pmax_size; > + bool reverse; > + > + tree base > + =3D get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size, &re= verse); > + > + if (!DECL_P (base)) > + return NULL; > + > + vec *access_vec =3D get_base_access_vector (base); > + if (!access_vec) > + return NULL; > + > + /* TODO: support reverse. */ > + if (reverse) > + { > + disqualify_candidate (expr); > + return NULL; > + } > + > + HOST_WIDE_INT offset, size, max_size; > + if (!poffset.is_constant (&offset) || !psize.is_constant (&size) > + || !pmax_size.is_constant (&max_size)) > + return NULL; > + > + if (size !=3D max_size || size =3D=3D 0 || offset < 0 || size < 0 > + || offset + size > tree_to_shwi (DECL_SIZE (base))) > + return NULL; > + > + struct access *access =3D XNEWVEC (struct access, 1); > + > + memset (access, 0, sizeof (struct access)); > + access->base =3D base; > + access->offset =3D offset; > + access->size =3D size; > + access->expr =3D expr; > + access->writing =3D write; > + access->rtx_val =3D NULL_RTX; > + > + access_vec->safe_push (access); > + > + return access; > +} > + > +/* Return true if VAR is a candidate for SRA. */ > +static bool > +add_sra_candidate (tree var) > +{ > + tree type =3D TREE_TYPE (var); > + > + if (!AGGREGATE_TYPE_P (type) || TREE_THIS_VOLATILE (var) > + || !COMPLETE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type)) > + || tree_to_shwi (TYPE_SIZE (type)) =3D=3D 0 > + || TYPE_MAIN_VARIANT (type) =3D=3D TYPE_MAIN_VARIANT (va_list_type= _node)) > + return false; > + > + base_access_vec->get_or_insert (var); > + > + return true; > +} > + > +/* Callback of walk_stmt_load_store_addr_ops visit_addr used to remove > + operands with address taken. */ > +static tree > +visit_addr (tree *tp, int *, void *) > +{ > + tree op =3D *tp; > + if (op && DECL_P (op)) > + disqualify_candidate (op); > + > + return NULL; > +} > + > +/* Scan expression EXPR and create access structures for all accesses to > + candidates for scalarization. Return the created access or NULL if n= one is > + created. 
*/ > +static struct access * > +build_access_from_expr (tree expr, bool write) > +{ > + if (TREE_CODE (expr) =3D=3D VIEW_CONVERT_EXPR) > + expr =3D TREE_OPERAND (expr, 0); > + > + if (TREE_CODE (expr) =3D=3D BIT_FIELD_REF || storage_order_barrier_p (= expr) > + || TREE_THIS_VOLATILE (expr)) > + { > + disqualify_candidate (expr); > + return NULL; > + } > + > + switch (TREE_CODE (expr)) > + { > + case MEM_REF: { > + tree op =3D TREE_OPERAND (expr, 0); > + if (TREE_CODE (op) =3D=3D ADDR_EXPR) > + disqualify_candidate (TREE_OPERAND (op, 0)); > + break; > + } > + case ADDR_EXPR: > + case IMAGPART_EXPR: > + case REALPART_EXPR: > + disqualify_candidate (TREE_OPERAND (expr, 0)); > + break; > + case VAR_DECL: > + case PARM_DECL: > + case RESULT_DECL: > + case COMPONENT_REF: > + case ARRAY_REF: > + case ARRAY_RANGE_REF: > + return create_access (expr, write); > + break; > + default: > + break; > + } > + > + return NULL; > +} > + > +/* Scan function and look for interesting expressions and create access > + structures for them. */ > +static void > +scan_function (void) > +{ > + basic_block bb; > + > + FOR_EACH_BB_FN (bb, cfun) > + { > + for (gphi_iterator gsi =3D gsi_start_phis (bb); !gsi_end_p (gsi); > + gsi_next (&gsi)) > + { > + gphi *phi =3D gsi.phi (); > + for (size_t i =3D 0; i < gimple_phi_num_args (phi); i++) > + { > + tree t =3D gimple_phi_arg_def (phi, i); > + walk_tree (&t, visit_addr, NULL, NULL); > + } > + } > + > + for (gimple_stmt_iterator gsi =3D gsi_start_nondebug_after_labels_= bb (bb); > + !gsi_end_p (gsi); gsi_next_nondebug (&gsi)) > + { > + gimple *stmt =3D gsi_stmt (gsi); > + switch (gimple_code (stmt)) > + { > + case GIMPLE_RETURN: { > + tree r =3D gimple_return_retval (as_a (stmt)); > + if (r && VAR_P (r) && r !=3D DECL_RESULT (current_functio= n_decl)) > + build_access_from_expr (r, true); > + } > + break; > + case GIMPLE_ASSIGN: > + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stm= t)) > + { > + tree lhs =3D gimple_assign_lhs (stmt); > + tree rhs =3D gimple_assign_rhs1 (stmt); > + if (TREE_CODE (rhs) =3D=3D CONSTRUCTOR) > + disqualify_candidate (lhs); > + else > + { > + build_access_from_expr (rhs, false); > + build_access_from_expr (lhs, true); > + } > + } > + break; > + default: > + walk_gimple_op (stmt, visit_addr, NULL); > + break; > + } > + } > + } > +} > + > +/* Collect the parameter and returns with type which is suitable for > + * scalarization. */ > +static bool > +collect_light_sra_candidates (void) > +{ > + bool ret =3D false; > + > + /* Collect parameters. */ > + for (tree parm =3D DECL_ARGUMENTS (current_function_decl); parm; > + parm =3D DECL_CHAIN (parm)) > + ret |=3D add_sra_candidate (parm); > + > + /* Collect VARs on returns. */ > + if (DECL_RESULT (current_function_decl)) > + { > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + if (greturn *r =3D safe_dyn_cast (*gsi_last_bb (e->src= ))) > + { > + tree val =3D gimple_return_retval (r); > + if (val && VAR_P (val)) > + ret |=3D add_sra_candidate (val); > + } > + } > + > + return ret; > +} > + > +/* Now, only scalarize the parms only with reading > + or returns only with writing. 
*/ > +bool > +check_access_vec (tree const &base, auto_vec const &access_vec= , > + auto_vec *unqualify_vec) > +{ > + bool read =3D false; > + bool write =3D false; > + for (unsigned int j =3D 0; j < access_vec.length (); j++) > + { > + struct access *access =3D access_vec[j]; > + if (access->writing) > + write =3D true; > + else > + read =3D true; > + > + if (write && read) > + break; > + } > + if ((write && read) || (!write && !read)) > + unqualify_vec->safe_push (base); > + > + return true; > +} > + > +/* Analyze all the accesses, remove those inprofitable candidates. > + And build the expr->access map. */ > +static void > +analyze_accesses () > +{ > + auto_vec unqualify_vec; > + base_access_vec->traverse *, check_access_vec> ( > + &unqualify_vec); > + > + tree base; > + unsigned i; > + FOR_EACH_VEC_ELT (unqualify_vec, i, base) > + disqualify_candidate (base); > +} > + > +static void > +prepare_expander_sra () > +{ > + if (optimize <=3D 0) > + return; > + > + base_access_vec =3D new hash_map >; > + expr_access_vec =3D new hash_map; > + > + if (collect_light_sra_candidates ()) > + { > + scan_function (); > + analyze_accesses (); > + } > +} > + > +static void > +free_expander_sra () > +{ > + if (optimize <=3D 0 || !expr_access_vec) > + return; > + delete expr_access_vec; > + expr_access_vec =3D 0; > + delete base_access_vec; > + base_access_vec =3D 0; > +} > +} /* namespace */ > + > +/* Check If there is an sra access for the expr. > + Return the correspond scalar sym for the access. */ > +rtx > +get_scalar_rtx_for_aggregate_expr (tree expr) > +{ > + if (!expr_access_vec) > + return NULL_RTX; > + access_p *access =3D expr_access_vec->get (expr); > + return access ? (*access)->rtx_val : NULL_RTX; > +} > + > +extern rtx > +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int); > + > +/* Compute/Set RTX registers for those accesses on BASE. */ > +void > +set_scalar_rtx_for_aggregate_access (tree base, rtx regs) > +{ > + if (!base_access_vec) > + return; > + vec *access_vec =3D get_base_access_vector (base); > + if (!access_vec) > + return; > + > + /* Go through each access, compute corresponding rtx(regs or subregs) > + for the expression. */ > + int n =3D access_vec->length (); > + int cur_access_index =3D 0; > + for (; cur_access_index < n; cur_access_index++) > + { > + access_p acc =3D (*access_vec)[cur_access_index]; > + machine_mode expr_mode =3D TYPE_MODE (TREE_TYPE (acc->expr)); > + /* non BLK in mult registers*/ > + if (expr_mode !=3D BLKmode > + && known_gt (acc->size, GET_MODE_BITSIZE (word_mode))) > + break; > + > + int start_index =3D -1; > + int end_index =3D -1; > + HOST_WIDE_INT left_margin_bits =3D 0; > + HOST_WIDE_INT right_margin_bits =3D 0; > + int cur_index =3D XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1; > + for (; cur_index < XVECLEN (regs, 0); cur_index++) > + { > + rtx slot =3D XVECEXP (regs, 0, cur_index); > + HOST_WIDE_INT off =3D UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT; > + HOST_WIDE_INT size > + =3D GET_MODE_BITSIZE (GET_MODE (XEXP (slot, 0))).to_constant = (); > + if (off <=3D acc->offset && off + size > acc->offset) > + { > + start_index =3D cur_index; > + left_margin_bits =3D acc->offset - off; > + } > + if (off + size >=3D acc->offset + acc->size) > + { > + end_index =3D cur_index; > + right_margin_bits =3D off + size - (acc->offset + acc->size= ); > + break; > + } > + } > + /* accessing pading and outof bound. */ > + if (start_index < 0 || end_index < 0) > + break; > + > + /* Need a parallel for possible multi-registers. 
*/ > + if (expr_mode =3D=3D BLKmode || end_index > start_index) > + { > + /* Can not support start from middle of a register. */ > + if (left_margin_bits !=3D 0) > + break; > + > + int len =3D end_index - start_index + 1; > + const int margin =3D 3; /* more space for SI, HI, QI. */ > + rtx *tmps =3D XALLOCAVEC (rtx, len + (right_margin_bits ? margi= n : 0)); > + > + HOST_WIDE_INT start_off > + =3D UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1)); > + int pos =3D 0; > + for (; pos < len - (right_margin_bits ? 1 : 0); pos++) > + { > + int index =3D start_index + pos; > + rtx orig_reg =3D XEXP (XVECEXP (regs, 0, index), 0); > + machine_mode mode =3D GET_MODE (orig_reg); > + rtx reg =3D NULL_RTX; > + if (HARD_REGISTER_P (orig_reg)) > + { > + /* Reading from param hard reg need to be moved to a te= mp. */ > + gcc_assert (!acc->writing); > + reg =3D gen_reg_rtx (mode); > + emit_move_insn (reg, orig_reg); > + } > + else > + reg =3D orig_reg; > + > + HOST_WIDE_INT off =3D UINTVAL (XEXP (XVECEXP (regs, 0, inde= x), 1)); > + tmps[pos] > + =3D gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off - start_of= f)); > + } > + > + /* There are some fields are in part of registers. */ > + if (right_margin_bits !=3D 0) > + { > + if (acc->writing) > + break; > + > + gcc_assert ((right_margin_bits % BITS_PER_UNIT) =3D=3D 0); > + HOST_WIDE_INT off_byte > + =3D UINTVAL (XEXP (XVECEXP (regs, 0, end_index), 1)) - st= art_off; > + rtx orig_reg =3D XEXP (XVECEXP (regs, 0, end_index), 0); > + machine_mode orig_mode =3D GET_MODE (orig_reg); > + gcc_assert (GET_MODE_CLASS (orig_mode) =3D=3D MODE_INT); > + > + machine_mode mode_aux[] =3D {SImode, HImode, QImode}; > + HOST_WIDE_INT reg_size > + =3D GET_MODE_BITSIZE (orig_mode).to_constant (); > + HOST_WIDE_INT off_bits =3D 0; > + for (unsigned long j =3D 0; > + j < sizeof (mode_aux) / sizeof (mode_aux[0]); j++) > + { > + HOST_WIDE_INT submode_bitsize > + =3D GET_MODE_BITSIZE (mode_aux[j]).to_constant (); > + if (reg_size - right_margin_bits - off_bits > + >=3D submode_bitsize) > + { > + rtx reg =3D gen_reg_rtx (orig_mode); > + emit_move_insn (reg, orig_reg); > + > + poly_uint64 lowpart_off > + =3D subreg_lowpart_offset (mode_aux[j], orig_mode= ); > + int lowpart_off_bits > + =3D lowpart_off.to_constant () * BITS_PER_UNIT; > + int shift_bits =3D lowpart_off_bits >=3D off_bits > + ? (lowpart_off_bits - off_bits) > + : (off_bits - lowpart_off_bits); > + if (shift_bits > 0) > + reg =3D expand_shift (RSHIFT_EXPR, orig_mode, reg= , > + shift_bits, NULL, 1); > + rtx subreg =3D gen_lowpart (mode_aux[j], reg); > + rtx off =3D GEN_INT (off_byte); > + tmps[pos++] > + =3D gen_rtx_EXPR_LIST (mode_aux[j], subreg, off); > + off_byte +=3D submode_bitsize / BITS_PER_UNIT; > + off_bits +=3D submode_bitsize; > + } > + } > + } > + > + /* Currently, PARALLELs with register elements for param/return= s > + are using BLKmode. */ > + acc->rtx_val =3D gen_rtx_PARALLEL (TYPE_MODE (TREE_TYPE (acc->e= xpr)), > + gen_rtvec_v (pos, tmps)); > + continue; > + } > + > + /* The access corresponds to one reg. */ > + if (end_index =3D=3D start_index && left_margin_bits =3D=3D 0 > + && right_margin_bits =3D=3D 0) > + { > + rtx orig_reg =3D XEXP (XVECEXP (regs, 0, start_index), 0); > + rtx reg =3D NULL_RTX; > + if (HARD_REGISTER_P (orig_reg)) > + { > + /* Reading from param hard reg need to be moved to a temp. 
= */ > + gcc_assert (!acc->writing); > + reg =3D gen_reg_rtx (GET_MODE (orig_reg)); > + emit_move_insn (reg, orig_reg); > + } > + else > + reg =3D orig_reg; > + if (GET_MODE (orig_reg) !=3D expr_mode) > + reg =3D gen_lowpart (expr_mode, reg); > + > + acc->rtx_val =3D reg; > + continue; > + } > + > + /* It is accessing a filed which is part of a register. */ > + scalar_int_mode imode; > + if (!acc->writing && end_index =3D=3D start_index > + && int_mode_for_size (acc->size, 1).exists (&imode)) > + { > + /* get and copy original register inside the param. */ > + rtx orig_reg =3D XEXP (XVECEXP (regs, 0, start_index), 0); > + machine_mode mode =3D GET_MODE (orig_reg); > + gcc_assert (GET_MODE_CLASS (mode) =3D=3D MODE_INT); > + rtx reg =3D gen_reg_rtx (mode); > + emit_move_insn (reg, orig_reg); > + > + /* shift to expect part. */ > + poly_uint64 lowpart_off =3D subreg_lowpart_offset (imode, mode)= ; > + int lowpart_off_bits =3D lowpart_off.to_constant () * BITS_PER_= UNIT; > + int shift_bits =3D lowpart_off_bits >=3D left_margin_bits > + ? (lowpart_off_bits - left_margin_bits) > + : (left_margin_bits - lowpart_off_bits); > + if (shift_bits > 0) > + reg =3D expand_shift (RSHIFT_EXPR, mode, reg, shift_bits, NUL= L, 1); > + > + /* move corresond part subreg to result. */ > + rtx subreg =3D gen_lowpart (imode, reg); > + rtx result =3D gen_reg_rtx (imode); > + emit_move_insn (result, subreg); > + > + if (expr_mode !=3D imode) > + result =3D gen_lowpart (expr_mode, result); > + > + acc->rtx_val =3D result; > + continue; > + } > + > + break; > + } > + > + /* Some access expr(s) are not scalarized. */ > + if (cur_access_index !=3D n) > + disqualify_candidate (base); > + else > + { > + /* Add elements to expr->access map. */ > + for (int j =3D 0; j < n; j++) > + { > + access_p access =3D (*access_vec)[j]; > + expr_access_vec->put (access->expr, access); > + } > + } > +} > + > +void > +set_scalar_rtx_for_returns () > +{ > + tree res =3D DECL_RESULT (current_function_decl); > + gcc_assert (res); > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + if (greturn *r =3D safe_dyn_cast (*gsi_last_bb (e->src))) > + { > + tree val =3D gimple_return_retval (r); > + if (val && VAR_P (val)) > + set_scalar_rtx_for_aggregate_access (val, DECL_RTL (res)); > + } > +} > + > /* Return an expression tree corresponding to the RHS of GIMPLE > statement STMT. */ > > @@ -3778,7 +4336,8 @@ expand_return (tree retval) > > /* If we are returning the RESULT_DECL, then the value has already > been stored into it, so we don't have to do anything special. */ > - if (TREE_CODE (retval_rhs) =3D=3D RESULT_DECL) > + if (TREE_CODE (retval_rhs) =3D=3D RESULT_DECL > + || get_scalar_rtx_for_aggregate_expr (retval_rhs)) > expand_value_return (result_rtl); > > /* If the result is an aggregate that is being returned in one (or mor= e) > @@ -4422,6 +4981,9 @@ expand_debug_expr (tree exp) > int unsignedp =3D TYPE_UNSIGNED (TREE_TYPE (exp)); > addr_space_t as; > scalar_int_mode op0_mode, op1_mode, addr_mode; > + rtx x =3D get_scalar_rtx_for_aggregate_expr (exp); > + if (x) > + return NULL_RTX;/* optimized out. */ > > switch (TREE_CODE_CLASS (TREE_CODE (exp))) > { > @@ -6630,6 +7192,8 @@ pass_expand::execute (function *fun) > avoid_deep_ter_for_debug (gsi_stmt (gsi), 0); > } > > + prepare_expander_sra (); > + > /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE= . 
*/ > auto_bitmap forced_stack_vars; > discover_nonconstant_array_refs (forced_stack_vars); > @@ -7062,6 +7626,7 @@ pass_expand::execute (function *fun) > loop_optimizer_finalize (); > } > > + free_expander_sra (); > timevar_pop (TV_POST_EXPAND); > > return 0; > diff --git a/gcc/expr.cc b/gcc/expr.cc > index 56b51876f80..b970f98e689 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -100,6 +100,7 @@ static void do_tablejump (rtx, machine_mode, rtx, rtx= , rtx, > static rtx const_vector_from_tree (tree); > static tree tree_expr_size (const_tree); > static void convert_mode_scalar (rtx, rtx, int); > +rtx get_scalar_rtx_for_aggregate_expr (tree); > > > /* This is run to set up which modes can be used > @@ -5623,11 +5624,12 @@ expand_assignment (tree to, tree from, bool nonte= mporal) > Assignment of an array element at a constant index, and assignment = of > an array element in an unaligned packed structure field, has the sa= me > problem. Same for (partially) storing into a non-memory object. *= / > - if (handled_component_p (to) > - || (TREE_CODE (to) =3D=3D MEM_REF > - && (REF_REVERSE_STORAGE_ORDER (to) > - || mem_ref_refers_to_non_mem_p (to))) > - || TREE_CODE (TREE_TYPE (to)) =3D=3D ARRAY_TYPE) > + if (!get_scalar_rtx_for_aggregate_expr (to) > + && (handled_component_p (to) > + || (TREE_CODE (to) =3D=3D MEM_REF > + && (REF_REVERSE_STORAGE_ORDER (to) > + || mem_ref_refers_to_non_mem_p (to))) > + || TREE_CODE (TREE_TYPE (to)) =3D=3D ARRAY_TYPE)) > { > machine_mode mode1; > poly_int64 bitsize, bitpos; > @@ -8995,6 +8997,9 @@ expand_expr_real (tree exp, rtx target, machine_mod= e tmode, > ret =3D CONST0_RTX (tmode); > return ret ? ret : const0_rtx; > } > + rtx x =3D get_scalar_rtx_for_aggregate_expr (exp); > + if (x) > + return x; > > ret =3D expand_expr_real_1 (exp, target, tmode, modifier, alt_rtl, > inner_reference_p); > diff --git a/gcc/function.cc b/gcc/function.cc > index 82102ed78d7..262d3f17e72 100644 > --- a/gcc/function.cc > +++ b/gcc/function.cc > @@ -2742,6 +2742,9 @@ assign_parm_find_stack_rtl (tree parm, struct assig= n_parm_data_one *data) > data->stack_parm =3D stack_parm; > } > > +extern void > +set_scalar_rtx_for_aggregate_access (tree, rtx); > + > /* A subroutine of assign_parms. Adjust DATA->ENTRY_RTL such that it's > always valid and contiguous. 
*/ > > @@ -3117,8 +3120,21 @@ assign_parm_setup_block (struct assign_parm_data_a= ll *all, > emit_move_insn (mem, entry_parm); > } > else > - move_block_from_reg (REGNO (entry_parm), mem, > - size_stored / UNITS_PER_WORD); > + { > + int regno =3D REGNO (entry_parm); > + int nregs =3D size_stored / UNITS_PER_WORD; > + move_block_from_reg (regno, mem, nregs); > + > + rtx *tmps =3D XALLOCAVEC (rtx, nregs); > + machine_mode mode =3D word_mode; > + for (int i =3D 0; i < nregs; i++) > + tmps[i] =3D gen_rtx_EXPR_LIST ( > + VOIDmode, gen_rtx_REG (mode, regno + i), > + GEN_INT (GET_MODE_SIZE (mode).to_constant () * i)); > + > + rtx regs =3D gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmp= s)); > + set_scalar_rtx_for_aggregate_access (parm, regs); > + } > } > else if (data->stack_parm =3D=3D 0 && !TYPE_EMPTY_P (data->arg.type)) > { > @@ -3718,6 +3734,10 @@ assign_parms (tree fndecl) > else > set_decl_incoming_rtl (parm, data.entry_parm, false); > > + rtx incoming =3D DECL_INCOMING_RTL (parm); > + if (GET_CODE (incoming) =3D=3D PARALLEL) > + set_scalar_rtx_for_aggregate_access (parm, incoming); > + > assign_parm_adjust_stack_rtl (&data); > > if (assign_parm_setup_block_p (&data)) > @@ -5037,6 +5057,7 @@ stack_protect_epilogue (void) > the function's parameters, which must be run at any return statement.= */ > > bool currently_expanding_function_start; > +extern void set_scalar_rtx_for_returns (); > void > expand_function_start (tree subr) > { > @@ -5138,6 +5159,7 @@ expand_function_start (tree subr) > { > gcc_assert (GET_CODE (hard_reg) =3D=3D PARALLEL); > set_parm_rtl (res, gen_group_rtx (hard_reg)); > + set_scalar_rtx_for_returns (); > } > } > > diff --git a/gcc/opts.cc b/gcc/opts.cc > index 86b94d62b58..5e129a1cc49 100644 > --- a/gcc/opts.cc > +++ b/gcc/opts.cc > @@ -1559,6 +1559,10 @@ public: > vec m_values; > }; > > +#ifdef __GNUC__ > +#pragma GCC diagnostic push > +#pragma GCC diagnostic ignored "-Wformat-truncation" > +#endif > /* Print help for a specific front-end, etc. */ > static void > print_filtered_help (unsigned int include_flags, > @@ -1913,7 +1917,9 @@ print_filtered_help (unsigned int include_flags, > printf ("\n\n"); > } > } > - > +#ifdef __GNUC__ > +#pragma GCC diagnostic pop > +#endif > /* Display help for a specified type of option. > The options must have ALL of the INCLUDE_FLAGS set > ANY of the flags in the ANY_FLAGS set > diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/= g++.target/powerpc/pr102024.C > index 769585052b5..c8995cae707 100644 > --- a/gcc/testsuite/g++.target/powerpc/pr102024.C > +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C > @@ -5,7 +5,7 @@ > // Test that a zero-width bit field in an otherwise homogeneous aggregat= e > // generates a psabi warning and passes arguments in GPRs. 
> > -// { dg-final { scan-assembler-times {\mstd\M} 4 } } > +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } > > struct a_thing > { > diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/= gcc.target/powerpc/pr108073.c > new file mode 100644 > index 00000000000..7dd1a4a326a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -save-temps" } */ > + > +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; = } DF; > +typedef struct SF {float a[4]; int i1; int i2; } SF; > + > +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_p= pc64 && has_arch_pwr8 } } } } */ > +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 &&= has_arch_pwr8 } } } } */ > +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 &&= has_arch_pwr8 } } } } */ > +short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag =3D=3D 2= )return a.s2+a.s3;return 0;} > +int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag =3D=3D 2)r= eturn a.i2+a.i1;return 0;} > +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag =3D=3D= 2)return arg.a[3];else return 0.0;} > +float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag =3D=3D= 2)return arg.a[2]; return 0;} > +float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag =3D= =3D 2)return arg.a[1];return 0;} > + > +DF gdf =3D {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4}; > +SF gsf =3D {{1.0f,2.0f,3.0f,4.0f}, 1, 2}; > + > +int main() > +{ > + if (!(foo_hi (gdf, 2) =3D=3D 5 && foo_si (gsf, 2) =3D=3D 3 && foo_df (= gdf, 2) =3D=3D 4.0 > + && foo_sf (gsf, 2) =3D=3D 3.0 && foo_sf1 (gsf, 2) =3D=3D 2.0)) > + __builtin_abort (); > + if (!(foo_hi (gdf, 1) =3D=3D 0 && foo_si (gsf, 1) =3D=3D 0 && foo_df (= gdf, 1) =3D=3D 0 > + && foo_sf (gsf, 1) =3D=3D 0 && foo_sf1 (gsf, 1) =3D=3D 0)) > + __builtin_abort (); > + return 0; > +} > + > diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite= /gcc.target/powerpc/pr65421-1.c > new file mode 100644 > index 00000000000..4e1f87f7939 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c > @@ -0,0 +1,6 @@ > +/* PR target/65421 */ > +/* { dg-options "-O2" } */ > + > +typedef struct LARGE {double a[4]; int arr[32];} LARGE; > +LARGE foo (LARGE a){return a;} > +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite= /gcc.target/powerpc/pr65421-2.c > new file mode 100644 > index 00000000000..8a8e1a0e996 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c > @@ -0,0 +1,32 @@ > +/* PR target/65421 */ > +/* { dg-options "-O2" } */ > +/* { dg-require-effective-target powerpc_elfv2 } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ > + > +typedef struct FLOATS > +{ > + double a[3]; > +} FLOATS; > + > +/* 3 lfd after returns also optimized */ > +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */ > + > +/* 3 stfd */ > +void st_arg (FLOATS a, FLOATS *p) {*p =3D a;} > +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */ > + > +/* blr */ > +FLOATS ret_arg (FLOATS a) {return a;} > + > +typedef struct MIX > +{ > + double a[2]; > + long l; > +} MIX; > + > +/* std 3 param regs to return slot */ > +MIX ret_arg1 (MIX a) {return a;} > +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */ > + > +/* count insns */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */ > -- > 2.39.1 >
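
For readers skimming the thread, a reduced form of the new pr108073.c
test above illustrates the access pattern this light expander SRA
targets.  The register placement in the comment is an assumption based
on ELFv2 parameter passing for a mixed (non-homogeneous) aggregate; the
patch text itself does not spell it out:

typedef struct DF { double a[4]; short s1, s2, s3, s4; } DF;

short __attribute__ ((noipa))
foo_hi (DF a, int flag)
{
  /* DF is not a homogeneous FP aggregate, so it arrives entirely in
     GPRs, with the four shorts packed into a single incoming register.
     Without the patch, A is written out to the stack on entry and the
     two shorts are re-loaded with lhz instructions.  With the light
     expander SRA, these reads are taken (via shifts/subregs) from a
     pseudo copied from the incoming register, which is why the new
     test can require that no lhz appears.  */
  if (flag == 2)
    return a.s2 + a.s3;
  return 0;
}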