From: Martin Jambor <mjambor@suse.cz>
To: Richard Biener <rguenther@suse.de>, Jiufu Guo <guojiufu@linux.ibm.com>
Cc: gcc-patches@gcc.gnu.org, jeffreyalaw@gmail.com,
segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org
Subject: Re: [RFC] light expander sra for parameters and returns
Date: Thu, 01 Jun 2023 17:27:34 +0200 [thread overview]
Message-ID: <ri6cz2f8aop.fsf@suse.cz> (raw)
In-Reply-To: <nycvar.YFH.7.77.849.2305300639310.4723@jbgna.fhfr.qr>
Hi,
On Tue, May 30 2023, Richard Biener wrote:
> On Mon, 29 May 2023, Jiufu Guo wrote:
>
>> Hi,
>>
>> Previously, I was investigating some struct parameters and returns related
>> PRs 69143/65421/108073.
>>
>> Investigating the issues case by case, and drafting patches for each of
>> them one by one. This would help us to enhance code incrementally.
>> While, this way, patches would interact with each other and implement
>> different codes for similar issues (because of the different paths in
>> gimple/rtl). We may have a common fix for those issues.
>>
>> We know a few other related PRs(such as meta-bug PR101926) exist. For those
>> PRs in different targets with different symptoms (and also different root
>> cause), I would expect a method could help some of them, but it may
>> be hard to handle all of them in one fix.
>>
>> With investigation and check discussion for the issues, I remember a
>> suggestion from Richard: it would be nice to perform some SRA-like analysis
>> for the accesses on the structs (parameter/returns).
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605117.html
>> This may be a 'fairly common method' for those issues. With this idea,
>> I drafted a patch as below in this mail.
>>
>> I also thought about directly using tree-sra.cc, e.g. enhance it and rerun it
>> at the end of GIMPLE passes. While since some issues are introduced inside
>> the expander, so below patch also co-works with other parts of the expander.
>> And since we already have tree-sra in gimple pass, we only need to take more
>> care on parameter and return in this patch: other decls could be handled
>> well in tree-sra.
>>
>> The steps of this patch are:
>> 1. Collect struct type parameters and returns, and then scan the function to
>> get the accesses on them. And figure out the accesses which would be profitable
>> to be scalarized (using registers of the parameter/return ). Now, reading on
>> parameter and writing on returns are checked in the current patch.
>> 2. When/after the scalar registers are determined/expanded for the return or
>> parameters, compute the corresponding scalar register(s) for each accesses of
>> the return/parameter, and prepare the scalar RTLs for those accesses.
>> 3. When using/expanding the accesses expression, leverage the computed/prepared
>> scalars directly.
>>
>> This patch is tested on ppc64 both LE and BE.
>> To continue, I would ask for comments and suggestions first. And then I would
>> update/enhance accordingly. Thanks in advance!
>
> Thanks for working on this - the description above sounds exactly like
> what should be done.
>
> Now - I'd like the code to re-use the access tree data structure from
> SRA plus at least the worker creating the accesses from a stmt.
I have had a first look at the patch but still need to look into it more
to understand how it uses the information it gathers.
My plan is to make the access-tree infrastructure of IPA-SRA more
generic and hopefully usable even for this purpose, rather than the one
in tree-sra.cc. But that really builds a tree of accesses, bailing out
on any partial overlaps, for example, which may not be the right thing
here since I don't see any tree-building here. But I still need to
properly read set_scalar_rtx_for_aggregate_access function in the patch,
which I plan to do next week.
Thanks,
Martin
>
> The RTL expansion code already does a sweep over stmts in
> discover_nonconstant_array_refs which makes sure RTL expansion doesn't
> scalarize (aka assign non-stack) to variables which have accesses
> that would later eventually FAIL to expand when operating on registers.
> That's very much related to the task at hand so we should try to
> at least merge the CFG walks of both (it produces a forced_stack_vars
> bitmap).
>
> Can you work together with Martin to split out the access tree
> data structure and share it?
>
> I didn't look in detail as of how you make use of the information
> yet.
>
> Thanks,
> Richard.
>
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>> ---
>> gcc/cfgexpand.cc | 567 ++++++++++++++++++-
>> gcc/expr.cc | 15 +-
>> gcc/function.cc | 26 +-
>> gcc/opts.cc | 8 +-
>> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +-
>> gcc/testsuite/gcc.target/powerpc/pr108073.c | 29 +
>> gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 +
>> gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 32 ++
>> 8 files changed, 675 insertions(+), 10 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>>
>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> index 85a93a547c0..95c29b6b6fe 100644
>> --- a/gcc/cfgexpand.cc
>> +++ b/gcc/cfgexpand.cc
>> @@ -97,6 +97,564 @@ static bool defer_stack_allocation (tree, bool);
>>
>> static void record_alignment_for_reg_var (unsigned int);
>>
>> +/* For light SRA in expander about paramaters and returns. */
>> +namespace {
>> +
>> +struct access
>> +{
>> + /* Each accessing on the aggragate is about OFFSET/SIZE and BASE. */
>> + HOST_WIDE_INT offset;
>> + HOST_WIDE_INT size;
>> + tree base;
>> + bool writing;
>> +
>> + /* The context expression of this access. */
>> + tree expr;
>> +
>> + /* The rtx for the access: link to incoming/returning register(s). */
>> + rtx rtx_val;
>> +};
>> +
>> +typedef struct access *access_p;
>> +
>> +/* Expr (tree) -> Acess (access_p) map. */
>> +static hash_map<tree, access_p> *expr_access_vec;
>> +
>> +/* Base (tree) -> Vector (vec<access_p> *) map. */
>> +static hash_map<tree, auto_vec<access_p> > *base_access_vec;
>> +
>> +/* Return a vector of pointers to accesses for the variable given in BASE or
>> + NULL if there is none. */
>> +
>> +static vec<access_p> *
>> +get_base_access_vector (tree base)
>> +{
>> + return base_access_vec->get (base);
>> +}
>> +
>> +/* Remove DECL from candidates for SRA. */
>> +static void
>> +disqualify_candidate (tree decl)
>> +{
>> + decl = get_base_address (decl);
>> + base_access_vec->remove (decl);
>> +}
>> +
>> +/* Create and insert access for EXPR. Return created access, or NULL if it is
>> + not possible. */
>> +static struct access *
>> +create_access (tree expr, bool write)
>> +{
>> + poly_int64 poffset, psize, pmax_size;
>> + bool reverse;
>> +
>> + tree base
>> + = get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size, &reverse);
>> +
>> + if (!DECL_P (base))
>> + return NULL;
>> +
>> + vec<access_p> *access_vec = get_base_access_vector (base);
>> + if (!access_vec)
>> + return NULL;
>> +
>> + /* TODO: support reverse. */
>> + if (reverse)
>> + {
>> + disqualify_candidate (expr);
>> + return NULL;
>> + }
>> +
>> + HOST_WIDE_INT offset, size, max_size;
>> + if (!poffset.is_constant (&offset) || !psize.is_constant (&size)
>> + || !pmax_size.is_constant (&max_size))
>> + return NULL;
>> +
>> + if (size != max_size || size == 0 || offset < 0 || size < 0
>> + || offset + size > tree_to_shwi (DECL_SIZE (base)))
>> + return NULL;
>> +
>> + struct access *access = XNEWVEC (struct access, 1);
>> +
>> + memset (access, 0, sizeof (struct access));
>> + access->base = base;
>> + access->offset = offset;
>> + access->size = size;
>> + access->expr = expr;
>> + access->writing = write;
>> + access->rtx_val = NULL_RTX;
>> +
>> + access_vec->safe_push (access);
>> +
>> + return access;
>> +}
>> +
>> +/* Return true if VAR is a candidate for SRA. */
>> +static bool
>> +add_sra_candidate (tree var)
>> +{
>> + tree type = TREE_TYPE (var);
>> +
>> + if (!AGGREGATE_TYPE_P (type) || TREE_THIS_VOLATILE (var)
>> + || !COMPLETE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
>> + || tree_to_shwi (TYPE_SIZE (type)) == 0
>> + || TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (va_list_type_node))
>> + return false;
>> +
>> + base_access_vec->get_or_insert (var);
>> +
>> + return true;
>> +}
>> +
>> +/* Callback of walk_stmt_load_store_addr_ops visit_addr used to remove
>> + operands with address taken. */
>> +static tree
>> +visit_addr (tree *tp, int *, void *)
>> +{
>> + tree op = *tp;
>> + if (op && DECL_P (op))
>> + disqualify_candidate (op);
>> +
>> + return NULL;
>> +}
>> +
>> +/* Scan expression EXPR and create access structures for all accesses to
>> + candidates for scalarization. Return the created access or NULL if none is
>> + created. */
>> +static struct access *
>> +build_access_from_expr (tree expr, bool write)
>> +{
>> + if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
>> + expr = TREE_OPERAND (expr, 0);
>> +
>> + if (TREE_CODE (expr) == BIT_FIELD_REF || storage_order_barrier_p (expr)
>> + || TREE_THIS_VOLATILE (expr))
>> + {
>> + disqualify_candidate (expr);
>> + return NULL;
>> + }
>> +
>> + switch (TREE_CODE (expr))
>> + {
>> + case MEM_REF: {
>> + tree op = TREE_OPERAND (expr, 0);
>> + if (TREE_CODE (op) == ADDR_EXPR)
>> + disqualify_candidate (TREE_OPERAND (op, 0));
>> + break;
>> + }
>> + case ADDR_EXPR:
>> + case IMAGPART_EXPR:
>> + case REALPART_EXPR:
>> + disqualify_candidate (TREE_OPERAND (expr, 0));
>> + break;
>> + case VAR_DECL:
>> + case PARM_DECL:
>> + case RESULT_DECL:
>> + case COMPONENT_REF:
>> + case ARRAY_REF:
>> + case ARRAY_RANGE_REF:
>> + return create_access (expr, write);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +/* Scan function and look for interesting expressions and create access
>> + structures for them. */
>> +static void
>> +scan_function (void)
>> +{
>> + basic_block bb;
>> +
>> + FOR_EACH_BB_FN (bb, cfun)
>> + {
>> + for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
>> + gsi_next (&gsi))
>> + {
>> + gphi *phi = gsi.phi ();
>> + for (size_t i = 0; i < gimple_phi_num_args (phi); i++)
>> + {
>> + tree t = gimple_phi_arg_def (phi, i);
>> + walk_tree (&t, visit_addr, NULL, NULL);
>> + }
>> + }
>> +
>> + for (gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
>> + !gsi_end_p (gsi); gsi_next_nondebug (&gsi))
>> + {
>> + gimple *stmt = gsi_stmt (gsi);
>> + switch (gimple_code (stmt))
>> + {
>> + case GIMPLE_RETURN: {
>> + tree r = gimple_return_retval (as_a<greturn *> (stmt));
>> + if (r && VAR_P (r) && r != DECL_RESULT (current_function_decl))
>> + build_access_from_expr (r, true);
>> + }
>> + break;
>> + case GIMPLE_ASSIGN:
>> + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
>> + {
>> + tree lhs = gimple_assign_lhs (stmt);
>> + tree rhs = gimple_assign_rhs1 (stmt);
>> + if (TREE_CODE (rhs) == CONSTRUCTOR)
>> + disqualify_candidate (lhs);
>> + else
>> + {
>> + build_access_from_expr (rhs, false);
>> + build_access_from_expr (lhs, true);
>> + }
>> + }
>> + break;
>> + default:
>> + walk_gimple_op (stmt, visit_addr, NULL);
>> + break;
>> + }
>> + }
>> + }
>> +}
>> +
>> +/* Collect the parameter and returns with type which is suitable for
>> + * scalarization. */
>> +static bool
>> +collect_light_sra_candidates (void)
>> +{
>> + bool ret = false;
>> +
>> + /* Collect parameters. */
>> + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
>> + parm = DECL_CHAIN (parm))
>> + ret |= add_sra_candidate (parm);
>> +
>> + /* Collect VARs on returns. */
>> + if (DECL_RESULT (current_function_decl))
>> + {
>> + edge_iterator ei;
>> + edge e;
>> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> + if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
>> + {
>> + tree val = gimple_return_retval (r);
>> + if (val && VAR_P (val))
>> + ret |= add_sra_candidate (val);
>> + }
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +/* Now, only scalarize the parms only with reading
>> + or returns only with writing. */
>> +bool
>> +check_access_vec (tree const &base, auto_vec<access_p> const &access_vec,
>> + auto_vec<tree> *unqualify_vec)
>> +{
>> + bool read = false;
>> + bool write = false;
>> + for (unsigned int j = 0; j < access_vec.length (); j++)
>> + {
>> + struct access *access = access_vec[j];
>> + if (access->writing)
>> + write = true;
>> + else
>> + read = true;
>> +
>> + if (write && read)
>> + break;
>> + }
>> + if ((write && read) || (!write && !read))
>> + unqualify_vec->safe_push (base);
>> +
>> + return true;
>> +}
>> +
>> +/* Analyze all the accesses, remove those inprofitable candidates.
>> + And build the expr->access map. */
>> +static void
>> +analyze_accesses ()
>> +{
>> + auto_vec<tree> unqualify_vec;
>> + base_access_vec->traverse<auto_vec<tree> *, check_access_vec> (
>> + &unqualify_vec);
>> +
>> + tree base;
>> + unsigned i;
>> + FOR_EACH_VEC_ELT (unqualify_vec, i, base)
>> + disqualify_candidate (base);
>> +}
>> +
>> +static void
>> +prepare_expander_sra ()
>> +{
>> + if (optimize <= 0)
>> + return;
>> +
>> + base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>> + expr_access_vec = new hash_map<tree, access_p>;
>> +
>> + if (collect_light_sra_candidates ())
>> + {
>> + scan_function ();
>> + analyze_accesses ();
>> + }
>> +}
>> +
>> +static void
>> +free_expander_sra ()
>> +{
>> + if (optimize <= 0 || !expr_access_vec)
>> + return;
>> + delete expr_access_vec;
>> + expr_access_vec = 0;
>> + delete base_access_vec;
>> + base_access_vec = 0;
>> +}
>> +} /* namespace */
>> +
>> +/* Check If there is an sra access for the expr.
>> + Return the correspond scalar sym for the access. */
>> +rtx
>> +get_scalar_rtx_for_aggregate_expr (tree expr)
>> +{
>> + if (!expr_access_vec)
>> + return NULL_RTX;
>> + access_p *access = expr_access_vec->get (expr);
>> + return access ? (*access)->rtx_val : NULL_RTX;
>> +}
>> +
>> +extern rtx
>> +expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int);
>> +
>> +/* Compute/Set RTX registers for those accesses on BASE. */
>> +void
>> +set_scalar_rtx_for_aggregate_access (tree base, rtx regs)
>> +{
>> + if (!base_access_vec)
>> + return;
>> + vec<access_p> *access_vec = get_base_access_vector (base);
>> + if (!access_vec)
>> + return;
>> +
>> + /* Go through each access, compute corresponding rtx(regs or subregs)
>> + for the expression. */
>> + int n = access_vec->length ();
>> + int cur_access_index = 0;
>> + for (; cur_access_index < n; cur_access_index++)
>> + {
>> + access_p acc = (*access_vec)[cur_access_index];
>> + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
>> + /* non BLK in mult registers*/
>> + if (expr_mode != BLKmode
>> + && known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
>> + break;
>> +
>> + int start_index = -1;
>> + int end_index = -1;
>> + HOST_WIDE_INT left_margin_bits = 0;
>> + HOST_WIDE_INT right_margin_bits = 0;
>> + int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
>> + for (; cur_index < XVECLEN (regs, 0); cur_index++)
>> + {
>> + rtx slot = XVECEXP (regs, 0, cur_index);
>> + HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
>> + HOST_WIDE_INT size
>> + = GET_MODE_BITSIZE (GET_MODE (XEXP (slot, 0))).to_constant ();
>> + if (off <= acc->offset && off + size > acc->offset)
>> + {
>> + start_index = cur_index;
>> + left_margin_bits = acc->offset - off;
>> + }
>> + if (off + size >= acc->offset + acc->size)
>> + {
>> + end_index = cur_index;
>> + right_margin_bits = off + size - (acc->offset + acc->size);
>> + break;
>> + }
>> + }
>> + /* accessing pading and outof bound. */
>> + if (start_index < 0 || end_index < 0)
>> + break;
>> +
>> + /* Need a parallel for possible multi-registers. */
>> + if (expr_mode == BLKmode || end_index > start_index)
>> + {
>> + /* Can not support start from middle of a register. */
>> + if (left_margin_bits != 0)
>> + break;
>> +
>> + int len = end_index - start_index + 1;
>> + const int margin = 3; /* more space for SI, HI, QI. */
>> + rtx *tmps = XALLOCAVEC (rtx, len + (right_margin_bits ? margin : 0));
>> +
>> + HOST_WIDE_INT start_off
>> + = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
>> + int pos = 0;
>> + for (; pos < len - (right_margin_bits ? 1 : 0); pos++)
>> + {
>> + int index = start_index + pos;
>> + rtx orig_reg = XEXP (XVECEXP (regs, 0, index), 0);
>> + machine_mode mode = GET_MODE (orig_reg);
>> + rtx reg = NULL_RTX;
>> + if (HARD_REGISTER_P (orig_reg))
>> + {
>> + /* Reading from param hard reg need to be moved to a temp. */
>> + gcc_assert (!acc->writing);
>> + reg = gen_reg_rtx (mode);
>> + emit_move_insn (reg, orig_reg);
>> + }
>> + else
>> + reg = orig_reg;
>> +
>> + HOST_WIDE_INT off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1));
>> + tmps[pos]
>> + = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off - start_off));
>> + }
>> +
>> + /* There are some fields are in part of registers. */
>> + if (right_margin_bits != 0)
>> + {
>> + if (acc->writing)
>> + break;
>> +
>> + gcc_assert ((right_margin_bits % BITS_PER_UNIT) == 0);
>> + HOST_WIDE_INT off_byte
>> + = UINTVAL (XEXP (XVECEXP (regs, 0, end_index), 1)) - start_off;
>> + rtx orig_reg = XEXP (XVECEXP (regs, 0, end_index), 0);
>> + machine_mode orig_mode = GET_MODE (orig_reg);
>> + gcc_assert (GET_MODE_CLASS (orig_mode) == MODE_INT);
>> +
>> + machine_mode mode_aux[] = {SImode, HImode, QImode};
>> + HOST_WIDE_INT reg_size
>> + = GET_MODE_BITSIZE (orig_mode).to_constant ();
>> + HOST_WIDE_INT off_bits = 0;
>> + for (unsigned long j = 0;
>> + j < sizeof (mode_aux) / sizeof (mode_aux[0]); j++)
>> + {
>> + HOST_WIDE_INT submode_bitsize
>> + = GET_MODE_BITSIZE (mode_aux[j]).to_constant ();
>> + if (reg_size - right_margin_bits - off_bits
>> + >= submode_bitsize)
>> + {
>> + rtx reg = gen_reg_rtx (orig_mode);
>> + emit_move_insn (reg, orig_reg);
>> +
>> + poly_uint64 lowpart_off
>> + = subreg_lowpart_offset (mode_aux[j], orig_mode);
>> + int lowpart_off_bits
>> + = lowpart_off.to_constant () * BITS_PER_UNIT;
>> + int shift_bits = lowpart_off_bits >= off_bits
>> + ? (lowpart_off_bits - off_bits)
>> + : (off_bits - lowpart_off_bits);
>> + if (shift_bits > 0)
>> + reg = expand_shift (RSHIFT_EXPR, orig_mode, reg,
>> + shift_bits, NULL, 1);
>> + rtx subreg = gen_lowpart (mode_aux[j], reg);
>> + rtx off = GEN_INT (off_byte);
>> + tmps[pos++]
>> + = gen_rtx_EXPR_LIST (mode_aux[j], subreg, off);
>> + off_byte += submode_bitsize / BITS_PER_UNIT;
>> + off_bits += submode_bitsize;
>> + }
>> + }
>> + }
>> +
>> + /* Currently, PARALLELs with register elements for param/returns
>> + are using BLKmode. */
>> + acc->rtx_val = gen_rtx_PARALLEL (TYPE_MODE (TREE_TYPE (acc->expr)),
>> + gen_rtvec_v (pos, tmps));
>> + continue;
>> + }
>> +
>> + /* The access corresponds to one reg. */
>> + if (end_index == start_index && left_margin_bits == 0
>> + && right_margin_bits == 0)
>> + {
>> + rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
>> + rtx reg = NULL_RTX;
>> + if (HARD_REGISTER_P (orig_reg))
>> + {
>> + /* Reading from param hard reg need to be moved to a temp. */
>> + gcc_assert (!acc->writing);
>> + reg = gen_reg_rtx (GET_MODE (orig_reg));
>> + emit_move_insn (reg, orig_reg);
>> + }
>> + else
>> + reg = orig_reg;
>> + if (GET_MODE (orig_reg) != expr_mode)
>> + reg = gen_lowpart (expr_mode, reg);
>> +
>> + acc->rtx_val = reg;
>> + continue;
>> + }
>> +
>> + /* It is accessing a filed which is part of a register. */
>> + scalar_int_mode imode;
>> + if (!acc->writing && end_index == start_index
>> + && int_mode_for_size (acc->size, 1).exists (&imode))
>> + {
>> + /* get and copy original register inside the param. */
>> + rtx orig_reg = XEXP (XVECEXP (regs, 0, start_index), 0);
>> + machine_mode mode = GET_MODE (orig_reg);
>> + gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
>> + rtx reg = gen_reg_rtx (mode);
>> + emit_move_insn (reg, orig_reg);
>> +
>> + /* shift to expect part. */
>> + poly_uint64 lowpart_off = subreg_lowpart_offset (imode, mode);
>> + int lowpart_off_bits = lowpart_off.to_constant () * BITS_PER_UNIT;
>> + int shift_bits = lowpart_off_bits >= left_margin_bits
>> + ? (lowpart_off_bits - left_margin_bits)
>> + : (left_margin_bits - lowpart_off_bits);
>> + if (shift_bits > 0)
>> + reg = expand_shift (RSHIFT_EXPR, mode, reg, shift_bits, NULL, 1);
>> +
>> + /* move corresond part subreg to result. */
>> + rtx subreg = gen_lowpart (imode, reg);
>> + rtx result = gen_reg_rtx (imode);
>> + emit_move_insn (result, subreg);
>> +
>> + if (expr_mode != imode)
>> + result = gen_lowpart (expr_mode, result);
>> +
>> + acc->rtx_val = result;
>> + continue;
>> + }
>> +
>> + break;
>> + }
>> +
>> + /* Some access expr(s) are not scalarized. */
>> + if (cur_access_index != n)
>> + disqualify_candidate (base);
>> + else
>> + {
>> + /* Add elements to expr->access map. */
>> + for (int j = 0; j < n; j++)
>> + {
>> + access_p access = (*access_vec)[j];
>> + expr_access_vec->put (access->expr, access);
>> + }
>> + }
>> +}
>> +
>> +void
>> +set_scalar_rtx_for_returns ()
>> +{
>> + tree res = DECL_RESULT (current_function_decl);
>> + gcc_assert (res);
>> + edge_iterator ei;
>> + edge e;
>> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> + if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
>> + {
>> + tree val = gimple_return_retval (r);
>> + if (val && VAR_P (val))
>> + set_scalar_rtx_for_aggregate_access (val, DECL_RTL (res));
>> + }
>> +}
>> +
>> /* Return an expression tree corresponding to the RHS of GIMPLE
>> statement STMT. */
>>
>> @@ -3778,7 +4336,8 @@ expand_return (tree retval)
>>
>> /* If we are returning the RESULT_DECL, then the value has already
>> been stored into it, so we don't have to do anything special. */
>> - if (TREE_CODE (retval_rhs) == RESULT_DECL)
>> + if (TREE_CODE (retval_rhs) == RESULT_DECL
>> + || get_scalar_rtx_for_aggregate_expr (retval_rhs))
>> expand_value_return (result_rtl);
>>
>> /* If the result is an aggregate that is being returned in one (or more)
>> @@ -4422,6 +4981,9 @@ expand_debug_expr (tree exp)
>> int unsignedp = TYPE_UNSIGNED (TREE_TYPE (exp));
>> addr_space_t as;
>> scalar_int_mode op0_mode, op1_mode, addr_mode;
>> + rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>> + if (x)
>> + return NULL_RTX;/* optimized out. */
>>
>> switch (TREE_CODE_CLASS (TREE_CODE (exp)))
>> {
>> @@ -6630,6 +7192,8 @@ pass_expand::execute (function *fun)
>> avoid_deep_ter_for_debug (gsi_stmt (gsi), 0);
>> }
>>
>> + prepare_expander_sra ();
>> +
>> /* Mark arrays indexed with non-constant indices with TREE_ADDRESSABLE. */
>> auto_bitmap forced_stack_vars;
>> discover_nonconstant_array_refs (forced_stack_vars);
>> @@ -7062,6 +7626,7 @@ pass_expand::execute (function *fun)
>> loop_optimizer_finalize ();
>> }
>>
>> + free_expander_sra ();
>> timevar_pop (TV_POST_EXPAND);
>>
>> return 0;
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 56b51876f80..b970f98e689 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -100,6 +100,7 @@ static void do_tablejump (rtx, machine_mode, rtx, rtx, rtx,
>> static rtx const_vector_from_tree (tree);
>> static tree tree_expr_size (const_tree);
>> static void convert_mode_scalar (rtx, rtx, int);
>> +rtx get_scalar_rtx_for_aggregate_expr (tree);
>>
>> \f
>> /* This is run to set up which modes can be used
>> @@ -5623,11 +5624,12 @@ expand_assignment (tree to, tree from, bool nontemporal)
>> Assignment of an array element at a constant index, and assignment of
>> an array element in an unaligned packed structure field, has the same
>> problem. Same for (partially) storing into a non-memory object. */
>> - if (handled_component_p (to)
>> - || (TREE_CODE (to) == MEM_REF
>> - && (REF_REVERSE_STORAGE_ORDER (to)
>> - || mem_ref_refers_to_non_mem_p (to)))
>> - || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
>> + if (!get_scalar_rtx_for_aggregate_expr (to)
>> + && (handled_component_p (to)
>> + || (TREE_CODE (to) == MEM_REF
>> + && (REF_REVERSE_STORAGE_ORDER (to)
>> + || mem_ref_refers_to_non_mem_p (to)))
>> + || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE))
>> {
>> machine_mode mode1;
>> poly_int64 bitsize, bitpos;
>> @@ -8995,6 +8997,9 @@ expand_expr_real (tree exp, rtx target, machine_mode tmode,
>> ret = CONST0_RTX (tmode);
>> return ret ? ret : const0_rtx;
>> }
>> + rtx x = get_scalar_rtx_for_aggregate_expr (exp);
>> + if (x)
>> + return x;
>>
>> ret = expand_expr_real_1 (exp, target, tmode, modifier, alt_rtl,
>> inner_reference_p);
>> diff --git a/gcc/function.cc b/gcc/function.cc
>> index 82102ed78d7..262d3f17e72 100644
>> --- a/gcc/function.cc
>> +++ b/gcc/function.cc
>> @@ -2742,6 +2742,9 @@ assign_parm_find_stack_rtl (tree parm, struct assign_parm_data_one *data)
>> data->stack_parm = stack_parm;
>> }
>>
>> +extern void
>> +set_scalar_rtx_for_aggregate_access (tree, rtx);
>> +
>> /* A subroutine of assign_parms. Adjust DATA->ENTRY_RTL such that it's
>> always valid and contiguous. */
>>
>> @@ -3117,8 +3120,21 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>> emit_move_insn (mem, entry_parm);
>> }
>> else
>> - move_block_from_reg (REGNO (entry_parm), mem,
>> - size_stored / UNITS_PER_WORD);
>> + {
>> + int regno = REGNO (entry_parm);
>> + int nregs = size_stored / UNITS_PER_WORD;
>> + move_block_from_reg (regno, mem, nregs);
>> +
>> + rtx *tmps = XALLOCAVEC (rtx, nregs);
>> + machine_mode mode = word_mode;
>> + for (int i = 0; i < nregs; i++)
>> + tmps[i] = gen_rtx_EXPR_LIST (
>> + VOIDmode, gen_rtx_REG (mode, regno + i),
>> + GEN_INT (GET_MODE_SIZE (mode).to_constant () * i));
>> +
>> + rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
>> + set_scalar_rtx_for_aggregate_access (parm, regs);
>> + }
>> }
>> else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
>> {
>> @@ -3718,6 +3734,10 @@ assign_parms (tree fndecl)
>> else
>> set_decl_incoming_rtl (parm, data.entry_parm, false);
>>
>> + rtx incoming = DECL_INCOMING_RTL (parm);
>> + if (GET_CODE (incoming) == PARALLEL)
>> + set_scalar_rtx_for_aggregate_access (parm, incoming);
>> +
>> assign_parm_adjust_stack_rtl (&data);
>>
>> if (assign_parm_setup_block_p (&data))
>> @@ -5037,6 +5057,7 @@ stack_protect_epilogue (void)
>> the function's parameters, which must be run at any return statement. */
>>
>> bool currently_expanding_function_start;
>> +extern void set_scalar_rtx_for_returns ();
>> void
>> expand_function_start (tree subr)
>> {
>> @@ -5138,6 +5159,7 @@ expand_function_start (tree subr)
>> {
>> gcc_assert (GET_CODE (hard_reg) == PARALLEL);
>> set_parm_rtl (res, gen_group_rtx (hard_reg));
>> + set_scalar_rtx_for_returns ();
>> }
>> }
>>
>> diff --git a/gcc/opts.cc b/gcc/opts.cc
>> index 86b94d62b58..5e129a1cc49 100644
>> --- a/gcc/opts.cc
>> +++ b/gcc/opts.cc
>> @@ -1559,6 +1559,10 @@ public:
>> vec<const char *> m_values;
>> };
>>
>> +#ifdef __GNUC__
>> +#pragma GCC diagnostic push
>> +#pragma GCC diagnostic ignored "-Wformat-truncation"
>> +#endif
>> /* Print help for a specific front-end, etc. */
>> static void
>> print_filtered_help (unsigned int include_flags,
>> @@ -1913,7 +1917,9 @@ print_filtered_help (unsigned int include_flags,
>> printf ("\n\n");
>> }
>> }
>> -
>> +#ifdef __GNUC__
>> +#pragma GCC diagnostic pop
>> +#endif
>> /* Display help for a specified type of option.
>> The options must have ALL of the INCLUDE_FLAGS set
>> ANY of the flags in the ANY_FLAGS set
>> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> index 769585052b5..c8995cae707 100644
>> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
>> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
>> @@ -5,7 +5,7 @@
>> // Test that a zero-width bit field in an otherwise homogeneous aggregate
>> // generates a psabi warning and passes arguments in GPRs.
>>
>> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
>> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>>
>> struct a_thing
>> {
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> new file mode 100644
>> index 00000000000..7dd1a4a326a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
>> @@ -0,0 +1,29 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -save-temps" } */
>> +
>> +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF;
>> +typedef struct SF {float a[4]; int i1; int i2; } SF;
>> +
>> +/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
>> +short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;}
>> +int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;}
>> +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;}
>> +float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;}
>> +float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;}
>> +
>> +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
>> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
>> +
>> +int main()
>> +{
>> + if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
>> + && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
>> + __builtin_abort ();
>> + if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
>> + && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
>> + __builtin_abort ();
>> + return 0;
>> +}
>> +
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>> new file mode 100644
>> index 00000000000..4e1f87f7939
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>> @@ -0,0 +1,6 @@
>> +/* PR target/65421 */
>> +/* { dg-options "-O2" } */
>> +
>> +typedef struct LARGE {double a[4]; int arr[32];} LARGE;
>> +LARGE foo (LARGE a){return a;}
>> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>> new file mode 100644
>> index 00000000000..8a8e1a0e996
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>> @@ -0,0 +1,32 @@
>> +/* PR target/65421 */
>> +/* { dg-options "-O2" } */
>> +/* { dg-require-effective-target powerpc_elfv2 } */
>> +/* { dg-require-effective-target has_arch_ppc64 } */
>> +
>> +typedef struct FLOATS
>> +{
>> + double a[3];
>> +} FLOATS;
>> +
>> +/* 3 lfd after returns also optimized */
>> +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
>> +
>> +/* 3 stfd */
>> +void st_arg (FLOATS a, FLOATS *p) {*p = a;}
>> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
>> +
>> +/* blr */
>> +FLOATS ret_arg (FLOATS a) {return a;}
>> +
>> +typedef struct MIX
>> +{
>> + double a[2];
>> + long l;
>> +} MIX;
>> +
>> +/* std 3 param regs to return slot */
>> +MIX ret_arg1 (MIX a) {return a;}
>> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
>> +
>> +/* count insns */
>> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
next prev parent reply other threads:[~2023-06-01 15:27 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-29 3:52 Jiufu Guo
2023-05-30 6:46 ` Richard Biener
2023-06-01 15:27 ` Martin Jambor [this message]
2023-06-02 3:50 ` Jiufu Guo
2023-07-13 13:26 ` Jiufu Guo
2023-07-24 5:29 ` Jiufu Guo
2023-07-31 9:06 ` Richard Biener
2023-08-01 6:55 ` Jiufu Guo
2023-08-01 12:40 ` Jiufu Guo
2023-08-02 12:41 ` Richard Biener
2023-08-02 16:11 ` guojiufu
2023-08-03 6:22 ` Jiufu Guo
2023-08-03 11:31 ` Richard Biener
2023-08-05 12:14 ` Jiufu Guo
2023-08-07 7:11 ` Richard Biener
2023-06-02 3:56 ` Jiufu Guo
2023-05-30 6:54 ` Andrew Pinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ri6cz2f8aop.fsf@suse.cz \
--to=mjambor@suse.cz \
--cc=dje.gcc@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=guojiufu@linux.ibm.com \
--cc=jeffreyalaw@gmail.com \
--cc=linkw@gcc.gnu.org \
--cc=rguenther@suse.de \
--cc=segher@kernel.crashing.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).