From: Jiufu Guo
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com,
	steven@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com,
	linkw@gcc.gnu.org, bergner@linux.ibm.com
Subject: Ping: [PATCH V2] introduce light expander sra
References: <20231027015036.3868319-1-guojiufu@linux.ibm.com>
Date: Fri, 17 Nov 2023 16:21:20 +0800
In-Reply-To: <20231027015036.3868319-1-guojiufu@linux.ibm.com> (Jiufu Guo's message of "Fri, 27 Oct 2023 09:50:36 +0800")
Content-Type: text/plain

Hi,

I would like to ping this patch.

There are a few aspects (TODOs) of this patch that can still be
improved.  I gathered some statistics on how often they occur, by
checking the gcc source code (including the test suite) and SPEC2017.

- Reverse storage order.
  This occurs only in the very few tests that use the attribute.
- Writing to parameters.
  Only ~12 hits in the gcc source code (plus some hits in one Go file).
- Overlapping accesses to parameters.
  Overlapping reads are already supported.  Writes that overlap other
  accesses are not very common, because writing to a parameter is rare
  in the first place.
- Extracting bit-fields from a parameter passed in multiple registers.
  This occurs only twice in the gcc code (plus one hit in the Go file
  h2_bundle.go), in 6 files of the test suite, and once in SPEC2017.

I'm thinking of enhancing the patch incrementally.
Thanks in advance for your comments.

BR,
Jeff (Jiufu Guo)

Jiufu Guo writes:

> Hi,
>
> Compared with the previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html
> this version supports TI/VEC modes for the accesses.
>
> There are a few PRs (meta-bug PR101926) on various targets.
> Their root causes are similar: the aggregate params/returns are
> passed in multiple registers, but they are first stored from those
> registers to the stack, and the parameter is then accessed through
> the stack slot.
>
> A general idea to enhance this: access the aggregate
> parameters/returns directly through the incoming/outgoing
> scalar registers.  This is essentially a kind of SRA.
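>
> For reference, a minimal example of the kind of code this targets,
> adapted from the pr65421-2.c test added below (the codegen notes
> assume an ELFv2 powerpc64le target, where this homogeneous aggregate
> is passed and returned in floating-point registers):
>
>   typedef struct FLOATS
>   {
>     double a[3];
>   } FLOATS;
>
>   /* Without expander SRA the incoming registers are spilled to a
>      stack slot and reloaded; with it, this should be just "blr".  */
>   FLOATS ret_arg (FLOATS a) { return a; }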
>
> This experimental patch for light-expander-sra contains the
> following parts:
>
> a. Check whether the parameters/returns are suitable and profitable
>    to scalarize, and set the incoming/outgoing registers (pseudos)
>    for the parameter/return.
>    - This is done in "expand_function_start", after the
>      incoming/outgoing hard registers are determined for the
>      parameter/return.
>      The scalarized registers are recorded in DECL_RTL for the
>      parameter/return in parallel form.
>    - At the time DECL_RTL is set, "scalarizable_aggregate" is called
>      to check whether the accesses are suitable and profitable to
>      scalarize.
>      We can continue to enhance this function to support more cases,
>      for example:
>      - 'reverse storage order'.
>      - 'writing to parameter'/'overlapping accesses'.
>
> b. When expanding an access to a parameter/return, the scalar
>    registers (pseudos) for the access are figured out from the
>    access info (e.g. bitpos, bitsize, mode).  This may happen when
>    expanding accesses such as:
>    - A component access of a parameter: "_1 = arg.f1",
>      or a whole-parameter access: the rhs of "_2 = arg".
>    - An assignment to a return value: "D.xx = yy;" or "D.xx.f = zz;"
>      where D.xx occurs on a return stmt.
>    - This is mainly done in expr.cc (expand_expr_real_1 and
>      expand_assignment).  The function "extract_sub_member" is used
>      to figure out the scalar rtxs (pseudos).
>
> Besides the above two parts, some work is done on the GIMPLE side:
> collecting SRA candidates for parameters/returns and collecting the
> SRA access info.  This is mainly done at the beginning of the
> expander pass.  Below are the two major items of this part.
> - Collect light-expander-sra candidates.
>   Each parameter is checked for a suitable aggregate type.  The
>   return value (VAR_P) is collected from each return stmt if the
>   function returns via registers.
>   This is implemented in expand_sra::collect_sra_candidates.
>
> - Build/collect/manage all the accesses to the candidates.
>   The function "scan_function" does this work: it walks all basic
>   blocks, and all interesting stmts (phi, return, assign, call, asm)
>   are checked.
>   If there is an interesting expression (e.g. COMPONENT_REF or
>   PARM_DECL), the required info for the access (e.g. pos, size,
>   type, base) is recorded.
>   If it is risky to do SRA, the candidate may be removed, e.g. when
>   the address is taken and the object is accessed via memory:
>   "foo (struct S arg) { bar (&arg); }"
>
> This patch is tested on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
> PR target/65421
>
> gcc/ChangeLog:
>
> * cfgexpand.cc (struct access): New class.
> (struct expand_sra): New class.
> (expand_sra::collect_sra_candidates): New member function.
> (expand_sra::add_sra_candidate): Likewise.
> (expand_sra::build_access): Likewise.
> (expand_sra::analyze_phi): Likewise.
> (expand_sra::analyze_assign): Likewise.
> (expand_sra::visit_base): Likewise.
> (expand_sra::protect_mem_access_in_stmt): Likewise.
> (expand_sra::expand_sra): Class constructor.
> (expand_sra::~expand_sra): Class destructor.
> (expand_sra::scalarizable_access): New member function.
> (expand_sra::scalarizable_accesses): Likewise.
> (scalarizable_aggregate): New function.
> (set_scalar_rtx_for_returns): New function.
> (expand_value_return): Updated.
> (expand_debug_expr): Updated.
> (pass_expand::execute): Updated to use expand_sra.
> * cfgexpand.h (scalarizable_aggregate): New declaration.
> (set_scalar_rtx_for_returns): New declaration.
> * expr.cc (expand_assignment): Updated.
> (expand_constructor): Updated.
> (query_position_in_parallel): New function. > (extract_sub_member): New function. > (expand_expr_real_1): Updated. > * expr.h (query_position_in_parallel): New declare. > * function.cc (assign_parm_setup_block): Updated. > (assign_parms): Updated. > (expand_function_start): Updated. > * tree-sra.h (struct sra_base_access): New class. > (struct sra_default_analyzer): New class. > (scan_function): New template function. > * var-tracking.cc (track_loc_p): Updated. > > gcc/testsuite/ChangeLog: > > * g++.target/powerpc/pr102024.C: Updated > * gcc.target/powerpc/pr108073.c: New test. > * gcc.target/powerpc/pr65421-1.c: New test. > * gcc.target/powerpc/pr65421-2.c: New test. > > --- > gcc/cfgexpand.cc | 352 ++++++++++++++++++- > gcc/cfgexpand.h | 2 + > gcc/expr.cc | 179 +++++++++- > gcc/expr.h | 3 + > gcc/function.cc | 36 +- > gcc/tree-sra.h | 76 ++++ > gcc/var-tracking.cc | 3 +- > gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- > gcc/testsuite/gcc.target/i386/pr20020-2.c | 5 + > gcc/testsuite/gcc.target/powerpc/pr108073.c | 29 ++ > gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 + > gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 32 ++ > 12 files changed, 718 insertions(+), 7 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c > > diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > index 4262703a138..ef99ca8ac13 100644 > --- a/gcc/cfgexpand.cc > +++ b/gcc/cfgexpand.cc > @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. If not see > #include "output.h" > #include "builtins.h" > #include "opts.h" > +#include "tree-sra.h" > > /* Some systems use __main in a way incompatible with its use in gcc, in these > cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to > @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool); > > static void record_alignment_for_reg_var (unsigned int); > > +/* For light SRA in expander about paramaters and returns. */ > +struct access : public sra_base_access > +{ > +}; > + > +typedef struct access *access_p; > + > +struct expand_sra : public sra_default_analyzer > +{ > + /* Construct/destruct resources, e.g. sra candidates. */ > + expand_sra (); > + ~expand_sra (); > + > + /* No actions for pre_analyze_stmt, analyze_return. */ > + > + /* Overwrite phi,call,asm analyzations. */ > + void analyze_phi (gphi *s); > + > + /* TODO: Check accesses on call/asm. */ > + void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); }; > + void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); }; > + > + /* Check access of SRA on assignment. */ > + void analyze_assign (gassign *); > + > + /* Check if the accesses of BASE(parameter or return) are > + scalarizable, according to the incoming/outgoing REGS. */ > + bool scalarizable_accesses (tree base, rtx regs); > + > +private: > + /* Collect the parameter and returns to check if they are suitable for > + scalarization. */ > + bool collect_sra_candidates (void); > + > + /* Return true if VAR is added as a candidate for SRA. */ > + bool add_sra_candidate (tree var); > + > + /* Return true if EXPR has interesting sra access, and created access, > + return false otherwise. */ > + access_p build_access (tree expr, bool write); > + > + /* Check if the access ACC is scalarizable. REGS is the incoming/outgoing > + registers which the access is based on. 
*/ > + bool scalarizable_access (access_p acc, rtx regs, bool is_parm); > + > + /* If there is risk (stored/loaded or addr taken), > + disqualify the sra candidates in the un-interesting STMT. */ > + void protect_mem_access_in_stmt (gimple *stmt); > + > + /* Callback of walk_stmt_load_store_addr_ops, used to remove > + unscalarizable accesses. */ > + static bool visit_base (gimple *, tree op, tree, void *data); > + > + /* Base (tree) -> Vector (vec *) map. */ > + hash_map > *base_access_vec; > +}; > + > +bool > +expand_sra::collect_sra_candidates (void) > +{ > + bool ret = false; > + > + /* Collect parameters. */ > + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; > + parm = DECL_CHAIN (parm)) > + ret |= add_sra_candidate (parm); > + > + /* Collect VARs on returns. */ > + if (DECL_RESULT (current_function_decl)) > + { > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) > + { > + tree val = gimple_return_retval (r); > + /* To sclaraized the return, the return value should be only > + writen (except this return stmt). Using 'true(write)' to > + pretend the access only be 'writen'. */ > + if (val && VAR_P (val)) > + ret |= add_sra_candidate (val) && build_access (val, true); > + } > + } > + > + return ret; > +} > + > +bool > +expand_sra::add_sra_candidate (tree var) > +{ > + tree type = TREE_TYPE (var); > + > + if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type)) > + || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var) > + || is_va_list_type (type)) > + return false; > + > + base_access_vec->get_or_insert (var); > + > + return true; > +} > + > +access_p > +expand_sra::build_access (tree expr, bool write) > +{ > + enum tree_code code = TREE_CODE (expr); > + if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF > + && code != ARRAY_REF && code != ARRAY_RANGE_REF) > + return NULL; > + > + HOST_WIDE_INT offset, size; > + bool reverse; > + tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse); > + if (!base || !DECL_P (base)) > + return NULL; > + > + vec *access_vec = base_access_vec->get (base); > + if (!access_vec) > + return NULL; > + > + /* TODO: support reverse. */ > + if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base))) > + { > + base_access_vec->remove (base); > + return NULL; > + } > + > + struct access *access = XNEWVEC (struct access, 1); > + > + memset (access, 0, sizeof (struct access)); > + access->offset = offset; > + access->size = size; > + access->expr = expr; > + access->write = write; > + access->reverse = reverse; > + > + access_vec->safe_push (access); > + > + return access; > +} > + > +/* Function protect_mem_access_in_stmt removes the SRA candidates if > + there is addr-taken on the candidate in the STMT. 
*/ > + > +void > +expand_sra::analyze_phi (gphi *stmt) > +{ > + if (base_access_vec && !base_access_vec->is_empty ()) > + walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base); > +} > + > +void > +expand_sra::analyze_assign (gassign *stmt) > +{ > + if (!base_access_vec || base_access_vec->is_empty ()) > + return; > + > + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt)) > + { > + tree rhs = gimple_assign_rhs1 (stmt); > + tree lhs = gimple_assign_lhs (stmt); > + bool res_r = build_access (rhs, false); > + bool res_l = build_access (lhs, true); > + > + if (res_l || res_r) > + return; > + } > + > + protect_mem_access_in_stmt (stmt); > +} > + > +/* Callback of walk_stmt_load_store_addr_ops, used to remove > + unscalarizable accesses. Called by protect_mem_access_in_stmt. */ > + > +bool > +expand_sra::visit_base (gimple *, tree op, tree, void *data) > +{ > + op = get_base_address (op); > + if (op && DECL_P (op)) > + { > + expand_sra *p = (expand_sra *) data; > + p->base_access_vec->remove (op); > + } > + return false; > +} > + > +/* Function protect_mem_access_in_stmt removes the SRA candidates if > + there is store/load/addr-taken on the candidate in the STMT. > + > + For some statements, which SRA does not care about, if there are > + possible memory operation on the SRA candidates, it would be risky > + to scalarize it. */ > + > +void > +expand_sra::protect_mem_access_in_stmt (gimple *stmt) > +{ > + if (base_access_vec && !base_access_vec->is_empty ()) > + walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base, > + visit_base); > +} > + > +expand_sra::expand_sra () : base_access_vec (NULL) > +{ > + if (optimize <= 0) > + return; > + > + base_access_vec = new hash_map >; > + collect_sra_candidates (); > +} > + > +expand_sra::~expand_sra () > +{ > + if (optimize <= 0) > + return; > + > + delete base_access_vec; > +} > + > +bool > +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm) > +{ > + /* Now only support reading from parms > + or writing to returns. */ > + if (is_parm && acc->write) > + return false; > + if (!is_parm && !acc->write) > + return false; > + > + /* Compute the position of the access in the parallel regs. */ > + int start_index = -1; > + int end_index = -1; > + HOST_WIDE_INT left_bits = 0; > + HOST_WIDE_INT right_bits = 0; > + query_position_in_parallel (acc->offset, acc->size, regs, start_index, > + end_index, left_bits, right_bits); > + > + /* Invalid access possition: padding or outof bound. */ > + if (start_index < 0 || end_index < 0) > + return false; > + > + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr)); > + /* Need multi-registers in a parallel for the access. */ > + if (expr_mode == BLKmode || end_index > start_index) > + { > + if (left_bits || right_bits) > + return false; > + if (expr_mode == BLKmode) > + return true; > + > + /* For large modes, only support TI/VECTOR in mult-registers. */ > + if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode))) > + return expr_mode == TImode || VECTOR_MODE_P (expr_mode); > + return true; > + } > + > + gcc_assert (end_index == start_index); > + > + /* Just need one reg for the access. */ > + if (left_bits == 0 && right_bits == 0) > + return true; > + > + scalar_int_mode imode; > + /* Need to extract bits from the reg for the access. */ > + return !acc->write && int_mode_for_mode (expr_mode).exists (&imode); > +} > + > +/* Now, the base (parm/return) is scalarizable, only if all > + accesses of the BASE are scalariable. 
> + > + This function need to be updated, to support more complicate > + cases, like: > + - Some access are scalarizable, but some are not. > + - Access is writing to a parameter. > + - Writing accesses are overlap with multi-accesses. */ > + > +bool > +expand_sra::scalarizable_accesses (tree base, rtx regs) > +{ > + if (!base_access_vec) > + return false; > + vec *access_vec = base_access_vec->get (base); > + if (!access_vec) > + return false; > + if (access_vec->is_empty ()) > + return false; > + > + bool is_parm = TREE_CODE (base) == PARM_DECL; > + int n = access_vec->length (); > + int cur_access_index = 0; > + for (; cur_access_index < n; cur_access_index++) > + if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm)) > + break; > + > + /* It is ok if all access are scalarizable. */ > + if (cur_access_index == n) > + return true; > + > + base_access_vec->remove (base); > + return false; > +} > + > +static expand_sra *current_sra = NULL; > + > +/* Check if the PARAM (or return) is scalarizable. > + > + This interface is used in expand_function_start > + to check sra possiblity for parmeters. */ > + > +bool > +scalarizable_aggregate (tree parm, rtx regs) > +{ > + if (!current_sra) > + return false; > + return current_sra->scalarizable_accesses (parm, regs); > +} > + > +/* Check if interesting returns, and if they are scalarizable, > + set DECL_RTL as scalar registers. > + > + This interface is used in expand_function_start > + when outgoing registers are determinded for DECL_RESULT. */ > + > +void > +set_scalar_rtx_for_returns () > +{ > + rtx res = DECL_RTL (DECL_RESULT (current_function_decl)); > + edge_iterator ei; > + edge e; > + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) > + { > + tree val = gimple_return_retval (r); > + if (val && VAR_P (val) && scalarizable_aggregate (val, res)) > + SET_DECL_RTL (val, res); > + } > +} > + > /* Return an expression tree corresponding to the RHS of GIMPLE > statement STMT. */ > > @@ -3707,7 +4045,7 @@ expand_value_return (rtx val) > > tree decl = DECL_RESULT (current_function_decl); > rtx return_reg = DECL_RTL (decl); > - if (return_reg != val) > + if (!rtx_equal_p (return_reg, val)) > { > tree funtype = TREE_TYPE (current_function_decl); > tree type = TREE_TYPE (decl); > @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp) > addr_space_t as; > scalar_int_mode op0_mode, op1_mode, addr_mode; > > + /* TODO: Enable to debug expand-sra optimized parm/returns. */ > + tree base = get_base_address (exp); > + if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base))) > + && GET_CODE (DECL_RTL (base)) == PARALLEL) > + return NULL_RTX; > + > switch (TREE_CODE_CLASS (TREE_CODE (exp))) > { > case tcc_expression: > @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun) > auto_bitmap forced_stack_vars; > discover_nonconstant_array_refs (forced_stack_vars); > > + /* Enable light-expander-sra. */ > + current_sra = new expand_sra; > + scan_function (cfun, *current_sra); > + > /* Make sure all values used by the optimization passes have sane > defaults. 
*/ > reg_renumber = 0; > @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun) > loop_optimizer_finalize (); > } > > + delete current_sra; > + current_sra = NULL; > timevar_pop (TV_POST_EXPAND); > > return 0; > diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h > index 0e551f6cfd3..3415c217708 100644 > --- a/gcc/cfgexpand.h > +++ b/gcc/cfgexpand.h > @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *); > extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *); > extern void set_parm_rtl (tree, rtx); > > +extern bool scalarizable_aggregate (tree, rtx); > +extern void set_scalar_rtx_for_returns (); > > #endif /* GCC_CFGEXPAND_H */ > diff --git a/gcc/expr.cc b/gcc/expr.cc > index 763bd82c59f..5ba26e0ef52 100644 > --- a/gcc/expr.cc > +++ b/gcc/expr.cc > @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal) > Assignment of an array element at a constant index, and assignment of > an array element in an unaligned packed structure field, has the same > problem. Same for (partially) storing into a non-memory object. */ > - if (handled_component_p (to) > + if ((handled_component_p (to) > + && !(VAR_P (get_base_address (to)) > + && DECL_RTL_SET_P (get_base_address (to)) > + && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL)) > || (TREE_CODE (to) == MEM_REF > && (REF_REVERSE_STORAGE_ORDER (to) > || mem_ref_refers_to_non_mem_p (to))) > @@ -8909,6 +8912,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier, > && ! mostly_zeros_p (exp)) > return NULL_RTX; > > + if (target && GET_CODE (target) == PARALLEL && all_zeros_p (exp)) > + { > + int length = XVECLEN (target, 0); > + int start = XEXP (XVECEXP (target, 0, 0), 0) ? 0 : 1; > + for (int i = start; i < length; i++) > + { > + rtx dst = XEXP (XVECEXP (target, 0, i), 0); > + rtx zero = CONST0_RTX (GET_MODE (dst)); > + emit_move_insn (dst, zero); > + } > + return target; > + } > + > /* Handle calls that pass values in multiple non-contiguous > locations. The Irix 6 ABI has examples of this. */ > if (target == 0 || ! safe_from_p (target, exp, 1) > @@ -10621,6 +10637,157 @@ stmt_is_replaceable_p (gimple *stmt) > return false; > } > > +/* In the parallel rtx register series REGS, compute the position for given > + {BITPOS, BITSIZE}. > + START_INDEX, END_INDEX, LEFT_BITS and RIGHT_BITS are computed outputs. */ > + > +void > +query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize, > + rtx regs, int &start_index, int &end_index, > + HOST_WIDE_INT &left_bits, HOST_WIDE_INT &right_bits) > +{ > + int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 
0 : 1; > + for (; cur_index < XVECLEN (regs, 0); cur_index++) > + { > + rtx slot = XVECEXP (regs, 0, cur_index); > + HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT; > + machine_mode mode = GET_MODE (XEXP (slot, 0)); > + HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant (); > + if (off <= bitpos && off + size > bitpos) > + { > + start_index = cur_index; > + left_bits = bitpos - off; > + } > + if (off + size >= bitpos + bitsize) > + { > + end_index = cur_index; > + right_bits = off + size - (bitpos + bitsize); > + break; > + } > + } > +} > + > +static rtx > +extract_sub_member (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize, > + tree expr) > +{ > + int start_index = -1; > + int end_index = -1; > + HOST_WIDE_INT left_bits = 0; > + HOST_WIDE_INT right_bits = 0; > + query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index, > + left_bits, right_bits); > + > + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr)); > + if (end_index > start_index || expr_mode == BLKmode) > + { > + /* TImode in multi-registers. */ > + if (expr_mode == TImode) > + { > + rtx res = gen_reg_rtx (expr_mode); > + HOST_WIDE_INT start; > + start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1)); > + for (int index = start_index; index <= end_index; index++) > + { > + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); > + machine_mode mode = GET_MODE (reg); > + HOST_WIDE_INT off; > + off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start; > + rtx sub = simplify_gen_subreg (mode, res, expr_mode, off); > + emit_move_insn (sub, reg); > + } > + return res; > + } > + > + /* Vector in multi-registers. */ > + if (VECTOR_MODE_P (expr_mode)) > + { > + rtvec vector = rtvec_alloc (end_index - start_index + 1); > + machine_mode emode; > + emode = GET_MODE (XEXP (XVECEXP (regs, 0, start_index), 0)); > + for (int index = start_index; index <= end_index; index++) > + { > + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); > + gcc_assert (emode == GET_MODE (reg)); > + RTVEC_ELT (vector, index - start_index) = reg; > + } > + scalar_int_mode imode; > + machine_mode vmode; > + int nunits = end_index - start_index + 1; > + if (!(int_mode_for_mode (emode).exists (&imode) > + && mode_for_vector (imode, nunits).exists (&vmode))) > + gcc_unreachable (); > + > + insn_code icode; > + icode = convert_optab_handler (vec_init_optab, vmode, imode); > + rtx res = gen_reg_rtx (vmode); > + emit_insn (GEN_FCN (icode) (res, gen_rtx_PARALLEL (vmode, vector))); > + if (expr_mode == vmode) > + return res; > + return simplify_gen_subreg (expr_mode, res, vmode, 0); > + } > + > + /* Need multi-registers in a parallel for the access. */ > + int num_words = end_index - start_index + 1; > + rtx *tmps = XALLOCAVEC (rtx, num_words); > + > + int pos = 0; > + HOST_WIDE_INT start; > + start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1)); > + /* Extract whole registers. */ > + for (; pos < num_words; pos++) > + { > + int index = start_index + pos; > + rtx reg = XEXP (XVECEXP (regs, 0, index), 0); > + machine_mode mode = GET_MODE (reg); > + HOST_WIDE_INT off; > + off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start; > + tmps[pos] = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off)); > + } > + > + rtx res = gen_rtx_PARALLEL (expr_mode, gen_rtvec_v (pos, tmps)); > + return res; > + } > + > + gcc_assert (end_index == start_index); > + > + /* Just need one reg for the access. 
*/ > + if (left_bits == 0 && right_bits == 0) > + { > + rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0); > + if (GET_MODE (reg) != expr_mode) > + reg = gen_lowpart (expr_mode, reg); > + return reg; > + } > + > + /* Need to extract bitfield part reg for the access. > + left_bits != 0 or right_bits != 0 */ > + rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0); > + bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr)); > + scalar_int_mode imode; > + if (!int_mode_for_mode (expr_mode).exists (&imode)) > + { > + gcc_assert (false); > + return NULL_RTX; > + } > + > + machine_mode mode = GET_MODE (reg); > + bool reverse = false; > + rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode, > + imode, reverse, NULL); > + > + if (GET_MODE (bfld) != imode) > + bfld = gen_lowpart (imode, bfld); > + > + if (expr_mode == imode) > + return bfld; > + > + /* expr_mode != imode, e.g. SF != SI. */ > + rtx result = gen_reg_rtx (imode); > + emit_move_insn (result, bfld); > + return gen_lowpart (expr_mode, result); > +} > + > rtx > expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, > enum expand_modifier modifier, rtx *alt_rtl, > @@ -11498,6 +11665,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, > = expand_expr_real (tem, tem_target, VOIDmode, tem_modifier, NULL, > true); > > + /* It is scalarizable access on param which is passed by registers. */ > + if (GET_CODE (op0) == PARALLEL > + && (TREE_CODE (tem) == PARM_DECL || VAR_P (tem))) > + { > + HOST_WIDE_INT pos, size; > + size = bitsize.to_constant (); > + pos = bitpos.to_constant (); > + return extract_sub_member (op0, pos, size, exp); > + } > + > /* If the field has a mode, we want to access it in the > field's mode, not the computed mode. > If a MEM has VOIDmode (external with incomplete type), > diff --git a/gcc/expr.h b/gcc/expr.h > index 2a172867fdb..8a9332aaad6 100644 > --- a/gcc/expr.h > +++ b/gcc/expr.h > @@ -362,5 +362,8 @@ extern rtx expr_size (tree); > > extern bool mem_ref_refers_to_non_mem_p (tree); > extern bool non_mem_decl_p (tree); > +extern void query_position_in_parallel (HOST_WIDE_INT, HOST_WIDE_INT, rtx, > + int &, int &, HOST_WIDE_INT &, > + HOST_WIDE_INT &); > > #endif /* GCC_EXPR_H */ > diff --git a/gcc/function.cc b/gcc/function.cc > index afb0b33da9e..518250b2728 100644 > --- a/gcc/function.cc > +++ b/gcc/function.cc > @@ -3107,8 +3107,29 @@ assign_parm_setup_block (struct assign_parm_data_all *all, > emit_move_insn (mem, entry_parm); > } > else > - move_block_from_reg (REGNO (entry_parm), mem, > - size_stored / UNITS_PER_WORD); > + { > + int regno = REGNO (entry_parm); > + int nregs = size_stored / UNITS_PER_WORD; > + rtx *tmps = XALLOCAVEC (rtx, nregs); > + machine_mode mode = word_mode; > + HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant (); > + for (int i = 0; i < nregs; i++) > + { > + rtx reg = gen_rtx_REG (mode, regno + i); > + rtx off = GEN_INT (word_size * i); > + tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off); > + } > + > + rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps)); > + if (scalarizable_aggregate (parm, regs)) > + { > + rtx pseudos = gen_group_rtx (regs); > + emit_group_move (pseudos, regs); > + stack_parm = pseudos; > + } > + else > + move_block_from_reg (regno, mem, nregs); > + } > } > else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type)) > { > @@ -3710,7 +3731,15 @@ assign_parms (tree fndecl) > > assign_parm_adjust_stack_rtl (&data); > > - if (assign_parm_setup_block_p (&data)) > + rtx incoming = DECL_INCOMING_RTL (parm); > + if 
(GET_CODE (incoming) == PARALLEL > + && scalarizable_aggregate (parm, incoming)) > + { > + rtx pseudos = gen_group_rtx (incoming); > + emit_group_move (pseudos, incoming); > + set_parm_rtl (parm, pseudos); > + } > + else if (assign_parm_setup_block_p (&data)) > assign_parm_setup_block (&all, parm, &data); > else if (data.arg.pass_by_reference || use_register_for_decl (parm)) > assign_parm_setup_reg (&all, parm, &data); > @@ -5128,6 +5157,7 @@ expand_function_start (tree subr) > { > gcc_assert (GET_CODE (hard_reg) == PARALLEL); > set_parm_rtl (res, gen_group_rtx (hard_reg)); > + set_scalar_rtx_for_returns (); > } > } > > diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h > index f20266c4622..df3071ccf6e 100644 > --- a/gcc/tree-sra.h > +++ b/gcc/tree-sra.h > @@ -19,6 +19,82 @@ You should have received a copy of the GNU General Public License > along with GCC; see the file COPYING3. If not see > . */ > > +struct sra_base_access > +{ > + /* Values returned by get_ref_base_and_extent, indicates the > + OFFSET, SIZE and BASE of the access. */ > + HOST_WIDE_INT offset; > + HOST_WIDE_INT size; > + > + /* The context expression of this access. */ > + tree expr; > + > + /* Indicates this is a write access. */ > + bool write : 1; > + > + /* Indicates if this access is made in reverse storage order. */ > + bool reverse : 1; > +}; > + > +/* Default template for sra_scan_function. */ > + > +struct sra_default_analyzer > +{ > + /* Template analyze functions. */ > + void analyze_phi (gphi *){}; > + void pre_analyze_stmt (gimple *){}; > + void analyze_return (greturn *){}; > + void analyze_assign (gassign *){}; > + void analyze_call (gcall *){}; > + void analyze_asm (gasm *){}; > + void analyze_default_stmt (gimple *){}; > +}; > + > +/* Scan function and look for interesting expressions. */ > + > +template > +void > +scan_function (struct function *fun, analyzer &a) > +{ > + basic_block bb; > + FOR_EACH_BB_FN (bb, fun) > + { > + for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi); > + gsi_next (&gsi)) > + a.analyze_phi (gsi.phi ()); > + > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); > + gsi_next (&gsi)) > + { > + gimple *stmt = gsi_stmt (gsi); > + a.pre_analyze_stmt (stmt); > + > + switch (gimple_code (stmt)) > + { > + case GIMPLE_RETURN: > + a.analyze_return (as_a (stmt)); > + break; > + > + case GIMPLE_ASSIGN: > + a.analyze_assign (as_a (stmt)); > + break; > + > + case GIMPLE_CALL: > + a.analyze_call (as_a (stmt)); > + break; > + > + case GIMPLE_ASM: > + a.analyze_asm (as_a (stmt)); > + break; > + > + default: > + a.analyze_default_stmt (stmt); > + break; > + } > + } > + } > +} > + > bool type_internals_preclude_sra_p (tree type, const char **msg); > > /* Return true iff TYPE is stdarg va_list type (which early SRA and IPA-SRA > diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc > index d8dafa5481a..7fc801f2612 100644 > --- a/gcc/var-tracking.cc > +++ b/gcc/var-tracking.cc > @@ -5352,7 +5352,8 @@ track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p, > because the real and imaginary parts are represented as separate > pseudo registers, even if the whole complex value fits into one > hard register. 
*/ > - if ((paradoxical_subreg_p (mode, DECL_MODE (expr)) > + if (((DECL_MODE (expr) != BLKmode > + && paradoxical_subreg_p (mode, DECL_MODE (expr))) > || (store_reg_p > && !COMPLEX_MODE_P (DECL_MODE (expr)) > && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1)) > diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C > index 769585052b5..c8995cae707 100644 > --- a/gcc/testsuite/g++.target/powerpc/pr102024.C > +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C > @@ -5,7 +5,7 @@ > // Test that a zero-width bit field in an otherwise homogeneous aggregate > // generates a psabi warning and passes arguments in GPRs. > > -// { dg-final { scan-assembler-times {\mstd\M} 4 } } > +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } } > > struct a_thing > { > diff --git a/gcc/testsuite/gcc.target/i386/pr20020-2.c b/gcc/testsuite/gcc.target/i386/pr20020-2.c > index fa8cb2528c5..723f1826630 100644 > --- a/gcc/testsuite/gcc.target/i386/pr20020-2.c > +++ b/gcc/testsuite/gcc.target/i386/pr20020-2.c > @@ -15,10 +15,15 @@ struct shared_ptr_struct > }; > typedef struct shared_ptr_struct sptr_t; > > +void foo (sptr_t *); > + > void > copy_sptr (sptr_t *dest, sptr_t src) > { > *dest = src; > + > + /* Prevent 'src' to be scalarized as registers. */ > + foo (&src); > } > > /* { dg-final { scan-rtl-dump "\\\(set \\\(reg:TI \[0-9\]*" "expand" } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c > new file mode 100644 > index 00000000000..293bf93fb9a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c > @@ -0,0 +1,29 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -save-temps" } */ > + > +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF; > +typedef struct SF {float a[4]; int i1; int i2; } SF; > + > +/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ > +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ > +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */ > +short __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;} > +int __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;} > +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;} > +float __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;} > +float __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;} > + > +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4}; > +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2}; > + > +int main() > +{ > + if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0 > + && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0)) > + __builtin_abort (); > + if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0 > + && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0)) > + __builtin_abort (); > + return 0; > +} > + > diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c > new file mode 100644 > index 00000000000..4e1f87f7939 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c > @@ -0,0 +1,6 @@ > +/* PR target/65421 */ > +/* { dg-options "-O2" } */ > + > +typedef struct LARGE {double a[4]; int arr[32];} LARGE; > +LARGE foo (LARGE a){return a;} > +/* { 
dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c > new file mode 100644 > index 00000000000..8a8e1a0e996 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c > @@ -0,0 +1,32 @@ > +/* PR target/65421 */ > +/* { dg-options "-O2" } */ > +/* { dg-require-effective-target powerpc_elfv2 } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ > + > +typedef struct FLOATS > +{ > + double a[3]; > +} FLOATS; > + > +/* 3 lfd after returns also optimized */ > +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */ > + > +/* 3 stfd */ > +void st_arg (FLOATS a, FLOATS *p) {*p = a;} > +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */ > + > +/* blr */ > +FLOATS ret_arg (FLOATS a) {return a;} > + > +typedef struct MIX > +{ > + double a[2]; > + long l; > +} MIX; > + > +/* std 3 param regs to return slot */ > +MIX ret_arg1 (MIX a) {return a;} > +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */ > + > +/* count insns */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */