From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by sourceware.org (Postfix) with ESMTPS id C4F4D3857B83; Tue, 28 Nov 2023 10:52:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C4F4D3857B83 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C4F4D3857B83 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2a07:de40:b251:101:10:150:64:2 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701168748; cv=none; b=vzfFkKytSbK+O6uB2GE6MOpXfUUSvWIUn/FDQN4Q39aHIGJCd2z6/iFbzT02pLRH6RfIm6w3SPhVZwPOlX7aFGUVy4m1O/nfJ7pNs7uvomjJbDXdDcBIKVB9VrSqzp1bnQBNHEKsPUo8XX3BJVPvb0dEil2NfyrMYlG0xjKkvtg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701168748; c=relaxed/simple; bh=ShWNFwgZWvRp6pg0EDaFExaS6cilyxGBqQJPxa8iyME=; h=DKIM-Signature:DKIM-Signature:Date:From:To:Subject:Message-ID: MIME-Version; b=MFSvvSIDbSKlLLA2j3OkKgDbUN/7jdNx2m0p6NMG6hxVhyyUdy3K83702GNQDLC4G0snpZY3bjw9tjm1a1UDA9EmVyUOYkEi0KeDLjsITyPPpiLadGLIdUcGvn1BmspiOfeG+xLVrfaDXOLHwjOyH+8olA2+nd2vW8uO/xmL5Iw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from relay2.suse.de (unknown [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id B9AD51F74B; Tue, 28 Nov 2023 10:52:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1701168743; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=c/f+0k0FLp5W4rj18on1C4qgf2UW0YSob3k6yhtZ8oc=; b=XzLmGZSGw0TMQ8o7Yo/ZAJ/0dsmmA1phnwVAq8hySIfDyxsUTd1GlWIFa/ICd2glj6PE2u WVrbO1IkPB9GZmV6x2tZh0CDsJDco4cC2v7W9VT6tD154fPzkO34DRwUlJG2wkx3y6IQpy 3WHLjE+0l1/yoKGHGeg5wtl8yftM2T4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1701168743; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=c/f+0k0FLp5W4rj18on1C4qgf2UW0YSob3k6yhtZ8oc=; b=kAQmYn6NiCf0f58PQLnpNYAiqZzzlPacx4p3IFU2yJX4kWjMJ9DIxwoaufGYgRIqImDJoc 1d7jxWIeskyHSDDg== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id A43B12C15A; Tue, 28 Nov 2023 10:52:21 +0000 (UTC) Date: Tue, 28 Nov 2023 10:52:21 +0000 (UTC) From: Richard Biener To: Jiufu Guo cc: Richard Biener , gcc-patches@gcc.gnu.org, jeffreyalaw@gmail.com, richard.sandiford@arm.com, steven@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, bergner@linux.ibm.com Subject: Re: [PATCH V2] introduce light expander sra In-Reply-To: Message-ID: References: <20231027015036.3868319-1-guojiufu@linux.ibm.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spamd-Bar: ++++++++++++++++++++ X-Spam-Score: 20.19 X-Rspamd-Server: rspamd1 Authentication-Results: smtp-out2.suse.de; dkim=none; spf=softfail (smtp-out2.suse.de: 149.44.160.134 is neither permitted nor denied by domain of rguenther@suse.de) smtp.mailfrom=rguenther@suse.de; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=suse.de (policy=none) X-Rspamd-Queue-Id: B9AD51F74B X-Spamd-Result: default: False [20.19 / 50.00]; RDNS_NONE(1.00)[]; RCVD_COUNT_TWO(0.00)[2]; SPAMHAUS_XBL(0.00)[149.44.160.134:from]; TO_DN_SOME(0.00)[]; RWL_MAILSPIKE_GOOD(0.00)[149.44.160.134:from]; HFILTER_HELO_IP_A(1.00)[relay2.suse.de]; HFILTER_HELO_NORES_A_OR_MX(0.30)[relay2.suse.de]; R_SPF_SOFTFAIL(4.60)[~all:c]; R_RATELIMIT(0.00)[rip(RLa6h5sh378tcam5q78u)]; MX_GOOD(-0.01)[]; RCPT_COUNT_SEVEN(0.00)[10]; RCVD_NO_TLS_LAST(0.10)[]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(2.20)[]; MIME_TRACE(0.00)[0:+]; BAYES_HAM(-3.00)[100.00%]; RDNS_DNSFAIL(0.00)[]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TAGGED_RCPT(0.00)[]; MIME_GOOD(-0.10)[text/plain]; NEURAL_SPAM_SHORT(3.00)[1.000]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; VIOLATED_DIRECT_SPF(3.50)[]; NEURAL_SPAM_LONG(3.50)[1.000]; FUZZY_BLOCKED(0.00)[rspamd.com]; FREEMAIL_CC(0.00)[gmail.com,gcc.gnu.org,arm.com,kernel.crashing.org,linux.ibm.com]; HFILTER_HOSTNAME_UNKNOWN(2.50)[]; SUSPICIOUS_RECIPS(1.50)[]; DMARC_POLICY_SOFTFAIL(0.10)[suse.de : No valid SPF, No valid DKIM,none] X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 28 Nov 2023, Jiufu Guo wrote: > > Hi, > > Thanks so much for your helpful review! > > Richard Biener writes: > > > On Fri, Oct 27, 2023 at 3:51?AM Jiufu Guo wrote: > >> > >> Hi, > >> > >> Compare with previous version: > >> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html > >> This verion supports TI/VEC mode of the access. > >> > >> There are a few PRs (meta-bug PR101926) on various targets. > >> The root causes of them are similar: the aggeragte param/ > >> returns are passed by multi-registers, but they are stored > >> to stack from registers first; and then, access the > >> parameter through stack slot. > >> > >> A general idea to enhance this: accessing the aggregate > >> parameters/returns directly through incoming/outgoing > >> scalar registers. This idea would be a kind of SRA. > >> > >> This experimental patch for light-expander-sra contains > >> below parts: > >> > >> a. Check if the parameters/returns are ok/profitable to > >> scalarize, and set the incoming/outgoing registers( > >> pseudos) for the parameter/return. > >> - This is done in "expand_function_start", after the > >> incoming/outgoing hard registers are determined for the > >> paramter/return. > >> The scalarized registers are recorded in DECL_RTL for > >> the parameter/return in parallel form. > >> - At the time when setting DECL_RTL, "scalarizable_aggregate" > >> is called to check the accesses are ok/profitable to > >> scalarize. > >> We can continue to enhance this function, to support > >> more cases. For example: > >> - 'reverse storage order'. > >> - 'writing to parameter'/'overlap accesses'. > >> > >> b. When expanding the accesses of the parameters/returns, > >> according to the info of the access(e.g. bitpos,bitsize, > >> mode), the scalar(pseudos) can be figured out to expand > >> the access. This may happen when expand below accesses: > >> - The component access of a parameter: "_1 = arg.f1". > >> Or whole parameter access: rhs of "_2 = arg" > >> - The assignment to a return val: > >> "D.xx = yy; or D.xx.f = zz" where D.xx occurs on return > >> stmt. > >> - This is mainly done in expr.cc(expand_expr_real_1, and > >> expand_assignment). Function "extract_sub_member" is > >> used to figure out the scalar rtxs(pseudos). > >> > >> Besides the above two parts, some work are done in the GIMPLE > >> tree: collect sra candidates for parameters/returns, and > >> collect the SRA access info. > >> This is mainly done at the beginning of the expander pass. > >> Below are two major items of this part. > >> - Collect light-expand-sra candidates. > >> Each parameter is checked if it has the proper aggregate > >> type. Collect return val (VAR_P) on each return stmts if > >> the function is returning via registers. > >> This is implemented in expand_sra::collect_sra_candidates. > >> > >> - Build/collect/manage all the access on the candidates. > >> The function "scan_function" is used to do this work, it > >> goes through all basicblocks, and all interesting stmts ( > >> phi, return, assign, call, asm) are checked. > >> If there is an interesting expression (e.g. COMPONENT_REF > >> or PARM_DECL), then record the required info for the access > >> (e.g. pos, size, type, base). > >> And if it is risky to do SRA, the candidates may be removed. > >> e.g. address-taken and accessed via memory. > >> "foo(struct S arg) {bar (&arg);}" > >> > >> This patch is tested on ppc64{,le} and x86_64. > >> Is this ok for trunk? > >> > >> BR, > >> Jeff (Jiufu Guo) > >> > >> PR target/65421 > >> > >> gcc/ChangeLog: > >> > >> * cfgexpand.cc (struct access): New class. > >> (struct expand_sra): New class. > >> (expand_sra::collect_sra_candidates): New member function. > >> (expand_sra::add_sra_candidate): Likewise. > >> (expand_sra::build_access): Likewise. > >> (expand_sra::analyze_phi): Likewise. > >> (expand_sra::analyze_assign): Likewise. > >> (expand_sra::visit_base): Likewise. > >> (expand_sra::protect_mem_access_in_stmt): Likewise. > >> (expand_sra::expand_sra): Class constructor. > >> (expand_sra::~expand_sra): Class destructor. > >> (expand_sra::scalarizable_access): New member function. > >> (expand_sra::scalarizable_accesses): Likewise. > >> (scalarizable_aggregate): New function. > >> (set_scalar_rtx_for_returns): New function. > >> (expand_value_return): Updated. > >> (expand_debug_expr): Updated. > >> (pass_expand::execute): Updated to use expand_sra. > >> * cfgexpand.h (scalarizable_aggregate): New declare. > >> (set_scalar_rtx_for_returns): New declare. > >> * expr.cc (expand_assignment): Updated. > >> (expand_constructor): Updated. > >> (query_position_in_parallel): New function. > >> (extract_sub_member): New function. > >> (expand_expr_real_1): Updated. > >> * expr.h (query_position_in_parallel): New declare. > >> * function.cc (assign_parm_setup_block): Updated. > >> (assign_parms): Updated. > >> (expand_function_start): Updated. > >> * tree-sra.h (struct sra_base_access): New class. > >> (struct sra_default_analyzer): New class. > >> (scan_function): New template function. > >> * var-tracking.cc (track_loc_p): Updated. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * g++.target/powerpc/pr102024.C: Updated > >> * gcc.target/powerpc/pr108073.c: New test. > >> * gcc.target/powerpc/pr65421-1.c: New test. > >> * gcc.target/powerpc/pr65421-2.c: New test. > >> > >> --- > >> gcc/cfgexpand.cc | 352 ++++++++++++++++++- > >> gcc/cfgexpand.h | 2 + > >> gcc/expr.cc | 179 +++++++++- > >> gcc/expr.h | 3 + > >> gcc/function.cc | 36 +- > >> gcc/tree-sra.h | 76 ++++ > >> gcc/var-tracking.cc | 3 +- > >> gcc/testsuite/g++.target/powerpc/pr102024.C | 2 +- > >> gcc/testsuite/gcc.target/i386/pr20020-2.c | 5 + > >> gcc/testsuite/gcc.target/powerpc/pr108073.c | 29 ++ > >> gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 6 + > >> gcc/testsuite/gcc.target/powerpc/pr65421-2.c | 32 ++ > >> 12 files changed, 718 insertions(+), 7 deletions(-) > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c > >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c > >> > >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc > >> index 4262703a138..ef99ca8ac13 100644 > >> --- a/gcc/cfgexpand.cc > >> +++ b/gcc/cfgexpand.cc > >> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3. If not see > >> #include "output.h" > >> #include "builtins.h" > >> #include "opts.h" > >> +#include "tree-sra.h" > >> > >> /* Some systems use __main in a way incompatible with its use in gcc, in these > >> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to > >> @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool); > >> > >> static void record_alignment_for_reg_var (unsigned int); > >> > >> +/* For light SRA in expander about paramaters and returns. */ > >> +struct access : public sra_base_access > >> +{ > >> +}; > >> + > >> +typedef struct access *access_p; > >> + > >> +struct expand_sra : public sra_default_analyzer > >> +{ > >> + /* Construct/destruct resources, e.g. sra candidates. */ > >> + expand_sra (); > >> + ~expand_sra (); > >> + > >> + /* No actions for pre_analyze_stmt, analyze_return. */ > >> + > >> + /* Overwrite phi,call,asm analyzations. */ > >> + void analyze_phi (gphi *s); > >> + > >> + /* TODO: Check accesses on call/asm. */ > >> + void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); }; > >> + void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); }; > >> + > >> + /* Check access of SRA on assignment. */ > >> + void analyze_assign (gassign *); > >> + > >> + /* Check if the accesses of BASE(parameter or return) are > >> + scalarizable, according to the incoming/outgoing REGS. */ > >> + bool scalarizable_accesses (tree base, rtx regs); > >> + > >> +private: > >> + /* Collect the parameter and returns to check if they are suitable for > >> + scalarization. */ > >> + bool collect_sra_candidates (void); > >> + > >> + /* Return true if VAR is added as a candidate for SRA. */ > >> + bool add_sra_candidate (tree var); > >> + > >> + /* Return true if EXPR has interesting sra access, and created access, > >> + return false otherwise. */ > >> + access_p build_access (tree expr, bool write); > >> + > >> + /* Check if the access ACC is scalarizable. REGS is the incoming/outgoing > >> + registers which the access is based on. */ > >> + bool scalarizable_access (access_p acc, rtx regs, bool is_parm); > >> + > >> + /* If there is risk (stored/loaded or addr taken), > >> + disqualify the sra candidates in the un-interesting STMT. */ > >> + void protect_mem_access_in_stmt (gimple *stmt); > >> + > >> + /* Callback of walk_stmt_load_store_addr_ops, used to remove > >> + unscalarizable accesses. */ > >> + static bool visit_base (gimple *, tree op, tree, void *data); > >> + > >> + /* Base (tree) -> Vector (vec *) map. */ > >> + hash_map > *base_access_vec; > >> +}; > >> + > >> +bool > >> +expand_sra::collect_sra_candidates (void) > >> +{ > >> + bool ret = false; > >> + > >> + /* Collect parameters. */ > >> + for (tree parm = DECL_ARGUMENTS (current_function_decl); parm; > >> + parm = DECL_CHAIN (parm)) > >> + ret |= add_sra_candidate (parm); > >> + > >> + /* Collect VARs on returns. */ > >> + if (DECL_RESULT (current_function_decl)) > >> + { > >> + edge_iterator ei; > >> + edge e; > >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > >> + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) > >> + { > >> + tree val = gimple_return_retval (r); > >> + /* To sclaraized the return, the return value should be only > >> + writen (except this return stmt). Using 'true(write)' to > >> + pretend the access only be 'writen'. */ > >> + if (val && VAR_P (val)) > >> + ret |= add_sra_candidate (val) && build_access (val, true); > >> + } > >> + } > >> + > >> + return ret; > >> +} > >> + > >> +bool > >> +expand_sra::add_sra_candidate (tree var) > >> +{ > >> + tree type = TREE_TYPE (var); > >> + > >> + if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type)) > >> + || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var) > >> + || is_va_list_type (type)) > >> + return false; > >> + > >> + base_access_vec->get_or_insert (var); > >> + > >> + return true; > >> +} > >> + > >> +access_p > >> +expand_sra::build_access (tree expr, bool write) > >> +{ > >> + enum tree_code code = TREE_CODE (expr); > >> + if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF > >> + && code != ARRAY_REF && code != ARRAY_RANGE_REF) > >> + return NULL; > >> + > >> + HOST_WIDE_INT offset, size; > >> + bool reverse; > >> + tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse); > >> + if (!base || !DECL_P (base)) > >> + return NULL; > >> + > >> + vec *access_vec = base_access_vec->get (base); > >> + if (!access_vec) > >> + return NULL; > >> + > >> + /* TODO: support reverse. */ > >> + if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base))) > >> + { > >> + base_access_vec->remove (base); > >> + return NULL; > >> + } > >> + > >> + struct access *access = XNEWVEC (struct access, 1); > >> + > >> + memset (access, 0, sizeof (struct access)); > >> + access->offset = offset; > >> + access->size = size; > >> + access->expr = expr; > >> + access->write = write; > >> + access->reverse = reverse; > >> + > >> + access_vec->safe_push (access); > >> + > >> + return access; > >> +} > > > > I still think there's SRA code we can share for producing > > accesses from a stmt. > > Yeap, there is similar code in existing SRAs. To common the code is a > todo item. > One potential issue, for the example: "analyze/scan expr and > build/create access", there are some small/scatter (e.g. the order of > if conditions) difference may cause different behavior. This may cause > the common code un-pretty. > > > > >> + > >> +/* Function protect_mem_access_in_stmt removes the SRA candidates if > >> + there is addr-taken on the candidate in the STMT. */ > >> + > >> +void > >> +expand_sra::analyze_phi (gphi *stmt) > >> +{ > >> + if (base_access_vec && !base_access_vec->is_empty ()) > >> + walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base); > >> +} > >> + > >> +void > >> +expand_sra::analyze_assign (gassign *stmt) > >> +{ > >> + if (!base_access_vec || base_access_vec->is_empty ()) > >> + return; > >> + > >> + if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt)) > >> + { > >> + tree rhs = gimple_assign_rhs1 (stmt); > >> + tree lhs = gimple_assign_lhs (stmt); > >> + bool res_r = build_access (rhs, false); > >> + bool res_l = build_access (lhs, true); > >> + > >> + if (res_l || res_r) > >> + return; > >> + } > >> + > >> + protect_mem_access_in_stmt (stmt); > >> +} > >> + > >> +/* Callback of walk_stmt_load_store_addr_ops, used to remove > >> + unscalarizable accesses. Called by protect_mem_access_in_stmt. */ > >> + > >> +bool > >> +expand_sra::visit_base (gimple *, tree op, tree, void *data) > >> +{ > >> + op = get_base_address (op); > >> + if (op && DECL_P (op)) > >> + { > >> + expand_sra *p = (expand_sra *) data; > >> + p->base_access_vec->remove (op); > >> + } > >> + return false; > >> +} > >> + > >> +/* Function protect_mem_access_in_stmt removes the SRA candidates if > >> + there is store/load/addr-taken on the candidate in the STMT. > >> + > >> + For some statements, which SRA does not care about, if there are > >> + possible memory operation on the SRA candidates, it would be risky > >> + to scalarize it. */ > >> + > >> +void > >> +expand_sra::protect_mem_access_in_stmt (gimple *stmt) > >> +{ > >> + if (base_access_vec && !base_access_vec->is_empty ()) > >> + walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base, > >> + visit_base); > >> +} > >> + > >> +expand_sra::expand_sra () : base_access_vec (NULL) > >> +{ > >> + if (optimize <= 0) > >> + return; > >> + > >> + base_access_vec = new hash_map >; > >> + collect_sra_candidates (); > >> +} > >> + > >> +expand_sra::~expand_sra () > >> +{ > >> + if (optimize <= 0) > >> + return; > >> + > >> + delete base_access_vec; > >> +} > >> + > >> +bool > >> +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm) > >> +{ > >> + /* Now only support reading from parms > >> + or writing to returns. */ > >> + if (is_parm && acc->write) > >> + return false; > >> + if (!is_parm && !acc->write) > >> + return false; > >> + > >> + /* Compute the position of the access in the parallel regs. */ > >> + int start_index = -1; > >> + int end_index = -1; > >> + HOST_WIDE_INT left_bits = 0; > >> + HOST_WIDE_INT right_bits = 0; > >> + query_position_in_parallel (acc->offset, acc->size, regs, start_index, > >> + end_index, left_bits, right_bits); > >> + > >> + /* Invalid access possition: padding or outof bound. */ > >> + if (start_index < 0 || end_index < 0) > >> + return false; > >> + > >> + machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr)); > >> + /* Need multi-registers in a parallel for the access. */ > >> + if (expr_mode == BLKmode || end_index > start_index) > >> + { > >> + if (left_bits || right_bits) > >> + return false; > >> + if (expr_mode == BLKmode) > >> + return true; > >> + > >> + /* For large modes, only support TI/VECTOR in mult-registers. */ > >> + if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode))) > >> + return expr_mode == TImode || VECTOR_MODE_P (expr_mode); > >> + return true; > >> + } > >> + > >> + gcc_assert (end_index == start_index); > >> + > >> + /* Just need one reg for the access. */ > >> + if (left_bits == 0 && right_bits == 0) > >> + return true; > >> + > >> + scalar_int_mode imode; > >> + /* Need to extract bits from the reg for the access. */ > >> + return !acc->write && int_mode_for_mode (expr_mode).exists (&imode); > >> +} > >> + > >> +/* Now, the base (parm/return) is scalarizable, only if all > >> + accesses of the BASE are scalariable. > >> + > >> + This function need to be updated, to support more complicate > >> + cases, like: > >> + - Some access are scalarizable, but some are not. > >> + - Access is writing to a parameter. > >> + - Writing accesses are overlap with multi-accesses. */ > >> + > >> +bool > >> +expand_sra::scalarizable_accesses (tree base, rtx regs) > >> +{ > >> + if (!base_access_vec) > >> + return false; > >> + vec *access_vec = base_access_vec->get (base); > >> + if (!access_vec) > >> + return false; > >> + if (access_vec->is_empty ()) > >> + return false; > >> + > >> + bool is_parm = TREE_CODE (base) == PARM_DECL; > >> + int n = access_vec->length (); > >> + int cur_access_index = 0; > >> + for (; cur_access_index < n; cur_access_index++) > >> + if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm)) > >> + break; > >> + > >> + /* It is ok if all access are scalarizable. */ > >> + if (cur_access_index == n) > >> + return true; > >> + > >> + base_access_vec->remove (base); > >> + return false; > >> +} > >> + > >> +static expand_sra *current_sra = NULL; > >> + > >> +/* Check if the PARAM (or return) is scalarizable. > >> + > >> + This interface is used in expand_function_start > >> + to check sra possiblity for parmeters. */ > >> + > >> +bool > >> +scalarizable_aggregate (tree parm, rtx regs) > >> +{ > >> + if (!current_sra) > >> + return false; > >> + return current_sra->scalarizable_accesses (parm, regs); > >> +} > >> + > >> +/* Check if interesting returns, and if they are scalarizable, > >> + set DECL_RTL as scalar registers. > >> + > >> + This interface is used in expand_function_start > >> + when outgoing registers are determinded for DECL_RESULT. */ > >> + > >> +void > >> +set_scalar_rtx_for_returns () > >> +{ > >> + rtx res = DECL_RTL (DECL_RESULT (current_function_decl)); > >> + edge_iterator ei; > >> + edge e; > >> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) > >> + if (greturn *r = safe_dyn_cast (*gsi_last_bb (e->src))) > >> + { > >> + tree val = gimple_return_retval (r); > >> + if (val && VAR_P (val) && scalarizable_aggregate (val, res)) > >> + SET_DECL_RTL (val, res); > >> + } > >> +} > >> + > >> /* Return an expression tree corresponding to the RHS of GIMPLE > >> statement STMT. */ > >> > >> @@ -3707,7 +4045,7 @@ expand_value_return (rtx val) > >> > >> tree decl = DECL_RESULT (current_function_decl); > >> rtx return_reg = DECL_RTL (decl); > >> - if (return_reg != val) > >> + if (!rtx_equal_p (return_reg, val)) > >> { > >> tree funtype = TREE_TYPE (current_function_decl); > >> tree type = TREE_TYPE (decl); > >> @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp) > >> addr_space_t as; > >> scalar_int_mode op0_mode, op1_mode, addr_mode; > >> > >> + /* TODO: Enable to debug expand-sra optimized parm/returns. */ > >> + tree base = get_base_address (exp); > >> + if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base))) > >> + && GET_CODE (DECL_RTL (base)) == PARALLEL) > >> + return NULL_RTX; > >> + > >> switch (TREE_CODE_CLASS (TREE_CODE (exp))) > >> { > >> case tcc_expression: > >> @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun) > >> auto_bitmap forced_stack_vars; > >> discover_nonconstant_array_refs (forced_stack_vars); > >> > >> + /* Enable light-expander-sra. */ > >> + current_sra = new expand_sra; > >> + scan_function (cfun, *current_sra); > >> + > >> /* Make sure all values used by the optimization passes have sane > >> defaults. */ > >> reg_renumber = 0; > >> @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun) > >> loop_optimizer_finalize (); > >> } > >> > >> + delete current_sra; > >> + current_sra = NULL; > >> timevar_pop (TV_POST_EXPAND); > >> > >> return 0; > >> diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h > >> index 0e551f6cfd3..3415c217708 100644 > >> --- a/gcc/cfgexpand.h > >> +++ b/gcc/cfgexpand.h > >> @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *); > >> extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *); > >> extern void set_parm_rtl (tree, rtx); > >> > >> +extern bool scalarizable_aggregate (tree, rtx); > >> +extern void set_scalar_rtx_for_returns (); > >> > >> #endif /* GCC_CFGEXPAND_H */ > >> diff --git a/gcc/expr.cc b/gcc/expr.cc > >> index 763bd82c59f..5ba26e0ef52 100644 > >> --- a/gcc/expr.cc > >> +++ b/gcc/expr.cc > >> @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal) > >> Assignment of an array element at a constant index, and assignment of > >> an array element in an unaligned packed structure field, has the same > >> problem. Same for (partially) storing into a non-memory object. */ > >> - if (handled_component_p (to) > >> + if ((handled_component_p (to) > >> + && !(VAR_P (get_base_address (to)) > > > > get_base_address is pretty expensive so please cache the result. > > Thanks for pointing out this! > > > > >> + && DECL_RTL_SET_P (get_base_address (to)) > >> + && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL)) > > > > I think that's possibly fragile in case a PARALLEL can also appear by > > other means here. I also think it's the wrong thing to map a structural > > Understand your concern. The current trunk code uses PARALLEL to carry > the incoming/outgoing registers for aggregate, so this patch reuse this > character. While it is possible that PARALLEL is used by other means. > To make it safe, adding an additional flag may be an idea. (While this > would not needed if using your suggestions below). > > > GIMPLE accordingly with artificial decls which would also be the way > > to preserve debug info. > > > > In principle "expander SRA" isn't different from regular SRA apart from > > the fact that it handles parameter and return variables and heuristically > > includes the ABI (does it?) in decomposition. > > > > So the only important difference is that we cannot represent the > > change for the incoming arguments or the outgoing regs in GIMPLE > > but we should only have to special case _those_ bits, not all the > > other accesses we can rewrite. > > > > We could even use the actual SRA pass in a special mode right > > before RTL expansion (or close to it) and invent a special (but ugly) > > representation for the incoming/outgoing part, say by using > > new internal functions present between the SRA pass and RTL > > expansion: > > > Previously, I tried the method to rerun tree-sra just before the > "optimized" pass once, and find that some enhancement woudl be needed to > make it work: for example, the parameters analyzing seems not work for > some cases. While the two major reasons for me to combine the > 'light-sra' inside expander: > 1. We may just use the reg/pseudos based on incoming/outgoing registers > to access the struct. Then, no need to create the replacements like > tree-sra and ipa-sra. > > 2. For small structures which may have int mode, RTL passes (start from > expander) are using the int reg to access the struct. It would be > better to avoid struct/member access at the beggining of expeander. > It seems these issues can also be relieved by the idea of introducing > "new internal function". We just need to expand the internal function > in any expected way. > > > struct X foo (struct X x) > > { > > int D.1234 = .ARG_PART (x, offset, size); > > int D.1235 = .ARG_PART (x, offset, size); > > D.2345 = .SET_ARG_PARTS (D.1234, offset, size, D.1235, offset, size); > > return D.2345; > > } > > This code expresses two major functionalities: One is extracting fields > from aggregate (mainly for parameters); and another one is constructing > an aggregate from the value of elements (mainly for returns). Yes. In principle BIT_FIELD_REF could be used for the former and a CONSTRUCTOR used for the latter but I've used internal functions to tightly control how we expand these bits - in fact I think for all practical cases we care about the SRA pass should have made sure they expand to plain (register) moves, that is, the pieces 1:1 correspond to incoming argument registers or outgoing return value registers. Richard.