From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pg1-x536.google.com (mail-pg1-x536.google.com [IPv6:2607:f8b0:4864:20::536]) by sourceware.org (Postfix) with ESMTPS id 9034F384F6F3; Mon, 28 Nov 2022 17:01:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9034F384F6F3 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-pg1-x536.google.com with SMTP id w37so5325895pga.5; Mon, 28 Nov 2022 09:01:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=Jl1RmfxR6qC6qAvrvqK8Mxh7qeHw9DiSYJQb3apVKMw=; b=Jbs3lzxYs0LgQSXwjngnIjERAIAHLwpnK3AmRL5vqtKulrx3aTkFFgNJxqn3gDgCTS Vq1s9tLOh8ouiy3L3+0LbInqzDH6Pkhzrwa6tUQ29sWwap/qeFhTfPTGyAfhvVdynzBq EqX13a+US6rzWS/NN6R2VfAcLG9w5El/E1zm4cZFqQJc0qPvGJMFtg5DS27FRmmX/jfH DRSGPx+wnSigqr8qsndcrRncv0ozu+jN6eW/21Y07h/GtRMFuVGIgcGGLw+rYuLqR+FA vOcVifod06O0H044zN0eJAOH19UuX2jyhM/tQCROXS7rb5w+1XqrAGLMZwVTTwMqmcuU 4QJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=Jl1RmfxR6qC6qAvrvqK8Mxh7qeHw9DiSYJQb3apVKMw=; b=v31zN/HLNsLC0nQB7Mpqs2s2IPBTAvqAufHkdX99Atdp6DUjMEZrGiEa2tGc0buYEl LMrZ0c0i/1rLaELxhleZdhG0FocTsPN4gXTGiEl0AWqtThJZ69BnpCY6pcNmtp1ewR5B cUChSTIrNAnVuhTw2PUJPs1BGWsFGRPhEAHpYakwGSKUbypIjrA+qke30LdifeEYVrAK Fb7eXPM/uih7FErr59FyUi9CE2DpmU10BrcD7Jm3GS/AlvmeD4IWvGliOV/t/7mKTL/q JPFQp9ovkodNBI0u+imno9M545JV7iQVVkIPTAmo/eC1AUJ3qB+dgS+LPtNcmVl5jUCZ 84GQ== X-Gm-Message-State: ANoB5plLvdl8Ylaf8atUHhA2TwxjWc1m737apPjlkWasit3ca9oVKRC1 w3zahqiQrJstS7Nemc/f2oS8wnxe+LY= X-Google-Smtp-Source: AA0mqf7MfRYgLSJbnvLS3dZWbezlQqLhF74gOSNPi/g3tea8G2YEIBumg22ICYzATK5r2tY2cMtdkw== X-Received: by 2002:a63:f301:0:b0:45f:88b2:1762 with SMTP id l1-20020a63f301000000b0045f88b21762mr45119365pgh.341.1669654859443; Mon, 28 Nov 2022 09:00:59 -0800 (PST) Received: from ?IPV6:2601:681:8600:13d0::f0a? ([2601:681:8600:13d0::f0a]) by smtp.gmail.com with ESMTPSA id h7-20020a632107000000b0044046aec036sm7008831pgh.81.2022.11.28.09.00.58 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 28 Nov 2022 09:00:58 -0800 (PST) Message-ID: <0e432b50-d500-ca2f-0db5-9e9cf099f26c@gmail.com> Date: Mon, 28 Nov 2022 10:00:57 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 Subject: Re: [PATCH V2] Use subscalar mode to move struct block for parameter Content-Language: en-US To: Jiufu Guo Cc: gcc-patches@gcc.gnu.org, segher@kernel.crashing.org, dje.gcc@gmail.com, linkw@gcc.gnu.org, rguenther@suse.de References: <20221117061549.178481-1-guojiufu@linux.ibm.com> <7ea64lroo6.fsf@pike.rch.stglabs.ibm.com> <9424d98e-a95f-58ae-9764-bcf8b4f503dc@gmail.com> <7efseas7f1.fsf@pike.rch.stglabs.ibm.com> From: Jeff Law In-Reply-To: <7efseas7f1.fsf@pike.rch.stglabs.ibm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 11/22/22 19:58, Jiufu Guo wrote: > Hi Jeff, > > Thanks a lot for your comments! > > Jeff Law writes: > >> On 11/20/22 20:07, Jiufu Guo wrote: >>> Jiufu Guo writes: >>> >>>> Hi, >>>> >>>> As mentioned in the previous version patch: >>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html >>>> The suboptimal code is generated for "assigning from parameter" or >>>> "assigning to return value". >>>> This patch enhances the assignment from parameters like the below >>>> cases: >>>> /////case1.c >>>> typedef struct SA {double a[3];long l; } A; >>>> A ret_arg (A a) {return a;} >>>> void st_arg (A a, A *p) {*p = a;} >>>> >>>> ////case2.c >>>> typedef struct SA {double a[3];} A; >>>> A ret_arg (A a) {return a;} >>>> void st_arg (A a, A *p) {*p = a;} >>>> >>>> For this patch, bootstrap and regtest pass on ppc64{,le} >>>> and x86_64. >>>> * Besides asking for help reviewing this patch, I would like to >>>> consult comments about enhancing for "assigning to returns". >>> I updated the patch to fix the issue for returns. This patch >>> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used >>> by a return stmt. This patch fix the issue in expand pass only, >>> so, we would try to update the patch to avoid this flag. >>> >>> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc >>> index dd29ffffc03..09b8ec64cea 100644 >>> --- a/gcc/cfgexpand.cc >>> +++ b/gcc/cfgexpand.cc >>> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars) >>> frame_phase = off ? align - off : 0; >>> } >>> + /* Collect VARs on returns. */ >>> + if (DECL_RESULT (current_function_decl)) >>> + { >>> + edge_iterator ei; >>> + edge e; >>> + FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds) >>> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src))) >>> + { >>> + tree val = gimple_return_retval (ret); >>> + if (val && VAR_P (val)) >>> + DECL_USEDBY_RETURN_P (val) = 1; >>> + } >>> + } >>> + >>> /* Set TREE_USED on all variables in the local_decls. */ >>> FOR_EACH_LOCAL_DECL (cfun, i, var) >>> TREE_USED (var) = 1; >>> diff --git a/gcc/expr.cc b/gcc/expr.cc >>> index d9407432ea5..20973649963 100644 >>> --- a/gcc/expr.cc >>> +++ b/gcc/expr.cc >>> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal) >>> return; >>> } >>> + if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from) >>> + && TYPE_MODE (TREE_TYPE (from)) == BLKmode >>> + && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL >>> + || REG_P (DECL_INCOMING_RTL (from)))) >>> + || (VAR_P (to) && DECL_USEDBY_RETURN_P (to) >>> + && TYPE_MODE (TREE_TYPE (to)) == BLKmode >>> + && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl))) >>> + == PARALLEL)) >>> + { >>> + push_temp_slots (); >>> + rtx par_ret; >>> + machine_mode mode; >>> + par_ret = TREE_CODE (from) == PARM_DECL >>> + ? DECL_INCOMING_RTL (from) >>> + : DECL_RTL (DECL_RESULT (current_function_decl)); >>> + mode = GET_CODE (par_ret) == PARALLEL >>> + ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0)) >>> + : word_mode; >>> + int mode_size = GET_MODE_SIZE (mode).to_constant (); >>> + int size = INTVAL (expr_size (from)); >>> + >>> + /* If/How the parameter using submode, it dependes on the size and >>> + position of the parameter. Here using heurisitic number. */ >>> + int hurstc_num = 8; >> Where did this come from and what does it mean? > Sorry for does not make this clear. We know that an aggregate arg may be > on registers partially or totally, as assign_parm_adjust_entry_rtl. > For an example, if a parameter with 12 words and the target/ABI only > allow 8 gprs for arguments, then the parameter could use 8 regs at most > and left part in stack. Right, but the number of registers is target dependent, so I don't see how using "8" or any number of that matter is correct here. > >> >> Note that BLKmode subword values passed in registers can be either >> right or left justified.  I think you also need to worry about >> endianness here. > Since the subword is used to move block(read from source mem and then > store to destination mem with register mode), and this would keep to use > the same endianness on reg like move_block_from_reg. So, the patch does > not check the endianness. Hmm, I was clear enough here, particularly using the endianness term.  Don't you need to query the ABI to ensure that you're not changing left vs right justification for a partially in register argument.     On the PA we have: /* Specify padding for the last element of a block move between registers    and memory.    The 64-bit runtime specifies that objects need to be left justified    (i.e., the normal justification for a big endian target).  The 32-bit    runtime specifies right justification for objects smaller than 64 bits.    We use a DImode register in the parallel for 5 to 7 byte structures    so that there is only one element.  This allows the object to be    correctly padded.  */ #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \   targetm.calls.function_arg_padding ((MODE), (TYPE)) Jeff