From: "Kewen.Lin"
Date: Tue, 2 Apr 2024 14:12:04 +0800
Subject: Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]
To: Ajit Agarwal, Peter Bergner
Cc: Jakub Jelinek, Segher Boessenkool, David Edelsohn, Michael Meissner, gcc-patches
References: <8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com> <76307976-77b0-48f0-90b9-6dcec02e3c8f@linux.ibm.com> <6a592b9f-d536-4a0e-aa00-ee8d4a778afc@linux.ibm.com> <5a2e511a-3a29-4cdd-88b1-12a6274d5a79@linux.ibm.com>
In-Reply-To: <5a2e511a-3a29-4cdd-88b1-12a6274d5a79@linux.ibm.com>

Hi!

on 2024/3/24 02:37, Ajit Agarwal wrote:
>
>
> On 23/03/24 9:33 pm, Peter Bergner wrote:
>> On 3/23/24 4:33 AM, Ajit Agarwal wrote:
>>>>> -      else if (align_words < GP_ARG_NUM_REG)
>>>>> +      else if (align_words < GP_ARG_NUM_REG
>>>>> +               || (cum->hidden_string_length
>>>>> +                   && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>>>         {
>>>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>>>
>>>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>>>         }
>>>>       else
>>>>         return NULL_RTX;
>>>>
>>>> The old code for the unused hidden parameter (which was the 9th param) would
>>>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>>>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>>>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>>>> see a copy of r11 into a pseudo like we do for the other param regs.
>>>> Is that a problem?  Given it's an unused parameter, it'll probably get deleted
>>>> as dead code, but could it cause any issues?  What if we have more than one

I think Peter raised a good point.  I'm not sure it would really cause
issues, but the assigned reg goes beyond GP_ARG_MAX_REG, which is at least
confusing to people, especially without DCE, as at -O0.  Can we aggressively
remove these candidates from the DECL_ARGUMENTS chain?  Does it cause any
assertion to fail?

BR,
Kewen

>>>> unused hidden parameter and we return r12 and r13 which have specific uses
>>>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>>>> Have you verified what the callee RTL looks like after expand for these
>>>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>>>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>>>> which might lead to some rtl being generated?  Would a (const_int 0) or
>>>> something else work?
>>>>
>>>>
>>> For the above use case it will return
>>>
>>> (reg:DI 5 %r5) and below check entry_parm =
>>> (reg:DI 5 %r5) and the following check will not return TRUE and hence
>>> parameter save area will not be allocated.
>>
>> Why r5?!?!  The 8th (integer) param would return r10, so I'd assume if
>> the next param was a hidden param, then it'd get the next gpr, so r11.
>> How does it jump back to r5 which may have been used by the 3rd param?
>>
>>
> My mistake, it's r11 only for the hidden param.
>>
>>
>>
>>> It will not generate any rtx in the callee rtl code; it is just used to
>>> check whether to allocate parameter save area or not when number of args > 8.
>>>
>>>   /* If there is no incoming register, we need a stack.  */
>>>   entry_parm = rs6000_function_arg (args_so_far, arg);
>>>   if (entry_parm == NULL)
>>>     return true;
>>>
>>>   /* Likewise if we need to pass both in registers and on the stack.  */
>>>   if (GET_CODE (entry_parm) == PARALLEL
>>>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>>>     return true;
>>
>> Yes, this code in rs6000_parm_needs_stack() uses the rs6000_function_arg()
>> return value as a boolean to tell us whether a parameter save area is required
>> so what we return is unimportant other than to know it's not NULL_RTX.
>>
>> I'm more concerned about the use of the target hook targetm.calls.function_arg
>> used in the generic parts of the compiler.  What will that code do differently
>> now that we return a reg rtx rather than NULL_RTX?  Might that code use
>> the reg rtx to emit something?  I'd feel better if you could verify what
>> happens in that code when we return a reg rtx for that 9th hidden param which
>> isn't really being passed in a register.
>>
>
> As per my understanding and debugging the openBLAS testcase, I see that
> the reg rtx returned inside the below IF condition is used to check
> whether a parameter save area is needed or not.
>
> In the generic code where targetm.calls.function_arg is called
> in calls.cc returned rtx is used for PARALLEL case so that we can
> check if we need to pass both in registers and stack then they emit
> store with respect to return rtx. If we identify that we need only
> registers for argument then it emits nothing.
>
> Thanks & Regards
> Ajit
>>
>> Peter
>>
>>
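
To make the register-numbering concern above concrete, here is a minimal
stand-alone sketch in C.  It is not GCC code: model_function_arg,
model_parm_needs_stack and NO_REG are hypothetical names, plain ints stand
in for rtx values, and only the condition quoted from the patch is modelled.
The constants follow the thread (r3 thru r10 are the param regs).

```c
#include <assert.h>
#include <stdbool.h>

/* Values as described in the thread: r3..r10 carry the first eight
   integer parameters.  */
enum
{
  GP_ARG_MIN_REG = 3,
  GP_ARG_MAX_REG = 10,
  GP_ARG_NUM_REG = GP_ARG_MAX_REG - GP_ARG_MIN_REG + 1
};

/* Hypothetical stand-in for NULL_RTX ("no incoming register").  */
enum { NO_REG = -1 };

/* Toy model of the patched condition.  Returns a register number, or
   NO_REG where the real code would return NULL_RTX.  The two extra
   parameters model cum->hidden_string_length and
   cum->actual_parm_length from the quoted diff.  */
static int
model_function_arg (int align_words, bool hidden_string_length,
                    int actual_parm_length)
{
  assert (align_words >= 0);

  if (align_words < GP_ARG_NUM_REG
      || (hidden_string_length
          && actual_parm_length <= GP_ARG_NUM_REG))
    return GP_ARG_MIN_REG + align_words;

  return NO_REG;
}

/* Toy model of the first check in rs6000_parm_needs_stack: only the
   presence of a register matters, not which register it is.  */
static bool
model_parm_needs_stack (int reg)
{
  return reg == NO_REG;
}
```

Under these assumptions, the 9th parameter (align_words == 8) yields NO_REG
without the hidden-length condition, so the model reports that a stack is
needed.  With the condition it yields register 11, and a second unused
hidden parameter would yield 12.  Both are beyond GP_ARG_MAX_REG, which is
the confusing part.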