From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id 900A63858D28 for ; Sat, 23 Mar 2024 09:33:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 900A63858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 900A63858D28 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711186427; cv=none; b=pCrCzvuIUQi+avyB/VRAmC3LakhEO0BzlKcjcRaDNwax0CVCY5gXqhs+l9b7CvMyopT+4BAqSsIq01lKtqnhx7O31yvz82mMbiZV2g1al2dY0XQttHKJrcsrMP2ZdqKs4QMOkmFDBWhlVl1SsH1HO7C8TOeo2YUFAVWNf1s6eG8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711186427; c=relaxed/simple; bh=gA+UxTMxg0NrpY7wQ4+/gwXR7B/NJ3QBiLN6sygXqos=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=tOUkypkRyqac7+8Kk6WKl+dJCUPAsy3aWsZFYpbz7+EepvOASUEHuEMglBmbKwqtXJLRGj7mJwTE0sR5gnMLY5BdENoi7uPwZYkzIKECH8rhr2BtUzCbOXczaMKz37zXxZwgAfy9sUBG/MPeobV2hwb9wK6JahNeqE/TqiPKJr0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0353729.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42N9TFcY031971; Sat, 23 Mar 2024 09:33:40 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=KCmbE7RUI0dNmRtvlkkifxxbT4zz+yRbg0mGoa3NgJo=; b=dY1rtxuhi+jdC7j14/e/S17402gPTIlhkfg+s5wpoSI8reCA+5MyZHbOeYiA7Ug1PDGE Z04W63gEvx+ixnLa0qeApq9EgJIyim7Ml+ccwGEf/9M3Yx8KD4DflKM4Uj03XRH1mkCO bq7104yYJOlPtG8Ryfbypk0j+6WSuBgjDvXwsYov2e1XQlA8tUZkON80pMvjxiaVeU0Y yn6Ji8MxTjzn+FvJ/ToqBXm1XRPQ31biJ8UokTP8NccsoXPgV4CjA0O8KpFre7RGtQs6 mew/CPRlyFKSGGnE3HOhyY7c2otajLrF6P9mJIG36n6ziZG4gy1QOpi+iHkzo7fBplL8 6w== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1vf8808g-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 09:33:39 +0000 Received: from m0353729.ppops.net (m0353729.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42N9XdLS007386; Sat, 23 Mar 2024 09:33:39 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1vf8808c-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 09:33:39 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42N9Vqla009245; Sat, 23 Mar 2024 09:33:37 GMT Received: from smtprelay06.dal12v.mail.ibm.com ([172.16.1.8]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3x0x15kch2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 09:33:37 +0000 Received: from smtpav04.dal12v.mail.ibm.com (smtpav04.dal12v.mail.ibm.com [10.241.53.103]) by smtprelay06.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42N9XY0A28639978 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 23 Mar 2024 09:33:37 GMT Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id CDF6558052; Sat, 23 Mar 2024 09:33:34 +0000 (GMT) Received: from smtpav04.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 2CE505805A; Sat, 23 Mar 2024 09:33:32 +0000 (GMT) Received: from [9.43.111.31] (unknown [9.43.111.31]) by smtpav04.dal12v.mail.ibm.com (Postfix) with ESMTP; Sat, 23 Mar 2024 09:33:31 +0000 (GMT) Message-ID: <76307976-77b0-48f0-90b9-6dcec02e3c8f@linux.ibm.com> Date: Sat, 23 Mar 2024 15:03:30 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799] Content-Language: en-US To: Peter Bergner , Jakub Jelinek , "Kewen.Lin" , Segher Boessenkool , David Edelsohn , Michael Meissner , gcc-patches References: <8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com> From: Ajit Agarwal In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: PRprxKmpzl6u-ex77BAG4Am5Ubb_dlKN X-Proofpoint-ORIG-GUID: 0QPKLoaw2j2FnDQO-0abyxoRwzX7lluE X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-23_06,2024-03-21_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 impostorscore=0 malwarescore=0 adultscore=0 lowpriorityscore=0 phishscore=0 spamscore=0 priorityscore=1501 clxscore=1015 suspectscore=0 bulkscore=0 mlxscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2403210000 definitions=main-2403230065 X-Spam-Status: No, score=-4.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_MANYTO,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Peter: On 23/03/24 10:07 am, Peter Bergner wrote: > On 3/22/24 5:15 AM, Ajit Agarwal wrote: >> When using FlexiBLAS with OpenBLAS we noticed corruption of >> the parameters passed to OpenBLAS functions. FlexiBLAS >> basically provides a BLAS interface where each function >> is a stub that forwards the arguments to a real BLAS lib, >> like OpenBLAS. >> >> Fixes the corruption of caller frame checking number of >> arguments is less than equal to GP_ARG_NUM_REG (8) >> excluding hidden unused DECLS. > > I think the git log entry commentary could be a little more descriptive > of what the problem is. How about something like the following? > > When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller > stack frame when calling OpenBLAS functions. This was caused by the > FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the > number of function parameters in the callee due to hidden Fortran > parameters. This can cause problems when the callee believes the caller > has allocated a parameter save area when the caller has not done so. > That means any writes by the callee into the non-existent parameter save > area will corrupt the caller stack frame. > > The workaround implemented here, is for the callee to determine whether > the caller has allocated a parameter save area or not, by ignoring any > unused hidden parameters when counting the number of parameters. > > I will address this change in the new version of the patch. > >> PR rtk-optimization/100799 > > s/rtk/rtl/ > > > I will address this in new version of the patch. >> * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't >> generate parameter save area if number of arguments passed >> less than equal to GP_ARG_NUM_REG (8) excluding hidden >> parameter. > > The callee doesn't generate or allocate the parameter save area, the > caller does. The code here is for the callee trying to determine > whether the caller has done so. How about saying the following instead? > > Don't assume a parameter save area has been allocated if the number of > formal parameters, excluding unused hidden parameters, is less than or > equal to GP_ARG_NUM_REG (8). > > I will incorporate this change in new version of the patch. > > >> (init_cumulative_args): Check for hidden parameter in fortran >> routine and set the flag hidden_string_length and actual >> parameter passed excluding hidden unused DECLS. > > Check for unused hidden Fortran parameters and set hidden_string_length > and actual_parm_length. > > I will address this change in new version of the patch. >> + /* When the buggy C/C++ wrappers call the function with fewer arguments >> + than it actually has and doesn't expect the parameter save area on the >> + caller side because of that while the callee expects it and the callee >> + actually stores something in the parameter save area, it corrupts >> + whatever is in the caller stack frame at that location. */ > > The wrapper/caller is the one that allocates the parameter save area, so > saying "...doesn't expect the parameter save area on the caller side..." > doesn't make sense, since it knows whether it allocated it or not. > How about saying something like the following instead? > > Check whether this function contains any unused hidden parameters and > record how many there are for use in rs6000_function_arg() to determine > whether its callers have allocated a parameter save area or not. > See PR100799 for details. > > I will incorporate this change in new version of the patch. > >> + unsigned int num_args = 0; >> + unsigned int hidden_length = 0; >> + >> + for (tree arg = DECL_ARGUMENTS (current_function_decl); >> + arg; arg = DECL_CHAIN (arg)) >> + { >> + num_args++; >> + if (DECL_HIDDEN_STRING_LENGTH (arg)) >> + { >> + tree parmdef = ssa_default_def (cfun, arg); >> + if (parmdef == NULL || has_zero_uses (parmdef)) >> + { >> + cum->hidden_string_length = 1; >> + hidden_length++; >> + } >> + } >> + } >> + >> + cum->actual_parm_length = num_args - hidden_length; > > This code looks fine, but do we really need two new fields in rs6000_args? > Can't we just get along with only cum->actual_parm_length by modifying > the rs6000_function_arg() change from: > >> + else if (align_words < GP_ARG_NUM_REG >> + || (cum->hidden_string_length >> + && cum->actual_parm_length <= GP_ARG_NUM_REG)) > > to: > > + else if (align_words < GP_ARG_NUM_REG > + || cum->actual_parm_length <= GP_ARG_NUM_REG) > > ??? > Yes we can do that. I will address this in new version of the patch. > That said, I have a further comment below on what happens here when > align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG. > > If we exceed the align_words >= GP_ARG_NUM_REG then there could be hidden unused DECL paramter in align_words which is greater than 8 then it will return NULL_RTX. Hence in the above condition it should not return NULL_RTX. For the above condition it will not return NULL_RTX instead it will return a rtx reg. > > >> + /* When the buggy C/C++ wrappers call the function with fewer arguments >> + than it actually has and doesn't expect the parameter save area on the >> + caller side because of that while the callee expects it and the callee >> + actually stores something in the parameter save area, it corrupts >> + whatever is in the caller stack frame at that location. */ > > Same comment as before, so same problem with the comment, but the following > change... > >> - else if (align_words < GP_ARG_NUM_REG) >> + else if (align_words < GP_ARG_NUM_REG >> + || (cum->hidden_string_length >> + && cum->actual_parm_length <= GP_ARG_NUM_REG)) > { > if (TARGET_32BIT && TARGET_POWERPC64) > return rs6000_mixed_function_arg (mode, type, align_words); > > return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words); > } > else > return NULL_RTX; > > The old code for the unused hidden parameter (which was the 9th param) would > fall thru to the "return NULL_RTX;" which would make the callee assume there > was a parameter save area allocated. Now instead, we'll return a reg rtx, > probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now > see a copy of r11 into a pseudo like we do for the other param regs. > Is that a problem? Given it's an unused parameter, it'll probably get deleted > as dead code, but could it cause any issues? What if we have more than one > unused hidden parameter and we return r12 and r13 which have specific uses > in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead. > Have you verified what the callee RTL looks like after expand for these > unused hidden parameters? Is there a rtx we can return that isn't a NULL_RTX > which triggers the assumption of a parameter save area, but isn't a reg rtx > which might lead to some rtl being generated? Would a (const_int 0) or > something else work? > > For the above use case it will return (reg:DI 5 %r5) and below check entry_parm = (reg:DI 5 %r5) and the following check will not return TRUE and hence parameter save area will not be allocated. It will not generate any rtx in the callee rtl code but it just used to check whether to allocate parameter save area or not when number of args > 8. /* If there is no incoming register, we need a stack. */ entry_parm = rs6000_function_arg (args_so_far, arg); if (entry_parm == NULL) return true; /* Likewise if we need to pass both in registers and on the stack. */ if (GET_CODE (entry_parm) == PARALLEL && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX) return true; Thanks & Regards Ajit > >> + /* Actual parameter length ignoring hidden parameter. >> + This is done to C++ wrapper calling fortran procedures >> + which has hidden parameter that are not used. */ > > I think a simpler comment will suffice: > > /* Actual parameter count ignoring unused hidden parameters. */ > > > Peter > > >