From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by sourceware.org (Postfix) with ESMTPS id D6F6B3858D1E for ; Sat, 23 Mar 2024 04:38:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D6F6B3858D1E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D6F6B3858D1E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711168722; cv=none; b=K4n9sMlU5GDbEwaX9p63O7ARN1FW+3i3ItYOStA7Rg0GNfyDAjhc6ACZ0d5+fX468d81OQKEBCi7+PYOCa9PI52q8Myv5jumhWRiUciqQUom3egYP9L6ACvDRm6qPYO2nQWAWmQdN/rOVz0hTnz5YQWIxtnXXkfp/3DSYfDgrP4= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711168722; c=relaxed/simple; bh=2ZMx1yRrSu3ZWqrvEUs+FnL0HfF725DcyVbXOL4QvuQ=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=fXPEEzCXlNkW/ijOyPzT2VuWnfDYnaXpmOaZCUXKVKTOZNT8r+lOOmlLo1xb8mHjj2qYqZkn8lg6d7kCk8Ea7A/moYAnHql2xaY1XKVLAUTnkVHQ7sybWAXhsS+No1mAlkNDHnTzF8nam4hczD/Sth0aTGRIAOf9yddEMuLj7H0= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42N4cd3Q031214; Sat, 23 Mar 2024 04:38:39 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=N/XU68ZCX7UTle1gtn7r1TlFDmZZaWAf+ShQdlLDE1E=; b=GwCLWW1cONMp37DN7HsY4hdcjuSoeB1GPP8o7O90c5waLNhgmrd/CRQz2SZ3BY19EH6B Q6QJtwJt1tWYf6F4MNMVGwxbFDXsT3bwZFs0lRCsfNnC+xpIgDW8/nmbwHTpc36Ma7CM pfVI016dGsPOT1Lo7ILL5t6GusViQ/b/5bnTd32eKvG0FhFuYJOfJQjew8ZvQSGLmCRB 6GU5kaV9EoCXVRo7FkJKm5diAKzewJPNm2HDdfosqTN25vjkwo1Y11pDmu2JGcfUkAW8 rmzGKb67obUv23ZPr1nIiPAIGiSeSCqMjcVgEVMbALyx1VgsHDk6jKuU1f/eKSgVPLH4 +A== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1ptdg4sc-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 04:38:39 +0000 Received: from m0356517.ppops.net (m0356517.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42N4ccvJ031177; Sat, 23 Mar 2024 04:38:38 GMT Received: from ppma23.wdc07v.mail.ibm.com (5d.69.3da9.ip4.static.sl-reverse.com [169.61.105.93]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1ptdg4s4-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 04:38:38 +0000 Received: from pps.filterd (ppma23.wdc07v.mail.ibm.com [127.0.0.1]) by ppma23.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42N2x2BB009160; Sat, 23 Mar 2024 04:37:19 GMT Received: from smtprelay05.wdc07v.mail.ibm.com ([172.16.1.72]) by ppma23.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3x0x15j40h-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 04:37:19 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay05.wdc07v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42N4bGuO19858070 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 23 Mar 2024 04:37:18 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 467CE5805B; Sat, 23 Mar 2024 04:37:16 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7411558066; Sat, 23 Mar 2024 04:37:15 +0000 (GMT) Received: from [9.61.153.201] (unknown [9.61.153.201]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP; Sat, 23 Mar 2024 04:37:15 +0000 (GMT) Message-ID: Date: Fri, 22 Mar 2024 23:37:14 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799] Content-Language: en-US To: Ajit Agarwal , Jakub Jelinek , "Kewen.Lin" , Segher Boessenkool , David Edelsohn , Michael Meissner , gcc-patches References: <8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com> From: Peter Bergner In-Reply-To: <8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: jOnKRxutvgKYVfhEroXHMQWDcRZ_BdKY X-Proofpoint-GUID: 8tti_Vp1pRHCr8mU6FFY38IDHzrrjymD X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-23_01,2024-03-21_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 impostorscore=0 phishscore=0 mlxscore=0 mlxlogscore=999 lowpriorityscore=0 malwarescore=0 suspectscore=0 clxscore=1015 spamscore=0 adultscore=0 bulkscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2403210000 definitions=main-2403230030 X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_MANYTO,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 3/22/24 5:15 AM, Ajit Agarwal wrote: > When using FlexiBLAS with OpenBLAS we noticed corruption of > the parameters passed to OpenBLAS functions. FlexiBLAS > basically provides a BLAS interface where each function > is a stub that forwards the arguments to a real BLAS lib, > like OpenBLAS. > > Fixes the corruption of caller frame checking number of > arguments is less than equal to GP_ARG_NUM_REG (8) > excluding hidden unused DECLS. I think the git log entry commentary could be a little more descriptive of what the problem is. How about something like the following? When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller stack frame when calling OpenBLAS functions. This was caused by the FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the number of function parameters in the callee due to hidden Fortran parameters. This can cause problems when the callee believes the caller has allocated a parameter save area when the caller has not done so. That means any writes by the callee into the non-existent parameter save area will corrupt the caller stack frame. The workaround implemented here, is for the callee to determine whether the caller has allocated a parameter save area or not, by ignoring any unused hidden parameters when counting the number of parameters. > PR rtk-optimization/100799 s/rtk/rtl/ > * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't > generate parameter save area if number of arguments passed > less than equal to GP_ARG_NUM_REG (8) excluding hidden > parameter. The callee doesn't generate or allocate the parameter save area, the caller does. The code here is for the callee trying to determine whether the caller has done so. How about saying the following instead? Don't assume a parameter save area has been allocated if the number of formal parameters, excluding unused hidden parameters, is less than or equal to GP_ARG_NUM_REG (8). > (init_cumulative_args): Check for hidden parameter in fortran > routine and set the flag hidden_string_length and actual > parameter passed excluding hidden unused DECLS. Check for unused hidden Fortran parameters and set hidden_string_length and actual_parm_length. > + /* When the buggy C/C++ wrappers call the function with fewer arguments > + than it actually has and doesn't expect the parameter save area on the > + caller side because of that while the callee expects it and the callee > + actually stores something in the parameter save area, it corrupts > + whatever is in the caller stack frame at that location. */ The wrapper/caller is the one that allocates the parameter save area, so saying "...doesn't expect the parameter save area on the caller side..." doesn't make sense, since it knows whether it allocated it or not. How about saying something like the following instead? Check whether this function contains any unused hidden parameters and record how many there are for use in rs6000_function_arg() to determine whether its callers have allocated a parameter save area or not. See PR100799 for details. > + unsigned int num_args = 0; > + unsigned int hidden_length = 0; > + > + for (tree arg = DECL_ARGUMENTS (current_function_decl); > + arg; arg = DECL_CHAIN (arg)) > + { > + num_args++; > + if (DECL_HIDDEN_STRING_LENGTH (arg)) > + { > + tree parmdef = ssa_default_def (cfun, arg); > + if (parmdef == NULL || has_zero_uses (parmdef)) > + { > + cum->hidden_string_length = 1; > + hidden_length++; > + } > + } > + } > + > + cum->actual_parm_length = num_args - hidden_length; This code looks fine, but do we really need two new fields in rs6000_args? Can't we just get along with only cum->actual_parm_length by modifying the rs6000_function_arg() change from: > + else if (align_words < GP_ARG_NUM_REG > + || (cum->hidden_string_length > + && cum->actual_parm_length <= GP_ARG_NUM_REG)) to: + else if (align_words < GP_ARG_NUM_REG + || cum->actual_parm_length <= GP_ARG_NUM_REG) ??? That said, I have a further comment below on what happens here when align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG. > + /* When the buggy C/C++ wrappers call the function with fewer arguments > + than it actually has and doesn't expect the parameter save area on the > + caller side because of that while the callee expects it and the callee > + actually stores something in the parameter save area, it corrupts > + whatever is in the caller stack frame at that location. */ Same comment as before, so same problem with the comment, but the following change... > - else if (align_words < GP_ARG_NUM_REG) > + else if (align_words < GP_ARG_NUM_REG > + || (cum->hidden_string_length > + && cum->actual_parm_length <= GP_ARG_NUM_REG)) { if (TARGET_32BIT && TARGET_POWERPC64) return rs6000_mixed_function_arg (mode, type, align_words); return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words); } else return NULL_RTX; The old code for the unused hidden parameter (which was the 9th param) would fall thru to the "return NULL_RTX;" which would make the callee assume there was a parameter save area allocated. Now instead, we'll return a reg rtx, probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now see a copy of r11 into a pseudo like we do for the other param regs. Is that a problem? Given it's an unused parameter, it'll probably get deleted as dead code, but could it cause any issues? What if we have more than one unused hidden parameter and we return r12 and r13 which have specific uses in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead. Have you verified what the callee RTL looks like after expand for these unused hidden parameters? Is there a rtx we can return that isn't a NULL_RTX which triggers the assumption of a parameter save area, but isn't a reg rtx which might lead to some rtl being generated? Would a (const_int 0) or something else work? > + /* Actual parameter length ignoring hidden parameter. > + This is done to C++ wrapper calling fortran procedures > + which has hidden parameter that are not used. */ I think a simpler comment will suffice: /* Actual parameter count ignoring unused hidden parameters. */ Peter