From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by sourceware.org (Postfix) with ESMTPS id E4C1B3858D33 for ; Sat, 23 Mar 2024 14:28:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E4C1B3858D33 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linux.ibm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E4C1B3858D33 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711204114; cv=none; b=DsHbI43a7UNQvQVMtwGIqCM/+Ds7i0UEos2ok5yCNFPxECRIEw62dIVqdqynaBJU6yF08mnGlMmlp7K+jLTPgRCMO9H16q4SWxvqOleXISMxXImU3r5locApBH7nhADPpW1BF0AX/gbSVXxizJT/gKSH4h1gYQaMtADXVD6ahIc= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1711204114; c=relaxed/simple; bh=yfSgYyasFGEhMdwu7DeWiJGjUqEFsxSV1tjeDWpRFc8=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:From:To; b=wXgHhM3H/a+TWM5wseNCR/mOsoii4PgWZS5L250BqeHXDrOs/fxogeKxKEt0gcCLViF9pMSCDjch/CD29oTMZdRo06/zBrL2kTcUINcRsQ5e9WpbmF4TWw49pVe+rD3/4I7wpvG140eTf1G5e85D61TNQg0cm3GQ9rRez1lpcOY= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 42NCU8qq028839; Sat, 23 Mar 2024 14:28:32 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : from : to : references : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=sruphjsKZlBU5DlnENquZf5YO3rkSKSRkv/kO+woCM8=; b=oDAZZWnFqd9rByct1lEkRFZh2bDpbGvqsN7xgypjeNwwviGiQbD4vYsd7rCUZ7KBJitt 27Y/Y61Jwm0zaSrNBS8h7UpxGYhlIJz5XZ87n2beWQX5Otyn0LedFxIr9w2ieO4R9NEX ulWuQ8RL/XlZ4LHs3ohFGqx7iWen1Q7bICJGifjGRoOMv8DUz9xBXohU8XvYkBD2SmFo Tsi5kla2J6ds+JDO104YSgGvKpIwdhA3uh9cMyoL61S9GD5+9rO4IdHt6MjfDY7MB6jr HyOZ3OiVYKQaWYckHQE/9GKZrzuYwmUjyJw6STHbgPF358oghsvn4EqMLBRBH11VL1of gQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1wv80979-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 14:28:31 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 42NESUs4031349; Sat, 23 Mar 2024 14:28:30 GMT Received: from ppma21.wdc07v.mail.ibm.com (5b.69.3da9.ip4.static.sl-reverse.com [169.61.105.91]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3x1wv80976-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 14:28:30 +0000 Received: from pps.filterd (ppma21.wdc07v.mail.ibm.com [127.0.0.1]) by ppma21.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 42NCq5VG023091; Sat, 23 Mar 2024 14:28:30 GMT Received: from smtprelay05.dal12v.mail.ibm.com ([172.16.1.7]) by ppma21.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3x0x14vknh-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Sat, 23 Mar 2024 14:28:30 +0000 Received: from smtpav01.wdc07v.mail.ibm.com (smtpav01.wdc07v.mail.ibm.com [10.39.53.228]) by smtprelay05.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 42NESQfr25166552 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Sat, 23 Mar 2024 14:28:29 GMT Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 983E358065; Sat, 23 Mar 2024 14:28:26 +0000 (GMT) Received: from smtpav01.wdc07v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 81E0E58063; Sat, 23 Mar 2024 14:28:23 +0000 (GMT) Received: from [9.43.18.241] (unknown [9.43.18.241]) by smtpav01.wdc07v.mail.ibm.com (Postfix) with ESMTP; Sat, 23 Mar 2024 14:28:23 +0000 (GMT) Message-ID: <22397646-bf74-41b5-9ba4-019020ae0538@linux.ibm.com> Date: Sat, 23 Mar 2024 19:58:22 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799] Content-Language: en-US From: Ajit Agarwal To: Peter Bergner , Jakub Jelinek , "Kewen.Lin" , Segher Boessenkool , David Edelsohn , Michael Meissner , gcc-patches References: <8e8dad73-43a6-4764-a496-b600e6a220e1@linux.ibm.com> <76307976-77b0-48f0-90b9-6dcec02e3c8f@linux.ibm.com> In-Reply-To: <76307976-77b0-48f0-90b9-6dcec02e3c8f@linux.ibm.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: SeX8fFlEyMqkKrc9okYHHrjsbLhyEY_I X-Proofpoint-ORIG-GUID: nDlBELMNAjpPGcVmjab63yxjoOZ1C-uf X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-03-23_10,2024-03-21_02,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 adultscore=0 suspectscore=0 phishscore=0 mlxlogscore=999 priorityscore=1501 spamscore=0 impostorscore=0 mlxscore=0 clxscore=1015 malwarescore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2403210000 definitions=main-2403230098 X-Spam-Status: No, score=-5.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,KAM_MANYTO,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hello Peter: Sent version-3 of the patch addressing below review comments. Thanks & Regards Ajit On 23/03/24 3:03 pm, Ajit Agarwal wrote: > Hello Peter: > > On 23/03/24 10:07 am, Peter Bergner wrote: >> On 3/22/24 5:15 AM, Ajit Agarwal wrote: >>> When using FlexiBLAS with OpenBLAS we noticed corruption of >>> the parameters passed to OpenBLAS functions. FlexiBLAS >>> basically provides a BLAS interface where each function >>> is a stub that forwards the arguments to a real BLAS lib, >>> like OpenBLAS. >>> >>> Fixes the corruption of caller frame checking number of >>> arguments is less than equal to GP_ARG_NUM_REG (8) >>> excluding hidden unused DECLS. >> >> I think the git log entry commentary could be a little more descriptive >> of what the problem is. How about something like the following? >> >> When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller >> stack frame when calling OpenBLAS functions. This was caused by the >> FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the >> number of function parameters in the callee due to hidden Fortran >> parameters. This can cause problems when the callee believes the caller >> has allocated a parameter save area when the caller has not done so. >> That means any writes by the callee into the non-existent parameter save >> area will corrupt the caller stack frame. >> >> The workaround implemented here, is for the callee to determine whether >> the caller has allocated a parameter save area or not, by ignoring any >> unused hidden parameters when counting the number of parameters. >> >> > I will address this change in the new version of the patch. >> >>> PR rtk-optimization/100799 >> >> s/rtk/rtl/ >> >> >> > I will address this in new version of the patch. >>> * config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't >>> generate parameter save area if number of arguments passed >>> less than equal to GP_ARG_NUM_REG (8) excluding hidden >>> parameter. >> >> The callee doesn't generate or allocate the parameter save area, the >> caller does. The code here is for the callee trying to determine >> whether the caller has done so. How about saying the following instead? >> >> Don't assume a parameter save area has been allocated if the number of >> formal parameters, excluding unused hidden parameters, is less than or >> equal to GP_ARG_NUM_REG (8). >> >> > > I will incorporate this change in new version of the patch. > >> >> >>> (init_cumulative_args): Check for hidden parameter in fortran >>> routine and set the flag hidden_string_length and actual >>> parameter passed excluding hidden unused DECLS. >> >> Check for unused hidden Fortran parameters and set hidden_string_length >> and actual_parm_length. >> >> > > I will address this change in new version of the patch. > > >>> + /* When the buggy C/C++ wrappers call the function with fewer arguments >>> + than it actually has and doesn't expect the parameter save area on the >>> + caller side because of that while the callee expects it and the callee >>> + actually stores something in the parameter save area, it corrupts >>> + whatever is in the caller stack frame at that location. */ >> >> The wrapper/caller is the one that allocates the parameter save area, so >> saying "...doesn't expect the parameter save area on the caller side..." >> doesn't make sense, since it knows whether it allocated it or not. >> How about saying something like the following instead? >> >> Check whether this function contains any unused hidden parameters and >> record how many there are for use in rs6000_function_arg() to determine >> whether its callers have allocated a parameter save area or not. >> See PR100799 for details. >> >> > > I will incorporate this change in new version of the patch. > >> >>> + unsigned int num_args = 0; >>> + unsigned int hidden_length = 0; >>> + >>> + for (tree arg = DECL_ARGUMENTS (current_function_decl); >>> + arg; arg = DECL_CHAIN (arg)) >>> + { >>> + num_args++; >>> + if (DECL_HIDDEN_STRING_LENGTH (arg)) >>> + { >>> + tree parmdef = ssa_default_def (cfun, arg); >>> + if (parmdef == NULL || has_zero_uses (parmdef)) >>> + { >>> + cum->hidden_string_length = 1; >>> + hidden_length++; >>> + } >>> + } >>> + } >>> + >>> + cum->actual_parm_length = num_args - hidden_length; >> >> This code looks fine, but do we really need two new fields in rs6000_args? >> Can't we just get along with only cum->actual_parm_length by modifying >> the rs6000_function_arg() change from: >> >>> + else if (align_words < GP_ARG_NUM_REG >>> + || (cum->hidden_string_length >>> + && cum->actual_parm_length <= GP_ARG_NUM_REG)) >> >> to: >> >> + else if (align_words < GP_ARG_NUM_REG >> + || cum->actual_parm_length <= GP_ARG_NUM_REG) >> >> ??? >> > > Yes we can do that. I will address this in new version of the patch. > > >> That said, I have a further comment below on what happens here when >> align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG. >> >> > > If we exceed the align_words >= GP_ARG_NUM_REG then there could be hidden unused DECL paramter in align_words which is greater than 8 then it will return NULL_RTX. Hence in the above condition it should not return NULL_RTX. For the above condition it will not return NULL_RTX instead it will return a rtx reg. > >> >> >>> + /* When the buggy C/C++ wrappers call the function with fewer arguments >>> + than it actually has and doesn't expect the parameter save area on the >>> + caller side because of that while the callee expects it and the callee >>> + actually stores something in the parameter save area, it corrupts >>> + whatever is in the caller stack frame at that location. */ >> >> Same comment as before, so same problem with the comment, but the following >> change... >> >>> - else if (align_words < GP_ARG_NUM_REG) >>> + else if (align_words < GP_ARG_NUM_REG >>> + || (cum->hidden_string_length >>> + && cum->actual_parm_length <= GP_ARG_NUM_REG)) >> { >> if (TARGET_32BIT && TARGET_POWERPC64) >> return rs6000_mixed_function_arg (mode, type, align_words); >> >> return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words); >> } >> else >> return NULL_RTX; >> >> The old code for the unused hidden parameter (which was the 9th param) would >> fall thru to the "return NULL_RTX;" which would make the callee assume there >> was a parameter save area allocated. Now instead, we'll return a reg rtx, >> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now >> see a copy of r11 into a pseudo like we do for the other param regs. >> Is that a problem? Given it's an unused parameter, it'll probably get deleted >> as dead code, but could it cause any issues? What if we have more than one >> unused hidden parameter and we return r12 and r13 which have specific uses >> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead. >> Have you verified what the callee RTL looks like after expand for these >> unused hidden parameters? Is there a rtx we can return that isn't a NULL_RTX >> which triggers the assumption of a parameter save area, but isn't a reg rtx >> which might lead to some rtl being generated? Would a (const_int 0) or >> something else work? >> >> > For the above use case it will return > > (reg:DI 5 %r5) and below check entry_parm = > (reg:DI 5 %r5) and the following check will not return TRUE and hence parameter save area will not be allocated. > > It will not generate any rtx in the callee rtl code but it just used to check whether to allocate parameter save area or not when number of args > 8. > > /* If there is no incoming register, we need a stack. */ > entry_parm = rs6000_function_arg (args_so_far, arg); > if (entry_parm == NULL) > return true; > > /* Likewise if we need to pass both in registers and on the stack. */ > if (GET_CODE (entry_parm) == PARALLEL > && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX) > return true; > > Thanks & Regards > Ajit >> >>> + /* Actual parameter length ignoring hidden parameter. >>> + This is done to C++ wrapper calling fortran procedures >>> + which has hidden parameter that are not used. */ >> >> I think a simpler comment will suffice: >> >> /* Actual parameter count ignoring unused hidden parameters. */ >> >> >> Peter >> >> >>