public inbox for gcc-patches@gcc.gnu.org
From: Ajit Agarwal <aagarwa1@linux.ibm.com>
To: Peter Bergner <bergner@linux.ibm.com>,
	Jakub Jelinek <jakub@redhat.com>,
	"Kewen.Lin" <linkw@linux.ibm.com>,
	Segher Boessenkool <segher@kernel.crashing.org>,
	David Edelsohn <dje.gcc@gmail.com>,
	Michael Meissner <meissner@linux.ibm.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH v2] rs6000: Stackoverflow in optimized code on PPC [PR100799]
Date: Sat, 23 Mar 2024 19:58:22 +0530
Message-ID: <22397646-bf74-41b5-9ba4-019020ae0538@linux.ibm.com>
In-Reply-To: <76307976-77b0-48f0-90b9-6dcec02e3c8f@linux.ibm.com>

Hello Peter:

I have sent version 3 of the patch, which addresses the review comments below.

Thanks & Regards
Ajit

On 23/03/24 3:03 pm, Ajit Agarwal wrote:
> Hello Peter:
> 
> On 23/03/24 10:07 am, Peter Bergner wrote:
>> On 3/22/24 5:15 AM, Ajit Agarwal wrote:
>>> When using FlexiBLAS with OpenBLAS we noticed corruption of
>>> the parameters passed to OpenBLAS functions. FlexiBLAS
>>> basically provides a BLAS interface where each function
>>> is a stub that forwards the arguments to a real BLAS lib,
>>> like OpenBLAS.
>>>
>>> Fixes the corruption of caller frame checking number of
>>> arguments is less than equal to GP_ARG_NUM_REG (8)
>>> excluding hidden unused DECLS.
>>
>> I think the git log entry commentary could be a little more descriptive
>> of what the problem is. How about something like the following?
>>
>>   When using FlexiBLAS with OpenBLAS, we noticed corruption of the caller
>>   stack frame when calling OpenBLAS functions.  This was caused by the
>>   FlexiBLAS C/C++ caller and OpenBLAS Fortran callee disagreeing on the
>>   number of function parameters in the callee due to hidden Fortran
>>   parameters. This can cause problems when the callee believes the caller
>>   has allocated a parameter save area when the caller has not done so.
>>   That means any writes by the callee into the non-existent parameter save
>>   area will corrupt the caller stack frame.
>>
>>   The workaround implemented here is for the callee to determine whether
>>   the caller has allocated a parameter save area or not, by ignoring any
>>   unused hidden parameters when counting the number of parameters.
>>
>>
> I will address this change in the new version of the patch.
>>
>>> 	PR rtk-optimization/100799
>>
>> s/rtk/rtl/
>>
>>
>>
> I will address this in the new version of the patch.
>>> 	* config/rs6000/rs6000-calls.cc (rs6000_function_arg): Don't
>>> 	generate parameter save area if number of arguments passed
>>> 	less than equal to GP_ARG_NUM_REG (8) excluding hidden
>>> 	parameter.
>>
>> The callee doesn't generate or allocate the parameter save area, the
>> caller does.  The code here is for the callee trying to determine
>> whether the caller has done so.  How about saying the following instead?
>>
>>   Don't assume a parameter save area has been allocated if the number of
>>   formal parameters, excluding unused hidden parameters, is less than or
>>   equal to GP_ARG_NUM_REG (8).
>>
>>
> 
> I will incorporate this change in the new version of the patch.
> 
>>
>>
>>> 	(init_cumulative_args): Check for hidden parameter in fortran
>>> 	routine and set the flag hidden_string_length and actual
>>> 	parameter passed excluding hidden unused DECLS.
>>
>> Check for unused hidden Fortran parameters and set hidden_string_length
>> and actual_parm_length.
>>
>>
> 
> I will address this change in the new version of the patch.
> 
>  
>>> +  /* When the buggy C/C++ wrappers call the function with fewer arguments
>>> +     than it actually has and doesn't expect the parameter save area on the
>>> +     caller side because of that while the callee expects it and the callee
>>> +     actually stores something in the parameter save area, it corrupts
>>> +     whatever is in the caller stack frame at that location.  */
>>
>> The wrapper/caller is the one that allocates the parameter save area, so
>> saying "...doesn't expect the parameter save area on the caller side..."
>> doesn't make sense, since it knows whether it allocated it or not.
>> How about saying something like the following instead?
>>
>>   Check whether this function contains any unused hidden parameters and
>>   record how many there are for use in rs6000_function_arg() to determine
>>   whether its callers have allocated a parameter save area or not.
>>   See PR100799 for details.
>>
>>
> 
> I will incorporate this change in the new version of the patch.
> 
>>
>>> +  unsigned int num_args = 0;
>>> +  unsigned int hidden_length = 0;
>>> +
>>> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
>>> +       arg; arg = DECL_CHAIN (arg))
>>> +    {
>>> +      num_args++;
>>> +      if (DECL_HIDDEN_STRING_LENGTH (arg))
>>> +	{
>>> +	  tree parmdef = ssa_default_def (cfun, arg);
>>> +	  if (parmdef == NULL || has_zero_uses (parmdef))
>>> +	    {
>>> +	      cum->hidden_string_length = 1;
>>> +	      hidden_length++;
>>> +	    }
>>> +	}
>>> +   }
>>> +
>>> +  cum->actual_parm_length = num_args - hidden_length;
>>
>> This code looks fine, but do we really need two new fields in rs6000_args?
>> Can't we just get along with only cum->actual_parm_length by modifying
>> the rs6000_function_arg() change from:
>>
>>> +      else if (align_words < GP_ARG_NUM_REG
>>> +	       || (cum->hidden_string_length
>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>
>> to:
>>
>> +      else if (align_words < GP_ARG_NUM_REG
>> +	       || cum->actual_parm_length <= GP_ARG_NUM_REG)
>>
>> ???
>>
> 
> Yes, we can do that. I will address this in the new version of the patch.
> 
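The counting loop and the two forms of the condition discussed above can be modelled outside GCC. The following is only a minimal sketch: struct parm_model, struct cum_model, count_parms, v2_condition and suggested_condition are hypothetical names invented for illustration, not GCC code, and the "unused" flag stands in for the ssa_default_def / has_zero_uses test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GP_ARG_NUM_REG 8 /* Value quoted in the thread.  */

/* Hypothetical stand-in for a PARM_DECL; it keeps only the two
   properties the quoted loop inspects.  */
struct parm_model
{
  bool hidden_string_length; /* Models DECL_HIDDEN_STRING_LENGTH.  */
  bool unused;               /* Models "no default def, or zero uses".  */
};

/* Hypothetical stand-in for the two fields the patch adds.  */
struct cum_model
{
  bool hidden_string_length;
  unsigned int actual_parm_length;
};

/* Mirror the quoted loop: count every parameter, then subtract the
   hidden string lengths that are unused.  */
static struct cum_model
count_parms (const struct parm_model *parms, size_t n)
{
  struct cum_model cum = { false, 0 };
  unsigned int num_args = 0;
  unsigned int hidden_length = 0;

  for (size_t i = 0; i < n; i++)
    {
      num_args++;
      if (parms[i].hidden_string_length && parms[i].unused)
        {
          cum.hidden_string_length = true;
          hidden_length++;
        }
    }

  assert (hidden_length <= num_args);
  cum.actual_parm_length = num_args - hidden_length;
  return cum;
}

/* Condition as posted in v2 of the patch.  */
static bool
v2_condition (unsigned int align_words, struct cum_model cum)
{
  return align_words < GP_ARG_NUM_REG
         || (cum.hidden_string_length
             && cum.actual_parm_length <= GP_ARG_NUM_REG);
}

/* Condition as suggested in review, using only actual_parm_length.  */
static bool
suggested_condition (unsigned int align_words, struct cum_model cum)
{
  return align_words < GP_ARG_NUM_REG
         || cum.actual_parm_length <= GP_ARG_NUM_REG;
}
```

In this model the two conditions agree whenever an unused hidden parameter was found. They differ only when no unused hidden parameter exists, align_words >= GP_ARG_NUM_REG and actual_parm_length <= GP_ARG_NUM_REG. Whether that combination can actually reach rs6000_function_arg is not settled by this sketch.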
>  
>> That said, I have a further comment below on what happens here when 
>> align_words >= GP_ARG_NUM_REG and cum->actual_parm_length <= GP_ARG_NUM_REG.
>>
>>
> 
> When align_words >= GP_ARG_NUM_REG, the argument in question may be an
> unused hidden parameter. The old code returned NULL_RTX for it. With the
> condition above it no longer returns NULL_RTX; it returns a reg rtx instead.
>  
>>
>>
>>> +     /* When the buggy C/C++ wrappers call the function with fewer arguments
>>> +	than it actually has and doesn't expect the parameter save area on the
>>> +	caller side because of that while the callee expects it and the callee
>>> +	actually stores something in the parameter save area, it corrupts
>>> +	whatever is in the caller stack frame at that location.  */
>>
>> Same comment as before, so same problem with the comment, but the following
>> change...
>>
>>> -      else if (align_words < GP_ARG_NUM_REG)
>>> +      else if (align_words < GP_ARG_NUM_REG
>>> +	       || (cum->hidden_string_length
>>> +	       && cum->actual_parm_length <= GP_ARG_NUM_REG))
>>         {
>>           if (TARGET_32BIT && TARGET_POWERPC64)
>>             return rs6000_mixed_function_arg (mode, type, align_words);
>>
>>           return gen_rtx_REG (mode, GP_ARG_MIN_REG + align_words);
>>         }
>>       else
>>         return NULL_RTX;
>>
>> The old code for the unused hidden parameter (which was the 9th param) would
>> fall thru to the "return NULL_RTX;" which would make the callee assume there
>> was a parameter save area allocated.  Now instead, we'll return a reg rtx,
>> probably of r11 (r3 thru r10 are our param regs) and I'm guessing we'll now
>> see a copy of r11 into a pseudo like we do for the other param regs.
>> Is that a problem? Given it's an unused parameter, it'll probably get deleted
>> as dead code, but could it cause any issues?  What if we have more than one
>> unused hidden parameter and we return r12 and r13 which have specific uses
>> in our ABIs (eg, r13 is our TCB pointer), so it may not actually look dead.
>> Have you verified what the callee RTL looks like after expand for these
>> unused hidden parameters?  Is there a rtx we can return that isn't a NULL_RTX
>> which triggers the assumption of a parameter save area, but isn't a reg rtx
>> which might lead to some rtl being generated?  Would a (const_int 0) or
>> something else work?
>>
>>
> For the above use case, rs6000_function_arg returns
> 
> (reg:DI 5 %r5)
> 
> so entry_parm = (reg:DI 5 %r5). Neither of the checks below returns true,
> and hence the parameter save area will not be allocated.
> 
> The returned rtx does not generate any RTL in the callee; it is only used to
> check whether to allocate a parameter save area when the number of args > 8.
> 
> /* If there is no incoming register, we need a stack.  */
>   entry_parm = rs6000_function_arg (args_so_far, arg);
>   if (entry_parm == NULL)
>     return true;
> 
>   /* Likewise if we need to pass both in registers and on the stack.  */
>   if (GET_CODE (entry_parm) == PARALLEL
>       && XEXP (XVECEXP (entry_parm, 0, 0), 0) == NULL_RTX)
>     return true;
> 
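The interaction between the changed condition and the NULL check quoted above can be modelled with plain integers. This is a rough sketch under stated assumptions, not GCC code: old_arg_reg, v2_arg_reg, model_needs_stack and NO_REG are hypothetical names, NO_REG stands in for NULL_RTX, GP_ARG_MIN_REG is taken to be 3 because the thread says r3 thru r10 are the parameter registers, and the TARGET_32BIT && TARGET_POWERPC64 path and the PARALLEL check are ignored.

```c
#include <stdbool.h>

#define GP_ARG_MIN_REG 3 /* r3; the thread says r3 thru r10 carry params.  */
#define GP_ARG_NUM_REG 8 /* Value quoted in the thread.  */
#define NO_REG (-1)      /* Hypothetical marker standing in for NULL_RTX.  */

/* Register number chosen by the condition before the patch.  */
static int
old_arg_reg (unsigned int align_words)
{
  if (align_words < GP_ARG_NUM_REG)
    return GP_ARG_MIN_REG + (int) align_words;
  return NO_REG;
}

/* Register number chosen by the v2 condition quoted in the review.  */
static int
v2_arg_reg (unsigned int align_words, bool hidden_string_length,
            unsigned int actual_parm_length)
{
  if (align_words < GP_ARG_NUM_REG
      || (hidden_string_length && actual_parm_length <= GP_ARG_NUM_REG))
    return GP_ARG_MIN_REG + (int) align_words;
  return NO_REG;
}

/* Models only the first quoted check: no incoming register means the
   parameter needs the stack.  */
static bool
model_needs_stack (int entry_reg)
{
  return entry_reg == NO_REG;
}
```

Under this model, an unused hidden ninth parameter (align_words == 8, eight actual parameters) yields NO_REG with the old condition and register 11 with the v2 condition, so the stack check no longer fires. With align_words of 9 or 10 the same arithmetic gives 12 and 13, which is the case the review asks about.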
> Thanks & Regards
> Ajit
>>
>>> +  /* Actual parameter length ignoring hidden parameter.
>>> +     This is done to C++ wrapper calling fortran procedures
>>> +     which has hidden parameter that are not used.  */
>>
>> I think a simpler comment will suffice:
>>
>>   /* Actual parameter count ignoring unused hidden parameters.  */
>>
>>
>> Peter
>>
>>
>>
