From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from loongson.cn (mail.loongson.cn [114.242.206.163]) by sourceware.org (Postfix) with ESMTP id DE9DB3858C5E for ; Tue, 18 Apr 2023 11:21:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DE9DB3858C5E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=loongson.cn Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=loongson.cn Received: from loongson.cn (unknown [10.20.4.52]) by gateway (Coremail) with SMTP id _____8Cxidk1fT5ktmkeAA--.35897S3; Tue, 18 Apr 2023 19:21:25 +0800 (CST) Received: from [10.20.4.52] (unknown [10.20.4.52]) by localhost.localdomain (Coremail) with SMTP id AQAAf8Bxqr0zfT5kjdssAA--.51373S2; Tue, 18 Apr 2023 19:21:23 +0800 (CST) Subject: Re: [PATCH] LoongArch: Improve GAR store for va_list To: Xi Ruoyao , gcc-patches@gcc.gnu.org Cc: WANG Xuerui , Chenghua Xu References: <20230328180139.74395-1-xry111@xry111.site> <7ac6f461-7fcc-6df9-8089-8728c4211b31@loongson.cn> <48461b85f62ba02b334aaffb34936beb7a874d6e.camel@xry111.site> From: Lulu Cheng Message-ID: Date: Tue, 18 Apr 2023 19:21:23 +0800 User-Agent: Mozilla/5.0 (X11; Linux mips64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0 MIME-Version: 1.0 In-Reply-To: <48461b85f62ba02b334aaffb34936beb7a874d6e.camel@xry111.site> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-CM-TRANSID:AQAAf8Bxqr0zfT5kjdssAA--.51373S2 X-CM-SenderInfo: xfkh0wpoxo3qxorr0wxvrqhubq/ X-Coremail-Antispam: 1Uk129KBjvJXoW3Xry7tw4fKw1rZw47Jr1rXrb_yoW7Gw18p3 ykAF1YkFW8JFs7Gr1jgw15Jr9ayw47J3W7uFn2gFyxZrsFyry2gF40gr909FykJw48Wr1I qr45W343uF1Yy3DanT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUj1kv1TuYvTs0mT0YCTnIWj qI5I8CrVACY4xI64kE6c02F40Ex7xfYxn0WfASr-VFAUDa7-sFnT9fnUUIcSsGvfJTRUUU bIxYFVCjjxCrM7AC8VAFwI0_Jr0_Gr1l1xkIjI8I6I8E6xAIw20EY4v20xvaj40_Wr0E3s 1l1IIY67AEw4v_JrI_Jryl8cAvFVAK0II2c7xJM28CjxkF64kEwVA0rcxSw2x7M28EF7xv wVC0I7IYx2IY67AKxVWUCVW8JwA2z4x0Y4vE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwA2z4 x0Y4vEx4A2jsIE14v26r4UJVWxJr1l84ACjcxK6I8E87Iv6xkF7I0E14v26r4UJVWxJr1l e2I262IYc4CY6c8Ij28IcVAaY2xG8wAqjxCEc2xF0cIa020Ex4CE44I27wAqx4xG64xvF2 IEw4CE5I8CrVC2j2WlYx0E2Ix0cI8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r4j6F4U McvjeVCFs4IE7xkEbVWUJVW8JwACjcxG0xvEwIxGrwCYjI0SjxkI62AI1cAE67vIY487Mx AIw28IcxkI7VAKI48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_ Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwI xGrwCI42IY6xIIjxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8 JwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcV C2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU8PCzJUUUUU== X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00,BODY_8BITS,GIT_PATCH_0,KAM_DMARC_STATUS,KAM_SHORT,NICE_REPLY_A,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: 在 2023/4/18 下午5:27, Xi Ruoyao 写道: > On Mon, 2023-04-10 at 17:45 +0800, Lulu Cheng wrote: >> Sorry, it's my question. I still have some questions that I haven't >> understood, so I haven't replied to the email yet.:-( > I've verified the value of cfun->va_list_gpr_size with -fdump-tree- > stdarg and various testcases (including extracting aggregates and > floating-point values in the va list) and the result seems correct. And > gcc/testsuite/gcc.c-torture/execute/va-arg-*.c should provide a good > enough test coverage. > > Is there still something seemly problematic? I think there is no problem with the code modification, but I found that the $r12 register is stored whether or not this patch is added. I don't understand why.:-( > >> 在 2023/4/10 下午5:04, Xi Ruoyao 写道: >>> Ping.  Or maybe I've lost some replies here because my mail server >>> crashed several days ago :). >>> >>> On Wed, 2023-03-29 at 02:01 +0800, Xi Ruoyao wrote: >>>> LoongArch backend used to save all GARs for a function with >>>> variable >>>> arguments.  But sometimes a function only accepts variable >>>> arguments >>>> for >>>> a purpose like C++ function overloading.  For example, POSIX >>>> defines >>>> open() as: >>>> >>>>      int open(const char *path, int oflag, ...); >>>> >>>> But only two forms are actually used: >>>> >>>>      int open(const char *pathname, int flags); >>>>      int open(const char *pathname, int flags, mode_t mode); >>>> >>>> So it's obviously a waste to save all 8 GARs in open().  We can >>>> use >>>> the >>>> cfun->va_list_gpr_size field set by the stdarg pass to only save >>>> the >>>> GARs necessary to be saved. >>>> >>>> If the va_list escapes (for example, in fprintf() we pass it to >>>> vfprintf()), stdarg would set cfun->va_list_gpr_size to 255 so we >>>> don't need a special case. >>>> >>>> With this patch, only one GAR ($a2/$r6) is saved in open(). >>>> Ideally >>>> even this stack store should be omitted too, but doing so is not >>>> trivial >>>> and AFAIK there are no compilers (for any target) performing the >>>> "ideal" >>>> optimization here, see https://godbolt.org/z/n1YqWq9c9. >>>> >>>> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk >>>> (GCC 14 or now)? >>>> >>>> gcc/ChangeLog: >>>> >>>>          * config/loongarch/loongarch.cc >>>>          (loongarch_setup_incoming_varargs): Don't save more GARs >>>> than >>>>          cfun->va_list_gpr_size / UNITS_PER_WORD. >>>> >>>> gcc/testsuite/ChangeLog: >>>> >>>>          * gcc.target/loongarch/va_arg.c: New test. >>>> --- >>>>   gcc/config/loongarch/loongarch.cc           |  4 +++- >>>>   gcc/testsuite/gcc.target/loongarch/va_arg.c | 24 >>>> +++++++++++++++++++++ >>>>   2 files changed, 27 insertions(+), 1 deletion(-) >>>>   create mode 100644 gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> >>>> diff --git a/gcc/config/loongarch/loongarch.cc >>>> b/gcc/config/loongarch/loongarch.cc >>>> index 6927bdc7fe5..0ecb91ca997 100644 >>>> --- a/gcc/config/loongarch/loongarch.cc >>>> +++ b/gcc/config/loongarch/loongarch.cc >>>> @@ -764,7 +764,9 @@ loongarch_setup_incoming_varargs >>>> (cumulative_args_t cum, >>>>       loongarch_function_arg_advance (pack_cumulative_args >>>> (&local_cum), arg); >>>> >>>>     /* Found out how many registers we need to save.  */ >>>> -  gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >>>> +  gp_saved = cfun->va_list_gpr_size / UNITS_PER_WORD; >>>> +  if (gp_saved > (int) (MAX_ARGS_IN_REGISTERS - >>>> local_cum.num_gprs)) >>>> +    gp_saved = MAX_ARGS_IN_REGISTERS - local_cum.num_gprs; >>>> >>>>     if (!no_rtl && gp_saved > 0) >>>>       { >>>> diff --git a/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> b/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> new file mode 100644 >>>> index 00000000000..980c96d0e3d >>>> --- /dev/null >>>> +++ b/gcc/testsuite/gcc.target/loongarch/va_arg.c >>>> @@ -0,0 +1,24 @@ >>>> +/* { dg-do compile } */ >>>> +/* { dg-options "-O2" } */ >>>> + >>>> +/* Technically we shouldn't save any register for this function: >>>> it >>>> should be >>>> +   compiled as if it accepts 3 named arguments.  But AFAIK no >>>> compilers can >>>> +   achieve this "perfect" optimization now, so just ensure we are >>>> using the >>>> +   knowledge provided by stdarg pass and we won't save GARs >>>> impossible to be >>>> +   accessed with __builtin_va_arg () when the va_list does not >>>> escape.  */ >>>> + >>>> +/* { dg-final { scan-assembler-not "st.*r7" } } */ >>>> + >>>> +int >>>> +test (int a0, ...) >>>> +{ >>>> +  void *arg; >>>> +  int a1, a2; >>>> + >>>> +  __builtin_va_start (arg, a0); >>>> +  a1 = __builtin_va_arg (arg, int); >>>> +  a2 = __builtin_va_arg (arg, int); >>>> +  __builtin_va_end (arg); >>>> + >>>> +  return a0 + a1 + a2; >>>> +}