From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 30 Nov 2022 14:50:33 -0800 (PST)
X-Google-Original-Date: Wed, 30 Nov 2022 14:50:24 PST (-0800)
Subject:     Re: [PATCH] RISC-V: optimize stack manipulation in save-restore
In-Reply-To: <20221130083717.14438-1-gaofei@eswincomputing.com>
CC: gcc-patches@gcc.gnu.org, jeffreyalaw@gmail.com,
  Kito Cheng <kito.cheng@gmail.com>, gaofei@eswincomputing.com
From: Palmer Dabbelt <palmer@dabbelt.com>
To: gaofei@eswincomputing.com
Message-ID: <mhng-0c8efbb7-a300-4c78-969a-c66f4a857042@palmer-ri-x1c9a>
Mime-Version: 1.0 (MHng)
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, 30 Nov 2022 00:37:17 PST (-0800), gaofei@eswincomputing.com wrote:
> The stack space that save-restore reserves is not well accounted for in stack allocation and deallocation.
> This patch allows fewer instructions to be used for stack allocation and deallocation when save-restore is enabled,
> and also gives much clearer logic for save-restore stack manipulation.
>
> before patch:
> 	bar:
> 		call	t0,__riscv_save_4
> 		addi	sp,sp,-64
> 		...
> 		li	t0,-12288
> 		addi	t0,t0,-1968 # optimized out after patch
> 		add	sp,sp,t0 # prologue
> 		...
> 		li	t0,12288 # epilogue
> 		addi	t0,t0,2000 # optimized out after patch
> 		add	sp,sp,t0
> 		...
> 		addi	sp,sp,32
> 		tail	__riscv_restore_4
>
> after patch:
> 	bar:
> 		call	t0,__riscv_save_4
> 		addi	sp,sp,-2032
> 		...
> 		li	t0,-12288
> 		add	sp,sp,t0 # prologue
> 		...
> 		li	t0,12288 # epilogue
> 		add	sp,sp,t0
> 		...
> 		addi	sp,sp,2032
> 		tail	__riscv_restore_4
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv.cc (riscv_first_stack_step): add a new function parameter remaining_size.
>         (riscv_compute_frame_info): adapt new riscv_first_stack_step interface.
>         (riscv_expand_prologue): consider save-restore in stack allocation.
>         (riscv_expand_epilogue): consider save-restore in stack deallocation.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/stack_save_restore.c: New test.
> ---
>  gcc/config/riscv/riscv.cc                     | 58 ++++++++++---------
>  .../gcc.target/riscv/stack_save_restore.c     | 40 +++++++++++++
>  2 files changed, 70 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c
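
As a sanity check on the two listings above, the sp adjustments should
add up to the same frame size before and after the patch.  A minimal
sketch, assuming the adjustment done inside __riscv_save_4 and
__riscv_restore_4 is the same in both versions and can be left out; the
constant and function names are only for illustration:

```cpp
#include <cstdint>

// Bytes moved by each explicit sp adjustment, read off the quoted assembly.
// Assumption: the libcall's own adjustment is identical in both versions,
// so it is not counted here.
constexpr int64_t before_prologue = 64 + (12288 + 1968);
constexpr int64_t before_epilogue = (12288 + 2000) + 32;
constexpr int64_t after_prologue  = 2032 + 12288;
constexpr int64_t after_epilogue  = 12288 + 2032;

static_assert(before_prologue == before_epilogue, "old prologue/epilogue disagree");
static_assert(after_prologue == after_epilogue, "new prologue/epilogue disagree");
static_assert(before_prologue == after_prologue, "patch changed the frame size");

// A constant whose low 12 bits are zero can be loaded with a single LUI,
// which is why the ADDI on t0 disappears after the patch.
constexpr bool low_12_bits_clear(int64_t x) { return x % 4096 == 0; }

static_assert(low_12_bits_clear(12288), "12288 should need only LUI");
static_assert(!low_12_bits_clear(12288 + 1968), "14256 needs LUI + ADDI");
static_assert(!low_12_bits_clear(12288 + 2000), "14288 needs LUI + ADDI");
```

All four sequences move sp by 14320 bytes; the patch only changes how
that total is split between the ADDI and the LUI-based add.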

I guess with the RISC-V backend still being open for things as big as 
the V port we should probably be taking code like this as well?  I 
wouldn't be opposed to making an exception for the V code and holding 
everything else back, though.

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 05bdba5ab4d..9e92e729a5f 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>     They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>     hard_frame_pointer_rtx unchanged.  */
>
> -static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame);
> +static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size);
>
>  /* Handle stack align for poly_int.  */
>  static poly_int64
> @@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>       save/restore t0.  We check for this before clearing the frame struct.  */
>    if (cfun->machine->interrupt_handler_p)
>      {
> -      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
> +      HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>        if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
>  	interrupt_save_prologue_temp = true;
>      }
> @@ -4913,31 +4913,31 @@ riscv_restore_reg (rtx reg, rtx mem)
>     without adding extra instructions.  */
>
>  static HOST_WIDE_INT
> -riscv_first_stack_step (struct riscv_frame_info *frame)
> +riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size)
>  {
> -  HOST_WIDE_INT frame_total_constant_size;
> -  if (!frame->total_size.is_constant ())
> -    frame_total_constant_size
> -      = riscv_stack_align (frame->total_size.coeffs[0])
> -	- riscv_stack_align (frame->total_size.coeffs[1]);
> +  HOST_WIDE_INT remaining_const_size;
> +  if (!remaining_size.is_constant ())
> +    remaining_const_size
> +      = riscv_stack_align (remaining_size.coeffs[0])
> +	- riscv_stack_align (remaining_size.coeffs[1]);

The alignment looks off here, at least in the email.  Worth fixing it up 
if you're touching the lines anyway.

>    else
> -    frame_total_constant_size = frame->total_size.to_constant ();
> +    remaining_const_size = remaining_size.to_constant ();
>
> -  if (SMALL_OPERAND (frame_total_constant_size))
> -    return frame_total_constant_size;
> +  if (SMALL_OPERAND (remaining_const_size))
> +    return remaining_const_size;
>
>    HOST_WIDE_INT min_first_step =
> -    RISCV_STACK_ALIGN ((frame->total_size - frame->frame_pointer_offset).to_constant());
> +    RISCV_STACK_ALIGN ((remaining_size - frame->frame_pointer_offset).to_constant());
>    HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
> -  HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
> +  HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>    gcc_assert (min_first_step <= max_first_step);
>
>    /* As an optimization, use the least-significant bits of the total frame
>       size, so that the second adjustment step is just LUI + ADD.  */
>    if (!SMALL_OPERAND (min_second_step)
> -      && frame_total_constant_size % IMM_REACH < IMM_REACH / 2
> -      && frame_total_constant_size % IMM_REACH >= min_first_step)
> -    return frame_total_constant_size % IMM_REACH;
> +      && remaining_const_size % IMM_REACH < IMM_REACH / 2
> +      && remaining_const_size % IMM_REACH >= min_first_step)
> +    return remaining_const_size % IMM_REACH;

Looks like this entire frame->total_size -> remaining_size conversion 
could be done as an independent patch that would change no 
functionality.  That's always a nice way to do things, as it makes the 
code easier to read.
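
For reference, here is roughly how I read the part of
riscv_first_stack_step that is visible in this hunk, as a standalone
sketch.  Assumptions: IMM_REACH is 4096, the stack alignment is 16
bytes, min_first_step is passed in directly instead of being derived
from frame_pointer_offset, the TARGET_RVC handling is left out, and the
final fallback to max_first_step is an assumption since the hunk does
not show the end of the function.  All names below are hypothetical.

```cpp
#include <cstdint>

constexpr int64_t kImmReach = 4096;       // assumed value of IMM_REACH
constexpr int64_t kStackAlignBytes = 16;  // assumed PREFERRED_STACK_BOUNDARY / 8

// Mirrors the idea of SMALL_OPERAND: fits a signed 12-bit immediate.
constexpr bool small_operand(int64_t x)
{
  return x >= -kImmReach / 2 && x < kImmReach / 2;
}

// Simplified model of the first-step choice for a constant remaining size.
// Precondition (a gcc_assert in the original): min_first_step <= max_first_step.
constexpr int64_t first_stack_step(int64_t remaining, int64_t min_first_step)
{
  // Everything fits in one ADDI.
  if (small_operand(remaining))
    return remaining;

  const int64_t max_first_step = kImmReach / 2 - kStackAlignBytes;
  const int64_t min_second_step = remaining - max_first_step;

  // Take the low bits of the size first, so that what is left for the
  // second step is a multiple of 4096 (LUI + ADD, no ADDI).
  if (!small_operand(min_second_step)
      && remaining % kImmReach < kImmReach / 2
      && remaining % kImmReach >= min_first_step)
    return remaining % kImmReach;

  // Assumption: the elided remainder of the function is not modelled.
  return max_first_step;
}
```

With 14320 bytes remaining (2032 + 12288 from the "after patch"
listing) the low 12 bits are 2032, so the first step is 2032 and the
second step is exactly 12288.  With 32 more bytes the low bits become
2064, which no longer fits, and the shortcut does not apply.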

I spent a bit of time poking around here and nothing wrong is jumping out, but 
trying to keep all these offset differences in my head is a bit tricky.  
If you have the time to refactor this to be easier to read that'd be 
great, otherwise hopefully I (or someone else) will have the time to 
take a look -- probably not today on my end, though, as I've got some 
Linux backlog to look at.
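
To make the epilogue bookkeeping a bit more concrete, here is a small
sketch of how I understand "step1 -= step2 + libcall_size" to split the
frame.  Assumptions: step1 starts out as the whole frame size (that
initialization is outside the hunk), and the 32-byte libcall adjustment
and 14352-byte total are made-up numbers chosen to line up with the
"after patch" listing.  The struct and function names are hypothetical.

```cpp
#include <cstdint>

struct EpilogueSplit {
  int64_t step1;    // released first, before the registers are restored
  int64_t step2;    // released after the restores
  int64_t libcall;  // released by the restore libcall itself
};

// Assumption: step1 begins as total_size and then has step2 and the
// libcall adjustment taken out, as in the patched epilogue.
constexpr EpilogueSplit split_epilogue(int64_t total_size, int64_t step2,
                                       int64_t libcall_size)
{
  return EpilogueSplit{total_size - (step2 + libcall_size), step2,
                       libcall_size};
}

// Hypothetical numbers: 14320 bytes adjusted explicitly plus 32 bytes
// assumed to be handled by the libcall.
constexpr EpilogueSplit example = split_epilogue(14352, 2032, 32);

static_assert(example.step1 == 12288, "first step is the LUI-sized part");
static_assert(example.step1 + example.step2 + example.libcall == 14352,
              "the three parts must cover the whole frame");
```

The point is just that the three pieces always have to sum to the total
frame size, which is the invariant I would want the refactored code to
make obvious.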

Thanks!

>    if (TARGET_RVC)
>      {
> @@ -5037,9 +5037,7 @@ riscv_expand_prologue (void)
>    /* Save the registers.  */
>    if ((frame->mask | frame->fmask) != 0)
>      {
> -      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
> -      if (size.is_constant ())
> -	step1 = MIN (size.to_constant(), step1);
> +      HOST_WIDE_INT step1 = riscv_first_stack_step (frame, size);
>
>        insn = gen_add3_insn (stack_pointer_rtx,
>  			    stack_pointer_rtx,
> @@ -5142,6 +5140,8 @@ riscv_expand_epilogue (int style)
>    HOST_WIDE_INT step2 = 0;
>    bool use_restore_libcall = ((style == NORMAL_RETURN)
>  			      && riscv_use_save_libcall (frame));
> +  unsigned libcall_size = use_restore_libcall ?
> +                            frame->save_libcall_adjustment : 0;
>    rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>    rtx insn;
>
> @@ -5212,13 +5212,18 @@ riscv_expand_epilogue (int style)
>        REG_NOTES (insn) = dwarf;
>      }
>
> +  if (use_restore_libcall)
> +    frame->mask = 0; /* Temporarily fib for GPRs.  */
> +
>    /* If we need to restore registers, deallocate as much stack as
>       possible in the second step without going out of range.  */
>    if ((frame->mask | frame->fmask) != 0)
> -    {
> -      step2 = riscv_first_stack_step (frame);
> -      step1 -= step2;
> -    }
> +    step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
> +
> +  if (use_restore_libcall)
> +    frame->mask = mask; /* Undo the above fib.  */
> +
> +  step1 -= step2 + libcall_size;
>
>    /* Set TARGET to BASE + STEP1.  */
>    if (known_gt (step1, 0))
> @@ -5272,15 +5277,12 @@ riscv_expand_epilogue (int style)
>      frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>    /* Restore the registers.  */
> -  riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
> +  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
> +                            riscv_restore_reg,
>  			    true, style == EXCEPTION_RETURN);
>
>    if (use_restore_libcall)
> -    {
>        frame->mask = mask; /* Undo the above fib.  */
> -      gcc_assert (step2 >= frame->save_libcall_adjustment);
> -      step2 -= frame->save_libcall_adjustment;
> -    }
>
>    if (need_barrier_p)
>      riscv_emit_stack_tie ();
> diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore.c b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c
> new file mode 100644
> index 00000000000..4695ef9469a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +char my_getchar();
> +float getf();
> +
> +/*
> +**bar:
> +**	call	t0,__riscv_save_4
> +**	addi	sp,sp,-2032
> +**	...
> +**	li	t0,-12288
> +**	add	sp,sp,t0
> +**	...
> +**	li	t0,12288
> +**	add	sp,sp,t0
> +**	...
> +**	addi	sp,sp,2032
> +**	tail	__riscv_restore_4
> +*/

The test needs to actually check this; it can't just be manual.

> +int bar()
> +{
> +  float volatile farray[3568];
> +
> +  float sum = 0;
> +  float f1 = getf();
> +  float f2 = getf();
> +  float f3 = getf();
> +  float f4 = getf();
> +
> +  for (int i = 0; i < 3568; i++)
> +  {
> +    farray[i] = my_getchar() * 1.2;
> +    sum += farray[i];
> +  }
> +
> +  return sum + f1 + f2 + f3 + f4;
> +}
> +