Date: Wed, 30 Nov 2022 14:50:33 -0800 (PST)
Subject: Re: [PATCH] RISC-V: optimize stack manipulation in save-restore
In-Reply-To: <20221130083717.14438-1-gaofei@eswincomputing.com>
CC: gcc-patches@gcc.gnu.org, jeffreyalaw@gmail.com, Kito Cheng, gaofei@eswincomputing.com
From: Palmer Dabbelt
To: gaofei@eswincomputing.com

On Wed, 30 Nov 2022 00:37:17 PST (-0800), gaofei@eswincomputing.com wrote:
> The stack that save-restore reserves is not well accumulated in stack allocation and deallocation.
> This patch allows less instructions to be used in stack allocation and deallocation if save-restore enabled,
> and also a much clear logic for save-restore stack manipulation.
>
> before patch:
> bar:
>   call t0,__riscv_save_4
>   addi sp,sp,-64
>   ...
>   li t0,-12288
>   addi t0,t0,-1968 # optimized out after patch
>   add sp,sp,t0 # prologue
>   ...
>   li t0,12288 # epilogue
>   addi t0,t0,2000 # optimized out after patch
>   add sp,sp,t0
>   ...
>   addi sp,sp,32
>   tail __riscv_restore_4
>
> after patch:
> bar:
>   call t0,__riscv_save_4
>   addi sp,sp,-2032
>   ...
>   li t0,-12288
>   add sp,sp,t0 # prologue
>   ...
>   li t0,12288 # epilogue
>   add sp,sp,t0
>   ...
>   addi sp,sp,2032
>   tail __riscv_restore_4
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv.cc (riscv_first_stack_step): add a new function parameter remaining_size.
>         (riscv_compute_frame_info): adapt new riscv_first_stack_step interface.
>         (riscv_expand_prologue): consider save-restore in stack allocation.
>         (riscv_expand_epilogue): consider save-restore in stack deallocation.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/stack_save_restore.c: New test.
> ---
>  gcc/config/riscv/riscv.cc                     | 58 ++++++++++---------
>  .../gcc.target/riscv/stack_save_restore.c     | 40 +++++++++++++
>  2 files changed, 70 insertions(+), 28 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c

I guess with the RISC-V backend still being open for things as big as the
V port we should probably be taking code like this as well?  I wouldn't
be opposed to making an exception for the V code and holding everything
else back, though.

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 05bdba5ab4d..9e92e729a5f 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask)
>     They decrease stack_pointer_rtx but leave frame_pointer_rtx and
>     hard_frame_pointer_rtx unchanged.  */
>
> -static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame);
> +static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size);
>
>  /* Handle stack align for poly_int.  */
>  static poly_int64
> @@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void)
>       save/restore t0.  We check for this before clearing the frame struct.
>      */
>    if (cfun->machine->interrupt_handler_p)
>      {
> -      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
> +      HOST_WIDE_INT step1 = riscv_first_stack_step (frame, frame->total_size);
>        if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1)))
>          interrupt_save_prologue_temp = true;
>      }
> @@ -4913,31 +4913,31 @@ riscv_restore_reg (rtx reg, rtx mem)
>     without adding extra instructions.  */
>
>  static HOST_WIDE_INT
> -riscv_first_stack_step (struct riscv_frame_info *frame)
> +riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 remaining_size)
>  {
> -  HOST_WIDE_INT frame_total_constant_size;
> -  if (!frame->total_size.is_constant ())
> -    frame_total_constant_size
> -      = riscv_stack_align (frame->total_size.coeffs[0])
> -        - riscv_stack_align (frame->total_size.coeffs[1]);
> +  HOST_WIDE_INT remaining_const_size;
> +  if (!remaining_size.is_constant ())
> +    remaining_const_size
> +      = riscv_stack_align (remaining_size.coeffs[0])
> +        - riscv_stack_align (remaining_size.coeffs[1]);

The alignment looks off here, at least in the email.  Worth fixing it up
if you're touching the lines anyway.

>    else
> -    frame_total_constant_size = frame->total_size.to_constant ();
> +    remaining_const_size = remaining_size.to_constant ();
>
> -  if (SMALL_OPERAND (frame_total_constant_size))
> -    return frame_total_constant_size;
> +  if (SMALL_OPERAND (remaining_const_size))
> +    return remaining_const_size;
>
>    HOST_WIDE_INT min_first_step =
> -    RISCV_STACK_ALIGN ((frame->total_size - frame->frame_pointer_offset).to_constant());
> +    RISCV_STACK_ALIGN ((remaining_size - frame->frame_pointer_offset).to_constant());
>    HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / 8;
> -  HOST_WIDE_INT min_second_step = frame_total_constant_size - max_first_step;
> +  HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step;
>    gcc_assert (min_first_step <= max_first_step);
>
>    /* As an optimization, use the least-significant bits of the total frame
>       size, so that the second adjustment step is just LUI + ADD.  */
>    if (!SMALL_OPERAND (min_second_step)
> -      && frame_total_constant_size % IMM_REACH < IMM_REACH / 2
> -      && frame_total_constant_size % IMM_REACH >= min_first_step)
> -    return frame_total_constant_size % IMM_REACH;
> +      && remaining_const_size % IMM_REACH < IMM_REACH / 2
> +      && remaining_const_size % IMM_REACH >= min_first_step)
> +    return remaining_const_size % IMM_REACH;

Looks like this entire frame->total_size -> remaining_size conversion
could be done as an independent patch that would change no
functionality; that's always a nice way to do things, as it makes the
code easier to read.

I spent a bit of time poking around here and nothing wrong is jumping
out, but trying to keep all these offset differences in my head is a bit
tricky.  If you have the time to refactor this to be easier to read
that'd be great; otherwise hopefully I (or someone else) will have the
time to take a look -- probably not today on my end, though, as I've got
some Linux backlog to look at.

Thanks!

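As a reading aid for the hunk quoted above, here is a minimal standalone sketch of the "least-significant bits first" choice, restricted to constant sizes. It is not the GCC code: the names low_bits_first_step, small_operand and STACK_BOUNDARY_BYTES are hypothetical, IMM_REACH is assumed to be 4096 (12-bit signed immediates), the preferred stack boundary is assumed to be 16 bytes, and min_first_step is passed in by the caller instead of being derived from the frame layout. The cases that follow in the real function are elided in the quoted diff, so the sketch returns -1 for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed constants: 12-bit signed immediates and a 16-byte
   preferred stack boundary.  */
#define IMM_REACH 4096L
#define STACK_BOUNDARY_BYTES 16L

/* True if X fits in a signed 12-bit immediate, i.e. a single ADDI.  */
static bool
small_operand (long x)
{
  return x >= -IMM_REACH / 2 && x < IMM_REACH / 2;
}

/* Simplified model of the first-step choice in the quoted hunk.
   Returns the first step in bytes, or -1 when the low-bits
   optimization does not apply (later cases are not modelled).
   MIN_FIRST_STEP is a caller-supplied, hypothetical lower bound.  */
long
low_bits_first_step (long remaining, long min_first_step)
{
  /* The whole adjustment fits in one ADDI.  */
  if (small_operand (remaining))
    return remaining;

  long max_first_step = IMM_REACH / 2 - STACK_BOUNDARY_BYTES;
  long min_second_step = remaining - max_first_step;
  assert (min_first_step <= max_first_step);

  /* Peel off the low 12 bits first, so the second step is a
     multiple of 4096 and needs only LUI + ADD.  */
  if (!small_operand (min_second_step)
      && remaining % IMM_REACH < IMM_REACH / 2
      && remaining % IMM_REACH >= min_first_step)
    return remaining % IMM_REACH;

  return -1;
}
```

With a remaining size of 14320 bytes (the sum of the adjustments in the "after patch" listing) and an illustrative min_first_step of 16, the sketch picks 2032 for the first step and leaves 12288 for the second, which matches the addi/li pair in that listing. A size such as 14352 (an illustrative value, not taken from the thread) has low bits of 2064, which is outside the ADDI range, so the optimization does not apply.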
>    if (TARGET_RVC)
>      {
> @@ -5037,9 +5037,7 @@ riscv_expand_prologue (void)
>    /* Save the registers.  */
>    if ((frame->mask | frame->fmask) != 0)
>      {
> -      HOST_WIDE_INT step1 = riscv_first_stack_step (frame);
> -      if (size.is_constant ())
> -        step1 = MIN (size.to_constant(), step1);
> +      HOST_WIDE_INT step1 = riscv_first_stack_step (frame, size);
>
>        insn = gen_add3_insn (stack_pointer_rtx,
>                              stack_pointer_rtx,
> @@ -5142,6 +5140,8 @@ riscv_expand_epilogue (int style)
>    HOST_WIDE_INT step2 = 0;
>    bool use_restore_libcall = ((style == NORMAL_RETURN)
>                                && riscv_use_save_libcall (frame));
> +  unsigned libcall_size = use_restore_libcall ?
> +    frame->save_libcall_adjustment : 0;
>    rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM);
>    rtx insn;
>
> @@ -5212,13 +5212,18 @@ riscv_expand_epilogue (int style)
>        REG_NOTES (insn) = dwarf;
>      }
>
> +  if (use_restore_libcall)
> +    frame->mask = 0; /* Temporarily fib for GPRs.  */
> +
>    /* If we need to restore registers, deallocate as much stack as
>       possible in the second step without going out of range.  */
>    if ((frame->mask | frame->fmask) != 0)
> -    {
> -      step2 = riscv_first_stack_step (frame);
> -      step1 -= step2;
> -    }
> +    step2 = riscv_first_stack_step (frame, frame->total_size - libcall_size);
> +
> +  if (use_restore_libcall)
> +    frame->mask = mask; /* Undo the above fib.  */
> +
> +  step1 -= step2 + libcall_size;
>
>    /* Set TARGET to BASE + STEP1.  */
>    if (known_gt (step1, 0))
> @@ -5272,15 +5277,12 @@ riscv_expand_epilogue (int style)
>      frame->mask = 0; /* Temporarily fib that we need not save GPRs.  */
>
>    /* Restore the registers.  */
> -  riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg,
> +  riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size,
> +                            riscv_restore_reg,
>                              true, style == EXCEPTION_RETURN);
>
>    if (use_restore_libcall)
> -    {
>        frame->mask = mask; /* Undo the above fib.
>        */
> -      gcc_assert (step2 >= frame->save_libcall_adjustment);
> -      step2 -= frame->save_libcall_adjustment;
> -    }
>
>    if (need_barrier_p)
>      riscv_emit_stack_tie ();
> diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore.c b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c
> new file mode 100644
> index 00000000000..4695ef9469a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32imafc -mabi=ilp32f -msave-restore -O2 -fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +char my_getchar();
> +float getf();
> +
> +/*
> +**bar:
> +** call t0,__riscv_save_4
> +** addi sp,sp,-2032
> +** ...
> +** li t0,-12288
> +** add sp,sp,t0
> +** ...
> +** li t0,12288
> +** add sp,sp,t0
> +** ...
> +** addi sp,sp,2032
> +** tail __riscv_restore_4
> +*/

The test needs to actually check this; it can't just be manual.

> +int bar()
> +{
> +  float volatile farray[3568];
> +
> +  float sum = 0;
> +  float f1 = getf();
> +  float f2 = getf();
> +  float f3 = getf();
> +  float f4 = getf();
> +
> +  for (int i = 0; i < 3568; i++)
> +  {
> +    farray[i] = my_getchar() * 1.2;
> +    sum += farray[i];
> +  }
> +
> +  return sum + f1 + f2 + f3 + f4;
> +}
> +
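The "before patch" and "after patch" listings quoted at the top of this message can be sanity-checked with plain arithmetic. The sketch below only sums the sp adjustments that are visible in those listings. It assumes the elided "..." regions do not touch sp, and it leaves out whatever __riscv_save_4 and __riscv_restore_4 adjust internally. The helper names sum_adjustments and check_balanced are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a list of signed stack-pointer adjustments, in bytes.  */
long
sum_adjustments (const long *adj, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += adj[i];
  return total;
}

/* Abort if a prologue/epilogue pair does not cancel out.  */
void
check_balanced (const long *pro, size_t np, const long *epi, size_t ne)
{
  assert (sum_adjustments (pro, np) + sum_adjustments (epi, ne) == 0);
}

/* Adjustments read off the "before patch" listing:
   addi -64, then li -12288 / addi -1968 / add;
   li 12288 / addi 2000 / add, then addi 32.  */
const long before_prologue[] = { -64, -12288 - 1968 };
const long before_epilogue[] = { 12288 + 2000, 32 };

/* Adjustments read off the "after patch" listing:
   addi -2032, then li -12288 / add;
   li 12288 / add, then addi 2032.  */
const long after_prologue[] = { -2032, -12288 };
const long after_epilogue[] = { 12288, 2032 };
```

Both versions move sp by 14320 bytes in each direction. The patched sequence folds the part that is not a multiple of 4096 into the 2032-byte ADDI, so the large step needs no extra addi on t0.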