From: Kito Cheng
Date: Mon, 26 Jul 2021 17:48:05 +0800
Subject: Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access
To: Christoph Müllner
Cc: Palmer Dabbelt, GCC Patches, Jojo R, Jim Wilson
LGTM, but I would like to wait for Palmer's ack.

I am fine with the current scheme (just checking
riscv_slow_unaligned_access_p). If we run into performance issues, we
can consider changing mtune's slow_unaligned_access field from a
boolean to a number, so that we can have a better cost model for this.

On Fri, Jul 23, 2021 at 6:55 AM Christoph Müllner via Gcc-patches wrote:
>
> I have added tests for memset and memcpy.
> Additionally, I have added a test to ensure that -mstrict-align is
> still working.
> I've run the complete GCC test suite with and without the resulting
> patch with no regressions
> (rv32imac/ilp32/medlow, rv32imafdc/ilp32d/medlow,
> rv64imac/lp64/medlow, rv64imafdc/lp64d/medlow).
>
> I have not changed the patch itself, because I think it is reasonable
> to assume that overlapping accesses (i.e. fewer store instructions)
> will always be cheaper on targets that don't have penalties for
> unaligned accesses.
> If the maintainers disagree, I can change it accordingly.
>
> The new patch can be found here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
>
>
> On Thu, Jul 22, 2021 at 8:23 PM Christoph Müllner wrote:
> >
> > On Thu, Jul 22, 2021 at 7:27 PM Palmer Dabbelt wrote:
> > >
> > > On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > > > Could you add a testcase? Otherwise LGTM.
> > > >
> > > > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
> > > > void foo(char *dst){
> > > >   __builtin_memset(dst, 0, 15);
> > > > }
> > >
> > > I'd like to see:
> > >
> > > * Test results.
> > >   This is only on for one target right now, so relying on it to
> > >   just work on others isn't a good idea.
> >
> > So you prefer the previous version of the patch, where each
> > tune struct can decide whether that feature should be enabled or not?
> >
> > > * Something to demonstrate this doesn't break -mstrict-align.
> >
> > -mstrict-align sets riscv_slow_unaligned_access_p to true.
> > This value is returned by TARGET_SLOW_UNALIGNED_ACCESS.
> > So this cannot happen. I will add an additional test for that.
> >
> > > * Some sort of performance analysis. Most machines that support
> > >   unaligned access do so with some performance degradation; it's
> > >   unclear that this conversion will actually manifest a performance
> > >   improvement. I don't have a C906 and don't know what workloads
> > >   people care about running on one, but I'm certainly not convinced
> > >   this is a win -- what's listed here looks to be the best case, and
> > >   that's only saving two instructions to generate a pretty
> > >   pathological sequence (misaligned access that conflicts with a
> > >   prior store).
> > >
> > > Jojo: do you have any description of the C906 pipeline? Specifically
> > > in this case it'd be good to know how it handles unaligned accesses.
> >
> > The tune struct already includes the field slow_unaligned_access.
> > For the C906 this is set to false. So the answer is: the C906 handles
> > unaligned accesses reasonably well (assuming the contents of its tune
> > struct are correct).
> >
> > Note that the overlapping accesses only happen if unaligned accesses
> > are allowed.
> > If slow_unaligned_access is set, you will get a sequence of
> > store-byte instructions instead.
> >
> > > >
> > > > On Thu, Jul 22, 2021 at 8:53 PM Christoph Muellner via Gcc-patches
> > > > wrote:
> > > >>
> > > >> This patch enables the overlap-by-pieces feature of the by-pieces
> > > >> infrastructure for inlining builtins in case the target has set
> > > >> riscv_slow_unaligned_access_p to false.
> > > >>
> > > >> To demonstrate the effect for targets with fast unaligned access,
> > > >> the following code sequences are generated for a 15-byte
> > > >> memset-zero.
> > > >>
> > > >> Without overlap_op_by_pieces we get:
> > > >>   8e:   00053023   sd   zero,0(a0)
> > > >>   92:   00052423   sw   zero,8(a0)
> > > >>   96:   00051623   sh   zero,12(a0)
> > > >>   9a:   00050723   sb   zero,14(a0)
> > > >>
> > > >> With overlap_op_by_pieces we get:
> > > >>   7e:   00053023   sd   zero,0(a0)
> > > >>   82:   000533a3   sd   zero,7(a0)
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>         * config/riscv/riscv.c (riscv_overlap_op_by_pieces): New function.
> > > >>         (TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> > > >>         riscv_overlap_op_by_pieces.
> > > >>
> > > >> Signed-off-by: Christoph Muellner
> > > >> ---
> > > >>  gcc/config/riscv/riscv.c | 11 +++++++++++
> > > >>  1 file changed, 11 insertions(+)
> > > >>
> > > >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > > >> index 576960bb37c..98c76ba657a 100644
> > > >> --- a/gcc/config/riscv/riscv.c
> > > >> +++ b/gcc/config/riscv/riscv.c
> > > >> @@ -5201,6 +5201,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
> > > >>    return riscv_slow_unaligned_access_p;
> > > >>  }
> > > >>
> > > >> +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
> > > >> +
> > > >> +static bool
> > > >> +riscv_overlap_op_by_pieces (void)
> > > >> +{
> > > >> +  return !riscv_slow_unaligned_access_p;
> > > >> +}
> > > >> +
> > > >>  /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> > > >>
> > > >>  static bool
> > > >> @@ -5525,6 +5533,9 @@ riscv_asan_shadow_offset (void)
> > > >>  #undef TARGET_SLOW_UNALIGNED_ACCESS
> > > >>  #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
> > > >>
> > > >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> > > >> +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
> > > >> +
> > > >>  #undef TARGET_SECONDARY_MEMORY_NEEDED
> > > >>  #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
> > > >>
> > > >> --
> > > >> 2.31.1
> > > >>