From: Kito Cheng
Date: Mon, 26 Jul 2021 17:48:05 +0800
Subject: Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaligned access
To: Christoph Müllner
Cc: Palmer Dabbelt, GCC Patches, Jojo R, Jim Wilson
LGTM, but I would like to wait for Palmer's ack.

I am fine with the current scheme (just checking
riscv_slow_unaligned_access_p). If we run into performance issues, we
can consider changing mtune's slow_unaligned_access field from a
boolean to a number, so that we can have a better cost model for this.

On Fri, Jul 23, 2021 at 6:55 AM Christoph Müllner via Gcc-patches wrote:
>
> I have added tests for memset and memcpy.
> Additionally, I have added a test to ensure that -mstrict-align is
> still working.
> I've run the complete GCC test suite with and without the resulting
> patch with no regressions
> (rv32imac/ilp32/medlow, rv32imafdc/ilp32d/medlow,
> rv64imac/lp64/medlow, rv64imafdc/lp64d/medlow).
>
> I have not changed the patch itself, because I think it is reasonable
> to assume that overlapping accesses (i.e. fewer store instructions)
> will always be cheaper on targets that don't have penalties for
> unaligned accesses.
> If the maintainers disagree, I can change it accordingly.
>
> The new patch can be found here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575864.html
>
>
> On Thu, Jul 22, 2021 at 8:23 PM Christoph Müllner wrote:
> >
> > On Thu, Jul 22, 2021 at 7:27 PM Palmer Dabbelt wrote:
> > >
> > > On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > > > Could you add a testcase? Otherwise LGTM.
> > > >
> > > > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
> > > > void foo(char *dst){
> > > >   __builtin_memset(dst, 0, 15);
> > > > }
> > >
> > > I'd like to see:
> > >
> > > * Test results.
> > >   This is only on for one target right now, so relying on it to
> > >   just work on others isn't a good idea.
> >
> > So you prefer the previous version of the patch, where each
> > tune struct can decide whether that feature should be enabled or not?
> >
> > > * Something to demonstrate this doesn't break -mstrict-align.
> >
> > -mstrict-align sets riscv_slow_unaligned_access_p to true.
> > This value is returned by TARGET_SLOW_UNALIGNED_ACCESS.
> > So this cannot happen. I will add an additional test for that.
> >
> > > * Some sort of performance analysis. Most machines that support
> > >   unaligned access do so with some performance degradation; it's
> > >   unclear that this conversion will actually manifest a performance
> > >   improvement. I don't have a C906 and don't know what workloads
> > >   people care about running on one, but I'm certainly not convinced
> > >   this is a win -- what's listed here looks to be the best case, and
> > >   that's only saving two instructions to generate a pretty
> > >   pathological sequence (misaligned access that conflicts with a
> > >   prior store).
> > >
> > > Jojo: do you have any description of the C906 pipeline? Specifically
> > > in this case it'd be good to know how it handles unaligned accesses.
> >
> > The tune struct already includes the field slow_unaligned_access.
> > For the C906 this is set to false. So the answer is: the C906 handles
> > unaligned accesses reasonably well (assuming the contents of its tune
> > struct are correct).
> >
> > Note that the overlapping accesses only happen if unaligned accesses
> > are allowed.
> > If slow_unaligned_access is set, you will get a sequence of
> > store-byte instructions instead.
> >
> > > >
> > > > On Thu, Jul 22, 2021 at 8:53 PM Christoph Muellner via Gcc-patches
> > > > wrote:
> > > >>
> > > >> This patch enables the overlap-by-pieces feature of the by-pieces
> > > >> infrastructure for inlining builtins in case the target has set
> > > >> riscv_slow_unaligned_access_p to false.
> > > >>
> > > >> To demonstrate the effect for targets with fast unaligned access,
> > > >> the following code sequences are generated for a 15-byte
> > > >> memset-zero.
> > > >>
> > > >> Without overlap_op_by_pieces we get:
> > > >>   8e:   00053023   sd   zero,0(a0)
> > > >>   92:   00052423   sw   zero,8(a0)
> > > >>   96:   00051623   sh   zero,12(a0)
> > > >>   9a:   00050723   sb   zero,14(a0)
> > > >>
> > > >> With overlap_op_by_pieces we get:
> > > >>   7e:   00053023   sd   zero,0(a0)
> > > >>   82:   000533a3   sd   zero,7(a0)
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>         * config/riscv/riscv.c (riscv_overlap_op_by_pieces): New function.
> > > >>         (TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> > > >>         riscv_overlap_op_by_pieces.
> > > >>
> > > >> Signed-off-by: Christoph Muellner
> > > >> ---
> > > >>  gcc/config/riscv/riscv.c | 11 +++++++++++
> > > >>  1 file changed, 11 insertions(+)
> > > >>
> > > >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> > > >> index 576960bb37c..98c76ba657a 100644
> > > >> --- a/gcc/config/riscv/riscv.c
> > > >> +++ b/gcc/config/riscv/riscv.c
> > > >> @@ -5201,6 +5201,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int)
> > > >>    return riscv_slow_unaligned_access_p;
> > > >>  }
> > > >>
> > > >> +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
> > > >> +
> > > >> +static bool
> > > >> +riscv_overlap_op_by_pieces (void)
> > > >> +{
> > > >> +  return !riscv_slow_unaligned_access_p;
> > > >> +}
> > > >> +
> > > >>  /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> > > >>
> > > >>  static bool
> > > >> @@ -5525,6 +5533,9 @@ riscv_asan_shadow_offset (void)
> > > >>  #undef TARGET_SLOW_UNALIGNED_ACCESS
> > > >>  #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
> > > >>
> > > >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> > > >> +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
> > > >> +
> > > >>  #undef TARGET_SECONDARY_MEMORY_NEEDED
> > > >>  #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
> > > >>
> > > >> --
> > > >> 2.31.1
> > > >>