From: Noah Goldstein
Date: Wed, 29 Jun 2022 12:32:33 -0700
Subject: Re: [PATCH v1 2/3] x86: Move and slightly improve memset_erms
To: "H.J. Lu"
Cc: GNU C Library, "Carlos O'Donell"
List-Id: Libc-alpha mailing list

On Wed, Jun 29, 2022 at 12:26 PM H.J. Lu wrote:
>
> On Tue, Jun 28, 2022 at 8:27 AM Noah Goldstein wrote:
> >
> > Implementation wise:
> > 1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
> >    use the L(stosb) label that was previously defined.
> >
> > 2. Don't give the hotpath (fallthrough) to zero size.
> >
> > Code positioning wise:
> >
> > Move L(memset_{chk}_erms) to its own file. Leaving it in between the
>
> It is ENTRY, not L. Did you mean to move them to the end of file?

Will fix L -> ENTRY for v2.

Yes, it should be moved to a new file in this patch; that was a rebase
mistake. The file change is in the ISA-raising patch. Will fix both
for v2.

> > memset_{impl}_unaligned both adds unnecessary complexity to the
> > file and wastes space in a relatively hot cache section.
> > --- > > .../multiarch/memset-vec-unaligned-erms.S | 54 ++++++++----------- > > 1 file changed, 23 insertions(+), 31 deletions(-) > > > > diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S > > index abc12d9cda..d98c613651 100644 > > --- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S > > +++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S > > @@ -156,37 +156,6 @@ L(entry_from_wmemset): > > #if defined USE_MULTIARCH && IS_IN (libc) > > END (MEMSET_SYMBOL (__memset, unaligned)) > > > > -# if VEC_SIZE == 16 > > -ENTRY (__memset_chk_erms) > > - cmp %RDX_LP, %RCX_LP > > - jb HIDDEN_JUMPTARGET (__chk_fail) > > -END (__memset_chk_erms) > > - > > -/* Only used to measure performance of REP STOSB. */ > > -ENTRY (__memset_erms) > > - /* Skip zero length. */ > > - test %RDX_LP, %RDX_LP > > - jnz L(stosb) > > - movq %rdi, %rax > > - ret > > -# else > > -/* Provide a hidden symbol to debugger. */ > > - .hidden MEMSET_SYMBOL (__memset, erms) > > -ENTRY (MEMSET_SYMBOL (__memset, erms)) > > -# endif > > -L(stosb): > > - mov %RDX_LP, %RCX_LP > > - movzbl %sil, %eax > > - mov %RDI_LP, %RDX_LP > > - rep stosb > > - mov %RDX_LP, %RAX_LP > > - VZEROUPPER_RETURN > > -# if VEC_SIZE == 16 > > -END (__memset_erms) > > -# else > > -END (MEMSET_SYMBOL (__memset, erms)) > > -# endif > > - > > # if defined SHARED && IS_IN (libc) > > ENTRY_CHK (MEMSET_CHK_SYMBOL (__memset_chk, unaligned_erms)) > > cmp %RDX_LP, %RCX_LP > > @@ -461,3 +430,26 @@ L(between_2_3): > > #endif > > ret > > END (MEMSET_SYMBOL (__memset, unaligned_erms)) > > + > > +#if defined USE_MULTIARCH && IS_IN (libc) && VEC_SIZE == 16 > > +ENTRY (__memset_chk_erms) > > + cmp %RDX_LP, %RCX_LP > > + jb HIDDEN_JUMPTARGET (__chk_fail) > > +END (__memset_chk_erms) > > + > > +/* Only used to measure performance of REP STOSB. */ > > +ENTRY (__memset_erms) > > + /* Skip zero length. 
*/ > > + test %RDX_LP, %RDX_LP > > + jz L(stosb_return_zero) > > + mov %RDX_LP, %RCX_LP > > + movzbl %sil, %eax > > + mov %RDI_LP, %RDX_LP > > + rep stosb > > + mov %RDX_LP, %RAX_LP > > + ret > > +L(stosb_return_zero): > > + movq %rdi, %rax > > + ret > > +END (__memset_erms) > > +#endif > > -- > > 2.34.1 > > > > > -- > H.J.