Date: Fri, 5 Nov 2021 19:31:59 -0700
From: "H.J. Lu"
To: Noah Goldstein
Cc: libc-alpha@sourceware.org, carlos@systemhalted.org
Subject: Re: [PATCH v1 5/5] x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
References: <20211101054952.2349590-1-goldstein.w.n@gmail.com> <20211101054952.2349590-5-goldstein.w.n@gmail.com>
In-Reply-To: <20211101054952.2349590-5-goldstein.w.n@gmail.com>

On Mon, Nov 01, 2021 at 12:49:52AM -0500, Noah Goldstein wrote:
> No bug.
>
> This patch doubles the rep_movsb_threshold when using ERMS. Based on
> benchmarks the vector copy loop, especially now that it handles 4k
> aliasing, is better for these medium ranged.
>
> On Skylake with ERMS:
>
> Size, Align1, Align2, dst>src,(rep movsb) / (vec copy)
> 4096, 0, 0, 0, 0.975
> 4096, 0, 0, 1, 0.953
> 4096, 12, 0, 0, 0.969
> 4096, 12, 0, 1, 0.872
> 4096, 44, 0, 0, 0.979
> 4096, 44, 0, 1, 0.83
> 4096, 0, 12, 0, 1.006
> 4096, 0, 12, 1, 0.989
> 4096, 0, 44, 0, 0.739
> 4096, 0, 44, 1, 0.942
> 4096, 12, 12, 0, 1.009
> 4096, 12, 12, 1, 0.973
> 4096, 44, 44, 0, 0.791
> 4096, 44, 44, 1, 0.961
> 4096, 2048, 0, 0, 0.978
> 4096, 2048, 0, 1, 0.951
> 4096, 2060, 0, 0, 0.986
> 4096, 2060, 0, 1, 0.963
> 4096, 2048, 12, 0, 0.971
> 4096, 2048, 12, 1, 0.941
> 4096, 2060, 12, 0, 0.977
> 4096, 2060, 12, 1, 0.949
> 8192, 0, 0, 0, 0.85
> 8192, 0, 0, 1, 0.845
> 8192, 13, 0, 0, 0.937
> 8192, 13, 0, 1, 0.939
> 8192, 45, 0, 0, 0.932
> 8192, 45, 0, 1, 0.927
> 8192, 0, 13, 0, 0.621
> 8192, 0, 13, 1, 0.62
> 8192, 0, 45, 0, 0.53
> 8192, 0, 45, 1, 0.516
> 8192, 13, 13, 0, 0.664
> 8192, 13, 13, 1, 0.659
> 8192, 45, 45, 0, 0.593
> 8192, 45, 45, 1, 0.575
> 8192, 2048, 0, 0, 0.854
> 8192, 2048, 0, 1, 0.834
> 8192, 2061, 0, 0, 0.863
> 8192, 2061, 0, 1, 0.857
> 8192, 2048, 13, 0, 0.63
> 8192, 2048, 13, 1, 0.629
> 8192, 2061, 13, 0, 0.627
> 8192, 2061, 13, 1, 0.62
> ---
>  sysdeps/x86/dl-cacheinfo.h | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> index e6c94dfd02..712b7c7fd0 100644
> --- a/sysdeps/x86/dl-cacheinfo.h
> +++ b/sysdeps/x86/dl-cacheinfo.h
> @@ -871,7 +871,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
>        && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
>      {
> -      rep_movsb_threshold = 2048 * (64 / 16);
> +      rep_movsb_threshold = 4096 * (64 / 16);

Please also update the default of x86_rep_stosb_threshold in
sysdeps/x86/dl-tunables.list

>  #if HAVE_TUNABLES
>        minimum_rep_movsb_threshold = 64 * 8;
>  #endif
> @@ -879,14 +879,14 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    else if (CPU_FEATURE_PREFERRED_P (cpu_features,
>                                      AVX_Fast_Unaligned_Load))
>      {
> -      rep_movsb_threshold = 2048 * (32 / 16);
> +      rep_movsb_threshold = 4096 * (32 / 16);
>  #if HAVE_TUNABLES
>        minimum_rep_movsb_threshold = 32 * 8;
>  #endif
>      }
>    else
>      {
> -      rep_movsb_threshold = 2048 * (16 / 16);
> +      rep_movsb_threshold = 4096 * (16 / 16);
>  #if HAVE_TUNABLES
>        minimum_rep_movsb_threshold = 16 * 8;
>  #endif
> @@ -896,6 +896,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
>    if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
>      rep_movsb_threshold = 2112;
>
> +
> +
> +

Please don't add these blank lines.

>    unsigned long int rep_movsb_stop_threshold;
>    /* ERMS feature is implemented from AMD Zen3 architecture and it is
>       performing poorly for data above L2 cache size. Henceforth, adding
> --
> 2.25.1
>

Thanks.

H.J.