From: Noah Goldstein
Date: Sat, 6 Nov 2021 13:11:48 -0500
Subject: Re: [PATCH v3 5/5] x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
To: "H.J. Lu"
Cc: GNU C Library

On Sat, Nov 6, 2021 at 12:57 PM H.J. Lu wrote:
>
> On Sat, Nov 6, 2021 at 10:39 AM Noah Goldstein via Libc-alpha
> wrote:
> >
> > No bug.
> >
> > This patch doubles the rep_movsb_threshold when using ERMS. Based on
> > benchmarks the vector copy loop, especially now that it handles 4k
> > aliasing, is better for these medium-range sizes.
> >
> > On Skylake with ERMS:
> >
> > Size, Align1, Align2, dst>src, (rep movsb) / (vec copy)
> > 4096, 0, 0, 0, 0.975
> > 4096, 0, 0, 1, 0.953
> > 4096, 12, 0, 0, 0.969
> > 4096, 12, 0, 1, 0.872
> > 4096, 44, 0, 0, 0.979
> > 4096, 44, 0, 1, 0.83
> > 4096, 0, 12, 0, 1.006
> > 4096, 0, 12, 1, 0.989
> > 4096, 0, 44, 0, 0.739
> > 4096, 0, 44, 1, 0.942
> > 4096, 12, 12, 0, 1.009
> > 4096, 12, 12, 1, 0.973
> > 4096, 44, 44, 0, 0.791
> > 4096, 44, 44, 1, 0.961
> > 4096, 2048, 0, 0, 0.978
> > 4096, 2048, 0, 1, 0.951
> > 4096, 2060, 0, 0, 0.986
> > 4096, 2060, 0, 1, 0.963
> > 4096, 2048, 12, 0, 0.971
> > 4096, 2048, 12, 1, 0.941
> > 4096, 2060, 12, 0, 0.977
> > 4096, 2060, 12, 1, 0.949
> > 8192, 0, 0, 0, 0.85
> > 8192, 0, 0, 1, 0.845
> > 8192, 13, 0, 0, 0.937
> > 8192, 13, 0, 1, 0.939
> > 8192, 45, 0, 0, 0.932
> > 8192, 45, 0, 1, 0.927
> > 8192, 0, 13, 0, 0.621
> > 8192, 0, 13, 1, 0.62
> > 8192, 0, 45, 0, 0.53
> > 8192, 0, 45, 1, 0.516
> > 8192, 13, 13, 0, 0.664
> > 8192, 13, 13, 1, 0.659
> > 8192, 45, 45, 0, 0.593
> > 8192, 45, 45, 1, 0.575
> > 8192, 2048, 0, 0, 0.854
> > 8192, 2048, 0, 1, 0.834
> > 8192, 2061, 0, 0, 0.863
> > 8192, 2061, 0, 1, 0.857
> > 8192, 2048, 13, 0, 0.63
> > 8192, 2048, 13, 1, 0.629
> > 8192, 2061, 13, 0, 0.627
> > 8192, 2061, 13, 1, 0.62
> > ---
> >  sysdeps/x86/dl-cacheinfo.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
> > index e6c94dfd02..ceb3b53828 100644
> > --- a/sysdeps/x86/dl-cacheinfo.h
> > +++ b/sysdeps/x86/dl-cacheinfo.h
> > @@ -866,12 +866,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> >    /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
> >    unsigned int minimum_rep_movsb_threshold;
> >  #endif
> > -  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
> > +  /* NB: The default REP MOVSB threshold is 4096 * (VEC_SIZE / 16).  */
> >    unsigned int rep_movsb_threshold;
> >    if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
> >        && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
> >      {
> > -      rep_movsb_threshold = 2048 * (64 / 16);
> > +      rep_movsb_threshold = 4096 * (64 / 16);
> >  #if HAVE_TUNABLES
> >        minimum_rep_movsb_threshold = 64 * 8;
> >  #endif
> > @@ -879,14 +879,14 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
> >    else if (CPU_FEATURE_PREFERRED_P (cpu_features,
> >                                      AVX_Fast_Unaligned_Load))
> >      {
> > -      rep_movsb_threshold = 2048 * (32 / 16);
> > +      rep_movsb_threshold = 4096 * (32 / 16);
> >  #if HAVE_TUNABLES
> >        minimum_rep_movsb_threshold = 32 * 8;
> >  #endif
> >      }
> >    else
> >      {
> > -      rep_movsb_threshold = 2048 * (16 / 16);
> > +      rep_movsb_threshold = 4096 * (16 / 16);
> >  #if HAVE_TUNABLES
> >        minimum_rep_movsb_threshold = 16 * 8;
> >  #endif
> > --
> > 2.25.1
> >
>
> You need to update comments for x86_rep_movsb_threshold
> in sysdeps/x86/dl-tunables.list

Can do.

I notice that the original values were based on comparisons with SSE2,
likely on SnB or IVB.
I don't have any indication that the 2048 value is not optimal for
those processors. Should we keep 2048 * (VEC_SIZE / 16) for SSE2?

>
> --
> H.J.
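
For reference, the SSE2 question above could be sketched roughly as follows. This is only an illustration, not code from the patch: the helper name default_rep_movsb_threshold is hypothetical, and it assumes the old 2048 base is kept for VEC_SIZE == 16 while the AVX2 and AVX512 paths take the doubled 4096 base.

```c
#include <assert.h>

/* Hypothetical helper, not part of glibc: compute the default
   rep_movsb_threshold for a vector width given in bytes.  */
static unsigned int
default_rep_movsb_threshold (unsigned int vec_size)
{
  /* Only the three vector widths handled in dl_init_cacheinfo.  */
  assert (vec_size == 16 || vec_size == 32 || vec_size == 64);

  /* Assumption: SSE2 (VEC_SIZE == 16) keeps the old 2048 base, since the
     original value was tuned against SSE2; wider vectors use 4096.  */
  unsigned int base = vec_size == 16 ? 2048 : 4096;

  return base * (vec_size / 16);
}
```

Under that assumption the defaults would be 2048 for SSE2, 8192 for AVX2 and 16384 for AVX512, compared with 4096, 8192 and 16384 in the patch as posted.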