From: "H.J. Lu"
Date: Fri, 12 May 2017 20:21:00 -0000
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: Erich Elsen
Cc: "Carlos O'Donell", GNU C Library

On Fri, May 12, 2017 at 12:43 PM, Erich Elsen wrote:
> HJ - yes, the benchmark still shows the same behavior. I did have to
> modify the build to add -std=c++11.

I updated the hjl/x86/optimize branch with memcpy_benchmark2.cc to
change its output for easier comparison. Please take a look to see if
it is still valid.

H.J.

> Carlos - Maybe the first step is to add a tunable that allows the
> non-temporal-store size threshold to be selected without changing
> which implementation is chosen. I can work on submitting this patch.
>
> On Wed, May 10, 2017 at 7:17 PM, Carlos O'Donell wrote:
>>
>> On 05/10/2017 01:33 PM, H.J. Lu wrote:
>> > On Tue, May 9, 2017 at 4:48 PM, Erich Elsen wrote:
>> >> store is a net win even though it causes a 2-3x decrease in
>> >> single-threaded performance for some processors? Or how else is
>> >> the decision about the threshold made?
>> >
>> > There is no perfect number that makes everyone happy. I am open to
>> > suggestions to improve the compromise.
>> >
>> > H.J.
>>
>> I agree with H.J.; there is a compromise to be made here. Having a
>> single process thrash the box by taking all of the memory bandwidth
>> might be sensible for a microservice, but glibc has to default to
>> something that works well on average.
>>
>> With the new tunables infrastructure we can start talking about ways
>> in which a tunable could influence IFUNC selection, allowing users
>> some choice in tweaking for single-threaded or multi-threaded,
>> single-user or multi-user workloads, etc.
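
[For context, a minimal, self-contained sketch of the decision being
debated. This is not glibc's memcpy; the default threshold and the
MEMCPY_NT_THRESHOLD environment override are invented stand-ins for the
proposed tunable. Above the threshold the copy switches to non-temporal
(streaming) stores, which keep a large copy from evicting other data
from the shared cache but can cost single-threaded performance.]

    /* Illustrative sketch only -- NOT glibc's memcpy.  The threshold
       default and the MEMCPY_NT_THRESHOLD override are hypothetical,
       standing in for the tunable proposed above.  */
    #include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128 */
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    static size_t
    nt_threshold (void)
    {
      const char *s = getenv ("MEMCPY_NT_THRESHOLD"); /* made-up knob */
      return s ? (size_t) strtoull (s, NULL, 0) : (size_t) 3 * 1024 * 1024;
    }

    static void
    copy_streaming (unsigned char *d, const unsigned char *s, size_t n)
    {
      /* Head: copy until the destination is 16-byte aligned, as
         required by _mm_stream_si128.  */
      size_t head = (16 - ((uintptr_t) d & 15)) & 15;
      if (head > n)
        head = n;
      memcpy (d, s, head);
      d += head; s += head; n -= head;

      /* Body: 16 bytes at a time with cache-bypassing stores.  */
      size_t i = 0;
      for (; i + 16 <= n; i += 16)
        _mm_stream_si128 ((__m128i *) (d + i),
                          _mm_loadu_si128 ((const __m128i *) (s + i)));
      _mm_sfence ();

      /* Tail.  */
      memcpy (d + i, s + i, n - i);
    }

    void *
    copy_with_threshold (void *dst, const void *src, size_t n)
    {
      if (n >= nt_threshold ())
        copy_streaming (dst, src, n);
      else
        memcpy (dst, src, n);
      return dst;
    }

[Compiled as C on x86-64, where SSE2 is baseline, this makes the
trade-off concrete: the threshold is the only knob, which is exactly
what a tunable could expose without touching which implementation
IFUNC selects.]
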
>>
>> What I would like to see as the output of any discussion is a set of
>> microbenchmarks (benchtests/) added to glibc that are the
>> distillation of whatever workload we're talking about here. This is
>> crucial to the community having a way to test from release to
>> release that we don't regress performance.
>>
>> Unless you want to sign up to test your workload at every release,
>> we need this kind of microbenchmark addition. And microbenchmarks
>> are dead easy to integrate with glibc, so most people should have no
>> excuse.
>>
>> The hardware vendors and distros who want particular performance
>> tests are putting such tests in place (representative of their
>> users), and direct end-users who want particular performance are
>> also adding tests.
>>
>> --
>> Cheers,
>> Carlos.
>
>

-- 
H.J.
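
[Following up on the benchtests/ point above, a minimal standalone
sketch of such a microbenchmark. It does not use glibc's actual
benchtests harness or output format; the sizes and iteration counts are
arbitrary. It only illustrates the idea: time memcpy across a range of
sizes so a regression from one release (or one threshold setting) to
the next is visible.]

    /* Standalone memcpy throughput sketch -- not a benchtests/ test.  */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static double
    now_sec (void)
    {
      struct timespec ts;
      clock_gettime (CLOCK_MONOTONIC, &ts);
      return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int
    main (void)
    {
      static const size_t sizes[] =
        { 4096, 65536, 1 << 20, 4 << 20, 16 << 20, 64 << 20 };
      size_t max = 64 << 20;
      char *src = malloc (max);
      char *dst = malloc (max);
      if (!src || !dst)
        return 1;
      /* Touch both buffers so page faults are not timed.  */
      memset (src, 0x5a, max);
      memset (dst, 0, max);

      for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        {
          size_t n = sizes[i];
          /* Scale iterations so each size copies ~256 MiB in total.  */
          size_t iters = (256u << 20) / n;
          double t0 = now_sec ();
          for (size_t j = 0; j < iters; j++)
            memcpy (dst, src, n);
          double dt = now_sec () - t0;
          printf ("%10zu bytes: %8.2f MB/s\n",
                  n, (double) n * iters / dt / 1e6);
        }

      free (src);
      free (dst);
      return 0;
    }

[Running the same binary against different glibc builds, or under
different GLIBC_TUNABLES settings once a threshold tunable exists, and
comparing the MB/s columns is enough to spot a 2-3x single-threaded
regression like the one reported in this thread.]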