From: "H.J. Lu"
Date: Wed, 24 May 2017 03:42:00 -0000
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: Erich Elsen
Cc: "Carlos O'Donell", GNU C Library
References: <9c563a4b-424b-242f-b82f-4650ab2637f7@redhat.com> <28e34264-e8c5-5570-c48c-9125893808b2@redhat.com>

On Tue, May 23, 2017 at 5:56 PM, Erich Elsen wrote:
> Ok. Do you have any specific concerns?
> It would help make it easier for us to do the testing internally to
> switch to memcpy.c.

We use libc_ifunc to implement IFUNC, like x86_64/multiarch/strstr.c.
It may be a good idea to switch to a different format and require all
IFUNCs to be written in C for x86-64 if compilers with the IFUNC
attribute are required to build glibc. But this is independent of
tunables.

> Interesting, thanks for the info. More reason for being able to
> select the implementation!
>
> On Tue, May 23, 2017 at 3:55 PM, H.J. Lu wrote:
>> On Tue, May 23, 2017 at 3:12 PM, Erich Elsen wrote:
>>> Sounds good to me. Even if tunables aren't added, does memcpy.S ->
>>> memcpy.c seem reasonable?
>>
>> I prefer not to do it for now. We can revisit it later after tunable
>> support is added to cpu_features.
>>
>> BTW, REP MOV is expected to have lower bandwidth on multi-socket
>> systems, but has the benefit of lower cache disruption throughout the
>> cache hierarchy. This is a trade-off between overall system throughput
>> and single program performance.
>>
>>
>>> On Tue, May 23, 2017 at 3:07 PM, H.J. Lu wrote:
>>>> On Tue, May 23, 2017 at 1:57 PM, Erich Elsen wrote:
>>>>> Maybe there's room for both?
>>>>>
>>>>> Setting the cpu_features would affect everything; it would be useful
>>>>> to be able to target only specific (and very important) routines.
>>>>
>>>> I prefer to do the cpu_features first. If it turns out not to be
>>>> sufficient, we then do the IFUNC implementation.
>>>>
>>>>> On Tue, May 23, 2017 at 1:46 PM, H.J. Lu wrote:
>>>>>> On Tue, May 23, 2017 at 1:39 PM, Erich Elsen wrote:
>>>>>>> I was also thinking that it might be nice to have a TUNABLE that sets
>>>>>>> the implementation of memcpy directly. It would be easier to do this
>>>>>>> if memcpy.S was memcpy.c. Attached is a patch that does the
>>>>>>> conversion but doesn't add the tunables. How would you feel about
>>>>>>> this? It has no runtime impact, probably increases the size slightly,
>>>>>>> and makes the code easier to read / modify.
>>>>>>>
>>>>>>
>>>>>> It depends on how far you want to go. We can add TUNABLE support
>>>>>> to each IFUNC implementation or we can add TUNABLE support to
>>>>>> cpu_features to update processor features. I prefer the latter.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> H.J.
>>>>
>>>>
>>>>
>>>> --
>>>> H.J.
>>
>>
>>
>> --
>> H.J.

--
H.J.