Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
References: <9c563a4b-424b-242f-b82f-4650ab2637f7@redhat.com> <28e34264-e8c5-5570-c48c-9125893808b2@redhat.com>
From: Erich Elsen
Date: Wed, 24 May 2017 00:56:00 -0000
Subject: Re: memcpy performance regressions 2.19 -> 2.24(5)
To: "H.J. Lu"
Cc: "Carlos O'Donell", GNU C Library

Ok.  Do you have any specific concerns?

Switching to memcpy.c would make it easier for us to do the testing
internally.

Interesting, thanks for the info.  More reason for being able to
select the implementation!

On Tue, May 23, 2017 at 3:55 PM, H.J. Lu wrote:
> On Tue, May 23, 2017 at 3:12 PM, Erich Elsen wrote:
>> Sounds good to me.  Even if tunables aren't added, does memcpy.S ->
>> memcpy.c seem reasonable?
>
> I prefer not to do it for now.  We can revisit it later after the tunable is added
> to cpu_features.
>
> BTW, REP MOV is expected to have lower bandwidth on multi-socket
> systems, but has the benefit of lower cache disruption throughout the
> cache hierarchy.  This is a trade-off between overall system throughput
> and single program performance.
>
>
>> On Tue, May 23, 2017 at 3:07 PM, H.J. Lu wrote:
>>> On Tue, May 23, 2017 at 1:57 PM, Erich Elsen wrote:
>>>> Maybe there's room for both?
>>>>
>>>> Setting the cpu_features would affect everything; it would be useful
>>>> to be able to target only specific (and very important) routines.
>>>
>>> I prefer to do the cpu_features first.  If it turns out not to be
>>> sufficient, we then do the IFUNC implementation.
>>>
>>>> On Tue, May 23, 2017 at 1:46 PM, H.J. Lu wrote:
>>>>> On Tue, May 23, 2017 at 1:39 PM, Erich Elsen wrote:
>>>>>> I was also thinking that it might be nice to have a TUNABLE that sets
>>>>>> the implementation of memcpy directly.  It would be easier to do this
>>>>>> if memcpy.S was memcpy.c.  Attached is a patch that does the
>>>>>> conversion but doesn't add the tunables.  How would you feel about
>>>>>> this?  It has no runtime impact, probably increases the size slightly,
>>>>>> and makes the code easier to read / modify.
>>>>>>
>>>>>
>>>>> It depends on how far you want to go.  We can add TUNABLE support
>>>>> to each IFUNC implementation or we can add TUNABLE support to
>>>>> cpu_features to update processor features.  I prefer the latter.
>>>>>
>>>>>
>>>>> --
>>>>> H.J.
>>>
>>>
>>>
>>> --
>>> H.J.
>
>
>
> --
> H.J.
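
For readers following the thread, the "tunable that sets the implementation of memcpy directly" option can be pictured with a small user-space sketch. This is not glibc's tunables framework and not a real IFUNC resolver; it is a plain function-pointer dispatch that only illustrates per-routine selection by name. The environment variable DEMO_MEMCPY_IMPL, the variant names "bytewise" and "libc", and every function below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Signature shared by every candidate copy routine. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical variant 1: a plain byte-at-a-time loop. */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical variant 2: defer to the C library's own memcpy. */
static void *copy_libc(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Table of selectable variants, keyed by an illustrative name. */
struct copy_impl {
    const char *name;
    copy_fn fn;
};

static const struct copy_impl copy_impls[] = {
    { "bytewise", copy_bytewise },
    { "libc",     copy_libc },
};

/* Return the variant called `name`; fall back to copy_libc when
   `name` is NULL or does not match any table entry. */
static copy_fn select_copy(const char *name)
{
    if (name != NULL) {
        size_t count = sizeof copy_impls / sizeof copy_impls[0];
        for (size_t i = 0; i < count; i++)
            if (strcmp(name, copy_impls[i].name) == 0)
                return copy_impls[i].fn;
    }
    return copy_libc;
}

/* Entry point: resolve the variant on first use from the hypothetical
   DEMO_MEMCPY_IMPL environment variable, then reuse that choice.
   The lazy initialisation is not thread-safe; it is kept simple here. */
static void *demo_copy(void *dst, const void *src, size_t n)
{
    static copy_fn chosen;
    if (chosen == NULL)
        chosen = select_copy(getenv("DEMO_MEMCPY_IMPL"));
    assert(chosen != NULL);
    return chosen(dst, src, n);
}
```

Selecting by name like this touches only the one routine. The cpu_features approach preferred in the quoted replies would instead change the feature bits that the existing selection logic reads, which, as noted in the thread, would affect everything that consults them.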