From: "Tulio Magno Quites Machado Filho"
To: Adhemerval Zanella, Rajalakshmi Srinivasaraghavan
Cc: libc-alpha@sourceware.org, Florian Weimer
Subject: Re: [PATCH] powerpc: Use aligned stores in memset
Date: Wed, 08 Nov 2017 18:52:00 -0000
Message-Id: <87vaik8uxy.fsf@linux.vnet.ibm.com>
References: <1503033107-20047-1-git-send-email-raji@linux.vnet.ibm.com>
 <87mv5yhhdh.fsf@linux.vnet.ibm.com>
 <45dcb803-4632-0cc4-0f73-c3f9a8a442d9@redhat.com>

Adhemerval Zanella writes:

> I think one way to provide a slightly better memcpy implementation for POWER8
> and still be able to circumvent the unaligned-access issue on non-cacheable
> memory is to use tunables.
>
> The branch azanella/memcpy-power8 [1] has a POWER8 memcpy optimization which
> uses unaligned loads and stores that I created some time ago but never
> actually sent upstream.  It shows better performance on both bench-memcpy and
> bench-memcpy-random (about 10% on the latter) and mixed results on
> bench-memcpy-large (which is mainly dominated by memory throughput, and on
> the environment I am using, a shared PowerKVM instance, the results do not
> seem to be reliable).
>
> It could use some tuning, especially on the ranges I used for unrolling the
> loads/stores, and it also does not care for unaligned accesses across page
> boundaries (which tend to be quite slow on current hardware, but are also
> uncommon with the usual 64k page size).
>
> This first patch does not enable this option as the default for POWER8, it
> just adds it to the string tests as an option.  The second patch changes the
> selection to:
>
> 1. If glibc is configured with tunables, set the new implementation as the
>    default for ISA 2.07 (POWER8).
>
> 2. Also, if tunables are active, add the parameter glibc.tune.aligned_memopt
>    to disable the new implementation selection.

I think it would be safer if we don't change the default behavior.
IMHO, programs that want a performance improvement would have to set a
tunable.  In other words, the new implementation would be disabled by
default.

> So programs that rely on aligned loads can set:
>
> GLIBC_TUNABLES=glibc.tune.aligned_memopt=1

I also think that we should not expose internal details of the
implementation to users, i.e. avoid using aligned/unaligned in the name
of the function and in the tunables.  I think that
glibc.tune.cached_memopt=1 better exposes the optimal use-case scenario
of this implementation.

> This is an RFC patch and if the idea sounds good to the powerpc arch
> maintainers I can work on finishing the patch with more comments and send it
> upstream.  I tried to apply the same unaligned idea to memset and memmove,
> but I could not get any real improvement in either.

I like the idea.  Could you merge both patches and send it to
libc-alpha, please?

> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8
>
> [ bench-memcpy-random.out: text/plain ]
> {
>  "timing_type": "hp_timing",
>  "functions": {
>   "memcpy": {
>    "bench-variant": "random",
>    "ifuncs": ["__memcpy_power8",

I also suggest giving it a more specific name, e.g. __memcpy_power8_cached.
That would make room for a POWER8 implementation that uses only naturally
aligned loads/stores.

Your implementation uses lxvd2x and stxvd2x, which should be avoided in a
cache-inhibited scenario, i.e. glibc.tune.aligned_memopt=0.  However, after
changing the tunable's name to glibc.tune.cached_memopt, I think these
instructions could stay, since they're executed when
glibc.tune.cached_memopt=1.

Thanks!!!

--
Tulio Magno
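
[Archive note: the sketch below is not the patch under discussion; it only
illustrates the opt-in selection policy being debated.  The variant names,
the stub bodies, and the CACHED_MEMOPT environment toggle standing in for
the glibc.tune.cached_memopt tunable are all placeholders, and it builds
only on powerpc Linux, where <asm/cputable.h> defines
PPC_FEATURE2_ARCH_2_07.]

/* Illustrative sketch: choose the unaligned/vector memcpy only when the
   CPU implements ISA 2.07 *and* the user has explicitly asserted that the
   memory involved is cacheable.  */
#include <elf.h>            /* AT_HWCAP2 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/auxv.h>       /* getauxval */
#include <asm/cputable.h>   /* PPC_FEATURE2_ARCH_2_07 (powerpc Linux only) */

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Stand-in for a variant that would use lxvd2x/stxvd2x and is therefore
   only safe on cacheable memory.  */
static void *
memcpy_power8_cached (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Stand-in for a fallback restricted to naturally aligned loads/stores.  */
static void *
memcpy_power7_aligned (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static memcpy_fn
select_memcpy (void)
{
  unsigned long hwcap2 = getauxval (AT_HWCAP2);
  const char *opt = getenv ("CACHED_MEMOPT");

  /* Opt-in only: the default stays with the conservative variant.  */
  if ((hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0
      && opt != NULL && opt[0] == '1')
    return memcpy_power8_cached;
  return memcpy_power7_aligned;
}

int
main (void)
{
  char src[32] = "tunable-gated memcpy selection";
  char dst[32];

  memcpy_fn impl = select_memcpy ();
  impl (dst, src, sizeof src);
  puts (dst);
  return 0;
}

[In the real glibc integration the equivalent check would live in the
IFUNC selection path and read the tunable through the tunables framework
rather than getenv; the sketch only shows the policy, not the mechanism.]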