Subject: Re: [PATCH] powerpc: Use aligned stores in memset
To: Adhemerval Zanella, Tulio Magno Quites Machado Filho
Cc: libc-alpha@sourceware.org
References: <1503033107-20047-1-git-send-email-raji@linux.vnet.ibm.com> <87mv5yhhdh.fsf@linux.vnet.ibm.com> <45dcb803-4632-0cc4-0f73-c3f9a8a442d9@redhat.com>
From: Rajalakshmi Srinivasaraghavan
Date: Thu, 05 Oct 2017 12:13:00 -0000
Message-Id: <9b5d72a6-9851-1f78-e93c-3e6a22673846@linux.vnet.ibm.com>

On 10/03/2017 11:59 PM, Adhemerval Zanella wrote:
> I think one way to provide a slightly better memcpy implementation for POWER8
> and still be able to work around unaligned accesses to non-cacheable memory
> is to use tunables.
>
> The branch azanella/memcpy-power8 [1] has a power8 memcpy optimization which
> uses unaligned loads and stores that I created some time ago but never actually
> sent upstream. It shows better performance on both bench-memcpy and
> bench-memcpy-random (about 10% on the latter) and mixed results on
> bench-memcpy-large (which is mainly dominated by memory throughput, and in the
> environment I am using, a shared PowerKVM instance, the results do not seem
> to be reliable).
>
> It could use some tuning, especially on the ranges I used for unrolling
> the loads/stores, and it also does not account for unaligned accesses that
> cross a page boundary (which tend to be quite slow on current hardware, but
> are also uncommon with the usual current page size of 64k).
>
> The first patch does not enable this option by default for POWER8; it just
> adds it to the string tests as an option. The second patch changes the
> selection to:
>
> 1. If glibc is configured with tunables, set the new implementation as the
>    default for ISA 2.07 (power8).
>
> 2. Also, if tunables are active, add the parameter glibc.tune.aligned_memopt
>    to disable the new implementation selection.
>
> So programs that rely on aligned loads can set:
>
>   GLIBC_TUNABLES=glibc.tune.aligned_memopt=1
>
> And then the memcpy ifunc selection would pick the power7 one, which uses
> only aligned loads and stores.
>
> This is an RFC patch, and if the idea sounds good to the powerpc arch
> maintainers I can work on finishing the patch with more comments and send it
> upstream. I tried to apply the same unaligned idea to memset and memmove, but
> I could not get any real improvement in either.
>
> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/memcpy-power8

Thanks for sharing the patches. At this point we are also working on memcpy
for power8 with a different approach and we are planning to post it soon.
We can choose the better performing version and use your tunables patch too.

--
Thanks
Rajalakshmi S