From: "Tulio Magno Quites Machado Filho"
To: Rajalakshmi Srinivasaraghavan, libc-alpha@sourceware.org
Subject: Re: [PATCHv2] powerpc: POWER8 memcpy optimization for cached memory
References: <87vaik8uxy.fsf@linux.vnet.ibm.com> <20171208194020.5005-1-tuliom@linux.vnet.ibm.com>
Date: Mon, 11 Dec 2017 19:48:00 -0000
Message-Id: <87a7yp2ghu.fsf@linux.vnet.ibm.com>

Rajalakshmi Srinivasaraghavan writes:

> On 12/09/2017 01:10 AM, Tulio Magno Quites Machado Filho wrote:
>>     * manual/tunables.texi (Hardware Capability Tunables): Document
>>     glibc.tune.cached_memopt.
>>     * sysdeps/powerpc/cpu-features.c: New file.
>>     * sysdeps/powerpc/cpu-features.h: New file.
>>     * sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
>>     _dl_powerpc_cpu_features.
>>     * sysdeps/powerpc/dl-tunables.list: New file.
>>     * sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
>>     * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h: .
>
> Comment missing.

Ooops.

>>     * sysdeps/powerpc/powerpc64/dl-machine.h (INIT_ARCH): Initialize
>>     use_aligned_memopt.
>
> Should this be moved to init-arch.h? (also use_cached_memopt)

Indeed.
Changed to:

    * sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
    (INIT_ARCH): Initialize use_aligned_memopt.
    * sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED && IS_IN(rtld))]:
    Restrict dl_platform_init availability and initialize CPU features used
    by tunables.

>> diff --git a/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> new file mode 100644
>> index 0000000..e5b6f25
>> --- /dev/null
>> +++ b/sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S
>> @@ -0,0 +1,179 @@
>> +    stxvd2x v0,r0,r3
>> +L(dst_is_align_16):
>> +    cmpldi  cr7,r5,127
>> +    ble     cr7,L(tail_copy)
>> +    addi    r8,r5,-128
>> +    mr      r9,r12
>> +    rldicr  r8,r8,0,56
>> +    li      r11,16
>> +    srdi    r10,r8,7
>> +    addi    r0,r8,128
>> +    addi    r10,r10,1
>
> Can we directly do
>     rldicr r0, r5, 0, 56
>     srdi r10,r5,7
> instead of this sequence?
> 79 addi r8,r5,-128
> 81 rldicr r8,r8,0,56
> 83 srdi r10,r8,7
> 84 addi r0,r8,128
> 85 addi r10,r10,1

Yes.
I changed that and made more changes for clarity:

 - Replaced rldicr with clrrdi.
 - Replaced r0 with 0 where it's treated as an immediate.

Pushed as c9cd7b0ce5c5.

-- 
Tulio Magno
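
The simplification agreed on in this reply can be sanity-checked with a small model of the two instruction sequences. The sketch below is not part of the patch: it is a Python model of 64-bit register arithmetic, and the names `MASK64`, `clrrdi`, `original_sequence`, `simplified_sequence`, and `sequences_agree` are hypothetical helpers introduced only for illustration. It assumes r5 holds the remaining length and relies on the preceding `cmpldi cr7,r5,127` / `ble cr7,L(tail_copy)` pair, which means this code path only runs when r5 is at least 128.

```python
MASK64 = (1 << 64) - 1  # model 64-bit general-purpose registers


def clrrdi(value, n):
    """Clear the low n bits, as `clrrdi rX,rY,n` does.

    clrrdi rX,rY,n is the extended mnemonic for rldicr rX,rY,0,63-n,
    so n = 7 corresponds to the `rldicr ...,0,56` in the thread.
    """
    return value & ~((1 << n) - 1) & MASK64


def original_sequence(r5):
    """Model of the five-instruction sequence quoted from the patch."""
    r8 = (r5 - 128) & MASK64   # addi   r8,r5,-128
    r8 = clrrdi(r8, 7)         # rldicr r8,r8,0,56
    r10 = r8 >> 7              # srdi   r10,r8,7
    r0 = (r8 + 128) & MASK64   # addi   r0,r8,128
    r10 = (r10 + 1) & MASK64   # addi   r10,r10,1
    return r0, r10


def simplified_sequence(r5):
    """Model of the two-instruction replacement suggested in review."""
    r0 = clrrdi(r5, 7)         # rldicr r0,r5,0,56 (or clrrdi r0,r5,7)
    r10 = r5 >> 7              # srdi   r10,r5,7
    return r0, r10


def sequences_agree(lengths):
    """Return True if both models give the same (r0, r10) for every length."""
    return all(original_sequence(n) == simplified_sequence(n) for n in lengths)


if __name__ == "__main__":
    print(sequences_agree(range(128, 1 << 16)))
    print(simplified_sequence(300))
```

Both models agree for every length of 128 or more, because subtracting 128 before clearing the low 7 bits and adding it back afterwards cancels out. They diverge below 128, where the subtraction wraps around, which is why the guard before this block matters.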