From: "H.J. Lu"
Date: Mon, 03 Dec 2018 14:33:00 -0000
Subject: Re: [PATCH 1/3] Update s_sincosf.c and x86-64 s_sincosf-fma.c
To: Wilco Dijkstra
Cc: Adhemerval Zanella, "szabolcs.nagy", GNU C Library, nd

On Mon, Dec 3, 2018 at 4:13 AM Wilco Dijkstra wrote:
>
> Hi Adhemerval,
>
> > I did check on a A53 I saw no regressions with benchtests. Do you see any
> > regressions on other chips or systems?
>
> Cortex-A53 doesn't support 128-bit loads, however most other AArch64 cores do.
>
> > If it is the case one option could be use my suggestion to move s_sincosf_t
> > to its own header.
>
> Well there is no need to change the existing structure, it's small so the vector
> version could just add a new structure. In fact I can't see why any of this should

Only sincosf_poly is vectorized.  Without changing the existing structure, I
need to duplicate everything in sysdeps/ieee754/flt-32/s_sincosf.h.

> be target specific. GCC supports generic vector notation, so that should be the
> obvious approach for this optimization.
>

My x86-64 vector version has x86-64 specific intrinsics:

  __v2df vps1c2 = (__v2df) _mm_loadu_pd (&p->s1c2.s1);
  __v2df vps2c3 = (__v2df) _mm_loadu_pd (&p->s2c3.s2);
  __v2df vps3c4 = (__v2df) _mm_loadu_pd (&p->s3c4.s3);

  __v4sf v4sf = _mm_cvtpd_ps (vsincos);

-- 
H.J.
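
For comparison, here is a minimal sketch of how the loads and the conversion above might look in the generic vector notation mentioned in the quoted text. It is not code from the patch. The names coeff_pair, load_v2df, fma_lanes and narrow_v2df are hypothetical, and the struct only stands in for a pair of adjacent coefficients such as s1/c2. It assumes a compiler with GCC-style vector extensions (GCC or Clang).

```c
#include <string.h>

/* GCC/Clang generic vector types, 16 bytes each:
   two doubles and four floats.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for two adjacent coefficients (e.g. s1 and c2).  */
struct coeff_pair
{
  double lo;
  double hi;
};

/* Unaligned load of two adjacent doubles, playing the role of
   _mm_loadu_pd.  Compilers typically turn this memcpy into one
   128-bit load where the target has one.  */
static inline v2df
load_v2df (const void *p)
{
  v2df v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Lane-wise a + b * x, with the scalar x broadcast to both lanes.  */
static inline v2df
fma_lanes (v2df a, v2df b, double x)
{
  v2df vx = { x, x };
  return a + b * vx;
}

/* Narrow both doubles to float in lanes 0 and 1 and zero the upper
   two lanes, matching the result layout of _mm_cvtpd_ps.  */
static inline v4sf
narrow_v2df (v2df v)
{
  v4sf r = { (float) v[0], (float) v[1], 0.0f, 0.0f };
  return r;
}
```

A caller would load each coefficient pair with load_v2df, combine the pairs with lane-wise arithmetic, and narrow the final result with narrow_v2df. Whether the compiler generates code as good as the intrinsics version on each target would still have to be checked with the benchtests.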