Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Date: Fri, 16 Jun 2017 21:55:00 -0000
From: Michael Meissner
To: Segher Boessenkool
Cc: Michael Meissner, GCC Patches, David Edelsohn, Bill Schmidt
Subject: Re: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
References: <20170615000158.GA11033@ibm-tiger.the-meissners.org>
 <20170615233938.GA15195@ibm-tiger.the-meissners.org>
 <20170616021027.GA2916@ibm-tiger.the-meissners.org>
 <20170616195246.GH16550@gate.crashing.org>
 <20170616202658.GA2150@ibm-tiger.the-meissners.org>
 <20170616213047.GJ16550@gate.crashing.org>
In-Reply-To: <20170616213047.GJ16550@gate.crashing.org>
Message-Id: <20170616215534.GA24208@ibm-tiger.the-meissners.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="Kj7319i9nmIyA2yE"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-12-10)
X-SW-Source: 2017-06/txt/msg01249.txt.bz2

--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 3552

On Fri, Jun 16, 2017 at 04:30:48PM -0500, Segher Boessenkool wrote:
> On Fri, Jun 16, 2017 at 04:26:58PM -0400, Michael Meissner wrote:
> > > > +  "&& reload_completed"
> > >
> > > I still don't think it is such a good idea to do all of this not until
> > > after reload. It does of course allow you to play tricks with changing
> > > register mode at will, like you do ;-)
> >
> > The problem is MODES_TIEABLE_P. V4S{I,F}mode and SImode cannot be tied
> > together (i.e. use gen_lowpart to change the mode and use a SUBREG). So after
> > reload, we can just use gen_rtx_REG (...) to change the register type, but
> > before reload, by creating the SUBREG, it can lead to various aborts if rtl
> > checking is turned on.
>
> That sounds like a problem elsewhere? Hrm.
>
> > > All these unspecs are a similar problem: the RTL optimisers cannot do
> > > much at all with it.
> >
> > I don't think there is a good way to represent a vec_insert. And vec_extract
> > can't represent a variable extract either.
>
> Yeah. But especially for all this lane shuffling etc. the generic
> optimisers could do a good job, if only they knew how. Maybe we need
> some new RTL codes.
>
> > > > +  [(set_attr "type" "vecperm")
> >
> > > Is that a good type for this? I think the convert is more expensive
> > > than the permutes? If so, that would be better (of course it only
> > > matters for sched1, not super important).
> >
> > I generally use the type of the last insn. I am open to other suggestions.
>
> It should describe the resulting insns as a whole. Picking the type of
> the most expensive insn is often a reasonable approximation; for integer
> insns "two" or "three" can be okay.
>
> I don't think we can do much better currently.

Here is the latest patch that restricts the optimization to 64-bit (due to
needing VSX small integers). I've done a full bootstrap/make check on a little
endian power8 system, and a build without bootstrap and make check on a little
endian power9 system. Neither the power8 nor the power9 systems had any
regressions. I'm also running a test on a big endian power7 system for
completeness.

Assuming the power7 test finishes without any regressions, can I check this
patch into the trunk and later the GCC 7 branch?

The main change was to restrict the optimization to 64-bit PowerPC systems that
have VSX small integer support turned on (default for 64-bit). I did shorten
the one line in the testsuite that you mentioned.

[gcc]
2017-06-16  Michael Meissner

	PR target/79799
	* config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
	for doing vector set of SFmode on ISA 3.0.
	* config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
	(vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
	element.
	(vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
	SFmode value into a V4SF variable that was extracted from another
	V4SF variable without converting the element to double precision
	and back to single precision vector format.
	(vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-16  Michael Meissner

	PR target/79799
	* gcc.target/powerpc/pr79799-1.c: New test.
	* gcc.target/powerpc/pr79799-2.c: Likewise.
	* gcc.target/powerpc/pr79799-3.c: Likewise.
	* gcc.target/powerpc/pr79799-4.c: Likewise.
	* gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

--Kj7319i9nmIyA2yE
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pr79799.patch03b"
Content-length: 13686

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 249175)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -7451,6 +7451,8 @@ rs6000_expand_vector_set (rtx target, rt
             insn = gen_vsx_set_v8hi_p9 (target, target, val, elt_rtx);
           else if (mode == V16QImode)
             insn = gen_vsx_set_v16qi_p9 (target, target, val, elt_rtx);
+          else if (mode == V4SFmode)
+            insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
         }
 
       if (insn)
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 249175)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3012,6 +3012,134 @@ (define_insn "vsx_set_<mode>_p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "gpc_reg_operand" "ww")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+        (unspec:V4SF [(match_dup 2)]
+                     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+                   (vec_select:SI (match_dup 6)
+                                  (parallel [(match_dup 7)])))
+              (clobber (scratch:SI))])
+   (set (match_dup 8)
+        (unspec:V4SI [(match_dup 8)
+                      (match_dup 4)
+                      (match_dup 3)]
+                     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "zero_fp_constant" "j")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+        (const_int 0))
+   (set (match_dup 5)
+        (unspec:V4SI [(match_dup 5)
+                      (match_dup 4)
+                      (match_dup 3)]
+                     UNSPEC_VSX_SET))]
+{
+  operands[5] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian). We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+                         (parallel
+                          [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+          (match_operand:QI 4 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian). Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+                         (parallel
+                          [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+          (match_operand:QI 4 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && TARGET_UPPER_REGS_DI && TARGET_POWERPC64
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+                   (vec_select:SI (match_dup 6)
+                                  (parallel [(match_dup 3)])))
+              (clobber (scratch:SI))])
+   (set (match_dup 7)
+        (unspec:V4SI [(match_dup 8)
+                      (match_dup 5)
+                      (match_dup 4)]
+                     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_<mode>"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c	(revision 0)
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating
+   point on ISA 3.0 (power9) systems. */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c	(revision 0)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the default
+   scalar position. */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode. */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c	(revision 0)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr79799-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-5.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Ensure setting 0.0f to a V4SFmode element does not do a FP conversion. */
+
+vector float
+insert_arg_0 (vector float vf)
+{
+  return vec_insert (0.0f, vf, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */

--Kj7319i9nmIyA2yE--