From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Date: Fri, 16 Jun 2017 02:10:00 -0000
From: Michael Meissner
To: Michael Meissner, Segher Boessenkool, GCC Patches, David Edelsohn, Bill Schmidt
Subject: [PATCH, rev 2] PR target/79799, Add vec_insert of V4SFmode on PowerPC ISA 3.0 (power9)
Mail-Followup-To: Michael Meissner, Segher Boessenkool, GCC Patches, David Edelsohn, Bill Schmidt
References: <20170615000158.GA11033@ibm-tiger.the-meissners.org> <20170615233938.GA15195@ibm-tiger.the-meissners.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="mP3DRpeJDSE+ciuQ"
Content-Disposition: inline
In-Reply-To: <20170615233938.GA15195@ibm-tiger.the-meissners.org>
User-Agent: Mutt/1.5.20 (2009-12-10)
Message-Id: <20170616021027.GA2916@ibm-tiger.the-meissners.org>
X-SW-Source: 2017-06/txt/msg01155.txt.bz2

--mP3DRpeJDSE+ciuQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 2446

On Thu, Jun 15, 2017 at 07:39:39PM -0400, Michael Meissner wrote:
> I thought the patch was fine as I posted.  I had an optimization I thought
> about (optimizing for inserting 0.0f) and I noticed some problems with it.
> However, even in backing out the change, there are some problems.  So, I will
> hopefully reissue the patch tomorrow.

OK, the problem was that I need to patch the compiler with a workaround to run
code on the current alpha hardware, and in backing out the patches for the code
I was working on, I backed out the workaround as well.

This patch replaces the first patch.  It adds an optimization so that if you
set a field in a V4SFmode vector to 0.0f, the compiler knows it can just clear
the field, and it doesn't have to convert the 0.0 from the internal scalar
format to vector format with the XSCVDPSPN instruction.

As before, I have bootstrapped this patch on a little endian power8 system, and
I had no regressions in the test suite.  The new tests pr79799-{1,2,3,5}.c all
generate the appropriate code.

I have also done a non-bootstrap build and make check on the alpha power9
hardware with --with-cpu=power9, and there are no regressions.  The executable
test (pr79799-4.c) runs fine.

Can I install this change on the trunk?  After a week of burn-in, can I install
it on the GCC 7.x branch?  Note, it will not work on previous branches.
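
To illustrate why the 0.0f case needs no conversion, here is a small portable C
sketch.  It is not part of the patch: it assumes IEEE 754 single precision,
models the vector as a plain array of four floats, and the helper names
(float_bits, set_elt_zero_bits, zero_insert_matches) are made up for the
example.  Since +0.0f is all zero bits, storing a zero word into the element
gives the same result as storing 0.0f itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of a float (hypothetical helper,
   assumes float is IEEE 754 binary32).  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Model of "just clear the field": overwrite element ELT of a 4-float
   array with an all-zero 32-bit word, with no FP conversion involved.  */
static void
set_elt_zero_bits (float v[4], unsigned int elt)
{
  const uint32_t zero = 0;
  memcpy (&v[elt], &zero, sizeof zero);
}

/* Return 1 if clearing the bits of element ELT yields the same array
   contents as assigning 0.0f to that element, 0 otherwise.  */
static int
zero_insert_matches (unsigned int elt)
{
  float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  float b[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

  a[elt] = 0.0f;
  set_elt_zero_bits (b, elt);
  return memcmp (a, b, sizeof a) == 0;
}
```

Calling zero_insert_matches with each of 0..3 should return 1.  Note that
-0.0f has the sign bit set, so the same shortcut would not hold for negative
zero.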

[gcc]
2017-06-15  Michael Meissner

        PR target/79799
        * config/rs6000/rs6000.c (rs6000_expand_vector_set): Add support
        for doing vector set of SFmode on ISA 3.0.
        * config/rs6000/vsx.md (vsx_set_v4sf_p9): Likewise.
        (vsx_set_v4sf_p9_zero): Special case setting 0.0f to a V4SF
        element.
        (vsx_insert_extract_v4sf_p9): Add an optimization for inserting a
        SFmode value into a V4SF variable that was extracted from another
        V4SF variable without converting the element to double precision
        and back to single precision vector format.
        (vsx_insert_extract_v4sf_p9_2): Likewise.

[gcc/testsuite]
2017-06-15  Michael Meissner

        PR target/79799
        * gcc.target/powerpc/pr79799-1.c: New test.
        * gcc.target/powerpc/pr79799-2.c: Likewise.
        * gcc.target/powerpc/pr79799-3.c: Likewise.
        * gcc.target/powerpc/pr79799-4.c: Likewise.
        * gcc.target/powerpc/pr79799-5.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

--mP3DRpeJDSE+ciuQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pr79799.patch02b"
Content-length: 13473

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 249175)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -7442,6 +7442,9 @@ rs6000_expand_vector_set (rtx target, rt
       else if (mode == V2DImode)
         insn = gen_vsx_set_v2di (target, target, val, elt_rtx);
 
+      else if (TARGET_P9_VECTOR && mode == V4SFmode)
+        insn = gen_vsx_set_v4sf_p9 (target, target, val, elt_rtx);
+
       else if (TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
                && TARGET_UPPER_REGS_DI && TARGET_POWERPC64)
         {
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md (revision 249175)
+++ gcc/config/rs6000/vsx.md (working copy)
@@ -3012,6 +3012,130 @@ (define_insn "vsx_set__p9"
 }
   [(set_attr "type" "vecperm")])
 
+(define_insn_and_split "vsx_set_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "gpc_reg_operand" "ww")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 5)
+        (unspec:V4SF [(match_dup 2)]
+                     UNSPEC_VSX_CVDPSPN))
+   (parallel [(set (match_dup 4)
+                   (vec_select:SI (match_dup 6)
+                                  (parallel [(match_dup 7)])))
+              (clobber (scratch:SI))])
+   (set (match_dup 8)
+        (unspec:V4SI [(match_dup 8)
+                      (match_dup 4)
+                      (match_dup 3)]
+                     UNSPEC_VSX_SET))]
+{
+  unsigned int tmp_regno = reg_or_subregno (operands[4]);
+
+  operands[5] = gen_rtx_REG (V4SFmode, tmp_regno);
+  operands[6] = gen_rtx_REG (V4SImode, tmp_regno);
+  operands[7] = GEN_INT (VECTOR_ELT_ORDER_BIG ? 1 : 2);
+  operands[8] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "12")])
+
+;; Special case setting 0.0f to a V4SF element
+(define_insn_and_split "*vsx_set_v4sf_p9_zero"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (match_operand:SF 2 "zero_fp_constant" "j")
+          (match_operand:QI 3 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 4 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 4)
+        (const_int 0))
+   (set (match_dup 5)
+        (unspec:V4SI [(match_dup 5)
+                      (match_dup 4)
+                      (match_dup 3)]
+                     UNSPEC_VSX_SET))]
+{
+  operands[5] = gen_rtx_REG (V4SImode, reg_or_subregno (operands[0]));
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  We just need to do an xxinsertw since the element is in the
+;; correct location.
+
+(define_insn "*vsx_insert_extract_v4sf_p9"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+                         (parallel
+                          [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+          (match_operand:QI 4 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_P9_VECTOR
+   && (INTVAL (operands[3]) == (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+{
+  int ele = INTVAL (operands[4]);
+
+  if (!VECTOR_ELT_ORDER_BIG)
+    ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+
+  operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
+  return "xxinsertw %x0,%x2,%4";
+}
+  [(set_attr "type" "vecperm")])
+
+;; Optimize x = vec_insert (vec_extract (v2, n), v1, m) if n is not the element
+;; that is in the default scalar position (1 for big endian, 2 for little
+;; endian).  Convert the insert/extract to int and avoid doing the conversion.
+
+(define_insn_and_split "*vsx_insert_extract_v4sf_p9_2"
+  [(set (match_operand:V4SF 0 "gpc_reg_operand" "=wa")
+        (unspec:V4SF
+         [(match_operand:V4SF 1 "gpc_reg_operand" "0")
+          (vec_select:SF (match_operand:V4SF 2 "gpc_reg_operand" "wa")
+                         (parallel
+                          [(match_operand:QI 3 "const_0_to_3_operand" "n")]))
+          (match_operand:QI 4 "const_0_to_3_operand" "n")]
+         UNSPEC_VSX_SET))
+   (clobber (match_scratch:SI 5 "=&wJwK"))]
+  "VECTOR_MEM_VSX_P (V4SFmode) && VECTOR_MEM_VSX_P (V4SImode)
+   && TARGET_P9_VECTOR && TARGET_VSX_SMALL_INTEGER
+   && (INTVAL (operands[3]) != (VECTOR_ELT_ORDER_BIG ? 1 : 2))"
+  "#"
+  "&& 1"
+  [(parallel [(set (match_dup 5)
+                   (vec_select:SI (match_dup 6)
+                                  (parallel [(match_dup 3)])))
+              (clobber (scratch:SI))])
+   (set (match_dup 7)
+        (unspec:V4SI [(match_dup 8)
+                      (match_dup 5)
+                      (match_dup 4)]
+                     UNSPEC_VSX_SET))]
+{
+  if (GET_CODE (operands[5]) == SCRATCH)
+    operands[5] = gen_reg_rtx (SImode);
+
+  operands[6] = gen_lowpart (V4SImode, operands[2]);
+  operands[7] = gen_lowpart (V4SImode, operands[0]);
+  operands[8] = gen_lowpart (V4SImode, operands[1]);
+}
+  [(set_attr "type" "vecperm")])
+
 ;; Expanders for builtins
 (define_expand "vsx_mergel_"
   [(use (match_operand:VSX_D 0 "vsx_register_operand" ""))
Index: gcc/testsuite/gcc.target/powerpc/pr79799-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-1.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-1.c (working copy)
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* GCC 7.1 did not have a specialized method for inserting 32-bit floating point on
+   ISA 3.0 (power9) systems.  */
+
+vector float
+insert_arg_0 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 0);
+}
+
+vector float
+insert_arg_1 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 1);
+}
+
+vector float
+insert_arg_2 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 2);
+}
+
+vector float
+insert_arg_3 (vector float vf, float f)
+{
+  return vec_insert (f, vf, 3);
+}
+
+/* { dg-final { scan-assembler {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-2.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-2.c (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode if N is the default
+   scalar position.  */
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define ELE 2
+#else
+#define ELE 1
+#endif
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, ELE), v1, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-3.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-3.c (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Optimize x = vec_insert (vec_extract (v2, N), v1, M) for SFmode.  */
+
+vector float
+foo (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 4), v1, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler {\mxxextractuw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
Index: gcc/testsuite/gcc.target/powerpc/pr79799-4.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-4.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-4.c (working copy)
@@ -0,0 +1,105 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+#include <stdlib.h>
+
+__attribute__ ((__noinline__))
+vector float
+insert_0 (vector float v, float f)
+{
+  return vec_insert (f, v, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_1 (vector float v, float f)
+{
+  return vec_insert (f, v, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_2 (vector float v, float f)
+{
+  return vec_insert (f, v, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_3 (vector float v, float f)
+{
+  return vec_insert (f, v, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+
+  v1 = insert_0 (v1, 5.0f);
+  v1 = insert_1 (v1, 6.0f);
+  v1 = insert_2 (v1, 7.0f);
+  v1 = insert_3 (v1, 8.0f);
+
+  if (vec_any_ne (v1, v2))
+    abort ();
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_0_3 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 3), v1, 0);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_1_2 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 2), v1, 1);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_2_1 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 1), v1, 2);
+}
+
+__attribute__ ((__noinline__))
+vector float
+insert_extract_3_0 (vector float v1, vector float v2)
+{
+  return vec_insert (vec_extract (v2, 0), v1, 3);
+}
+
+__attribute__ ((__noinline__))
+void
+test_insert_extract (void)
+{
+  vector float v1 = { 1.0f, 2.0f, 3.0f, 4.0f };
+  vector float v2 = { 5.0f, 6.0f, 7.0f, 8.0f };
+  vector float v3 = { 8.0f, 7.0f, 6.0f, 5.0f };
+
+  v1 = insert_extract_0_3 (v1, v2);
+  v1 = insert_extract_1_2 (v1, v2);
+  v1 = insert_extract_2_1 (v1, v2);
+  v1 = insert_extract_3_0 (v1, v2);
+
+  if (vec_any_ne (v1, v3))
+    abort ();
+}
+
+int
+main (void)
+{
+  test_insert ();
+  test_insert_extract ();
+  return 0;
+}
Index: gcc/testsuite/gcc.target/powerpc/pr79799-5.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr79799-5.c (nonexistent)
+++ gcc/testsuite/gcc.target/powerpc/pr79799-5.c (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+#include <altivec.h>
+
+/* Ensure setting 0.0f to a V4SFmode element does not do a FP conversion.  */
+
+vector float
+insert_arg_0 (vector float vf)
+{
+  return vec_insert (0.0f, vf, 0);
+}
+
+/* { dg-final { scan-assembler {\mxxinsertw\M} } } */
+/* { dg-final { scan-assembler-not {\mlvewx\M} } } */
+/* { dg-final { scan-assembler-not {\mlvx\M} } } */
+/* { dg-final { scan-assembler-not {\mvperm\M} } } */
+/* { dg-final { scan-assembler-not {\mvpermr\M} } } */
+/* { dg-final { scan-assembler-not {\mstfs\M} } } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+/* { dg-final { scan-assembler-not {\mstxsspx\M} } } */
+/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */
+/* { dg-final { scan-assembler-not {\mxxextractuw\M} } } */

--mP3DRpeJDSE+ciuQ--