From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19112 invoked by alias); 13 Feb 2020 22:45:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 18929 invoked by uid 89); 13 Feb 2020 22:45:41 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.1 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LOTSOFHASH,KAM_NUMSUBJECT,KAM_SHORT,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 spammy= X-HELO: us-smtp-delivery-1.mimecast.com Received: from us-smtp-1.mimecast.com (HELO us-smtp-delivery-1.mimecast.com) (207.211.31.81) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 13 Feb 2020 22:45:31 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1581633914; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:mime-version:mime-version: content-type:content-type; bh=wu6p2pCbAfRvNn047e7c4W2BCdoHvbwAnNJWFqLE9jc=; b=IvcUj1zkRCOnypN0Am6NlzEN/rSLHrwTQ5FP3GGFZG8thCjyaasGR/W2eCupQBaYiQJ7N1 JkCLr5JhpotBuMStUAaCPZ71d1hC+J02De5FJm3b+TZ70LFQSeRZZaUH0rbocQqeha8AP2 NYnRbzyZSxzyMZMVpDdJu8qrmLHUe88= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-349-3hIYTYOIMUSgdhqhVmhx1w-1; Thu, 13 Feb 2020 17:45:02 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BCBF9DBA5 for ; Thu, 13 Feb 2020 22:45:01 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-116-51.ams2.redhat.com [10.36.116.51]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D804B90097 for ; Thu, 13 Feb 2020 22:45:00 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id 01DMiwpS014089 for ; Thu, 13 Feb 2020 23:44:59 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id 01DMiv53014088 for gcc-patches@gcc.gnu.org; Thu, 13 Feb 2020 23:44:57 +0100 Date: Thu, 13 Feb 2020 22:45:00 -0000 From: Jakub Jelinek To: gcc-patches@gcc.gnu.org Subject: Backports to 9.3 Message-ID: <20200213224457.GL17695@tucnak> Reply-To: Jakub Jelinek MIME-Version: 1.0 User-Agent: Mutt/1.11.3 (2019-02-01) X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="x4pBfXISqBoDm8sr" Content-Disposition: inline X-IsSubscribed: yes X-SW-Source: 2020-02/txt/msg00817.txt.bz2 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Content-length: 908 Hi! I've backported following 15 commits from trunk to 9.3 branch, bootstrapped/regtested on x86_64-linux and i686-linux, committed. r10-6186-g32667e04c7153d97d09d81c1af073d400f0c719a r10-6273-gbff948aa337807260344c83ac9079d6386410094 r10-6314-gaa1b56967d85bfc80d71341395f862ec2b30ca36 r10-6315-g8d7c0bf876fa784101f9ad9e3bba82cc065357da r10-6358-g56b92750f83724177d2c6eae30c208e935a56a37 r10-6444-gb843bcb89519293404bb00d2ed09aae529b54d7f r10-6460-g5a8ad97b6e4823d4ded00a3ce8d80e4bf93368d4 r10-6470-gcf785618ecc90e3f063b99572de48cb91aa5ab5d r10-6471-gcb3f06480a17f98579704b9927632627a3814c5c r10-6522-g79ab8c4321b2dc940bb706a7432a530e26f0df1a r10-6565-gf57aa9503ff170ff6c8549718bd736f6c8168bab r10-6593-g62fc0a6ce28c502fc6a7b7c09157840bf98f945f r10-6612-gdc6d0f89d4be3ed7fde73417606a78c73d954cdf r10-6617-gae2b8ede40a81a83f50d1e705972bc46fafd4ce5 r10-6625-gbacdd5e978dad84e9c547b0d5c7fed14b8d75157 Jakub --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6186-g32667e04c7153d97d09d81c1af073d400f0c719a Content-Transfer-Encoding: quoted-printable Content-length: 4011 =46rom 3b2fbe3e723b20ea9089e5f45c55b79feb37085b Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 23 Jan 2020 20:08:22 +0100 Subject: [PATCH 01/15] postreload: Fix up postreload combine [PR93402] The following testcase is miscompiled, because the postreload pass changes: -(insn 14 13 23 2 (parallel [ - (set (reg:DI 1 dx [94]) - (plus:DI (reg:DI 1 dx [95]) - (reg:DI 5 di [92]))) - (clobber (reg:CC 17 flags)) - ]) "pr93402.c":8:30 186 {*adddi_1} - (expr_list:REG_EQUAL (plus:DI (reg:DI 5 di [92]) - (const_int 111111111111 [0x19debd01c7])) - (nil))) -(insn 23 14 25 2 (set (reg:SI 0 ax) +(insn 23 13 25 2 (set (reg:SI 0 ax) (const_int 0 [0])) "pr93402.c":10:1 67 {*movsi_internal} (nil)) (insn 25 23 26 2 (use (reg:SI 0 ax)) "pr93402.c":10:1 -1 (nil)) -(insn 26 25 35 2 (use (reg:DI 1 dx)) "pr93402.c":10:1 -1 +(insn 26 25 35 2 (use (plus:DI (reg:DI 1 dx [95]) + (reg:DI 5 di [92]))) "pr93402.c":10:1 -1 (nil)) A USE insn is not a normal insn and verify_changes called from apply_change_group is happy about any changes into it. The following patch avoids this optimization if we were to change the USE operand (this routine only changes a reg into (plus reg reg2)). 2020-01-23 Jakub Jelinek PR rtl-optimization/93402 * postreload.c (reload_combine_recognize_pattern): Don't try to adjust USE insns. * gcc.c-torture/execute/pr93402.c: New test. --- gcc/ChangeLog | 9 ++++++++ gcc/postreload.c | 4 ++++ gcc/testsuite/ChangeLog | 8 +++++++ gcc/testsuite/gcc.c-torture/execute/pr93402.c | 21 +++++++++++++++++++ 4 files changed, 42 insertions(+) create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr93402.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1916dab20d1..2029c67bf02 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2020-02-13 Jakub Jelinek + + Backported from mainline + 2020-01-23 Jakub Jelinek + + PR rtl-optimization/93402 + * postreload.c (reload_combine_recognize_pattern): Don't try to adjust + USE insns. + 2020-02-11 Tamar Christina =20 Backport from mainline diff --git a/gcc/postreload.c b/gcc/postreload.c index 728aa9b0ed5..b76c7b0b758 100644 --- a/gcc/postreload.c +++ b/gcc/postreload.c @@ -1081,6 +1081,10 @@ reload_combine_recognize_pattern (rtx_insn *insn) struct reg_use *use =3D reg_state[regno].reg_use + i; if (GET_MODE (*use->usep) !=3D mode) return false; + /* Don't try to adjust (use (REGX)). */ + if (GET_CODE (PATTERN (use->insn)) =3D=3D USE + && &XEXP (PATTERN (use->insn), 0) =3D=3D use->usep) + return false; } =20 /* Look for (set (REGX) (CONST_INT)) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 1dcf894a92a..bec5eba5033 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,11 @@ +2020-02-13 Jakub Jelinek + + Backported from mainline + 2020-01-23 Jakub Jelinek + + PR rtl-optimization/93402 + * gcc.c-torture/execute/pr93402.c: New test. + 2020-02-11 Tamar Christina =20 Backport from mainline diff --git a/gcc/testsuite/gcc.c-torture/execute/pr93402.c b/gcc/testsuite/= gcc.c-torture/execute/pr93402.c new file mode 100644 index 00000000000..6487797d0aa --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr93402.c @@ -0,0 +1,21 @@ +/* PR rtl-optimization/93402 */ + +struct S { unsigned int a; unsigned long long b; }; + +__attribute__((noipa)) struct S +foo (unsigned long long x) +{ + struct S ret; + ret.a =3D 0; + ret.b =3D x * 11111111111ULL + 111111111111ULL; + return ret; +} + +int +main () +{ + struct S a =3D foo (1); + if (a.a !=3D 0 || a.b !=3D 122222222222ULL) + __builtin_abort (); + return 0; +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6273-gbff948aa337807260344c83ac9079d6386410094 Content-Transfer-Encoding: quoted-printable Content-length: 4531 =46rom 764e831291a2e510978ca7be0bffb55589a5a0b6 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Tue, 28 Jan 2020 08:46:23 +0100 Subject: [PATCH 02/15] i386: Fix ix86_fold_builtin shift folding [PR93418] The following testcase is miscompiled, because the variable shift left operand, { -1, -1, -1, -1 } is represented as a VECTOR_CST with VECTOR_CST_NPATTERNS 1 and VECTOR_CST_NELTS_PER_PATTERN 1, so when we call builder.new_unary_operation, builder.encoded_nelts () will be just 1 and thus we encode the resulting vector as if all the elements were the same. For non-masked is_vshift, we could perhaps call builder.new_binary_operation (TREE_TYPE (args[0]), args[0], args[1], false), but then there are masked shifts, for non-is_vshift we could perhaps call it too but with args[2] instead of args[1], but there is no builder.new_ternary_operation. All this stuff is primarily for aarch64 anyway, on x86 we don't have any variable length vectors, and it is not a big deal to compute all elements and just let builder.finalize () find the most efficient VECTOR_CST representation of the vector. So, instead of doing too much, this just keeps using new_unary_operation only if only one VECTOR_CST is involved (i.e. non-masked shift by constant) and for the rest just compute all elts. 2020-01-28 Jakub Jelinek PR target/93418 * config/i386/i386.c (ix86_fold_builtin) : If mask is not -1 or is_vshift is true, use new_vector with number of elts npatterns rather than new_unary_operation. * gcc.target/i386/avx2-pr93418.c: New test. --- gcc/ChangeLog | 7 +++++++ gcc/config/i386/i386.c | 9 +++++++-- gcc/testsuite/ChangeLog | 5 +++++ gcc/testsuite/gcc.target/i386/avx2-pr93418.c | 20 ++++++++++++++++++++ 4 files changed, 39 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx2-pr93418.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2029c67bf02..ca09488b59d 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,13 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-01-28 Jakub Jelinek + + PR target/93418 + * config/i386/i386.c (ix86_fold_builtin) : If mask is not + -1 or is_vshift is true, use new_vector with number of elts npatterns + rather than new_unary_operation. + 2020-01-23 Jakub Jelinek =20 PR rtl-optimization/93402 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 6ee6aea2389..779e8111379 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -33418,8 +33418,13 @@ ix86_fold_builtin (tree fndecl, int n_args, countt =3D build_int_cst (integer_type_node, count); } tree_vector_builder builder; - builder.new_unary_operation (TREE_TYPE (args[0]), args[0], - false); + if (mask !=3D HOST_WIDE_INT_M1U || is_vshift) + builder.new_vector (TREE_TYPE (args[0]), + TYPE_VECTOR_SUBPARTS (TREE_TYPE (args[0])), + 1); + else + builder.new_unary_operation (TREE_TYPE (args[0]), args[0], + false); unsigned int cnt =3D builder.encoded_nelts (); for (unsigned int i =3D 0; i < cnt; ++i) { diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index bec5eba5033..532f8dbef6c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,11 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-01-28 Jakub Jelinek + + PR target/93418 + * gcc.target/i386/avx2-pr93418.c: New test. + 2020-01-23 Jakub Jelinek =20 PR rtl-optimization/93402 diff --git a/gcc/testsuite/gcc.target/i386/avx2-pr93418.c b/gcc/testsuite/g= cc.target/i386/avx2-pr93418.c new file mode 100644 index 00000000000..67ed33ddf9d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-pr93418.c @@ -0,0 +1,20 @@ +/* PR target/93418 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx2 -fdump-tree-optimized" } */ +/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */ + +#include + +void link_error (void); + +void +foo (void) +{ + __m128i a =3D _mm_set1_epi32 (0xffffffffU); + __m128i b =3D _mm_setr_epi32 (16, 31, -34, 3); + __m128i c =3D _mm_sllv_epi32 (a, b); + __v4su d =3D (__v4su) c; + if (d[0] !=3D 0xffff0000U || d[1] !=3D 0x80000000U + || d[2] !=3D 0 || d[3] !=3D 0xfffffff8U) + link_error (); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6314-gaa1b56967d85bfc80d71341395f862ec2b30ca36 Content-Transfer-Encoding: quoted-printable Content-length: 3580 =46rom 244f4b8c2823531a1e479a3773272af539dda258 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 29 Jan 2020 09:39:16 +0100 Subject: [PATCH 03/15] openmp: Handle rest of EXEC_OACC_* in oacc_code_to_statement [PR93463] As the testcase shows, some EXEC_OACC_* codes weren't handled in oacc_code_to_statement. Fixed thusly. 2020-01-29 Jakub Jelinek PR fortran/93463 * openmp.c (oacc_code_to_statement): Handle EXEC_OACC_{ROUTINE,UPDATE,WAIT,CACHE,{ENTER,EXIT}_DATA,DECLARE}. * gfortran.dg/goacc/pr93463.f90: New test. --- gcc/fortran/ChangeLog | 9 +++++++++ gcc/fortran/openmp.c | 14 ++++++++++++++ gcc/testsuite/ChangeLog | 5 +++++ gcc/testsuite/gfortran.dg/goacc/pr93463.f90 | 15 +++++++++++++++ 4 files changed, 43 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr93463.f90 diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog index 0f5b7e60a50..f31e052b10a 100644 --- a/gcc/fortran/ChangeLog +++ b/gcc/fortran/ChangeLog @@ -1,3 +1,12 @@ +2020-02-13 Jakub Jelinek + + Backported from mainline + 2020-01-29 Jakub Jelinek + + PR fortran/93463 + * openmp.c (oacc_code_to_statement): Handle + EXEC_OACC_{ROUTINE,UPDATE,WAIT,CACHE,{ENTER,EXIT}_DATA,DECLARE}. + 2020-02-03 Tobias Burnus =20 Backported from mainline diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 716dd5ec3e2..83b1c4487de 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -5858,6 +5858,20 @@ oacc_code_to_statement (gfc_code *code) return ST_OACC_LOOP; case EXEC_OACC_ATOMIC: return ST_OACC_ATOMIC; + case EXEC_OACC_ROUTINE: + return ST_OACC_ROUTINE; + case EXEC_OACC_UPDATE: + return ST_OACC_UPDATE; + case EXEC_OACC_WAIT: + return ST_OACC_WAIT; + case EXEC_OACC_CACHE: + return ST_OACC_CACHE; + case EXEC_OACC_ENTER_DATA: + return ST_OACC_ENTER_DATA; + case EXEC_OACC_EXIT_DATA: + return ST_OACC_EXIT_DATA; + case EXEC_OACC_DECLARE: + return ST_OACC_DECLARE; default: gcc_unreachable (); } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 532f8dbef6c..b5165efbc35 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,11 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-01-29 Jakub Jelinek + + PR fortran/93463 + * gfortran.dg/goacc/pr93463.f90: New test. + 2020-01-28 Jakub Jelinek =20 PR target/93418 diff --git a/gcc/testsuite/gfortran.dg/goacc/pr93463.f90 b/gcc/testsuite/gf= ortran.dg/goacc/pr93463.f90 new file mode 100644 index 00000000000..920892fdcda --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/pr93463.f90 @@ -0,0 +1,15 @@ +! PR fortran/93463 +! { dg-do compile { target fopenmp } } +! { dg-additional-options "-fopenmp" } + +program pr93463 + integer :: i, x, y, z + !$omp parallel do + do i =3D 1, 4 + !$acc enter data create(x) ! { dg-error "ACC ENTER DATA directive ca= nnot be specified within" } + !$acc exit data copyout(x) ! { dg-error "ACC EXIT DATA directive can= not be specified within" } + !$acc cache(y) ! { dg-error "ACC CACHE directive cannot be specifi= ed within" } + !$acc wait(1) ! { dg-error "ACC WAIT directive cannot be specified= within" } + !$acc update self(z) ! { dg-error "ACC UPDATE directive cannot be s= pecified within" } + end do +end --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6315-g8d7c0bf876fa784101f9ad9e3bba82cc065357da Content-Transfer-Encoding: quoted-printable Content-length: 3508 =46rom 4b124e3c9c35121969cc23d0aea4bcb2c406fd21 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 29 Jan 2020 09:41:42 +0100 Subject: [PATCH 04/15] openmp: c++: Consider typeinfo decls to be predetermined shared [PR91118] If the typeinfo decls appear in OpenMP default(none) regions, as we no long= er predetermine const with no mutable members, they are diagnosed as errors, but it isn't something the users can actually provide explicit sharing for = in the clauses. 2020-01-29 Jakub Jelinek PR c++/91118 * cp-gimplify.c (cxx_omp_predetermined_sharing): Return OMP_CLAUSE_DEFAULT_SHARED for typeinfo decls. * g++.dg/gomp/pr91118-1.C: New test. * g++.dg/gomp/pr91118-2.C: New test. --- gcc/cp/ChangeLog | 9 +++++++++ gcc/cp/cp-gimplify.c | 4 ++++ gcc/testsuite/ChangeLog | 4 ++++ gcc/testsuite/g++.dg/gomp/pr91118-1.C | 12 ++++++++++++ gcc/testsuite/g++.dg/gomp/pr91118-2.C | 14 ++++++++++++++ 5 files changed, 43 insertions(+) create mode 100644 gcc/testsuite/g++.dg/gomp/pr91118-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/pr91118-2.C diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog index 64ca338029b..31dee033f6e 100644 --- a/gcc/cp/ChangeLog +++ b/gcc/cp/ChangeLog @@ -1,3 +1,12 @@ +2020-02-13 Jakub Jelinek + + Backported from mainline + 2020-01-29 Jakub Jelinek + + PR c++/91118 + * cp-gimplify.c (cxx_omp_predetermined_sharing): Return + OMP_CLAUSE_DEFAULT_SHARED for typeinfo decls. + 2020-01-28 Jason Merrill =20 PR c++/90546 diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index a7121b70a3b..90a315003d4 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -2107,6 +2107,10 @@ cxx_omp_predetermined_sharing (tree decl) && DECL_OMP_PRIVATIZED_MEMBER (decl))) return OMP_CLAUSE_DEFAULT_SHARED; =20 + /* Similarly for typeinfo symbols. */ + if (VAR_P (decl) && DECL_ARTIFICIAL (decl) && DECL_TINFO_P (decl)) + return OMP_CLAUSE_DEFAULT_SHARED; + return OMP_CLAUSE_DEFAULT_UNSPECIFIED; } =20 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index b5165efbc35..62df3f97f49 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -3,6 +3,10 @@ Backported from mainline 2020-01-29 Jakub Jelinek =20 + PR c++/91118 + * g++.dg/gomp/pr91118-1.C: New test. + * g++.dg/gomp/pr91118-2.C: New test. + PR fortran/93463 * gfortran.dg/goacc/pr93463.f90: New test. =20 diff --git a/gcc/testsuite/g++.dg/gomp/pr91118-1.C b/gcc/testsuite/g++.dg/g= omp/pr91118-1.C new file mode 100644 index 00000000000..f29d69db084 --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/pr91118-1.C @@ -0,0 +1,12 @@ +// PR c++/91118 +// { dg-do compile } +// { dg-additional-options "-fsanitize=3Dundefined" } + +#include + +void +foo () +{ +#pragma omp parallel default(none) shared(std::cerr) + std::cerr << "hello" << std::endl; +} diff --git a/gcc/testsuite/g++.dg/gomp/pr91118-2.C b/gcc/testsuite/g++.dg/g= omp/pr91118-2.C new file mode 100644 index 00000000000..80f1e3e45c4 --- /dev/null +++ b/gcc/testsuite/g++.dg/gomp/pr91118-2.C @@ -0,0 +1,14 @@ +// PR c++/91118 +// { dg-do compile } + +#include + +struct S { virtual ~S (); }; +void bar (const std::type_info &, const std::type_info &); + +void +foo (S *p) +{ + #pragma omp parallel default (none) firstprivate (p) + bar (typeid (*p), typeid (S)); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6358-g56b92750f83724177d2c6eae30c208e935a56a37 Content-Transfer-Encoding: quoted-printable Content-length: 3570 =46rom 329475795c6eeaa2b122672091c9119b9d6c5564 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 30 Jan 2020 21:28:17 +0100 Subject: [PATCH 05/15] combine: Punt on out of range rotate counts [PR93505] What happens on this testcase is with the out of bounds rotate we get: Trying 13 -> 16: 13: r129:SI=3Dr132:DI#0<-<0x20 REG_DEAD r132:DI 16: r123:DI=3Dr129:SI<0 REG_DEAD r129:SI Successfully matched this instruction: (set (reg/v:DI 123 [ ]) (const_int 0 [0])) during combine. So, perhaps we could also change simplify-rtx.c to punt if it is out of bounds rather than trying to optimize anything. Or, but probably GCC11 material, if we decide that ROTATE/ROTATERT doesn't have out of bounds counts or introduce targetm.rotate_truncation_mask, we should truncate the argument instead of punting. Punting is better for backports though. 2020-01-30 Jakub Jelinek PR middle-end/93505 * combine.c (simplify_comparison) : Punt on out of range rotate counts. * gcc.c-torture/compile/pr93505.c: New test. --- gcc/ChangeLog | 6 ++++++ gcc/combine.c | 3 ++- gcc/testsuite/ChangeLog | 5 +++++ gcc/testsuite/gcc.c-torture/compile/pr93505.c | 15 +++++++++++++++ 4 files changed, 28 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr93505.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ca09488b59d..0d8e888ec3a 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,12 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-01-30 Jakub Jelinek + + PR middle-end/93505 + * combine.c (simplify_comparison) : Punt on out of range + rotate counts. + 2020-01-28 Jakub Jelinek =20 PR target/93418 diff --git a/gcc/combine.c b/gcc/combine.c index 4de759a8e6b..601943d6bb0 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -12424,7 +12424,8 @@ simplify_comparison (enum rtx_code code, rtx *pop0,= rtx *pop1) bit. This will be converted into a ZERO_EXTRACT. */ if (const_op =3D=3D 0 && sign_bit_comparison_p && CONST_INT_P (XEXP (op0, 1)) - && mode_width <=3D HOST_BITS_PER_WIDE_INT) + && mode_width <=3D HOST_BITS_PER_WIDE_INT + && UINTVAL (XEXP (op0, 1)) < mode_width) { op0 =3D simplify_and_const_int (NULL_RTX, mode, XEXP (op0, 0), (HOST_WIDE_INT_1U diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 62df3f97f49..384f34a41ca 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,11 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-01-30 Jakub Jelinek + + PR middle-end/93505 + * gcc.c-torture/compile/pr93505.c: New test. + 2020-01-29 Jakub Jelinek =20 PR c++/91118 diff --git a/gcc/testsuite/gcc.c-torture/compile/pr93505.c b/gcc/testsuite/= gcc.c-torture/compile/pr93505.c new file mode 100644 index 00000000000..0627962eae5 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr93505.c @@ -0,0 +1,15 @@ +/* PR middle-end/93505 */ + +unsigned a; + +unsigned +foo (unsigned x) +{ + unsigned int y =3D 32 - __builtin_bswap64 (-a); + /* This would be UB (x << 32) at runtime. Ensure we don't + invoke UB in the compiler because of that (visible with + bootstrap-ubsan). */ + x =3D x << y | x >> (-y & 31); + x >>=3D 31; + return x; +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6444-gb843bcb89519293404bb00d2ed09aae529b54d7f Content-Transfer-Encoding: quoted-printable Content-length: 4811 =46rom d42f9eaa3e189d4228a4b3a63d02b83fed6385e7 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 5 Feb 2020 11:32:37 +0100 Subject: [PATCH 06/15] openmp: Avoid ICEs with declare simd; declare simd inbranch [PR93555] The testcases ICE because when processing the declare simd inbranch, we don't create the i =3D=3D 0 clone as it already exists, which means clone_info->nargs is not adjusted, but we then rely on it being adjusted when trying other clones. 2020-02-05 Jakub Jelinek PR middle-end/93555 * omp-simd-clone.c (expand_simd_clones): If simd_clone_mangle or simd_clone_create failed when i =3D=3D 0, adjust clone->nargs by clone->inbranch. * c-c++-common/gomp/pr93555-1.c: New test. * c-c++-common/gomp/pr93555-2.c: New test. * gfortran.dg/gomp/pr93555.f90: New test. --- gcc/ChangeLog | 7 +++++++ gcc/omp-simd-clone.c | 12 ++++++++++-- gcc/testsuite/ChangeLog | 7 +++++++ gcc/testsuite/c-c++-common/gomp/pr93555-1.c | 18 ++++++++++++++++++ gcc/testsuite/c-c++-common/gomp/pr93555-2.c | 16 ++++++++++++++++ gcc/testsuite/gfortran.dg/gomp/pr93555.f90 | 11 +++++++++++ 6 files changed, 69 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/pr93555-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/pr93555-2.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/pr93555.f90 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 0d8e888ec3a..a4435086f02 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,13 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-05 Jakub Jelinek + + PR middle-end/93555 + * omp-simd-clone.c (expand_simd_clones): If simd_clone_mangle or + simd_clone_create failed when i =3D=3D 0, adjust clone->nargs by + clone->inbranch. + 2020-01-30 Jakub Jelinek =20 PR middle-end/93505 diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c index 845443efc59..e865828f569 100644 --- a/gcc/omp-simd-clone.c +++ b/gcc/omp-simd-clone.c @@ -1703,14 +1703,22 @@ expand_simd_clones (struct cgraph_node *node) already. */ tree id =3D simd_clone_mangle (node, clone); if (id =3D=3D NULL_TREE) - continue; + { + if (i =3D=3D 0) + clone->nargs +=3D clone->inbranch; + continue; + } =20 /* Only when we are sure we want to create the clone actually clone the function (or definitions) or create another extern FUNCTION_DECL (for prototypes without definitions). */ struct cgraph_node *n =3D simd_clone_create (node); if (n =3D=3D NULL) - continue; + { + if (i =3D=3D 0) + clone->nargs +=3D clone->inbranch; + continue; + } =20 n->simdclone =3D clone; clone->origin =3D node; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 384f34a41ca..96390af9e12 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,13 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-05 Jakub Jelinek + + PR middle-end/93555 + * c-c++-common/gomp/pr93555-1.c: New test. + * c-c++-common/gomp/pr93555-2.c: New test. + * gfortran.dg/gomp/pr93555.f90: New test. + 2020-01-30 Jakub Jelinek =20 PR middle-end/93505 diff --git a/gcc/testsuite/c-c++-common/gomp/pr93555-1.c b/gcc/testsuite/c-= c++-common/gomp/pr93555-1.c new file mode 100644 index 00000000000..2eb76a2d9de --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/pr93555-1.c @@ -0,0 +1,18 @@ +/* PR middle-end/93555 */ +/* { dg-do compile } */ + +#pragma omp declare simd +#pragma omp declare simd inbranch +int +foo (int x) +{ + return x; +} + +#pragma omp declare simd inbranch +#pragma omp declare simd +int +bar (int x) +{ + return x; +} diff --git a/gcc/testsuite/c-c++-common/gomp/pr93555-2.c b/gcc/testsuite/c-= c++-common/gomp/pr93555-2.c new file mode 100644 index 00000000000..091f5bd5ff1 --- /dev/null +++ b/gcc/testsuite/c-c++-common/gomp/pr93555-2.c @@ -0,0 +1,16 @@ +/* PR middle-end/93555 */ +/* { dg-do compile } */ + +#pragma omp declare simd +#pragma omp declare simd inbranch +void +foo (void) +{ +} + +#pragma omp declare simd inbranch +#pragma omp declare simd +void +bar (void) +{ +} diff --git a/gcc/testsuite/gfortran.dg/gomp/pr93555.f90 b/gcc/testsuite/gfo= rtran.dg/gomp/pr93555.f90 new file mode 100644 index 00000000000..4a97fee07a7 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/pr93555.f90 @@ -0,0 +1,11 @@ +! PR middle-end/93555 +! { dg-do compile } + +subroutine foo + !$omp declare simd(foo) + !$omp declare simd(foo) inbranch +end +subroutine bar + !$omp declare simd(bar) inbranch + !$omp declare simd(bar) +end --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6460-g5a8ad97b6e4823d4ded00a3ce8d80e4bf93368d4 Content-Transfer-Encoding: quoted-printable Content-length: 3694 =46rom 520b364da0b20dcb492229757190cc3f30322052 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 5 Feb 2020 23:35:08 +0100 Subject: [PATCH 07/15] c++: Mark __builtin_convertvector operand as read [PR93557] In C++ we weren't calling mark_exp_read on the __builtin_convertvector first argument. I guess it could misbehave even with lambda implicit captures. Fixed by calling decay_conversion on the argument, we use the argument as rvalue so we want the standard lvalue to rvalue conversions, but as the argument must be a vector type, e.g. integral promotions aren't really needed. 2020-02-05 Jakub Jelinek PR c++/93557 * semantics.c (cp_build_vec_convert): Call decay_conversion on arg prior to passing it to c_build_vec_convert. * c-c++-common/Wunused-var-17.c: New test. --- gcc/cp/ChangeLog | 6 ++++++ gcc/cp/semantics.c | 3 ++- gcc/testsuite/ChangeLog | 3 +++ gcc/testsuite/c-c++-common/Wunused-var-17.c | 19 +++++++++++++++++++ 4 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/c-c++-common/Wunused-var-17.c diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog index 31dee033f6e..d9bb3b5d75b 100644 --- a/gcc/cp/ChangeLog +++ b/gcc/cp/ChangeLog @@ -1,6 +1,12 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-05 Jakub Jelinek + + PR c++/93557 + * semantics.c (cp_build_vec_convert): Call decay_conversion on arg + prior to passing it to c_build_vec_convert. + 2020-01-29 Jakub Jelinek =20 PR c++/91118 diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 1bd014696fc..0c727eaf2e7 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -10036,7 +10036,8 @@ cp_build_vec_convert (tree arg, location_t loc, tre= e type, =20 tree ret =3D NULL_TREE; if (!type_dependent_expression_p (arg) && !dependent_type_p (type)) - ret =3D c_build_vec_convert (cp_expr_loc_or_loc (arg, input_location),= arg, + ret =3D c_build_vec_convert (cp_expr_loc_or_loc (arg, input_location), + decay_conversion (arg, complain), loc, type, (complain & tf_error) !=3D 0); =20 if (!processing_template_decl) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 96390af9e12..2cfc06f5605 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -3,6 +3,9 @@ Backported from mainline 2020-02-05 Jakub Jelinek =20 + PR c++/93557 + * c-c++-common/Wunused-var-17.c: New test. + PR middle-end/93555 * c-c++-common/gomp/pr93555-1.c: New test. * c-c++-common/gomp/pr93555-2.c: New test. diff --git a/gcc/testsuite/c-c++-common/Wunused-var-17.c b/gcc/testsuite/c-= c++-common/Wunused-var-17.c new file mode 100644 index 00000000000..ab995f8b674 --- /dev/null +++ b/gcc/testsuite/c-c++-common/Wunused-var-17.c @@ -0,0 +1,19 @@ +/* PR c++/93557 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -Wunused-but-set-variable" } */ + +typedef int VI __attribute__((vector_size (sizeof (int) * 4))); +typedef float VF __attribute__((vector_size (sizeof (float) * 4))); + +void +foo (VI *p, VF *q) +{ + VI a =3D (VI) { 1, 2, 3, 4 }; /* { dg-bogus "set but not used" } */ + q[0] =3D __builtin_convertvector (a, VF); + VI b =3D p[1]; /* { dg-bogus "set but not used" } */ + q[1] =3D __builtin_convertvector (b, VF); + VF c =3D (VF) { 5.0f, 6.0f, 7.0f, 8.0f }; /* { dg-bogus "set but not use= d" } */ + p[2] =3D __builtin_convertvector (c, VI); + VF d =3D q[3]; /* { dg-bogus "set but not used" } */ + p[3] =3D __builtin_convertvector (d, VI); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6470-gcf785618ecc90e3f063b99572de48cb91aa5ab5d Content-Transfer-Encoding: quoted-printable Content-length: 2093 =46rom d3266b1311723841ec553277f1fb6bfddef8809d Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 6 Feb 2020 09:15:13 +0100 Subject: [PATCH 08/15] openmp: Notice reduction decl in outer contexts after adding it to shared [PR93515] If we call omp_add_variable, following omp_notice_variable will already fin= d it on that construct and not go through outer constructs, the following patch = fixes that. Note, this still doesn't follow OpenMP 5.0 semantics on target combined wit= h other constructs with reduction/lastprivate/linear clauses, will handle that for = GCC11. 2020-02-06 Jakub Jelinek PR libgomp/93515 * gimplify.c (gimplify_scan_omp_clauses) : If adding shared clause, call omp_notice_variable on outer context if any. --- gcc/ChangeLog | 6 ++++++ gcc/gimplify.c | 10 +++++++--- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a4435086f02..bc199e5956f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,12 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-06 Jakub Jelinek + + PR libgomp/93515 + * gimplify.c (gimplify_scan_omp_clauses) : If adding + shared clause, call omp_notice_variable on outer context if any. + 2020-02-05 Jakub Jelinek =20 PR middle-end/93555 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index a0cb6c402bc..c57113cda1d 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -9081,9 +9081,13 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq = *pre_p, =3D=3D POINTER_TYPE)))) omp_firstprivatize_variable (outer_ctx, decl); else - omp_add_variable (outer_ctx, decl, - GOVD_SEEN | GOVD_SHARED); - omp_notice_variable (outer_ctx, decl, true); + { + omp_add_variable (outer_ctx, decl, + GOVD_SEEN | GOVD_SHARED); + if (outer_ctx->outer_context) + omp_notice_variable (outer_ctx->outer_context, decl, + true); + } } } if (outer_ctx) --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6471-gcb3f06480a17f98579704b9927632627a3814c5c Content-Transfer-Encoding: quoted-printable Content-length: 5167 =46rom 05fa0de35ec63db2c3aacd30cc34a7389b3c4e5d Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 6 Feb 2020 09:19:08 +0100 Subject: [PATCH 09/15] openmp: Fix handling of non-addressable shared scala= rs in parallel nested inside of target [PR93515] As the following testcase shows, we need to consider even target to be a co= nstruct that forces not to use copy in/out for shared on parallel inside of the tar= get. E.g. for parallel nested inside another parallel or host teams, we already = avoid copy in/out and we need to treat target the same. 2020-02-06 Jakub Jelinek PR libgomp/93515 * omp-low.c (use_pointer_for_field): For nested constructs, also look for map clauses on target construct. (scan_omp_1_stmt) : Bump temporarily taskreg_nesting_level. * testsuite/libgomp.c-c++-common/pr93515.c: New test. --- gcc/ChangeLog | 6 ++++ gcc/omp-low.c | 33 +++++++++++++---- libgomp/ChangeLog | 8 +++++ .../testsuite/libgomp.c-c++-common/pr93515.c | 36 +++++++++++++++++++ 4 files changed, 76 insertions(+), 7 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c-c++-common/pr93515.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index bc199e5956f..e9ce10c2870 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -3,6 +3,12 @@ Backported from mainline 2020-02-06 Jakub Jelinek =20 + PR libgomp/93515 + * omp-low.c (use_pointer_for_field): For nested constructs, also + look for map clauses on target construct. + (scan_omp_1_stmt) : Bump temporarily + taskreg_nesting_level. + PR libgomp/93515 * gimplify.c (gimplify_scan_omp_clauses) : If adding shared clause, call omp_notice_variable on outer context if any. diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 81280296a24..813cefd69b9 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -444,18 +444,30 @@ use_pointer_for_field (tree decl, omp_context *shared= _ctx) omp_context *up; =20 for (up =3D shared_ctx->outer; up; up =3D up->outer) - if (is_taskreg_ctx (up) && maybe_lookup_decl (decl, up)) + if ((is_taskreg_ctx (up) + || (gimple_code (up->stmt) =3D=3D GIMPLE_OMP_TARGET + && is_gimple_omp_offloaded (up->stmt))) + && maybe_lookup_decl (decl, up)) break; =20 if (up) { tree c; =20 - for (c =3D gimple_omp_taskreg_clauses (up->stmt); - c; c =3D OMP_CLAUSE_CHAIN (c)) - if (OMP_CLAUSE_CODE (c) =3D=3D OMP_CLAUSE_SHARED - && OMP_CLAUSE_DECL (c) =3D=3D decl) - break; + if (gimple_code (up->stmt) =3D=3D GIMPLE_OMP_TARGET) + { + for (c =3D gimple_omp_target_clauses (up->stmt); + c; c =3D OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) =3D=3D OMP_CLAUSE_MAP + && OMP_CLAUSE_DECL (c) =3D=3D decl) + break; + } + else + for (c =3D gimple_omp_taskreg_clauses (up->stmt); + c; c =3D OMP_CLAUSE_CHAIN (c)) + if (OMP_CLAUSE_CODE (c) =3D=3D OMP_CLAUSE_SHARED + && OMP_CLAUSE_DECL (c) =3D=3D decl) + break; =20 if (c) goto maybe_mark_addressable_and_ret; @@ -3348,7 +3360,14 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *ha= ndled_ops_p, break; =20 case GIMPLE_OMP_TARGET: - scan_omp_target (as_a (stmt), ctx); + if (is_gimple_omp_offloaded (stmt)) + { + taskreg_nesting_level++; + scan_omp_target (as_a (stmt), ctx); + taskreg_nesting_level--; + } + else + scan_omp_target (as_a (stmt), ctx); break; =20 case GIMPLE_OMP_TEAMS: diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog index 86af79021ed..b90fddf3ee2 100644 --- a/libgomp/ChangeLog +++ b/libgomp/ChangeLog @@ -1,3 +1,11 @@ +2020-02-13 Jakub Jelinek + + Backported from mainline + 2020-02-06 Jakub Jelinek + + PR libgomp/93515 + * testsuite/libgomp.c-c++-common/pr93515.c: New test. + 2020-01-22 Jakub Jelinek =20 Backported from mainline diff --git a/libgomp/testsuite/libgomp.c-c++-common/pr93515.c b/libgomp/tes= tsuite/libgomp.c-c++-common/pr93515.c new file mode 100644 index 00000000000..8a69088ccec --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/pr93515.c @@ -0,0 +1,36 @@ +/* PR libgomp/93515 */ + +#include +#include + +int +main () +{ + int i; + int a =3D 42; +#pragma omp target teams distribute parallel for defaultmap(tofrom: scalar) + for (i =3D 0; i < 64; ++i) + if (omp_get_team_num () =3D=3D 0) + if (omp_get_thread_num () =3D=3D 0) + a =3D 142; + if (a !=3D 142) + __builtin_abort (); + a =3D 42; +#pragma omp target parallel for defaultmap(tofrom: scalar) + for (i =3D 0; i < 64; ++i) + if (omp_get_thread_num () =3D=3D 0) + a =3D 143; + if (a !=3D 143) + __builtin_abort (); + a =3D 42; +#pragma omp target firstprivate(a) + { + #pragma omp parallel for + for (i =3D 0; i < 64; ++i) + if (omp_get_thread_num () =3D=3D 0) + a =3D 144; + if (a !=3D 144) + abort (); + } + return 0; +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=utf-8 Content-Disposition: attachment; filename=r10-6522-g79ab8c4321b2dc940bb706a7432a530e26f0df1a Content-Transfer-Encoding: quoted-printable Content-length: 5678 =46rom a91e5d88970c8d865a49f2a4ed4e17ee2c58b73f Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Sat, 8 Feb 2020 10:59:40 +0100 Subject: [PATCH 10/15] i386: Make xmm16-xmm31 call used even in ms ABI [PR65782] MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit On Tue, Feb 04, 2020 at 11:16:06AM +0100, Uros Bizjak wrote: > I guess that Comment #9 patch form the PR should be trivially correct, > but althouhg it looks obvious, I don't want to propose the patch since > I have no means of testing it. I don't have means of testing it either. https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=3Dvs= -2019 is quite explicit that [xyz]mm16-31 are call clobbered and only xmm6-15 (low 128-bits only) are call preserved. We are talking e.g. about /* { dg-options "-O2 -mabi=3Dms -mavx512vl" } */ typedef double V __attribute__((vector_size (16))); void foo (void); V bar (void); void baz (V); void qux (void) { V c; { register V a __asm ("xmm18"); V b =3D bar (); asm ("" : "=3Dx" (a) : "0" (b)); c =3D a; } foo (); { register V d __asm ("xmm18"); V e; d =3D c; asm ("" : "=3Dx" (e) : "0" (d)); baz (e); } } where according to the MSDN doc gcc incorrectly holds the c value in xmm18 register across the foo call; if foo is compiled by some Microsoft compiler (or LLVM), then it could clobber %xmm18. If all xmm18 occurrences are changed to say xmm15, then it is valid to hold the 128-bit value across the foo call (though, surprisingly, LLVM saves it into stack anyway). The other parts are I guess mainly about SEH. Consider e.g. void foo (void) { register double x __asm ("xmm14"); register double y __asm ("xmm18"); asm ("" : "=3Dx" (x)); asm ("" : "=3Dv" (y)); x +=3D y; y +=3D x; asm ("" : : "x" (x)); asm ("" : : "v" (y)); } looking at cross-compiler output, with -O2 -mavx512f this emits .file "abcdeq.c" .text .align 16 .globl foo .def foo; .scl 2; .type 32; .endef .seh_proc foo foo: subq $40, %rsp .seh_stackalloc 40 vmovaps %xmm14, (%rsp) .seh_savexmm %xmm14, 0 vmovaps %xmm18, 16(%rsp) .seh_savexmm %xmm18, 16 .seh_endprologue vaddsd %xmm18, %xmm14, %xmm14 vaddsd %xmm18, %xmm14, %xmm18 vmovaps (%rsp), %xmm14 vmovaps 16(%rsp), %xmm18 addq $40, %rsp ret .seh_endproc .ident "GCC: (GNU) 10.0.1 20200207 (experimental)" Does whatever assembler mingw64 uses even assemble this (I mean the .seh_savexmm %xmm16, 16 could be problematic)? I can find e.g. https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savex= mm-in-cygwin/43210527 which then links to https://gcc.gnu.org/PR65782 2020-02-08 Uro=C5=A1 Bizjak Jakub Jelinek PR target/65782 * config/i386/i386.h (CALL_USED_REGISTERS): Make xmm16-xmm31 call-used even in 64-bit ms-abi. * gcc.target/i386/pr65782.c: New test. Co-authored-by: Uro=C5=A1 Bizjak --- gcc/ChangeLog | 7 +++++++ gcc/config/i386/i386.h | 4 ++-- gcc/testsuite/ChangeLog | 6 ++++++ gcc/testsuite/gcc.target/i386/pr65782.c | 16 ++++++++++++++++ 4 files changed, 31 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr65782.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e9ce10c2870..72c8ee6bd67 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,13 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-08 Uro=C5=A1 Bizjak + Jakub Jelinek + + PR target/65782 + * config/i386/i386.h (CALL_USED_REGISTERS): Make + xmm16-xmm31 call-used even in 64-bit ms-abi. + 2020-02-06 Jakub Jelinek =20 PR libgomp/93515 diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 95e1733f12a..14e5a392f62 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -1082,9 +1082,9 @@ extern const char *host_detect_local_cpu (int argc, c= onst char **argv); /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/ \ 6, 6, 6, 6, 6, 6, 6, 6, \ /*xmm16,xmm17,xmm18,xmm19,xmm20,xmm21,xmm22,xmm23*/ \ - 6, 6, 6, 6, 6, 6, 6, 6, \ + 1, 1, 1, 1, 1, 1, 1, 1, \ /*xmm24,xmm25,xmm26,xmm27,xmm28,xmm29,xmm30,xmm31*/ \ - 6, 6, 6, 6, 6, 6, 6, 6, \ + 1, 1, 1, 1, 1, 1, 1, 1, \ /* k0, k1, k2, k3, k4, k5, k6, k7*/ \ 1, 1, 1, 1, 1, 1, 1, 1 } =20 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 2cfc06f5605..c7b8e6a585a 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,12 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-08 Uro=C5=A1 Bizjak + Jakub Jelinek + + PR target/65782 + * gcc.target/i386/pr65782.c: New test. + 2020-02-05 Jakub Jelinek =20 PR c++/93557 diff --git a/gcc/testsuite/gcc.target/i386/pr65782.c b/gcc/testsuite/gcc.ta= rget/i386/pr65782.c new file mode 100644 index 00000000000..298dca1be97 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr65782.c @@ -0,0 +1,16 @@ +/* PR target/65782 */ +/* { dg-do assemble { target { avx512vl && { ! ia32 } } } } */ +/* { dg-options "-O2 -mavx512vl" } */ + +void +foo (void) +{ + register double x __asm ("xmm14"); + register double y __asm ("xmm18"); + asm ("" : "=3Dx" (x)); + asm ("" : "=3Dv" (y)); + x +=3D y; + y +=3D x; + asm ("" : : "x" (x)); + asm ("" : : "v" (y)); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=utf-8 Content-Disposition: attachment; filename=r10-6565-gf57aa9503ff170ff6c8549718bd736f6c8168bab Content-Transfer-Encoding: quoted-printable Content-length: 4690 =46rom b7cbce7a174292adc7c9d6db81bba6922a591d69 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Mon, 10 Feb 2020 22:44:40 +0100 Subject: [PATCH 11/15] i386: Fix -mavx -mno-mavx2 ICE with VEC_COND_EXPR [PR93637] As mentioned in the PR, for -mavx -mno-avx2 the backend does support vcondv4div4df and vcondv8siv8sf optabs (while generally 32-byte vectors aren't much supported in that case, it is performed using vandps/vandnps/vorps). The problem is that after the last generic vector lowering (where the VEC_COND_EXPR still compares two V4DF vectors and has two V4DI last operands and V4DI result and so is considered ok) fre4 folds the condition into constant, at which point the middle-end during expansion will try vcond_mask_optab and fall back to trying to expand it as the constant vector < 0 vcondv4div4di, but neither of them is supported for -mavx -mno-avx2 and thus we ICE. So, the options I see is either what the following patch does, also support vcond_mask_v4div4di and vcond_mask_v4siv4si already for TARGET_AVX, or require for vcondv4div4df and vcondv8siv8sf TARGET_AVX2 rather than current TARGET_AVX. 2020-02-10 Jakub Jelinek PR target/93637 * config/i386/sse.md (VI_256_AVX2): New mode iterator. (vcond_mask_): Use it instead of VI_256. Change condition from TARGET_AVX2 to TARGET_AVX. * gcc.target/i386/avx-pr93637.c: New test. --- gcc/ChangeLog | 7 +++++++ gcc/config/i386/sse.md | 16 +++++++++++----- gcc/testsuite/ChangeLog | 5 +++++ gcc/testsuite/gcc.target/i386/avx-pr93637.c | 17 +++++++++++++++++ 4 files changed, 40 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx-pr93637.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 72c8ee6bd67..34091354380 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,13 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-10 Jakub Jelinek + + PR target/93637 + * config/i386/sse.md (VI_256_AVX2): New mode iterator. + (vcond_mask_): Use it instead of VI_256. + Change condition from TARGET_AVX2 to TARGET_AVX. + 2020-02-08 Uro=C5=A1 Bizjak Jakub Jelinek =20 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 659cbff22f3..c0fe0eefd50 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -3180,13 +3180,19 @@ (match_operand: 3 "register_operand")))] "TARGET_AVX512BW") =20 +;; As vcondv4div4df and vcondv8siv8sf are enabled already with TARGET_AVX, +;; and their condition can be folded late into a constant, we need to +;; support vcond_mask_v4div4di and vcond_mask_v8siv8si for TARGET_AVX. +(define_mode_iterator VI_256_AVX2 [(V32QI "TARGET_AVX2") (V16HI "TARGET_AV= X2") + V8SI V4DI]) + (define_expand "vcond_mask_" - [(set (match_operand:VI_256 0 "register_operand") - (vec_merge:VI_256 - (match_operand:VI_256 1 "nonimmediate_operand") - (match_operand:VI_256 2 "nonimm_or_0_operand") + [(set (match_operand:VI_256_AVX2 0 "register_operand") + (vec_merge:VI_256_AVX2 + (match_operand:VI_256_AVX2 1 "nonimmediate_operand") + (match_operand:VI_256_AVX2 2 "nonimm_or_0_operand") (match_operand: 3 "register_operand")))] - "TARGET_AVX2" + "TARGET_AVX" { ix86_expand_sse_movcc (operands[0], operands[3], operands[1], operands[2]); diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index c7b8e6a585a..9ec4d50dac3 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,11 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-10 Jakub Jelinek + + PR target/93637 + * gcc.target/i386/avx-pr93637.c: New test. + 2020-02-08 Uro=C5=A1 Bizjak Jakub Jelinek =20 diff --git a/gcc/testsuite/gcc.target/i386/avx-pr93637.c b/gcc/testsuite/gc= c.target/i386/avx-pr93637.c new file mode 100644 index 00000000000..9e7a0a7c9c1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-pr93637.c @@ -0,0 +1,17 @@ +/* PR target/93637 */ +/* { dg-do compile } */ +/* { dg-options "-mavx -mno-avx2 -O3 --param sccvn-max-alias-queries-per-a= ccess=3D3" } */ + +double +foo (void) +{ + int i; + double r =3D 7.0; + double a[] =3D { 0.0, 0.0, -0.0, 0.0, 0.0, -0.0, 1.0, 0.0, 0.0, -0.0, 1.= 0, 0.0, 1.0, 1.0 }; + + for (i =3D 0; i < sizeof (a) / sizeof (a[0]); ++i) + if (a[i] =3D=3D 0.0) + r =3D a[i]; + + return r; +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6593-g62fc0a6ce28c502fc6a7b7c09157840bf98f945f Content-Transfer-Encoding: quoted-printable Content-length: 8120 =46rom 20ac13c895c5abe7a350de0b664abf190aa28a16 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Wed, 12 Feb 2020 11:58:35 +0100 Subject: [PATCH 12/15] i386: Fix up vec_extract_lo* patterns [PR93670] The VEXTRACT* insns have way too many different CPUID feature flags (ATT syntax) vextractf128 $imm, %ymm, %xmm/mem AVX vextracti128 $imm, %ymm, %xmm/mem AVX2 vextract{f,i}32x4 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512F vextract{f,i}32x4 $imm, %zmm, %xmm/mem {k}{z} AVX512F vextract{f,i}64x2 $imm, %ymm, %xmm/mem {k}{z} AVX512VL+AVX512DQ vextract{f,i}64x2 $imm, %zmm, %xmm/mem {k}{z} AVX512DQ vextract{f,i}32x8 $imm, %zmm, %ymm/mem {k}{z} AVX512DQ vextract{f,i}64x4 $imm, %zmm, %ymm/mem {k}{z} AVX512F As the testcase shows and the patch too, we didn't get it right in all cases. The first hunk is about avx512vl_vextractf128v8s[if] incorrectly requiring TARGET_AVX512DQ. The corresponding insn is the first vextract{f,i}32x4 above, so it requires VL+F, and the builtins have it correct (TARGET_AVX512VL implies TARGET_AVX512F): BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8sf, "__= builtin_ia32_extractf32x4_256_mask", IX86_BUILTIN_EXTRACTF32X4_256, UNKNOWN= , (int) V4SF_FTYPE_V8SF_INT_V4SF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_avx512vl_vextractf128v8si, "__= builtin_ia32_extracti32x4_256_mask", IX86_BUILTIN_EXTRACTI32X4_256, UNKNOWN= , (int) V4SI_FTYPE_V8SI_INT_V4SI_UQI) We only need TARGET_AVX512DQ for avx512vl_vextractf128v4d[if]. The second hunk is about vec_extract_lo_v16s[if]{,_mask}. These are using the vextract{f,i}32x8 insns (AVX512DQ above), but we weren't requiring that, but instead incorrectly && 1 for non-masked and && (64 =3D=3D 64 && TARGET_= AVX512VL) for masked insns. This is extraction from ZMM, so it doesn't need VL for anything. The hunk actually only requires TARGET_AVX512DQ when the insn is masked, if it is not masked, when TARGET_AVX512DQ isn't available we can use vextract{f,i}64x4 instead which is available already in TARGET_AVX512F and does the same thing, extracts the low 256 bits from 512 bits vector (often we split it into just nothing, but there are some special cases like when using xmm16+ when we can't without AVX512VL). The last hunk is about vec_extract_lo_v8s[if]{,_mask}. The non-_mask suffixed ones are ok already and just split into nothing (lowpart subreg). The masked ones were incorrectly requiring TARGET_AVX512VL and TARGET_AVX512DQ, when we only need TARGET_AVX512VL. 2020-02-12 Jakub Jelinek PR target/93670 * config/i386/sse.md (VI48F_256_DQ): New mode iterator. (avx512vl_vextractf128): Use it instead of VI48F_256. Remove TARGET_AVX512DQ from condition. (vec_extract_lo_): Use instead of in condition. If TARGET_AVX512DQ is false, emit vextract*64x4 instead of vextract*32x8. (vec_extract_lo_): Drop from condition. * gcc.target/i386/avx512vl-pr93670.c: New test. --- gcc/ChangeLog | 13 ++++ gcc/config/i386/sse.md | 18 +++-- gcc/testsuite/ChangeLog | 5 ++ .../gcc.target/i386/avx512vl-pr93670.c | 77 +++++++++++++++++++ 4 files changed, 108 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 34091354380..ed83af1fcd3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,6 +1,19 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-12 Jakub Jelinek + + PR target/93670 + * config/i386/sse.md (VI48F_256_DQ): New mode iterator. + (avx512vl_vextractf128): Use it instead of VI48F_256. Remove + TARGET_AVX512DQ from condition. + (vec_extract_lo_): Use + instead of in condition. If + TARGET_AVX512DQ is false, emit vextract*64x4 instead of + vextract*32x8. + (vec_extract_lo_): Drop + from condition. + 2020-02-10 Jakub Jelinek =20 PR target/93637 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c0fe0eefd50..043665f5e8b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8220,13 +8220,16 @@ (set_attr "prefix" "evex") (set_attr "mode" "")]) =20 +(define_mode_iterator VI48F_256_DQ + [V8SI V8SF (V4DI "TARGET_AVX512DQ") (V4DF "TARGET_AVX512DQ")]) + (define_expand "avx512vl_vextractf128" [(match_operand: 0 "nonimmediate_operand") - (match_operand:VI48F_256 1 "register_operand") + (match_operand:VI48F_256_DQ 1 "register_operand") (match_operand:SI 2 "const_0_to_1_operand") (match_operand: 3 "nonimm_or_0_operand") (match_operand:QI 4 "register_operand")] - "TARGET_AVX512DQ && TARGET_AVX512VL" + "TARGET_AVX512VL" { rtx (*insn)(rtx, rtx, rtx, rtx); rtx dest =3D operands[0]; @@ -8294,14 +8297,19 @@ (const_int 4) (const_int 5) (const_int 6) (const_int 7)])))] "TARGET_AVX512F - && + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if ( || (!TARGET_AVX512VL && !REG_P (operands[0]) && EXT_REX_SSE_REG_P (operands[1]))) - return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + { + if (TARGET_AVX512DQ) + return "vextract32x8\t{$0x0, %1, %0|%0, %1, 0x0}"; + else + return "vextract64x4\t{$0x0, %1, %0|%0, %1, 0x0}"; + } else return "#"; } @@ -8411,7 +8419,7 @@ (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3)])))] "TARGET_AVX - && && + && && ( || !(MEM_P (operands[0]) && MEM_P (operands[1])))" { if () diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9ec4d50dac3..5894d94ece7 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,6 +1,11 @@ 2020-02-13 Jakub Jelinek =20 Backported from mainline + 2020-02-12 Jakub Jelinek + + PR target/93670 + * gcc.target/i386/avx512vl-pr93670.c: New test. + 2020-02-10 Jakub Jelinek =20 PR target/93637 diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c b/gcc/testsui= te/gcc.target/i386/avx512vl-pr93670.c new file mode 100644 index 00000000000..3f232a96901 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-pr93670.c @@ -0,0 +1,77 @@ +/* PR target/93670 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512vl -mno-avx512dq" } */ + +#include + +__m128i +f1 (__m256i x) +{ + return _mm256_extracti32x4_epi32 (x, 0); +} + +__m128i +f2 (__m256i x, __m128i w, __mmask8 m) +{ + return _mm256_mask_extracti32x4_epi32 (w, m, x, 0); +} + +__m128i +f3 (__m256i x, __mmask8 m) +{ + return _mm256_maskz_extracti32x4_epi32 (m, x, 0); +} + +__m128 +f4 (__m256 x) +{ + return _mm256_extractf32x4_ps (x, 0); +} + +__m128 +f5 (__m256 x, __m128 w, __mmask8 m) +{ + return _mm256_mask_extractf32x4_ps (w, m, x, 0); +} + +__m128 +f6 (__m256 x, __mmask8 m) +{ + return _mm256_maskz_extractf32x4_ps (m, x, 0); +} + +__m128i +f7 (__m256i x) +{ + return _mm256_extracti32x4_epi32 (x, 1); +} + +__m128i +f8 (__m256i x, __m128i w, __mmask8 m) +{ + return _mm256_mask_extracti32x4_epi32 (w, m, x, 1); +} + +__m128i +f9 (__m256i x, __mmask8 m) +{ + return _mm256_maskz_extracti32x4_epi32 (m, x, 1); +} + +__m128 +f10 (__m256 x) +{ + return _mm256_extractf32x4_ps (x, 1); +} + +__m128 +f11 (__m256 x, __m128 w, __mmask8 m) +{ + return _mm256_mask_extractf32x4_ps (w, m, x, 1); +} + +__m128 +f12 (__m256 x, __mmask8 m) +{ + return _mm256_maskz_extractf32x4_ps (m, x, 1); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6612-gdc6d0f89d4be3ed7fde73417606a78c73d954cdf Content-Transfer-Encoding: quoted-printable Content-length: 7875 =46rom 488a947b2ddd57a6f44a6aecc32862f8cbf4ec77 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 13 Feb 2020 08:17:07 +0100 Subject: [PATCH 13/15] i386: Fix k*shift* intrinsics [PR93673] As mentioned in the PR, the intrinsics allow counts from 0 to 255, but we actually reject values from 128 to 255. That is because QImode CONST_INTs can be only -128 to 127. Fixed by using const_0_to_255_operand and dropping the modes for the operands with those predicates (the IL actually contains the CONST_INT which has VOIDmode). 2020-02-13 Jakub Jelinek PR target/93673 * config/i386/sse.md (k): Drop mode from last operand and use const_0_to_255_operand predicate instead of immediate_operand. (avx512dq_fpclass, avx512dq_vmfpclass, vgf2p8affineinvqb_, vgf2p8affineqb_): Drop mode from const_0_to_255_operand predicated operands. * gcc.target/i386/avx512f-pr93673.c: New test. * gcc.target/i386/avx512dq-pr93673.c: New test. * gcc.target/i386/avx512bw-pr93673.c: New test. --- gcc/ChangeLog | 9 ++++++ gcc/config/i386/sse.md | 10 +++---- gcc/testsuite/ChangeLog | 5 ++++ .../gcc.target/i386/avx512bw-pr93673.c | 30 +++++++++++++++++++ .../gcc.target/i386/avx512dq-pr93673.c | 20 +++++++++++++ .../gcc.target/i386/avx512f-pr93673.c | 20 +++++++++++++ 6 files changed, 89 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr93673.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512dq-pr93673.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr93673.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ed83af1fcd3..3c48d574029 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,14 @@ 2020-02-13 Jakub Jelinek =20 + PR target/93673 + * config/i386/sse.md (k): Drop mode from last operand and + use const_0_to_255_operand predicate instead of immediate_operand. + (avx512dq_fpclass, + avx512dq_vmfpclass, + vgf2p8affineinvqb_, + vgf2p8affineqb_): Drop mode from + const_0_to_255_operand predicated operands. + Backported from mainline 2020-02-12 Jakub Jelinek =20 diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 043665f5e8b..18cc39ae521 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -1610,7 +1610,7 @@ [(set (match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "=3Dk") (any_lshift:SWI1248_AVX512BWDQ (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k") - (match_operand:QI 2 "immediate_operand" "n"))) + (match_operand 2 "const_0_to_255_operand" "n"))) (unspec [(const_int 0)] UNSPEC_MASKOP)] "TARGET_AVX512F" "k\t{%2, %1, %0|%0, %1, %2}" @@ -21130,7 +21130,7 @@ [(set (match_operand: 0 "register_operand" "=3Dk") (unspec: [(match_operand:VF_AVX512VL 1 "register_operand" "v") - (match_operand:QI 2 "const_0_to_255_operand" "n")] + (match_operand 2 "const_0_to_255_operand" "n")] UNSPEC_FPCLASS))] "TARGET_AVX512DQ" "vfpclass\t{%2, %1, %0|%0, %1, %2}"; @@ -21144,7 +21144,7 @@ (and: (unspec: [(match_operand:VF_128 1 "register_operand" "v") - (match_operand:QI 2 "const_0_to_255_operand" "n")] + (match_operand 2 "const_0_to_255_operand" "n")] UNSPEC_FPCLASS) (const_int 1)))] "TARGET_AVX512DQ" @@ -21749,7 +21749,7 @@ (unspec:VI1_AVX512F [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v") (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm") - (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")] + (match_operand 3 "const_0_to_255_operand" "n,n,n")] UNSPEC_GF2P8AFFINEINV))] "TARGET_GFNI" "@ @@ -21767,7 +21767,7 @@ (unspec:VI1_AVX512F [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v") (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm") - (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")] + (match_operand 3 "const_0_to_255_operand" "n,n,n")] UNSPEC_GF2P8AFFINE))] "TARGET_GFNI" "@ diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 5894d94ece7..899255a84c6 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,10 @@ 2020-02-13 Jakub Jelinek =20 + PR target/93673 + * gcc.target/i386/avx512f-pr93673.c: New test. + * gcc.target/i386/avx512dq-pr93673.c: New test. + * gcc.target/i386/avx512bw-pr93673.c: New test. + Backported from mainline 2020-02-12 Jakub Jelinek =20 diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr93673.c b/gcc/testsui= te/gcc.target/i386/avx512bw-pr93673.c new file mode 100644 index 00000000000..dc87ed20d1d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr93673.c @@ -0,0 +1,30 @@ +/* PR target/93673 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512bw" } */ + +#include + +void +foo (__mmask32 *c, __mmask64 *d) +{ + c[0] =3D _kshiftli_mask32 (c[0], 0); + c[1] =3D _kshiftri_mask32 (c[1], 0); + c[2] =3D _kshiftli_mask32 (c[2], 1); + c[3] =3D _kshiftri_mask32 (c[3], 1); + c[4] =3D _kshiftli_mask32 (c[4], 31); + c[5] =3D _kshiftri_mask32 (c[5], 31); + c[6] =3D _kshiftli_mask32 (c[6], 0x7f); + c[7] =3D _kshiftri_mask32 (c[7], 0x7f); + c[8] =3D _kshiftli_mask32 (c[8], 0xff); + c[9] =3D _kshiftri_mask32 (c[9], 0xff); + d[0] =3D _kshiftli_mask64 (d[0], 0); + d[1] =3D _kshiftri_mask64 (d[1], 0); + d[2] =3D _kshiftli_mask64 (d[2], 1); + d[3] =3D _kshiftri_mask64 (d[3], 1); + d[4] =3D _kshiftli_mask64 (d[4], 63); + d[5] =3D _kshiftri_mask64 (d[5], 63); + d[6] =3D _kshiftli_mask64 (d[6], 0x7f); + d[7] =3D _kshiftri_mask64 (d[7], 0x7f); + d[8] =3D _kshiftli_mask64 (d[8], 0xff); + d[9] =3D _kshiftri_mask64 (d[9], 0xff); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-pr93673.c b/gcc/testsui= te/gcc.target/i386/avx512dq-pr93673.c new file mode 100644 index 00000000000..3ae1674e4a4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512dq-pr93673.c @@ -0,0 +1,20 @@ +/* PR target/93673 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512dq" } */ + +#include + +void +foo (__mmask8 *a) +{ + a[0] =3D _kshiftli_mask8 (a[0], 0); + a[1] =3D _kshiftri_mask8 (a[1], 0); + a[2] =3D _kshiftli_mask8 (a[2], 1); + a[3] =3D _kshiftri_mask8 (a[3], 1); + a[4] =3D _kshiftli_mask8 (a[4], 7); + a[5] =3D _kshiftri_mask8 (a[5], 7); + a[6] =3D _kshiftli_mask8 (a[6], 0x7f); + a[7] =3D _kshiftri_mask8 (a[7], 0x7f); + a[8] =3D _kshiftli_mask8 (a[8], 0xff); + a[9] =3D _kshiftri_mask8 (a[9], 0xff); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr93673.c b/gcc/testsuit= e/gcc.target/i386/avx512f-pr93673.c new file mode 100644 index 00000000000..963823c8a78 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr93673.c @@ -0,0 +1,20 @@ +/* PR target/93673 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512f" } */ + +#include + +void +foo (__mmask16 *b) +{ + b[0] =3D _kshiftli_mask16 (b[0], 0); + b[1] =3D _kshiftri_mask16 (b[1], 0); + b[2] =3D _kshiftli_mask16 (b[2], 1); + b[3] =3D _kshiftri_mask16 (b[3], 1); + b[4] =3D _kshiftli_mask16 (b[4], 15); + b[5] =3D _kshiftri_mask16 (b[5], 15); + b[6] =3D _kshiftli_mask16 (b[6], 0x7f); + b[7] =3D _kshiftri_mask16 (b[7], 0x7f); + b[8] =3D _kshiftli_mask16 (b[8], 0xff); + b[9] =3D _kshiftri_mask16 (b[9], 0xff); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6617-gae2b8ede40a81a83f50d1e705972bc46fafd4ce5 Content-Transfer-Encoding: quoted-printable Content-length: 23662 =46rom 08cf145f991327d943d785066709f5f39d20bd85 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 13 Feb 2020 10:43:27 +0100 Subject: [PATCH 14/15] i386: Fix up _mm*_mask_popcnt_epi* [PR93696] As mentioned in the PR and as https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=3D_mask_= popcnt_epi also documents, _mm*_popcnt_epi* intrinsics are consistent with all other unary AVX512* intrinsics regarding arguments, i.e. the _mm*_whatever has just single argument (called a in the docs, and __A in the GCC headers), _mm*_mask_whatever has 3 arguments (called src, k, a in the docs and _W, __U, __A in GCC headers) and _mm*_maskz_whatever 2 arguments (called k, a in the docs and __U, __A in GCC headers). Unfortunately, whomever implemented the _mm*_popcnt_epi* intrinsics got it wrong for the _mm*_mask_popcnt_epi* ones, calling the args __A, __U, __B and not passing them in the canonical order to the builtins, making it API incompatible with ICC as well as clang (tested on godbolts clang 7/8/9/trunk and ICC 19.0.{0,1}, older clang/ICC don't understand those, so it isn't that it used to be broken even in other compilers and got changed afterwards). 2020-02-13 Jakub Jelinek PR target/93696 * config/i386/avx512bitalgintrin.h (_mm512_mask_popcnt_epi8, _mm512_mask_popcnt_epi16, _mm256_mask_popcnt_epi8, _mm256_mask_popcnt_epi16, _mm_mask_popcnt_epi8, _mm_mask_popcnt_epi16): Rename __B argument to __A and __A to __W, pass __A to the builtin followed by __W instead of __A followed by __B. * config/i386/avx512vpopcntdqintrin.h (_mm512_mask_popcnt_epi32, _mm512_mask_popcnt_epi64): Likewise. * config/i386/avx512vpopcntdqvlintrin.h (_mm_mask_popcnt_epi32, _mm256_mask_popcnt_epi32, _mm_mask_popcnt_epi64, _mm256_mask_popcnt_epi64): Likewise. * gcc.target/i386/pr93696-1.c: New test. * gcc.target/i386/pr93696-2.c: New test. * gcc.target/i386/avx512bitalg-vpopcntw-1.c (TEST): Fix argument order of _mm*_mask_popcnt_*. * gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c (TEST): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c (TEST): Likewise. * gcc.target/i386/avx512bitalg-vpopcntb-1.c (TEST): Likewise. * gcc.target/i386/avx512bitalg-vpopcntb.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntbvl.c (foo): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntd.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntwvl.c (foo): Likewise. * gcc.target/i386/avx512bitalg-vpopcntw.c (foo): Likewise. * gcc.target/i386/avx512vpopcntdq-vpopcntq.c (foo): Likewise. --- gcc/ChangeLog | 13 +++ gcc/config/i386/avx512bitalgintrin.h | 24 +++--- gcc/config/i386/avx512vpopcntdqintrin.h | 8 +- gcc/config/i386/avx512vpopcntdqvlintrin.h | 17 ++-- gcc/testsuite/ChangeLog | 15 ++++ .../gcc.target/i386/avx512bitalg-vpopcntb-1.c | 2 +- .../gcc.target/i386/avx512bitalg-vpopcntb.c | 2 +- .../gcc.target/i386/avx512bitalg-vpopcntbvl.c | 4 +- .../gcc.target/i386/avx512bitalg-vpopcntw-1.c | 2 +- .../gcc.target/i386/avx512bitalg-vpopcntw.c | 2 +- .../gcc.target/i386/avx512bitalg-vpopcntwvl.c | 4 +- .../i386/avx512vpopcntdq-vpopcntd-1.c | 2 +- .../i386/avx512vpopcntdq-vpopcntd.c | 6 +- .../i386/avx512vpopcntdq-vpopcntq-1.c | 2 +- .../i386/avx512vpopcntdq-vpopcntq.c | 6 +- gcc/testsuite/gcc.target/i386/pr93696-1.c | 79 +++++++++++++++++++ gcc/testsuite/gcc.target/i386/pr93696-2.c | 79 +++++++++++++++++++ 17 files changed, 226 insertions(+), 41 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr93696-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr93696-2.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3c48d574029..b24a416a417 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,18 @@ 2020-02-13 Jakub Jelinek =20 + PR target/93696 + * config/i386/avx512bitalgintrin.h (_mm512_mask_popcnt_epi8, + _mm512_mask_popcnt_epi16, _mm256_mask_popcnt_epi8, + _mm256_mask_popcnt_epi16, _mm_mask_popcnt_epi8, + _mm_mask_popcnt_epi16): Rename __B argument to __A and __A to __W, + pass __A to the builtin followed by __W instead of __A followed by + __B. + * config/i386/avx512vpopcntdqintrin.h (_mm512_mask_popcnt_epi32, + _mm512_mask_popcnt_epi64): Likewise. + * config/i386/avx512vpopcntdqvlintrin.h (_mm_mask_popcnt_epi32, + _mm256_mask_popcnt_epi32, _mm_mask_popcnt_epi64, + _mm256_mask_popcnt_epi64): Likewise. + PR target/93673 * config/i386/sse.md (k): Drop mode from last operand and use const_0_to_255_operand predicate instead of immediate_operand. diff --git a/gcc/config/i386/avx512bitalgintrin.h b/gcc/config/i386/avx512b= italgintrin.h index 8b4fc8e3f67..8ad8586cc59 100644 --- a/gcc/config/i386/avx512bitalgintrin.h +++ b/gcc/config/i386/avx512bitalgintrin.h @@ -61,10 +61,10 @@ _mm512_popcnt_epi16 (__m512i __A) =20 extern __inline __m512i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_popcnt_epi8 (__m512i __A, __mmask64 __U, __m512i __B) +_mm512_mask_popcnt_epi8 (__m512i __W, __mmask64 __U, __m512i __A) { return (__m512i) __builtin_ia32_vpopcountb_v64qi_mask ((__v64qi) __A, - (__v64qi) __B, + (__v64qi) __W, (__mmask64) __U); } =20 @@ -79,10 +79,10 @@ _mm512_maskz_popcnt_epi8 (__mmask64 __U, __m512i __A) } extern __inline __m512i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_popcnt_epi16 (__m512i __A, __mmask32 __U, __m512i __B) +_mm512_mask_popcnt_epi16 (__m512i __W, __mmask32 __U, __m512i __A) { return (__m512i) __builtin_ia32_vpopcountw_v32hi_mask ((__v32hi) __A, - (__v32hi) __B, + (__v32hi) __W, (__mmask32) __U); } =20 @@ -127,10 +127,10 @@ _mm512_mask_bitshuffle_epi64_mask (__mmask64 __M, __m= 512i __A, __m512i __B) =20 extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi8 (__m256i __A, __mmask32 __U, __m256i __B) +_mm256_mask_popcnt_epi8 (__m256i __W, __mmask32 __U, __m256i __A) { return (__m256i) __builtin_ia32_vpopcountb_v32qi_mask ((__v32qi) __A, - (__v32qi) __B, + (__v32qi) __W, (__mmask32) __U); } =20 @@ -222,10 +222,10 @@ _mm_popcnt_epi16 (__m128i __A) =20 extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi16 (__m256i __A, __mmask16 __U, __m256i __B) +_mm256_mask_popcnt_epi16 (__m256i __W, __mmask16 __U, __m256i __A) { return (__m256i) __builtin_ia32_vpopcountw_v16hi_mask ((__v16hi) __A, - (__v16hi) __B, + (__v16hi) __W, (__mmask16) __U); } =20 @@ -241,10 +241,10 @@ _mm256_maskz_popcnt_epi16 (__mmask16 __U, __m256i __A) =20 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi8 (__m128i __A, __mmask16 __U, __m128i __B) +_mm_mask_popcnt_epi8 (__m128i __W, __mmask16 __U, __m128i __A) { return (__m128i) __builtin_ia32_vpopcountb_v16qi_mask ((__v16qi) __A, - (__v16qi) __B, + (__v16qi) __W, (__mmask16) __U); } =20 @@ -259,10 +259,10 @@ _mm_maskz_popcnt_epi8 (__mmask16 __U, __m128i __A) } extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi16 (__m128i __A, __mmask8 __U, __m128i __B) +_mm_mask_popcnt_epi16 (__m128i __W, __mmask8 __U, __m128i __A) { return (__m128i) __builtin_ia32_vpopcountw_v8hi_mask ((__v8hi) __A, - (__v8hi) __B, + (__v8hi) __W, (__mmask8) __U); } =20 diff --git a/gcc/config/i386/avx512vpopcntdqintrin.h b/gcc/config/i386/avx5= 12vpopcntdqintrin.h index 3569430baa7..119bdf41887 100644 --- a/gcc/config/i386/avx512vpopcntdqintrin.h +++ b/gcc/config/i386/avx512vpopcntdqintrin.h @@ -43,10 +43,10 @@ _mm512_popcnt_epi32 (__m512i __A) =20 extern __inline __m512i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_popcnt_epi32 (__m512i __A, __mmask16 __U, __m512i __B) +_mm512_mask_popcnt_epi32 (__m512i __W, __mmask16 __U, __m512i __A) { return (__m512i) __builtin_ia32_vpopcountd_v16si_mask ((__v16si) __A, - (__v16si) __B, + (__v16si) __W, (__mmask16) __U); } =20 @@ -69,10 +69,10 @@ _mm512_popcnt_epi64 (__m512i __A) =20 extern __inline __m512i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_mask_popcnt_epi64 (__m512i __A, __mmask8 __U, __m512i __B) +_mm512_mask_popcnt_epi64 (__m512i __W, __mmask8 __U, __m512i __A) { return (__m512i) __builtin_ia32_vpopcountq_v8di_mask ((__v8di) __A, - (__v8di) __B, + (__v8di) __W, (__mmask8) __U); } =20 diff --git a/gcc/config/i386/avx512vpopcntdqvlintrin.h b/gcc/config/i386/av= x512vpopcntdqvlintrin.h index b974d09338a..2765592f0ae 100644 --- a/gcc/config/i386/avx512vpopcntdqvlintrin.h +++ b/gcc/config/i386/avx512vpopcntdqvlintrin.h @@ -43,10 +43,10 @@ _mm_popcnt_epi32 (__m128i __A) =20 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi32 (__m128i __A, __mmask16 __U, __m128i __B) +_mm_mask_popcnt_epi32 (__m128i __W, __mmask16 __U, __m128i __A) { return (__m128i) __builtin_ia32_vpopcountd_v4si_mask ((__v4si) __A, - (__v4si) __B, + (__v4si) __W, (__mmask16) __U); } =20 @@ -69,10 +69,10 @@ _mm256_popcnt_epi32 (__m256i __A) =20 extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi32 (__m256i __A, __mmask16 __U, __m256i __B) +_mm256_mask_popcnt_epi32 (__m256i __W, __mmask16 __U, __m256i __A) { return (__m256i) __builtin_ia32_vpopcountd_v8si_mask ((__v8si) __A, - (__v8si) __B, + (__v8si) __W, (__mmask16) __U); } =20 @@ -95,10 +95,10 @@ _mm_popcnt_epi64 (__m128i __A) =20 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_mask_popcnt_epi64 (__m128i __A, __mmask8 __U, __m128i __B) +_mm_mask_popcnt_epi64 (__m128i __W, __mmask8 __U, __m128i __A) { return (__m128i) __builtin_ia32_vpopcountq_v2di_mask ((__v2di) __A, - (__v2di) __B, + (__v2di) __W, (__mmask8) __U); } =20 @@ -121,10 +121,10 @@ _mm256_popcnt_epi64 (__m256i __A) =20 extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm256_mask_popcnt_epi64 (__m256i __A, __mmask8 __U, __m256i __B) +_mm256_mask_popcnt_epi64 (__m256i __W, __mmask8 __U, __m256i __A) { return (__m256i) __builtin_ia32_vpopcountq_v4di_mask ((__v4di) __A, - (__v4di) __B, + (__v4di) __W, (__mmask8) __U); } =20 @@ -144,4 +144,3 @@ _mm256_maskz_popcnt_epi64 (__mmask8 __U, __m256i __A) #endif /* __DISABLE_AVX512VPOPCNTDQVL__ */ =20 #endif /* _AVX512VPOPCNTDQVLINTRIN_H_INCLUDED */ - diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 899255a84c6..114a1087936 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,20 @@ 2020-02-13 Jakub Jelinek =20 + PR target/93696 + * gcc.target/i386/pr93696-1.c: New test. + * gcc.target/i386/pr93696-2.c: New test. + * gcc.target/i386/avx512bitalg-vpopcntw-1.c (TEST): Fix argument order + of _mm*_mask_popcnt_*. + * gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c (TEST): Likewise. + * gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c (TEST): Likewise. + * gcc.target/i386/avx512bitalg-vpopcntb-1.c (TEST): Likewise. + * gcc.target/i386/avx512bitalg-vpopcntb.c (foo): Likewise. + * gcc.target/i386/avx512bitalg-vpopcntbvl.c (foo): Likewise. + * gcc.target/i386/avx512vpopcntdq-vpopcntd.c (foo): Likewise. + * gcc.target/i386/avx512bitalg-vpopcntwvl.c (foo): Likewise. + * gcc.target/i386/avx512bitalg-vpopcntw.c (foo): Likewise. + * gcc.target/i386/avx512vpopcntdq-vpopcntq.c (foo): Likewise. + PR target/93673 * gcc.target/i386/avx512f-pr93673.c: New test. * gcc.target/i386/avx512dq-pr93673.c: New test. diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c b/gcc/= testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c index 3dcd48f7e2a..697757b8b73 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb-1.c @@ -41,7 +41,7 @@ TEST (void) } =20 res1.x =3D INTRINSIC (_popcnt_epi8) (src.x); - res2.x =3D INTRINSIC (_mask_popcnt_epi8) (src.x, mask, src0.x); + res2.x =3D INTRINSIC (_mask_popcnt_epi8) (src0.x, mask, src.x); res3.x =3D INTRINSIC (_maskz_popcnt_epi8) (mask, src.x); =20 if (UNION_CHECK (AVX512F_LEN, i_b) (res1, res_ref)) diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c b/gcc/te= stsuite/gcc.target/i386/avx512bitalg-vpopcntb.c index b23da58dbaf..246f925eede 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntb.c @@ -13,7 +13,7 @@ int foo () __mmask16 msk; __m512i c =3D _mm512_popcnt_epi8 (z); asm volatile ("" : "+v" (c)); - c =3D _mm512_mask_popcnt_epi8 (z, msk, z1); + c =3D _mm512_mask_popcnt_epi8 (z1, msk, z); asm volatile ("" : "+v" (c)); c =3D _mm512_maskz_popcnt_epi8 (msk, z); asm volatile ("" : "+v" (c)); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c b/gcc/= testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c index e6d60f7596c..8c7f45fc5f7 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntbvl.c @@ -18,13 +18,13 @@ int foo () __mmask16 msk16; __m256i c256 =3D _mm256_popcnt_epi8 (y); asm volatile ("" : "+v" (c256)); - c256 =3D _mm256_mask_popcnt_epi8 (y, msk32, y_1); + c256 =3D _mm256_mask_popcnt_epi8 (y_1, msk32, y); asm volatile ("" : "+v" (c256)); c256 =3D _mm256_maskz_popcnt_epi8 (msk32, y); asm volatile ("" : "+v" (c256)); __m128i c128 =3D _mm_popcnt_epi8 (x); asm volatile ("" : "+v" (c128)); - c128 =3D _mm_mask_popcnt_epi8 (x, msk16, x_1); + c128 =3D _mm_mask_popcnt_epi8 (x_1, msk16, x); asm volatile ("" : "+v" (c128)); c128 =3D _mm_maskz_popcnt_epi8 (msk16, x); asm volatile ("" : "+v" (c128)); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c b/gcc/= testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c index 4f866db2f7a..0a725fe012a 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw-1.c @@ -41,7 +41,7 @@ TEST (void) } =20 res1.x =3D INTRINSIC (_popcnt_epi16) (src.x); - res2.x =3D INTRINSIC (_mask_popcnt_epi16) (src.x, mask, src0.x); + res2.x =3D INTRINSIC (_mask_popcnt_epi16) (src0.x, mask, src.x); res3.x =3D INTRINSIC (_maskz_popcnt_epi16) (mask, src.x); =20 if (UNION_CHECK (AVX512F_LEN, i_w) (res1, res_ref)) diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c b/gcc/te= stsuite/gcc.target/i386/avx512bitalg-vpopcntw.c index 2c49583b597..90663f480fc 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntw.c @@ -13,7 +13,7 @@ int foo () __mmask16 msk; __m512i c =3D _mm512_popcnt_epi16 (z); asm volatile ("" : "+v" (c)); - c =3D _mm512_mask_popcnt_epi16 (z, msk, z1); + c =3D _mm512_mask_popcnt_epi16 (z1, msk, z); asm volatile ("" : "+v" (c)); c =3D _mm512_maskz_popcnt_epi16 (msk, z); asm volatile ("" : "+v" (c)); diff --git a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c b/gcc/= testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c index b55adc6023a..3a646b57282 100644 --- a/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c +++ b/gcc/testsuite/gcc.target/i386/avx512bitalg-vpopcntwvl.c @@ -18,13 +18,13 @@ int foo () __mmask8 msk8; __m256i c256 =3D _mm256_popcnt_epi16 (y); asm volatile ("" : "+v" (c256)); - c256 =3D _mm256_mask_popcnt_epi16 (y, msk16, y_1); + c256 =3D _mm256_mask_popcnt_epi16 (y_1, msk16, y); asm volatile ("" : "+v" (c256)); c256 =3D _mm256_maskz_popcnt_epi16 (msk16, y); asm volatile ("" : "+v" (c256)); __m128i c128 =3D _mm_popcnt_epi16 (x); asm volatile ("" : "+v" (c128)); - c128 =3D _mm_mask_popcnt_epi16 (x, msk8, x_1); + c128 =3D _mm_mask_popcnt_epi16 (x_1, msk8, x); asm volatile ("" : "+v" (c128)); c128 =3D _mm_maskz_popcnt_epi16 (msk8, x); asm volatile ("" : "+v" (c128)); diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c b/g= cc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c index 245dcd4d534..e7d6bb4dd53 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd-1.c @@ -40,7 +40,7 @@ TEST (void) } =20 res1.x =3D INTRINSIC (_popcnt_epi32) (src.x); - res2.x =3D INTRINSIC (_mask_popcnt_epi32) (src.x, mask, src0.x); + res2.x =3D INTRINSIC (_mask_popcnt_epi32) (src0.x, mask, src.x); res3.x =3D INTRINSIC (_maskz_popcnt_epi32) (mask, src.x); =20 if (UNION_CHECK (AVX512F_LEN, i_d) (res1, res_ref)) diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c b/gcc= /testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c index c70f226824e..b4d82f97032 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntd.c @@ -22,19 +22,19 @@ int foo () __mmask8 msk8; __m128i a =3D _mm_popcnt_epi32 (x); asm volatile ("" : "+v" (a)); - a =3D _mm_mask_popcnt_epi32 (x, msk8, x_1); + a =3D _mm_mask_popcnt_epi32 (x_1, msk8, x); asm volatile ("" : "+v" (a)); a =3D _mm_maskz_popcnt_epi32 (msk8, x); asm volatile ("" : "+v" (a)); __m256i b =3D _mm256_popcnt_epi32 (y); asm volatile ("" : "+v" (b)); - b =3D _mm256_mask_popcnt_epi32 (y, msk8, y_1); + b =3D _mm256_mask_popcnt_epi32 (y_1, msk8, y); asm volatile ("" : "+v" (b)); b =3D _mm256_maskz_popcnt_epi32 (msk8, y); asm volatile ("" : "+v" (b)); __m512i c =3D _mm512_popcnt_epi32 (z); asm volatile ("" : "+v" (c)); - c =3D _mm512_mask_popcnt_epi32 (z, msk, z_1); + c =3D _mm512_mask_popcnt_epi32 (z_1, msk, z); asm volatile ("" : "+v" (c)); c =3D _mm512_maskz_popcnt_epi32 (msk, z); asm volatile ("" : "+v" (c)); diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c b/g= cc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c index 27555c496d6..2144cf32c0d 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq-1.c @@ -40,7 +40,7 @@ TEST (void) } =20 res1.x =3D INTRINSIC (_popcnt_epi64) (src.x); - res2.x =3D INTRINSIC (_mask_popcnt_epi64) (src.x, mask, src0.x); + res2.x =3D INTRINSIC (_mask_popcnt_epi64) (src0.x, mask, src.x); res3.x =3D INTRINSIC (_maskz_popcnt_epi64) (mask, src.x); =20 if (UNION_CHECK (AVX512F_LEN, i_q) (res1, res_ref)) diff --git a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c b/gcc= /testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c index 9f400c005f3..e87d6c999b6 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c +++ b/gcc/testsuite/gcc.target/i386/avx512vpopcntdq-vpopcntq.c @@ -21,19 +21,19 @@ int foo () __mmask8 msk;=20 __m128i a =3D _mm_popcnt_epi64 (x); asm volatile ("" : "+v" (a)); - a =3D _mm_mask_popcnt_epi64 (x, msk, x_1); + a =3D _mm_mask_popcnt_epi64 (x_1, msk, x); asm volatile ("" : "+v" (a)); a =3D _mm_maskz_popcnt_epi64 (msk, x); asm volatile ("" : "+v" (a)); __m256i b =3D _mm256_popcnt_epi64 (y); asm volatile ("" : "+v" (b)); - b =3D _mm256_mask_popcnt_epi64 (y, msk, y_1); + b =3D _mm256_mask_popcnt_epi64 (y_1, msk, y); asm volatile ("" : "+v" (b)); b =3D _mm256_maskz_popcnt_epi64 (msk, y); asm volatile ("" : "+v" (b)); __m512i c =3D _mm512_popcnt_epi64 (z); asm volatile ("" : "+v" (c)); - c =3D _mm512_mask_popcnt_epi64 (z, msk, z_1); + c =3D _mm512_mask_popcnt_epi64 (z_1, msk, z); asm volatile ("" : "+v" (c)); c =3D _mm512_maskz_popcnt_epi64 (msk, z);=20 asm volatile ("" : "+v" (c)); diff --git a/gcc/testsuite/gcc.target/i386/pr93696-1.c b/gcc/testsuite/gcc.= target/i386/pr93696-1.c new file mode 100644 index 00000000000..128bb98c066 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr93696-1.c @@ -0,0 +1,79 @@ +/* PR target/93696 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512bitalg -mavx512vpopcntdq -mavx512vl -mavx512b= w -masm=3Datt" } */ +/* { dg-final { scan-assembler-times "vpopcnt\[bwdq]\t%\[xyz]mm1, %\[xyz]m= m0\{%k\[0-7]\}\[^\{]" 12 } } */ +/* { dg-final { scan-assembler-not "vmovdq\[au]\[0-9]" } } */ + +#include + +__m128i +f1 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_mask_popcnt_epi64 (x, m, y); +} + +__m128i +f2 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_mask_popcnt_epi32 (x, m, y); +} + +__m128i +f3 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_mask_popcnt_epi16 (x, m, y); +} + +__m128i +f4 (__m128i x, __mmask16 m, __m128i y) +{ + return _mm_mask_popcnt_epi8 (x, m, y); +} + +__m256i +f5 (__m256i x, __mmask8 m, __m256i y) +{ + return _mm256_mask_popcnt_epi64 (x, m, y); +} + +__m256i +f6 (__m256i x, __mmask8 m, __m256i y) +{ + return _mm256_mask_popcnt_epi32 (x, m, y); +} + +__m256i +f7 (__m256i x, __mmask16 m, __m256i y) +{ + return _mm256_mask_popcnt_epi16 (x, m, y); +} + +__m256i +f8 (__m256i x, __mmask32 m, __m256i y) +{ + return _mm256_mask_popcnt_epi8 (x, m, y); +} + +__m512i +f9 (__m512i x, __mmask8 m, __m512i y) +{ + return _mm512_mask_popcnt_epi64 (x, m, y); +} + +__m512i +f10 (__m512i x, __mmask16 m, __m512i y) +{ + return _mm512_mask_popcnt_epi32 (x, m, y); +} + +__m512i +f11 (__m512i x, __mmask32 m, __m512i y) +{ + return _mm512_mask_popcnt_epi16 (x, m, y); +} + +__m512i +f12 (__m512i x, __mmask64 m, __m512i y) +{ + return _mm512_mask_popcnt_epi8 (x, m, y); +} diff --git a/gcc/testsuite/gcc.target/i386/pr93696-2.c b/gcc/testsuite/gcc.= target/i386/pr93696-2.c new file mode 100644 index 00000000000..25a298aea18 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr93696-2.c @@ -0,0 +1,79 @@ +/* PR target/93696 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -mavx512bitalg -mavx512vpopcntdq -mavx512vl -mavx512b= w -masm=3Datt" } */ +/* { dg-final { scan-assembler-times "vpopcnt\[bwdq]\t%\[xyz]mm1, %\[xyz]m= m0\{%k\[0-7]\}\{z\}" 12 } } */ +/* { dg-final { scan-assembler-not "vmovdq\[au]\[0-9]" } } */ + +#include + +__m128i +f1 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_maskz_popcnt_epi64 (m, y); +} + +__m128i +f2 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_maskz_popcnt_epi32 (m, y); +} + +__m128i +f3 (__m128i x, __mmask8 m, __m128i y) +{ + return _mm_maskz_popcnt_epi16 (m, y); +} + +__m128i +f4 (__m128i x, __mmask16 m, __m128i y) +{ + return _mm_maskz_popcnt_epi8 (m, y); +} + +__m256i +f5 (__m256i x, __mmask8 m, __m256i y) +{ + return _mm256_maskz_popcnt_epi64 (m, y); +} + +__m256i +f6 (__m256i x, __mmask8 m, __m256i y) +{ + return _mm256_maskz_popcnt_epi32 (m, y); +} + +__m256i +f7 (__m256i x, __mmask16 m, __m256i y) +{ + return _mm256_maskz_popcnt_epi16 (m, y); +} + +__m256i +f8 (__m256i x, __mmask32 m, __m256i y) +{ + return _mm256_maskz_popcnt_epi8 (m, y); +} + +__m512i +f9 (__m512i x, __mmask8 m, __m512i y) +{ + return _mm512_maskz_popcnt_epi64 (m, y); +} + +__m512i +f10 (__m512i x, __mmask16 m, __m512i y) +{ + return _mm512_maskz_popcnt_epi32 (m, y); +} + +__m512i +f11 (__m512i x, __mmask32 m, __m512i y) +{ + return _mm512_maskz_popcnt_epi16 (m, y); +} + +__m512i +f12 (__m512i x, __mmask64 m, __m512i y) +{ + return _mm512_maskz_popcnt_epi8 (m, y); +} --=20 2.20.1 --x4pBfXISqBoDm8sr Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename=r10-6625-gbacdd5e978dad84e9c547b0d5c7fed14b8d75157 Content-Transfer-Encoding: quoted-printable Content-length: 3366 =46rom 7276dd4c7480dd952f0d4a9322ca04ca29f5126f Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Thu, 13 Feb 2020 21:00:09 +0100 Subject: [PATCH 15/15] c: Fix ICE with cast to VLA [93576] The following testcase ICEs, because the PR84305 changes try to evaluate the size earlier. If size has side-effects, that is desirable, and the side-effects will actually be wrapped in a SAVE_EXPR. The problem on this testcase is that there are no side-effects, and c_fully_fold doesn't fold those COMPOUND_EXPRs to constant, and while before gimplification we unshare trees found in the expressions, the unsharing doesn't involve TYPE_SIZE etc. of used types. Gimplification is destructive though, so when we gimplify the two nested COMPOUND_EXPRs and then try to gimplify it the second time for the TYPE_SIZEs, we ICE. Now, we could use unshare_expr in what we push to *expr, SAVE_EXPRs and their operands in there aren't unshared, but I really don't see a point of evaluating expressions that don't have side-effects before, so instead this just pushes there expressions that do have side-effects. 2020-02-13 Jakub Jelinek PR c/93576 * c-decl.c (grokdeclarator): If this_size_varies, only push size into *expr if it has side effects. * gcc.dg/pr93576.c: New test. --- gcc/c/ChangeLog | 6 ++++++ gcc/c/c-decl.c | 13 ++++++++----- gcc/testsuite/ChangeLog | 3 +++ gcc/testsuite/gcc.dg/pr93576.c | 10 ++++++++++ 4 files changed, 27 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/pr93576.c diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog index b17ed98a9ae..5ba4516e114 100644 --- a/gcc/c/ChangeLog +++ b/gcc/c/ChangeLog @@ -1,3 +1,9 @@ +2020-02-13 Jakub Jelinek + + PR c/93576 + * c-decl.c (grokdeclarator): If this_size_varies, only push size into + *expr if it has side effects. + 2020-01-22 Joseph Myers =20 Backport from mainline: diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index f77fb1739ca..859a6241258 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -6368,11 +6368,14 @@ grokdeclarator (const struct c_declarator *declarat= or, } if (this_size_varies) { - if (*expr) - *expr =3D build2 (COMPOUND_EXPR, TREE_TYPE (size), - *expr, size); - else - *expr =3D size; + if (TREE_SIDE_EFFECTS (size)) + { + if (*expr) + *expr =3D build2 (COMPOUND_EXPR, TREE_TYPE (size), + *expr, size); + else + *expr =3D size; + } *expr_const_operands &=3D size_maybe_const; } } diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 114a1087936..24e5e88d763 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,8 @@ 2020-02-13 Jakub Jelinek =20 + PR c/93576 + * gcc.dg/pr93576.c: New test. + PR target/93696 * gcc.target/i386/pr93696-1.c: New test. * gcc.target/i386/pr93696-2.c: New test. diff --git a/gcc/testsuite/gcc.dg/pr93576.c b/gcc/testsuite/gcc.dg/pr93576.c new file mode 100644 index 00000000000..13c34f3771f --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr93576.c @@ -0,0 +1,10 @@ +/* PR c/93576 */ +/* { dg-do compile } */ +/* { dg-options "" } */ + +void +foo (void) +{ + int b[] =3D { 0 }; + (char (*)[(1, 7, 2)]) 0; +} --=20 2.20.1 --x4pBfXISqBoDm8sr--