Date: Wed, 28 Jun 2023 13:17:09 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] middle-end/110452 - bad code generation with AVX512 mask splat
Message-ID: <20230628131709.7viTykkEg4lSWJH34Q50swX_9AakhPfTfhPLRCQKdvM@z>

The following adds an alternate way of expanding a uniform
mask vector constructor like

  _55 = _2 ? -1 : 0;
  vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};

when the mask mode is a scalar int mode like for AVX512 or GCN.
Instead of piecewise building the result via shifts and ors
we can take advantage of uniformity and signedness of the
component and simply sign-extend to the result.

Instead of

        cmpl    $3, %edi
        sete    %cl
        movl    %ecx, %esi
        leal    (%rsi,%rsi), %eax
        leal    0(,%rsi,4), %r9d
        leal    0(,%rsi,8), %r8d
        orl     %esi, %eax
        orl     %r9d, %eax
        movl    %ecx, %r9d
        orl     %r8d, %eax
        movl    %ecx, %r8d
        sall    $4, %r9d
        sall    $5, %r8d
        sall    $6, %esi
        orl     %r9d, %eax
        orl     %r8d, %eax
        movl    %ecx, %r8d
        orl     %esi, %eax
        sall    $7, %r8d
        orl     %r8d, %eax
        kmovb   %eax, %k1

we then get

        cmpl    $3, %edi
        sete    %cl
        negl    %ecx
        kmovb   %ecx, %k1

Code generation for non-uniform masks remains bad, but at least
I see no easy way out for the most general case here.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Will apply tomorrow after double-checking SPEC results and if
no comments appear.

Richard.

        PR middle-end/110452
        * expr.cc (store_constructor): Handle uniform boolean
        vectors with integer mode specially.
---
 gcc/expr.cc                              | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr110452.c | 13 +++++++++++++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr110452.c

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 62cd8facf75..b7f4e2fda9e 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -7447,6 +7447,19 @@ store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
             emit_move_insn (target, ops[0].value);
             break;
           }
+        /* Use sign-extension for uniform boolean vectors with
+           integer modes.  */
+        if (!TREE_SIDE_EFFECTS (exp)
+            && VECTOR_BOOLEAN_TYPE_P (type)
+            && SCALAR_INT_MODE_P (mode)
+            && (elt = uniform_vector_p (exp))
+            && !VECTOR_TYPE_P (TREE_TYPE (elt)))
+          {
+            rtx op0 = force_reg (TYPE_MODE (TREE_TYPE (elt)),
+                                 expand_normal (elt));
+            convert_move (target, op0, 0);
+            break;
+          }
 
         n_elts = TYPE_VECTOR_SUBPARTS (type);
         if (REG_P (target)
diff --git a/gcc/testsuite/gcc.target/i386/pr110452.c b/gcc/testsuite/gcc.target/i386/pr110452.c
new file mode 100644
index 00000000000..8a3e2e560d2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr110452.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mavx512f -mprefer-vector-width=512" } */
+
+double a[1024], b[1024], c[1024];
+
+void foo (int flag, int n)
+{
+  _Bool x = flag == 3;
+  for (int i = 0; i < n; ++i)
+    a[i] = (x ? b[i] : c[i]) * 42.;
+}
+
+/* { dg-final { scan-assembler-not "\[^x\]orl" } } */
-- 
2.35.3
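
For readers who want to convince themselves that the two expansions agree,
here is a minimal scalar sketch in plain C. It is not part of the patch and
does not use any GCC internals; the function names (mask_piecewise,
mask_splat, check_masks) are hypothetical and only model an 8-lane mask held
in an 8-bit integer. mask_piecewise mirrors the old shift-and-or sequence,
mask_splat mirrors the sete/negl pair from the new sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Old expansion, modelled in scalar code: put the 0/1 element into
   each of the 8 mask bit positions with a shift and an or.  */
static uint8_t
mask_piecewise (int cond)
{
  uint8_t bit = (cond != 0);
  uint8_t mask = 0;
  for (int i = 0; i < 8; ++i)
    mask |= (uint8_t) (bit << i);
  return mask;
}

/* New expansion, modelled in scalar code: negating 0/1 yields 0/-1,
   which is the element sign-extended across the whole mask; the cast
   keeps the low 8 bits, one per lane.  */
static uint8_t
mask_splat (int cond)
{
  return (uint8_t) -(int) (cond != 0);
}

/* Self-check: both expansions must agree for a false and a true
   condition.  */
static void
check_masks (void)
{
  assert (mask_piecewise (0) == mask_splat (0));
  assert (mask_piecewise (1) == mask_splat (1));
}
```

Calling check_masks () from a small driver exercises both paths. The
equivalence only holds because every lane carries the same element, which is
why the patch guards the new path with uniform_vector_p.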