From: Hongtao Liu
Date: Wed, 17 Jan 2024 11:13:08 +0800
Subject: Re: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.
To: Roger Sayle
Cc: gcc-patches@gcc.gnu.org, Uros Bizjak

On Wed, Jan 17, 2024 at 5:59 AM Roger Sayle wrote:
>
>
> I thought I'd just missed the bug fixing season of stage3, but there
> appears to be a little latitude in early stage4 (for vector patches), so
> I'll post this now.
>
> This patch resolves PR target/106060 by providing efficient methods for
> materializing/synthesizing special "vector" constants on x86.  Currently
> there are three methods of materializing a vector constant; the most
> general is to load a vector from the constant pool, secondly "duplicated"
> constants can be synthesized by moving an integer between units and
> broadcasting (or shuffling it), and finally the special cases of the
> all-zeros vector and all-ones vectors can be loaded via a single SSE
> instruction.  This patch handles additional cases that can be synthesized
> in two instructions, loading an all-ones vector followed by another SSE
> instruction.  Following my recent patch for PR target/112992, there's
> conveniently a single place in i386-expand.cc where these special cases
> can be handled.
>
> Two examples are given in the original bugzilla PR for 106060.
> > __m256i > should_be_cmpeq_abs () > { > return _mm256_set1_epi8 (1); > } > > is now generated (with -O3 -march=3Dx86-64-v3) as: > > vpcmpeqd %ymm0, %ymm0, %ymm0 > vpabsb %ymm0, %ymm0 > ret > > and > > __m256i > should_be_cmpeq_add () > { > return _mm256_set1_epi8 (-2); > } > > is now generated as: > > vpcmpeqd %ymm0, %ymm0, %ymm0 > vpaddb %ymm0, %ymm0, %ymm0 > ret > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=3Dunix{-m32} > with no new failures. Ok for mainline? > > > 2024-01-16 Roger Sayle > > gcc/ChangeLog > PR target/106060 > * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. > (struct ix86_vec_bcast_map_simode_t): New type for table below. > (ix86_vec_bcast_map_simode): Table of SImode constants that may > be efficiently synthesized by a ix86_vec_bcast_alg method. > (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. > (ix86_vector_duplicate_simode_const): Efficiently synthesize > V4SImode and V8SImode constants that duplicate special constants. > (ix86_vector_duplicate_value): Attempt to synthesize "special" > vector constants using ix86_vector_duplicate_simode_const. > * config/i386/i386.cc (ix86_rtx_costs) : ABS of a > vector integer mode costs with a single SSE instruction. > + switch (entry->alg) + { + case VEC_BCAST_PXOR: + if (mode =3D=3D V8SImode && !TARGET_AVX2) + return false; + emit_move_insn (target, CONST0_RTX (mode)); + return true; + case VEC_BCAST_PCMPEQ: + if ((mode =3D=3D V4SImode && !TARGET_SSE2) + || (mode =3D=3D V8SImode && !TARGET_AVX2)) + return false; + emit_move_insn (target, CONSTM1_RTX (mode)); + return true; I think we need to prevent those standard_sse_constant_p getting in ix86_expand_vector_init_duplicate by below codes. /* If all values are identical, broadcast the value. 
*/ if (all_same && (nvars !=3D 0 || !standard_sse_constant_p (gen_rtx_CONST_VECTOR (mode, XVEC (vals, 0)), mode)) && ix86_expand_vector_init_duplicate (mmx_ok, mode, target, XVECEXP (vals, 0, 0))) return; + case VEC_BCAST_PABSB: + if (mode =3D=3D V4SImode) + { + tmp1 =3D gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 =3D gen_reg_rtx (V16QImode); + emit_insn (gen_absv16qi2 (tmp2, tmp1)); Shouldn't it rely on TARGET_SSE2? + case VEC_BCAST_PADDB: + if (mode =3D=3D V4SImode) + { + tmp1 =3D gen_reg_rtx (V16QImode); + emit_move_insn (tmp1, CONSTM1_RTX (V16QImode)); + tmp2 =3D gen_reg_rtx (V16QImode); + emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1)); Ditto here and for all logic shift cases. + } + + if ((mode =3D=3D V4SImode || mode =3D=3D V8SImode) + && CONST_INT_P (val) + && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val))) + return true; + The alternative way is adding a pre_reload define_insn_and_split to match specific const_vector and splitt it into new instructions. In theoritically, the constant info can be retained before combine and will enable more simplication. Also the patch can be extend to V16SImode, but it can be a separate patch. > gcc/testsuite/ChangeLog > PR target/106060 > * gcc.target/i386/auto-init-8.c: Update test case. > * gcc.target/i386/avx512fp16-3.c: Likewise. > * gcc.target/i386/pr100865-9a.c: Likewise. > * gcc.target/i386/pr106060-1.c: New test case. > * gcc.target/i386/pr106060-2.c: Likewise. > * gcc.target/i386/pr106060-3.c: Likewise. > * gcc.target/i386/pr70314-3.c: Update test case. > * gcc.target/i386/vect-shiftv4qi.c: Likewise. > * gcc.target/i386/vect-shiftv8qi.c: Likewise. > > > Thanks in advance, > Roger > -- > --=20 BR, Hongtao