From: "andysem at mail dot ru"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108401] gcc defeats vector constant generation with intrinsics
Date: Mon, 16 Jan 2023 10:42:59 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401

--- Comment #7 from andysem at mail dot ru ---
To be clear, I'm not asking the compiler to recognize the particular pattern of alternating 0x00 and 0xFF bytes, because hardcoding this particular pattern won't improve generated code in other cases. Rather, I'm asking to tune down code transformations for intrinsics.
If the developer wrote a sequence of intrinsics to generate a constant then he probably wanted that sequence instead of a simple _mm_set1_epi32 or a load from memory.

But, if you're going to improve constant generation, please make it so that it can recognize not only the particular pattern described in this bug. More importantly, it should recognize the all-ones case (as a single pcmpeq) as a starting point. Then it can apply shifts to achieve the final result from the all-ones vector - shifts of any width, length or direction, including psrldq/pslldq. This would improve generated code in a wider range of cases.
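For illustration, the kind of sequence meant here can be sketched with SSE2 intrinsics as below. This is only a sketch: the helper names (all_ones, mask_00ff, mask_low_half, mask_sign32, store_bytes) are made up for this example and are not taken from the bug report. Whether the compiler keeps the pcmpeq-plus-shift sequence or folds it into a constant load depends on the compiler version and options, which is the behavior under discussion.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* All-ones vector: comparing a register with itself for equality sets
   every bit (a single pcmpeqd if the compiler keeps the sequence). */
static inline __m128i all_ones(void)
{
    __m128i z = _mm_setzero_si128();
    return _mm_cmpeq_epi32(z, z);
}

/* 0x00FF in every 16-bit lane, i.e. alternating 0xFF and 0x00 bytes
   in memory order (psrlw by 8 applied to all-ones). */
static inline __m128i mask_00ff(void)
{
    return _mm_srli_epi16(all_ones(), 8);
}

/* Low 8 bytes 0xFF, high 8 bytes 0x00: whole-register byte shift
   (psrldq by 8 applied to all-ones). */
static inline __m128i mask_low_half(void)
{
    return _mm_srli_si128(all_ones(), 8);
}

/* 0x80000000 in every 32-bit lane (pslld by 31 applied to all-ones). */
static inline __m128i mask_sign32(void)
{
    return _mm_slli_epi32(all_ones(), 31);
}

/* Hypothetical helper: copy a vector into a byte array for inspection. */
static inline void store_bytes(__m128i v, uint8_t out[16])
{
    _mm_storeu_si128((__m128i *)out, v);
}
```

Each mask costs one compare and one shift and needs no memory load, which is why a developer might prefer the explicit sequence over _mm_set1_epi32 or a constant from memory.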