From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/103797] Clang vectorized LightPixel while GCC does not
Date: Wed, 22 Dec 2021 19:34:45 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek ---
At least on your short testcase clang doesn't use divps either.
We do support mulv2sf3, addv2sf3 etc. but not divv2sf3, I bet because with TARGET_MMX_WITH_SSE the division would be carried out on a full 4-element SSE register and would divide by zero in the unused 3rd and 4th elements. Perhaps we could insert 1.0f, 1.0f into those elements of the divisor before using divps?
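To illustrate the padding idea, here is a minimal sketch in GNU C using the generic vector extension rather than anything from the GCC backend. The helper name div_v2sf_padded is hypothetical, and the assumption is that the compiler lowers the 4-lane division to a single divps on x86:

```c
#include <string.h>

/* 4 x float vector via the GNU C vector extension (GCC and Clang). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical helper, not a GCC internal: divide two 2-element float
   vectors with one 4-lane division.  Lanes 2 and 3 are padding. */
void div_v2sf_padded(const float a[2], const float b[2], float out[2])
{
    v4sf num = { 0.0f, 0.0f, 0.0f, 0.0f };
    v4sf den = { 0.0f, 0.0f, 1.0f, 1.0f };  /* padding lanes hold 1.0f */

    memcpy(&num, a, 2 * sizeof(float));     /* fill lanes 0 and 1 */
    memcpy(&den, b, 2 * sizeof(float));     /* lanes 2 and 3 stay 1.0f */

    v4sf quot = num / den;                  /* padding lanes: 0.0f / 1.0f */

    memcpy(out, &quot, 2 * sizeof(float));  /* keep only lanes 0 and 1 */
}
```

With a zero-filled divisor the padding lanes would instead compute 0.0f / 0.0f, which is the spurious exception the padding is meant to avoid.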
Another question is whether we could teach SLP to vectorize even with factors that are not a power of 2. Loads/stores could be done with masked loads/stores (and with e.g. AVX512 almost everything could be masked), most arithmetic could be done normally, and we'd just need to watch what values we get in the extra elements and make sure they don't generate exceptions etc.
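As a rough sketch of what such code would have to do for a group of 3 floats, the following GNU C emulates the masked load/store with partial copies. All names here (load3, store3, kernel3) are hypothetical, the kernel is made up for illustration, and a real implementation would presumably use actual masked instructions where the target has them:

```c
#include <string.h>

typedef float v4sf __attribute__((vector_size(16)));

/* Emulates a masked load of 3 lanes: lane 3 gets `pad`, not memory. */
static v4sf load3(const float *p, float pad)
{
    v4sf v = { pad, pad, pad, pad };
    memcpy(&v, p, 3 * sizeof(float));
    return v;
}

/* Emulates a masked store of 3 lanes: lane 3 is never written. */
static void store3(float *p, v4sf v)
{
    memcpy(p, &v, 3 * sizeof(float));
}

/* Hypothetical 3-element kernel: out[i] = x[i] * k / d[i] + y[i]. */
void kernel3(const float x[3], const float d[3], const float y[3],
             float k, float out[3])
{
    v4sf vx = load3(x, 0.0f);  /* any finite pad works for mul/add */
    v4sf vd = load3(d, 1.0f);  /* divisor pad must be nonzero */
    v4sf vy = load3(y, 0.0f);
    v4sf vk = { k, k, k, k };

    store3(out, vx * vk / vd + vy);
}
```

The multiply and add run on all 4 lanes unchanged. Only the divisor needs a chosen padding value, which is the "watch what values we get in the extra elements" part.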