From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105075] [nvptx] Generate sad insn (sum of absolute differences)
Date: Mon, 28 Mar 2022 10:15:40 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105075

--- Comment #5 from Richard Biener ---
@cindex @code{ssad@var{m}} instruction pattern
@item @samp{ssad@var{m}}
@cindex @code{usad@var{m}} instruction pattern
@item @samp{usad@var{m}}
Compute the sum of absolute differences of two signed/unsigned elements.
Operand 1 and operand 2 are of the same mode. Their absolute difference, which
is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
equal or wider than the mode of the absolute difference. The result is placed
in operand 0, which is of the same mode as operand 3.

That crucially "misses" a detail for the vector case: the sum will also sum across (unspecified!) lanes when operand 3 is wider than the absolute difference and has a lower number of lanes than the input vectors. The unspecified part makes it a hard fit for pattern matching (unrolled) code when actual output lanes are used and they are not being reduced to a single scalar in the end. For scalar instruction matching the patterns should be usable.

Note the SAD_EXPR on GENERIC has the same issue when vector types are used - the exact semantics are unspecified. The same is true for DOT_PROD_EXPR and WIDEN_SUM_EXPR and a bunch of others.

These days we'd go for matching them to direct internal function calls using the {u,s}sad optabs and I don't see any reason to not allow scalar modes for them. I'd rather get rid of all the tree codes we have for vectorizer reduction patterns in favor of those, so if you can avoid introducing new ones or adding more uses of existing ones that would be nice.
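
To illustrate the lane ambiguity described above, here is a minimal sketch in plain C (not GCC internals). It models a 16 x 8-bit input accumulating into a 4 x 32-bit operand 3 under two lane mappings that the pattern description would both permit. The function names, the lane counts, and the two mappings are assumptions chosen for illustration; no particular target is claimed to use either one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shapes: 16 narrow input lanes, 4 wide accumulator lanes. */
enum { IN_LANES = 16, OUT_LANES = 4 };

/* Scalar form, which is fully specified: acc + |a - b|. */
static uint32_t usad_scalar(uint8_t a, uint8_t b, uint32_t acc)
{
    return acc + (uint32_t)abs((int)a - (int)b);
}

/* Mapping A (assumed): adjacent input lanes 4*i .. 4*i+3 feed output lane i. */
static void usad_adjacent(const uint8_t *a, const uint8_t *b,
                          const uint32_t *acc, uint32_t *out)
{
    const int per_lane = IN_LANES / OUT_LANES;
    assert(IN_LANES % OUT_LANES == 0);
    for (int i = 0; i < OUT_LANES; i++) {
        out[i] = acc[i];
        for (int j = 0; j < per_lane; j++)
            out[i] = usad_scalar(a[per_lane * i + j], b[per_lane * i + j], out[i]);
    }
}

/* Mapping B (assumed): input lane k feeds output lane k % OUT_LANES. */
static void usad_strided(const uint8_t *a, const uint8_t *b,
                         const uint32_t *acc, uint32_t *out)
{
    for (int i = 0; i < OUT_LANES; i++)
        out[i] = acc[i];
    for (int k = 0; k < IN_LANES; k++)
        out[k % OUT_LANES] = usad_scalar(a[k], b[k], out[k % OUT_LANES]);
}

/* Final reduction of the accumulator lanes to a single scalar. */
static uint32_t reduce_lanes(const uint32_t *v)
{
    uint32_t sum = 0;
    for (int i = 0; i < OUT_LANES; i++)
        sum += v[i];
    return sum;
}
```

With a[k] = k, b[k] = 0 and a zero accumulator, the two mappings produce different individual output lanes, yet reduce_lanes gives the same total for both. That is why the patterns are fine for a reduction that ends in a single scalar but a hard fit when the individual output lanes are consumed.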