From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 38D1D3858C83 for ; Tue, 21 Feb 2023 15:39:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 38D1D3858C83 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1676993960; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=H0GXUU9TKYf+TwVEbz/z88s1d7P1HoY+RtxhPPnsWwQ=; b=c7o7QoRCrZIbf2r3hj0UQnAgTQEL/ghCmtOGAvc1JnxoBYD1G0/gnRIyd++KE7CJCogbOu 0Zt9iCfzz3M35fK3b19qQlYpxBoB/kWIa/JgJr4T3Rdrw644Nub9pVtJcEJ0/8YxWAKQ8f Q2g0wzyAwJ60zzmJbfwR4MAnJsf4QhY= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-397-nrVUdTraM4qscSmf4Py1tw-1; Tue, 21 Feb 2023 10:39:19 -0500 X-MC-Unique: nrVUdTraM4qscSmf4Py1tw-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id BB2E285CBE5; Tue, 21 Feb 2023 15:39:18 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.62]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 95F51140EBF4; Tue, 21 Feb 2023 15:39:17 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 31LFdEGt2013535 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Tue, 21 Feb 2023 16:39:14 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 31LFdCXZ2013534; Tue, 21 Feb 2023 16:39:12 +0100 Date: Tue, 21 Feb 2023 16:39:12 +0100 From: Jakub Jelinek To: Jeff Law , "'Segher Boessenkool'" , Roger Sayle Cc: "'GCC Patches'" Subject: Re: [PATCH] Fix RTL simplifications of FFS, POPCOUNT and PARITY. Message-ID: Reply-To: Jakub Jelinek References: <002e01d91df9$79df2670$6d9d7350$@nextmovesoftware.com> MIME-Version: 1.0 In-Reply-To: <002e01d91df9$79df2670$6d9d7350$@nextmovesoftware.com> X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-9.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Sun, Jan 01, 2023 at 03:55:26PM -0000, Roger Sayle wrote: > In 2011, the rtl.texi documentation was updated to reflect that the > modes of the RTX unary operations FFS, POPCOUNT and PARITY must > match those of their operands. Unfortunately, some of the transformations > in simplify-rtx.cc predate this tightening of RTL semantics, and have > not (until now) been updated/fixed. i.e. The POPCOUNT and PARITY > optimizations were "correct" when I originally added them back in 2007. > > Segher requested that I split this piece out from a fix for PR 106594 in > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601501.html > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? > > > 2023-01-01 Roger Sayle > > gcc/ChangeLog > * gcc/simplify-rtx.cc (simplify_unary_operation_1) : BTW, gcc/ prefix shouldn't be in the ChangeLog entry. > Avoid generating FFS with mismatched operand and result modes, by > using an explicit SIGN_EXTEND/ZERO_EXTEND instead. > : Likewise, for POPCOUNT of ZERO_EXTEND. > : Likewise, for PARITY of {ZERO,SIGN}_EXTEND. Can we make progress on this? I've looked with make mddump grep '(\(ffs\|popcount\|parity\):' tmp-mddump.md | grep -v '(\(ffs\|popcount\|parity\):\([A-Z0-9x]*\) (match_operand:\2' at various targets. This prints nothing at all on aarch64, arm, gcn, powerpc, rl78, visium, on x86 it prints (popcount:SI (zero_extend:SI (match_operand:HI 1 ("nonimmediate_operand") (""))))) (popcount:SI (match_dup 1))) (popcount:DI (match_dup 1))) (and:DI (subreg:DI (popcount:SI (match_dup 1)) 0) (zero_extend:DI (popcount:SI (match_dup 1)))) (popcount:SI (zero_extend:SI (match_operand:HI 1 ("nonimmediate_operand") (""))))) (popcount:SI (match_dup 0))) where the (popcount:SI (zero_extend:SI cases are *popcounthi2_1 define_insn_and_split but we have *popcounthi2_2 which has (zero_extend:SI (popcount:HI and so is ok, and I've manually checked the match_dup cases and the match_operand corresponding to match_dup has the right mode, on mips it prints (popcount:SI (truncate:SI (match_operand:DI 1 ("register_operand") ("d"))))) which is also ok, on sparc (truncate:SI (popcount:DI (match_dup 2)))) in popcountsi2 also looks ok, on s390x (popcount:DI (match_dup 2))) (popcount:DI (match_dup 2))) where I've also manually verified popcountsi2 and popcounthi2 to be ok. And then riscv, see below. Now, for the above arches, everything thus matches the rtl.texi documentation and popcount/parity/ffs match or create only matching modes between the RTL and its operand and Roger's patch seems to be IMHO the way to go, because what simplify-rtx.cc does now for those cases the patch touches is that it will always result in something that won't match because it is invalid. With the patch there is at least some chance it might match something. Now, riscv is the only target for which I have tmp-mddump.md around and which doesn't have matching modes in the machine description: (popcount:SI (unspec:VNx1BI [ (popcount:SI (unspec:VNx2BI [ (popcount:SI (unspec:VNx4BI [ (popcount:SI (unspec:VNx8BI [ (popcount:SI (unspec:VNx16BI [ (popcount:SI (unspec:VNx32BI [ (popcount:SI (unspec:VNx64BI [ (popcount:DI (unspec:VNx1BI [ (popcount:DI (unspec:VNx2BI [ (popcount:DI (unspec:VNx4BI [ (popcount:DI (unspec:VNx8BI [ (popcount:DI (unspec:VNx16BI [ (popcount:DI (unspec:VNx32BI [ (popcount:DI (unspec:VNx64BI [ (plus:SI (ffs:SI (unspec:VNx1BI [ (plus:SI (ffs:SI (unspec:VNx2BI [ (plus:SI (ffs:SI (unspec:VNx4BI [ (plus:SI (ffs:SI (unspec:VNx8BI [ (plus:SI (ffs:SI (unspec:VNx16BI [ (plus:SI (ffs:SI (unspec:VNx32BI [ (plus:SI (ffs:SI (unspec:VNx64BI [ (plus:DI (ffs:DI (unspec:VNx1BI [ (plus:DI (ffs:DI (unspec:VNx2BI [ (plus:DI (ffs:DI (unspec:VNx4BI [ (plus:DI (ffs:DI (unspec:VNx8BI [ (plus:DI (ffs:DI (unspec:VNx16BI [ (plus:DI (ffs:DI (unspec:VNx32BI [ (plus:DI (ffs:DI (unspec:VNx64BI [ While invalid against current documentation and to me it isn't clear what they actually do, sum up popcounts of all vector elements and return index of first non-zero element + ffs in it or something similar, I think Roger's patch shouldn't change anything for these either because those patterns use unspecs with vector modes, so there won't be zero_extend/sign_extend from thos to integral mode that the patch could swap around. > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index fc0d6c3..698ca6e 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -1404,22 +1404,32 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > break; > > case FFS: > - /* (ffs (*_extend )) = (ffs ) */ > + /* (ffs (*_extend )) = (*_extend (ffs )). */ > if (GET_CODE (op) == SIGN_EXTEND > || GET_CODE (op) == ZERO_EXTEND) > - return simplify_gen_unary (FFS, mode, XEXP (op, 0), > - GET_MODE (XEXP (op, 0))); > + { > + temp = simplify_gen_unary (FFS, GET_MODE (XEXP (op, 0)), > + XEXP (op, 0), GET_MODE (XEXP (op, 0))); > + return simplify_gen_unary (GET_CODE (op), mode, temp, > + GET_MODE (temp)); > + } > break; > > case POPCOUNT: > switch (GET_CODE (op)) > { > case BSWAP: > - case ZERO_EXTEND: > - /* (popcount (zero_extend )) = (popcount ) */ > + /* (popcount (bswap )) = (popcount ). */ > return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0), > GET_MODE (XEXP (op, 0))); > > + case ZERO_EXTEND: > + /* (popcount (zero_extend )) = (zero_extend (popcount )). */ > + temp = simplify_gen_unary (POPCOUNT, GET_MODE (XEXP (op, 0)), > + XEXP (op, 0), GET_MODE (XEXP (op, 0))); > + return simplify_gen_unary (ZERO_EXTEND, mode, temp, > + GET_MODE (temp)); > + > case ROTATE: > case ROTATERT: > /* Rotations don't affect popcount. */ > @@ -1438,11 +1448,16 @@ simplify_context::simplify_unary_operation_1 (rtx_code code, machine_mode mode, > { > case NOT: > case BSWAP: > - case ZERO_EXTEND: > - case SIGN_EXTEND: > return simplify_gen_unary (PARITY, mode, XEXP (op, 0), > GET_MODE (XEXP (op, 0))); > > + case ZERO_EXTEND: > + case SIGN_EXTEND: > + temp = simplify_gen_unary (PARITY, GET_MODE (XEXP (op, 0)), > + XEXP (op, 0), GET_MODE (XEXP (op, 0))); > + return simplify_gen_unary (GET_CODE (op), mode, temp, > + GET_MODE (temp)); > + > case ROTATE: > case ROTATERT: > /* Rotations don't affect parity. */ Jakub