From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Biener
Subject: Re: [PATCH] optabs: Implement double-word ctz and ffs expansion
Date: Thu, 8 Jun 2023 07:55:08 +0200
To: Jakub Jelinek
Cc: gcc-patches@gcc.gnu.org

> On 07.06.2023 at 18:59, Jakub Jelinek via Gcc-patches wrote:
>
> Hi!
>
> We have had expand_doubleword_clz for a couple of years, where we emit
> double-word CLZ as if (high_word == 0) return CLZ (low_word) + word_size;
> else return CLZ (high_word);
> We can do something similar for CTZ and FFS IMHO, just with the 2
> words swapped.  So if (low_word == 0) return CTZ (high_word) + word_size;
> else return CTZ (low_word); for CTZ and
> if (low_word == 0) return high_word ? FFS (high_word) + word_size : 0;
> else return FFS (low_word);
>
> The following patch implements that.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard

> Note, on some targets which implement both word_mode ctz and ffs patterns,
> it might be better to incrementally implement those double-word ffs expansion
> patterns in md files, because we aren't able to optimize it correctly;
> nothing can detect we have just made sure that argument is not 0 and so
> don't need to bother with handling that case.  So, on ia32 just using
> CTZ patterns would be better there, but I think we can even do better and
> instead of doing the comparisons of the operands against 0 do the CTZ
> expansion followed by testing of flags.
>
> 2023-06-07  Jakub Jelinek
>
> 	* optabs.cc (expand_ffs): Add forward declaration.
> 	(expand_doubleword_clz): Rename to ...
> 	(expand_doubleword_clz_ctz_ffs): ... this.  Add UNOPTAB argument,
> 	handle also doubleword CTZ and FFS in addition to CLZ.
> 	(expand_unop): Adjust caller.  Also call it for doubleword
> 	ctz_optab and ffs_optab.
>
> 	* gcc.target/i386/ctzll-1.c: New test.
> 	* gcc.target/i386/ffsll-1.c: New test.
>
> --- gcc/optabs.cc.jj	2023-06-07 09:42:14.701130305 +0200
> +++ gcc/optabs.cc	2023-06-07 14:35:04.909879272 +0200
> @@ -2697,10 +2697,14 @@ expand_clrsb_using_clz (scalar_int_mode
>    return temp;
>  }
>
> -/* Try calculating clz of a double-word quantity as two clz's of word-sized
> -   quantities, choosing which based on whether the high word is nonzero.  */
> +static rtx expand_ffs (scalar_int_mode, rtx, rtx);
> +
> +/* Try calculating clz, ctz or ffs of a double-word quantity as two clz, ctz or
> +   ffs operations on word-sized quantities, choosing which based on whether the
> +   high (for clz) or low (for ctz and ffs) word is nonzero.  */
>  static rtx
> -expand_doubleword_clz (scalar_int_mode mode, rtx op0, rtx target)
> +expand_doubleword_clz_ctz_ffs (scalar_int_mode mode, rtx op0, rtx target,
> +                               optab unoptab)
>  {
>    rtx xop0 = force_reg (mode, op0);
>    rtx subhi = gen_highpart (word_mode, xop0);
> @@ -2709,6 +2713,7 @@ expand_doubleword_clz (scalar_int_mode m
>    rtx_code_label *after_label = gen_label_rtx ();
>    rtx_insn *seq;
>    rtx temp, result;
> +  int addend = 0;
>
>    /* If we were not given a target, use a word_mode register, not a
>       'mode' register.  The result will fit, and nobody is expecting
> @@ -2721,6 +2726,9 @@ expand_doubleword_clz (scalar_int_mode m
>       'target' to tag a REG_EQUAL note on.  */
>    result = gen_reg_rtx (word_mode);
>
> +  if (unoptab != clz_optab)
> +    std::swap (subhi, sublo);
> +
>    start_sequence ();
>
>    /* If the high word is not equal to zero,
> @@ -2728,7 +2736,13 @@ expand_doubleword_clz (scalar_int_mode m
>    emit_cmp_and_jump_insns (subhi, CONST0_RTX (word_mode), EQ, 0,
>                             word_mode, true, hi0_label);
>
> -  temp = expand_unop_direct (word_mode, clz_optab, subhi, result, true);
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +    temp = expand_unop_direct (word_mode, unoptab, subhi, result, true);
> +  else
> +    {
> +      gcc_assert (unoptab == ffs_optab);
> +      temp = expand_ffs (word_mode, subhi, result);
> +    }
>    if (!temp)
>      goto fail;
>
> @@ -2739,14 +2753,32 @@ expand_doubleword_clz (scalar_int_mode m
>    emit_barrier ();
>
>    /* Else clz of the full value is clz of the low word plus the number
> -     of bits in the high word.  */
> +     of bits in the high word.  Similarly for ctz/ffs of the high word,
> +     except that ffs should be 0 when both words are zero.  */
>    emit_label (hi0_label);
>
> -  temp = expand_unop_direct (word_mode, clz_optab, sublo, 0, true);
> +  if (unoptab == ffs_optab)
> +    {
> +      convert_move (result, const0_rtx, true);
> +      emit_cmp_and_jump_insns (sublo, CONST0_RTX (word_mode), EQ, 0,
> +                               word_mode, true, after_label);
> +    }
> +
> +  if (optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
> +    temp = expand_unop_direct (word_mode, unoptab, sublo, NULL_RTX, true);
> +  else
> +    {
> +      gcc_assert (unoptab == ffs_optab);
> +      temp = expand_unop_direct (word_mode, ctz_optab, sublo, NULL_RTX, true);
> +      addend = 1;
> +    }
> +
>    if (!temp)
>      goto fail;
> +
>    temp = expand_binop (word_mode, add_optab, temp,
> -                       gen_int_mode (GET_MODE_BITSIZE (word_mode), word_mode),
> +                       gen_int_mode (GET_MODE_BITSIZE (word_mode) + addend,
> +                                     word_mode),
>                         result, true, OPTAB_DIRECT);
>    if (!temp)
>      goto fail;
> @@ -2759,7 +2791,7 @@ expand_doubleword_clz (scalar_int_mode m
>    seq = get_insns ();
>    end_sequence ();
>
> -  add_equal_note (seq, target, CLZ, xop0, NULL_RTX, mode);
> +  add_equal_note (seq, target, optab_to_code (unoptab), xop0, NULL_RTX, mode);
>    emit_insn (seq);
>    return target;
>
> @@ -3252,7 +3284,8 @@ expand_unop (machine_mode mode, optab un
>            if (GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
>                && optab_handler (unoptab, word_mode) != CODE_FOR_nothing)
>              {
> -              temp = expand_doubleword_clz (int_mode, op0, target);
> +              temp = expand_doubleword_clz_ctz_ffs (int_mode, op0, target,
> +                                                    unoptab);
>                if (temp)
>                  return temp;
>              }
> @@ -3499,6 +3532,18 @@ expand_unop (machine_mode mode, optab un
>        if (temp)
>          return temp;
>      }
> +
> +  if ((unoptab == ctz_optab || unoptab == ffs_optab)
> +      && optimize_insn_for_speed_p ()
> +      && is_a <scalar_int_mode> (mode, &int_mode)
> +      && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
> +      && (optab_handler (unoptab, word_mode) != CODE_FOR_nothing
> +          || optab_handler (ctz_optab, word_mode) != CODE_FOR_nothing))
> +    {
> +      temp = expand_doubleword_clz_ctz_ffs (int_mode, op0, target, unoptab);
> +      if (temp)
> +        return temp;
> +    }
>
>   try_libcall:
>    /* Now try a library call in this mode.  */
> --- gcc/testsuite/gcc.target/i386/ctzll-1.c.jj	2023-06-07 14:38:58.749648164 +0200
> +++ gcc/testsuite/gcc.target/i386/ctzll-1.c	2023-06-07 14:41:22.676659439 +0200
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "__ctzdi2" } } */
> +
> +int
> +foo (unsigned long long x)
> +{
> +  return __builtin_ctzll (x);
> +}
> --- gcc/testsuite/gcc.target/i386/ffsll-1.c.jj	2023-06-07 14:40:00.859789953 +0200
> +++ gcc/testsuite/gcc.target/i386/ffsll-1.c	2023-06-07 14:41:15.104764068 +0200
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "__ffsdi2" } } */
> +
> +int
> +foo (unsigned long long x)
> +{
> +  return __builtin_ffsll (x);
> +}
>
> 	Jakub
>
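
For reference, the double-word scheme described in the quoted mail can be sketched in plain C. This is only an illustration under stated assumptions, not the RTL expansion from the patch: the 32-bit word size, WORD_BITS, and the names word_clz, word_ctz, word_ffs, dw_clz, dw_ctz and dw_ffs are all made up for the example, and the loop-based helpers merely stand in for whatever word_mode patterns a target provides.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: a "word" is 32 bits, a double word 64 bits.  */
#define WORD_BITS 32

/* Hypothetical stand-ins for a target's word_mode ctz/clz/ffs patterns.
   Like the builtins, ctz and clz are not defined for a zero argument.  */
static int word_ctz (uint32_t w)
{
  assert (w != 0);
  int n = 0;
  while ((w & 1u) == 0)
    {
      w >>= 1;
      n++;
    }
  return n;
}

static int word_clz (uint32_t w)
{
  assert (w != 0);
  int n = 0;
  while ((w & 0x80000000u) == 0)
    {
      w <<= 1;
      n++;
    }
  return n;
}

static int word_ffs (uint32_t w)
{
  return w ? word_ctz (w) + 1 : 0;
}

/* CLZ: look at the high word first; fall back to the low word.  */
int dw_clz (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  if (hi == 0)
    return word_clz (lo) + WORD_BITS;
  return word_clz (hi);
}

/* CTZ: same shape with the two words swapped.  */
int dw_ctz (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  if (lo == 0)
    return word_ctz (hi) + WORD_BITS;
  return word_ctz (lo);
}

/* FFS: like CTZ, except that the result is 0 when both words are zero.  */
int dw_ffs (uint64_t x)
{
  uint32_t hi = (uint32_t) (x >> WORD_BITS), lo = (uint32_t) x;
  if (lo == 0)
    return hi ? word_ffs (hi) + WORD_BITS : 0;
  return word_ffs (lo);
}
```

As with __builtin_clzll and __builtin_ctzll, dw_clz and dw_ctz have no defined result for a zero argument (the sketch asserts instead); only dw_ffs handles zero, returning 0.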