From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1251) id 225133858D33; Sun, 15 Oct 2023 13:55:38 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 225133858D33 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1697378138; bh=znkYp+g2Gbf+Img8A9Z1UIYFsXYlQXBpcmXJPokMJwk=; h=From:To:Subject:Date:From; b=WCaorB3l3kJZVbvFNFyXGsGFvmtgo2ZYlioabUES7wMmy0SVx1MVGqnDPyB4W+0wT D8tZG6gwMJuA7KKEUTpEAL+gAywBCdYtfBs1HC/d9XJU8HLNWliBfetBUY+JtfTQ3o mRA2F4nndunmQ1pSu4HODLyJNuiSlkcoCxQwbZvo= MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" From: Roger Sayle To: gcc-cvs@gcc.gnu.org Subject: [gcc r14-4647] middle-end: Improved RTL expansion of 1LL << x. X-Act-Checkin: gcc X-Git-Author: Roger Sayle X-Git-Refname: refs/heads/master X-Git-Oldrev: 5c46cd8507041c9c029f3745017944a46fd0afe6 X-Git-Newrev: accccbf5ae45941215f7f7825f5c447171a35214 Message-Id: <20231015135538.225133858D33@sourceware.org> Date: Sun, 15 Oct 2023 13:55:38 +0000 (GMT) List-Id: https://gcc.gnu.org/g:accccbf5ae45941215f7f7825f5c447171a35214 commit r14-4647-gaccccbf5ae45941215f7f7825f5c447171a35214 Author: Roger Sayle Date: Sun Oct 15 14:53:22 2023 +0100 middle-end: Improved RTL expansion of 1LL << x. This patch improves the initial RTL expanded for double word shifts on architectures with conditional moves, so that later passes don't need to clean-up unnecessary and/or unused instructions. Consider the general case, x << y, which is expanded well as: t1 = y & 32; t2 = 0; t3 = x_lo >> 1; t4 = y ^ ~0; t5 = t3 >> t4; tmp_hi = x_hi << y; tmp_hi |= t5; tmp_lo = x_lo << y; out_hi = t1 ? tmp_lo : tmp_hi; out_lo = t1 ? t2 : tmp_lo; which is nearly optimal, the only thing that can be improved is that using a unary NOT operation "t4 = ~y" is better than XOR with -1, on targets that support it. [Note the one_cmpl_optab expander didn't fall back to XOR when this code was originally written, but has been improved since]. Now consider the relatively common idiom of 1LL << y, which currently produces the RTL equivalent of: t1 = y & 32; t2 = 0; t3 = 1 >> 1; t4 = y ^ ~0; t5 = t3 >> t4; tmp_hi = 0 << y; tmp_hi |= t5; tmp_lo = 1 << y; out_hi = t1 ? tmp_lo : tmp_hi; out_lo = t1 ? t2 : tmp_lo; Notice here that t3 is always zero, so the assignment of t5 is a variable shift of zero, which expands to a loop on many smaller targets, a similar shift by zero in the first tmp_hi assignment (another loop), that the value of t4 is no longer required (as t3 is zero), and that the ultimate value of tmp_hi is always zero. Fortunately, for many (but perhaps not all) targets this mess gets cleaned up by later optimization passes. However, this patch avoids generating unnecessary RTL at expand time, by calling simplify_expand_binop instead of expand_binop, and avoiding generating dead or unnecessary code when intermediate values are known to be zero. For the 1LL << y test case above, we now generate: t1 = y & 32; t2 = 0; tmp_hi = 0; tmp_lo = 1 << y; out_hi = t1 ? tmp_lo : tmp_hi; out_lo = t1 ? t2 : tmp_lo; On arc-elf, for example, there are 18 RTL INSN_P instructions generated by expand before this patch, but only 12 with this patch (improving both compile-time and memory usage). 2023-10-15 Roger Sayle gcc/ChangeLog * optabs.cc (expand_subword_shift): Call simplify_expand_binop instead of expand_binop. Optimize cases (i.e. avoid generating RTL) when CARRIES or INTO_INPUT is zero. Use one_cmpl_optab (i.e. NOT) instead of xor_optab with ~0 to calculate ~OP1. Diff: --- gcc/optabs.cc | 47 ++++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/gcc/optabs.cc b/gcc/optabs.cc index e1898da2280..f0a048a6bdb 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -533,15 +533,13 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab, has unknown behavior. Do a single shift first, then shift by the remainder. It's OK to use ~OP1 as the remainder if shift counts are truncated to the mode size. */ - carries = expand_binop (word_mode, reverse_unsigned_shift, - outof_input, const1_rtx, 0, unsignedp, methods); - if (shift_mask == BITS_PER_WORD - 1) - { - tmp = immed_wide_int_const - (wi::minus_one (GET_MODE_PRECISION (op1_mode)), op1_mode); - tmp = simplify_expand_binop (op1_mode, xor_optab, op1, tmp, - 0, true, methods); - } + carries = simplify_expand_binop (word_mode, reverse_unsigned_shift, + outof_input, const1_rtx, 0, + unsignedp, methods); + if (carries == const0_rtx) + tmp = const0_rtx; + else if (shift_mask == BITS_PER_WORD - 1) + tmp = expand_unop (op1_mode, one_cmpl_optab, op1, 0, true); else { tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1, @@ -552,22 +550,29 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab, } if (tmp == 0 || carries == 0) return false; - carries = expand_binop (word_mode, reverse_unsigned_shift, - carries, tmp, 0, unsignedp, methods); + if (carries != const0_rtx && tmp != const0_rtx) + carries = simplify_expand_binop (word_mode, reverse_unsigned_shift, + carries, tmp, 0, unsignedp, methods); if (carries == 0) return false; - /* Shift INTO_INPUT logically by OP1. This is the last use of INTO_INPUT - so the result can go directly into INTO_TARGET if convenient. */ - tmp = expand_binop (word_mode, unsigned_shift, into_input, op1, - into_target, unsignedp, methods); - if (tmp == 0) - return false; + if (into_input != const0_rtx) + { + /* Shift INTO_INPUT logically by OP1. This is the last use of + INTO_INPUT so the result can go directly into INTO_TARGET if + convenient. */ + tmp = simplify_expand_binop (word_mode, unsigned_shift, into_input, + op1, into_target, unsignedp, methods); + if (tmp == 0) + return false; - /* Now OR in the bits carried over from OUTOF_INPUT. */ - if (!force_expand_binop (word_mode, ior_optab, tmp, carries, - into_target, unsignedp, methods)) - return false; + /* Now OR in the bits carried over from OUTOF_INPUT. */ + if (!force_expand_binop (word_mode, ior_optab, tmp, carries, + into_target, unsignedp, methods)) + return false; + } + else + emit_move_insn (into_target, carries); /* Use a standard word_mode shift for the out-of half. */ if (outof_target != 0)