From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <roger@nextmovesoftware.com>
Received: from server.nextmovesoftware.com (server.nextmovesoftware.com
 [162.254.253.69])
 by sourceware.org (Postfix) with ESMTPS id 0E86D385840A
 for <gcc-patches@gcc.gnu.org>; Fri, 29 Jul 2022 06:57:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0E86D385840A
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=nextmovesoftware.com
Authentication-Results: sourceware.org;
 spf=pass smtp.mailfrom=nextmovesoftware.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=nextmovesoftware.com; s=default; h=Content-Transfer-Encoding:Content-Type:
 MIME-Version:Message-ID:Date:Subject:In-Reply-To:References:Cc:To:From:Sender
 :Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From:
 Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:
 List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
 bh=UYOhiT92dR8zXbpKiqDBdY9aeb2mTZwfn5FflyUrNPo=; b=ZmCq5+Ybf7n0zPjqJP3Ni8Z1BZ
 MYOiU6BfTm5bBLSeW0dfhC+Aj8/ytpS3Wf2KGC4Dw1RYx2YUI5LquMcRL6xUH5V42/aAR35jePPwK
 dMN67nOTNSajXBhIJi6BoAUCfV2h7PzirvELP6YOz6Uvz40GPZ8lJf4WSmoacKNFfsb5pbQ/jZsav
 +bgAB+x6qw/YTqYmsQW1hQAFcJ5w9yZpUM68fj+MBeubBauyZeEsrgIdxS/MfXqCtqhKR0eGFf0WF
 96+ro6cSAoEffG50jNMXZzeocDU84W2PXgD7ChE0E/9tv9Mdu6IkOKWCN57yrxqa8WQfXRAwdcLQT
 26IAm3mQ==;
Received: from host86-169-41-119.range86-169.btcentralplus.com
 ([86.169.41.119]:62161 helo=Dell)
 by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2)
 (envelope-from <roger@nextmovesoftware.com>)
 id 1oHJwb-0005L3-BE; Fri, 29 Jul 2022 02:57:57 -0400
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Segher Boessenkool'" <segher@kernel.crashing.org>
Cc: <gcc-patches@gcc.gnu.org>
References: <009501d8a1be$b6199e20$224cda60$@nextmovesoftware.com>
 <20220727202336.GE25951@gate.crashing.org>
In-Reply-To: <20220727202336.GE25951@gate.crashing.org>
Subject: RE: [PATCH] Some additional zero-extension related optimizations in
 simplify-rtx.
Date: Fri, 29 Jul 2022 07:57:51 +0100
Message-ID: <041201d8a318$8850d110$98f27330$@nextmovesoftware.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQGqgpDoQOd+6+TzNywqEkVR4Bop4QGnqx6WrePktGA=
Content-Language: en-gb
X-AntiAbuse: This header was added to track abuse,
 please include it with any abuse report
X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com
X-AntiAbuse: Original Domain - gcc.gnu.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - nextmovesoftware.com
X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id:
 roger@nextmovesoftware.com
X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_BARRACUDACENTRAL,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 29 Jul 2022 06:57:59 -0000


Hi Segher,
 
> On Wed, Jul 27, 2022 at 02:42:25PM +0100, Roger Sayle wrote:
> > This patch implements some additional zero-extension and
> > sign-extension related optimizations in simplify-rtx.cc.  The original
> > motivation comes from PR rtl-optimization/71775, where in comment #2
> > Andrew Pinski sees:
> >
> > Failed to match this instruction:
> > (set (reg:DI 88 [ _1 ])
> >     (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> >
> > On many platforms the result of DImode CTZ is constrained to be a
> > small unsigned integer (between 0 and 64), hence the truncation to
> > 32-bits (using a SUBREG) and the following sign extension back to
> > 64-bits are effectively a no-op, so the above should ideally (often)
> > be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".
> 
> And you can also do that if ctz is undefined for a zero argument!

Forgive my perhaps poor use of terminology.  The case of ctz 0 on
x86_64 isn't "undefined behaviour" (UB) in the C/C++ sense that
would allow us to do anything, but implementation defined (which
Intel calls "undefined" in their documentation).  Hence, we don't
know which DI value is placed in the result register.  In this case,
truncating to SI mode and then sign extending the result is not a no-op,
as the top bits will/must now all be the same [though admittedly equal
to an unknown signbit].  Hence the above optimization would
be invalid, as it doesn't guarantee the result would be sign-extended.
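
To illustrate the point above, here is a minimal standalone sketch (plain
C++, not GCC code; the helper name trunc_then_sext is hypothetical) that
models (sign_extend:DI (subreg:SI x 0)) on a 64-bit value.  It is only a
no-op when bit 31 and all bits above it are already clear, which holds for
a count in the range 0..64 but not for an arbitrary register value.

```cpp
#include <cstdint>

// Models (sign_extend:DI (subreg:SI x 0)): keep the low 32 bits of x,
// then sign-extend them back to 64 bits.  Uses only unsigned arithmetic,
// so there is no implementation-defined conversion involved.
constexpr std::uint64_t trunc_then_sext(std::uint64_t x)
{
    return ((x & 0xffffffffULL) ^ 0x80000000ULL) - 0x80000000ULL;
}

// Any small count (0..64) survives the round trip unchanged.
static_assert(trunc_then_sext(0) == 0, "0 is preserved");
static_assert(trunc_then_sext(63) == 63, "63 is preserved");
static_assert(trunc_then_sext(64) == 64, "64 is preserved");

// An arbitrary (assumed, purely illustrative) register value does not:
// the upper 32 bits are replaced by copies of bit 31.
static_assert(trunc_then_sext(0x123456789abcdef0ULL) == 0xffffffff9abcdef0ULL,
              "upper bits are overwritten by the sign bit");
```

So if the hardware may leave any DI value in the register for a zero
input, dropping the truncate/extend pair changes the observable result.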

> > To implement this, and some closely related transformations, we build
> > upon the existing val_signbit_known_clear_p predicate.  In the first
> > chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
> > bit set,
> 
> Is that guaranteed in all cases?  Also at -O0, also for args bigger than
> 64 bits?

val_signbit_known_clear_p should work for any size/precision arg.
I'm not sure if the results are affected by -O0, but even if they are, this
will not affect correctness, only whether these optimizations are performed,
which is precisely what -O0 controls.
 
> > +      /* (sign_extend:DI (subreg:SI (ctz:DI ...))) is (ctz:DI ...).  */
> > +      if (GET_CODE (op) == SUBREG
> > +	  && subreg_lowpart_p (op)
> > +	  && GET_MODE (SUBREG_REG (op)) == mode
> > +	  && is_a <scalar_int_mode> (mode, &int_mode)
> > +	  && is_a <scalar_int_mode> (GET_MODE (op), &op_mode)
> > +	  && GET_MODE_PRECISION (int_mode) <= HOST_BITS_PER_WIDE_INT
> > +	  && GET_MODE_PRECISION (op_mode) < GET_MODE_PRECISION (int_mode)
> > +	  && (nonzero_bits (SUBREG_REG (op), mode)
> > +	      & ~(GET_MODE_MASK (op_mode)>>1)) == 0)
> 
> (spaces around >> please)

Doh! Good catch, thanks.

> Please use val_signbit_known_{set,clear}_p?

Alas, it's not just the SI mode's signbit that we care about, but all of the
bits above it in the DImode operand/result.  These all need to be zero
for the operand to already be zero-extended/sign-extended.
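
As a rough illustration of the difference, the sketch below (standalone
C++, not GCC code) contrasts a sign-bit-only check with the mask test from
the patch.  The helpers mode_mask, narrow_signbit_clear and
extension_is_noop are hypothetical simplified stand-ins for GET_MODE_MASK,
val_signbit_known_clear_p and the patch's condition, and assume modes of
at most 64 bits.

```cpp
#include <cstdint>

// Hypothetical stand-in for GET_MODE_MASK: ones in the low `precision` bits.
constexpr std::uint64_t mode_mask(unsigned precision)
{
    return precision >= 64 ? ~std::uint64_t{0}
                           : (std::uint64_t{1} << precision) - 1;
}

// Simplified model of a sign-bit-only test: looks at one bit of the
// narrow mode and nothing else.
constexpr bool narrow_signbit_clear(std::uint64_t nonzero, unsigned narrow_prec)
{
    return (nonzero & (std::uint64_t{1} << (narrow_prec - 1))) == 0;
}

// Model of the patch's test: the narrow sign bit and every bit above it
// must be known to be zero.
constexpr bool extension_is_noop(std::uint64_t nonzero, unsigned narrow_prec)
{
    return (nonzero & ~(mode_mask(narrow_prec) >> 1)) == 0;
}

// A small count passes both tests.
static_assert(narrow_signbit_clear(0x7f, 32), "bit 31 clear");
static_assert(extension_is_noop(0x7f, 32), "bits 31..63 clear");

// A possible bit 40 passes the sign-bit-only test, yet the extension
// would clear it, so the stronger mask test must reject it.
static_assert(narrow_signbit_clear(std::uint64_t{1} << 40, 32), "bit 31 clear");
static_assert(!extension_is_noop(std::uint64_t{1} << 40, 32), "bit 40 may be set");
```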

> > +	return SUBREG_REG (op);
> 
> Also, this is not correct for C[LT]Z_DEFINED_VALUE_AT_ZERO non-zero if the
> value it returns in its second arg does not survive sign extending
> unmodified (if it is 0xffffffff for an extend from SI to DI for example).

Fortunately, the case of C[LT]Z_DEFINED_VALUE_AT_ZERO being defined to return
a negative result, such as -1, is already handled (accounted for) in
nonzero_bits.  The relevant code in rtlanal.cc's nonzero_bits1 is:

    case CTZ:
      /* If CTZ has a known value at zero, then the nonzero bits are
         that value, plus the number of bits in the mode minus one.  */
      if (CTZ_DEFINED_VALUE_AT_ZERO (mode, nonzero))
        nonzero
          |= (HOST_WIDE_INT_1U << (floor_log2 (mode_width))) - 1;
      else
        nonzero = -1;
      break;

Hence, any bits set by the constant returned by the target's
DEFINED_VALUE_AT_ZERO will be set in the result of nonzero_bits.
So if this is negative, say -1, then val_signbit_known_clear_p (or the
more complex tests above) will return false.
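
For concreteness, here is a small standalone model of that interaction
(plain C++, not GCC code).  The names floor_log2_model, ctz_nonzero_bits
and high_bits_clear are hypothetical, and the target macro is reduced to a
bool plus a value, which is an assumption made purely for illustration.

```cpp
#include <cstdint>

// Hypothetical model of floor_log2 for a positive width.
constexpr unsigned floor_log2_model(unsigned w)
{
    return w <= 1 ? 0 : 1 + floor_log2_model(w >> 1);
}

// Model of the CTZ case quoted above: with a defined value at zero, the
// possibly-nonzero bits are that value OR'ed with the bits a count below
// the mode width can occupy; otherwise every bit may be set.
constexpr std::uint64_t ctz_nonzero_bits(bool defined_at_zero,
                                         std::uint64_t value_at_zero,
                                         unsigned mode_width)
{
    return defined_at_zero
        ? (value_at_zero
           | ((std::uint64_t{1} << floor_log2_model(mode_width)) - 1))
        : ~std::uint64_t{0};
}

// Model of the patch's SI-in-DI test: bits 31..63 must be known zero.
constexpr bool high_bits_clear(std::uint64_t nonzero)
{
    return (nonzero & ~(std::uint64_t{0xffffffff} >> 1)) == 0;
}

// Value at zero is 64: nonzero bits are 64 | 63 = 127, so the test passes.
static_assert(ctz_nonzero_bits(true, 64, 64) == 127, "64 | 63");
static_assert(high_bits_clear(ctz_nonzero_bits(true, 64, 64)), "safe");

// Value at zero is -1 (all ones): the test fails, so nothing is simplified.
static_assert(!high_bits_clear(ctz_nonzero_bits(true, ~std::uint64_t{0}, 64)),
              "-1 blocks the transformation");

// No defined value at zero: every bit may be set, so the test fails too.
static_assert(!high_bits_clear(ctz_nonzero_bits(false, 0, 64)),
              "unknown result blocks the transformation");
```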

I'm currently bootstrapping and regression testing the whitespace 
change/correction suggested above.

Thanks,
Roger
--