From: "amacleod at redhat dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/55157] Missed VRP with != 0 and multiply
Date: Fri, 04 Nov 2022 13:40:41 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55157

--- Comment #6 from Andrew Macleod ---
(In reply to Aldy Hernandez from comment #4)
>
> The patch below does this, but it does have a 3% penalty for VRP (though no
> penalty to overall compilation). I'm inclined to pursue this route, since
> it makes nonzero mask optimization more pervasive across the board.
>
> What do you think Andrew?
>

1) Why wouldn't this be done in set_range_from_nonzero_bits()? That call is
just above that spot in the code.
Or is the name misleading and it does something else?

2) That seems expensive; we must be doing unnecessary work. Maybe it would
speed up if we first checked whether either the ctz or the clz would cause it
to do anything, thus avoiding creating a couple of ranges and performing a
union and intersection in cases where neither the leading nor the trailing bit
is a zero?

3) It also seems to me that you then only need to add the zero/union iff the
mask has trailing zeros. I.e., if there are no trailing zeros, then just set
the LB to 0 and calculate the UB based on the clz. I should think that would
speed things up a bit.
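
To make points 2) and 3) concrete, here is a minimal sketch of the shortcut on a
plain 32-bit unsigned value. It is not GCC's code: SubRange and
range_from_nonzero_mask() are hypothetical names made up for illustration, a
vector of pairs stands in for irange, and the GCC/Clang builtins
__builtin_ctz/__builtin_clz stand in for the wide_int ctz/clz helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a range: inclusive [lo, hi] subranges.
using SubRange = std::pair<uint32_t, uint32_t>;

// Derive the range implied by a nonzero-bits mask alone.
// A bit that is 0 in the mask is known to be 0 in the value.
std::vector<SubRange> range_from_nonzero_mask(uint32_t mask)
{
  // No bit may be set, so the value is exactly 0.
  // (Also avoids calling the builtins on 0, which is undefined.)
  if (mask == 0)
    return {SubRange{0, 0}};

  int tz = __builtin_ctz(mask);  // trailing zero bits
  int lz = __builtin_clz(mask);  // leading zero bits
  assert(lz < 32 && tz < 32);    // holds because mask != 0

  // Point 2: if neither end has a zero bit, the mask cannot narrow
  // the range, so skip building any temporaries at all.
  if (tz == 0 && lz == 0)
    return {SubRange{0, UINT32_MAX}};

  // Upper bound from the clz: every bit below the leading zeros set.
  uint32_t ub = UINT32_MAX >> lz;

  // Point 3: without trailing zeros, no union with zero is needed.
  if (tz == 0)
    return {SubRange{0, ub}};

  // With trailing zeros, the value is either 0 or at least 1 << tz.
  uint32_t lb = uint32_t{1} << tz;
  return {SubRange{0, 0}, SubRange{lb, ub}};
}
```

In this sketch only the last case builds two subranges; the other paths return
a single subrange or bail out early, which is the saving being suggested.
Whether that recovers the 3% in VRP would need to be measured.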