From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by sourceware.org (Postfix) with ESMTPS id 41D253854149 for ; Fri, 16 Sep 2022 13:38:10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 41D253854149 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=ispras.ru Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=ispras.ru Received: from [10.10.3.121] (unknown [10.10.3.121]) by mail.ispras.ru (Postfix) with ESMTPS id C839840755C5; Fri, 16 Sep 2022 13:38:01 +0000 (UTC) Date: Fri, 16 Sep 2022 16:38:01 +0300 (MSK) From: Alexander Monakov To: Uros Bizjak cc: Jeff Law , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg In-Reply-To: Message-ID: References: <20220916010659.37555-1-hongtao.liu@intel.com> <261569e3-d4e9-5b64-b97d-8120b49b92a9@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,KAM_DMARC_STATUS,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote: > On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches > wrote: > > > > > > On 9/15/22 19:06, liuhongt via Gcc-patches wrote: > > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem, > > > reg + test reg, reg. I don't know exact reason why gcc do this. > > > > > > For latest x86 processors, ciscization should help processor frontend > > > also codesize, for processor backend, they should be the same(has same > > > uops). > > > > > > So the patch deleted the peephole2, and also modify another splitter to > > > generate more cmp mem, 0 for 32-bit target. > > > > > > It will help instruction fetch. > > > > > > for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no > > > comparison to 1 or -1, so adjust the testcase since under 32-bit > > > target, we now generate cmp mem, 0 instead of load + test. > > > > > > Similar for pr78035.c. > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > > > No performance impact for SPEC2017 on ICX/Znver3. > > > > > It was almost certainly for PPro/P2 given it was rth's work from > > 1999. Probably should have been conditionalized on PPro/P2 at the > > time. No worries losing it now... > > Please add a tune flag in x86-tune.def under "Historical relics" and > use it in the relevant peephole2 instead of deleting it. When the next instruction after 'load mem; test reg, reg' is a conditional branch, this disables macro-op fusion because Intel CPUs do not macro-fuse 'cmp mem, imm; jcc'. It would be nice to rephrase the commit message to acknowledge this (the statement 'has same uops' is not always true with this considered). AMD CPUs can fuse some 'cmp mem, imm; jcc' under some conditions, so this should be beneficial for AMD. Alexander