From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb1-xb2f.google.com (mail-yb1-xb2f.google.com [IPv6:2607:f8b0:4864:20::b2f]) by sourceware.org (Postfix) with ESMTPS id 42A9D382F09E for ; Fri, 16 Sep 2022 01:11:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 42A9D382F09E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-yb1-xb2f.google.com with SMTP id t184so30360576yba.4 for ; Thu, 15 Sep 2022 18:11:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date; bh=T1vOwc8wxSxbJQgXImPcm2i0ocf86rt3Ll4iXvaCt08=; b=mTk20JXvfvvJ1FlCzeUdlEyIDAv2656HFLALajnPU4Yc2kQVE4CXkV926BlnAxrYBF 9uptUJ47NL4ICC7H3tDw1jx9Sgt8YHEaZUCFuKxxqXOWMR7PCqTGIBYDtyIBweot8AsM knk0d/OqT5Y4RQ3maNBrnldk2pS2CTNLwUTlgNzr91kDYP809q5kEAArGOSwxHXAo1nl duCtotJ0leU8Vg5mfxtXiyWQhyZL9CA6aP2tvq4q4CRk2R7rp3OLPopualsfD0HPXurU iEih/C8ViNBIXMZ1AHaKHnMlGBPLDRkpd6P445WLblNOyzbU+aCObsVcbJqZULfnjySV sL9A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date; bh=T1vOwc8wxSxbJQgXImPcm2i0ocf86rt3Ll4iXvaCt08=; b=Q5P897Ay2G7ut5H5F63TskaS6vlTmJe9rD1R1QPBiV5EpOVo16v5id4+aid2ClrzNq R4zHlCtT2wIqd56mjhP4lNaeS2N29ppuepLlxKvPrPsWajmT+NjxfSo8kYVHhv3bOhCr OHz4kt+op2NjE380L0DEc3hfBwhtwySIdglVfBIFFCefxcrW19OYEfB13HNu13KZlaNS e8zMi/kNEZrUCEkn72UX19NsY/axiWgBsbq72iWBF1Q707ehqdGKvJut8kSmY/poj8PG 4VQFP65OuB6IsbLHtq1L3KTIdgCSg3krR+NEFflpsT30o60+0r2tKbM+QBUs40UVBNdt Xv0w== X-Gm-Message-State: ACrzQf3h07UoXXKpsb99O4CT6DjRD0LmmwxXa1RO3lRXjciA+FLQQxq9 6Y4WvksvF0tUO8byfxIysxjYMtrUXoVJdlPeX38R2b0L X-Google-Smtp-Source: AMsMyM4ebbINWkMVFcizhBj3Y5Xrx9evV5ztjJOfd0lFgA47hW1dApDWiZENkeoDR86YV2xAuleOhiH3LSe9a0GZoGs= X-Received: by 2002:a25:d686:0:b0:6a8:e9a8:54f7 with SMTP id n128-20020a25d686000000b006a8e9a854f7mr2345157ybg.611.1663290676549; Thu, 15 Sep 2022 18:11:16 -0700 (PDT) MIME-Version: 1.0 References: <20220916010659.37555-1-hongtao.liu@intel.com> In-Reply-To: <20220916010659.37555-1-hongtao.liu@intel.com> From: Hongtao Liu Date: Fri, 16 Sep 2022 09:13:51 +0800 Message-ID: Subject: Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg To: Uros Bizjak Cc: gcc-patches@gcc.gnu.org, "H. J. Lu" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Sep 16, 2022 at 9:09 AM liuhongt via Gcc-patches wrote: > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem, > reg + test reg, reg. I don't know exact reason why gcc do this. > > For latest x86 processors, ciscization should help processor frontend > also codesize, for processor backend, they should be the same(has same > uops). > > So the patch deleted the peephole2, and also modify another splitter to > generate more cmp mem, 0 for 32-bit target. > > It will help instruction fetch. > > for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no > comparison to 1 or -1, so adjust the testcase since under 32-bit > target, we now generate cmp mem, 0 instead of load + test. > > Similar for pr78035.c. > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} > No performance impact for SPEC2017 on ICX/Znver3. > > Ok for trunk? > > gcc/ChangeLog: > > * config/i386/i386.md (*3_1): Replace > register_operand with nonimmediate_operand for operand 1. Also > force_reg it when mode is QImode. > (define_peephole2): Deleted related peephole2. > > gcc/testsuite/ChangeLog: > > * gcc.target/i386/minmax-1.c: Scan-assemble-not for cmp with 1 > or -1, also don't scan-assembler test for ia32. > * gcc.target/i386/minmax-10.c: Ditto. > * gcc.target/i386/minmax-2.c: Ditto. > * gcc.target/i386/pr78035.c: Ditto. > * gcc.target/i386/pr96861.c: Scan either cmp or test 3 times. > --- > gcc/config/i386/i386.md | 18 +++++------------- > gcc/testsuite/gcc.target/i386/minmax-1.c | 4 ++-- > gcc/testsuite/gcc.target/i386/minmax-10.c | 4 ++-- > gcc/testsuite/gcc.target/i386/minmax-2.c | 4 ++-- > gcc/testsuite/gcc.target/i386/pr78035.c | 2 +- > gcc/testsuite/gcc.target/i386/pr96861.c | 4 ++-- > 6 files changed, 14 insertions(+), 22 deletions(-) > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md > index 1be9b669909..93b905beb72 100644 > --- a/gcc/config/i386/i386.md > +++ b/gcc/config/i386/i386.md > @@ -21871,7 +21871,7 @@ (define_insn_and_split "*3_doubleword" > (define_insn_and_split "*3_1" > [(set (match_operand:SWI 0 "register_operand") > (maxmin:SWI > - (match_operand:SWI 1 "register_operand") > + (match_operand:SWI 1 "nonimmediate_operand") > (match_operand:SWI 2 "general_operand"))) > (clobber (reg:CC FLAGS_REG))] > "TARGET_CMOVE > @@ -21886,9 +21886,12 @@ (define_insn_and_split "*3_1" > { > machine_mode mode = mode; > rtx cmp_op = operands[2]; > - > operands[2] = force_reg (mode, cmp_op); > > + /* movqicc_noc only support register_operand for op1. */ > + if (mode == QImode) > + operands[1] = force_reg (mode, operands[1]); > + > enum rtx_code code = ; > > if (cmp_op == const1_rtx) > @@ -22482,17 +22485,6 @@ (define_peephole2 > [(set (match_dup 2) (match_dup 1)) > (set (match_dup 0) (match_dup 2))]) > > -;; Don't compare memory with zero, load and use a test instead. > -(define_peephole2 > - [(set (match_operand 0 "flags_reg_operand") > - (match_operator 1 "compare_operator" > - [(match_operand:SI 2 "memory_operand") > - (const_int 0)])) > - (match_scratch:SI 3 "r")] > - "optimize_insn_for_speed_p () && ix86_match_ccmode (insn, CCNOmode)" > - [(set (match_dup 3) (match_dup 2)) > - (set (match_dup 0) (match_op_dup 1 [(match_dup 3) (const_int 0)]))]) > - > ;; NOT is not pairable on Pentium, while XOR is, but one byte longer. > ;; Don't split NOTs with a displacement operand, because resulting XOR > ;; will not be pairable anyway. > diff --git a/gcc/testsuite/gcc.target/i386/minmax-1.c b/gcc/testsuite/gcc.target/i386/minmax-1.c > index 0ec35b1c5a1..840b32c5414 100644 > --- a/gcc/testsuite/gcc.target/i386/minmax-1.c > +++ b/gcc/testsuite/gcc.target/i386/minmax-1.c > @@ -1,7 +1,7 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -march=opteron -mno-stv" } */ > -/* { dg-final { scan-assembler "test" } } */ > -/* { dg-final { scan-assembler-not "cmp" } } */ > +/* { dg-final { scan-assembler "test" { target { ! ia32 } } } } */ > +/* { dg-final { scan-assembler-not {(?n)cmp.*[$]+1} } } */ > #define max(a,b) (((a) > (b))? (a) : (b)) > int > t(int a) > diff --git a/gcc/testsuite/gcc.target/i386/minmax-10.c b/gcc/testsuite/gcc.target/i386/minmax-10.c > index b044462c5a9..1dd2eedf435 100644 > --- a/gcc/testsuite/gcc.target/i386/minmax-10.c > +++ b/gcc/testsuite/gcc.target/i386/minmax-10.c > @@ -34,5 +34,5 @@ unsigned int umin1(unsigned int x) > return min(x,1); > } > > -/* { dg-final { scan-assembler-times "test" 6 } } */ > -/* { dg-final { scan-assembler-not "cmp" } } */ > +/* { dg-final { scan-assembler-times "test" 6 { target { ! ia32 } } } } */ > +/* { dg-final { scan-assembler-not {(?n)cmp.*1} } } */ > diff --git a/gcc/testsuite/gcc.target/i386/minmax-2.c b/gcc/testsuite/gcc.target/i386/minmax-2.c > index af9baeaaf7c..2c82f6cecb9 100644 > --- a/gcc/testsuite/gcc.target/i386/minmax-2.c > +++ b/gcc/testsuite/gcc.target/i386/minmax-2.c > @@ -1,7 +1,7 @@ > /* { dg-do compile } */ > /* { dg-options "-O2 -mno-stv" } */ > -/* { dg-final { scan-assembler "test" } } */ > -/* { dg-final { scan-assembler-not "cmp" } } */ > +/* { dg-final { scan-assembler "test" { target { ! ia32 } } } } */ > +/* { dg-final { scan-assembler-not {(?n)cmp.*[$]1} } } */ > #define max(a,b) (((a) > (b))? (a) : (b)) > unsigned int > t(unsigned int a) > diff --git a/gcc/testsuite/gcc.target/i386/pr78035.c b/gcc/testsuite/gcc.target/i386/pr78035.c > index 7d3a983b218..d543d3f1d38 100644 > --- a/gcc/testsuite/gcc.target/i386/pr78035.c > +++ b/gcc/testsuite/gcc.target/i386/pr78035.c > @@ -22,4 +22,4 @@ int bar () > } > > /* We should not optimize away either comparison. */ > -/* { dg-final { scan-assembler-times "cmp" 2 } } */ > +/* { dg-final { scan-assembler-times "(?:cmp|test)" 3 } } */ > diff --git a/gcc/testsuite/gcc.target/i386/pr96861.c b/gcc/testsuite/gcc.target/i386/pr96861.c > index 7b7aeccb83c..8c0f0841f7d 100644 > --- a/gcc/testsuite/gcc.target/i386/pr96861.c > +++ b/gcc/testsuite/gcc.target/i386/pr96861.c > @@ -34,5 +34,5 @@ unsigned int umin1(unsigned int x) > return min(x,1); > } > > -/* { dg-final { scan-assembler-times "test" 6 } } */ > -/* { dg-final { scan-assembler-not "cmp" } } */ > +/* { dg-final { scan-assembler-times "test" 6 { target { ! ia32 } } } } */ > +/* { dg-final { scan-assembler-not {(?n)cmp.*[$]+1} } } */ > -- > 2.18.1 > -- BR, Hongtao