From: Hongtao Liu
Date: Tue, 20 Sep 2022 10:27:35 +0800
Subject: Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg
To: Alexander Monakov
Cc: Uros Bizjak, gcc-patches@gcc.gnu.org
References: <20220916010659.37555-1-hongtao.liu@intel.com> <261569e3-d4e9-5b64-b97d-8120b49b92a9@gmail.com>

On Fri, Sep 16, 2022 at 9:38 PM Alexander Monakov via Gcc-patches wrote:
>
> On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote:
>
> > On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches
> > wrote:
> > >
> > >
> > > On 9/15/22 19:06, liuhongt via Gcc-patches wrote:
> > > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem,
> > > > reg + test reg, reg. I don't know exact reason why gcc do this.
> > > >
> > > > For latest x86 processors, ciscization should help processor frontend
> > > > also codesize, for processor backend, they should be the same(has same
> > > > uops).
> > > >
> > > > So the patch deleted the peephole2, and also modify another splitter to
> > > > generate more cmp mem, 0 for 32-bit target.
> > > >
> > > > It will help instruction fetch.
> > > >
> > > > for minmax-1.c minmax-2.c minmax-10, pr96891.c, it's supposed to scan there's no
> > > > comparison to 1 or -1, so adjust the testcase since under 32-bit
> > > > target, we now generate cmp mem, 0 instead of load + test.
> > > >
> > > > Similar for pr78035.c.
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
> > > > No performance impact for SPEC2017 on ICX/Znver3.
> > > >
> > > It was almost certainly for PPro/P2 given it was rth's work from
> > > 1999. Probably should have been conditionalized on PPro/P2 at the
> > > time. No worries losing it now...
> >
> > Please add a tune flag in x86-tune.def under "Historical relics" and
> > use it in the relevant peephole2 instead of deleting it.
>
> When the next instruction after 'load mem; test reg, reg' is a conditional
> branch, this disables macro-op fusion because Intel CPUs do not macro-fuse
> 'cmp mem, imm; jcc'.
>
Oh, I didn't realize that, thanks for your reply. I'll hold the patch
until I have investigated further.
> It would be nice to rephrase the commit message to acknowledge this (the
> statement 'has same uops' is not always true with this considered).
>
> AMD CPUs can fuse some 'cmp mem, imm; jcc' under some conditions, so this
> should be beneficial for AMD.
>
> Alexander

--
BR,
Hongtao
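
For reference, the two instruction sequences discussed in this thread can be sketched with a small C function. This is only an illustration: the name branch_on_zero is hypothetical and does not come from the patch or the testsuite, and the assembly in the comment is hand-written, not compiler output. A compiler may well emit no branch at all for a function this small.

```c
/* Illustrative only: a hypothetical function with the "compare a
   memory operand against zero, then branch" shape from this thread.
   The name branch_on_zero is an assumption, not part of the patch. */
int branch_on_zero(const int *p)
{
    /* Load + test form, as produced by the old peephole2 (AT&T syntax):
           movl  (%rdi), %eax
           testl %eax, %eax
           jne   .Lnonzero
       Compare-with-memory form, which the patch would keep:
           cmpl  $0, (%rdi)
           jne   .Lnonzero
       Which form a given GCC emits depends on its version, on
       -m32/-m64 and on tuning flags, so check with "gcc -O2 -S". */
    if (*p == 0)
        return 1;
    return 0;
}
```

Comparing the output of "gcc -O2 -S" with and without the patch applied on such a function is one way to see which form is generated for a given target.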