From: Uros Bizjak
Date: Mon, 9 Jan 2023 17:02:58 +0100
Subject: Re: [x86 PATCH] PR rtl-optimization/107991: peephole2 to tweak register allocation.
To: GCC Patches
Cc: Roger Sayle, Richard Sandiford
In-Reply-To: <011401d9243b$3782ce10$a6886a30$@nextmovesoftware.com>

On Mon, Jan 9, 2023 at 4:01 PM Roger Sayle wrote:
>
>
> This patch addresses PR rtl-optimization/107991, which is a P2 regression
> where GCC currently requires more "mov" instructions than GCC 7.
>
> The x86's two-address ISA creates some interesting challenges for reload.
> For example, the tricky "x = y - x" usually needs to be implemented on x86
> as
>
>         tmp = x
>         x = y
>         x -= tmp
>
> where a scratch register and two mov's are required to work around
> the lack of a subf (subtract from) or rsub (reverse subtract) insn.
>
> Not uncommonly, if y is dead after this subtraction, register allocation
> can be improved by clobbering y.
>
>         y -= x
>         x = y
>
> For the testcase in PR 107991, things are slightly more complicated,
> where y is not itself dead, but is assigned from (i.e. equivalent to)
> a value that is dead.
> Hence we have something like:
>
>         y = z
>         x = y - x
>
> so, GCC's reload currently generates the expected shuffle (as y is live):
>
>         y = z
>         tmp = x
>         x = y
>         x -= tmp
>
> but we can use a peephole2 that understands that y and z are equivalent,
> and that z is dead, to produce the shorter sequence:
>
>         y = z
>         z -= x
>         x = z
>
> In practice, for the new testcase from PR 107991, which previously produced:
>
> foo:    movl    %edx, %ecx
>         movl    %esi, %edx
>         movl    %esi, %eax
>         subl    %ecx, %edx
>         testb   %dil, %dil
>         cmovne  %edx, %eax
>         ret
>
> with this patch/peephole2 we now produce the much improved:
>
> foo:    movl    %esi, %eax
>         subl    %edx, %esi
>         testb   %dil, %dil
>         cmovne  %esi, %eax
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?

Looking at the PR, it looks to me like Richard S (CC'd) wants to solve
this issue in the register allocator. This would be preferable to a
very specialized peephole2, since the peephole2 pass comes very late in
the game (after register allocation), so a register freed there does
nothing to lower register pressure.

Peephole2 should be used to clean up after reload only in the rare
cases where the target ISA prevents a generic solution. From your
description, a generic solution would benefit all targets with
destructive subtraction (and perhaps other non-commutative operations
as well).

So, please coordinate with Richard S regarding this issue.

Thanks,
Uros.

>
>
> 2023-01-09  Roger Sayle
>
> gcc/ChangeLog
>         PR rtl-optimization/107991
>         * config/i386/i386.md (peephole2): New peephole2 to avoid register
>         shuffling before a subtraction, after a register-to-register move.
>
> gcc/testsuite/ChangeLog
>         PR rtl-optimization/107991
>         * gcc.target/i386/pr107991.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>