From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.guenther@gmail.com>
Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com
 [IPv6:2607:f8b0:4864:20::82c])
 by sourceware.org (Postfix) with ESMTPS id 84C51395A455
 for <gcc-patches@gcc.gnu.org>; Thu,  2 Jun 2022 10:55:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 84C51395A455
Received: by mail-qt1-x82c.google.com with SMTP id j8so307663qtn.13
 for <gcc-patches@gcc.gnu.org>; Thu, 02 Jun 2022 03:55:46 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=214kwwWgl4XO4MA9OKgWnda3Laox37/F/l92lxTeiQQ=;
 b=HYJikv0UB2oFJXyIfXtdbY7HzEPi385zKeMydImSYcvh7a621Pg+gT5BYseOAaRg+c
 m5PJfTBiEucOyI3d70A9iLlcUgZZTP70CmX9JoXtvmonfNDAHunhd9Kl3MlkWg9dL/1M
 omNTIR09Dp9Xo/XHRWM1JToQ5dfrcQwiuxGz/L6MquCuZGfZECLI+9BQS3K0X79aWQIc
 ZekNqmLQsSTgii5jDI+DcgsTDXuk1rDdGo3HT5/BvYXw87X5T4I96YxeirX178OH5O8o
 E5otU2Qkp/rSD2p2YNAzk0+GgBl+canrQHvlSMeNAXDen1gqtqD3lJbBcpfbDrNjmgn1
 IfYQ==
X-Gm-Message-State: AOAM530MJGuQGQJBcpChTrbhBTDOhEcQ12oX6mG0eTanBbmOR2DHUNk+
 /chG00hSio4/KAbdH0QyiS0AxENRWlt6E8bhr1J/QhoQPc4OSg==
X-Google-Smtp-Source: ABdhPJzZBYVsOvKtF6j6nQUav2w7EeDgqRqrpwPQxMavn18d3VJdI/1ecKfSAs1Xk/oIJ6/AmDJtza0gEL2BloakUMQ=
X-Received: by 2002:ac8:5ccb:0:b0:304:baf5:1990 with SMTP id
 s11-20020ac85ccb000000b00304baf51990mr3049321qta.224.1654167345812; Thu, 02
 Jun 2022 03:55:45 -0700 (PDT)
MIME-Version: 1.0
References: <03c201d8766a$450d9a80$cf28cf80$@nextmovesoftware.com>
In-Reply-To: <03c201d8766a$450d9a80$cf28cf80$@nextmovesoftware.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Thu, 2 Jun 2022 12:55:34 +0200
Message-ID: <CAFiYyc1VqPKuL2rv3RdBaP=Q+NUVymX8n2Q_gddWZP45fJ_L8g@mail.gmail.com>
Subject: Re: [PATCH/RFC] cprop_hardreg... Third time's a charm.
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.7 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jun 2022 10:55:48 -0000

On Thu, Jun 2, 2022 at 12:20 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
>
> This middle-end patch proposes the "hard register constant propagation"
>
> pass be performed up to three times on each basic block (up from the
>
> current two times) if the second pass successfully made changes.
>
>
>
> The motivation for three passes is to handle the "swap idiom" (i.e.
>
> t = x; x = y; y = t;" sequences) that get generated by register allocation
>
> (reload).
>
>
>
> Consider the x86_64 test case for __int128 addition recently discussed
>
> on gcc-patches.  With that proposed patch, the input to the cprop_hardreg
>
> pass looks like:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rsi, %rdi
>
>         movq    %r8, %rsi
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %rsi %rax
>
>         adcq    %rdi, %rdx
>
>         ret
>
>
>
> where the first three instructions effectively swap %rsi and %rdi.
>
>
>
> On the first pass of cprop_hardreg, we notice that the third insn,
>
> %rsi := %r8, is redundant and can eliminated/propagated to produce:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rsi, %rdi
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %r8 %rax
>
>         adcq    %rdi, %rdx
>
>         ret
>
>
>
> Because a successful propagation was found, cprop_hardreg then runs
>
> a second pass/sweep on affected basic blocks (using worklist), and
>
> on this second pass notices that the second instruction, %rdi := %rsi,
>
> may now be propagated (%rsi was killed in the before the first transform),
>
> and after a second pass, we now end up with:
>
>
>
>         movq    %rdi, %r8
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %r8, %rax
>
>         adcq    %rsi, %rdx
>
>         ret
>
>
>
> which is the current behaviour on mainline.  However, a third and final
>
> pass would now notice that the first insn, "%r8 := %rdi" is also now
>
> eliminable, and a third iteration would produce optimal code:
>
>
>
>         movq    %rdx, %rax
>
>         movq    %rcx, %rdx
>
>         addq    %rdi, %rax
>
>         adcq    %rsi, %rdx
>
>         ret
>
>
>
> The patch below creates an additional worklist, third_pass, that is
>
> populated with the basic block id's of blocks that were updated during
>
> the second pass.  Does the motivation for three passes (reload doesn't
>
> generate more or less than three instructions to swap a pair of registers)
>
> seem reasonable for all targets?  If cprop_hardreg is considered an
>
> expensive pass, this change could be gated based on basic block count
>
> or similar.  Finally, I should point out that this a regression fix;
>
> GCC 4.8 generated optimal code with two moves (whereas GCC 12 required
>
> 5 moves, up from GCC 11's 4 moves).
>
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
>
> and make -k check, both with and without --target_board=unix{-m32} with
>
> no new failures.  Thoughts?  Ok for mainline?

Can you instead refactor the code to have

auto_vec<int> worklist1, worklist2;
vec<int> *worklist = &worklist1;

and alternate between worklist1 and worklist2 N times (const int N = 2
is OK for now I guess, OTOH for -O1 we might want to stick to 1).

it might be tempting to maintain the number of times we visited a block
and just directly iterate on changed BBs.  There's also the comment
in cprop_hardreg_bb:

  /* If a block has a single predecessor, that we've already
     processed, begin with the value data that was live at
     the end of the predecessor block.  */
  /* ??? Ought to use more intelligent queuing of blocks.  */

which suggests we want to improve on its dataflow (like iterating
in RPO order so we can intersect the live value data of predecessors?)
That said, calling df_analyze () between iterations is what makes
iterating expensive I guess, as well as cprop_hardreg_debug.

Richard.

>
>
>
>
> 2022-06-02  Roger Sayle  <roger@nextmovesoftware.com>
>
>
>
> gcc/ChangeLog
>
>         * regcprop.cc (pass_cprop_hardreg::execute): Perform a third
>
>         iteration over each basic block that was updated by the second
>
>         iteration.
>
>
>
>
>
> Thanks in advance,
>
> Roger
>
> --
>
>
>