Message-ID: <7e3fe210-6dbc-fc29-dbb8-b951e89cf7e9@yahoo.co.jp>
Date: Wed, 3 Aug 2022 20:17:23 +0900
Subject: Re: [PATCH] lower-subreg, expr: Mitigate inefficiencies derived from "(clobber (reg X))" followed by "(set (subreg (reg X)) (...))"
To: Richard Sandiford
Cc: GCC Patches
From: Takayuki 'January June' Suwa
List-Id: Gcc-patches mailing list

Thanks for your response.

On 2022/08/03 16:52, Richard Sandiford wrote:
> Takayuki 'January June' Suwa via Gcc-patches writes:
>> Emitting "(clobber (reg X))" before "(set (subreg (reg X)) (...))" keeps
>> data flow consistent, but it also increases register allocation pressure
>> and thus often creates many unwanted register-to-register moves that
>> cannot be optimized away.
>
> There are two things here:
>
> - If emit_move_complex_parts emits a clobber of a hard register,
>   then that's probably a bug/misfeature.  The point of the clobber is
>   to indicate that the register has no useful contents.
>   That's useful for wide pseudos that are written to in parts, since
>   it avoids the need to track the liveness of each part of the pseudo
>   individually.  But it shouldn't be necessary for hard registers,
>   since subregs of hard registers are simplified to hard registers
>   wherever possible (which on most targets is "always").
>
>   So I think the emit_move_complex_parts clobber should be restricted
>   to !HARD_REGISTER_P, like the lower-subreg clobber is.  If that helps
>   (if only partly) then it would be worth doing as its own patch.
>
> - I think it'd be worth looking into more detail why a clobber makes
>   a difference to register pressure.  A clobber of a pseudo register R
>   shouldn't make R conflict with things that are live at the point of
>   the clobber.

I agree that this is worth investigating.  In fact, whatever the situation
on other ports, on the xtensa port the RA does a very poor job on code with
many D[FC]mode pseudos.  For example, for __muldc3 in libgcc2, the size of
the reserved stack frame almost doubles depending on whether or not this
patch is applied.

>
>> It seems just analogous to partial register
>> stall which is a famous problem on processors that do register renaming.
>>
>> In my opinion, when the register to be clobbered is a composite of hard
>> ones, we should clobber the individual elements separately, otherwise
>> clear the entire to zero prior to use as the "init-regs" pass does (like
>> partial register stall workarounds on x86 CPUs).  Such redundant zero
>> constant assignments will be removed later in the "cprop_hardreg" pass.
>
> I don't think we should rely on the zero being optimised away later.
>
> Emitting the zero also makes it harder for the register allocator
> to elide the move.  For example, if we have:
>
>   (set (subreg:SI (reg:DI P) 0) (reg:SI R0))
>   (set (subreg:SI (reg:DI P) 4) (reg:SI R1))
>
> then there is at least a chance that the RA could assign hard registers
> R0:R1 to P, which would turn the moves into nops.
> If we emit:
>
>   (set (reg:DI P) (const_int 0))
>
> beforehand then that becomes impossible, since R0 and R1 would then
> conflict with P.

Ah, that is certainly so, as you pointed out, for targets where "(reg:DI)"
corresponds to one hard register.

>
> TBH I'm surprised we still run init_regs for LRA.  I thought there was
> a plan to stop doing that, but perhaps I misremember.

Sorry, I am not sure about the status of LRA, because the xtensa port
still uses reload.

In conclusion, trying to tweak the common code may have been a bit
premature.  I'll consider whether I can deal with these issues in the
target-specific code instead.

>
> Thanks,
> Richard
>
>> This patch may give better output code quality for the reasons above,
>> especially on architectures that don't have DFmode hard registers
>> (On architectures with such hard registers, this patch changes virtually
>> nothing).
>>
>> For example (Espressif ESP8266, Xtensa without FP hard regs):
>>
>>     /* example */
>>     double _Complex conjugate(double _Complex z) {
>>       __imag__(z) *= -1;
>>       return z;
>>     }
>>
>>     ;; before
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         mov.n   a8, a2
>>         mov.n   a9, a3
>>         mov.n   a7, a4
>>         xor     a6, a5, a6
>>         mov.n   a2, a8
>>         mov.n   a3, a9
>>         mov.n   a4, a7
>>         mov.n   a5, a6
>>         ret.n
>>
>>     ;; after
>>     conjugate:
>>         movi.n  a6, -1
>>         slli    a6, a6, 31
>>         xor     a6, a5, a6
>>         mov.n   a5, a6
>>         ret.n
>>
>> gcc/ChangeLog:
>>
>>     * lower-subreg.cc (resolve_simple_move):
>>     Add zero clear of the entire register immediately after
>>     the clobber.
>>     * expr.cc (emit_move_complex_parts):
>>     Change to clobber the real and imaginary parts separately
>>     instead of the whole complex register if possible.
>> ---
>>  gcc/expr.cc         | 26 ++++++++++++++++++++------
>>  gcc/lower-subreg.cc |  7 ++++++-
>>  2 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> index 80bb1b8a4c5..9732e8fd4e5 100644
>> --- a/gcc/expr.cc
>> +++ b/gcc/expr.cc
>> @@ -3775,15 +3775,29 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
>>  rtx_insn *
>>  emit_move_complex_parts (rtx x, rtx y)
>>  {
>> -  /* Show the output dies here.  This is necessary for SUBREGs
>> -     of pseudos since we cannot track their lifetimes correctly;
>> -     hard regs shouldn't appear here except as return values.  */
>> -  if (!reload_completed && !reload_in_progress
>> -      && REG_P (x) && !reg_overlap_mentioned_p (x, y))
>> -    emit_clobber (x);
>> +  rtx_insn *re_insn, *im_insn;
>>
>>    write_complex_part (x, read_complex_part (y, false), false, true);
>> +  re_insn = get_last_insn ();
>>    write_complex_part (x, read_complex_part (y, true), true, false);
>> +  im_insn = get_last_insn ();
>> +
>> +  /* Show the output dies here.  This is necessary for SUBREGs
>> +     of pseudos since we cannot track their lifetimes correctly.  */
>> +  if (can_create_pseudo_p ()
>> +      && REG_P (x) && ! reg_overlap_mentioned_p (x, y))
>> +    {
>> +      /* Hard regs shouldn't appear here except as return values.  */
>> +      if (HARD_REGISTER_P (x) && REG_NREGS (x) % 2 == 0)
>> +        {
>> +          emit_insn_before (gen_clobber (SET_DEST (PATTERN (re_insn))),
>> +                            re_insn);
>> +          emit_insn_before (gen_clobber (SET_DEST (PATTERN (im_insn))),
>> +                            im_insn);
>> +        }
>> +      else
>> +        emit_insn_before (gen_clobber (x), re_insn);
>> +    }
>>
>>    return get_last_insn ();
>>  }
>> diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
>> index 03e9326c663..4ff0a7d1556 100644
>> --- a/gcc/lower-subreg.cc
>> +++ b/gcc/lower-subreg.cc
>> @@ -1086,7 +1086,12 @@ resolve_simple_move (rtx set, rtx_insn *insn)
>>        unsigned int i;
>>
>>        if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
>> -        emit_clobber (dest);
>> +        {
>> +          emit_clobber (dest);
>> +          /* We clear the entire of dest with zero after the clobber,
>> +             similar to the "init-regs" pass.  */
>> +          emit_move_insn (dest, CONST0_RTX (GET_MODE (dest)));
>> +        }
>>
>>        for (i = 0; i < words; ++i)
>>          {
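
For reference, the conjugate() example quoted above only needs to touch one
32-bit word: the "movi.n a6, -1; slli a6, a6, 31" pair builds 0x80000000 and
the xor flips the sign bit in the high word of the imaginary part, so every
other mov.n in the "before" output is pure copying.  Below is a minimal
sketch of the same operation in portable C, assuming IEEE 754 binary64
doubles.  The names flip_sign_bit and conjugate_by_parts are hypothetical and
are not part of the patch or of GCC.

```c
#include <assert.h>
#include <complex.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): flip the sign bit of an
   IEEE 754 binary64 value using integer operations only.  */
static double
flip_sign_bit (double d)
{
  uint64_t bits;

  assert (sizeof bits == sizeof d);   /* assumption: 64-bit double */
  memcpy (&bits, &d, sizeof bits);
  bits ^= UINT64_C (1) << 63;         /* the sign is the top bit */
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Hypothetical counterpart of the quoted conjugate(): leave the real part
   untouched and flip only the sign of the imaginary part.  */
static double _Complex
conjugate_by_parts (double _Complex z)
{
  double parts[2];                    /* C99 layout: { real, imaginary } */

  memcpy (parts, &z, sizeof parts);
  parts[1] = flip_sign_bit (parts[1]);
  memcpy (&z, parts, sizeof z);
  return z;
}
```

Calling conjugate_by_parts (3.0 + 4.0 * I) should give 3.0 - 4.0 * I.  The
real part is never read or written as a value, which is why only the last
register of the a2..a5 group changes in the "after" output.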