Subject: Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once
Date: Sat, 21 Jan 2023 13:39:43 +0900
From: Takayuki 'January June' Suwa
To: Max Filippov
Cc: GCC Patches

On 2023/01/21 0:14, Max Filippov wrote:
> Hi Suwa-san,

Hi!
>
> On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa
> wrote:
>>
>> In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses
>> the reg check (possibly FAIL).
>>
>> =====
>> In the case of the CALL0 ABI, values that must be retained before and
>> after function calls are placed in the callee-saved registers (A12
>> through A15) and referenced later.  However, it is often the case that
>> the save and the reference each happen only once, via a simple
>> register-register move (the frame pointer is needed to recover the
>> stack pointer and must be excluded).
>>
>> E.g. in the following example, if there are no other occurrences of
>> register A14:
>>
>> ;; before
>> ; prologue {
>>   ...
>>     s32i.n  a14, sp, 16
>>   ...
>> ; } prologue
>>   ...
>>     mov.n   a14, a6
>>   ...
>>     call0   foo
>>   ...
>>     mov.n   a8, a14
>>   ...
>> ; epilogue {
>>   ...
>>     l32i.n  a14, sp, 16
>>   ...
>> ; } epilogue
>>
>> It can be transformed like this:
>>
>> ;; after
>> ; prologue {
>>   ...
>>     (deleted)
>>   ...
>> ; } prologue
>>   ...
>>     s32i.n  a6, sp, 16
>>   ...
>>     call0   foo
>>   ...
>>     l32i.n  a8, sp, 16
>>   ...
>> ; epilogue {
>>   ...
>>     (deleted)
>>   ...
>> ; } epilogue
>>
>> This patch introduces a new peephole2 pattern that implements the above.
>>
>> gcc/ChangeLog:
>>
>>         * config/xtensa/xtensa.md: Add a new peephole2 pattern that
>>         eliminates the use of a callee-saved register that is saved
>>         and restored only once on behalf of another register, by
>>         using its stack slot directly.
>> ---
>>  gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 62 insertions(+)
>
> There are still issues with this change in libgomp:
>
> FAIL: libgomp.c/examples-4/target-1.c execution test
> FAIL: libgomp.c/examples-4/target-2.c execution test
>
> They come from the following function:
>
> code produced before the change:
>         .literal_position
>         .literal .LC8, init@PLT
>         .literal .LC9, 400000
>         .literal .LC10, 100000
>         .literal .LC11, -800000
>         .literal .LC12, 800000
>         .align  4
>         .global vec_mult_ref
>         .type   vec_mult_ref, @function
> vec_mult_ref:
>         l32r    a9, .LC11
>         addi    sp, sp, -16
>         l32r    a10, .LC9
>         s32i.n  a12, sp, 8
>         s32i.n  a13, sp, 4
>         s32i.n  a0, sp, 12
>         add.n   sp, sp, a9
>         add.n   a12, sp, a10
>         l32r    a9, .LC8
>         mov.n   a13, a2
>         mov.n   a3, sp
>         mov.n   a2, a12
>         callx0  a9
>         l32r    a7, .LC10
>         mov.n   a10, a12
>         mov.n   a11, sp
>         mov.n   a2, a13
>         loop    a7, .L17_LEND
> .L17:
>         l32i.n  a9, a10, 0
>         l32i.n  a6, a11, 0
>         addi.n  a10, a10, 4
>         mull    a9, a9, a6
>         addi.n  a11, a11, 4
>         s32i.n  a9, a2, 0
>         addi.n  a2, a2, 4
> .L17_LEND:
>         l32r    a9, .LC12
>         add.n   sp, sp, a9
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         l32i.n  a13, sp, 4
>         addi    sp, sp, 16
>         ret.n
>
> --------------------
>
> with the change:
>         .literal_position
>         .literal .LC8, init@PLT
>         .literal .LC9, 400000
>         .literal .LC10, 100000
>         .literal .LC11, -800000
>         .literal .LC12, 800000
>         .align  4
>         .global vec_mult_ref
>         .type   vec_mult_ref, @function
> vec_mult_ref:
>         l32r    a9, .LC11
>         l32r    a10, .LC9
>         addi    sp, sp, -16
>         s32i.n  a12, sp, 8
>         s32i.n  a0, sp, 12
>         add.n   sp, sp, a9
>         add.n   a12, sp, a10
>         l32r    a9, .LC8
>         s32i.n  a2, sp, 4
>         mov.n   a3, sp
>         mov.n   a2, a12
>         callx0  a9
>         l32r    a7, .LC10
>         l32i.n  a2, sp, 4
>         mov.n   a10, a12
>         mov.n   a11, sp
>         loop    a7, .L17_LEND
> .L17:
>         l32i.n  a9, a10, 0
>         l32i.n  a6, a11, 0
>         addi.n  a10, a10, 4
>         mull    a9, a9, a6
>         addi.n  a11, a11, 4
>         s32i.n  a9, a2, 0
>         addi.n  a2, a2, 4
> .L17_LEND:
>         l32r    a9, .LC12
>         add.n   sp, sp, a9
>         l32i.n  a0, sp, 12
>         l32i.n  a12, sp, 8
>         addi    sp, sp, 16
>         ret.n
>
> The stack pointer is modified after saving the callee-saved registers,
> but the stack offset where a2 is stored and reloaded does not take
> this into account.
>
> After this many attempts, and with issues that are this hard to
> detect, I wonder whether the target backend is the right place for
> this optimization?

I guess they are not hard to detect, just issues I did not anticipate
(and I only need a little more work).  And where else should it be
done?  What about implementing a target-specific pass just for this
single optimization?
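A rough sketch of what such a target-specific RTL pass could look like,
assuming it would live in config/xtensa/ and be wired up through GCC's
usual make_pass_* factory plus the target passes.def hook; every name
below is hypothetical and the actual transformation is only outlined in
comments:

/* Hypothetical skeleton of an Xtensa-specific RTL pass; the analysis
   itself is deliberately left as a comment.  */

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "rtl.h"
#include "tree-pass.h"
#include "context.h"

namespace {

const pass_data pass_data_xtensa_csr_once =
{
  RTL_PASS,           /* type */
  "xtensa_csr_once",  /* name (hypothetical) */
  OPTGROUP_NONE,      /* optinfo_flags */
  TV_NONE,            /* tv_id */
  0,                  /* properties_required */
  0,                  /* properties_provided */
  0,                  /* properties_destroyed */
  0,                  /* todo_flags_start */
  0,                  /* todo_flags_finish */
};

class pass_xtensa_csr_once : public rtl_opt_pass
{
public:
  pass_xtensa_csr_once (gcc::context *ctxt)
    : rtl_opt_pass (pass_data_xtensa_csr_once, ctxt) {}

  /* Only the CALL0 ABI keeps values in a12..a15 this way.  */
  bool gate (function *) final override { return !TARGET_WINDOWED_ABI; }

  unsigned int execute (function *) final override
  {
    /* Walk the insn stream after prologue/epilogue generation, look
       for a callee-saved register whose save and restore are each
       paired with a single register-register move, and rewrite both
       sides to use the stack slot directly -- adjusting the offset
       when the stack pointer changes between the original save and
       the rewritten store, which is the case the libgomp failure
       exposed.  */
    return 0;
  }
};

} // anonymous namespace

rtl_opt_pass *
make_pass_xtensa_csr_once (gcc::context *ctxt)
{
  return new pass_xtensa_csr_once (ctxt);
}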
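For reference, a minimal C-level test case, hypothetical and not taken
from the thread, that tends to produce the save-once/use-once/
restore-once shape shown in the ";; before" example when compiled for
the CALL0 ABI at -O2 (function names are made up):

/* 'x' must survive the call to bar(), which may clobber every
   caller-saved register, so the compiler normally parks it in a
   callee-saved register (a12..a15), saving and restoring that
   register exactly once in the prologue/epilogue.  The proposed
   peephole instead stores the incoming value to the stack slot
   directly and reloads it after the call.  */

extern void bar (void);

int
foo (int x)
{
  bar ();
  return x + 1;   /* single use of 'x' after the call */
}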