From: Max Filippov
Date: Fri, 20 Jan 2023 07:14:17 -0800
Subject: Re: [PATCH v4] xtensa: Eliminate the use of callee-saved register that saves and restores only once
To: "Takayuki 'January June' Suwa"
Cc: GCC Patches
References: <465b0cbe-73ca-f5a0-661d-d34217e29b4d.ref@yahoo.co.jp> <465b0cbe-73ca-f5a0-661d-d34217e29b4d@yahoo.co.jp>
In-Reply-To: <465b0cbe-73ca-f5a0-661d-d34217e29b4d@yahoo.co.jp>

Hi Suwa-san,

On Wed, Jan 18, 2023 at 7:50 PM Takayuki 'January June' Suwa wrote:
>
> In the previous patch, if insn is JUMP_INSN or CALL_INSN, it bypasses the reg check (possibly FAIL).
>
> =====
> In the case of the CALL0 ABI, values that must be retained before and
> after function calls are placed in the callee-saved registers (A12
> through A15) and referenced later. However, it is often the case that
> the save and the reference are each only once and a simple register-
> register move (the frame pointer is needed to recover the stack pointer
> and must be excluded).
>
> e.g. in the following example, if there are no other occurrences of
> register A14:
>
> ;; before
> ; prologue {
> 	...
> 	s32i.n	a14, sp, 16
> 	...
> ; } prologue
> 	...
> 	mov.n	a14, a6
> 	...
> 	call0	foo
> 	...
> 	mov.n	a8, a14
> 	...
> ; epilogue {
> 	...
> 	l32i.n	a14, sp, 16
> 	...
> ; } epilogue
>
> It can be possible like this:
>
> ;; after
> ; prologue {
> 	...
> 	(deleted)
> 	...
> ; } prologue
> 	...
> 	s32i.n	a6, sp, 16
> 	...
> 	call0	foo
> 	...
> 	l32i.n	a8, sp, 16
> 	...
> ; epilogue {
> 	...
> 	(deleted)
> 	...
> ; } epilogue
>
> This patch introduces a new peephole2 pattern that implements the above.
>
> gcc/ChangeLog:
>
> 	* config/xtensa/xtensa.md: New peephole2 pattern that eliminates
> 	the use of callee-saved register that saves and restores only once
> 	for other register, by using its stack slot directly.
> ---
>  gcc/config/xtensa/xtensa.md | 62 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)

There are still issues with this change in libgomp:

FAIL: libgomp.c/examples-4/target-1.c execution test
FAIL: libgomp.c/examples-4/target-2.c execution test

They come from the following function:

code produced before the change:
	.literal_position
	.literal .LC8, init@PLT
	.literal .LC9, 400000
	.literal .LC10, 100000
	.literal .LC11, -800000
	.literal .LC12, 800000
	.align	4
	.global	vec_mult_ref
	.type	vec_mult_ref, @function
vec_mult_ref:
	l32r	a9, .LC11
	addi	sp, sp, -16
	l32r	a10, .LC9
	s32i.n	a12, sp, 8
	s32i.n	a13, sp, 4
	s32i.n	a0, sp, 12
	add.n	sp, sp, a9
	add.n	a12, sp, a10
	l32r	a9, .LC8
	mov.n	a13, a2
	mov.n	a3, sp
	mov.n	a2, a12
	callx0	a9
	l32r	a7, .LC10
	mov.n	a10, a12
	mov.n	a11, sp
	mov.n	a2, a13
	loop	a7, .L17_LEND
.L17:
	l32i.n	a9, a10, 0
	l32i.n	a6, a11, 0
	addi.n	a10, a10, 4
	mull	a9, a9, a6
	addi.n	a11, a11, 4
	s32i.n	a9, a2, 0
	addi.n	a2, a2, 4
.L17_LEND:
	l32r	a9, .LC12
	add.n	sp, sp, a9
	l32i.n	a0, sp, 12
	l32i.n	a12, sp, 8
	l32i.n	a13, sp, 4
	addi	sp, sp, 16
	ret.n

--------------------

with the change:
	.literal_position
	.literal .LC8, init@PLT
	.literal .LC9, 400000
	.literal .LC10, 100000
	.literal .LC11, -800000
	.literal .LC12, 800000
	.align	4
	.global	vec_mult_ref
	.type	vec_mult_ref, @function
vec_mult_ref:
	l32r	a9, .LC11
	l32r	a10, .LC9
	addi	sp, sp, -16
	s32i.n	a12, sp, 8
	s32i.n	a0, sp, 12
	add.n	sp, sp, a9
	add.n	a12, sp, a10
	l32r	a9, .LC8
	s32i.n	a2, sp, 4
	mov.n	a3, sp
	mov.n	a2, a12
	callx0	a9
	l32r	a7, .LC10
	l32i.n	a2, sp, 4
	mov.n	a10, a12
	mov.n	a11, sp
	loop	a7, .L17_LEND
.L17:
	l32i.n	a9, a10, 0
	l32i.n	a6, a11, 0
	addi.n	a10, a10, 4
	mull	a9, a9, a6
	addi.n	a11, a11, 4
	s32i.n	a9, a2, 0
	addi.n	a2, a2, 4
.L17_LEND:
	l32r	a9, .LC12
	add.n	sp, sp, a9
	l32i.n	a0, sp, 12
	l32i.n	a12, sp, 8
	addi	sp, sp, 16
	ret.n

The stack pointer is modified again after the callee-saved registers
are saved, but the stack offset at which a2 is stored and reloaded does
not take this into account: the offset 4 was valid for a13's save slot
only before the second sp adjustment.

After this many attempts, and with issues this hard to detect still
turning up, I wonder if the target backend is the right place for this
optimization?

-- 
Thanks.
-- Max
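
The offset arithmetic behind the failure can be sketched as follows. This is a minimal illustration in Python, not part of the patch or of GCC: the variable names and the entry value of sp are assumptions, while the constants (16-byte save area, .LC11 = -800000, .LC9 = 400000, slot offset 4) are taken from the listings above.

```python
# Illustrative model only; names and the entry sp value are assumptions.
ENTRY_SP = 0x10000000    # arbitrary stack pointer on function entry (assumption)
SAVE_AREA = 16           # addi sp, sp, -16
LOCALS = 800000          # add.n sp, sp, a9  with a9 = .LC11 = -800000
SLOT = 4                 # offset in "s32i.n a13, sp, 4" before the change
ARRAY_BYTES = 400000     # .LC9: offset between the two buffers passed to init

# Prologue, step 1: reserve the register save area.
sp_save = ENTRY_SP - SAVE_AREA
a13_slot = sp_save + SLOT            # address of a13's save slot

# Prologue, step 2: reserve the locals; sp moves a second time.
sp_body = sp_save - LOCALS

# With the change, a2 is stored at "sp + 4" relative to the adjusted sp.
a2_store = sp_body + SLOT

# init receives a3 = sp (and a2 = sp + 400000), so the buffer passed
# in a3 starts at the adjusted sp.
a3_buffer = range(sp_body, sp_body + ARRAY_BYTES)

print("a13 save slot          :", hex(a13_slot))
print("a2 actually stored at  :", hex(a2_store))
print("distance in bytes      :", a13_slot - a2_store)
print("inside buffer from a3  :", a2_store in a3_buffer)
```

Under these assumptions the store lands 800000 bytes below the slot it was meant to reuse, inside the buffer whose address is handed to init in a3. If init writes to that buffer, the saved a2 would be overwritten before the reload, which would be consistent with the execution failures reported above.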