From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oo1-xc43.google.com (mail-oo1-xc43.google.com [IPv6:2607:f8b0:4864:20::c43]) by sourceware.org (Postfix) with ESMTPS id B286E3857838 for ; Thu, 8 Oct 2020 09:08:01 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B286E3857838 Received: by mail-oo1-xc43.google.com with SMTP id 4so1295650ooh.11 for ; Thu, 08 Oct 2020 02:08:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=MOERhF4Q4c0QXWoTxlo6y8CeGDFOAfwGMVwk3ctnZ/E=; b=SMKpjsCiXMdz9hovnrROJBF9x3x4EohGa35u1yXnGJZrlJG7DOA+1IEqz1YOcWsCLG Y8a7QnW8J82KdgOWymuLgBBG0zhNoyEVo9YEZndnfRJ4aExVQerWYl8yHyd3N5W6N1v2 UbXuZSmnD0qdiuVz+OEJVnGokOnnimxDSSEjxC67qZYQeDnkk0ZqiIuL7CJ+HYsuzdz0 dNv0NVbBZtnc8mw2z9xEMmCiCk0lunyLoQ7H5PfOWB3hipUItTldQZN3p0CcvsMvFLZi Gz6bFG3zBUzdxhHRQsNs/yFWqedZzbQM6bZEUIEUU/nEtNs3XILAuKPNgYqkCt/1A01y bO7g== X-Gm-Message-State: AOAM533yEJII6eP6/HQOx+ZragTNUcB9w+ztOuBo9nCYEMYPSNgAD5bT PbR/JeJf/MiHtddtIyBo0nFf4NIe/yKtvXBNYQFegg== X-Google-Smtp-Source: ABdhPJwImpUvS8G0PHIoVUQ7JCC3HAELbQR4U1uuBnfmGEh1e/CUwSFIDb9Y0CU4S5I8ljALzKLeiQDDNMnAhIN3BE4= X-Received: by 2002:a4a:5a06:: with SMTP id v6mr4771958ooa.22.1602148080857; Thu, 08 Oct 2020 02:08:00 -0700 (PDT) MIME-Version: 1.0 References: <1601409053-17310-1-git-send-email-christophe.lyon@linaro.org> <3d50cde7-aaa1-5d09-89e6-20e1b82ddf8f@foss.arm.com> In-Reply-To: <3d50cde7-aaa1-5d09-89e6-20e1b82ddf8f@foss.arm.com> From: Christophe Lyon Date: Thu, 8 Oct 2020 11:07:49 +0200 Message-ID: Subject: Re: [PATCH] arm: Fix multiple inheritance thunks for thumb-1 with -mpure-code To: Richard Earnshaw Cc: gcc Patches Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Oct 2020 09:08:03 -0000 On Tue, 6 Oct 2020 at 18:02, Richard Earnshaw wrote: > > On 29/09/2020 20:50, Christophe Lyon via Gcc-patches wrote: > > When mi_delta is > 255 and -mpure-code is used, we cannot load delta > > from code memory (like we do without -mpure-code). > > > > This patch builds the value of mi_delta into r3 with a series of > > movs/adds/lsls. > > > > We also do some cleanup by not emitting the function address and delta > > via .word directives at the end of the thunk since we don't use them > > with -mpure-code. > > > > No need for new testcases, this bug was already identified by > > eg. pr46287-3.C > > > > 2020-09-29 Christophe Lyon > > > > gcc/ > > * config/arm/arm.c (arm_thumb1_mi_thunk): Build mi_delta in r3 and > > do not emit function address and delta when -mpure-code is used. > Hi Richard, Thanks for your comments. > There are some optimizations you can make to this code. > > Firstly, for values between 256 and 510 (inclusive), it would be better > to just expand a mov of 255 followed by an add. I now see the splitted for the "Pe" constraint which I hadn't noticed before, so I can write something similar indeed. However, I'm note quite sure to understand the benefit in the split when -mpure-code is NOT used. Consider: int f3_1 (void) { return 510; } int f3_2 (void) { return 511; } Compile with -O2 -mcpu=cortex-m0: f3_1: movs r0, #255 lsls r0, r0, #1 bx lr f3_2: ldr r0, .L4 bx lr The splitter makes the code bigger, does it "compensate" for this by not having to load the constant? Actually the constant uses 4 more bytes, which should be taken into account when comparing code size, so f3_1 uses 6 bytes, and f3_2 uses 8, so as you say below three thumb1 instructions would be equivalent in size compared to loading from the literal pool. Should the 256-510 range be extended? > This is also true for > the literal pools alternative as well, so should be handled before all > this. I am not sure what you mean: with -mpure-code, the above sample is compiled as: f3_1: movs r0, #255 lsls r0, r0, #1 bx lr f3_2: movs r0, #1 lsls r0, r0, #8 adds r0, r0, #255 bx lr so the "return 510" case is already handled as without -mpure-code. > I also suspect (but haven't check) that the base adjustment will > most commonly be a multiple of the machine word size (ie 4). If that is > the case then you could generate n/4 and then shift it left by 2 for an > even greater range of literals. I can see there is provision for this in the !TARGET_THUMB1_ONLY case, I'll update my patch. > More generally, any sequence of up to > three thumb1 instructions will be no larger, and probably as fast as the > existing literal pool fall back. > > Secondly, if the value is, for example, 65536 (0x10000), your code will > emit a mov followed by two shift-by-8 instructions; the two shifts could > be merged into a single shift-by-16. Right, I'll try to make use of thumb_shiftable_const. > Finally, I'd really like to see some executable tests for this, if at > all possible. I mentioned pr46287-3.C, but that's not the only existing testcase that showed the problem. There are also: g++.dg/opt/thunk1.C g++.dg/ipa/pr46984.C g++.dg/torture/pr46287.C g++.dg/torture/pr45699.C Do you want that I copy one of these in the arm subdir and add -mpure-code in dg-options? Thanks, Christophe > R. > > > > > k# (use "git pull" to merge the remote branch into yours) > > --- > > gcc/config/arm/arm.c | 91 +++++++++++++++++++++++++++++++++++++--------------- > > 1 file changed, 66 insertions(+), 25 deletions(-) > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index ceeb91f..62abeb5 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -28342,9 +28342,43 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta, > > { > > if (mi_delta > 255) > > { > > - fputs ("\tldr\tr3, ", file); > > - assemble_name (file, label); > > - fputs ("+4\n", file); > > + /* With -mpure-code, we cannot load delta from the constant > > + pool: we build it explicitly. */ > > + if (target_pure_code) > > + { > > + bool mov_done_p = false; > > + int i; > > + > > + /* Emit upper 3 bytes if needed. */ > > + for (i = 0; i < 3; i++) > > + { > > + int byte = (mi_delta >> (8 * (3 - i))) & 0xff; > > + > > + if (byte) > > + { > > + if (mov_done_p) > > + asm_fprintf (file, "\tadds\tr3, #%d\n", byte); > > + else > > + asm_fprintf (file, "\tmovs\tr3, #%d\n", byte); > > + mov_done_p = true; > > + } > > + > > + if (mov_done_p) > > + asm_fprintf (file, "\tlsls\tr3, #8\n"); > > + } > > + > > + /* Emit lower byte if needed. */ > > + if (!mov_done_p) > > + asm_fprintf (file, "\tmovs\tr3, #%d\n", mi_delta & 0xff); > > + else if (mi_delta & 0xff) > > + asm_fprintf (file, "\tadds\tr3, #%d\n", mi_delta & 0xff); > > + } > > + else > > + { > > + fputs ("\tldr\tr3, ", file); > > + assemble_name (file, label); > > + fputs ("+4\n", file); > > + } > > asm_fprintf (file, "\t%ss\t%r, %r, r3\n", > > mi_op, this_regno, this_regno); > > } > > @@ -28380,30 +28414,37 @@ arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta, > > fputs ("\tpop\t{r3}\n", file); > > > > fprintf (file, "\tbx\tr12\n"); > > - ASM_OUTPUT_ALIGN (file, 2); > > - assemble_name (file, label); > > - fputs (":\n", file); > > - if (flag_pic) > > + > > + /* With -mpure-code, we don't need to emit literals for the > > + function address and delta since we emitted code to build > > + them. */ > > + if (!target_pure_code) > > { > > - /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ > > - rtx tem = XEXP (DECL_RTL (function), 0); > > - /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC > > - pipeline offset is four rather than eight. Adjust the offset > > - accordingly. */ > > - tem = plus_constant (GET_MODE (tem), tem, > > - TARGET_THUMB1_ONLY ? -3 : -7); > > - tem = gen_rtx_MINUS (GET_MODE (tem), > > - tem, > > - gen_rtx_SYMBOL_REF (Pmode, > > - ggc_strdup (labelpc))); > > - assemble_integer (tem, 4, BITS_PER_WORD, 1); > > - } > > - else > > - /* Output ".word .LTHUNKn". */ > > - assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1); > > + ASM_OUTPUT_ALIGN (file, 2); > > + assemble_name (file, label); > > + fputs (":\n", file); > > + if (flag_pic) > > + { > > + /* Output ".word .LTHUNKn-[3,7]-.LTHUNKPCn". */ > > + rtx tem = XEXP (DECL_RTL (function), 0); > > + /* For TARGET_THUMB1_ONLY the thunk is in Thumb mode, so the PC > > + pipeline offset is four rather than eight. Adjust the offset > > + accordingly. */ > > + tem = plus_constant (GET_MODE (tem), tem, > > + TARGET_THUMB1_ONLY ? -3 : -7); > > + tem = gen_rtx_MINUS (GET_MODE (tem), > > + tem, > > + gen_rtx_SYMBOL_REF (Pmode, > > + ggc_strdup (labelpc))); > > + assemble_integer (tem, 4, BITS_PER_WORD, 1); > > + } > > + else > > + /* Output ".word .LTHUNKn". */ > > + assemble_integer (XEXP (DECL_RTL (function), 0), 4, BITS_PER_WORD, 1); > > > > - if (TARGET_THUMB1_ONLY && mi_delta > 255) > > - assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); > > + if (TARGET_THUMB1_ONLY && mi_delta > 255) > > + assemble_integer (GEN_INT(mi_delta), 4, BITS_PER_WORD, 1); > > + } > > } > > else > > { > > >