From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x133.google.com (mail-lf1-x133.google.com [IPv6:2a00:1450:4864:20::133]) by sourceware.org (Postfix) with ESMTPS id DF0653858D38 for ; Mon, 9 Jan 2023 08:05:40 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DF0653858D38 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lf1-x133.google.com with SMTP id bp15so11700485lfb.13 for ; Mon, 09 Jan 2023 00:05:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ofkqcfeC/yOYZgyvyzJUNq6USdaBhCq+uL6DjcyrFa0=; b=NmeckYcc9LH1RsmKti9Yj9AKLDmSA24UcwleDlej/POHfL9LqIqjWvkvTW8MQLG2m+ 7aDnDY2PdS/RD922TUxmrTS9shMEmBrkz44MqRVWd1hRr2tkqGL0avvxf83d/qzt1Pzh hg6ERJs0NclOPrRVMBsN7zgR/bEMjqrAaFoba33xkMD1MYmZiu/iIeuPSveRsAO1EZZU 3eSIZk8hamV31/nm1gIeodof2pB7zIK10TZwDb7p52P0eRmutrmAnEoCryg28RW2lTA2 O8bvi9eSlNscF3roV3UW5HTWXBlY5kKju8YtrP7/QSyh013IHM9nIX/iuZ+YPMzqbAqK IdjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ofkqcfeC/yOYZgyvyzJUNq6USdaBhCq+uL6DjcyrFa0=; b=vlfYUrthjFUsPupPAaH/JySmfzzEpAmcOmAg/XzYp29A485DMVc/Pz+POMlivUN0Oa MWAasc2uKHHJiPZjs28/I4iLpgKaAZ6d3/KkCjDGSNLgikulalJOE/xprb9jqoqdDu0Y LSIPPeGUnQ3jCEOMSO1TE4vyMSrYJildj8+64+spib6QeUp/6omlpJQpkXKvevZ1l24n hjdzUmgSN1OroYMO0AoyDkdw+4fkmR7HyikEr1AQTSIEzLwV0JWZcsctg83eaBDMj1uN yHh3OcwFe2qkzLF8M6NOD1ptPanOIArdH/A6g6uMWx+DUyKe43GRFEha8VHYiPEzJrFA tPBQ== X-Gm-Message-State: AFqh2kreRngIm9z6Iq8b90qe1sBTeoeJ2fFcwbIFhi7x9E+BVpwEdX/S NAmvugUgtw3dulZnd1lb9Qrd0yfoeWqv5rQ4RRM= X-Google-Smtp-Source: AMrXdXvdG3W4LvP8TQCPPbqaaOG6w141Ul2+aCb9hivcSfzvdwM+iSCBQzlRwdkDXQFtc/rrplGKESfomPzrujjyy94= X-Received: by 2002:ac2:5969:0:b0:4ad:70c1:de61 with SMTP id h9-20020ac25969000000b004ad70c1de61mr2977162lfp.509.1673251539274; Mon, 09 Jan 2023 00:05:39 -0800 (PST) MIME-Version: 1.0 References: <20221208105944.660323-1-jose.marchesi@oracle.com> <87y1rixd3n.fsf@oracle.com> <87cz8ttt88.fsf@oracle.com> <87zgaysmky.fsf@oracle.com> In-Reply-To: <87zgaysmky.fsf@oracle.com> From: Richard Biener Date: Mon, 9 Jan 2023 09:05:26 +0100 Message-ID: Subject: Re: [PATCH] expr.cc: avoid unexpected side effects in expand_expr_divmod optimization To: "Jose E. Marchesi" Cc: Jakub Jelinek , gcc-patches@gcc.gnu.org, david.faust@oracle.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-7.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Wed, Jan 4, 2023 at 9:54 AM Jose E. Marchesi via Gcc-patches wrote: > > > ping. > Would this be a good approach for fixing the issue? adding the is_libcall bit enlarges rtx_def by 8 bytes - there's no room for more bits here. I really wonder how other targets avoid the issue you are pointing out? Do their assemblers prune unused (extern) .global? > > Hi Jakub. > > > >> On Thu, Dec 08, 2022 at 02:02:36PM +0100, Jose E. Marchesi wrote: > >>> So, I guess the right fix would be to call assemble_external_libcall > >>> during final? The `.global FOO' directive would be generated > >>> immediately before the call sequence, but I guess that would be ok. > >> > >> During final only if all the targets can deal with the effects of > >> assemble_external_libcall being done in the middle of emitting assembly > >> for the function. > >> > >> Otherwise, it could be e.g. done in the first loop of shorten_branches. > >> > >> Note, in calls.cc it is done only for emit_library_call_value_1 > >> and not for emit_call_1, so if we do it late, we need to be able to find > >> out what call is to a libcall and what is to a normal call. If there is > >> no way to differentiate it right now, perhaps we need some flag somewhere, > >> say on a SYMBOL_REF. And then assemble_external_libcall either only > >> if such a SYMBOL_REF appears in CALL_INSN or sibcall JUMP_INSN, or > >> perhaps anywhere in the function and its constant pool. > > > > Allright, the quick-and-dirty patch below seems to DTRT with simple > > examples. > > > > First, when libcalls are generated. Note only one .global is generated > > for all calls, and actually it is around the same position than before: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .global __divdi3 > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r1,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > mov %r2,5 > > lsh %r1,1 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mov %r2,5 > > mul %r1,234 > > call __divdi3 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Second, when libcalls are tried by expand_moddiv in a sequence, but then > > discarded and not linked in the main sequence: > > > > $ cat foo.c > > int foo(unsigned int len, int flag) > > { > > if (flag) > > return (((long)len) * 234 / 5); > > return (((long)len) * 2 / 5); > > } > > $ cc1 -O2 foo.c > > $ cat foo.c > > .file "foo.c" > > .text > > .align 3 > > .global foo > > .type foo, @function > > foo: > > mov32 %r0,%r1 > > lsh %r2,32 > > jne %r2,0,.L5 > > add %r0,%r0 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .L5: > > mul %r0,234 > > div %r0,5 > > lsh %r0,32 > > arsh %r0,32 > > exit > > .size foo, .-foo > > .ident "GCC: (GNU) 13.0.0 20221207 (experimental)" > > > > Note the .global now is not generated, as desired. > > > > As you can see below, I am adding a new RTX flag `is_libcall', with > > written form "/l". > > > > Before I get into serious testing etc, can you please confirm whether > > this is the right approach or not? > > > > In particular, I am a little bit concerned about the expectation I am > > using that the target of the `call' instruction emitted by emit_call_1 > > is always a (MEM (SYMBOL_REF ...)) when it is passed a SYMBOL_REF as the > > first argument (`fun' in emit_library_call_value_1). > > > > Thanks. > > > > diff --git a/gcc/calls.cc b/gcc/calls.cc > > index 6dd6f73e978..6c4a3725272 100644 > > --- a/gcc/calls.cc > > +++ b/gcc/calls.cc > > @@ -4370,10 +4370,6 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > > || argvec[i].partial != 0) > > update_stack_alignment_for_call (&argvec[i].locate); > > > > - /* If this machine requires an external definition for library > > - functions, write one out. */ > > - assemble_external_libcall (fun); > > - > > original_args_size = args_size; > > args_size.constant = (aligned_upper_bound (args_size.constant > > + stack_pointer_delta, > > @@ -4717,6 +4713,9 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value, > > valreg, > > old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far); > > > > + /* Mark the emitted call as a libcall with the new flag. */ > > + RTL_LIBCALL_P (last_call_insn ()) = 1; > > + > > if (flag_ipa_ra) > > { > > rtx datum = orgfun; > > diff --git a/gcc/final.cc b/gcc/final.cc > > index eea572238f6..df57de5afd0 100644 > > --- a/gcc/final.cc > > +++ b/gcc/final.cc > > @@ -815,6 +815,8 @@ make_pass_compute_alignments (gcc::context *ctxt) > > reorg.cc, since the branch splitting exposes new instructions with delay > > slots. */ > > > > +static rtx call_from_call_insn (rtx_call_insn *insn); > > + > > void > > shorten_branches (rtx_insn *first) > > { > > @@ -850,6 +852,24 @@ shorten_branches (rtx_insn *first) > > for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn)) > > { > > INSN_SHUID (insn) = i++; > > + > > + /* If this is a `call' instruction that implements a libcall, > > + and this machine requires an external definition for library > > + functions, write one out. */ > > + if (CALL_P (insn) && RTL_LIBCALL_P (insn)) > > + { > > + rtx nested_call = call_from_call_insn (safe_as_a (insn)); > > + rtx mem = XEXP (nested_call, 0); > > + gcc_assert (GET_CODE (mem) == MEM); > > + rtx fun = XEXP (mem, 0); > > + gcc_assert (GET_CODE (fun) == SYMBOL_REF); > > + assemble_external_libcall (fun); > > + > > + /* Clear the LIBCALL flag to make sure we don't assemble the > > + external definition more than once. */ > > + RTL_LIBCALL_P (insn) = 0; > > + } > > + > > if (INSN_P (insn)) > > continue; > > > > diff --git a/gcc/print-rtl.cc b/gcc/print-rtl.cc > > index e115f987173..26a06511619 100644 > > --- a/gcc/print-rtl.cc > > +++ b/gcc/print-rtl.cc > > @@ -882,6 +882,9 @@ rtx_writer::print_rtx (const_rtx in_rtx) > > if (RTX_FLAG (in_rtx, return_val)) > > fputs ("/i", m_outfile); > > > > + if (RTX_FLAG (in_rtx, is_libcall)) > > + fputs ("/l", m_outfile); > > + > > /* Print REG_NOTE names for EXPR_LIST and INSN_LIST. */ > > if ((GET_CODE (in_rtx) == EXPR_LIST > > || GET_CODE (in_rtx) == INSN_LIST > > diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc > > index 62c7895af60..eb0ae150a5b 100644 > > --- a/gcc/read-rtl.cc > > +++ b/gcc/read-rtl.cc > > @@ -1543,6 +1543,9 @@ read_flags (rtx return_rtx) > > case 'i': > > RTX_FLAG (return_rtx, return_val) = 1; > > break; > > + case 'l': > > + RTX_FLAG (return_rtx, is_libcall) = 1; > > + break; > > default: > > fatal_with_file_and_line ("unrecognized flag: `%c'", flag_char); > > } > > diff --git a/gcc/rtl.h b/gcc/rtl.h > > index 7a8c4709257..92c802d3876 100644 > > --- a/gcc/rtl.h > > +++ b/gcc/rtl.h > > @@ -401,6 +401,10 @@ struct GTY((desc("0"), tag("0"), > > Dumped as "/i" in RTL dumps. */ > > unsigned return_val : 1; > > > > + /* 1 in a CALL_INSN if it is a libcall. > > + Dumped as "/l" in RTL dumps. */ > > + unsigned is_libcall : 1; > > + > > union { > > /* The final union field is aligned to 64 bits on LP64 hosts, > > giving a 32-bit gap after the fields above. We optimize the > > @@ -1578,6 +1582,10 @@ jump_table_for_label (const rtx_code_label *label) > > #define RTL_PURE_CALL_P(RTX) \ > > (RTL_FLAG_CHECK1 ("RTL_PURE_CALL_P", (RTX), CALL_INSN)->return_val) > > > > +/* 1 if RTX is a libcall. */ > > +#define RTL_LIBCALL_P(RTX) \ > > + (RTL_FLAG_CHECK1 ("RTL_LIBCALL_P", (RTX), CALL_INSN)->is_libcall) > > + > > /* 1 if RTX is a call to a const or pure function. */ > > #define RTL_CONST_OR_PURE_CALL_P(RTX) \ > > (RTL_CONST_CALL_P (RTX) || RTL_PURE_CALL_P (RTX))