public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V2] Emit funcall external declarations only if actually used.
@ 2023-08-21 18:07 Jose E. Marchesi
  2023-08-30  8:12 ` Jose E. Marchesi
  0 siblings, 1 reply; 10+ messages in thread
From: Jose E. Marchesi @ 2023-08-21 18:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek

[Differences from V1:
- Prototype for call_from_call_insn moved before comment block.
- Reuse the `call' flag for SYMBOL_REF_LIBCALL.
- Fallback to check REG_CALL_DECL in non-direct calls.
- New test to check correct behavior for non-direct calls.]

There are many places in GCC where alternative local sequences are
tried in order to determine what is the cheapest or best alternative
to use in the current target.  When any of these sequences involve a
libcall, the current implementation of emit_library_call_value_1
introduce a side-effect consisting on emitting an external declaration
for the funcall (such as __divdi3) which is thus emitted even if the
sequence that does the libcall is not retained.

This is problematic in targets such as BPF, because the kernel loader
chokes on the spurious symbol __divdi3 and makes the resulting BPF
object unloadable.  Note that BPF objects are not linked before being
loaded.

This patch changes emit_library_call_value_1 to mark the target
SYMBOL_REF as a libcall.  Then, the emission of the external
declaration is done in the first loop of final.cc:shorten_branches.
This happens only if the corresponding sequence has been kept.

Regtested in x86_64-linux-gnu.
Tested with host x86_64-linux-gnu with target bpf-unknown-none.

gcc/ChangeLog

	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
	* calls.cc (emit_library_call_value_1): Do not emit external
	libcall declaration here.
	* final.cc (shorten_branches): Do it here.

gcc/testsuite/ChangeLog

	* gcc.target/bpf/divmod-libcall-1.c: New test.
	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
	* gcc.c-torture/compile/libcall-2.c: Likewise.
---
 gcc/calls.cc                                  |  9 +++---
 gcc/final.cc                                  | 30 +++++++++++++++++++
 gcc/rtl.h                                     |  5 ++++
 .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
 .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
 .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
 6 files changed, 83 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 1f3a6d5c450..219ea599b16 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 	|| argvec[i].partial != 0)
       update_stack_alignment_for_call (&argvec[i].locate);
 
-  /* If this machine requires an external definition for library
-     functions, write one out.  */
-  assemble_external_libcall (fun);
+  /* Mark the emitted target as a libcall.  This will be used by final
+     in order to emit an external symbol declaration if the libcall is
+     ever used.  */
+  SYMBOL_REF_LIBCALL (fun) = 1;
 
   original_args_size = args_size;
   args_size.constant = (aligned_upper_bound (args_size.constant
@@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
 	       valreg,
 	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
 
-  if (flag_ipa_ra)
+  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
     {
       rtx datum = orgfun;
       gcc_assert (GET_CODE (datum) == SYMBOL_REF);
diff --git a/gcc/final.cc b/gcc/final.cc
index dd3e22547ac..2041e43fdd1 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
 }
 
 \f
+static rtx call_from_call_insn (rtx_call_insn *insn);
+
 /* Make a pass over all insns and compute their actual lengths by shortening
    any branches of variable length if possible.  */
 
@@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
   for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
     {
       INSN_SHUID (insn) = i++;
+
+      /* If this is a `call' instruction implementing a libcall, and
+         this machine requires an external definition for library
+         functions, write one out.  */
+      if (CALL_P (insn))
+        {
+          rtx x;
+
+          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
+              && (x = XEXP (x, 0))
+              && MEM_P (x)
+              && (x = XEXP (x, 0))
+              && SYMBOL_REF_P (x)
+              && SYMBOL_REF_LIBCALL (x))
+            {
+              /* Direct call.  */
+              assemble_external_libcall (x);
+            }
+          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
+                   && (x = XEXP (x, 0)))
+            {
+              /* Indirect call with REG_CALL_DECL note.  */
+              gcc_assert (SYMBOL_REF_P (x));
+              if (SYMBOL_REF_LIBCALL (x))
+                assemble_external_libcall (x);
+            }
+        }
+
       if (INSN_P (insn))
 	continue;
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e1c51156f90..28be708a55f 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
      1 in a CALL_INSN logically equivalent to
        ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
      1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
+     1 in a SYMBOL_REF if it is the target of a libcall.
      Dumped as "/c" in RTL dumps.  */
   unsigned int call : 1;
   /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
@@ -2734,6 +2735,10 @@ do {								        \
 #define SYMBOL_REF_USED(RTX)						\
   (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
 
+/* 1 if RTX is a symbol_ref that represents a libcall target.  */
+#define SYMBOL_REF_LIBCALL(RTX)                                         \
+  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
+
 /* 1 if RTX is a symbol_ref for a weak symbol.  */
 #define SYMBOL_REF_WEAK(RTX)						\
   (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
new file mode 100644
index 00000000000..b33944c83ff
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
@@ -0,0 +1,8 @@
+/* Make sure that external refences for libcalls are generated even for
+   indirect calls.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
+/* { dg-final { scan-assembler "globl\t__divti3" } } */
+
+__int128 a, b; void foo () { a = a / b; }
diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
new file mode 100644
index 00000000000..7481076602a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
@@ -0,0 +1,19 @@
+/* This test makes sure that no spurious external symbol declarations are
+   emitted for libcalls in tried but eventually not used code sequences.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=v3" } */
+/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
+/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
+
+int
+foo (unsigned int len)
+{
+  return ((unsigned long)len) * 234 / 5;
+}
+
+int
+bar (unsigned int len)
+{
+  return ((unsigned long)len) * 234 % 5;
+}
diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
new file mode 100644
index 00000000000..792d689395a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=v3" } */
+/* { dg-final { scan-assembler "global\t__divdi3" } } */
+/* { dg-final { scan-assembler "global\t__moddi3" } } */
+
+int
+foo (unsigned int len)
+{
+  return ((long)len) * 234 / 5;
+}
+
+int
+bar (unsigned int len)
+{
+  return ((long)len) * 234 % 5;
+}
-- 
2.30.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-08-21 18:07 [PATCH V2] Emit funcall external declarations only if actually used Jose E. Marchesi
@ 2023-08-30  8:12 ` Jose E. Marchesi
  2023-09-05 13:03   ` Jose E. Marchesi
  2023-10-03 10:39   ` Jose E. Marchesi
  0 siblings, 2 replies; 10+ messages in thread
From: Jose E. Marchesi @ 2023-08-30  8:12 UTC (permalink / raw)
  To: Jose E. Marchesi via Gcc-patches; +Cc: Jakub Jelinek


ping

> [Differences from V1:
> - Prototype for call_from_call_insn moved before comment block.
> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
> - Fallback to check REG_CALL_DECL in non-direct calls.
> - New test to check correct behavior for non-direct calls.]
>
> There are many places in GCC where alternative local sequences are
> tried in order to determine what is the cheapest or best alternative
> to use in the current target.  When any of these sequences involve a
> libcall, the current implementation of emit_library_call_value_1
> introduce a side-effect consisting on emitting an external declaration
> for the funcall (such as __divdi3) which is thus emitted even if the
> sequence that does the libcall is not retained.
>
> This is problematic in targets such as BPF, because the kernel loader
> chokes on the spurious symbol __divdi3 and makes the resulting BPF
> object unloadable.  Note that BPF objects are not linked before being
> loaded.
>
> This patch changes emit_library_call_value_1 to mark the target
> SYMBOL_REF as a libcall.  Then, the emission of the external
> declaration is done in the first loop of final.cc:shorten_branches.
> This happens only if the corresponding sequence has been kept.
>
> Regtested in x86_64-linux-gnu.
> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>
> gcc/ChangeLog
>
> 	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
> 	* calls.cc (emit_library_call_value_1): Do not emit external
> 	libcall declaration here.
> 	* final.cc (shorten_branches): Do it here.
>
> gcc/testsuite/ChangeLog
>
> 	* gcc.target/bpf/divmod-libcall-1.c: New test.
> 	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
> 	* gcc.c-torture/compile/libcall-2.c: Likewise.
> ---
>  gcc/calls.cc                                  |  9 +++---
>  gcc/final.cc                                  | 30 +++++++++++++++++++
>  gcc/rtl.h                                     |  5 ++++
>  .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
>  .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
>  .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
>  6 files changed, 83 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 1f3a6d5c450..219ea599b16 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>  	|| argvec[i].partial != 0)
>        update_stack_alignment_for_call (&argvec[i].locate);
>  
> -  /* If this machine requires an external definition for library
> -     functions, write one out.  */
> -  assemble_external_libcall (fun);
> +  /* Mark the emitted target as a libcall.  This will be used by final
> +     in order to emit an external symbol declaration if the libcall is
> +     ever used.  */
> +  SYMBOL_REF_LIBCALL (fun) = 1;
>  
>    original_args_size = args_size;
>    args_size.constant = (aligned_upper_bound (args_size.constant
> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>  	       valreg,
>  	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>  
> -  if (flag_ipa_ra)
> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>      {
>        rtx datum = orgfun;
>        gcc_assert (GET_CODE (datum) == SYMBOL_REF);
> diff --git a/gcc/final.cc b/gcc/final.cc
> index dd3e22547ac..2041e43fdd1 100644
> --- a/gcc/final.cc
> +++ b/gcc/final.cc
> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>  }
>  
>  \f
> +static rtx call_from_call_insn (rtx_call_insn *insn);
> +
>  /* Make a pass over all insns and compute their actual lengths by shortening
>     any branches of variable length if possible.  */
>  
> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>    for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>      {
>        INSN_SHUID (insn) = i++;
> +
> +      /* If this is a `call' instruction implementing a libcall, and
> +         this machine requires an external definition for library
> +         functions, write one out.  */
> +      if (CALL_P (insn))
> +        {
> +          rtx x;
> +
> +          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
> +              && (x = XEXP (x, 0))
> +              && MEM_P (x)
> +              && (x = XEXP (x, 0))
> +              && SYMBOL_REF_P (x)
> +              && SYMBOL_REF_LIBCALL (x))
> +            {
> +              /* Direct call.  */
> +              assemble_external_libcall (x);
> +            }
> +          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
> +                   && (x = XEXP (x, 0)))
> +            {
> +              /* Indirect call with REG_CALL_DECL note.  */
> +              gcc_assert (SYMBOL_REF_P (x));
> +              if (SYMBOL_REF_LIBCALL (x))
> +                assemble_external_libcall (x);
> +            }
> +        }
> +
>        if (INSN_P (insn))
>  	continue;
>  
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index e1c51156f90..28be708a55f 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
>       1 in a CALL_INSN logically equivalent to
>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
>       1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
> +     1 in a SYMBOL_REF if it is the target of a libcall.
>       Dumped as "/c" in RTL dumps.  */
>    unsigned int call : 1;
>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
> @@ -2734,6 +2735,10 @@ do {								        \
>  #define SYMBOL_REF_USED(RTX)						\
>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>  
> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
> +#define SYMBOL_REF_LIBCALL(RTX)                                         \
> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
> +
>  /* 1 if RTX is a symbol_ref for a weak symbol.  */
>  #define SYMBOL_REF_WEAK(RTX)						\
>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
> new file mode 100644
> index 00000000000..b33944c83ff
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
> @@ -0,0 +1,8 @@
> +/* Make sure that external refences for libcalls are generated even for
> +   indirect calls.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
> +
> +__int128 a, b; void foo () { a = a / b; }
> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
> new file mode 100644
> index 00000000000..7481076602a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
> @@ -0,0 +1,19 @@
> +/* This test makes sure that no spurious external symbol declarations are
> +   emitted for libcalls in tried but eventually not used code sequences.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcpu=v3" } */
> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
> +
> +int
> +foo (unsigned int len)
> +{
> +  return ((unsigned long)len) * 234 / 5;
> +}
> +
> +int
> +bar (unsigned int len)
> +{
> +  return ((unsigned long)len) * 234 % 5;
> +}
> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
> new file mode 100644
> index 00000000000..792d689395a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mcpu=v3" } */
> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
> +
> +int
> +foo (unsigned int len)
> +{
> +  return ((long)len) * 234 / 5;
> +}
> +
> +int
> +bar (unsigned int len)
> +{
> +  return ((long)len) * 234 % 5;
> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-08-30  8:12 ` Jose E. Marchesi
@ 2023-09-05 13:03   ` Jose E. Marchesi
  2023-10-03 10:39   ` Jose E. Marchesi
  1 sibling, 0 replies; 10+ messages in thread
From: Jose E. Marchesi @ 2023-09-05 13:03 UTC (permalink / raw)
  To: Jose E. Marchesi via Gcc-patches; +Cc: Jakub Jelinek


ping^

> ping
>
>> [Differences from V1:
>> - Prototype for call_from_call_insn moved before comment block.
>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>> - Fallback to check REG_CALL_DECL in non-direct calls.
>> - New test to check correct behavior for non-direct calls.]
>>
>> There are many places in GCC where alternative local sequences are
>> tried in order to determine what is the cheapest or best alternative
>> to use in the current target.  When any of these sequences involve a
>> libcall, the current implementation of emit_library_call_value_1
>> introduce a side-effect consisting on emitting an external declaration
>> for the funcall (such as __divdi3) which is thus emitted even if the
>> sequence that does the libcall is not retained.
>>
>> This is problematic in targets such as BPF, because the kernel loader
>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>> object unloadable.  Note that BPF objects are not linked before being
>> loaded.
>>
>> This patch changes emit_library_call_value_1 to mark the target
>> SYMBOL_REF as a libcall.  Then, the emission of the external
>> declaration is done in the first loop of final.cc:shorten_branches.
>> This happens only if the corresponding sequence has been kept.
>>
>> Regtested in x86_64-linux-gnu.
>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>
>> gcc/ChangeLog
>>
>> 	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
>> 	* calls.cc (emit_library_call_value_1): Do not emit external
>> 	libcall declaration here.
>> 	* final.cc (shorten_branches): Do it here.
>>
>> gcc/testsuite/ChangeLog
>>
>> 	* gcc.target/bpf/divmod-libcall-1.c: New test.
>> 	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
>> 	* gcc.c-torture/compile/libcall-2.c: Likewise.
>> ---
>>  gcc/calls.cc                                  |  9 +++---
>>  gcc/final.cc                                  | 30 +++++++++++++++++++
>>  gcc/rtl.h                                     |  5 ++++
>>  .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
>>  .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
>>  .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
>>  6 files changed, 83 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>
>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>> index 1f3a6d5c450..219ea599b16 100644
>> --- a/gcc/calls.cc
>> +++ b/gcc/calls.cc
>> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>  	|| argvec[i].partial != 0)
>>        update_stack_alignment_for_call (&argvec[i].locate);
>>  
>> -  /* If this machine requires an external definition for library
>> -     functions, write one out.  */
>> -  assemble_external_libcall (fun);
>> +  /* Mark the emitted target as a libcall.  This will be used by final
>> +     in order to emit an external symbol declaration if the libcall is
>> +     ever used.  */
>> +  SYMBOL_REF_LIBCALL (fun) = 1;
>>  
>>    original_args_size = args_size;
>>    args_size.constant = (aligned_upper_bound (args_size.constant
>> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>  	       valreg,
>>  	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>>  
>> -  if (flag_ipa_ra)
>> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>>      {
>>        rtx datum = orgfun;
>>        gcc_assert (GET_CODE (datum) == SYMBOL_REF);
>> diff --git a/gcc/final.cc b/gcc/final.cc
>> index dd3e22547ac..2041e43fdd1 100644
>> --- a/gcc/final.cc
>> +++ b/gcc/final.cc
>> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>>  }
>>  
>>  \f
>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>> +
>>  /* Make a pass over all insns and compute their actual lengths by shortening
>>     any branches of variable length if possible.  */
>>  
>> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>>    for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>      {
>>        INSN_SHUID (insn) = i++;
>> +
>> +      /* If this is a `call' instruction implementing a libcall, and
>> +         this machine requires an external definition for library
>> +         functions, write one out.  */
>> +      if (CALL_P (insn))
>> +        {
>> +          rtx x;
>> +
>> +          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
>> +              && (x = XEXP (x, 0))
>> +              && MEM_P (x)
>> +              && (x = XEXP (x, 0))
>> +              && SYMBOL_REF_P (x)
>> +              && SYMBOL_REF_LIBCALL (x))
>> +            {
>> +              /* Direct call.  */
>> +              assemble_external_libcall (x);
>> +            }
>> +          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
>> +                   && (x = XEXP (x, 0)))
>> +            {
>> +              /* Indirect call with REG_CALL_DECL note.  */
>> +              gcc_assert (SYMBOL_REF_P (x));
>> +              if (SYMBOL_REF_LIBCALL (x))
>> +                assemble_external_libcall (x);
>> +            }
>> +        }
>> +
>>        if (INSN_P (insn))
>>  	continue;
>>  
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index e1c51156f90..28be708a55f 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
>>       1 in a CALL_INSN logically equivalent to
>>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
>>       1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
>> +     1 in a SYMBOL_REF if it is the target of a libcall.
>>       Dumped as "/c" in RTL dumps.  */
>>    unsigned int call : 1;
>>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
>> @@ -2734,6 +2735,10 @@ do {								        \
>>  #define SYMBOL_REF_USED(RTX)						\
>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>>  
>> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
>> +#define SYMBOL_REF_LIBCALL(RTX)                                         \
>> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
>> +
>>  /* 1 if RTX is a symbol_ref for a weak symbol.  */
>>  #define SYMBOL_REF_WEAK(RTX)						\
>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
>> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>> new file mode 100644
>> index 00000000000..b33944c83ff
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>> @@ -0,0 +1,8 @@
>> +/* Make sure that external refences for libcalls are generated even for
>> +   indirect calls.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
>> +
>> +__int128 a, b; void foo () { a = a / b; }
>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>> new file mode 100644
>> index 00000000000..7481076602a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>> @@ -0,0 +1,19 @@
>> +/* This test makes sure that no spurious external symbol declarations are
>> +   emitted for libcalls in tried but eventually not used code sequences.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=v3" } */
>> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
>> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
>> +
>> +int
>> +foo (unsigned int len)
>> +{
>> +  return ((unsigned long)len) * 234 / 5;
>> +}
>> +
>> +int
>> +bar (unsigned int len)
>> +{
>> +  return ((unsigned long)len) * 234 % 5;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>> new file mode 100644
>> index 00000000000..792d689395a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=v3" } */
>> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
>> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
>> +
>> +int
>> +foo (unsigned int len)
>> +{
>> +  return ((long)len) * 234 / 5;
>> +}
>> +
>> +int
>> +bar (unsigned int len)
>> +{
>> +  return ((long)len) * 234 % 5;
>> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-08-30  8:12 ` Jose E. Marchesi
  2023-09-05 13:03   ` Jose E. Marchesi
@ 2023-10-03 10:39   ` Jose E. Marchesi
  2023-10-05 22:17     ` Richard Sandiford
  1 sibling, 1 reply; 10+ messages in thread
From: Jose E. Marchesi @ 2023-10-03 10:39 UTC (permalink / raw)
  To: Jose E. Marchesi via Gcc-patches; +Cc: Jakub Jelinek


ping

> ping
>
>> [Differences from V1:
>> - Prototype for call_from_call_insn moved before comment block.
>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>> - Fallback to check REG_CALL_DECL in non-direct calls.
>> - New test to check correct behavior for non-direct calls.]
>>
>> There are many places in GCC where alternative local sequences are
>> tried in order to determine what is the cheapest or best alternative
>> to use in the current target.  When any of these sequences involve a
>> libcall, the current implementation of emit_library_call_value_1
>> introduce a side-effect consisting on emitting an external declaration
>> for the funcall (such as __divdi3) which is thus emitted even if the
>> sequence that does the libcall is not retained.
>>
>> This is problematic in targets such as BPF, because the kernel loader
>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>> object unloadable.  Note that BPF objects are not linked before being
>> loaded.
>>
>> This patch changes emit_library_call_value_1 to mark the target
>> SYMBOL_REF as a libcall.  Then, the emission of the external
>> declaration is done in the first loop of final.cc:shorten_branches.
>> This happens only if the corresponding sequence has been kept.
>>
>> Regtested in x86_64-linux-gnu.
>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>
>> gcc/ChangeLog
>>
>> 	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
>> 	* calls.cc (emit_library_call_value_1): Do not emit external
>> 	libcall declaration here.
>> 	* final.cc (shorten_branches): Do it here.
>>
>> gcc/testsuite/ChangeLog
>>
>> 	* gcc.target/bpf/divmod-libcall-1.c: New test.
>> 	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
>> 	* gcc.c-torture/compile/libcall-2.c: Likewise.
>> ---
>>  gcc/calls.cc                                  |  9 +++---
>>  gcc/final.cc                                  | 30 +++++++++++++++++++
>>  gcc/rtl.h                                     |  5 ++++
>>  .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
>>  .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
>>  .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
>>  6 files changed, 83 insertions(+), 4 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>
>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>> index 1f3a6d5c450..219ea599b16 100644
>> --- a/gcc/calls.cc
>> +++ b/gcc/calls.cc
>> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>  	|| argvec[i].partial != 0)
>>        update_stack_alignment_for_call (&argvec[i].locate);
>>  
>> -  /* If this machine requires an external definition for library
>> -     functions, write one out.  */
>> -  assemble_external_libcall (fun);
>> +  /* Mark the emitted target as a libcall.  This will be used by final
>> +     in order to emit an external symbol declaration if the libcall is
>> +     ever used.  */
>> +  SYMBOL_REF_LIBCALL (fun) = 1;
>>  
>>    original_args_size = args_size;
>>    args_size.constant = (aligned_upper_bound (args_size.constant
>> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>  	       valreg,
>>  	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>>  
>> -  if (flag_ipa_ra)
>> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>>      {
>>        rtx datum = orgfun;
>>        gcc_assert (GET_CODE (datum) == SYMBOL_REF);
>> diff --git a/gcc/final.cc b/gcc/final.cc
>> index dd3e22547ac..2041e43fdd1 100644
>> --- a/gcc/final.cc
>> +++ b/gcc/final.cc
>> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>>  }
>>  
>>  \f
>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>> +
>>  /* Make a pass over all insns and compute their actual lengths by shortening
>>     any branches of variable length if possible.  */
>>  
>> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>>    for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>      {
>>        INSN_SHUID (insn) = i++;
>> +
>> +      /* If this is a `call' instruction implementing a libcall, and
>> +         this machine requires an external definition for library
>> +         functions, write one out.  */
>> +      if (CALL_P (insn))
>> +        {
>> +          rtx x;
>> +
>> +          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
>> +              && (x = XEXP (x, 0))
>> +              && MEM_P (x)
>> +              && (x = XEXP (x, 0))
>> +              && SYMBOL_REF_P (x)
>> +              && SYMBOL_REF_LIBCALL (x))
>> +            {
>> +              /* Direct call.  */
>> +              assemble_external_libcall (x);
>> +            }
>> +          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
>> +                   && (x = XEXP (x, 0)))
>> +            {
>> +              /* Indirect call with REG_CALL_DECL note.  */
>> +              gcc_assert (SYMBOL_REF_P (x));
>> +              if (SYMBOL_REF_LIBCALL (x))
>> +                assemble_external_libcall (x);
>> +            }
>> +        }
>> +
>>        if (INSN_P (insn))
>>  	continue;
>>  
>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>> index e1c51156f90..28be708a55f 100644
>> --- a/gcc/rtl.h
>> +++ b/gcc/rtl.h
>> @@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
>>       1 in a CALL_INSN logically equivalent to
>>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
>>       1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
>> +     1 in a SYMBOL_REF if it is the target of a libcall.
>>       Dumped as "/c" in RTL dumps.  */
>>    unsigned int call : 1;
>>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
>> @@ -2734,6 +2735,10 @@ do {								        \
>>  #define SYMBOL_REF_USED(RTX)						\
>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>>  
>> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
>> +#define SYMBOL_REF_LIBCALL(RTX)                                         \
>> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
>> +
>>  /* 1 if RTX is a symbol_ref for a weak symbol.  */
>>  #define SYMBOL_REF_WEAK(RTX)						\
>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
>> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>> new file mode 100644
>> index 00000000000..b33944c83ff
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>> @@ -0,0 +1,8 @@
>> +/* Make sure that external refences for libcalls are generated even for
>> +   indirect calls.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
>> +
>> +__int128 a, b; void foo () { a = a / b; }
>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>> new file mode 100644
>> index 00000000000..7481076602a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>> @@ -0,0 +1,19 @@
>> +/* This test makes sure that no spurious external symbol declarations are
>> +   emitted for libcalls in tried but eventually not used code sequences.  */
>> +
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=v3" } */
>> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
>> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
>> +
>> +int
>> +foo (unsigned int len)
>> +{
>> +  return ((unsigned long)len) * 234 / 5;
>> +}
>> +
>> +int
>> +bar (unsigned int len)
>> +{
>> +  return ((unsigned long)len) * 234 % 5;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>> new file mode 100644
>> index 00000000000..792d689395a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -mcpu=v3" } */
>> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
>> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
>> +
>> +int
>> +foo (unsigned int len)
>> +{
>> +  return ((long)len) * 234 / 5;
>> +}
>> +
>> +int
>> +bar (unsigned int len)
>> +{
>> +  return ((long)len) * 234 % 5;
>> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-03 10:39   ` Jose E. Marchesi
@ 2023-10-05 22:17     ` Richard Sandiford
  2023-10-05 22:37       ` Jeff Law
  2023-10-12 11:38       ` Jose E. Marchesi
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Sandiford @ 2023-10-05 22:17 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Jose E. Marchesi via Gcc-patches, Jakub Jelinek

"Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
> ping

I don't know this code very well, and have AFAIR haven't worked
with an assembler that requires external declarations, but since
it's at a second ping :)

>
>> ping
>>
>>> [Differences from V1:
>>> - Prototype for call_from_call_insn moved before comment block.
>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>> - New test to check correct behavior for non-direct calls.]
>>>
>>> There are many places in GCC where alternative local sequences are
>>> tried in order to determine what is the cheapest or best alternative
>>> to use in the current target.  When any of these sequences involve a
>>> libcall, the current implementation of emit_library_call_value_1
>>> introduce a side-effect consisting on emitting an external declaration
>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>> sequence that does the libcall is not retained.
>>>
>>> This is problematic in targets such as BPF, because the kernel loader
>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>> object unloadable.  Note that BPF objects are not linked before being
>>> loaded.
>>>
>>> This patch changes emit_library_call_value_1 to mark the target
>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>> declaration is done in the first loop of final.cc:shorten_branches.
>>> This happens only if the corresponding sequence has been kept.
>>>
>>> Regtested in x86_64-linux-gnu.
>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.

I'm not sure that shorten_branches is a natural place to do this.
It isn't something that would normally emit asm text.

Would it be OK to emit the declaration at the same point as for decls,
which IIUC is process_pending_assemble_externals?  If so, how about
making assemble_external_libcall add the symbol to a list when
!SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
directly?  assemble_external_libcall could then also call get_identifier
on the name (perhaps after calling strip_name_encoding -- can't
remember whether assemble_external_libcall sees the encoded or
unencoded name).

All being well, the call to get_identifier should cause
assemble_name_resolve to record when the name is used, via
TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
go through the list of libcalls recorded by assemble_external_libcall
and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.

Not super elegant, but it seems to fit within the existing scheme.
And I don't there should be any problem with using get_identifier
for libcalls, since it isn't valid to use libcall names for other
types of symbol.

Thanks,
Richard

>>>
>>> gcc/ChangeLog
>>>
>>> 	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
>>> 	* calls.cc (emit_library_call_value_1): Do not emit external
>>> 	libcall declaration here.
>>> 	* final.cc (shorten_branches): Do it here.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> 	* gcc.target/bpf/divmod-libcall-1.c: New test.
>>> 	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
>>> 	* gcc.c-torture/compile/libcall-2.c: Likewise.
>>> ---
>>>  gcc/calls.cc                                  |  9 +++---
>>>  gcc/final.cc                                  | 30 +++++++++++++++++++
>>>  gcc/rtl.h                                     |  5 ++++
>>>  .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
>>>  .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
>>>  .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
>>>  6 files changed, 83 insertions(+), 4 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>>
>>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>>> index 1f3a6d5c450..219ea599b16 100644
>>> --- a/gcc/calls.cc
>>> +++ b/gcc/calls.cc
>>> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>>  	|| argvec[i].partial != 0)
>>>        update_stack_alignment_for_call (&argvec[i].locate);
>>>  
>>> -  /* If this machine requires an external definition for library
>>> -     functions, write one out.  */
>>> -  assemble_external_libcall (fun);
>>> +  /* Mark the emitted target as a libcall.  This will be used by final
>>> +     in order to emit an external symbol declaration if the libcall is
>>> +     ever used.  */
>>> +  SYMBOL_REF_LIBCALL (fun) = 1;
>>>  
>>>    original_args_size = args_size;
>>>    args_size.constant = (aligned_upper_bound (args_size.constant
>>> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>>  	       valreg,
>>>  	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>>>  
>>> -  if (flag_ipa_ra)
>>> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>>>      {
>>>        rtx datum = orgfun;
>>>        gcc_assert (GET_CODE (datum) == SYMBOL_REF);
>>> diff --git a/gcc/final.cc b/gcc/final.cc
>>> index dd3e22547ac..2041e43fdd1 100644
>>> --- a/gcc/final.cc
>>> +++ b/gcc/final.cc
>>> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>>>  }
>>>  
>>>  \f
>>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>>> +
>>>  /* Make a pass over all insns and compute their actual lengths by shortening
>>>     any branches of variable length if possible.  */
>>>  
>>> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>>>    for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>>      {
>>>        INSN_SHUID (insn) = i++;
>>> +
>>> +      /* If this is a `call' instruction implementing a libcall, and
>>> +         this machine requires an external definition for library
>>> +         functions, write one out.  */
>>> +      if (CALL_P (insn))
>>> +        {
>>> +          rtx x;
>>> +
>>> +          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
>>> +              && (x = XEXP (x, 0))
>>> +              && MEM_P (x)
>>> +              && (x = XEXP (x, 0))
>>> +              && SYMBOL_REF_P (x)
>>> +              && SYMBOL_REF_LIBCALL (x))
>>> +            {
>>> +              /* Direct call.  */
>>> +              assemble_external_libcall (x);
>>> +            }
>>> +          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
>>> +                   && (x = XEXP (x, 0)))
>>> +            {
>>> +              /* Indirect call with REG_CALL_DECL note.  */
>>> +              gcc_assert (SYMBOL_REF_P (x));
>>> +              if (SYMBOL_REF_LIBCALL (x))
>>> +                assemble_external_libcall (x);
>>> +            }
>>> +        }
>>> +
>>>        if (INSN_P (insn))
>>>  	continue;
>>>  
>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>>> index e1c51156f90..28be708a55f 100644
>>> --- a/gcc/rtl.h
>>> +++ b/gcc/rtl.h
>>> @@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
>>>       1 in a CALL_INSN logically equivalent to
>>>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
>>>       1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
>>> +     1 in a SYMBOL_REF if it is the target of a libcall.
>>>       Dumped as "/c" in RTL dumps.  */
>>>    unsigned int call : 1;
>>>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
>>> @@ -2734,6 +2735,10 @@ do {								        \
>>>  #define SYMBOL_REF_USED(RTX)						\
>>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>>>  
>>> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
>>> +#define SYMBOL_REF_LIBCALL(RTX)                                         \
>>> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
>>> +
>>>  /* 1 if RTX is a symbol_ref for a weak symbol.  */
>>>  #define SYMBOL_REF_WEAK(RTX)						\
>>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
>>> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>> new file mode 100644
>>> index 00000000000..b33944c83ff
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>> @@ -0,0 +1,8 @@
>>> +/* Make sure that external refences for libcalls are generated even for
>>> +   indirect calls.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>>> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
>>> +
>>> +__int128 a, b; void foo () { a = a / b; }
>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>> new file mode 100644
>>> index 00000000000..7481076602a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>> @@ -0,0 +1,19 @@
>>> +/* This test makes sure that no spurious external symbol declarations are
>>> +   emitted for libcalls in tried but eventually not used code sequences.  */
>>> +
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
>>> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
>>> +
>>> +int
>>> +foo (unsigned int len)
>>> +{
>>> +  return ((unsigned long)len) * 234 / 5;
>>> +}
>>> +
>>> +int
>>> +bar (unsigned int len)
>>> +{
>>> +  return ((unsigned long)len) * 234 % 5;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>> new file mode 100644
>>> index 00000000000..792d689395a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
>>> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
>>> +
>>> +int
>>> +foo (unsigned int len)
>>> +{
>>> +  return ((long)len) * 234 / 5;
>>> +}
>>> +
>>> +int
>>> +bar (unsigned int len)
>>> +{
>>> +  return ((long)len) * 234 % 5;
>>> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-05 22:17     ` Richard Sandiford
@ 2023-10-05 22:37       ` Jeff Law
  2023-10-12 11:38       ` Jose E. Marchesi
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-10-05 22:37 UTC (permalink / raw)
  To: Jose E. Marchesi, Jose E. Marchesi via Gcc-patches,
	Jakub Jelinek, richard.sandiford



On 10/5/23 16:17, Richard Sandiford wrote:
> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>> ping
> 
> I don't know this code very well, and have AFAIR haven't worked
> with an assembler that requires external declarations, but since
> it's at a second ping :)
> 
>>
>>> ping
>>>
>>>> [Differences from V1:
>>>> - Prototype for call_from_call_insn moved before comment block.
>>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>>> - New test to check correct behavior for non-direct calls.]
>>>>
>>>> There are many places in GCC where alternative local sequences are
>>>> tried in order to determine what is the cheapest or best alternative
>>>> to use in the current target.  When any of these sequences involve a
>>>> libcall, the current implementation of emit_library_call_value_1
>>>> introduce a side-effect consisting on emitting an external declaration
>>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>>> sequence that does the libcall is not retained.
>>>>
>>>> This is problematic in targets such as BPF, because the kernel loader
>>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>>> object unloadable.  Note that BPF objects are not linked before being
>>>> loaded.
>>>>
>>>> This patch changes emit_library_call_value_1 to mark the target
>>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>>> declaration is done in the first loop of final.cc:shorten_branches.
>>>> This happens only if the corresponding sequence has been kept.
>>>>
>>>> Regtested in x86_64-linux-gnu.
>>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
> 
> I'm not sure that shorten_branches is a natural place to do this.
> It isn't something that would normally emit asm text.
> 
> Would it be OK to emit the declaration at the same point as for decls,
> which IIUC is process_pending_assemble_externals?  If so, how about
> making assemble_external_libcall add the symbol to a list when
> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
> directly?  assemble_external_libcall could then also call get_identifier
> on the name (perhaps after calling strip_name_encoding -- can't
> remember whether assemble_external_libcall sees the encoded or
> unencoded name).
> 
> All being well, the call to get_identifier should cause
> assemble_name_resolve to record when the name is used, via
> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
> go through the list of libcalls recorded by assemble_external_libcall
> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
> 
> Not super elegant, but it seems to fit within the existing scheme.
> And I don't there should be any problem with using get_identifier
> for libcalls, since it isn't valid to use libcall names for other
> types of symbol.
It might be worth looking at the PA port.  There's a fair number of 
similarities.  Essentially it has to "import" every external symbol, 
including libcall symbols.   A symbol which has been exported, can not 
then also be imported as that can change the underlying type.

IIRC we implemented all the magic to handle this inside 
ASM_OUTPUT_EXTERNAL and ASM_OUTPUT_EXTERNAL_LIBCALL with a bit of magic 
to handle "plabel" symbols which we have to defer until the end of the 
compilation unit via pa_end_file or whatever it's called since some 
cases of symbol marking couldn't be determined until we were actually 
emitting code.

jeff


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-05 22:17     ` Richard Sandiford
  2023-10-05 22:37       ` Jeff Law
@ 2023-10-12 11:38       ` Jose E. Marchesi
  2023-10-12 12:49         ` Richard Sandiford
  1 sibling, 1 reply; 10+ messages in thread
From: Jose E. Marchesi @ 2023-10-12 11:38 UTC (permalink / raw)
  To: Jose E. Marchesi via Gcc-patches; +Cc: Jakub Jelinek, richard.sandiford


Hi Richard.
Thanks for looking at this! :)


> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>> ping
>
> I don't know this code very well, and have AFAIR haven't worked
> with an assembler that requires external declarations, but since
> it's at a second ping :)
>
>>
>>> ping
>>>
>>>> [Differences from V1:
>>>> - Prototype for call_from_call_insn moved before comment block.
>>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>>> - New test to check correct behavior for non-direct calls.]
>>>>
>>>> There are many places in GCC where alternative local sequences are
>>>> tried in order to determine what is the cheapest or best alternative
>>>> to use in the current target.  When any of these sequences involve a
>>>> libcall, the current implementation of emit_library_call_value_1
>>>> introduce a side-effect consisting on emitting an external declaration
>>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>>> sequence that does the libcall is not retained.
>>>>
>>>> This is problematic in targets such as BPF, because the kernel loader
>>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>>> object unloadable.  Note that BPF objects are not linked before being
>>>> loaded.
>>>>
>>>> This patch changes emit_library_call_value_1 to mark the target
>>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>>> declaration is done in the first loop of final.cc:shorten_branches.
>>>> This happens only if the corresponding sequence has been kept.
>>>>
>>>> Regtested in x86_64-linux-gnu.
>>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>
> I'm not sure that shorten_branches is a natural place to do this.
> It isn't something that would normally emit asm text.

Well, that was the approach suggested by another reviewer (Jakub) once
my initial approach (in the V1) got rejected.  He explicitly suggested
to use shorten_branches.

> Would it be OK to emit the declaration at the same point as for decls,
> which IIUC is process_pending_assemble_externals?  If so, how about
> making assemble_external_libcall add the symbol to a list when
> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
> directly?  assemble_external_libcall could then also call get_identifier
> on the name (perhaps after calling strip_name_encoding -- can't
> remember whether assemble_external_libcall sees the encoded or
> unencoded name).
>
> All being well, the call to get_identifier should cause
> assemble_name_resolve to record when the name is used, via
> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
> go through the list of libcalls recorded by assemble_external_libcall
> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>
> Not super elegant, but it seems to fit within the existing scheme.
> And I don't there should be any problem with using get_identifier
> for libcalls, since it isn't valid to use libcall names for other
> types of symbol.

This sounds way more complicated to me than the approach in V2, which
seems to work and is thus a clear improvement compared to the current
situation in the trunk.  The approach in V2 may be ugly, but it is
simple and easy to understand.  Is the proposed more convoluted
alternative really worth the extra complexity, given it is "not super
elegant"?

I am willing to give it a try if you insist on it, but I wouldn't want a
V3 series based on that approach to be deflected again on the basis of a
yet another more potentially elegant solution... that wont' converge,
and I need this fixed for the BPF backend..

>
> Thanks,
> Richard
>
>>>>
>>>> gcc/ChangeLog
>>>>
>>>> 	* rtl.h	(SYMBOL_REF_LIBCALL): Define.
>>>> 	* calls.cc (emit_library_call_value_1): Do not emit external
>>>> 	libcall declaration here.
>>>> 	* final.cc (shorten_branches): Do it here.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>>
>>>> 	* gcc.target/bpf/divmod-libcall-1.c: New test.
>>>> 	* gcc.target/bpf/divmod-libcall-2.c: Likewise.
>>>> 	* gcc.c-torture/compile/libcall-2.c: Likewise.
>>>> ---
>>>>  gcc/calls.cc                                  |  9 +++---
>>>>  gcc/final.cc                                  | 30 +++++++++++++++++++
>>>>  gcc/rtl.h                                     |  5 ++++
>>>>  .../gcc.c-torture/compile/libcall-2.c         |  8 +++++
>>>>  .../gcc.target/bpf/divmod-libcall-1.c         | 19 ++++++++++++
>>>>  .../gcc.target/bpf/divmod-libcall-2.c         | 16 ++++++++++
>>>>  6 files changed, 83 insertions(+), 4 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>>>
>>>> diff --git a/gcc/calls.cc b/gcc/calls.cc
>>>> index 1f3a6d5c450..219ea599b16 100644
>>>> --- a/gcc/calls.cc
>>>> +++ b/gcc/calls.cc
>>>> @@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>>>  	|| argvec[i].partial != 0)
>>>>        update_stack_alignment_for_call (&argvec[i].locate);
>>>>  
>>>> -  /* If this machine requires an external definition for library
>>>> -     functions, write one out.  */
>>>> -  assemble_external_libcall (fun);
>>>> +  /* Mark the emitted target as a libcall.  This will be used by final
>>>> +     in order to emit an external symbol declaration if the libcall is
>>>> +     ever used.  */
>>>> +  SYMBOL_REF_LIBCALL (fun) = 1;
>>>>  
>>>>    original_args_size = args_size;
>>>>    args_size.constant = (aligned_upper_bound (args_size.constant
>>>> @@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
>>>>  	       valreg,
>>>>  	       old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
>>>>  
>>>> -  if (flag_ipa_ra)
>>>> +  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
>>>>      {
>>>>        rtx datum = orgfun;
>>>>        gcc_assert (GET_CODE (datum) == SYMBOL_REF);
>>>> diff --git a/gcc/final.cc b/gcc/final.cc
>>>> index dd3e22547ac..2041e43fdd1 100644
>>>> --- a/gcc/final.cc
>>>> +++ b/gcc/final.cc
>>>> @@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
>>>>  }
>>>>  
>>>>  \f
>>>> +static rtx call_from_call_insn (rtx_call_insn *insn);
>>>> +
>>>>  /* Make a pass over all insns and compute their actual lengths by shortening
>>>>     any branches of variable length if possible.  */
>>>>  
>>>> @@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
>>>>    for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
>>>>      {
>>>>        INSN_SHUID (insn) = i++;
>>>> +
>>>> +      /* If this is a `call' instruction implementing a libcall, and
>>>> +         this machine requires an external definition for library
>>>> +         functions, write one out.  */
>>>> +      if (CALL_P (insn))
>>>> +        {
>>>> +          rtx x;
>>>> +
>>>> +          if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
>>>> +              && (x = XEXP (x, 0))
>>>> +              && MEM_P (x)
>>>> +              && (x = XEXP (x, 0))
>>>> +              && SYMBOL_REF_P (x)
>>>> +              && SYMBOL_REF_LIBCALL (x))
>>>> +            {
>>>> +              /* Direct call.  */
>>>> +              assemble_external_libcall (x);
>>>> +            }
>>>> +          else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
>>>> +                   && (x = XEXP (x, 0)))
>>>> +            {
>>>> +              /* Indirect call with REG_CALL_DECL note.  */
>>>> +              gcc_assert (SYMBOL_REF_P (x));
>>>> +              if (SYMBOL_REF_LIBCALL (x))
>>>> +                assemble_external_libcall (x);
>>>> +            }
>>>> +        }
>>>> +
>>>>        if (INSN_P (insn))
>>>>  	continue;
>>>>  
>>>> diff --git a/gcc/rtl.h b/gcc/rtl.h
>>>> index e1c51156f90..28be708a55f 100644
>>>> --- a/gcc/rtl.h
>>>> +++ b/gcc/rtl.h
>>>> @@ -334,6 +334,7 @@ struct GTY((desc("0"), tag("0"),
>>>>       1 in a CALL_INSN logically equivalent to
>>>>         ECF_LOOPING_CONST_OR_PURE and DECL_LOOPING_CONST_OR_PURE_P.
>>>>       1 in a VALUE is SP_DERIVED_VALUE_P in cselib.cc.
>>>> +     1 in a SYMBOL_REF if it is the target of a libcall.
>>>>       Dumped as "/c" in RTL dumps.  */
>>>>    unsigned int call : 1;
>>>>    /* 1 in a REG, MEM, or CONCAT if the value is set at most once, anywhere.
>>>> @@ -2734,6 +2735,10 @@ do {								        \
>>>>  #define SYMBOL_REF_USED(RTX)						\
>>>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_USED", (RTX), SYMBOL_REF)->used)
>>>>  
>>>> +/* 1 if RTX is a symbol_ref that represents a libcall target.  */
>>>> +#define SYMBOL_REF_LIBCALL(RTX)                                         \
>>>> +  (RTL_FLAG_CHECK1 ("SYMBOL_REF_LIBCALL", (RTX), SYMBOL_REF)->call)
>>>> +
>>>>  /* 1 if RTX is a symbol_ref for a weak symbol.  */
>>>>  #define SYMBOL_REF_WEAK(RTX)						\
>>>>    (RTL_FLAG_CHECK1 ("SYMBOL_REF_WEAK", (RTX), SYMBOL_REF)->return_val)
>>>> diff --git a/gcc/testsuite/gcc.c-torture/compile/libcall-2.c b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>>> new file mode 100644
>>>> index 00000000000..b33944c83ff
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.c-torture/compile/libcall-2.c
>>>> @@ -0,0 +1,8 @@
>>>> +/* Make sure that external refences for libcalls are generated even for
>>>> +   indirect calls.  */
>>>> +
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -mcmodel=large" { target x86_64-*-* } } */
>>>> +/* { dg-final { scan-assembler "globl\t__divti3" } } */
>>>> +
>>>> +__int128 a, b; void foo () { a = a / b; }
>>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>>> new file mode 100644
>>>> index 00000000000..7481076602a
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
>>>> @@ -0,0 +1,19 @@
>>>> +/* This test makes sure that no spurious external symbol declarations are
>>>> +   emitted for libcalls in tried but eventually not used code sequences.  */
>>>> +
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>>> +/* { dg-final { scan-assembler-not "global\t__divdi3" } } */
>>>> +/* { dg-final { scan-assembler-not "global\t__moddi3" } } */
>>>> +
>>>> +int
>>>> +foo (unsigned int len)
>>>> +{
>>>> +  return ((unsigned long)len) * 234 / 5;
>>>> +}
>>>> +
>>>> +int
>>>> +bar (unsigned int len)
>>>> +{
>>>> +  return ((unsigned long)len) * 234 % 5;
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>>> new file mode 100644
>>>> index 00000000000..792d689395a
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c
>>>> @@ -0,0 +1,16 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -mcpu=v3" } */
>>>> +/* { dg-final { scan-assembler "global\t__divdi3" } } */
>>>> +/* { dg-final { scan-assembler "global\t__moddi3" } } */
>>>> +
>>>> +int
>>>> +foo (unsigned int len)
>>>> +{
>>>> +  return ((long)len) * 234 / 5;
>>>> +}
>>>> +
>>>> +int
>>>> +bar (unsigned int len)
>>>> +{
>>>> +  return ((long)len) * 234 % 5;
>>>> +}

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-12 11:38       ` Jose E. Marchesi
@ 2023-10-12 12:49         ` Richard Sandiford
  2023-10-12 13:54           ` Jose E. Marchesi
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Sandiford @ 2023-10-12 12:49 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Jose E. Marchesi via Gcc-patches, Jakub Jelinek

"Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
> Hi Richard.
> Thanks for looking at this! :)
>
>
>> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>>> ping
>>
>> I don't know this code very well, and have AFAIR haven't worked
>> with an assembler that requires external declarations, but since
>> it's at a second ping :)
>>
>>>
>>>> ping
>>>>
>>>>> [Differences from V1:
>>>>> - Prototype for call_from_call_insn moved before comment block.
>>>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>>>> - New test to check correct behavior for non-direct calls.]
>>>>>
>>>>> There are many places in GCC where alternative local sequences are
>>>>> tried in order to determine what is the cheapest or best alternative
>>>>> to use in the current target.  When any of these sequences involve a
>>>>> libcall, the current implementation of emit_library_call_value_1
>>>>> introduce a side-effect consisting on emitting an external declaration
>>>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>>>> sequence that does the libcall is not retained.
>>>>>
>>>>> This is problematic in targets such as BPF, because the kernel loader
>>>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>>>> object unloadable.  Note that BPF objects are not linked before being
>>>>> loaded.
>>>>>
>>>>> This patch changes emit_library_call_value_1 to mark the target
>>>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>>>> declaration is done in the first loop of final.cc:shorten_branches.
>>>>> This happens only if the corresponding sequence has been kept.
>>>>>
>>>>> Regtested in x86_64-linux-gnu.
>>>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>
>> I'm not sure that shorten_branches is a natural place to do this.
>> It isn't something that would normally emit asm text.
>
> Well, that was the approach suggested by another reviewer (Jakub) once
> my initial approach (in the V1) got rejected.  He explicitly suggested
> to use shorten_branches.
>
>> Would it be OK to emit the declaration at the same point as for decls,
>> which IIUC is process_pending_assemble_externals?  If so, how about
>> making assemble_external_libcall add the symbol to a list when
>> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
>> directly?  assemble_external_libcall could then also call get_identifier
>> on the name (perhaps after calling strip_name_encoding -- can't
>> remember whether assemble_external_libcall sees the encoded or
>> unencoded name).
>>
>> All being well, the call to get_identifier should cause
>> assemble_name_resolve to record when the name is used, via
>> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
>> go through the list of libcalls recorded by assemble_external_libcall
>> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>>
>> Not super elegant, but it seems to fit within the existing scheme.
>> And I don't there should be any problem with using get_identifier
>> for libcalls, since it isn't valid to use libcall names for other
>> types of symbol.
>
> This sounds way more complicated to me than the approach in V2, which
> seems to work and is thus a clear improvement compared to the current
> situation in the trunk.  The approach in V2 may be ugly, but it is
> simple and easy to understand.  Is the proposed more convoluted
> alternative really worth the extra complexity, given it is "not super
> elegant"?

Is it really that much more convoluted?  I was thinking of something
like the attached, which seems a bit shorter than V2, and does seem
to fix the bpf tests.

I think most (all?) libcalls already have an associated decl due to
optabs-libfuncs.cc, so an alternative to get_identifier would be to
set the SYMBOL_REF_DECL.  Using get_identiifer seems a bit more
lightweight though.

Richard


diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index b0eff17b8b5..073e3eb2579 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -2461,6 +2461,10 @@ contains_pointers_p (tree type)
    it all the way to final.  See PR 17982 for further discussion.  */
 static GTY(()) tree pending_assemble_externals;
 
+/* A similar list of pending libcall symbols.  We only want to declare
+   symbols that are actually used in the final assembly.  */
+static GTY(()) rtx pending_libcall_symbols;
+
 #ifdef ASM_OUTPUT_EXTERNAL
 /* Some targets delay some output to final using TARGET_ASM_FILE_END.
    As a result, assemble_external can be called after the list of externals
@@ -2516,12 +2520,20 @@ void
 process_pending_assemble_externals (void)
 {
 #ifdef ASM_OUTPUT_EXTERNAL
-  tree list;
-  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
+  for (tree list = pending_assemble_externals; list; list = TREE_CHAIN (list))
     assemble_external_real (TREE_VALUE (list));
 
+  for (rtx list = pending_libcall_symbols; list; list = XEXP (list, 1))
+    {
+      rtx symbol = XEXP (list, 0);
+      tree id = get_identifier (XSTR (symbol, 0));
+      if (TREE_SYMBOL_REFERENCED (id))
+	targetm.asm_out.external_libcall (symbol);
+    }
+
   pending_assemble_externals = 0;
   pending_assemble_externals_processed = true;
+  pending_libcall_symbols = NULL_RTX;
   delete pending_assemble_externals_set;
 #endif
 }
@@ -2594,8 +2606,11 @@ assemble_external_libcall (rtx fun)
   /* Declare library function name external when first used, if nec.  */
   if (! SYMBOL_REF_USED (fun))
     {
+      gcc_assert (!pending_assemble_externals_processed);
       SYMBOL_REF_USED (fun) = 1;
-      targetm.asm_out.external_libcall (fun);
+      get_identifier (XSTR (fun, 0));
+      pending_libcall_symbols = gen_rtx_EXPR_LIST (VOIDmode, fun,
+						   pending_libcall_symbols);
     }
 }
 
-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-12 12:49         ` Richard Sandiford
@ 2023-10-12 13:54           ` Jose E. Marchesi
  2023-10-12 14:05             ` Richard Sandiford
  0 siblings, 1 reply; 10+ messages in thread
From: Jose E. Marchesi @ 2023-10-12 13:54 UTC (permalink / raw)
  To: Jose E. Marchesi via Gcc-patches; +Cc: Jakub Jelinek, richard.sandiford


> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>> Hi Richard.
>> Thanks for looking at this! :)
>>
>>
>>> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>>>> ping
>>>
>>> I don't know this code very well, and have AFAIR haven't worked
>>> with an assembler that requires external declarations, but since
>>> it's at a second ping :)
>>>
>>>>
>>>>> ping
>>>>>
>>>>>> [Differences from V1:
>>>>>> - Prototype for call_from_call_insn moved before comment block.
>>>>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>>>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>>>>> - New test to check correct behavior for non-direct calls.]
>>>>>>
>>>>>> There are many places in GCC where alternative local sequences are
>>>>>> tried in order to determine what is the cheapest or best alternative
>>>>>> to use in the current target.  When any of these sequences involve a
>>>>>> libcall, the current implementation of emit_library_call_value_1
>>>>>> introduce a side-effect consisting on emitting an external declaration
>>>>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>>>>> sequence that does the libcall is not retained.
>>>>>>
>>>>>> This is problematic in targets such as BPF, because the kernel loader
>>>>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>>>>> object unloadable.  Note that BPF objects are not linked before being
>>>>>> loaded.
>>>>>>
>>>>>> This patch changes emit_library_call_value_1 to mark the target
>>>>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>>>>> declaration is done in the first loop of final.cc:shorten_branches.
>>>>>> This happens only if the corresponding sequence has been kept.
>>>>>>
>>>>>> Regtested in x86_64-linux-gnu.
>>>>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>>
>>> I'm not sure that shorten_branches is a natural place to do this.
>>> It isn't something that would normally emit asm text.
>>
>> Well, that was the approach suggested by another reviewer (Jakub) once
>> my initial approach (in the V1) got rejected.  He explicitly suggested
>> to use shorten_branches.
>>
>>> Would it be OK to emit the declaration at the same point as for decls,
>>> which IIUC is process_pending_assemble_externals?  If so, how about
>>> making assemble_external_libcall add the symbol to a list when
>>> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
>>> directly?  assemble_external_libcall could then also call get_identifier
>>> on the name (perhaps after calling strip_name_encoding -- can't
>>> remember whether assemble_external_libcall sees the encoded or
>>> unencoded name).
>>>
>>> All being well, the call to get_identifier should cause
>>> assemble_name_resolve to record when the name is used, via
>>> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
>>> go through the list of libcalls recorded by assemble_external_libcall
>>> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>>>
>>> Not super elegant, but it seems to fit within the existing scheme.
>>> And I don't there should be any problem with using get_identifier
>>> for libcalls, since it isn't valid to use libcall names for other
>>> types of symbol.
>>
>> This sounds way more complicated to me than the approach in V2, which
>> seems to work and is thus a clear improvement compared to the current
>> situation in the trunk.  The approach in V2 may be ugly, but it is
>> simple and easy to understand.  Is the proposed more convoluted
>> alternative really worth the extra complexity, given it is "not super
>> elegant"?
>
> Is it really that much more convoluted?  I was thinking of something
> like the attached, which seems a bit shorter than V2, and does seem
> to fix the bpf tests.

o_O
Ok I clearly misunderstood what you was proposing.  This is way simpler!

How does the magic of TREE_SYMBOL_REFERENCED work?  How is it set to
`true' only if the RTL containing the call is retained in the final
chain?

> I think most (all?) libcalls already have an associated decl due to
> optabs-libfuncs.cc, so an alternative to get_identifier would be to
> set the SYMBOL_REF_DECL.  Using get_identiifer seems a bit more
> lightweight though.
>
> Richard
>
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index b0eff17b8b5..073e3eb2579 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -2461,6 +2461,10 @@ contains_pointers_p (tree type)
>     it all the way to final.  See PR 17982 for further discussion.  */
>  static GTY(()) tree pending_assemble_externals;
>  
> +/* A similar list of pending libcall symbols.  We only want to declare
> +   symbols that are actually used in the final assembly.  */
> +static GTY(()) rtx pending_libcall_symbols;
> +
>  #ifdef ASM_OUTPUT_EXTERNAL
>  /* Some targets delay some output to final using TARGET_ASM_FILE_END.
>     As a result, assemble_external can be called after the list of externals
> @@ -2516,12 +2520,20 @@ void
>  process_pending_assemble_externals (void)
>  {
>  #ifdef ASM_OUTPUT_EXTERNAL
> -  tree list;
> -  for (list = pending_assemble_externals; list; list = TREE_CHAIN (list))
> +  for (tree list = pending_assemble_externals; list; list = TREE_CHAIN (list))
>      assemble_external_real (TREE_VALUE (list));
>  
> +  for (rtx list = pending_libcall_symbols; list; list = XEXP (list, 1))
> +    {
> +      rtx symbol = XEXP (list, 0);
> +      tree id = get_identifier (XSTR (symbol, 0));
> +      if (TREE_SYMBOL_REFERENCED (id))
> +	targetm.asm_out.external_libcall (symbol);
> +    }
> +
>    pending_assemble_externals = 0;
>    pending_assemble_externals_processed = true;
> +  pending_libcall_symbols = NULL_RTX;
>    delete pending_assemble_externals_set;
>  #endif
>  }
> @@ -2594,8 +2606,11 @@ assemble_external_libcall (rtx fun)
>    /* Declare library function name external when first used, if nec.  */
>    if (! SYMBOL_REF_USED (fun))
>      {
> +      gcc_assert (!pending_assemble_externals_processed);
>        SYMBOL_REF_USED (fun) = 1;
> -      targetm.asm_out.external_libcall (fun);
> +      get_identifier (XSTR (fun, 0));
> +      pending_libcall_symbols = gen_rtx_EXPR_LIST (VOIDmode, fun,
> +						   pending_libcall_symbols);
>      }
>  }

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V2] Emit funcall external declarations only if actually used.
  2023-10-12 13:54           ` Jose E. Marchesi
@ 2023-10-12 14:05             ` Richard Sandiford
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Sandiford @ 2023-10-12 14:05 UTC (permalink / raw)
  To: Jose E. Marchesi; +Cc: Jose E. Marchesi via Gcc-patches, Jakub Jelinek

"Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>>> Hi Richard.
>>> Thanks for looking at this! :)
>>>
>>>
>>>> "Jose E. Marchesi" <jose.marchesi@oracle.com> writes:
>>>>> ping
>>>>
>>>> I don't know this code very well, and have AFAIR haven't worked
>>>> with an assembler that requires external declarations, but since
>>>> it's at a second ping :)
>>>>
>>>>>
>>>>>> ping
>>>>>>
>>>>>>> [Differences from V1:
>>>>>>> - Prototype for call_from_call_insn moved before comment block.
>>>>>>> - Reuse the `call' flag for SYMBOL_REF_LIBCALL.
>>>>>>> - Fallback to check REG_CALL_DECL in non-direct calls.
>>>>>>> - New test to check correct behavior for non-direct calls.]
>>>>>>>
>>>>>>> There are many places in GCC where alternative local sequences are
>>>>>>> tried in order to determine what is the cheapest or best alternative
>>>>>>> to use in the current target.  When any of these sequences involve a
>>>>>>> libcall, the current implementation of emit_library_call_value_1
>>>>>>> introduce a side-effect consisting on emitting an external declaration
>>>>>>> for the funcall (such as __divdi3) which is thus emitted even if the
>>>>>>> sequence that does the libcall is not retained.
>>>>>>>
>>>>>>> This is problematic in targets such as BPF, because the kernel loader
>>>>>>> chokes on the spurious symbol __divdi3 and makes the resulting BPF
>>>>>>> object unloadable.  Note that BPF objects are not linked before being
>>>>>>> loaded.
>>>>>>>
>>>>>>> This patch changes emit_library_call_value_1 to mark the target
>>>>>>> SYMBOL_REF as a libcall.  Then, the emission of the external
>>>>>>> declaration is done in the first loop of final.cc:shorten_branches.
>>>>>>> This happens only if the corresponding sequence has been kept.
>>>>>>>
>>>>>>> Regtested in x86_64-linux-gnu.
>>>>>>> Tested with host x86_64-linux-gnu with target bpf-unknown-none.
>>>>
>>>> I'm not sure that shorten_branches is a natural place to do this.
>>>> It isn't something that would normally emit asm text.
>>>
>>> Well, that was the approach suggested by another reviewer (Jakub) once
>>> my initial approach (in the V1) got rejected.  He explicitly suggested
>>> to use shorten_branches.
>>>
>>>> Would it be OK to emit the declaration at the same point as for decls,
>>>> which IIUC is process_pending_assemble_externals?  If so, how about
>>>> making assemble_external_libcall add the symbol to a list when
>>>> !SYMBOL_REF_USED, instead of calling targetm.asm_out.external_libcall
>>>> directly?  assemble_external_libcall could then also call get_identifier
>>>> on the name (perhaps after calling strip_name_encoding -- can't
>>>> remember whether assemble_external_libcall sees the encoded or
>>>> unencoded name).
>>>>
>>>> All being well, the call to get_identifier should cause
>>>> assemble_name_resolve to record when the name is used, via
>>>> TREE_SYMBOL_REFERENCED.  Then process_pending_assemble_externals could
>>>> go through the list of libcalls recorded by assemble_external_libcall
>>>> and check whether TREE_SYMBOL_REFERENCED is set on the get_identifier.
>>>>
>>>> Not super elegant, but it seems to fit within the existing scheme.
>>>> And I don't there should be any problem with using get_identifier
>>>> for libcalls, since it isn't valid to use libcall names for other
>>>> types of symbol.
>>>
>>> This sounds way more complicated to me than the approach in V2, which
>>> seems to work and is thus a clear improvement compared to the current
>>> situation in the trunk.  The approach in V2 may be ugly, but it is
>>> simple and easy to understand.  Is the proposed more convoluted
>>> alternative really worth the extra complexity, given it is "not super
>>> elegant"?
>>
>> Is it really that much more convoluted?  I was thinking of something
>> like the attached, which seems a bit shorter than V2, and does seem
>> to fix the bpf tests.
>
> o_O
> Ok I clearly misunderstood what you was proposing.  This is way simpler!
>
> How does the magic of TREE_SYMBOL_REFERENCED work?  How is it set to
> `true' only if the RTL containing the call is retained in the final
> chain?

It happens in assemble_name, via assemble_name_resolve.  The system
relies on code using that rather than assemble_name_raw for symbols
that might need to be declared, or that might need visibility
information attached.  (It relies on that in general, I mean,
not just for this patch.)

Thanks,
Richard


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-10-12 14:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-21 18:07 [PATCH V2] Emit funcall external declarations only if actually used Jose E. Marchesi
2023-08-30  8:12 ` Jose E. Marchesi
2023-09-05 13:03   ` Jose E. Marchesi
2023-10-03 10:39   ` Jose E. Marchesi
2023-10-05 22:17     ` Richard Sandiford
2023-10-05 22:37       ` Jeff Law
2023-10-12 11:38       ` Jose E. Marchesi
2023-10-12 12:49         ` Richard Sandiford
2023-10-12 13:54           ` Jose E. Marchesi
2023-10-12 14:05             ` Richard Sandiford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).