* [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
@ 2020-09-01 22:22 Alexandre Oliva
  2020-09-02  9:47 ` Segher Boessenkool
From: Alexandre Oliva @ 2020-09-01 22:22 UTC (permalink / raw)
  To: gcc-patches
  Cc: hubicka, ubizjak, richard.earnshaw, richard.sandiford,
	marcus.shawcroft, kyrylo.tkachov, nickc, ramana.radhakrishnan,
	segher


This WIP patchlet introduces means for machines that implicitly clobber
cc flags in asm statements, but that have machinery to output flags
(namely x86, non-thumb arm and aarch64), to state that the asm statement
does NOT clobber cc flags.  That's accomplished by using "=@ccC" in the
output constraints.  It disables the implicit clobber, but it doesn't
set up an actual asm output to the flags, so they are left alone.
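
For instance (a hypothetical use; the dummy output variable is never
actually written, since the constraint only suppresses the implicit
clobber):

  int dummy; // Placeholder output; the asm leaves it untouched.
  asm ("" : "=@ccC" (dummy), "+m" (*p)); // Flags survive this asm.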

It's ugly, I know.  I've considered "!cc" or "nocc" in the clobber
list as a machine-independent way to signal cc is not modified, or
even special-casing empty asm patterns (at a slight risk of breaking
code that expects implicit clobbers even for empty asm patterns, but
empty asm patterns won't clobber flags, so how could it break
anything?).  I take this might be useful for do-nothing asm
statements, often used to stop certain optimizations, e.g.:

  __typeof (*p) c = __builtin_add_overflow (*p, 1, p);
  asm ("" : "+m" (*p)); // Make sure we write to memory.
  *p += c; // This should compile into an add with carry.

Is there interest in, and a preferred form for (portably?), conveying
a no-cc-clobbering asm?


Without the asm, we issue load;add;adc;store, which is not the ideal
sequence with add and adc to the same memory address (or two different
addresses, if the last statement uses say *q instead of *p).

With the asm and its implicit flags clobber, we end up saving the flag
to a register, and then use a plain add for the final statement.  Its
load/plus/store gimple sequence is successfully optimized to a memory
add with TER.  With arrangements to stop the asm from clobbering the
flags, we get an adc to memory, and don't waste cycles saving the
wanted flag to a register.
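
Concretely, the contrast looks something like this (illustrative AT&T
syntax, assuming *p lives at symbol i; the exact operands are made up):

  movzbl  i(%rip), %eax   # load
  addb    $1, %al         # add, sets carry
  adcb    $0, %al         # adc
  movb    %al, i(%rip)    # store

versus the ideal two-RMW sequence:

  addb    $1, i(%rip)     # add to memory, sets carry
  adcb    $0, i(%rip)     # adc to memory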

Alas, getting the first add to go straight to memory is more
complicated.  Even with the asm that forces the output to memory, the
output flag makes it harder to get it optimized to an add-to-memory
form.  When the output flag is unused, we optimize it enough in gimple
that TER does its job and we issue a single add, but that's not possible
when the two outputs of ADD_OVERFLOW are used: the flag setting gets
optimized away, but only after stopping combine from turning the
load/add/store into an add-to-memory.

If we retried the 3-insn substitution after substituting the flag store
into the add for the adc, we might succeed, but only if we had a pattern
that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
the second element of the parallel, because that's where combine adds it
to the new i3 pattern, after splitting it out of i2.

I suppose adding such patterns manually isn't the way to go.  I wonder
if getting recog_for_combine to recognize and reorder PARALLELs
appearing out of order would get too expensive, even if genrecog were to
generate optimized code to try alternate orders in parallels.

The issue doesn't seem that important in the grand scheme of things, but
there is some embarrassment from the missed combines and from the
apparent (AFAICT) impossibility of getting GCC to issue the most compact
(and possibly fastest) insn sequence on x86* for a 'memory += value;'
spelled as __builtin_add_overflow, when the result of the overflow
checking is used.

Part of the issue is that the *_overflow builtins, despite taking a
pointer to hold the op result, split the pointer away from the IFN
interface.  That enables some optimizations, but it ends up preventing
the most desirable (?) translation when adding to a variable in memory.

Thoughts?  Suggestions?  Any tricks to share to get the desired
sequence?  Thanks in advance,
---
 gcc/config/arm/aarch-common.c |    8 +++++++-
 gcc/config/i386/i386.c        |    6 ++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index 6bc6ccf..ebe034c 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -557,11 +557,17 @@ arm_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
 #define C(X, Y)  (unsigned char)(X) * 256 + (unsigned char)(Y)
 
       /* All of the condition codes are two characters.  */
-      if (con[0] != 0 && con[1] != 0 && con[2] == 0)
+      if (con[0] != 0 && (con[1] == 0 || con[2] == 0))
 	con01 = C(con[0], con[1]);
 
       switch (con01)
 	{
+	case C('C', 0):
+	  saw_asm_flag = true;
+	  constraints[i] = "=X";
+	  outputs[i] = gen_rtx_SCRATCH (SImode);
+	  continue;
+
 	case C('c', 'c'):
 	case C('l', 'o'):
 	  mode = CC_Cmode, code = GEU;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a15807d..56e086a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21021,6 +21021,12 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
 
       switch (con[0])
 	{
+	case 'C':
+	  saw_asm_flag = true;
+	  constraints[i] = "=X";
+	  outputs[i] = gen_rtx_SCRATCH (SImode);
+	  continue;
+
 	case 'a':
 	  if (con[1] == 0)
 	    mode = CCAmode, code = EQ;


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-01 22:22 [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes Alexandre Oliva
@ 2020-09-02  9:47 ` Segher Boessenkool
  2020-09-03 10:03   ` Alexandre Oliva
From: Segher Boessenkool @ 2020-09-02  9:47 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

Hi!

On Tue, Sep 01, 2020 at 07:22:57PM -0300, Alexandre Oliva wrote:
> This WIP patchlet introduces means for machines that implicitly clobber
> cc flags in asm statements, but that have machinery to output flags
> (namely x86, non-thumb arm and aarch64), to state that the asm statement
> does NOT clobber cc flags.  That's accomplished by using "=@ccC" in the
> output constraints.  It disables the implicit clobber, but it doesn't
> set up an actual asm output to the flags, so they are left alone.
> 
> It's ugly, I know.

Yeah, it's bloody disgusting :-)  But it is very local, and it works
with the generic code without any changes there, that is good.  OTOH
this patch is for x86 only.  (And aarch, but not the other targets
with default clobbers).

> I've considered "!cc" or "nocc" in the clobber
> list as a machine-independent way to signal cc is not modified, or
> even special-casing empty asm patterns (at a slight risk of breaking
> code that expects implicit clobbers even for empty asm patterns, but
> empty asm patterns won't clobber flags, so how could it break
> anything?).

People write empty asm statements not because they would like no insns
emitted from it, but *because* they want the other effects an asm has
(for example, an empty asm usually has no outputs, so it is volatile,
and that makes sure it is executed in the real machine exactly as often
as in the abstract machine).  So your expectation might be wrong,
someone might want an empty asm to clobber cc on x86 (like any asm is
documented as doing).

But how about a "none" clobber?  That would be generic, and just remove
all preceding clobbers (incl. the implicit clobbers).  Maybe disallow
any explicit clobbers before it, not sure what is nicer.
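
Something like this, say (hypothetical syntax):

  asm ("" : "+m" (*p) : : "none"); // no implicit flags clobber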

> I take this might be useful for do-nothing asm
> statements, often used to stop certain optimizations, e.g.:
> 
>   __typeof (*p) c = __builtin_add_overflow (*p, 1, p);
>   asm ("" : "+m" (*p)); // Make sure we write to memory.
>   *p += c; // This should compile into an add with carry.

Wow, nasty.  That asm cannot be optimised away even if the rest is
(unless GCC can somehow figure out nothing ever uses *p).  Is there no
better way to do this?

> Is there interest in, and a preferred form for (portably?), conveying
> a no-cc-clobbering asm?

Well, that whole cc clobbering is an x86 thing, but some other targets
clobber other registers by default.  Yes, I think this might be useful;
and see my suggestion above ("none").

> Without the asm, we issue load;add;adc;store, which is not the ideal
> sequence with add and adc to the same memory address (or two different
> addresses, if the last statement uses say *q instead of *p).

Is doing two RMWs on memory faster?  Huh.

> Alas, getting the first add to go straight to memory is more
> complicated.  Even with the asm that forces the output to memory, the
> output flag makes it harder to get it optimized to an add-to-memory
> form.  When the output flag is unused, we optimize it enough in gimple
> that TER does its job and we issue a single add, but that's not possible
> when the two outputs of ADD_OVERFLOW are used: the flag setting gets
> optimized away, but only after stopping combine from turning the
> load/add/store into an add-to-memory.
> 
> If we retried the 3-insn substitution after substituting the flag store
> into the add for the adc,

combine should retry every combination if any of the input insns to it
have changed (put another way, if any insn is changed all combinations
with it are tried anew).  If this doesn't work, please file a bug.

But.  Dependencies through memory are never used for combine (it uses
dependencies through registers only), maybe that is what you are seeing?
This makes many "RMW" optimisations need 4-insn combinations, which are
not normally done.

> we might succeed, but only if we had a pattern
> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
> the second element of the parallel, because that's where combine adds it
> to the new i3 pattern, after splitting it out of i2.

That sounds like the backend pattern has it wrong then?  There is a
canonical order for this?

> I suppose adding such patterns manually isn't the way to go.  I wonder
> if getting recog_for_combine to recognize and reorder PARALLELs
> appearing out of order would get too expensive, even if genrecog were to
> generate optimized code to try alternate orders in parallels.

Very big parallels are used, and trying all orderings would take just a
little too much time ;-)

We could do some limited permutations of course.  There are some cases
where you *unavoidably* have this problem (say, when adding three
things together), so this could be useful sometimes.  Maybe just try
permuting the first three arms of the parallel, for example?

> The issue doesn't seem that important in the grand scheme of things, but
> there is some embarrassment from the missed combines and from the
> apparent (AFAICT) impossibility of getting GCC to issue the most compact
> (and possibly fastest) insn sequence on x86* for a 'memory += value;'
> spelled as __builtin_add_overflow, when the result of the overflow
> checking is used.

GCC does not handle RMW to memory very well (partially because it
*cannot* really be handled well).  There are some PRs about this I
think (at least wrt combine).


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-02  9:47 ` Segher Boessenkool
@ 2020-09-03 10:03   ` Alexandre Oliva
  2020-09-03 15:38     ` Segher Boessenkool
From: Alexandre Oliva @ 2020-09-03 10:03 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Hi!
> On Tue, Sep 01, 2020 at 07:22:57PM -0300, Alexandre Oliva wrote:
>> It's ugly, I know.

> Yeah, it's bloody disgusting :-)

:-)

> (And aarch, but not the other targets with default clobbers).

And arm.  I didn't look very hard for others yet; I wasn't sure it would
be worth pursuing any further.

>> even special-casing empty asm patterns (at a slight risk of breaking
>> code that expects implicit clobbers even for empty asm patterns, but
>> empty asm patterns won't clobber flags, so how could it break
>> anything?).

> People write empty asm statements not because they would like no insns
> emitted from it, but *because* they want the other effects an asm has
> (for example, an empty asm usually has no outputs, so it is volatile,
> and that makes sure it is executed in the real machine exactly as often
> as in the abstract machine).  So your expectation might be wrong,
> someone might want an empty asm to clobber cc on x86 (like any asm is
> documented as doing).

I just can't imagine a case in which flags set by an earlier
instruction, and usable by a subsequent instruction, could possibly be
clobbered by an *empty* sequence of insns.  Maybe I just
lack imagination about all the sorts of weird things that people use
such asm statements for...  Got any example of something that could be
broken by such a change to educate me?  Not that I need it, I'm just
curious.

> But how about a "none" clobber?

I like that!

>> I take this might be useful for do-nothing asm
>> statements, often used to stop certain optimizations, e.g.:
>> 
>> __typeof (*p) c = __builtin_add_overflow (*p, 1, p);
>> asm ("" : "+m" (*p)); // Make sure we write to memory.
>> *p += c; // This should compile into an add with carry.

> Wow, nasty.  That asm cannot be optimised away even if the rest is
> (unless GCC can somehow figure out nothing ever uses *p).  Is there no
> better way to do this?

I don't know.  There's a piece of hand-crafted x86 assembly that I'm
trying to find some way to replicate in compiler-optimized code.  I
don't know whether two RMWs could possibly be faster, but there are two
concerns driving me to try to use the same insn sequence:

- compactness; this is for per-block instrumentation, and if we use 4
  larger insns instead of 2 smaller ones, there might be even more
  significant impact on performance because of code cache issues

- threads: though the instrumentation code is definitely not
  thread-safe, and failing to expose the intermediate state should not
  have a significant impact, I wanted to at least be able to compare
  existing hand-written instrumentation code with compiler-generated
  instrumentation, and measure any performance differences and the
  actual impact on multi-threaded programs.
  
>> Without the asm, we issue load;add;adc;store, which is not the ideal
>> sequence with add and adc to the same memory address (or two different
>> addresses, if the last statement uses say *q instead of *p).

> Is doing two RMWs on memory faster?  Huh.

I haven't really measured it (yet); I just assumed that whoever put the
asm code there, who seemed to be both knowledgeable and concerned about
the performance impact on a very hot piece of instrumentation code,
knew what they were doing.  But when I set out to check it myself, I
hit this wall.  At first I thought GCC was picking the sequence it did
because it was faster, but then, while I investigated, I came to the
conclusion that it picked it because it didn't stand a chance of doing
otherwise, not even if I pushed it really hard.


>> Alas, getting the first add to go straight to memory is more
>> complicated.  Even with the asm that forces the output to memory, the
>> output flag makes it harder to get it optimized to an add-to-memory
>> form.  When the output flag is unused, we optimize it enough in gimple
>> that TER does its job and we issue a single add, but that's not possible
>> when the two outputs of ADD_OVERFLOW are used: the flag setting gets
>> optimized away, but only after stopping combine from turning the
>> load/add/store into an add-to-memory.
>> 
>> If we retried the 3-insn substitution after substituting the flag store
>> into the add for the adc,

> combine should retry every combination if any of the input insns to it
> have changed (put another way, if any insn is changed all combinations
> with it are tried anew).

Alas, they don't change, it's an intervening flag-store that does.  When
we try the 3 insns that make up RMW, while the flag store is still there
before the W, it triggers one of the cant_combine tests.  After it's
gone, we don't reconsider.  Does that count as a bug to be filed?

> But.  Dependencies through memory are never used for combine (it uses
> dependencies through registers only), maybe that is what you are seeing?

No, it's just that the use of the flag between M and W prevents
substituting the M into the W, because then the use of the parallel set
result would be before the M.  If there weren't a use of the parallel
flags set, we'd have combined the 3 RMW insns into 1 just fine.

>> we might succeed, but only if we had a pattern
>> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> the second element of the parallel, because that's where combine adds it
>> to the new i3 pattern, after splitting it out of i2.

> That sounds like the backend pattern has it wrong then?  There is a
> canonical order for this?

Much as I can tell, there isn't, it's just an arbitrary choice of
backends, some do it one way or the other, and that causes combine to be
able to perform some combinations but not others.

>> I suppose adding such patterns manually isn't the way to go.  I wonder
>> if getting recog_for_combine to recognize and reorder PARALLELs
>> appearing out of order would get too expensive, even if genrecog were to
>> generate optimized code to try alternate orders in parallels.

> Very big parallels are used, and trying all orderings would take just a
> little too much time ;-)

Oh, I wasn't talking about trying all permutations in recog_for_combine,
but about generating recog code, for combine to use, that could
recognize out-of-order top-level parallels, so that they could be
reordered into something regular recog would recognize.  I suppose this
might involve computing permutations within genrecog...  Here's a hack
just to get a clue about extra time.  On amd64, we go from:

Shared 39391 out of 75775 states by creating 10001 new states, saving 29390

to:

Shared 45474 out of 86045 states by creating 11586 new states, saving 33888

For comparison purposes, here's what we get permuting up to 8 items in
a parallel (MAX_PERMUTE bumped up to 8), without involving trailing
clobbers in the permutations:

Shared 66206 out of 127097 states by creating 16663 new states, saving 49543

While this enables permutations among trailing clobbers, also up to the
8th item in the parallel (PERMUTE_CLOBBERS 8):

Shared 71133 out of 140924 states by creating 17047 new states, saving 54086

While this adds partial removal of clobbers, leaving up to
8 of them for matching:

Shared 86720 out of 193474 states by creating 17483 new states, saving 69237


What this patchlet does *not* do is to encode the permutations into
e.g. num_clobbers, so that we could apply them afterwards.  With a
32-bit int, we could encode permutations of MAX_PERMUTE <= 8 assuming
the original num_clobbers is always < 52, or <= 7 if num_clobbers could
go up to 415.


Not really tested, other than gathering the state counts above.  Does it
look promising?


diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 5b5b72f..6daef51 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -5295,20 +5295,15 @@ get_peephole2_pattern (md_rtx_info *info)
   return pattern;
 }
 
-/* Return true if *PATTERN_PTR is a PARALLEL in which at least one trailing
-   rtx can be added automatically by add_clobbers.  If so, update
-   *ACCEPTANCE_PTR so that its num_clobbers field contains the number
-   of such trailing rtxes and update *PATTERN_PTR so that it contains
-   the pattern without those rtxes.  */
+/* Return the number of trailing rtx in a PARALLEL that
+   can be added automatically by add_clobbers.  */
 
-static bool
-remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
+static int
+remove_clobbers (rtx pattern)
 {
   int i;
-  rtx new_pattern;
 
   /* Find the last non-clobber in the parallel.  */
-  rtx pattern = *pattern_ptr;
   for (i = XVECLEN (pattern, 0); i > 0; i--)
     {
       rtx x = XVECEXP (pattern, 0, i - 1);
@@ -5318,24 +5313,69 @@ remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
 	break;
     }
 
-  if (i == XVECLEN (pattern, 0))
-    return false;
+  return XVECLEN (pattern, 0) - i;
+}
 
-  /* Build a similar insn without the clobbers.  */
-  if (i == 1)
-    new_pattern = XVECEXP (pattern, 0, 0);
-  else
+/* Match a PATTERN with a top-level PARALLEL, with permutations
+   starting at POS and up to MAX_PERMUTE.  CLOB is the trailing
+   clobbers count (returned by remove_clobbers).  Only up to
+   PERMUTE_CLOBBERS undergo permutation.  PARTIAL_CLOBBERS, if
+   positive, enables matching of patterns with partial removal of
+   trailing clobbers, so that up to (possibly permuted)
+   PARTIAL_CLOBBERS remain.  */
+
+static void
+match_parallel (state *s, md_rtx_info *info, rtx pattern,
+		acceptance_type acceptance, int pos, int clob)
+{
+
+#define MAX_PERMUTE 3
+#define PERMUTE_CLOBBERS 0
+#define PARTIAL_CLOBBERS 0
+
+  int len = XVECLEN (pattern, 0);
+  int len_clob = len - clob;
+
+  if (pos == len)
     {
-      new_pattern = rtx_alloc (PARALLEL);
-      XVEC (new_pattern, 0) = rtvec_alloc (i);
-      for (int j = 0; j < i; ++j)
-	XVECEXP (new_pattern, 0, j) = XVECEXP (pattern, 0, j);
+      match_pattern (s, info, pattern, acceptance);
+      return;
     }
+  
+  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
 
-  /* Recognize it.  */
-  acceptance_ptr->u.full.u.num_clobbers = XVECLEN (pattern, 0) - i;
-  *pattern_ptr = new_pattern;
-  return true;
+  if (pos < len_clob)
+    {
+      for (int i = pos + 1; i < len_clob && i < MAX_PERMUTE; i++)
+	{
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	}
+    }
+  else
+    {
+      for (int i = pos + 1; i < len && i < MAX_PERMUTE - 1
+	     && i < len_clob + PERMUTE_CLOBBERS; i++)
+	{
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	}
+
+      if (pos <= len_clob + PARTIAL_CLOBBERS)
+	{
+	  acceptance.u.full.u.num_clobbers = len - pos;
+	  if (pos == 1)
+	    match_pattern (s, info, XVECEXP (pattern, 0, 0), acceptance);
+	  else
+	    {
+	      XVECLEN (pattern, 0) = pos;
+	      match_pattern (s, info, pattern, acceptance);
+	      XVECLEN (pattern, 0) = len;
+	    }
+	}
+    }
 }
 
 int
@@ -5371,13 +5411,15 @@ main (int argc, const char **argv)
 	    acceptance.u.full.u.num_clobbers = 0;
 	    pattern = add_implicit_parallel (XVEC (def, 1));
 	    validate_pattern (pattern, &info, NULL_RTX, 0);
-	    match_pattern (&insn_root, &info, pattern, acceptance);
 
 	    /* If the pattern is a PARALLEL with trailing CLOBBERs,
 	       allow recog_for_combine to match without the clobbers.  */
-	    if (GET_CODE (pattern) == PARALLEL
-		&& remove_clobbers (&acceptance, &pattern))
+	    if (GET_CODE (pattern) != PARALLEL)
 	      match_pattern (&insn_root, &info, pattern, acceptance);
+	    else
+	      match_parallel (&insn_root, &info, pattern, acceptance,
+			      0, remove_clobbers (pattern));
+
 	    break;
 	  }
 

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 10:03   ` Alexandre Oliva
@ 2020-09-03 15:38     ` Segher Boessenkool
  2020-09-03 17:07       ` Alexandre Oliva
  2020-09-03 17:17       ` Alexandre Oliva
From: Segher Boessenkool @ 2020-09-03 15:38 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > (And aarch, but not the other targets with default clobbers).
> 
> And arm.  I didn't look very hard for others yet; I wasn't sure it would
> be worth pursuing any further.

Oh, I meant what was in your patch.  From a grep, all of mn10300, nds32,
arm, cris, pdp11, rs6000, i386, visium, aarch64 have default clobbers.
Hrm, what you said was the targets with inline asm flag output
constraints?

> > People write empty asm statements not because they would like no insns
> > emitted from it, but *because* they want the other effects an asm has
> > (for example, an empty asm usually has no outputs, so it is volatile,
> > and that makes sure it is executed in the real machine exactly as often
> > as in the abstract machine).  So your expectation might be wrong,
> > someone might want an empty asm to clobber cc on x86 (like any asm is
> > documented as doing).
> 
> I just can't imagine a case in which flags set by an earlier
> instruction, and usable by a subsequent instruction, could possibly be
> clobbered by an *empty* sequence of insns.  Maybe I just
> lack imagination about all the sorts of weird things that people use
> such asm statements for...  Got any example of something that could be
> broken by such a change to educate me?  Not that I need it, I'm just
> curious.

Before your change, flags could not be live through an asm.  After your
change, they can.  So it does change things...  It seems like this
should never matter, but :-)

> > But how about a "none" clobber?
> 
> I like that!

Okay, sold!  Now to convince everyone else ;-)

> I don't know.  There's a piece of hand-crafted x86 assembly that I'm
> trying to find some way to replicate in compiler-optimized code.  I
> don't know whether two RMWs could possibly be faster, but there are two
> concerns driving me to try to use the same insn sequence:
> 
> - compactness; this is for per-block instrumentation, and if we use 4
>   larger insns instead of 2 smaller ones, there might be even more
>   significant impact on performance because of code cache issues

Yes, it *is* smaller code, certainly.  As I said before, GCC does not
generate very good code with RMWs :-/

> >> If we retried the 3-insn substitution after substituting the flag store
> >> into the add for the adc,
> 
> > combine should retry every combination if any of the input insns to it
> > have changed (put another way, if any insn is changed all combinations
> > with it are tried anew).
> 
> Alas, they don't change, it's an intervening flag-store that does.  When
> we try the 3 insns that make up RMW, while the flag store is still there
> before the W, it triggers one of the cant_combine tests.  After it's
> gone, we don't reconsider.  Does that count as a bug to be filed?

An enhancement bug, sure...  "missing optimisation".  But this is
expected behaviour at least :-)

> >> we might succeed, but only if we had a pattern
> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
> >> the second element of the parallel, because that's where combine adds it
> >> to the new i3 pattern, after splitting it out of i2.
> 
> > That sounds like the backend pattern has it wrong then?  There is a
> > canonical order for this?
> 
> Much as I can tell, there isn't, it's just an arbitrary choice of
> backends, some do it one way or the other, and that causes combine to be
> able to perform some combinations but not others.

Hrm, I could swear it was in the documentation somewhere, but I cannot
find it currently.  I can find comments in combine.c that refer to this
though, like
  /* Many machines that don't use CC0 have insns that can both perform an
     arithmetic operation and set the condition code.  These operations will
     be represented as a PARALLEL with the first element of the vector
     being a COMPARE of an arithmetic operation with the constant zero.
     The second element of the vector will set some pseudo to the result
     of the same arithmetic operation.

Ah found it (md.texi, @cindex @code{compare}, canonicalization of):

@item
For instructions that inherently set a condition code register, the
@code{compare} operator is always written as the first RTL expression of
the @code{parallel} instruction pattern.  For example,

@smallexample
(define_insn ""
  [(set (reg:CCZ FLAGS_REG)
        (compare:CCZ
          (plus:SI
            (match_operand:SI 1 "register_operand" "%r")
            (match_operand:SI 2 "register_operand" "r"))
          (const_int 0)))
   (set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_dup 1) (match_dup 2)))]
  ""
  "addl %0, %1, %2")
@end smallexample

Not sure this 100% applies to your case though?

> >> I suppose adding such patterns manually isn't the way to go.  I wonder
> >> if getting recog_for_combine to recognize and reorder PARALLELs
> >> appearing out of order would get too expensive, even if genrecog were to
> >> generate optimized code to try alternate orders in parallels.
> 
> > Very big parallels are used, and trying all orderings would take just a
> > little too much time ;-)
> 
> Oh, I wasn't talking about trying all permutations in recog_for_combine,
> but about generating recog code, for combine to use, that could
> recognize out-of-order top-level parallels, so that they could be
> reordered into something regular recog would recognize.  I suppose this
> might involve computing permutations within genrecog...

In most cases it won't have to try many permutations, but in some it
will have to I think :-(

> Here's a hack
> just to get a clue about extra time.  On amd64, we go from:
> 
> Shared 39391 out of 75775 states by creating 10001 new states, saving 29390
> 
> to:
> 
> Shared 45474 out of 86045 states by creating 11586 new states, saving 33888

That is a lot extra, is it worth it?

> For comparison purposes, here's what we get permuting up to 8 items in
> a parallel (MAX_PERMUTE bumped up to 8), without involving trailing
> clobbers in the permutations:
> 
> Shared 66206 out of 127097 states by creating 16663 new states, saving 49543
> 
> While this enables permutations among trailing clobbers, also up to the
> 8th item in the parallel (PERMUTE_CLOBBERS 8):
> 
> Shared 71133 out of 140924 states by creating 17047 new states, saving 54086
> 
> While this adds partial removal of clobbers, leaving up to
> 8 of them for matching:
> 
> Shared 86720 out of 193474 states by creating 17483 new states, saving 69237
> 
> 
> What this patchlet does *not* do is to encode the permutations into
> e.g. num_clobbers, so that we could apply them afterwards.  With a
> 32-bit int, we could encode permutations of MAX_PERMUTE <= 8 assuming
> the original num_clobbers is always < 52, or <= 7 if num_clobbers could
> go up to 415.
> 
> 
> Not really tested, other than gathering the state counts above.  Does it
> look promising?

Yes.  But the huge numbers above look scary.  And amd64 isn't the worst
case I think :-(


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 15:38     ` Segher Boessenkool
@ 2020-09-03 17:07       ` Alexandre Oliva
  2020-09-03 18:32         ` Segher Boessenkool
  2020-09-08 14:20         ` Hans-Peter Nilsson
  2020-09-03 17:17       ` Alexandre Oliva
From: Alexandre Oliva @ 2020-09-03 17:07 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

[dropping port maintainers for combine-mostly subthread]

On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> >> we might succeed, but only if we had a pattern
>> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> >> the second element of the parallel, because that's where combine adds it
>> >> to the new i3 pattern, after splitting it out of i2.
>> 
>> > That sounds like the backend pattern has it wrong then?  There is a
>> > canonical order for this?
>> 
>> Much as I can tell, there isn't, it's just an arbitrary choice of
>> backends, some do it one way or the other, and that causes combine to be
>> able to perform some combinations but not others.

> For instructions that inherently set a condition code register, the
> @code{compare} operator is always written as the first RTL expression of
> the @code{parallel} instruction pattern.

Interesting.  I'm pretty sure I read email recently that suggested it
was really up to the port, but I've caught up with GCC emails from years
ago, so that might have been it.  Or I just misremember.  Whatever.

The x86 pattern that fails to match in combine has the flags setter
first, but combine places it second, after splitting it out of i2 and
then appending it back to i3.

Alas, it would be just as legitimate for combine to go the opposite way,
substituting the flags set into another insn, and then tacking the other
set onto the substituted-into insn.

Since there is a canonical order, maybe combine should attempt to follow
that order.  Meanwhile, there's the patchlet below, which gets us (*) a
grand total of *one* additional insn combine (in libbacktrace/dwarf.c)
per bootstrap stage, and then, in target libs, 3 permuted combines in
libhsail-rt/rt/workitems.c and 4 in s-expmod.adb.  Not bad, but I'd
agree it could be regarded as a bit of a waste.

This is not finished; I'd like to abstract out a little more of the
permutation representation, which had to be used explicitly in combine
to extract the actual clobber count before permuting the newly-created
rtvec.  Still, I was surprised at how easy it turned out to be.  I was
also happy to find I had missed a few orders of magnitude in my earlier
estimates of how far we could go in representing permutations without
taking up all of num_clobbers.  The limit is so loose that I went for
bitfields rather than arithmetic coding.  But the reason I wanted to
abstract it out was to make it easy to switch to a different encoding.
Anyway...  Does this still seem worth pursuing?


permute parallel items in recog

From: Alexandre Oliva <oliva@gnu.org>


---
 gcc/combine.c  |   35 ++++++++++++++++--------
 gcc/genrecog.c |   78 ++++++++++++++++++++++++++++++++++++------------------
 gcc/rtl.h      |   81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+), 38 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index c88382e..4383ce5 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -11510,11 +11510,14 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
      them.  Then check to make sure that all of them are dead.  */
   if (num_clobbers_to_add)
     {
+      int nclob = (PERMUTE_MULT (0)
+		   ? (num_clobbers_to_add % PERMUTE_MULT (0))
+		   : num_clobbers_to_add);
       rtx newpat = gen_rtx_PARALLEL (VOIDmode,
 				     rtvec_alloc (GET_CODE (pat) == PARALLEL
 						  ? (XVECLEN (pat, 0)
-						     + num_clobbers_to_add)
-						  : num_clobbers_to_add + 1));
+						     + nclob)
+						  : nclob + 1));
 
       if (GET_CODE (pat) == PARALLEL)
 	for (i = 0; i < XVECLEN (pat, 0); i++)
@@ -11522,19 +11525,27 @@ recog_for_combine_1 (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
       else
 	XVECEXP (newpat, 0, 0) = pat;
 
-      add_clobbers (newpat, insn_code_number);
+      if (num_clobbers_to_add != nclob)
+	recog_unpermute_parallel (newpat, &num_clobbers_to_add);
+      gcc_checking_assert (num_clobbers_to_add == nclob);
 
-      for (i = XVECLEN (newpat, 0) - num_clobbers_to_add;
-	   i < XVECLEN (newpat, 0); i++)
+      if (num_clobbers_to_add)
 	{
-	  if (REG_P (XEXP (XVECEXP (newpat, 0, i), 0))
-	      && ! reg_dead_at_p (XEXP (XVECEXP (newpat, 0, i), 0), insn))
-	    return -1;
-	  if (GET_CODE (XEXP (XVECEXP (newpat, 0, i), 0)) != SCRATCH)
+	  add_clobbers (newpat, insn_code_number);
+
+	  for (i = XVECLEN (newpat, 0) - num_clobbers_to_add;
+	       i < XVECLEN (newpat, 0); i++)
 	    {
-	      gcc_assert (REG_P (XEXP (XVECEXP (newpat, 0, i), 0)));
-	      notes = alloc_reg_note (REG_UNUSED,
-				      XEXP (XVECEXP (newpat, 0, i), 0), notes);
+	      if (REG_P (XEXP (XVECEXP (newpat, 0, i), 0))
+		  && ! reg_dead_at_p (XEXP (XVECEXP (newpat, 0, i), 0), insn))
+		return -1;
+	      if (GET_CODE (XEXP (XVECEXP (newpat, 0, i), 0)) != SCRATCH)
+		{
+		  gcc_assert (REG_P (XEXP (XVECEXP (newpat, 0, i), 0)));
+		  notes = alloc_reg_note (REG_UNUSED,
+					  XEXP (XVECEXP (newpat, 0, i), 0),
+					  notes);
+		}
 	    }
 	}
       pat = newpat;
diff --git a/gcc/genrecog.c b/gcc/genrecog.c
index 5b5b72f..2625660 100644
--- a/gcc/genrecog.c
+++ b/gcc/genrecog.c
@@ -5295,20 +5295,15 @@ get_peephole2_pattern (md_rtx_info *info)
   return pattern;
 }
 
-/* Return true if *PATTERN_PTR is a PARALLEL in which at least one trailing
-   rtx can be added automatically by add_clobbers.  If so, update
-   *ACCEPTANCE_PTR so that its num_clobbers field contains the number
-   of such trailing rtxes and update *PATTERN_PTR so that it contains
-   the pattern without those rtxes.  */
+/* Return the number of trailing rtx in a PARALLEL that
+   can be added automatically by add_clobbers.  */
 
-static bool
-remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
+static int
+remove_clobbers (rtx pattern)
 {
   int i;
-  rtx new_pattern;
 
   /* Find the last non-clobber in the parallel.  */
-  rtx pattern = *pattern_ptr;
   for (i = XVECLEN (pattern, 0); i > 0; i--)
     {
       rtx x = XVECEXP (pattern, 0, i - 1);
@@ -5318,24 +5313,53 @@ remove_clobbers (acceptance_type *acceptance_ptr, rtx *pattern_ptr)
 	break;
     }
 
-  if (i == XVECLEN (pattern, 0))
-    return false;
+  return XVECLEN (pattern, 0) - i;
+}
 
-  /* Build a similar insn without the clobbers.  */
-  if (i == 1)
-    new_pattern = XVECEXP (pattern, 0, 0);
-  else
+/* Match a PATTERN with a top-level PARALLEL, with permutations
+   starting at POS and up to MAX_PERMUTE.  CLOB is the trailing
+   clobbers count (returned by remove_clobbers).  */
+
+static void
+match_parallel (state *s, md_rtx_info *info, rtx pattern,
+		acceptance_type acceptance, int pos, int clob)
+{
+  int len = XVECLEN (pattern, 0);
+  int len_clob = len - clob;
+
+  if (pos == len)
     {
-      new_pattern = rtx_alloc (PARALLEL);
-      XVEC (new_pattern, 0) = rtvec_alloc (i);
-      for (int j = 0; j < i; ++j)
-	XVECEXP (new_pattern, 0, j) = XVECEXP (pattern, 0, j);
+      match_pattern (s, info, pattern, acceptance);
+      return;
     }
+  
+  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
 
-  /* Recognize it.  */
-  acceptance_ptr->u.full.u.num_clobbers = XVECLEN (pattern, 0) - i;
-  *pattern_ptr = new_pattern;
-  return true;
+  if (pos < len_clob)
+    {
+      for (int i = pos + 1; i < len_clob && i < MAX_PERMUTE; i++)
+	{
+	  acceptance.u.full.u.num_clobbers += PERMUTE_MULT (pos);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	  match_parallel (s, info, pattern, acceptance, pos + 1, clob);
+	  std::swap (XVECEXP (pattern, 0, pos), XVECEXP (pattern, 0, i));
+	}
+    }
+  else if (pos == len_clob)
+    {
+      gcc_checking_assert (clob == len - pos);
+      gcc_checking_assert (clob < PERMUTE_MULT (0));
+
+      acceptance.u.full.u.num_clobbers += clob;
+      if (pos == 1)
+	match_pattern (s, info, XVECEXP (pattern, 0, 0), acceptance);
+      else
+	{
+	  XVECLEN (pattern, 0) = pos;
+	  match_pattern (s, info, pattern, acceptance);
+	  XVECLEN (pattern, 0) = len;
+	}
+    }
 }
 
 int
@@ -5371,13 +5395,15 @@ main (int argc, const char **argv)
 	    acceptance.u.full.u.num_clobbers = 0;
 	    pattern = add_implicit_parallel (XVEC (def, 1));
 	    validate_pattern (pattern, &info, NULL_RTX, 0);
-	    match_pattern (&insn_root, &info, pattern, acceptance);
 
 	    /* If the pattern is a PARALLEL with trailing CLOBBERs,
 	       allow recog_for_combine to match without the clobbers.  */
-	    if (GET_CODE (pattern) == PARALLEL
-		&& remove_clobbers (&acceptance, &pattern))
+	    if (GET_CODE (pattern) != PARALLEL)
 	      match_pattern (&insn_root, &info, pattern, acceptance);
+	    else
+	      match_parallel (&insn_root, &info, pattern, acceptance,
+			      0, remove_clobbers (pattern));
+
 	    break;
 	  }
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 0872cc4..083ed0d 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -4456,4 +4456,85 @@ extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
 extern void gt_pch_nx (rtx &, gt_pointer_operator, void *);
 
+
+/* Define to the number of initial elements of a parallel in an insn's
+   pattern that we recognize in any permutation.  The permutation is
+   encoded in upper bits of pnum_clobbers set by recog.  If
+   pnum_clobbers is NULL, no permutations are attempted.  */
+#define MAX_PERMUTE 3
+
+/* These are multipliers used to encode the above permutations.  We
+   could use a more compact representation, but using bitfields is
+   nicer.  The permutations of the pattern described in the md file
+   for it to match an insn are encoded as follows:
+
+   For each POS from 0 to MAX_PERMUTE-1:
+
+     Select Choice as any of the items from POS to MAX_PERMUTE-1;
+     Swap the item at Choice with that at Pos;
+     Encode it as (Choice - Pos) multiplied by PERMUTE_MULT (Pos).
+
+   For the last iteration, namely Pos = MAX_PERMUTE-1, there is only
+   one choice, thus nothing to encode.
+
+   For the 1-before-last iteration, there are 2 choices, so we need
+   one bit: 1<<30 (we don't want to use the sign bit).
+
+   For the 2-before-last iteration, there are 3 choices, so we use the
+   next two bits, thus 1<<28 is the multiplier.
+
+   Generalizing, for Pos = MAX_PERMUTE - 2 - I, we use the multiplier
+   at index I.  */
+static const int num_clobbers_permute_multipliers[] = {
+  1 << 30,
+  1 << 28, 1 << 26,
+  1 << 23, 1 << 20, 1 << 17, 1 << 14,
+  1 << 10, 1 <<  6, 1 <<  2
+};
+
+/* Make sure to not define MAX_PERMUTE too high for the array.  */
+#define MAX_MAX_PERMUTE (2 + ARRAY_SIZE (num_clobbers_permute_multipliers))
+static_assert (MAX_PERMUTE <= MAX_MAX_PERMUTE,
+	       "MAX_PERMUTE is too big");
+
+/* Get the multiplier to use for the permutation choice at POS.  */
+#define PERMUTE_MULT(POS) \
+  (num_clobbers_permute_multipliers[MAX_PERMUTE - 2 - (POS)])
+
+/* Return the mask for the encoding of the permutation choice at POS.  */
+static inline int
+PERMUTE_MASK(int Pos)
+{
+  int i = MAX_PERMUTE - 2 - Pos;
+
+  if (i < 0)
+    return 0;
+
+  if (i == 0)
+    return num_clobbers_permute_multipliers[0];
+
+  return (num_clobbers_permute_multipliers[i-1]
+	  - num_clobbers_permute_multipliers[i]);
+}
+
+/* Modify PAT, recognized with a permutation encoded in *NUMCLOB, to
+   its canonical form, i.e., one that would be recognized without
+   permutations.  PAT is expected to be a PARALLEL if there is any
+   permutation to make; recog will not encode permutations if there
+   isn't a PARALLEL pattern.  */
+static inline void
+recog_unpermute_parallel (rtx pat, int *numclob)
+{
+  for (int pos = MAX_PERMUTE - 2; pos >= 0; pos--)
+    {
+      int masked = *numclob & PERMUTE_MASK (pos);
+      if (!masked)
+	continue;
+      *numclob -= masked;
+      int choice = masked / PERMUTE_MULT (pos);
+      std::swap (XVECEXP (pat, 0, pos),
+		 XVECEXP (pat, 0, pos + choice));
+    }
+}
+
 #endif /* ! GCC_RTL_H */


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 15:38     ` Segher Boessenkool
  2020-09-03 17:07       ` Alexandre Oliva
@ 2020-09-03 17:17       ` Alexandre Oliva
  2020-09-03 19:01         ` Segher Boessenkool
  1 sibling, 1 reply; 14+ messages in thread
From: Alexandre Oliva @ 2020-09-03 17:17 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> > (And aarch, but not the other targets with default clobbers).
>> 
>> And arm.  I didn't look very hard for others yet; I wasn't sure it would
>> be worth pursuing any further.

> Oh, I meant what was in your patch.  From a grep, all of mn10300, nds32,
> arm, cris, pdp11, rs6000, i386, visium, aarch64 have default clobbers.
> Hrm, what you said was the targets with inline asm flag output
> constraints?

I found implicit cc clobbers on x86 only, and then I stumbled across the
same notation used for asm cc-setting in arm/aarch64, but that was all,
I didn't look any further (yet).  I'll look at the other targets with
asm_adjust hooks and see whether they need adjustments for "none", if we
agree that's the way to go.


>> > People write empty asm statements not because they would like no insns
>> > emitted from it, but *because* they want the other effects an asm has
>> > (for example, an empty asm usually has no outputs, so it is volatile,
>> > and that makes sure it is executed in the real machine exactly as often
>> > as in the abstract machine).  So your expectation might be wrong,
>> > someone might want an empty asm to clobber cc on x86 (like any asm is
>> > documented as doing).
>> 
>> I just can't imagine a case in which flags set by an earlier
>> instruction, and usable by a subsequent instruction, could possibly be
>> clobbered by an *empty* sequence of insns.  Maybe I just
>> lack imagination about all the sorts of weird things that people use
>> such asm statements for...  Got any example of something that could be
>> broken by such a change to educate me?  Not that I need it, I'm just
>> curious.

> Before your change, flags could not be live through an asm.

While implementing this patchlet, I learned that standard asms (as
opposed to GCC-extended ones) only undergo machine-dependent adjustments
if their asm pattern string is non-empty.  So there is precedent.  I may
even have a finger on that precedent, from very long ago.

> After your change, they can.  So it does change things...

No doubt it does change things, that's not the point.  The point is how
could it break things.  If the compiler can see flags are set before the
asm, and it can see the flags could be used after the asm, and the asm
has no code whatsoever in it that could possibly clobber the flags...
What could possibly go wrong?  (Famous last words? :-)



>> > But how about a "none" clobber?
>> 
>> I like that!

> Okay, sold!  Now to convince everyone else ;-)

Here's a prototype implementation, x86 only.  (ARM and Thumb do not add
any clobbers by default, so they're already good for "none".)  I haven't
yet looked at other targets.  It's missing documentation and tests, too.


enable flags-unchanging asms

---
 gcc/cfgexpand.c        |    6 ++++++
 gcc/config/i386/i386.c |    2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index b334ea0..f7ae789 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -2986,6 +2986,12 @@ expand_asm_stmt (gasm *stmt)
           const char *regname = TREE_STRING_POINTER (TREE_VALUE (t));
 	  int nregs, j;
 
+	  if (i == 0 && strcmp (regname, "none") == 0)
+	    {
+	      clobber_rvec.safe_push (const0_rtx);
+	      continue;
+	    }
+
 	  j = decode_reg_name_and_count (regname, &nregs);
 	  if (j < 0)
 	    {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a15807d..2afff12 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21142,7 +21142,7 @@ ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> &/*inputs*/,
   rtx_insn *seq = get_insns ();
   end_sequence ();
 
-  if (saw_asm_flag)
+  if (saw_asm_flag || (clobbers.length () > 0 && clobbers[0] == const0_rtx))
     return seq;
   else
     {


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 17:07       ` Alexandre Oliva
@ 2020-09-03 18:32         ` Segher Boessenkool
  2020-09-03 19:31           ` Alexandre Oliva
  2020-09-08 14:20         ` Hans-Peter Nilsson
  1 sibling, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2020-09-03 18:32 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc-patches

On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
> >> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >> >> we might succeed, but only if we had a pattern
> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
> >> >> the second element of the parallel, because that's where combine adds it
> >> >> to the new i3 pattern, after splitting it out of i2.
> >> 
> >> > That sounds like the backend pattern has it wrong then?  There is a
> >> > canonical order for this?
> >> 
> >> Much as I can tell, there isn't, it's just an arbitrary choice of
> >> backends, some do it one way or the other, and that causes combine to be
> >> able to perform some combinations but not others.
> 
> > For instructions that inherently set a condition code register, the
> > @code{compare} operator is always written as the first RTL expression of
> > the @code{parallel} instruction pattern.
> 
> Interesting.  I'm pretty sure I read email recently that suggested it
> was really up to the port, but I've caught up with GCC emails from years
> ago, so that might have been it.  Or I just misremember.  Whatever.

I think you remember right.  But combine depends on the documented
order, and so does compare-elim (since 4f0473fe89e6), so now the
documented order is always the only wanted one.

> The x86 pattern that fails to match in combine has the flags setter
> first, but combine places it second, after splitting it out of i2 and
> then appending it back to i3.

What does that RTL look like exactly?  This canonical form is only for
a set of the flags as a compare to 0 of what the other set sets (hrm, I
hope you can make sense of that).
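
That is, it applies to parallels of roughly this shape (paraphrasing
the md.texi example quoted earlier):

  (parallel [(set (reg FLAGS) (compare (op A B) (const_int 0)))
             (set (reg D) (op A B))])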

> Alas, it would be just as legitimate for combine to go the opposite way,
> substituting the flags set into another insn, and then tacking the other
> set onto the substituted-into insn.

Combine always generates the canonical form (for this, anyway; and it is
a missed optimisation bug if it makes something non-canonical anywhere).

Do you have a simple testcase?  Or a -fdump-rtl-combine-all dump.


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 17:17       ` Alexandre Oliva
@ 2020-09-03 19:01         ` Segher Boessenkool
  2020-09-03 19:46           ` Alexandre Oliva
From: Segher Boessenkool @ 2020-09-03 19:01 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Thu, Sep 03, 2020 at 02:17:04PM -0300, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Oh, I meant what was in your patch.  From a grep, all of mn10300, nds32,
> > arm, cris, pdp11, rs6000, i386, visium, aarch64 have default clobbers.
> > Hrm, what you said was the targets with inline asm flag output
> > constraints?
> 
> I found implicit cc clobbers on x86 only, and then I stumbled across the
> same notation used for asm cc-setting in arm/aarch64, but that was all,
> I didn't look any further (yet).

mn10300, cris, pdp11 (twice), i386, visium at least.

> I'll look at the other targets with
> asm_adjust hooks and see whether they need adjustments for "none", if we
> agree that's the way to go.

The idea is that none of them will need adjustment.  This hook runs
before the "none" code will run, and it can just clear all clobbers
then.

> > Before your change, flags could not be live through an asm.
> 
> While implementing this patchlet, I learned that standard asms (as
> opposed to GCC-extended ones) only undergo machine-dependent adjustments
> if their asm pattern string is non-empty.  So there is precedent.  I may
> even have a finger on that precedent, from very long ago.

Exceptions can look reasonable and easy, until you have to document it,
and then write down how it interacts with other stuff :-/

> > After your change, it can.  So it does change things...
> 
> No doubt it does change things, that's not the point.  The point is how
> could it break things.  If the compiler can see flags are set before the
> asm, and it can see the flags could be used after the asm, and the asm
> has no code whatsoever in it that could possibly clobber the flags...
> What could possibly go wrong?  (Famous last words? :-)

The user could want to use an empty asm to influence what code is
generated.  Yes, I know this isn't a very strong argument :-)

> >> > But how about a "none" clobber?
> >> 
> >> I like that!
> 
> > Okay, sold!  Now to convince everyone else ;-)
> 
> Here's a prototype implementation, x86 only.  (ARM and Thumb do not add
> any clobbers by default, so they're already good for "none".)  I haven't
> yet looked at other targets.  It's missing documentation and tests, too.

I think we can just do this in expand_asm_stmt?  It makes a first pass
over the clobbers as well, before actually expanding things, so we could
just set a flag and not call md_asm_adjust?  Maybe "nodefault" is a
better name than "none" btw (must be close to getting it done, if we are
picking colours for the bikeshed :-) )

Or, hrm, doing it exactly like that will disable all =@xxx as well.
That doesn't matter for x86, and probably nowhere in practice, but eww.

Kill all clobbers immediately after calling md_asm_adjust?
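
I.e., roughly (a sketch against expand_asm_stmt, assuming a saw_none
flag recorded while walking the clobber list; variable names as in that
function):

  after_md_seq = targetm.md_asm_adjust (output_rvec, input_rvec,
                                        constraints, clobber_rvec,
                                        clobbered_regs);
  if (saw_none)
    {
      clobber_rvec.truncate (0);            /* Drop all clobbers...  */
      CLEAR_HARD_REG_SET (clobbered_regs);  /* ...including hard regs.  */
    }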


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 18:32         ` Segher Boessenkool
@ 2020-09-03 19:31           ` Alexandre Oliva
  2020-09-04 14:51             ` Segher Boessenkool
From: Alexandre Oliva @ 2020-09-03 19:31 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
>> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> >> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> >> >> we might succeed, but only if we had a pattern
>> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> >> >> the second element of the parallel, because that's where combine adds it
>> >> >> to the new i3 pattern, after splitting it out of i2.
>> >> 
>> >> > That sounds like the backend pattern has it wrong then?  There is a
>> >> > canonical order for this?
>> >> 
>> >> Much as I can tell, there isn't, it's just an arbitrary choice of
>> >> backends, some do it one way or the other, and that causes combine to be
>> >> able to perform some combinations but not others.
>> 
>> > For instructions that inherently set a condition code register, the
>> > @code{compare} operator is always written as the first RTL expression of
>> > the @code{parallel} instruction pattern.
>> 
>> Interesting.  I'm pretty sure I read email recently that suggested it
>> was really up to the port, but I've caught up with GCC emails from years
>> ago, so that might have been it.  Or I just misremember.  Whatever.

> I think you remember right.  But combine depends on the documented
> order

Except when it doesn't ;-)

Under:

  /* If the actions of the earlier insns must be kept
     in addition to substituting them into the latest one,
     we must make a new PARALLEL for the latest insn
     to hold additional the SETs.  */

it turns the RMW add pattern into a PARALLEL and adds to it i0's
flags-setter, which had been split out of i2.  The PARALLEL
has the flag-setter as its second element, as you can see below,
and nothing (other than my patch) attempts to change that order.

> What does that RTL look like exactly?

Trying 5, 7 -> 15:
    5: r91:SI=zero_extend([`i'])
    7: {flags:CCC=cmp(r91:SI#0+0x1,r91:SI#0);r92:QI=r91:SI#0+0x1;}
      REG_DEAD r91:SI
   15: [`i']=r92:QI
      REG_DEAD r92:QI
Successfully matched this instruction:
(parallel [
        (set (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
            (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
                (const_int 1 [0x1])))
        (set (reg:CCC 17 flags)
            (compare:CCC (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
                    (const_int 1 [0x1]))
                (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])))
    ])

The reason it successfully matched in this combine dump is that this run
had the permuting recog-for-combine patch I posted earlier enabled.
Without that patch, it isn't recognized and ends up rejected.

> Do you have a simple testcase?  Or a -fdump-rtl-combine-all dump.

Not so simple, I'm afraid.  I mean, the testcase is simple, but
reproducing this behavior isn't.  The small testcase I'm using only gets
to that point if:

- we allow flags to pass through extended asms, as in the other patch

- I arrange for combine to give the above set of insns a second shot,
  after the removal of the intervening flag-store insn that prevents
  the combination on the first try

typedef unsigned char T;
T i;
int inc ()
{
  T c = __builtin_add_overflow (i, 1, &i);
  asm ("" : "+m" (i) : : "none");
  i += c;
}


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 19:01         ` Segher Boessenkool
@ 2020-09-03 19:46           ` Alexandre Oliva
  2020-09-04 15:02             ` Segher Boessenkool
  0 siblings, 1 reply; 14+ messages in thread
From: Alexandre Oliva @ 2020-09-03 19:46 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 02:17:04PM -0300, Alexandre Oliva wrote:
>> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> > Oh, I meant what was in your patch.  From a grep, all of mn10300, nds32,
>> > arm, cris, pdp11, rs6000, i386, visium, aarch64 have default clobbers.
>> > Hrm, what you said was the targets with inline asm flag output
>> > constraints?
>> 
>> I found implicit cc clobbers on x86 only, and then I stumbled across the
>> same notation used for asm cc-setting in arm/aarch64, but that was all,
>> I didn't look any further (yet).

> mn10300, cris, pdp11 (twice), i386, visium at least.

Thanks, yeah, I was hoping to get the general approach sorted out before
going through all ports.

> The idea is that none of them will need adjustment.  This hook runs
> before the "none" code will run, and it can just clear all clobbers
> then.

Uhh...  That's not how asm expansion works.  We process inputs, outputs,
clobbers and labels first, *then* we pass all the cooked ops to the
target hook, so that it gets a chance to, well, adjust stuff.

I don't think clobber "none" should disable target adjustments; IMHO, it
should be recognized by target adjustments, and cause them to not add
stuff they normally would.
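
Concretely, I have in mind something of this shape, with i386-like
defaults (asm_clobbers_none_p is a made-up predicate, and the "=@cc*"
output handling the real hook performs is elided):

static rtx_insn *
ix86_md_asm_adjust (vec<rtx> &outputs, vec<rtx> & /*inputs*/,
                    vec<const char *> &constraints,
                    vec<rtx> &clobbers, HARD_REG_SET &clobbered_regs)
{
  /* ... existing "=@cc*" output processing, unchanged ... */

  /* Only add the default flags clobbers when "none" was not given.  */
  if (!asm_clobbers_none_p (clobbers)) /* hypothetical predicate */
    {
      clobbers.safe_push (gen_rtx_REG (CCFPmode, FPSR_REG));
      clobbers.safe_push (gen_rtx_REG (CCmode, FLAGS_REG));
    }
  return NULL;
}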

> The user could want to use an empty asm to influence what code is
> generated.  Yes, I know this isn't a very strong argument :-)

Well, it is to me, it's exactly what I'm doing here, you know? :-)
But it's not the kind of "things going wrong" that I was looking for.

> just set a flag and not call md_asm_adjust?

I don't like that.  When we move some asm processing between
target-specific and generic code, in either direction, "none" suddenly
affects whether or not that processing gets done at all.

> Maybe "nodefault" is a
> better name than "none" btw

For that, I agree, but I don't think that's a good feature to have.  I'd
rather have "none" and keep the ability to introduce target-specific
adjustments à-la "=@cc*".  There's no good reason IMHO to have that hook
disabled by "none" or by "nodefault".

> (must be close to getting it done, if we are picking colours for the
> bikeshed :-) )

:-D

> Kill all clobbers immediately after calling md_asm_adjust?

Why?  We don't want to discard any clobbers mentioned by the user, and
we (well, I :-) even want to give the target hook a chance to
adjust/remove any of those clobbers.  That requires the target hook to
be aware of the presence of "none", which is fine with me.

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 19:31           ` Alexandre Oliva
@ 2020-09-04 14:51             ` Segher Boessenkool
  2020-09-07 22:47               ` Alexandre Oliva
  0 siblings, 1 reply; 14+ messages in thread
From: Segher Boessenkool @ 2020-09-04 14:51 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: gcc-patches

Hi!

On Thu, Sep 03, 2020 at 04:31:35PM -0300, Alexandre Oliva wrote:
> Except when it doesn't ;-)

Heh.  PRs welcome :-)

> Under:
> 
>   /* If the actions of the earlier insns must be kept
>      in addition to substituting them into the latest one,
>      we must make a new PARALLEL for the latest insn
>      to hold additional the SETs.  */
> 
> it turns the RMW add pattern into a PARALLEL and adds to it i0's
> flags-setter, which had been split out of i2.  The PARALLEL
> has the flag-setter as its second element, as you can see below,
> and nothing (other than my patch) attempts to change that order.

Please open a PR?  (But see below first.)

> > What does that RTL look like exactly?
> 
> Trying 5, 7 -> 15:
>     5: r91:SI=zero_extend([`i'])
>     7: {flags:CCC=cmp(r91:SI#0+0x1,r91:SI#0);r92:QI=r91:SI#0+0x1;}
>       REG_DEAD r91:SI
>    15: [`i']=r92:QI
>       REG_DEAD r92:QI
> Successfully matched this instruction:
> (parallel [
>         (set (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
>             (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
>                 (const_int 1 [0x1])))
>         (set (reg:CCC 17 flags)
>             (compare:CCC (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
>                     (const_int 1 [0x1]))
>                 (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])))
>     ])

This isn't covered by the documented canonical form, though (the compare
here is not a compare against 0).


Btw, we could perhaps try permuting arms in combine itself (not in
recog)?  In combine we know that there are at most 4 sets (and usually
at most 3), and we also often have a good idea what swaps are likely
to work and which not, so maybe trying just one extra thing will have
good results already.
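
Something of this rough shape, say (a sketch only: the wrapper name is
made up, the rest is meant to match what combine.c already has):

static int
recog_for_combine_maybe_swapped (rtx *pnewpat, rtx_insn *insn,
                                 rtx *pnotes)
{
  int code = recog_for_combine (pnewpat, insn, pnotes);
  if (code >= 0
      || GET_CODE (*pnewpat) != PARALLEL
      || XVECLEN (*pnewpat, 0) != 2)
    return code;

  /* One extra try: swap the two arms of the PARALLEL.  */
  rtx swapped = gen_rtx_PARALLEL (VOIDmode,
                                  gen_rtvec (2,
                                             XVECEXP (*pnewpat, 0, 1),
                                             XVECEXP (*pnewpat, 0, 0)));
  code = recog_for_combine (&swapped, insn, pnotes);
  if (code >= 0)
    *pnewpat = swapped;
  return code;
}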


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 19:46           ` Alexandre Oliva
@ 2020-09-04 15:02             ` Segher Boessenkool
  0 siblings, 0 replies; 14+ messages in thread
From: Segher Boessenkool @ 2020-09-04 15:02 UTC (permalink / raw)
  To: Alexandre Oliva
  Cc: gcc-patches, hubicka, ubizjak, richard.earnshaw,
	richard.sandiford, marcus.shawcroft, kyrylo.tkachov, nickc,
	ramana.radhakrishnan

On Thu, Sep 03, 2020 at 04:46:43PM -0300, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > The idea is that none of them will need adjustment.  This hook runs
> > before the "none" code will run, and it can just clear all clobbers
> > then.
> 
> Uhh...  That's not how asm expansion works.  We process inputs, outputs,
> clobbers and labels first, *then* we pass all the cooked ops to the
> target hook, so that it gets a chance to, well, adjust stuff.

And after that you do the "none" stuff, not before!

> I don't think clobber "none" should disable target adjustments; IMHO, it
> should be recognized by target adjustments, and cause them to not add
> stuff they normally would.

Yeah, big thinko on my side there.  But my point remains: if we can do
this in generic code (and it looks like we can), don't do it in all 50+
targets (or the ten-ish affected ones) :-)

> > The user could want to use an empty asm to influence what code is
> > generated.  Yes, I know this isn't a very strong argument :-)
> 
> Well, it is to me, it's exactly what I'm doing here, you know? :-)

"I hadn't noticed yet".  Hehe.

> But it's not the kind of "things going wrong" that I was looking for.
> 
> > just set a flag and not call md_asm_adjust?
> 
> I don't like that.  When we move some asm processing between
> target-specific and generic code, in either direction, "none" suddenly
> affects whether or not that processing gets done at all.

But we do not *have* "none" in trunk yet (and so, of course, not in any
released GCC version either, which is the important consideration).

Or do you mean something else that I don't see?

> > Maybe "nodefault" is a
> > better name than "none" btw
> 
> For that, I agree, but I don't think that's a good feature to have.  I'd
> rather have "none" and keep the ability to introduce target-specific
> adjustments à-la "=@cc*".  There's no good reason IMHO to have that hook
> disabled by "none" or by "nodefault".

"none" would disable all default *clobbers* only.  Not outputs.


Segher


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-04 14:51             ` Segher Boessenkool
@ 2020-09-07 22:47               ` Alexandre Oliva
  0 siblings, 0 replies; 14+ messages in thread
From: Alexandre Oliva @ 2020-09-07 22:47 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches

On Sep  4, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> Please open a PR?  (But see below first.)

I saw the bit below, then filed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96965

> Btw, we could perhaps try permuting arms in combine itself (not in
> recog)?  In combine we know that there are at most 4 sets (and usually
> at most 3), and we also often have a good idea what swaps are likely
> to work and which not, so maybe trying just one extra thing will have
> good results already.

That makes sense.  Perhaps for starters we could just recall that the
original i2 was (parallel [i1 i2]), i.e., with i1 appearing first, so
that added_sets would try to insert i1 first rather than last.  That
would catch this case, and likely any case in which i1 is split out of
i2 and i2 rather than i1 feeds i3.  When it is i1 that feeds i3, we'll
have added_sets_2 and add i2 last in the i3 parallel, which happens to
already be the most likely order for recog.  In both cases, the
likelihood follows from the fact that i1 and i2 were split, in this
order, out of the original i2's parallel.
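
In try_combine terms, I mean something of this rough shape (pseudo-code:
i1_was_first_in_i2 is a made-up flag recording the original order within
i2's parallel, and i1pat stands for the set split out of i2):

/* When building the new PARALLEL for i3, honor the original order.  */
rtvec v;
if (added_sets_1 && i1_was_first_in_i2)
  /* i1 came first in the original (parallel [i1 i2]); keeping it
     first matches the order the backend pattern most likely uses.  */
  v = gen_rtvec (2, copy_rtx (i1pat), newpat);
else
  v = gen_rtvec (2, newpat, copy_rtx (i1pat));
newpat = gen_rtx_PARALLEL (VOIDmode, v);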

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


* Re: [RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
  2020-09-03 17:07       ` Alexandre Oliva
  2020-09-03 18:32         ` Segher Boessenkool
@ 2020-09-08 14:20         ` Hans-Peter Nilsson
  1 sibling, 0 replies; 14+ messages in thread
From: Hans-Peter Nilsson @ 2020-09-08 14:20 UTC (permalink / raw)
  To: Alexandre Oliva; +Cc: Segher Boessenkool, gcc-patches

On Thu, 3 Sep 2020, Alexandre Oliva wrote:
> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > For instructions that inherently set a condition code register, the
> > @code{compare} operator is always written as the first RTL expression of
> > the @code{parallel} instruction pattern.
>
> Interesting.  I'm pretty sure I read an email recently that suggested
> it was really up to the port, but I've been catching up on GCC emails
> from years ago, so that might have been it.  Or I just misremember.
> Whatever.

If you remember far enough back, you were right. :)

As I recall it, at one time it was up to the port.  Then, some
time after the x86 port was decc0rated, and cc0 judged evil (but
before the infrastructure changes to seriously support
decc0ration), it became important, and there was a discussion on
canonicalizing the order.  I remember arguing for the
flags-setting to be last in the parallel, consistent with the
clobber canonically being last and the "important" part of the
insn coming first (possibly also having observed combine
ordering as you did), but that's not the way it turned out.
I have a faint memory of the order in the x86 patterns even
being used as an argument!

> Since there is a canonical order, maybe combine should attempt to follow
> that order.

> Anyway...  Does this still seem worth pursuing?

Are you referring to your non-flags-clobbering asms, or to making
combine order flags-side-effect parallels canonically?

I don't have an opinion on the former, but IMHO, yes, getting
the combine order right seems worthwhile.  (No, I have no
targets that'd benefit from this.)

brgds, H-P
