[PATCH 0/3] testsuite: Exclude vector tests for unsupported targets

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets
@ 2023-07-06 21:35 Maciej W. Rozycki
  2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-06 21:35 UTC (permalink / raw)
  To: Rainer Orth, Mike Stump; +Cc: gcc-patches

Hi,

 In the course of verifying an out-of-tree RISC-V target that has a vendor
extension providing hardware support for vector operations on pairs of 
single floating-point values (similar to MIPS paired-single or Power SPE 
vector types) I have come across a couple of tests that fail just because 
they expect GCC to produce code this particular hardware does not support.  
Therefore I have created this small patch series, which marks the features 
required for the test cases to be relevant, which makes them unsupported 
for the hardware concerned.  For further details see individual change 
descriptions.

 This patch series has been verified with an `x86_64-linux-gnu' native 
configuration.  I could verify it with MIPS paired-single hw sometime, but 
I'm not currently set up for it and I think the changes are obvious enough 
regardless.

 OK to apply?  As testsuite fixes I think the changes also qualify for 
backporting to active release branches.

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported
  2023-07-06 21:35 [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets Maciej W. Rozycki
@ 2023-07-06 21:35 ` Maciej W. Rozycki
  2023-07-07  6:24   ` Richard Biener
  2023-07-06 21:36 ` [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c Maciej W. Rozycki
  2023-07-06 21:36 ` [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c Maciej W. Rozycki
  2 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-06 21:35 UTC (permalink / raw)
  To: Rainer Orth, Mike Stump; +Cc: gcc-patches

Similarly to checks for vectors of 32 bits and 64 bits being supported 
add one for vectors of 128 bits.

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect128): New 
	procedure.
---
 gcc/testsuite/lib/target-supports.exp |    6 ++++++
 1 file changed, 6 insertions(+)

gcc-test-effective-target-vect128.diff
Index: gcc/gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc.orig/gcc/testsuite/lib/target-supports.exp
+++ gcc/gcc/testsuite/lib/target-supports.exp
@@ -8599,6 +8599,12 @@ proc check_effective_target_vect_variabl
     return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 128 bits.
+
+proc check_effective_target_vect128 { } {
+    return [expr { [lsearch -exact [available_vector_sizes] 128] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 64 bits.
 
 proc check_effective_target_vect64 { } {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-06 21:35 [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets Maciej W. Rozycki
  2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
@ 2023-07-06 21:36 ` Maciej W. Rozycki
  2023-07-07  6:33   ` Richard Biener
  2023-07-06 21:36 ` [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c Maciej W. Rozycki
  2 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-06 21:36 UTC (permalink / raw)
  To: Rainer Orth, Mike Stump; +Cc: gcc-patches

The bb-slp-pr95839.c test assumes quad-single float vector support, but 
some targets only support pairs of floats, causing this test to fail 
with such targets.  Limit this test to targets that support at least 
128-bit vectors then, and add a complementing test that can be run with 
targets that have support for 64-bit vectors only.  There is no need to 
adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest 
vector of doubles, so support is implied by the presence of vectors of 
doubles.

	gcc/testsuite/
	* gcc.dg/vect/bb-slp-pr95839.c: Limit to `vect128' targets.
	* gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   14 ++++++++++++++
 gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c    |    1 +
 2 files changed, 15 insertions(+)

gcc-test-bb-slp-pr95839-vect128.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect64 } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef float __attribute__((vector_size(8))) v2f32;
+
+v2f32 f(v2f32 a, v2f32 b)
+{
+  /* Check that we vectorize this CTOR without any loads.  */
+  return (v2f32){a[0] + b[0], a[1] + b[1]};
+}
+
+/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
+++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target vect128 } */
 /* { dg-additional-options "-w -Wno-psabi" } */
 
 typedef float __attribute__((vector_size(16))) v4f32;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c
  2023-07-06 21:35 [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets Maciej W. Rozycki
  2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
  2023-07-06 21:36 ` [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c Maciej W. Rozycki
@ 2023-07-06 21:36 ` Maciej W. Rozycki
  2023-07-07  6:33   ` Richard Biener
  2 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-06 21:36 UTC (permalink / raw)
  To: Rainer Orth, Mike Stump; +Cc: gcc-patches

The pr97428.c test assumes support for vectors of doubles, but some 
targets only support vectors of floats, causing this test to fail with 
such targets.  Limit this test to targets that support vectors of 
doubles then.

	gcc/testsuite/
	* gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
---
 gcc/testsuite/gcc.dg/vect/pr97428.c |    1 +
 1 file changed, 1 insertion(+)

gcc-test-pr97428-vect-double.diff
Index: gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
===================================================================
--- gcc.orig/gcc/testsuite/gcc.dg/vect/pr97428.c
+++ gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
 
 typedef struct { double re, im; } dcmlx_t;
 typedef struct { double re[4], im[4]; } dcmlx4_t;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported
  2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
@ 2023-07-07  6:24   ` Richard Biener
  2023-07-11 15:00     ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-07  6:24 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Thu, Jul 6, 2023 at 11:36 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> Similarly to checks for vectors of 32 bits and 64 bits being supported
> add one for vectors of 128 bits.

OK

>         gcc/testsuite/
>         * lib/target-supports.exp (check_effective_target_vect128): New
>         procedure.
> ---
>  gcc/testsuite/lib/target-supports.exp |    6 ++++++
>  1 file changed, 6 insertions(+)
>
> gcc-test-effective-target-vect128.diff
> Index: gcc/gcc/testsuite/lib/target-supports.exp
> ===================================================================
> --- gcc.orig/gcc/testsuite/lib/target-supports.exp
> +++ gcc/gcc/testsuite/lib/target-supports.exp
> @@ -8599,6 +8599,12 @@ proc check_effective_target_vect_variabl
>      return [expr { [lindex [available_vector_sizes] 0] == 0 }]
>  }
>
> +# Return 1 if the target supports vectors of 128 bits.
> +
> +proc check_effective_target_vect128 { } {
> +    return [expr { [lsearch -exact [available_vector_sizes] 128] >= 0 }]
> +}
> +
>  # Return 1 if the target supports vectors of 64 bits.
>
>  proc check_effective_target_vect64 { } {

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-06 21:36 ` [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c Maciej W. Rozycki
@ 2023-07-07  6:33   ` Richard Biener
  2023-07-11 15:01     ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-07  6:33 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Thu, Jul 6, 2023 at 11:37 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> The bb-slp-pr95839.c test assumes quad-single float vector support, but
> some targets only support pairs of floats, causing this test to fail
> with such targets.  Limit this test to targets that support at least
> 128-bit vectors then, and add a complementing test that can be run with
> targets that have support for 64-bit vectors only.  There is no need to
> adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> vector of doubles, so support is implied by the presence of vectors of
> doubles.

I wonder why you see the testcase FAIL, on x86-64 when doing

typedef float __attribute__((vector_size(32))) v4f32;

v4f32 f(v4f32 a, v4f32 b)
{
  /* Check that we vectorize this CTOR without any loads.  */
  return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
  a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

I see we vectorize the add and the "store".  We fail to perform
extraction from the incoming vectors (unless you enable AVX),
that's a missed optimization.

So with paired floats I would expect sth similar?  Maybe
x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.

That said, we should handle this better so can you file an
enhancement bugreport for this?

Thanks,
Richard.

>         gcc/testsuite/
>         * gcc.dg/vect/bb-slp-pr95839.c: Limit to `vect128' targets.
>         * gcc.dg/vect/bb-slp-pr95839-v8.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c |   14 ++++++++++++++
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c    |    1 +
>  2 files changed, 15 insertions(+)
>
> gcc-test-bb-slp-pr95839-vect128.diff
> Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
> ===================================================================
> --- /dev/null
> +++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839-v8.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect64 } */
> +/* { dg-additional-options "-w -Wno-psabi" } */
> +
> +typedef float __attribute__((vector_size(8))) v2f32;
> +
> +v2f32 f(v2f32 a, v2f32 b)
> +{
> +  /* Check that we vectorize this CTOR without any loads.  */
> +  return (v2f32){a[0] + b[0], a[1] + b[1]};
> +}
> +
> +/* { dg-final { scan-tree-dump "optimized: basic block" "slp2" } } */
> Index: gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> ===================================================================
> --- gcc.orig/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> +++ gcc/gcc/testsuite/gcc.dg/vect/bb-slp-pr95839.c
> @@ -1,5 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target vect128 } */
>  /* { dg-additional-options "-w -Wno-psabi" } */
>
>  typedef float __attribute__((vector_size(16))) v4f32;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c
  2023-07-06 21:36 ` [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c Maciej W. Rozycki
@ 2023-07-07  6:33   ` Richard Biener
  2023-07-11 15:01     ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-07  6:33 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Thu, Jul 6, 2023 at 11:37 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> The pr97428.c test assumes support for vectors of doubles, but some
> targets only support vectors of floats, causing this test to fail with
> such targets.  Limit this test to targets that support vectors of
> doubles then.

OK.

>         gcc/testsuite/
>         * gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
> ---
>  gcc/testsuite/gcc.dg/vect/pr97428.c |    1 +
>  1 file changed, 1 insertion(+)
>
> gcc-test-pr97428-vect-double.diff
> Index: gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
> ===================================================================
> --- gcc.orig/gcc/testsuite/gcc.dg/vect/pr97428.c
> +++ gcc/gcc/testsuite/gcc.dg/vect/pr97428.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
>
>  typedef struct { double re, im; } dcmlx_t;
>  typedef struct { double re[4], im[4]; } dcmlx4_t;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported
  2023-07-07  6:24   ` Richard Biener
@ 2023-07-11 15:00     ` Maciej W. Rozycki
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-11 15:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Fri, 7 Jul 2023, Richard Biener wrote:

> > Similarly to checks for vectors of 32 bits and 64 bits being supported
> > add one for vectors of 128 bits.
> 
> OK

 Thanks for the review, however this is only needed for 2/3 at this point, 
so I'll only push it if 2/3 gets a go-ahead (and still needs this change).

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-07  6:33   ` Richard Biener
@ 2023-07-11 15:01     ` Maciej W. Rozycki
  2023-07-12  7:15       ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-11 15:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Fri, 7 Jul 2023, Richard Biener wrote:

> > The bb-slp-pr95839.c test assumes quad-single float vector support, but
> > some targets only support pairs of floats, causing this test to fail
> > with such targets.  Limit this test to targets that support at least
> > 128-bit vectors then, and add a complementing test that can be run with
> > targets that have support for 64-bit vectors only.  There is no need to
> > adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> > vector of doubles, so support is implied by the presence of vectors of
> > doubles.
> 
> I wonder why you see the testcase FAIL, on x86-64 when doing
> 
> typedef float __attribute__((vector_size(32))) v4f32;
> 
> v4f32 f(v4f32 a, v4f32 b)
> {
>   /* Check that we vectorize this CTOR without any loads.  */
>   return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
>   a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
> }
> 
> I see we vectorize the add and the "store".  We fail to perform
> extraction from the incoming vectors (unless you enable AVX),
> that's a missed optimization.
> 
> So with paired floats I would expect sth similar?  Maybe
> x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.

 I am not familiar enough with this stuff to answer your question.

 As we pass and return V2SF data in FP registers just as with complex 
float data with this hardware the function from my bb-slp-pr95839-v8.c 
expands to a single vector FP add instruction, followed by a function 
return.

 Conversely, the original function from bb-slp-pr95839.c expands to a 
sequence of 22 instructions to extract incoming vector FP data from 4 
64-bit GPRs into 8 FPRs, add the vectors piecemeal with 4 scalar FP add 
instructions, and then insert outgoing vector FP data from 4 FPRs back to 
2 64-bit GPRs.  As an experiment I have modified the backend minimally so 
as to pass and return V4SF data in FP registers as well, but that didn't 
make the vectoriser trigger.

> That said, we should handle this better so can you file an
> enhancement bugreport for this?

 Filed as PR -optimization/110630.  I can't publish RISC-V information 
related to the hardware affected, but as a quick check I ran the MIPS 
compiler:

$ mips-linux-gnu-gcc -march=mips64 -mabi=64 -mpaired-single -O2 -S bb-slp-pr95839*.c

and got this code for bb-slp-pr95839-v8.c (mind the branch delay slot):

	jr	$31
	add.ps	$f0,$f12,$f13

vs code for bb-slp-pr95839.c:

	daddiu	$sp,$sp,-64
	sd	$5,24($sp)
	sd	$7,40($sp)
	lwc1	$f0,24($sp)
	lwc1	$f1,40($sp)
	sd	$4,16($sp)
	sd	$6,32($sp)
	add.s	$f3,$f0,$f1
	lwc1	$f0,28($sp)
	lwc1	$f1,44($sp)
	lwc1	$f4,36($sp)
	swc1	$f3,56($sp)
	add.s	$f2,$f0,$f1
	lwc1	$f0,16($sp)
	lwc1	$f1,32($sp)
	swc1	$f2,60($sp)
	add.s	$f1,$f0,$f1
	lwc1	$f0,20($sp)
	ld	$3,56($sp)
	add.s	$f0,$f0,$f4
	swc1	$f1,48($sp)
	swc1	$f0,52($sp)
	ld	$2,48($sp)
	jr	$31
	daddiu	$sp,$sp,64

so this is essentially the same scenario (up to the machine instruction 
count), and therefore it seems backend-agnostic.  I can imagine the latter 
case could expand to something like (instruction reordering surely needed 
for performance omitted for clarity):

	dmtc1	$4,$f0
	dmtc1	$5,$f1
	dmtc1	$6,$f2
	dmtc1	$7,$f3
	add.ps	$f0,$f0,$f1
	add.ps	$f2,$f2,$f3
	dmfc1	$2,$f0
	jr	$31
	dmfc1	$3,$f2

saving a lot of cycles, and removing the need for spilling temporaries to 
the stack and for frame creation in the first place.

 Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the 
testsuite?

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c
  2023-07-07  6:33   ` Richard Biener
@ 2023-07-11 15:01     ` Maciej W. Rozycki
  2023-07-12  7:15       ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-11 15:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Fri, 7 Jul 2023, Richard Biener wrote:

> > The pr97428.c test assumes support for vectors of doubles, but some
> > targets only support vectors of floats, causing this test to fail with
> > such targets.  Limit this test to targets that support vectors of
> > doubles then.
> 
> OK.

 Applied, thanks.  OK to backport to the active branches?

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-11 15:01     ` Maciej W. Rozycki
@ 2023-07-12  7:15       ` Richard Biener
  2023-07-19 14:34         ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-12  7:15 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Tue, Jul 11, 2023 at 5:01 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Fri, 7 Jul 2023, Richard Biener wrote:
>
> > > The bb-slp-pr95839.c test assumes quad-single float vector support, but
> > > some targets only support pairs of floats, causing this test to fail
> > > with such targets.  Limit this test to targets that support at least
> > > 128-bit vectors then, and add a complementing test that can be run with
> > > targets that have support for 64-bit vectors only.  There is no need to
> > > adjust bb-slp-pr95839-2.c as 128 bits are needed even for the smallest
> > > vector of doubles, so support is implied by the presence of vectors of
> > > doubles.
> >
> > I wonder why you see the testcase FAIL, on x86-64 when doing
> >
> > typedef float __attribute__((vector_size(32))) v4f32;
> >
> > v4f32 f(v4f32 a, v4f32 b)
> > {
> >   /* Check that we vectorize this CTOR without any loads.  */
> >   return (v4f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
> >   a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
> > }
> >
> > I see we vectorize the add and the "store".  We fail to perform
> > extraction from the incoming vectors (unless you enable AVX),
> > that's a missed optimization.
> >
> > So with paired floats I would expect sth similar?  Maybe
> > x86 is saved by kind-of-presence (but disabled) of V8SFmode vectors.
>
>  I am not familiar enough with this stuff to answer your question.
>
>  As we pass and return V2SF data in FP registers just as with complex
> float data with this hardware the function from my bb-slp-pr95839-v8.c
> expands to a single vector FP add instruction, followed by a function
> return.
>
>  Conversely, the original function from bb-slp-pr95839.c expands to a
> sequence of 22 instructions to extract incoming vector FP data from 4
> 64-bit GPRs into 8 FPRs, add the vectors piecemeal with 4 scalar FP add
> instructions, and then insert outgoing vector FP data from 4 FPRs back to
> 2 64-bit GPRs.  As an experiment I have modified the backend minimally so
> as to pass and return V4SF data in FP registers as well, but that didn't
> make the vectoriser trigger.
>
> > That said, we should handle this better so can you file an
> > enhancement bugreport for this?
>
>  Filed as PR -optimization/110630.

Thanks!

>  I can't publish RISC-V information
> related to the hardware affected, but as a quick check I ran the MIPS
> compiler:
>
> $ mips-linux-gnu-gcc -march=mips64 -mabi=64 -mpaired-single -O2 -S bb-slp-pr95839*.c
>
> and got this code for bb-slp-pr95839-v8.c (mind the branch delay slot):
>
>         jr      $31
>         add.ps  $f0,$f12,$f13
>
> vs code for bb-slp-pr95839.c:
>
>         daddiu  $sp,$sp,-64
>         sd      $5,24($sp)
>         sd      $7,40($sp)
>         lwc1    $f0,24($sp)
>         lwc1    $f1,40($sp)
>         sd      $4,16($sp)
>         sd      $6,32($sp)
>         add.s   $f3,$f0,$f1
>         lwc1    $f0,28($sp)
>         lwc1    $f1,44($sp)
>         lwc1    $f4,36($sp)
>         swc1    $f3,56($sp)
>         add.s   $f2,$f0,$f1
>         lwc1    $f0,16($sp)
>         lwc1    $f1,32($sp)
>         swc1    $f2,60($sp)
>         add.s   $f1,$f0,$f1
>         lwc1    $f0,20($sp)
>         ld      $3,56($sp)
>         add.s   $f0,$f0,$f4
>         swc1    $f1,48($sp)
>         swc1    $f0,52($sp)
>         ld      $2,48($sp)
>         jr      $31
>         daddiu  $sp,$sp,64
>
> so this is essentially the same scenario (up to the machine instruction
> count), and therefore it seems backend-agnostic.  I can imagine the latter
> case could expand to something like (instruction reordering surely needed
> for performance omitted for clarity):
>
>         dmtc1   $4,$f0
>         dmtc1   $5,$f1
>         dmtc1   $6,$f2
>         dmtc1   $7,$f3
>         add.ps  $f0,$f0,$f1
>         add.ps  $f2,$f2,$f3
>         dmfc1   $2,$f0
>         jr      $31
>         dmfc1   $3,$f2
>
> saving a lot of cycles, and removing the need for spilling temporaries to
> the stack and for frame creation in the first place.
>
>  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> testsuite?

Sure, more coverage is always  nice.

Richard.

>   Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c
  2023-07-11 15:01     ` Maciej W. Rozycki
@ 2023-07-12  7:15       ` Richard Biener
  2023-07-19 10:29         ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-12  7:15 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Tue, Jul 11, 2023 at 5:01 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Fri, 7 Jul 2023, Richard Biener wrote:
>
> > > The pr97428.c test assumes support for vectors of doubles, but some
> > > targets only support vectors of floats, causing this test to fail with
> > > such targets.  Limit this test to targets that support vectors of
> > > doubles then.
> >
> > OK.
>
>  Applied, thanks.  OK to backport to the active branches?

Yes.

>   Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c
  2023-07-12  7:15       ` Richard Biener
@ 2023-07-19 10:29         ` Maciej W. Rozycki
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-19 10:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: Rainer Orth, Mike Stump, gcc-patches

On Wed, 12 Jul 2023, Richard Biener wrote:

> >  Applied, thanks.  OK to backport to the active branches?
> 
> Yes.

 Now backported, thanks.

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-12  7:15       ` Richard Biener
@ 2023-07-19 14:34         ` Maciej W. Rozycki
  2023-07-20  7:15           ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-19 14:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: YunQiang Su, Rainer Orth, Mike Stump, gcc-patches

On Wed, 12 Jul 2023, Richard Biener wrote:

> > > That said, we should handle this better so can you file an
> > > enhancement bugreport for this?
> >
> >  Filed as PR -optimization/110630.
> 
> Thanks!

 Thanks for making this improvement.  I've checked MIPS results and code 
produced now is as follows:

	daddiu	$sp,$sp,-64
	sd	$5,24($sp)
	sd	$7,40($sp)
	ldc1	$f0,24($sp)
	ldc1	$f1,40($sp)
	sd	$4,16($sp)
	sd	$6,32($sp)
	ldc1	$f2,32($sp)
	add.ps	$f1,$f0,$f1
	ldc1	$f0,16($sp)
	add.ps	$f0,$f0,$f2
	sdc1	$f1,56($sp)
	ld	$3,56($sp)
	sdc1	$f0,48($sp)
	ld	$2,48($sp)
	jr	$31
	daddiu	$sp,$sp,64

which does do vector stuff now, although it's still considerably worse 
than my handwritten example:

> >         dmtc1   $4,$f0
> >         dmtc1   $5,$f1
> >         dmtc1   $6,$f2
> >         dmtc1   $7,$f3
> >         add.ps  $f0,$f0,$f1
> >         add.ps  $f2,$f2,$f3
> >         dmfc1   $2,$f0
> >         jr      $31
> >         dmfc1   $3,$f2

Or I'd say it's pretty terrible, but given the current situation with the 
MIPS backend I'm going to leave it to the new maintainer to sort out.

> >  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> > testsuite?
> 
> Sure, more coverage is always  nice.

 Thanks, committed (with the `vect64' requirement removed, as we can take 
it for granted with `vect_float').

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-19 14:34         ` Maciej W. Rozycki
@ 2023-07-20  7:15           ` Richard Biener
  2023-07-20  9:16             ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-20  7:15 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: YunQiang Su, Rainer Orth, Mike Stump, gcc-patches

On Wed, Jul 19, 2023 at 4:34 PM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Wed, 12 Jul 2023, Richard Biener wrote:
>
> > > > That said, we should handle this better so can you file an
> > > > enhancement bugreport for this?
> > >
> > >  Filed as PR -optimization/110630.
> >
> > Thanks!
>
>  Thanks for making this improvement.  I've checked MIPS results and code
> produced now is as follows:
>
>         daddiu  $sp,$sp,-64
>         sd      $5,24($sp)
>         sd      $7,40($sp)
>         ldc1    $f0,24($sp)
>         ldc1    $f1,40($sp)
>         sd      $4,16($sp)
>         sd      $6,32($sp)
>         ldc1    $f2,32($sp)
>         add.ps  $f1,$f0,$f1
>         ldc1    $f0,16($sp)
>         add.ps  $f0,$f0,$f2
>         sdc1    $f1,56($sp)
>         ld      $3,56($sp)
>         sdc1    $f0,48($sp)
>         ld      $2,48($sp)
>         jr      $31
>         daddiu  $sp,$sp,64
>
> which does do vector stuff now, although it's still considerably worse
> than my handwritten example:
>
> > >         dmtc1   $4,$f0
> > >         dmtc1   $5,$f1
> > >         dmtc1   $6,$f2
> > >         dmtc1   $7,$f3
> > >         add.ps  $f0,$f0,$f1
> > >         add.ps  $f2,$f2,$f3
> > >         dmfc1   $2,$f0
> > >         jr      $31
> > >         dmfc1   $3,$f2
>
> Or I'd say it's pretty terrible, but given the current situation with the
> MIPS backend I'm going to leave it to the new maintainer to sort out.

Yeah, I also wondered what is wrong ... I suspect it's the usual issue
of parameter passing causing spilling ...

> > >  Do you agree it still makes sense to include bb-slp-pr95839-v8.c with the
> > > testsuite?
> >
> > Sure, more coverage is always  nice.
>
>  Thanks, committed (with the `vect64' requirement removed, as we can take
> it for granted with `vect_float').
>
>   Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-20  7:15           ` Richard Biener
@ 2023-07-20  9:16             ` Maciej W. Rozycki
  2023-07-20 12:09               ` Richard Biener
  0 siblings, 1 reply; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-20  9:16 UTC (permalink / raw)
  To: Richard Biener; +Cc: YunQiang Su, Rainer Orth, Mike Stump, gcc-patches

On Thu, 20 Jul 2023, Richard Biener wrote:

> >  Thanks for making this improvement.  I've checked MIPS results and code
> > produced now is as follows:
> >
> >         daddiu  $sp,$sp,-64
> >         sd      $5,24($sp)
> >         sd      $7,40($sp)
> >         ldc1    $f0,24($sp)
> >         ldc1    $f1,40($sp)
> >         sd      $4,16($sp)
> >         sd      $6,32($sp)
> >         ldc1    $f2,32($sp)
> >         add.ps  $f1,$f0,$f1
> >         ldc1    $f0,16($sp)
> >         add.ps  $f0,$f0,$f2
> >         sdc1    $f1,56($sp)
> >         ld      $3,56($sp)
> >         sdc1    $f0,48($sp)
> >         ld      $2,48($sp)
> >         jr      $31
> >         daddiu  $sp,$sp,64
> >
> > which does do vector stuff now, although it's still considerably worse
> > than my handwritten example:
> >
> > > >         dmtc1   $4,$f0
> > > >         dmtc1   $5,$f1
> > > >         dmtc1   $6,$f2
> > > >         dmtc1   $7,$f3
> > > >         add.ps  $f0,$f0,$f1
> > > >         add.ps  $f2,$f2,$f3
> > > >         dmfc1   $2,$f0
> > > >         jr      $31
> > > >         dmfc1   $3,$f2
> >
> > Or I'd say it's pretty terrible, but given the current situation with the
> > MIPS backend I'm going to leave it to the new maintainer to sort out.
> 
> Yeah, I also wondered what is wrong ... I suspect it's the usual issue
> of parameter passing causing spilling ...

 There's no such requirement in the psABI and I fail to see a plausible 
justification.  And direct GPR<->FPR move patterns are available in the 
backend for the V2SF mode.  Also there's no delay slot requirement even 
for these move instructions for MIPS64r1+ ISA levels, which have this 
paired-single FP format defined.  It seems to me a plain bug (or missed 
optimisation if you prefer).

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-20  9:16             ` Maciej W. Rozycki
@ 2023-07-20 12:09               ` Richard Biener
  2023-07-20 13:02                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Biener @ 2023-07-20 12:09 UTC (permalink / raw)
  To: Maciej W. Rozycki, Jiufu Guo
  Cc: YunQiang Su, Rainer Orth, Mike Stump, gcc-patches

On Thu, Jul 20, 2023 at 11:16 AM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> On Thu, 20 Jul 2023, Richard Biener wrote:
>
> > >  Thanks for making this improvement.  I've checked MIPS results and code
> > > produced now is as follows:
> > >
> > >         daddiu  $sp,$sp,-64
> > >         sd      $5,24($sp)
> > >         sd      $7,40($sp)
> > >         ldc1    $f0,24($sp)
> > >         ldc1    $f1,40($sp)
> > >         sd      $4,16($sp)
> > >         sd      $6,32($sp)
> > >         ldc1    $f2,32($sp)
> > >         add.ps  $f1,$f0,$f1
> > >         ldc1    $f0,16($sp)
> > >         add.ps  $f0,$f0,$f2
> > >         sdc1    $f1,56($sp)
> > >         ld      $3,56($sp)
> > >         sdc1    $f0,48($sp)
> > >         ld      $2,48($sp)
> > >         jr      $31
> > >         daddiu  $sp,$sp,64
> > >
> > > which does do vector stuff now, although it's still considerably worse
> > > than my handwritten example:
> > >
> > > > >         dmtc1   $4,$f0
> > > > >         dmtc1   $5,$f1
> > > > >         dmtc1   $6,$f2
> > > > >         dmtc1   $7,$f3
> > > > >         add.ps  $f0,$f0,$f1
> > > > >         add.ps  $f2,$f2,$f3
> > > > >         dmfc1   $2,$f0
> > > > >         jr      $31
> > > > >         dmfc1   $3,$f2
> > >
> > > Or I'd say it's pretty terrible, but given the current situation with the
> > > MIPS backend I'm going to leave it to the new maintainer to sort out.
> >
> > Yeah, I also wondered what is wrong ... I suspect it's the usual issue
> > of parameter passing causing spilling ...
>
>  There's no such requirement in the psABI and I fail to see a plausible
> justification.  And direct GPR<->FPR move patterns are available in the
> backend for the V2SF mode.  Also there's no delay slot requirement even
> for these move instructions for MIPS64r1+ ISA levels, which have this
> paired-single FP format defined.  It seems to me a plain bug (or missed
> optimisation if you prefer).

Definitely.  OTOH parameter/return passing for V4SFmode while
appearantly being done in registers the backend(?) assigns BLKmode
to the V4SFmode arguments so they get immediately spilled in the
code moving the incoming hardregisters to pseudos (or stack as in
this case).  It comes down to the issue that Jiufu Guo is eventually
addressing with adding SRA-style heuristics to the code chosing
the layout of that storage.  Interestingly for the return value we get
TImode.

Note we don't seem to be able to optimize

(insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
     (expr_list:REG_DEAD (reg:DI 5 $5)
        (nil)))
...
(insn 40 7 41 2 (set (reg:V2SF 205 [ a+8 ])
        (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
{*movv2sf}
     (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (nil)))

for some reason.  Maybe we are afraid of the hardreg use in the store,
maybe it is because the store is in the prologue (before
NOTE_INSN_FUNCTION_BEG).  Also postreload isn't able to fix this:

(insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
     (nil))
...
(insn 40 7 41 2 (set (reg:V2SF 32 $f0 [orig:205 a+8 ] [205])
        (mem/c:V2SF (plus:DI (reg/f:DI 29 $sp)
                (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
{*movv2sf}
     (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
                (const_int 24 [0x18])) [1 a+8 S8 A64])
        (nil)))

so something is amiss in the backend as well if you say there should be
direct moves available.

Richard.

>
>   Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c
  2023-07-20 12:09               ` Richard Biener
@ 2023-07-20 13:02                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej W. Rozycki @ 2023-07-20 13:02 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jiufu Guo, YunQiang Su, Rainer Orth, Mike Stump, gcc-patches

On Thu, 20 Jul 2023, Richard Biener wrote:

> >  There's no such requirement in the psABI and I fail to see a plausible
> > justification.  And direct GPR<->FPR move patterns are available in the
> > backend for the V2SF mode.  Also there's no delay slot requirement even
> > for these move instructions for MIPS64r1+ ISA levels, which have this
> > paired-single FP format defined.  It seems to me a plain bug (or missed
> > optimisation if you prefer).
> 
> Definitely.  OTOH parameter/return passing for V4SFmode while
> appearantly being done in registers the backend(?) assigns BLKmode
> to the V4SFmode arguments so they get immediately spilled in the

 MIPS NewABI targets use registers to return data of small aggregate types
(effectively of up to the TImode size), so this seems reasonable to me.  
FP scalars and aggregates made of up to two fields are returned in FPRs 
and any other data is returned in GPRs:

"* Function results are returned in $2 (and $3 if needed), or $f0 (and $f2 
   if needed), as appropriate for the type.  Composite results (struct, 
   union, or array) are returned in $2/$f0 and $3/$f2 according to the 
   following rules:

"  - A struct with only one or two floating point fields is returned in 
     $f0 (and $f2 if necessary).  This is a generalization of the Fortran 
     COMPLEX case.

"  - Any other struct or union results of at most 128 bits are returned in 
     $2 (first 64 bits) and $3 (remainder, if necessary)."

Given that V4SFmode data has more than two FP fields (it's effectively an 
array of four) it is correctly returned in GPRs (even though the advantage 
of this arrangement is questionable, but the NewABI predates the invention 
of the paired-single FP format by a few years, which was only introduced 
with the MIPS V ISA, and actually implemented with the MIPS64r1 ISA even 
later).  A similar NewABI rule works here for the arguments.

 I suspect the relevant part of the backend handles it correctly for other 
modes and was missed in the update for V4SFmode, which was a change on its 
own.  The only sufficiently old version of GCC I have ready to use is 
4.1.2 and it produces the same code, so at least it does not seem to be a 
regression.

> code moving the incoming hardregisters to pseudos (or stack as in
> this case).  It comes down to the issue that Jiufu Guo is eventually
> addressing with adding SRA-style heuristics to the code chosing
> the layout of that storage.  Interestingly for the return value we get
> TImode.

 That may come from the use of the GPRs I suppose.

> Note we don't seem to be able to optimize
> 
> (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 78 $frame)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])
>         (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
>      (expr_list:REG_DEAD (reg:DI 5 $5)
>         (nil)))
> ...
> (insn 40 7 41 2 (set (reg:V2SF 205 [ a+8 ])
>         (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
> {*movv2sf}
>      (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])
>         (nil)))
> 
> for some reason.  Maybe we are afraid of the hardreg use in the store,

 I believe the reason is the relevant constraints use the `*' modifier so 
as not to spill FP values to GPRs or vice versa (ISTR a discussion as to 
why we should prevent it from happening and I don't remember the outcome, 
but overall it seems reasonable to me), so once we've spilled to memory it 
won't be undone.  That doesn't mean we should refrain from moving directly 
when data is there already in the "wrong" kind of register.

> maybe it is because the store is in the prologue (before
> NOTE_INSN_FUNCTION_BEG).  Also postreload isn't able to fix this:
> 
> (insn 6 21 8 2 (set (mem/c:DI (plus:DI (reg/f:DI 29 $sp)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])
>         (reg:DI 5 $5)) "t.c":4:1 322 {*movdi_64bit}
>      (nil))
> ...
> (insn 40 7 41 2 (set (reg:V2SF 32 $f0 [orig:205 a+8 ] [205])
>         (mem/c:V2SF (plus:DI (reg/f:DI 29 $sp)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])) "t.c":6:23 387
> {*movv2sf}
>      (expr_list:REG_EQUIV (mem/c:V2SF (plus:DI (reg/f:DI 78 $frame)
>                 (const_int 24 [0x18])) [1 a+8 S8 A64])
>         (nil)))
> 
> so something is amiss in the backend as well if you say there should be
> direct moves available.

 There are, they're alternatives #5/#6 (`mtc'/`mfc') in `*movv2sf' and 
they're handled correctly by `mips_output_move' AFAICT.  Hardware has 
always had it, so there's no ISA constraint here.

 But as I say, I'm leaving it to the backend maintainer to sort out.

  Maciej

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-07-20 13:02 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-06 21:35 [PATCH 0/3] testsuite: Exclude vector tests for unsupported targets Maciej W. Rozycki
2023-07-06 21:35 ` [PATCH 1/3] testsuite: Add check for vectors of 128 bits being supported Maciej W. Rozycki
2023-07-07  6:24   ` Richard Biener
2023-07-11 15:00     ` Maciej W. Rozycki
2023-07-06 21:36 ` [PATCH 2/3] testsuite: Require 128-bit vectors for bb-slp-pr95839.c Maciej W. Rozycki
2023-07-07  6:33   ` Richard Biener
2023-07-11 15:01     ` Maciej W. Rozycki
2023-07-12  7:15       ` Richard Biener
2023-07-19 14:34         ` Maciej W. Rozycki
2023-07-20  7:15           ` Richard Biener
2023-07-20  9:16             ` Maciej W. Rozycki
2023-07-20 12:09               ` Richard Biener
2023-07-20 13:02                 ` Maciej W. Rozycki
2023-07-06 21:36 ` [PATCH 3/3] testsuite: Require vectors of doubles for pr97428.c Maciej W. Rozycki
2023-07-07  6:33   ` Richard Biener
2023-07-11 15:01     ` Maciej W. Rozycki
2023-07-12  7:15       ` Richard Biener
2023-07-19 10:29         ` Maciej W. Rozycki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).