[DOC PATCH] PowerPC extended asm example

public inbox for gcc-patches@gcc.gnu.org

* [DOC PATCH] PowerPC extended asm example
@ 2017-03-31 13:50 Alan Modra
  2017-04-01 23:44 ` Sandra Loosemore
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Modra @ 2017-03-31 13:50 UTC (permalink / raw)
  To: gcc-patches

Some people over at OpenBLAS were asking me whether I knew of a
whitepaper on gcc asm.  I didn't besides the gcc manual, and wrote a
note explaining some tricks.  This patch is that note cleaned up.
Tested by an x86_64-linux build.  OK to apply?

BTW, anyone wandering over to look at OpenBLAS might notice that this
example doesn't match the file exactly.  Yes, writing this doco made
me realize I need to submit a patch there.

	* doc/extend.texi (Extended Asm): Add OpenBLAS example.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 594b32a..05c6892 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-03-31  Alan Modra  <amodra@gmail.com>
+
+	* doc/extend.texi (Extended Asm): Add OpenBLAS example.
+
 2017-03-31  Matthew Fortune  <matthew.fortune@imgtec.com>
 
 	* config/mips/mips-msa.md (msa_vec_extract_<msafmt_f>): Update
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fadbc96..991a2f6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8516,6 +8516,84 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
+Here is a larger PowerPC example taken from OpenBLAS.  The over 150
+lines of assembly have been removed except for comments added to check
+gcc's register assignments, because the assembly itself isn't that
+important.  You do need to know that all of the function parameters
+are inputs except for the @code{y} array, which is modified by the
+function, and that early assembly sets up four pointers into the
+@code{ap} array, @code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda},
+and @code{a3=ap+3*lda}.
+
+Illustrated here is a technique you can use to have gcc allocate
+temporary registers for an asm, giving the compiler more freedom than
+the programmer allocating fixed registers via clobbers.  This is done
+by declaring a variable and making it an early-clobber asm output as
+with @code{a2} and @code{a3}, or making it an output tied to an input
+as with @code{a0} and @code{a1}.  The vsx registers used by the asm
+could have used the same technique except for gcc's limit on number of
+asm parameters.  It shouldn't be surprising that @code{a0} is tied to
+@code{ap} from the above description, and @code{lda} is only used
+early so that register is available for reuse as @code{a1}.  Tying an
+input to an output is the way to set up an initialised temporary
+register that is modified by an asm.  The example also shows an
+initialised register unchanged by the asm; @code{"b" (16)} sets up
+@code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell gcc that an asm accesses or modifies memory.  Here we
+use @code{"+m" (*y)} in the list of outputs to tell gcc that the
+@code{y} array is both read and written by the asm.  @code{"m" (*x)}
+and @code{"m" (*ap)} in the inputs tell gcc that these arrays are
+read.  At a minimum, aliasing rules will allow gcc to know what memory
+@emph{doesn't} need to be flushed, and if the function were inlined
+then gcc may be able to do even better.  Notice that @code{x},
+@code{y}, and @code{ap} all appear twice in the asm parameters, once
+to specify memory accessed, and once to specify a base register used
+by the asm.  You won't normally be wasting a register by doing this as
+gcc can use the same register for both purposes.  However, it would be
+foolish to use both @code{%0} and @code{%2} for @code{y} in your asm
+and expect them to be the same.
+
+@example
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end example
+
 @anchor{Clobbers}
 @subsubsection Clobbers
 @cindex @code{asm} clobbers
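
The two scratch-register techniques the patch text describes (an
early-clobber output, and an output tied to an input) can be sketched
on a much smaller scale.  The following is only an illustration, not
code from OpenBLAS: the function names triple and times_eight are
made up, the assembly is x86-64 rather than PowerPC, and the plain C
fallback is an assumption added so the sketch builds on other targets.

```c
/* Early-clobber scratch: tmp is written by the first instruction
   while x is still needed by the second, so tmp must not share a
   register with x.  "=&r" tells the compiler exactly that, and lets
   it pick any free register instead of a fixed clobbered one.  */
static long
triple (long x)
{
  long result;
  long tmp;			/* scratch, never read by C code */

#if defined (__x86_64__) && defined (__LP64__)
  __asm__ ("lea (%2,%2,1), %1\n\t"	/* tmp = x + x */
	   "lea (%1,%2,1), %0"		/* result = tmp + x */
	   : "=r" (result), "=&r" (tmp)
	   : "r" (x));
#else
  tmp = x + x;
  result = tmp + x;
#endif
  return result;
}

/* Tied operand: "0" makes the input v live in the same register as
   output operand 0, giving an initialized register that the asm is
   allowed to modify.  */
static long
times_eight (long v)
{
  long scratch;

#if defined (__x86_64__) && defined (__LP64__)
  __asm__ ("shl $3, %0" : "=r" (scratch) : "0" (v));
#else
  scratch = v * 8;
#endif
  return scratch;
}
```

In triple, result can be a normal output because every input is
consumed no later than the instruction that writes it.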


-- 
Alan Modra
Australia Development Lab, IBM


* Re: [DOC PATCH] PowerPC extended asm example
  2017-03-31 13:50 [DOC PATCH] PowerPC extended asm example Alan Modra
@ 2017-04-01 23:44 ` Sandra Loosemore
  2017-04-04 12:14   ` Alan Modra
  0 siblings, 1 reply; 5+ messages in thread
From: Sandra Loosemore @ 2017-04-01 23:44 UTC (permalink / raw)
  To: Alan Modra, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

On 03/31/2017 07:30 AM, Alan Modra wrote:
> Some people over at OpenBLAS were asking me whether I knew of a
> whitepaper on gcc asm.  I didn't besides the gcc manual, and wrote a
> note explaining some tricks.  This patch is that note cleaned up.
> Tested by an x86_64-linux build.  OK to apply?

The patch had a lot of copy-editing issues with markup, spelling, etc. 
I thought it would be easier just to fix them than explain what was 
wrong, so I've attached a tidied-up version.  I also moved the example 
before the detailed discussion since it's easier to understand that way.

There are still a couple semantic issues that need fixing, though...

(1) The example is in the "Input Operands" subsection, but it seems like 
it's really about clobbers and alternatives to clobbers.  Unless you 
have some better idea, I'd suggest moving it to the "Clobbers" 
subsection and maybe renaming that subsection "Clobbers and Scratch 
Registers" too.  And making the purpose of the example and its relation 
to the purpose of the containing section more explicit in its 
introductory text.

(2) In this bit of text

> +function, and that early assembly sets up four pointers into the
> +@code{ap} array, @code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda},
> +and @code{a3=ap+3*lda}.

I don't understand what "early assembly" is.  Wouldn't it make more 
sense to add initializers to these declarations in the example code

> +  double *a0;
> +  double *a1;
> +  double *a2;
> +  double *a3;

than to hand-wave about what sets up these pointers?

-Sandra


[-- Attachment #2: asm.patch --]
[-- Type: text/x-patch, Size: 3692 bytes --]

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 246632)
+++ gcc/doc/extend.texi	(working copy)
@@ -8516,6 +8516,86 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
+Here is a larger PowerPC example taken from OpenBLAS.  All of the
+function parameters are inputs except for the @code{y} array, which is
+modified by the function.  Early assembly sets up four pointers
+into the @code{ap} array, @code{a0=ap}, @code{a1=ap+lda},
+@code{a2=ap+2*lda}, and @code{a3=ap+3*lda}.  The actual assembly code
+has been elided except for comments added to check GCC's register
+assignments, since it's not interesting for purposes of explaining the
+operands.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end smallexample
+
+Illustrated here are techniques you can use to have GCC allocate
+temporary registers for an @code{asm} statement, giving the compiler
+more freedom than if you allocated fixed registers via clobbers.  This
+is done by declaring a variable and making it an early-clobber
+@code{asm} output as with @code{a2} and @code{a3}, or making it an
+output tied to an input as with @code{a0} and @code{a1}.  The VSX
+registers used by the @code{asm} statement could have used the same
+technique except for GCC's limit on number of @code{asm} parameters.
+It shouldn't be surprising that @code{a0} is tied to @code{ap} from
+the above description, and @code{lda} is only used early so that
+register is available for reuse as @code{a1}.  Tying an input to an
+output is the way to set up an initialized temporary register that is
+modified by an @code{asm} statement.  The example also shows an
+initialized register unchanged by the @code{asm} statement; @code{"b"
+(16)} sets up @code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell GCC that an @code{asm} statement accesses or modifies
+memory.  Here we use @code{"+m" (*y)} in the list of outputs to tell
+GCC that the @code{y} array is both read and written by the @code{asm}
+statement.  @code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell
+GCC that these arrays are read.  At a minimum, aliasing rules allow
+GCC to know what memory @emph{doesn't} need to be flushed, and if the
+function were inlined then GCC may be able to do even better.  Notice
+that @code{x}, @code{y}, and @code{ap} all appear twice in the
+@code{asm} parameters, once to specify memory accessed, and once to
+specify a base register used by the @code{asm}.  You won't normally be
+wasting a register by doing this as GCC can use the same register for
+both purposes.  However, it would be foolish to use both @code{%0} and
+@code{%2} for @code{y} in your @code{asm} statement and expect them to
+be the same.
+
 @anchor{Clobbers}
 @subsubsection Clobbers
 @cindex @code{asm} clobbers
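
The example keeps one piece of the original assembly: a comment line
in the template that names every operand, so the register assignments
can be read straight out of the compiler's assembly output.  A minimal
sketch of that trick follows.  It is an illustration only: the
function name add_checked is hypothetical, the assembly is x86-64
rather than PowerPC, and the C fallback is an assumption for other
targets.

```c
/* The first template line is an assembler comment.  GCC still
   substitutes %0, %1 and %2 inside it, so compiling with -S shows
   which register each operand was given.  */
static long
add_checked (long a, long b)
{
  long sum;

#if defined (__x86_64__) && defined (__LP64__)
  __asm__ ("# sum=%0 a=%1 b=%2\n\t"
	   "lea (%1,%2,1), %0"		/* sum = a + b */
	   : "=r" (sum)
	   : "r" (a), "r" (b));
#else
  sum = a + b;
#endif
  return sum;
}
```

Compiling this with -S and searching the output for "sum=" shows the
substituted register names without needing a debugger.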


* Re: [DOC PATCH] PowerPC extended asm example
  2017-04-01 23:44 ` Sandra Loosemore
@ 2017-04-04 12:14   ` Alan Modra
  2017-04-05 15:49     ` Sandra Loosemore
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Modra @ 2017-04-04 12:14 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc-patches

Revised patch.

	* doc/extend.texi (Extended Asm <Clobbers>): Rename to
	"Clobbers and Scratch Registers".  Add OpenBLAS example.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0f44ece..0b0a021 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -7869,7 +7869,7 @@ A comma-separated list of C expressions read by the instructions in the
 @item Clobbers
 A comma-separated list of registers or other values changed by the 
 @var{AssemblerTemplate}, beyond those listed as outputs.
-An empty list is permitted.  @xref{Clobbers}.
+An empty list is permitted.  @xref{Clobbers and Scratch Registers}.
 
 @item GotoLabels
 When you are using the @code{goto} form of @code{asm}, this section contains 
@@ -8229,7 +8229,7 @@ The enclosing parentheses are a required part of the syntax.
 
 When the compiler selects the registers to use to 
 represent the output operands, it does not use any of the clobbered registers 
-(@pxref{Clobbers}).
+(@pxref{Clobbers and Scratch Registers}).
 
 Output operand expressions must be lvalues. The compiler cannot check whether 
 the operands have data types that are reasonable for the instruction being 
@@ -8465,7 +8465,8 @@ as input.  The enclosing parentheses are a required part of the syntax.
 @end table
 
 When the compiler selects the registers to use to represent the input 
-operands, it does not use any of the clobbered registers (@pxref{Clobbers}).
+operands, it does not use any of the clobbered registers
+(@pxref{Clobbers and Scratch Registers}).
 
 If there are no output operands but there are input operands, place two 
 consecutive colons where the output operands would go:
@@ -8516,9 +8517,10 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
-@anchor{Clobbers}
-@subsubsection Clobbers
+@anchor{Clobbers and Scratch Registers}
+@subsubsection Clobbers and Scratch Registers
 @cindex @code{asm} clobbers
+@cindex @code{asm} scratch registers
 
 While the compiler is aware of changes to entries listed in the output 
 operands, the inline @code{asm} code may modify more than just the outputs. For 
@@ -8589,6 +8591,110 @@ ten bytes of a string, use a memory input like:
 
 @end table
 
+Rather than allocating fixed registers via clobbers to provide scratch
+registers for an @code{asm} statement, there are better techniques you
+can use which give the compiler more freedom.  There are also better
+ways than using a @code{"memory"} clobber to tell GCC that an
+@code{asm} statement accesses or modifies memory.  The following
+PowerPC example taken from OpenBLAS illustrates some of these
+techniques.
+
+In the function shown below, all of the function parameters are inputs
+except for the @code{y} array, which is modified by the function.
+Only the first few lines of assembly in the @code{asm} statement are
+shown, and a comment handy for checking register assignments.  These
+insns set up some registers for later use in loops, and in particular,
+set up four pointers into the @code{ap} array, @code{a0=ap},
+@code{a1=ap+lda}, @code{a2=ap+2*lda}, and @code{a3=ap+3*lda}.  The
+rest of the assembly is simply too large to include here.
+
+@smallexample
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+       "lxvd2x		34, 0, %10	\n\t"	// x0, x1
+       "lxvd2x		35, %11, %10	\n\t"	// x2, x3
+       "xxspltd		32, %x9, 0	\n\t"	// alpha, alpha
+       "sldi		%6, %13, 3	\n\t"	// lda * sizeof (double)
+       "xvmuldp		34, 34, 32	\n\t"	// x0 * alpha, x1 * alpha
+       "xvmuldp		35, 35, 32	\n\t"	// x2 * alpha, x3 * alpha
+       "add		%4, %3, %6	\n\t"	// a0 = ap, a1 = a0 + lda
+       "add		%6, %6, %6	\n\t"	// 2 * lda
+       "xxspltd		32, 34, 0	\n\t"	// x0 * alpha, x0 * alpha
+       "xxspltd		33, 34, 1	\n\t"	// x1 * alpha, x1 * alpha
+       "xxspltd		34, 35, 0	\n\t"	// x2 * alpha, x2 * alpha
+       "xxspltd		35, 35, 1	\n\t"	// x3 * alpha, x3 * alpha
+       "add		%5, %3, %6	\n\t"	// a2 = a0 + 2 * lda
+       "add		%6, %4, %6	\n\t"	// a3 = a1 + 2 * lda
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end smallexample
+
+Allocating scratch registers is done by declaring a variable and
+making it an early-clobber @code{asm} output as with @code{a2} and
+@code{a3}, or making it an output tied to an input as with @code{a0}
+and @code{a1}.  You can use a normal @code{asm} output if all inputs
+that might share the same register are consumed before the scratch is
+used.  The VSX registers clobbered by the @code{asm} statement could
+have used the same technique except for GCC's limit on number of
+@code{asm} parameters.  It shouldn't be surprising that @code{a0} is
+tied to @code{ap} from the above description, and @code{lda} is only
+used in the fourth machine insn shown above, so that register is
+available for reuse as @code{a1}.  Note that tying an input to an
+output is the way to set up an initialized temporary register modified
+by an @code{asm} statement.  The example also shows an initialized
+register unchanged by the @code{asm} statement; @code{"b" (16)} sets
+up @code{%11} to 16.
+
+Rather than using a @code{"memory"} clobber, the @code{asm} has
+@code{"+m" (*y)} in the list of outputs to tell GCC that the @code{y}
+array is both read and written by the @code{asm} statement.
+@code{"m" (*x)} and @code{"m" (*ap)} in the inputs tell GCC that these
+arrays are read.  At a minimum, aliasing rules allow GCC to know what
+memory @emph{doesn't} need to be flushed, and if the function were
+inlined then GCC may be able to do even better.  Also, if GCC can
+prove that all of the outputs of an @code{asm} statement are unused,
+then the @code{asm} may be deleted.  Removal of dead @code{asm}
+statements will not happen if they clobber @code{"memory"}.  Notice
+that @code{x}, @code{y}, and @code{ap} all appear twice in the
+@code{asm} parameters, once to specify memory accessed, and once to
+specify a base register used by the @code{asm}.  You won't normally be
+wasting a register by doing this as GCC can use the same register for
+both purposes.  However, it would be foolish to use both @code{%0} and
+@code{%2} for @code{y} in this @code{asm} assembly and expect them to
+be the same.
+
 @anchor{GotoLabels}
 @subsubsection Goto Labels
 @cindex @code{asm} goto labels
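
The memory-operand technique in the revised text can also be shown in
a few lines.  This is a sketch under stated assumptions, not OpenBLAS
code: the names increment and sum_pair are invented, the assembly is
x86-64, the C fallback is an assumption for other targets, and the
array cast is the form the GCC manual suggests for describing more
than one element.

```c
/* "+m" (*p) says the asm both reads and writes *p, so no "memory"
   clobber is needed and unrelated values can stay in registers.  */
static void
increment (long *p)
{
#if defined (__x86_64__) && defined (__LP64__)
  __asm__ ("incq %0" : "+m" (*p));
#else
  ++*p;
#endif
}

/* The pointer appears twice, as in the patch: once as a base
   register ("r") that the template uses, and once as an "m" input
   that is never named in the template but tells the compiler which
   memory is read.  sum is early-clobber because it is written while
   the base register is still needed.  */
static long
sum_pair (const long *a)
{
  long sum;

#if defined (__x86_64__) && defined (__LP64__)
  __asm__ ("mov (%1), %0\n\t"		/* sum = a[0] */
	   "add 8(%1), %0"		/* sum += a[1] */
	   : "=&r" (sum)
	   : "r" (a), "m" (*(const long (*)[2]) a));
#else
  sum = a[0] + a[1];
#endif
  return sum;
}
```

Because neither asm clobbers "memory", the compiler remains free to
delete either statement if it can prove the outputs are unused.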

-- 
Alan Modra
Australia Development Lab, IBM


* Re: [DOC PATCH] PowerPC extended asm example
  2017-04-04 12:14   ` Alan Modra
@ 2017-04-05 15:49     ` Sandra Loosemore
  2017-04-06  1:38       ` Alan Modra
  0 siblings, 1 reply; 5+ messages in thread
From: Sandra Loosemore @ 2017-04-05 15:49 UTC (permalink / raw)
  To: Alan Modra; +Cc: gcc-patches

On 04/04/2017 06:14 AM, Alan Modra wrote:
> Revised patch.
>
> [snip]

Hmmm.  My main objection to this version is that it's unintelligible to 
anyone who can't parse PowerPC assembly language without the help of an 
architecture manual, and that's probably the majority of readers.

I'm now wondering if it would be better to have a series of small 
examples showing these tricks individually instead of one giant example 
that tries to illustrate multiple things?

-Sandra


* Re: [DOC PATCH] PowerPC extended asm example
  2017-04-05 15:49     ` Sandra Loosemore
@ 2017-04-06  1:38       ` Alan Modra
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Modra @ 2017-04-06  1:38 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: gcc-patches

On Wed, Apr 05, 2017 at 09:37:04AM -0600, Sandra Loosemore wrote:
> On 04/04/2017 06:14 AM, Alan Modra wrote:
> >Revised patch.
> >
> >[snip]
> 
> Hmmm.  My main objection to this version is that it's unintelligible to
> anyone who can't parse PowerPC assembly language without the help of an
> architecture manual, and that's probably the majority of readers.

Heh, even I have trouble parsing some powerpc assembly!  That's why
there are a few lines of text describing what the assembly code does.
I am concerned that the 14 lines of assembly shown make the example
too big, but it's harder to describe code that isn't shown than to
describe something under the nose of the reader.

> I'm now wondering if it would be better to have a series of small examples
> showing these tricks individually instead of one giant example that tries to
> illustrate multiple things?

Possibly, but this example comes after many others.  If people have
waded this far into the asm section of the manual they shouldn't have
too much trouble understanding the concepts here.

Also, there's value in a real-world example.  Maybe that's just me.
I'm not someone who tends to read manuals first, preferring to dive
right in then go back to a manual later for some detail that can't be
easily deduced.  In fact, I have a distrust of manuals..  ;)  This
isn't a criticism of the gcc manual, but other documents I've read
over the years are often just plain wrong.  I've even been the
*author* of technical documentation that had errors, some by yours
truly, and some introduced by a "technical writer" who edited my input
to make it read better, in the process accidentally changing something
that made the details incorrect.  I'm sure others have had the same
experience.  So I like *and trust* code snippets taken from working
code more than made up examples created for documentation.

-- 
Alan Modra
Australia Development Lab, IBM

