public inbox for gcc-patches@gcc.gnu.org
* [DOC PATCH] PowerPC extended asm example
@ 2017-03-31 13:50 Alan Modra
From: Alan Modra @ 2017-03-31 13:50 UTC
  To: gcc-patches

Some people over at OpenBLAS were asking me whether I knew of a
whitepaper on gcc asm.  I didn't, apart from the gcc manual, so I wrote
a note explaining some tricks.  This patch is that note cleaned up.
Tested by an x86_64-linux build.  OK to apply?

BTW, anyone wandering over to look at OpenBLAS might notice that this
example doesn't match the file exactly.  Yes, writing this doco made
me realize I need to submit a patch there.

	* doc/extend.texi (Extended Asm): Add OpenBLAS example.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 594b32a..05c6892 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-03-31  Alan Modra  <amodra@gmail.com>
+
+	* doc/extend.texi (Extended Asm): Add OpenBLAS example.
+
 2017-03-31  Matthew Fortune  <matthew.fortune@imgtec.com>
 
 	* config/mips/mips-msa.md (msa_vec_extract_<msafmt_f>): Update
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fadbc96..991a2f6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8516,6 +8516,84 @@ asm ("cmoveq %1, %2, %[result]"
    : "r" (test), "r" (new), "[result]" (old));
 @end example
 
+Here is a larger PowerPC example taken from OpenBLAS.  More than 150
+lines of assembly have been removed, leaving only the comments added to
+check gcc's register assignments, because the assembly itself isn't
+that important.  You do need to know that all of the function parameters
+are inputs except for the @code{y} array, which is modified by the
+function, and that early assembly sets up four pointers into the
+@code{ap} array, @code{a0=ap}, @code{a1=ap+lda}, @code{a2=ap+2*lda},
+and @code{a3=ap+3*lda}.
+
+Illustrated here is a technique you can use to have gcc allocate
+temporary registers for an asm, giving the compiler more freedom than
+the programmer allocating fixed registers via clobbers.  This is done
+by declaring a variable and making it an early-clobber asm output as
+with @code{a2} and @code{a3}, or making it an output tied to an input
+as with @code{a0} and @code{a1}.  The VSX registers used by the asm
+could have used the same technique except for gcc's limit on the number
+of asm parameters.  It shouldn't be surprising that @code{a0} is tied to
+@code{ap} from the above description, and @code{lda} is only used
+early so that register is available for reuse as @code{a1}.  Tying an
+input to an output is the way to set up an initialised temporary
+register that is modified by an asm.  The example also shows an
+initialised register unchanged by the asm; @code{"b" (16)} sets up
+@code{%11} to 16.
+
+Also shown is a somewhat better method than using a @code{"memory"}
+clobber to tell gcc that an asm accesses or modifies memory.  Here we
+use @code{"+m" (*y)} in the list of outputs to tell gcc that the
+@code{y} array is both read and written by the asm.  @code{"m" (*x)}
+and @code{"m" (*ap)} in the inputs tell gcc that these arrays are
+read.  At a minimum, aliasing rules will allow gcc to know what memory
+@emph{doesn't} need to be flushed, and if the function were inlined
+then gcc may be able to do even better.  Notice that @code{x},
+@code{y}, and @code{ap} all appear twice in the asm parameters, once
+to specify memory accessed, and once to specify a base register used
+by the asm.  You won't normally be wasting a register by doing this as
+gcc can use the same register for both purposes.  However, it would be
+foolish to use both @code{%0} and @code{%2} for @code{y} in your asm
+and expect them to be the same.
+
+@example
+static void
+dgemv_kernel_4x4 (long n, const double *ap, long lda,
+                  const double *x, double *y, double alpha)
+@{
+  double *a0;
+  double *a1;
+  double *a2;
+  double *a3;
+
+  __asm__
+    (
+     ...
+     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
+     "#a0=%3 a1=%4 a2=%5 a3=%6"
+     :
+       "+m" (*y),
+       "+r" (n),	// 1
+       "+b" (y),	// 2
+       "=b" (a0),	// 3
+       "=b" (a1),	// 4
+       "=&b" (a2),	// 5
+       "=&b" (a3)	// 6
+     :
+       "m" (*x),
+       "m" (*ap),
+       "d" (alpha),	// 9
+       "r" (x),		// 10
+       "b" (16),	// 11
+       "3" (ap),	// 12
+       "4" (lda)	// 13
+     :
+       "cr0",
+       "vs32","vs33","vs34","vs35","vs36","vs37",
+       "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
+     );
+@}
+@end example
+
 @anchor{Clobbers}
 @subsubsection Clobbers
 @cindex @code{asm} clobbers
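
A minimal sketch of the two temporary-register techniques described in
the patch, written so that it builds on any GCC target rather than only
PowerPC.  The helper row_pointer, the scratch variable tmp, the generic
"r" constraint (the OpenBLAS code uses the PowerPC "b" constraint) and
the empty asm template are all assumptions made for illustration; none
of them come from OpenBLAS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from OpenBLAS: returns base + stride, using
   registers that the compiler picks for the asm operands.  */
static const double *
row_pointer (const double *base, long stride)
{
  const double *a0;   /* tied to `base`, so it starts out equal to it */
  long tmp;           /* scratch register; its value is never read in C */

  assert (base != NULL);

  __asm__ (""                /* real instructions would go here */
           : "=r" (a0),      /* %0: output sharing a register with %2 */
             "=&r" (tmp)     /* %1: early-clobber, never overlaps an input */
           : "0" (base),     /* %2: initialises the register used for %0 */
             "r" (stride));  /* %3: plain input, unchanged by the asm */

  /* The template is empty, so a0 still holds the value of base.  */
  return a0 + stride;
}
```

With an empty template the tied output keeps the value gcc loaded for
its input, so row_pointer (a, 3) should compare equal to a + 3.  In a
real asm, tmp could be overwritten before the inputs are consumed,
which is what the early-clobber marker allows.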

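A rough sketch of describing memory with "m" operands instead of a
"memory" clobber.  The function axpy_sketch and its C loop are
hypothetical stand-ins for a real kernel, and the asm template is left
empty so the code builds on any GCC target.  The array casts are an
assumption of this sketch, not part of the patch: they give each "m"
operand an array type so that it covers all n elements, whereas a plain
*y names a single double.

```c
#include <assert.h>

/* Hypothetical stand-in for a kernel whose asm reads x[0..n-1] and
   updates y[0..n-1].  The C loop does the arithmetic that the real
   assembly would do.  */
static void
axpy_sketch (long n, const double *x, double *y, double alpha)
{
  assert (n > 0);   /* the array casts below need a positive length */

  for (long i = 0; i < n; i++)
    y[i] += alpha * x[i];

  __asm__ (""   /* real loads and stores would go here */
           : "+m" (*(double (*)[n]) y)        /* y[0..n-1] read and written */
           : "m" (*(const double (*)[n]) x),  /* x[0..n-1] only read */
             "r" (x),                         /* base register for x */
             "r" (y));                        /* base register for y */
}
```

Each array appears twice, once as the memory the asm touches and once
as the base register it would address through, mirroring the operand
list in the patch.  No "memory" clobber is needed, so gcc remains free
to keep unrelated values in registers across the asm.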

-- 
Alan Modra
Australia Development Lab, IBM
