* [PATCH] Asm memory constraints @ 2017-08-18 17:51 Alan Modra 2017-08-21 0:59 ` Segher Boessenkool 0 siblings, 1 reply; 10+ messages in thread From: Alan Modra @ 2017-08-18 17:51 UTC (permalink / raw) To: gcc-patches This patch adds some documentation on asm memory constraints, aimed especially at constraints for arrays. I may have invented something new here as I've never seen "=m" (*(T (*)[]) ptr) used before. So this isn't simply a documentation patch. It needs blessing from a global maintainer, I think, as to whether this is a valid approach and something that gcc ought to continue supporting. My poking around the code and looking at dumps convinced me that it's OK.. PR inline-asm/81890 * doc/extend.texi (Clobbers): Correct vax example. Delete old example of a memory input for a string of known length. Move commentary out of table. Add a number of new examples covering array memory inputs and outputs. testsuite/ * gcc.target/i386/asm-mem.c: New test. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 93d542d..224518f 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8755,7 +8755,7 @@ registers: asm volatile ("movc3 %0, %1, %2" : /* No outputs. */ : "g" (from), "g" (to), "g" (count) - : "r0", "r1", "r2", "r3", "r4", "r5"); + : "r0", "r1", "r2", "r3", "r4", "r5", "memory"); @end example Also, there are two special clobber arguments: @@ -8786,14 +8786,75 @@ Note that this clobber does not prevent the @emph{processor} from doing speculative reads past the @code{asm} statement. To prevent that, you need processor-specific fence instructions. -Flushing registers to memory has performance implications and may be an issue -for time-sensitive code. You can use a trick to avoid this if the size of -the memory being accessed is known at compile time. For example, if accessing -ten bytes of a string, use a memory input like: +@end table -@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}. 
+Flushing registers to memory has performance implications and may be +an issue for time-sensitive code. You can provide better information +to GCC to avoid this, as shown in the following examples. At a +minimum, aliasing rules allow GCC to know what memory @emph{doesn't} +need to be flushed. Also, if GCC can prove that all of the outputs of +a non-volatile @code{asm} statement are unused, then the @code{asm} +may be deleted. Removal of otherwise dead @code{asm} statements will +not happen if they clobber @code{"memory"}. -@end table +Here is a fictitious sum of squares instruction, that takes two +pointers to floating point values in memory and produces a floating +point register output. +Notice that @code{x}, and @code{y} both appear twice in the @code{asm} +parameters, once to specify memory accessed, and once to specify a +base register used by the @code{asm}. You won't normally be wasting a +register by doing this as GCC can use the same register for both +purposes. However, it would be foolish to use both @code{%1} and +@code{%3} for @code{x} in this @code{asm} and expect them to be the +same. In fact, @code{%3} may well not even be a register. It might +be a symbolic memory reference to the object pointed to by @code{x}. + +@smallexample +asm ("sumsq %0, %1, %2" + : "+f" (result) + : "r" (x), "r" (y), "m" (*x), "m" (*y)); +@end smallexample + +Here is a fictitious @code{*z++ = *x++ * *y++} instruction. +Notice that the @code{x}, @code{y} and @code{z} pointer registers +must be specified as input/output because the @code{asm} modifies +them. + +@smallexample +asm ("vecmul %0, %1, %2" + : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) + : "m" (*x), "m" (*y)); +@end smallexample + +An x86 example where the string memory argument is of unknown length. 
+ +@smallexample +asm("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); +@end smallexample + +If you know the above will only be reading a ten byte array then you +could instead use a memory input like: +@code{"m" (*(const char (*)[10]) p)}. + +Here is an example of a PowerPC vector scale implemented in assembly, +complete with vector and condition code clobbers, and some initialized +offset registers that are unchanged by the @code{asm}. + +@smallexample +void +dscal (size_t n, double *x, double alpha) +@{ + asm ("/* lots of asm here */" + : "+m" (*(double (*)[n]) x), "+r" (n), "+b" (x) + : "d" (alpha), "b" (32), "b" (48), "b" (64), + "b" (80), "b" (96), "b" (112) + : "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); +@} +@end smallexample @anchor{GotoLabels} @subsubsection Goto Labels diff --git a/gcc/testsuite/gcc.target/i386/asm-mem.c b/gcc/testsuite/gcc.target/i386/asm-mem.c new file mode 100644 index 0000000..01522fe --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/asm-mem.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +/* Check that "m" array references are effective in preventing the + array initialization from wandering past a use in the asm. 
*/ + +static int +f1 (const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +static int +f2 (const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[48]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +static int +f3 (int n, const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[n]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +int +main () +{ + int a; + char buff[48] = "hello world"; + buff[4] = 0; + a = f1 (buff); + if (a != 4) + __builtin_abort (); + buff[4] = 'o'; + a = f2 (buff); + if (a != 11) + __builtin_abort (); + buff[4] = 0; + a = f3 (48, buff); + if (a != 4) + __builtin_abort (); + return 0; +} -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Asm memory constraints
From: Segher Boessenkool @ 2017-08-21 0:59 UTC
To: Alan Modra; +Cc: gcc-patches

Hi Alan,

On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> +Flushing registers to memory has performance implications and may be
> +an issue for time-sensitive code. You can provide better information
> +to GCC to avoid this, as shown in the following examples. At a
> +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> +need to be flushed. Also, if GCC can prove that all of the outputs of
> +a non-volatile @code{asm} statement are unused, then the @code{asm}
> +may be deleted. Removal of otherwise dead @code{asm} statements will
> +not happen if they clobber @code{"memory"}.

void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }

Both f and g are completely removed by the first jump pass immediately
after expand (via delete_trivially_dead_insns).

Do you have a testcase for the behaviour you saw?


Segher
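A minimal sketch of the behaviour reported above, not taken from the thread. The function names are hypothetical, and the empty asm template is an assumption made so that the code assembles on any target. The first function mirrors f above: its only output is unused, so GCC may delete the asm even though it clobbers "memory". The second adds the volatile qualifier, which the GCC manual documents as the way to stop the optimizers from discarding an asm.

```c
#include <assert.h>

/* Hypothetical helper mirroring f in the message above: the only
   output, z, is never used, so GCC is free to delete this
   non-volatile asm.  The "memory" clobber does not prevent that.  */
void
maybe_deleted (int x)
{
  int z;
  __asm__ ("" : "=r" (z) : "r" (x) : "memory");
  (void) z;
}

/* Hypothetical helper: volatile marks the asm as having side effects,
   so it is kept wherever it is reachable.  The input is tied to the
   output with "0", and the empty template leaves that register
   untouched, so z ends up equal to x.  */
int
kept (int x)
{
  int z;
  __asm__ volatile ("" : "=r" (z) : "0" (x) : "memory");
  assert (z == x);
  return z;
}
```

Comparing the output of `gcc -O2 -S` for the two functions should show whether the asm survives in each; the exact pass that removes it may differ between GCC versions.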
* Re: [PATCH] Asm memory constraints
From: Alan Modra @ 2017-08-21 7:36 UTC
To: Segher Boessenkool; +Cc: gcc-patches

On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
> Hi Alan,
>
> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
> > +Flushing registers to memory has performance implications and may be
> > +an issue for time-sensitive code. You can provide better information
> > +to GCC to avoid this, as shown in the following examples. At a
> > +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
> > +need to be flushed. Also, if GCC can prove that all of the outputs of
> > +a non-volatile @code{asm} statement are unused, then the @code{asm}
> > +may be deleted. Removal of otherwise dead @code{asm} statements will
> > +not happen if they clobber @code{"memory"}.
>
> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
>
> Both f and g are completely removed by the first jump pass immediately
> after expand (via delete_trivially_dead_insns).
>
> Do you have a testcase for the behaviour you saw?

Oh my. I was sure that was how "memory" worked! I see though that
every gcc I have lying around, going all the way back to gcc-2.95,
deletes the asm in your testcase. I definitely don't want to put
something in the docs that is plain wrong, or just my idea of how
things ought to work, so the last two sentences quoted above need to
go.

Thanks for the correction. Fixed in this revised patch. The only
controversial aspect now should be whether those array casts ought to
be officially blessed.
I've checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
"=m" (*(T (*)[10]) ptr), all generate reasonable MEM_ATTRS handled
apparently properly by alias.c and other code.

For example, at -O3 the following shows gcc moving the read of "val"
before the asm, while an asm using a "memory" clobber forces the read
to occur after the asm.

static int
f (double *x)
{
  int res;
  asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
  return res;
}

int val = 123;
double foo[10];

int
main ()
{
  int b = f (foo);
  __builtin_printf ("%d %d\n", val, b);
  return 0;
}

I'm also encouraged by comments like the following by rth in 2004
(gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
output constraints must continue to be supported.

  /* ??? Really, this should not be here. Users should be using a
     proper lvalue, dammit. But there's a long history of using casts
     in the output operands. In cases like longlong.h, this becomes a
     primitive form of typechecking -- if the cast can be removed, then
     the output operand had a type of the proper width; otherwise we'll
     get an error. Gross, but ... */
  STRIP_NOPS (output);


        * doc/extend.texi (Clobbers): Correct vax example. Delete old
        example of a memory input for a string of known length. Move
        commentary out of table. Add a number of new examples
        covering array memory inputs.
testsuite/
        * gcc.target/i386/asm-mem.c: New test.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 649be01..940490e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -8755,7 +8755,7 @@ registers:
 asm volatile ("movc3 %0, %1, %2"
                    : /* No outputs. */
                    : "g" (from), "g" (to), "g" (count)
-                   : "r0", "r1", "r2", "r3", "r4", "r5");
+                   : "r0", "r1", "r2", "r3", "r4", "r5", "memory");
 @end example
 
 Also, there are two special clobber arguments:
@@ -8786,14 +8786,72 @@ Note that this clobber does not prevent the @emph{processor} from doing
 speculative reads past the @code{asm} statement.
To prevent that, you need processor-specific fence instructions. -Flushing registers to memory has performance implications and may be an issue -for time-sensitive code. You can use a trick to avoid this if the size of -the memory being accessed is known at compile time. For example, if accessing -ten bytes of a string, use a memory input like: +@end table -@code{@{"m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )@}}. +Flushing registers to memory has performance implications and may be +an issue for time-sensitive code. You can provide better information +to GCC to avoid this, as shown in the following examples. At a +minimum, aliasing rules allow GCC to know what memory @emph{doesn't} +need to be flushed. -@end table +Here is a fictitious sum of squares instruction, that takes two +pointers to floating point values in memory and produces a floating +point register output. +Notice that @code{x}, and @code{y} both appear twice in the @code{asm} +parameters, once to specify memory accessed, and once to specify a +base register used by the @code{asm}. You won't normally be wasting a +register by doing this as GCC can use the same register for both +purposes. However, it would be foolish to use both @code{%1} and +@code{%3} for @code{x} in this @code{asm} and expect them to be the +same. In fact, @code{%3} may well not be a register. It might be a +symbolic memory reference to the object pointed to by @code{x}. + +@smallexample +asm ("sumsq %0, %1, %2" + : "+f" (result) + : "r" (x), "r" (y), "m" (*x), "m" (*y)); +@end smallexample + +Here is a fictitious @code{*z++ = *x++ * *y++} instruction. +Notice that the @code{x}, @code{y} and @code{z} pointer registers +must be specified as input/output because the @code{asm} modifies +them. + +@smallexample +asm ("vecmul %0, %1, %2" + : "+r" (z), "+r" (x), "+r" (y), "=m" (*z) + : "m" (*x), "m" (*y)); +@end smallexample + +An x86 example where the string memory argument is of unknown length. 
+ +@smallexample +asm("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); +@end smallexample + +If you know the above will only be reading a ten byte array then you +could instead use a memory input like: +@code{"m" (*(const char (*)[10]) p)}. + +Here is an example of a PowerPC vector scale implemented in assembly, +complete with vector and condition code clobbers, and some initialized +offset registers that are unchanged by the @code{asm}. + +@smallexample +void +dscal (size_t n, double *x, double alpha) +@{ + asm ("/* lots of asm here */" + : "+m" (*(double (*)[n]) x), "+r" (n), "+b" (x) + : "d" (alpha), "b" (32), "b" (48), "b" (64), + "b" (80), "b" (96), "b" (112) + : "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"); +@} +@end smallexample @anchor{GotoLabels} @subsubsection Goto Labels diff --git a/gcc/testsuite/gcc.target/i386/asm-mem.c b/gcc/testsuite/gcc.target/i386/asm-mem.c new file mode 100644 index 0000000..89b713f --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/asm-mem.c @@ -0,0 +1,59 @@ +/* { dg-do run } */ +/* { dg-options "-O3" } */ + +/* Check that "m" array references are effective in preventing the + array initialization from wandering past a use in the asm, and + that the casts remain supported. 
*/ + +static int +f1 (const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +static int +f2 (const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[48]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +static int +f3 (int n, const char *p) +{ + int count; + + __asm__ ("repne scasb" + : "=c" (count), "+D" (p) + : "m" (*(const char (*)[n]) p), "0" (-1), "a" (0)); + return -2 - count; +} + +int +main () +{ + int a; + char buff[48] = "hello world"; + buff[4] = 0; + a = f1 (buff); + if (a != 4) + __builtin_abort (); + buff[4] = 'o'; + a = f2 (buff); + if (a != 11) + __builtin_abort (); + buff[4] = 0; + a = f3 (48, buff); + if (a != 4) + __builtin_abort (); + return 0; +} -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
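The test above exercises the array casts only as memory inputs. As a hedged sketch of the output form mentioned earlier in this message ("=m" (*(T (*)[n]) ptr)), the following hypothetical helper, asm_fill, fills a buffer with x86 "rep stosb" and describes the written bytes to GCC with a variable-length array cast instead of a "memory" clobber. It assumes GCC on x86 with the direction flag clear, as the ABI requires; on any other compiler or target the preprocessor guard selects a plain C loop so the code still builds and runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: store VALUE into the
   N bytes starting at DST.  */
void
asm_fill (char *dst, char value, size_t n)
{
  /* A variable-length array type must have a nonzero size.  */
  assert (dst != NULL && n > 0);
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
  /* "rep stosb" stores AL to [DI] and advances DI, CX times, so the
     pointer and the count are both input/output operands.  The "=m"
     operand is never named in the template; it only tells GCC that
     exactly these n bytes are overwritten.  */
  __asm__ ("rep stosb"
           : "+D" (dst), "+c" (n), "=m" (*(char (*)[n]) dst)
           : "a" (value));
#else
  /* Portable fallback for other compilers and targets.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = value;
#endif
}
```

For example, after `char buf[8]; asm_fill (buf, 'x', sizeof buf);` every byte of buf holds 'x', and GCC remains free to keep unrelated values in registers across the asm.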
* Clobbers and Scratch Registers 2017-08-21 7:36 ` Alan Modra @ 2017-08-21 7:44 ` Alan Modra 2017-08-21 19:03 ` Richard Sandiford 2017-09-29 1:06 ` [PATCH] Asm memory constraints Alan Modra 2017-10-12 18:27 ` Jeff Law 2 siblings, 1 reply; 10+ messages in thread From: Alan Modra @ 2017-08-21 7:44 UTC (permalink / raw) To: gcc-patches; +Cc: Sandra Loosemore This is a revised version of https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to showing just the scratch register aspect, as a followup to https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html * doc/extend.texi (Extended Asm <Clobbers>): Rename to "Clobbers and Scratch Registers". Add paragraph on alternative to clobbers for scratch registers and OpenBLAS example. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 940490e..0637672 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the @item Clobbers A comma-separated list of registers or other values changed by the @var{AssemblerTemplate}, beyond those listed as outputs. -An empty list is permitted. @xref{Clobbers}. +An empty list is permitted. @xref{Clobbers and Scratch Registers}. @item GotoLabels When you are using the @code{goto} form of @code{asm}, this section contains @@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax. When the compiler selects the registers to use to represent the output operands, it does not use any of the clobbered registers -(@pxref{Clobbers}). +(@pxref{Clobbers and Scratch Registers}). Output operand expressions must be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being @@ -8671,7 +8671,8 @@ as input. The enclosing parentheses are a required part of the syntax. 
@end table When the compiler selects the registers to use to represent the input -operands, it does not use any of the clobbered registers (@pxref{Clobbers}). +operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). If there are no output operands but there are input operands, place two consecutive colons where the output operands would go: @@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]" : "r" (test), "r" (new), "[result]" (old)); @end example -@anchor{Clobbers} -@subsubsection Clobbers +@anchor{Clobbers and Scratch Registers} +@subsubsection Clobbers and Scratch Registers @cindex @code{asm} clobbers +@cindex @code{asm} scratch registers While the compiler is aware of changes to entries listed in the output operands, the inline @code{asm} code may modify more than just the outputs. For @@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha) @} @end smallexample +Rather than allocating fixed registers via clobbers to provide scratch +registers for an @code{asm} statement, an alternative is to define a +variable and make it an early-clobber output as with @code{a2} and +@code{a3} in the example below. This gives the compiler register +allocator more freedom. You can also define a variable and make it an +output tied to an input as with @code{a0} and @code{a1}, tied +respectively to @code{ap} and @code{lda}. Of course, with tied +outputs your @code{asm} can't use the input value after modifying the +output register since they are one and the same register. Note also +that tying an input to an output is the way to set up an initialized +temporary register modified by an @code{asm} statement. An input not +tied to an output is assumed by GCC to be unchanged, for example +@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that +register in following code if the value 16 happened to be needed. 
You +can even use a normal @code{asm} output for a scratch if all inputs +that might share the same register are consumed before the scratch is +used. The VSX registers clobbered by the @code{asm} statement could +have used this technique except for GCC's limit on the number of +@code{asm} parameters. + +@smallexample +static void +dgemv_kernel_4x4 (long n, const double *ap, long lda, + const double *x, double *y, double alpha) +@{ + double *a0; + double *a1; + double *a2; + double *a3; + + __asm__ + ( + /* lots of asm here */ + "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" + "#a0=%3 a1=%4 a2=%5 a3=%6" + : + "+m" (*(double (*)[n]) y), + "+r" (n), // 1 + "+b" (y), // 2 + "=b" (a0), // 3 + "=b" (a1), // 4 + "=&b" (a2), // 5 + "=&b" (a3) // 6 + : + "m" (*(const double (*)[n]) x), + "m" (*(const double (*)[]) ap), + "d" (alpha), // 9 + "r" (x), // 10 + "b" (16), // 11 + "3" (ap), // 12 + "4" (lda) // 13 + : + "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" + ); +@} +@end smallexample + @anchor{GotoLabels} @subsubsection Goto Labels @cindex @code{asm} goto labels -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
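A much smaller illustration of the scratch-register technique described in this patch, written as a sketch rather than taken from OpenBLAS. The function name sum_of_squares and the x86 instructions (AT&T syntax) are assumptions for illustration only; on other compilers or targets the guard falls back to plain C. Here result is written before the last input is read, so it needs an early-clobber, while tmp is first written by the same instruction that consumes the last input, so a normal output is enough for that scratch.

```c
#include <assert.h>

/* Hypothetical example: compute a*a + b*b with a compiler-allocated
   scratch register instead of a fixed clobbered register.  Small
   inputs are assumed so that the C cross-check cannot overflow.  */
long
sum_of_squares (long a, long b)
{
  long result;
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
  long tmp;  /* scratch only; its final value is ignored */
  __asm__ ("mov %2, %0\n\t"   /* result = a (written while b is still live) */
           "imul %0, %0\n\t"  /* result = a * a */
           "mov %3, %1\n\t"   /* tmp = b (last use of an input) */
           "imul %1, %1\n\t"  /* tmp = b * b */
           "add %1, %0"       /* result += tmp */
           : "=&r" (result), "=r" (tmp)
           : "r" (a), "r" (b)
           : "cc");
#else
  /* Portable fallback for other compilers and targets.  */
  result = a * a + b * b;
#endif
  assert (result == a * a + b * b);
  return result;
}
```

Because tmp is an ordinary output, GCC may give it the register that held a or b, which a fixed register named in the clobber list would not allow.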
* Re: Clobbers and Scratch Registers 2017-08-21 7:44 ` Clobbers and Scratch Registers Alan Modra @ 2017-08-21 19:03 ` Richard Sandiford 2017-08-22 6:32 ` Alan Modra 0 siblings, 1 reply; 10+ messages in thread From: Richard Sandiford @ 2017-08-21 19:03 UTC (permalink / raw) To: Alan Modra; +Cc: gcc-patches, Sandra Loosemore Thanks for doing this. Alan Modra <amodra@gmail.com> writes: > This is a revised version of > https://gcc.gnu.org/ml/gcc-patches/2017-03/msg01562.html limited to > showing just the scratch register aspect, as a followup to > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html > > * doc/extend.texi (Extended Asm <Clobbers>): Rename to > "Clobbers and Scratch Registers". Add paragraph on > alternative to clobbers for scratch registers and OpenBLAS > example. > > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index 940490e..0637672 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the > @item Clobbers > A comma-separated list of registers or other values changed by the > @var{AssemblerTemplate}, beyond those listed as outputs. > -An empty list is permitted. @xref{Clobbers}. > +An empty list is permitted. @xref{Clobbers and Scratch Registers}. > > @item GotoLabels > When you are using the @code{goto} form of @code{asm}, this section contains > @@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax. > > When the compiler selects the registers to use to > represent the output operands, it does not use any of the clobbered registers > -(@pxref{Clobbers}). > +(@pxref{Clobbers and Scratch Registers}). > > Output operand expressions must be lvalues. The compiler cannot check whether > the operands have data types that are reasonable for the instruction being > @@ -8671,7 +8671,8 @@ as input. The enclosing parentheses are a required part of the syntax. 
> @end table > > When the compiler selects the registers to use to represent the input > -operands, it does not use any of the clobbered registers (@pxref{Clobbers}). > +operands, it does not use any of the clobbered registers > +(@pxref{Clobbers and Scratch Registers}). > > If there are no output operands but there are input operands, place two > consecutive colons where the output operands would go: > @@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]" > : "r" (test), "r" (new), "[result]" (old)); > @end example > > -@anchor{Clobbers} > -@subsubsection Clobbers > +@anchor{Clobbers and Scratch Registers} > +@subsubsection Clobbers and Scratch Registers > @cindex @code{asm} clobbers > +@cindex @code{asm} scratch registers > > While the compiler is aware of changes to entries listed in the output > operands, the inline @code{asm} code may modify more than just the outputs. For > @@ -8853,6 +8855,65 @@ dscal (size_t n, double *x, double alpha) > @} > @end smallexample > > +Rather than allocating fixed registers via clobbers to provide scratch > +registers for an @code{asm} statement, an alternative is to define a > +variable and make it an early-clobber output as with @code{a2} and > +@code{a3} in the example below. This gives the compiler register > +allocator more freedom. You can also define a variable and make it an > +output tied to an input as with @code{a0} and @code{a1}, tied > +respectively to @code{ap} and @code{lda}. I think it's worth emphasising that tying operands doesn't change whether an output needs an earlyclobber or not. E.g. for: asm ("%0 = f(%1); use %2" : "=r" (a) : "0" (b), "r" (c)); the compiler can assign the same register to all three operands if it can prove that b == c on entry. Since %0 is being modified before %2 is used, it needs to be: asm ("%0 = f(%1); use %2" : "=&r" (a) : "0" (b), "r" (c)); instead. 
Thanks, Richard > Of course, with tied > +outputs your @code{asm} can't use the input value after modifying the > +output register since they are one and the same register. Note also > +that tying an input to an output is the way to set up an initialized > +temporary register modified by an @code{asm} statement. An input not > +tied to an output is assumed by GCC to be unchanged, for example > +@code{"b" (16)} below sets up @code{%11} to 16, and GCC might use that > +register in following code if the value 16 happened to be needed. You > +can even use a normal @code{asm} output for a scratch if all inputs > +that might share the same register are consumed before the scratch is > +used. The VSX registers clobbered by the @code{asm} statement could > +have used this technique except for GCC's limit on the number of > +@code{asm} parameters. > + > +@smallexample > +static void > +dgemv_kernel_4x4 (long n, const double *ap, long lda, > + const double *x, double *y, double alpha) > +@{ > + double *a0; > + double *a1; > + double *a2; > + double *a3; > + > + __asm__ > + ( > + /* lots of asm here */ > + "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" > + "#a0=%3 a1=%4 a2=%5 a3=%6" > + : > + "+m" (*(double (*)[n]) y), > + "+r" (n), // 1 > + "+b" (y), // 2 > + "=b" (a0), // 3 > + "=b" (a1), // 4 > + "=&b" (a2), // 5 > + "=&b" (a3) // 6 > + : > + "m" (*(const double (*)[n]) x), > + "m" (*(const double (*)[]) ap), > + "d" (alpha), // 9 > + "r" (x), // 10 > + "b" (16), // 11 > + "3" (ap), // 12 > + "4" (lda) // 13 > + : > + "cr0", > + "vs32","vs33","vs34","vs35","vs36","vs37", > + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" > + ); > +@} > +@end smallexample > + > @anchor{GotoLabels} > @subsubsection Goto Labels > @cindex @code{asm} goto labels ^ permalink raw reply [flat|nested] 10+ messages in thread
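The pseudo-asm in the message above can be made concrete with a small sketch. The function add_twice and its x86 instructions (AT&T syntax) are hypothetical and chosen only for illustration; other compilers and targets take the plain C branch. The output %0 is tied to b and is modified by the first instruction, but c (%2) is read again by the second instruction. Without the early-clobber, GCC could give %0 and %2 the same register whenever it can prove b == c on entry, and the result would then be 4*b instead of b + 2*c.

```c
#include <assert.h>

/* Hypothetical example: return b + 2*c.  Small inputs are assumed so
   that the C cross-check cannot overflow.  */
long
add_twice (long b, long c)
{
  long a;
#if defined(__GNUC__) && !defined(__clang__) \
    && (defined(__x86_64__) || defined(__i386__))
  __asm__ ("add %2, %0\n\t"  /* a = b + c; %0 is written here ...   */
           "add %2, %0"      /* ... and %2 is still read afterwards */
           : "=&r" (a)       /* early-clobber despite the tie below */
           : "0" (b), "r" (c)
           : "cc");
#else
  /* Portable fallback for other compilers and targets.  */
  a = b + 2 * c;
#endif
  assert (a == b + 2 * c);
  return a;
}
```

Calling add_twice (5, 5) is the interesting case: the two inputs are equal, so this is exactly the situation in which a missing "&" could let the operands share a register.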
* Re: Clobbers and Scratch Registers 2017-08-21 19:03 ` Richard Sandiford @ 2017-08-22 6:32 ` Alan Modra 2017-08-22 6:33 ` Alan Modra 2017-10-12 18:36 ` Jeff Law 0 siblings, 2 replies; 10+ messages in thread From: Alan Modra @ 2017-08-22 6:32 UTC (permalink / raw) To: gcc-patches, Sandra Loosemore, richard.sandiford On Mon, Aug 21, 2017 at 06:33:09PM +0100, Richard Sandiford wrote: > I think it's worth emphasising that tying operands doesn't change > whether an output needs an earlyclobber or not. E.g. for: Thanks for noticing this. It turns out that my OpenBLAS example actually ought to have an early-clobber on one of the tied outputs, so you've also alerted me to another bug in the power8 code. (Well, only if the dgemv kernel was called directly from user code with a 16*N A matrix, or I suppose if LTO was used.) So I now have a real-world example of the situation where you need an early-clobber on tied outputs, and also where an early-clobber is undesirable. Revised and expanded. * doc/extend.texi (Extended Asm <Clobbers>): Rename to "Clobbers and Scratch Registers". Add paragraph on alternative to clobbers for scratch registers and OpenBLAS example. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 940490e..cef6c57 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8075,7 +8075,7 @@ A comma-separated list of C expressions read by the instructions in the @item Clobbers A comma-separated list of registers or other values changed by the @var{AssemblerTemplate}, beyond those listed as outputs. -An empty list is permitted. @xref{Clobbers}. +An empty list is permitted. @xref{Clobbers and Scratch Registers}. @item GotoLabels When you are using the @code{goto} form of @code{asm}, this section contains @@ -8435,7 +8435,7 @@ The enclosing parentheses are a required part of the syntax. When the compiler selects the registers to use to represent the output operands, it does not use any of the clobbered registers -(@pxref{Clobbers}). 
+(@pxref{Clobbers and Scratch Registers}). Output operand expressions must be lvalues. The compiler cannot check whether the operands have data types that are reasonable for the instruction being @@ -8671,7 +8671,8 @@ as input. The enclosing parentheses are a required part of the syntax. @end table When the compiler selects the registers to use to represent the input -operands, it does not use any of the clobbered registers (@pxref{Clobbers}). +operands, it does not use any of the clobbered registers +(@pxref{Clobbers and Scratch Registers}). If there are no output operands but there are input operands, place two consecutive colons where the output operands would go: @@ -8722,9 +8723,10 @@ asm ("cmoveq %1, %2, %[result]" : "r" (test), "r" (new), "[result]" (old)); @end example -@anchor{Clobbers} -@subsubsection Clobbers +@anchor{Clobbers and Scratch Registers} +@subsubsection Clobbers and Scratch Registers @cindex @code{asm} clobbers +@cindex @code{asm} scratch registers While the compiler is aware of changes to entries listed in the output operands, the inline @code{asm} code may modify more than just the outputs. For @@ -8853,6 +8855,75 @@ dscal (size_t n, double *x, double alpha) @} @end smallexample +Rather than allocating fixed registers via clobbers to provide scratch +registers for an @code{asm} statement, an alternative is to define a +variable and make it an early-clobber output as with @code{a2} and +@code{a3} in the example below. This gives the compiler register +allocator more freedom. You can also define a variable and make it an +output tied to an input as with @code{a0} and @code{a1}, tied +respectively to @code{ap} and @code{lda}. Of course, with tied +outputs your @code{asm} can't use the input value after modifying the +output register since they are one and the same register. 
What's +more, if you omit the early-clobber on the output, it is possible that +GCC might allocate the same register to another of the inputs if GCC +could prove they had the same value on entry to the @code{asm}. This +is why @code{a1} has an early-clobber. Its tied input, @code{lda} +might conceivably be known to have the value 16 and without an +early-clobber share the same register as @code{%11}. On the other +hand, @code{ap} can't be the same as any of the other inputs, so an +early-clobber on @code{a0} is not needed. It is also not desirable in +this case. An early-clobber on @code{a0} would cause GCC to allocate +a separate register for the @code{"m" (*(const double (*)[]) ap)} +input. Note that tying an input to an output is the way to set up an +initialized temporary register modified by an @code{asm} statement. +An input not tied to an output is assumed by GCC to be unchanged, for +example @code{"b" (16)} below sets up @code{%11} to 16, and GCC might +use that register in following code if the value 16 happened to be +needed. You can even use a normal @code{asm} output for a scratch if +all inputs that might share the same register are consumed before the +scratch is used. The VSX registers clobbered by the @code{asm} +statement could have used this technique except for GCC's limit on the +number of @code{asm} parameters. 
+ +@smallexample +static void +dgemv_kernel_4x4 (long n, const double *ap, long lda, + const double *x, double *y, double alpha) +@{ + double *a0; + double *a1; + double *a2; + double *a3; + + __asm__ + ( + /* lots of asm here */ + "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n" + "#a0=%3 a1=%4 a2=%5 a3=%6" + : + "+m" (*(double (*)[n]) y), + "+r" (n), // 1 + "+b" (y), // 2 + "=b" (a0), // 3 + "=&b" (a1), // 4 + "=&b" (a2), // 5 + "=&b" (a3) // 6 + : + "m" (*(const double (*)[n]) x), + "m" (*(const double (*)[]) ap), + "d" (alpha), // 9 + "r" (x), // 10 + "b" (16), // 11 + "3" (ap), // 12 + "4" (lda) // 13 + : + "cr0", + "vs32","vs33","vs34","vs35","vs36","vs37", + "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47" + ); +@} +@end smallexample + @anchor{GotoLabels} @subsubsection Goto Labels @cindex @code{asm} goto labels -- Alan Modra Australia Development Lab, IBM ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Clobbers and Scratch Registers
  2017-08-22  6:32 ` Alan Modra
@ 2017-08-22  6:33   ` Alan Modra
  2017-10-12 18:36   ` Jeff Law
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Modra @ 2017-08-22  6:33 UTC
  To: gcc-patches, Sandra Loosemore, richard.sandiford

On Tue, Aug 22, 2017 at 01:41:21PM +0930, Alan Modra wrote:
> +     "#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
> +     "#a0=%3 a1=%4 a2=%5 a3=%6"
> +     :
> +     "+m" (*(double (*)[n]) y),
> +     "+r" (n),    // 1

Another small revision. That needs to be "+&r" (n), in case n can be
deduced to be 16, matching one of the other inputs.

-- 
Alan Modra
Australia Development Lab, IBM
* Re: Clobbers and Scratch Registers
  2017-08-22  6:32 ` Alan Modra
  2017-08-22  6:33   ` Alan Modra
@ 2017-10-12 18:36   ` Jeff Law
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff Law @ 2017-10-12 18:36 UTC
  To: Alan Modra, gcc-patches, Sandra Loosemore, richard.sandiford

On 08/21/2017 10:11 PM, Alan Modra wrote:
> On Mon, Aug 21, 2017 at 06:33:09PM +0100, Richard Sandiford wrote:
>> I think it's worth emphasising that tying operands doesn't change
>> whether an output needs an earlyclobber or not. E.g. for:
>
> Thanks for noticing this. It turns out that my OpenBLAS example
> actually ought to have an early-clobber on one of the tied outputs, so
> you've also alerted me to another bug in the power8 code. (Well, only
> if the dgemv kernel was called directly from user code with a 16*N A
> matrix, or I suppose if LTO was used.) So I now have a real-world
> example of the situation where you need an early-clobber on tied
> outputs, and also where an early-clobber is undesirable.
>
> Revised and expanded.
>
>     * doc/extend.texi (Extended Asm <Clobbers>): Rename to
>     "Clobbers and Scratch Registers". Add paragraph on
>     alternative to clobbers for scratch registers and OpenBLAS
>     example.

Also OK. I think you had a minor revision on this which is OK as well.

I never would have spotted the additional earlyclobber requirement.

jeff
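Richard's point quoted above, that tying an input to an output does not remove the need for an early-clobber, can be shown with a small sketch. It assumes x86-64 and GCC-style inline asm (plain C is used elsewhere); twice_plus and check_twice_plus are hypothetical names chosen for illustration. The output is modified before the second input is read, so it carries "&" even though it is tied to init.

```c
#include <assert.h>

/* Hypothetical helper: returns 2 * init + step.  */
static long
twice_plus (long init, long step)
{
#if defined (__x86_64__) && defined (__LP64__) && defined (__GNUC__)
  long acc;

  __asm__ ("addq %0, %0\n\t"   /* acc = init + init; output modified here ... */
           "addq %2, %0"       /* ... before step is read */
           : "=&r" (acc)       /* 0: early-clobber although tied to init */
           : "0" (init),       /* 1: starts out in the same register as acc */
             "r" (step)        /* 2: must stay in a different register */
           : "cc");
  return acc;
#else
  /* Fallback for targets where the asm above does not apply.  */
  return 2 * init + step;
#endif
}

/* Hypothetical self-check.  */
static void
check_twice_plus (void)
{
  /* Equal arguments are the risky case: without "&" the compiler may
     place step in the register that holds acc.  */
  assert (twice_plus (16, 16) == 48);
  assert (twice_plus (3, 4) == 10);
}
```

Without the "&", a compiler that can prove init and step are equal on entry is free to give both the same register, and the second addq would then add the already doubled value. This mirrors the lda and 16 case described in the thread.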
* Re: [PATCH] Asm memory constraints
  2017-08-21  7:36 ` Alan Modra
  2017-08-21  7:44   ` Clobbers and Scratch Registers Alan Modra
@ 2017-09-29  1:06   ` Alan Modra
  2017-10-12 18:27   ` Jeff Law
  2 siblings, 0 replies; 10+ messages in thread
From: Alan Modra @ 2017-09-29  1:06 UTC
  To: gcc-patches; +Cc: law

On Mon, Aug 21, 2017 at 10:29:30AM +0930, Alan Modra wrote:
> Fixed in this revised patch. The only controversial aspect now should
> be whether those array casts ought to be officially blessed. I've
> checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
> "=m" (*(T (*)[10]) ptr), all generate reasonable MEM_ATTRS handled
> apparently properly by alias.c and other code.

Ping
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01174.html

Needs a global reviewer to bless array casts in asm constraints.

-- 
Alan Modra
Australia Development Lab, IBM
* Re: [PATCH] Asm memory constraints
  2017-08-21  7:36 ` Alan Modra
  2017-08-21  7:44   ` Clobbers and Scratch Registers Alan Modra
  2017-09-29  1:06   ` [PATCH] Asm memory constraints Alan Modra
@ 2017-10-12 18:27   ` Jeff Law
  2 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2017-10-12 18:27 UTC
  To: Alan Modra, Segher Boessenkool; +Cc: gcc-patches

On 08/20/2017 06:59 PM, Alan Modra wrote:
> On Sun, Aug 20, 2017 at 08:00:53AM -0500, Segher Boessenkool wrote:
>> Hi Alan,
>>
>> On Sat, Aug 19, 2017 at 12:19:35AM +0930, Alan Modra wrote:
>>> +Flushing registers to memory has performance implications and may be
>>> +an issue for time-sensitive code. You can provide better information
>>> +to GCC to avoid this, as shown in the following examples. At a
>>> +minimum, aliasing rules allow GCC to know what memory @emph{doesn't}
>>> +need to be flushed. Also, if GCC can prove that all of the outputs of
>>> +a non-volatile @code{asm} statement are unused, then the @code{asm}
>>> +may be deleted. Removal of otherwise dead @code{asm} statements will
>>> +not happen if they clobber @code{"memory"}.
>>
>> void f(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x) : "memory"); }
>> void g(int x) { int z; asm("hcf %0,%1" : "=r"(z) : "r"(x)); }
>>
>> Both f and g are completely removed by the first jump pass immediately
>> after expand (via delete_trivially_dead_insns).
>>
>> Do you have a testcase for the behaviour you saw?
>
> Oh my. I was sure that was how "memory" worked! I see though that
> every gcc I have lying around, going all the way back to gcc-2.95,
> deletes the asm in your testcase. I definitely don't want to put
> something in the docs that is plain wrong, or just my idea of how
> things ought to work, so the last two sentences quoted above need to
> go. Thanks for the correction.
>
> Fixed in this revised patch. The only controversial aspect now should
> be whether those array casts ought to be officially blessed. I've
> checked that "=m" (*(T (*)[]) ptr), "=m" (*(T (*)[n]) ptr), and
> "=m" (*(T (*)[10]) ptr), all generate reasonable MEM_ATTRS handled
> apparently properly by alias.c and other code.
>
> For example, at -O3 the following shows gcc moving the read of "val"
> before the asm, while an asm using a "memory" clobber forces the read
> to occur after the asm.
>
> static int
> f (double *x)
> {
>   int res;
>   asm ("#%0 %1 %2" : "=r" (res) : "r" (x), "m" (*(double (*)[]) x));
>   return res;
> }
>
> int val = 123;
> double foo[10];
>
> int
> main ()
> {
>   int b = f (foo);
>   __builtin_printf ("%d %d\n", val, b);
>   return 0;
> }
>
>
> I'm also encouraged by comments like the following by rth in 2004
> (gcc/c/c-typeck.c), which say that using non-kosher lvalues in memory
> output constraints must continue to be supported.
>
>   /* ??? Really, this should not be here. Users should be using a
>      proper lvalue, dammit. But there's a long history of using casts
>      in the output operands. In cases like longlong.h, this becomes a
>      primitive form of typechecking -- if the cast can be removed, then
>      the output operand had a type of the proper width; otherwise we'll
>      get an error. Gross, but ... */
>   STRIP_NOPS (output);
>
>
>     * doc/extend.texi (Clobbers): Correct vax example. Delete old
>     example of a memory input for a string of known length. Move
>     commentary out of table. Add a number of new examples
>     covering array memory inputs.
> testsuite/
>     * gcc.target/i386/asm-mem.c: New test.

OK.

Sorry about the long wait.

jeff
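The array-cast memory input discussed in this message can be exercised with an asm that really reads the array. What follows is a sketch, assuming x86-64 and GCC (other compilers and targets take the plain C branch); sum_longs and check_sum_longs are hypothetical names. The "m" operand with the (*(const long (*)[n]) p) cast tells GCC that the asm reads p[0] through p[n-1], so no "memory" clobber is used.

```c
#include <assert.h>

/* Hypothetical helper: sums the n elements starting at p.  */
static long
sum_longs (const long *p, long n)
{
  long total = 0;
#if defined (__x86_64__) && defined (__LP64__) \
    && defined (__GNUC__) && !defined (__clang__)
  if (n > 0)
    {
      const long *q = p;  /* pointer register the asm advances */
      long count = n;     /* counter register the asm decrements */

      __asm__ ("1:\n\t"
               "addq (%1), %0\n\t"   /* total += *q */
               "addq $8, %1\n\t"     /* q++ */
               "decq %2\n\t"         /* count-- */
               "jnz 1b"
               : "+r" (total), "+r" (q), "+r" (count)
               : "m" (*(const long (*)[n]) p)  /* 3: memory read by the asm */
               : "cc");
    }
#else
  /* Fallback for targets where the asm above does not apply.  */
  for (long i = 0; i < n; i++)
    total += p[i];
#endif
  return total;
}

/* Hypothetical self-check.  */
static void
check_sum_longs (void)
{
  long v[4] = { 1, 2, 3, 4 };
  assert (sum_longs (v, 4) == 10);
}
```

Operand 3 is never named in the template. It exists only to describe the memory access, while q supplies the address actually used, which is the same split between "memory accessed" and "base register" that the patch documents.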
end of thread

Thread overview: 10+ messages
2017-08-18 17:51 [PATCH] Asm memory constraints Alan Modra
2017-08-21  0:59 ` Segher Boessenkool
2017-08-21  7:36   ` Alan Modra
2017-08-21  7:44     ` Clobbers and Scratch Registers Alan Modra
2017-08-21 19:03       ` Richard Sandiford
2017-08-22  6:32         ` Alan Modra
2017-08-22  6:33           ` Alan Modra
2017-10-12 18:36           ` Jeff Law
2017-09-29  1:06     ` [PATCH] Asm memory constraints Alan Modra
2017-10-12 18:27     ` Jeff Law