public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* PROPOSAL: Extend inline asm syntax with size spec
       [not found] <20181003213100.189959-1-namit@vmware.com>
@ 2018-10-07  9:46 ` Borislav Petkov
  2018-10-07 13:23   ` Segher Boessenkool
       [not found]   ` <4F2F1BCE-7875-4160-9E1E-9F8EF962D989@vmware.com>
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2018-10-07  9:46 UTC (permalink / raw)
  To: gcc, Richard Biener, Michael Matz
  Cc: Nadav Amit, Ingo Molnar, linux-kernel, x86, Masahiro Yamada,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Jan Beulich, Josh Poimboeuf, Juergen Gross,
	Kate Stewart, Kees Cook, linux-sparse, Peter Zijlstra,
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/include/asm/cpufeature.h#n162

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL <function> instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

 1) inline asm ("...")
 2) asm ("..." : : : : <size-expr>)
 3) asm ("...") __attribute__((asm_size(<size-expr>)));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thx.

On Wed, Oct 03, 2018 at 02:30:50PM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: kernel code
> that does not get inlined despite its simplicity. There are several
> causes for this behavior: "cold" attribute on __init, different function
> optimization levels; conditional constant computations based on
> __builtin_constant_p(); and finally large inline assembly blocks.
> 
> This patch-set deals with the inline assembly problem. I separated these
> patches from the others (that were sent in the RFC) for easier
> inclusion. I also separated the removal of unnecessary new-lines which
> would be sent separately.
> 
> The problem with inline assembly is that inline assembly is often used
> by the kernel for things that are other than code - for example,
> assembly directives and data. GCC however is oblivious to the content of
> the blocks and assumes their cost in space and time is proportional to
> the number of the perceived assembly "instruction", according to the
> number of newlines and semicolons. Alternatives, paravirt and other
> mechanisms are affected, causing code not to be inlined, and degrading
> compilation quality in general.
> 
> The solution that this patch-set carries for this problem is to create
> an assembly macro, and then call it from the inline assembly block.  As
> a result, the compiler sees a single "instruction" and assigns the more
> appropriate cost to the code.
> 
> To avoid uglification of the code, as many noted, the macros are first
> precompiled into an assembly file, which is later assembled together
> with the C files. This also enables to avoid duplicate implementation
> that was set before for the asm and C code. This can be seen in the
> exception table changes.
> 
> Overall this patch-set slightly increases the kernel size (my build was
> done using my Ubuntu 18.04 config + localyesconfig for the record):
> 
>    text	   data	    bss	    dec	    hex	filename
> 18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)
> 
> The number of static functions in the image is reduced by 379, but
> actually inlining is even better, which does not always shows in these
> numbers: a function may be inlined causing the calling function not to
> be inlined.
> 
> I ran some limited number of benchmarks, and in general the performance
> impact is not very notable. You can still see >10 cycles shaved off some
> syscalls that manipulate page-tables (e.g., mprotect()), in which
> paravirt caused many functions not to be inlined. In addition this
> patch-set can prevent issues such as [1], and improves code readability
> and maintainability.
> 
> Update: Rasmus recently caused me (inadvertently) to become paranoid
> about the dependencies. To clarify: if any of the headers changes, any c
> file which uses macros that are included in macros.S would be fine as
> long as it includes the header as well (as it should). Adding an
> assertion to check this is done might become slightly ugly, and nobody
> else is concerned about it. Another minor issue is that changes of
> macros.S would not trigger a global rebuild, but that is pretty similar
> to changes of the Makefile that do not trigger a rebuild.
> 
> [1] https://patchwork.kernel.org/patch/10450037/
> 
> v8->v9: * Restoring the '-pipe' parameter (Rasmus)
> 	* Adding Kees's tested-by tag (Kees)
> 
> v7->v8:	* Add acks (Masahiro, Max)
> 	* Rebase on 4.19 (Ingo)
> 
> v6->v7: * Fix context switch tracking (Ingo)
> 	* Fix xtensa build error (Ingo)
> 	* Rebase on 4.18-rc8
> 
> v5->v6:	* Removing more code from jump-labels (PeterZ)
> 	* Fix build issue on i386 (0-day, PeterZ)
> 
> v4->v5:	* Makefile fixes (Masahiro, Sam)
> 
> v3->v4: * Changed naming of macros in 2 patches (PeterZ)
> 	* Minor cleanup of the paravirt patch
> 
> v2->v3: * Several build issues resolved (0-day)
> 	* Wrong comments fix (Josh)
> 	* Change asm vs C order in refcount (Kees)
> 
> v1->v2:	* Compiling the macros into a separate .s file, improving
> 	  readability (Linus)
> 	* Improving assembly formatting, applying most of the comments
> 	  according to my judgment (Jan)
> 	* Adding exception-table, cpufeature and jump-labels
> 	* Removing new-line cleanup; to be submitted separately
> 
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: Alok Kataria <akataria@vmware.com>
> Cc: Christopher Li <sparse@chrisli.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jan Beulich <JBeulich@suse.com>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Kate Stewart <kstewart@linuxfoundation.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: linux-sparse@vger.kernel.org
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Philippe Ombredanne <pombredanne@nexb.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: virtualization@lists.linux-foundation.org
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: x86@kernel.org
> Cc: Chris Zankel <chris@zankel.net>
> Cc: Max Filippov <jcmvbkbc@gmail.com>
> Cc: linux-xtensa@linux-xtensa.org
> 
> Nadav Amit (10):
>   xtensa: defining LINKER_SCRIPT for the linker script
>   Makefile: Prepare for using macros for inline asm
>   x86: objtool: use asm macro for better compiler decisions
>   x86: refcount: prevent gcc distortions
>   x86: alternatives: macrofy locks for better inlining
>   x86: bug: prevent gcc distortions
>   x86: prevent inline distortion by paravirt ops
>   x86: extable: use macros instead of inline assembly
>   x86: cpufeature: use macros instead of inline assembly
>   x86: jump-labels: use macros instead of inline assembly
> 
>  Makefile                               |  9 ++-
>  arch/x86/Makefile                      |  7 ++
>  arch/x86/entry/calling.h               |  2 +-
>  arch/x86/include/asm/alternative-asm.h | 20 ++++--
>  arch/x86/include/asm/alternative.h     | 11 +--
>  arch/x86/include/asm/asm.h             | 61 +++++++---------
>  arch/x86/include/asm/bug.h             | 98 +++++++++++++++-----------
>  arch/x86/include/asm/cpufeature.h      | 82 ++++++++++++---------
>  arch/x86/include/asm/jump_label.h      | 77 ++++++++------------
>  arch/x86/include/asm/paravirt_types.h  | 56 +++++++--------
>  arch/x86/include/asm/refcount.h        | 74 +++++++++++--------
>  arch/x86/kernel/macros.S               | 16 +++++
>  arch/xtensa/kernel/Makefile            |  4 +-
>  include/asm-generic/bug.h              |  8 +--
>  include/linux/compiler.h               | 56 +++++++++++----
>  scripts/Kbuild.include                 |  4 +-
>  scripts/mod/Makefile                   |  2 +
>  17 files changed, 331 insertions(+), 256 deletions(-)
>  create mode 100644 arch/x86/kernel/macros.S
> 
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07  9:46 ` PROPOSAL: Extend inline asm syntax with size spec Borislav Petkov
@ 2018-10-07 13:23   ` Segher Boessenkool
  2018-10-07 14:14     ` Borislav Petkov
  2018-10-07 15:53     ` Michael Matz
       [not found]   ` <4F2F1BCE-7875-4160-9E1E-9F8EF962D989@vmware.com>
  1 sibling, 2 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-07 13:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: gcc, Richard Biener, Michael Matz, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> this is an attempt to see whether gcc's inline asm heuristic when
> estimating inline asm statements' cost for better inlining can be
> improved.

GCC already estimates the *size* of inline asm, and this is required
*for correctness*.  So any workaround that works against this will only
end in tears.

Taking size as an estimate of cost is not very good.  But in your
motivating example you actually *do* care mostly about size, namely, for
the inlining decisions.

So I guess the real issue is that the inline asm size estimate for x86
isn't very good (since it has to be pessimistic, and x86 insns can be
huge)?

> Now, Richard suggested doing something like:
> 
>  1) inline asm ("...")

What would the semantics of this be?

>  2) asm ("..." : : : : <size-expr>)

This potentially conflicts with the syntax for asm goto.

>  3) asm ("...") __attribute__((asm_size(<size-expr>)));

Eww.

> with which user can tell gcc what the size of that inline asm statement
> is and thus allow for more precise cost estimation and in the end better
> inlining.

More precise *size* estimates, yes.  And if the user lies he should not
be surprised to get assembler errors, etc.

> And FWIW 3) looks pretty straight-forward to me because attributes are
> pretty common anyways.

I don't like 2) either.  But 1) looks interesting, depends what its
semantics would be?  "Don't count this insn's size for inlining decisions",
maybe?


Another option is to just force inlining for those few functions where
GCC currently makes an inlining decision you don't like.  Or are there
more than a few?


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 13:23   ` Segher Boessenkool
@ 2018-10-07 14:14     ` Borislav Petkov
  2018-10-07 15:14       ` Segher Boessenkool
  2018-10-07 15:53     ` Michael Matz
  1 sibling, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2018-10-07 14:14 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc, Richard Biener, Michael Matz, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Sun, Oct 07, 2018 at 08:22:28AM -0500, Segher Boessenkool wrote:
> GCC already estimates the *size* of inline asm, and this is required
> *for correctness*.

I didn't say it didn't - but the heuristic could use improving.

> So I guess the real issue is that the inline asm size estimate for x86
> isn't very good (since it has to be pessimistic, and x86 insns can be
> huge)?

Well, the size thing could be just a "parameter" or "hint" of sorts, to
tell gcc to inline the function X which is inlining the asm statement
into the function Y which is calling function X. If you look at the
patchset, it is moving everything to asm macros where gcc is apparently
able to do better inlining.

> >  3) asm ("...") __attribute__((asm_size(<size-expr>)));
> 
> Eww.

Why?

> More precise *size* estimates, yes.  And if the user lies he should not
> be surprised to get assembler errors, etc.

Yes.

Another option would be if gcc parses the inline asm directly and
does a more precise size estimation. Which is a lot more involved and
complicated solution so I guess we wanna look at the simpler ones first.

:-)

> I don't like 2) either.  But 1) looks interesting, depends what its
> semantics would be?  "Don't count this insn's size for inlining decisions",
> maybe?

Or simply "this asm statement has a size of 1" to mean, inline it
everywhere. Which has the same caveats as above.

> Another option is to just force inlining for those few functions where
> GCC currently makes an inlining decision you don't like.  Or are there
> more than a few?

I'm afraid they're more than a few and this should work automatically,
if possible.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 14:14     ` Borislav Petkov
@ 2018-10-07 15:14       ` Segher Boessenkool
       [not found]         ` <20181008055838.GA66819@gmail.com>
  0 siblings, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-07 15:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: gcc, Richard Biener, Michael Matz, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Sun, Oct 07, 2018 at 04:13:49PM +0200, Borislav Petkov wrote:
> On Sun, Oct 07, 2018 at 08:22:28AM -0500, Segher Boessenkool wrote:
> > GCC already estimates the *size* of inline asm, and this is required
> > *for correctness*.
> 
> I didn't say it didn't - but the heuristic could use improving.

How?  It is as sharp an estimate as can be *already*: number of insns
times maximum size per insn.

If you get this wrong, conditional branches (and similar things, but
conditional branches usually hit first, and hurt most) will stop working
correctly, unless binutils uses relaxation for those on your architecture
(most don't).

> > So I guess the real issue is that the inline asm size estimate for x86
> > isn't very good (since it has to be pessimistic, and x86 insns can be
> > huge)?
> 
> Well, the size thing could be just a "parameter" or "hint" of sorts, to
> tell gcc to inline the function X which is inlining the asm statement
> into the function Y which is calling function X. If you look at the
> patchset, it is moving everything to asm macros where gcc is apparently
> able to do better inlining.

Yes, that will cause fewer problems I think: do not override size _at all_,
but give a hint to the inliner.

> > >  3) asm ("...") __attribute__((asm_size(<size-expr>)));
> > 
> > Eww.
> 
> Why?

Attributes are clumsy and clunky and kludgy.

It never is well-defined how attributes interact, and the more attributes
we define and use, the more that matters.

> > More precise *size* estimates, yes.  And if the user lies he should not
> > be surprised to get assembler errors, etc.
> 
> Yes.
> 
> Another option would be if gcc parses the inline asm directly and
> does a more precise size estimation. Which is a lot more involved and
> complicated solution so I guess we wanna look at the simpler ones first.
> 
> :-)

Which is *impossible* to do.  Inline assembler is free-form text.

> > I don't like 2) either.  But 1) looks interesting, depends what its
> > semantics would be?  "Don't count this insn's size for inlining decisions",
> > maybe?
> 
> Or simply "this asm statement has a size of 1" to mean, inline it
> everywhere. Which has the same caveats as above.

"Has minimum length" then (size 1 cannot work on most archs).

> > Another option is to just force inlining for those few functions where
> > GCC currently makes an inlining decision you don't like.  Or are there
> > more than a few?
> 
> I'm afraid they're more than a few and this should work automatically,
> if possible.

Would counting *all* asms as having minimum length for inlining decisions
work?  Will that give bad side effects?

Or since this problem is quite specific to x86, maybe some target hook is
wanted?  Things work quite well elsewhere as-is, degrading that is not a
good idea.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 13:23   ` Segher Boessenkool
  2018-10-07 14:14     ` Borislav Petkov
@ 2018-10-07 15:53     ` Michael Matz
  2018-10-08  7:31       ` Segher Boessenkool
       [not found]       ` <20181008061323.GB66819@gmail.com>
  1 sibling, 2 replies; 40+ messages in thread
From: Michael Matz @ 2018-10-07 15:53 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Borislav Petkov, gcc, Richard Biener, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

Hi Segher,

On Sun, 7 Oct 2018, Segher Boessenkool wrote:

> On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > this is an attempt to see whether gcc's inline asm heuristic when
> > estimating inline asm statements' cost for better inlining can be
> > improved.
> 
> GCC already estimates the *size* of inline asm, and this is required
> *for correctness*.  So any workaround that works against this will only
> end in tears.

You're right and wrong.  GCC can't even estimate the size of mildly 
complicated inline asms right now, so your claim of it being necessary for 
correctness can't be true in this absolute form.  I know what you try to 
say, but still, consider inline asms like this:

     insn1
  .section bla
     insn2
  .previous

or
   invoke_asm_macro foo,bar

in both cases GCCs size estimate will be wrong however you want to count 
it.  This is actually the motivating example for the kernel guys, the 
games they play within their inline asms make the estimates be wildly 
wrong to a point it interacts with the inliner.

> So I guess the real issue is that the inline asm size estimate for x86 
> isn't very good (since it has to be pessimistic, and x86 insns can be 
> huge)?

No, see above, even if we were to improve the size estimates (e.g. based 
on some average instruction size) the kernel examples would still be off 
because they switch sections back and forth, use asm macros and computed 
.fill directives and maybe further stuff.  GCC will never be able to 
accurately calculate these sizes (without an built-in assembler which 
hopefully noone proposes).

So, there is a case for extending the inline-asm facility to say 
"size is complicated here, assume this for inline decisions".

> > Now, Richard suggested doing something like:
> > 
> >  1) inline asm ("...")
> 
> What would the semantics of this be?

The size of the inline asm wouldn't be counted towards the inliner size 
limits (or be counted as "1").

> I don't like 2) either.  But 1) looks interesting, depends what its
> semantics would be?  "Don't count this insn's size for inlining decisions",
> maybe?

TBH, I like the inline asm (...) suggestion most currently, but what if we 
want to add more attributes to asms?  We could add further special 
keywords to the clobber list:
  asm ("...." : : : "cc,memory,inline");
sure, it might seem strange to "clobber" inline, but if we reinterpret the 
clobber list as arbitrary set of attributes for this asm, it'd be fine.

> Another option is to just force inlining for those few functions where 
> GCC currently makes an inlining decision you don't like.  Or are there 
> more than a few?

I think the examples I saw from Boris were all indirect inlines:

  static inline void foo() { asm("large-looking-but-small-asm"); }
  static void bar1() { ... foo() ... }
  static void bar2() { ... foo() ... }
  void goo (void) { bar1(); }  // bar1 should have been inlined

So, while the immediate asm user was marked as always inline that in turn 
caused users of it to become non-inlined.  I'm assuming the kernel guys 
did proper measurements that they _really_ get some non-trivial speed 
benefit by inlining bar1/bar2, but for some reasons (I didn't inquire) 
didn't want to mark them all as inline as well.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [RESEND] PROPOSAL: Extend inline asm syntax with size spec
       [not found]   ` <4F2F1BCE-7875-4160-9E1E-9F8EF962D989@vmware.com>
@ 2018-10-07 16:13     ` Nadav Amit
  2018-10-07 16:46     ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Nadav Amit @ 2018-10-07 16:13 UTC (permalink / raw)
  To: Borislav Petkov, gcc, Richard Biener, Michael Matz
  Cc: Ingo Molnar, LKML, X86 ML, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

Second try after being blocked by gcc mailing list:

at 9:09 AM, Nadav Amit <namit@vmware.com<mailto:namit@vmware.com>> wrote:

at 2:18 AM, Borislav Petkov <bp@alien8.de<mailto:bp@alien8.de>> wrote:

Hi people,

this is an attempt to see whether gcc's inline asm heuristic when
estimating inline asm statements' cost for better inlining can be
improved.

AFAIU, the problematic arises when one ends up using a lot of inline
asm statements in the kernel but due to the inline asm cost estimation
heuristic which counts lines, I think, for example like in this here
macro:

https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&amp;sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&amp;reserved=0

the resulting code ends up not inlining the functions themselves which
use this macro. I.e., you see a CALL <function> instead of its body
getting inlined directly.

Even though it should be because the actual instructions are only a
couple in most cases and all those other directives end up in another
section anyway.

The issue is explained below in the forwarded mail in a larger detail
too.

Now, Richard suggested doing something like:

1) inline asm ("...")
2) asm ("..." : : : : <size-expr>)
3) asm ("...") __attribute__((asm_size(<size-expr>)));

with which user can tell gcc what the size of that inline asm statement
is and thus allow for more precise cost estimation and in the end better
inlining.

And FWIW 3) looks pretty straight-forward to me because attributes are
pretty common anyways.

But I'm sure there are other options and I'm sure people will have
better/different ideas so feel free to chime in.

Thanks for taking care of it. I would like to mention a second issue, since
you may want to resolve both with a single solution: not inlining
conditional __builtin_constant_p(), in which there are two code-paths - one
for constants and one for variables.

Consider for example the Linux kernel ilog2 macro, which has a condition
based on __builtin_constant_p() (
https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
). The compiler mistakenly considers the “heavy” code-path that is supposed
to be evaluated only in compilation time to evaluate the code size. This
causes the kernel to consider functions such as kmalloc() as “big”.
kmalloc() is marked with always_inline attribute, so instead the calling
functions, such as kzalloc() are not inlined.

When I thought about hacking gcc to solve this issue, I considered an
intrinsic that would override the cost of a given statement. This solution
is not too nice, but may solve both issues.

In addition, note that AFAIU the impact of a wrong cost of code estimation
can also impact loop and other optimizations.

Regards,
Nadav



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]   ` <4F2F1BCE-7875-4160-9E1E-9F8EF962D989@vmware.com>
  2018-10-07 16:13     ` [RESEND] " Nadav Amit
@ 2018-10-07 16:46     ` Richard Biener
  2018-10-07 19:06       ` Nadav Amit
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Biener @ 2018-10-07 16:46 UTC (permalink / raw)
  To: Nadav Amit, Borislav Petkov, gcc, Michael Matz
  Cc: Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <namit@vmware.com> wrote:
>at 2:18 AM, Borislav Petkov <bp@alien8.de> wrote:
>
>> Hi people,
>> 
>> this is an attempt to see whether gcc's inline asm heuristic when
>> estimating inline asm statements' cost for better inlining can be
>> improved.
>> 
>> AFAIU, the problematic arises when one ends up using a lot of inline
>> asm statements in the kernel but due to the inline asm cost
>estimation
>> heuristic which counts lines, I think, for example like in this here
>> macro:
>> 
>>
>https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C6db1258c65ea45bbe4ea08d62c35ceec%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745007006838299&amp;sdata=iehl%2Fb8h%2BZE%2Frqb4qjac19WekSgOObc9%2BM1Jto1VgF4%3D&amp;reserved=0
>> 
>> the resulting code ends up not inlining the functions themselves
>which
>> use this macro. I.e., you see a CALL <function> instead of its body
>> getting inlined directly.
>> 
>> Even though it should be because the actual instructions are only a
>> couple in most cases and all those other directives end up in another
>> section anyway.
>> 
>> The issue is explained below in the forwarded mail in a larger detail
>> too.
>> 
>> Now, Richard suggested doing something like:
>> 
>> 1) inline asm ("...")
>> 2) asm ("..." : : : : <size-expr>)
>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>> 
>> with which user can tell gcc what the size of that inline asm
>statement
>> is and thus allow for more precise cost estimation and in the end
>better
>> inlining.
>> 
>> And FWIW 3) looks pretty straight-forward to me because attributes
>are
>> pretty common anyways.
>> 
>> But I'm sure there are other options and I'm sure people will have
>> better/different ideas so feel free to chime in.
>
>Thanks for taking care of it. I would like to mention a second issue,
>since
>you may want to resolve both with a single solution: not inlining
>conditional __builtin_constant_p(), in which there are two code-paths -
>one
>for constants and one for variables.
>
>Consider for example the Linux kernel ilog2 macro, which has a
>condition
>based on __builtin_constant_p() (
>https://elixir.bootlin.com/linux/v4.19-rc7/source/include/linux/log2.h#L160
>). The compiler mistakenly considers the “heavy” code-path that is
>supposed
>to be evaluated only in compilation time to evaluate the code size.

But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do). 

Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course. 

Richard. 

>This
>causes the kernel to consider functions such as kmalloc() as “big”.
>kmalloc() is marked with always_inline attribute, so instead the
>calling
>functions, such as kzalloc() are not inlined.
>
>When I thought about hacking gcc to solve this issue, I considered an
>intrinsic that would override the cost of a given statement. This
>solution
>is not too nice, but may solve both issues.
>
>In addition, note that AFAIU the impact of a wrong cost of code
>estimation
>can also impact loop and other optimizations.
>
>Regards,
>Nadav

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 16:46     ` Richard Biener
@ 2018-10-07 19:06       ` Nadav Amit
  2018-10-07 19:52         ` Jeff Law
  2018-10-08  7:46         ` Richard Biener
  0 siblings, 2 replies; 40+ messages in thread
From: Nadav Amit @ 2018-10-07 19:06 UTC (permalink / raw)
  To: Richard Biener, Borislav Petkov, gcc, Michael Matz
  Cc: Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

at 9:46 AM, Richard Biener <rguenther@suse.de> wrote:

> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <namit@vmware.com> wrote:
>> at 2:18 AM, Borislav Petkov <bp@alien8.de> wrote:
>> 
>>> Hi people,
>>> 
>>> this is an attempt to see whether gcc's inline asm heuristic when
>>> estimating inline asm statements' cost for better inlining can be
>>> improved.
>>> 
>>> AFAIU, the problematic arises when one ends up using a lot of inline
>>> asm statements in the kernel but due to the inline asm cost
>> estimation
>>> heuristic which counts lines, I think, for example like in this here
>>> macro:
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
>>> the resulting code ends up not inlining the functions themselves
>> which
>>> use this macro. I.e., you see a CALL <function> instead of its body
>>> getting inlined directly.
>>> 
>>> Even though it should be because the actual instructions are only a
>>> couple in most cases and all those other directives end up in another
>>> section anyway.
>>> 
>>> The issue is explained below in the forwarded mail in a larger detail
>>> too.
>>> 
>>> Now, Richard suggested doing something like:
>>> 
>>> 1) inline asm ("...")
>>> 2) asm ("..." : : : : <size-expr>)
>>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>> 
>>> with which user can tell gcc what the size of that inline asm
>> statement
>>> is and thus allow for more precise cost estimation and in the end
>> better
>>> inlining.
>>> 
>>> And FWIW 3) looks pretty straight-forward to me because attributes
>> are
>>> pretty common anyways.
>>> 
>>> But I'm sure there are other options and I'm sure people will have
>>> better/different ideas so feel free to chime in.
>> 
>> Thanks for taking care of it. I would like to mention a second issue,
>> since
>> you may want to resolve both with a single solution: not inlining
>> conditional __builtin_constant_p(), in which there are two code-paths -
>> one
>> for constants and one for variables.
>> 
>> Consider for example the Linux kernel ilog2 macro, which has a
>> condition
>> based on __builtin_constant_p() (
>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
>> ). The compiler mistakenly considers the “heavy” code-path that is
>> supposed
>> to be evaluated only in compilation time to evaluate the code size.
> 
> But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do). 
> 
> Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course. 

I understand that this is might not be the right way to implement macros
such as ilog2() and test_bit(), but this code is around for some time.

I thought of using __builtin_choose_expr() instead of ternary operator, but
this introduces a different problem, as the variable version is used instead
of the constant one in many cases. From my brief experiments with llvm, it
appears that llvm does not have both of these issues (wrong cost attributed
to inline asm and conditions based on __builtin_constant_p()).

So what alternative do you propose to implement ilog2() like behavior? I was
digging through the gcc code to find a workaround with no success.

Thanks,
Nadav


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 19:06       ` Nadav Amit
@ 2018-10-07 19:52         ` Jeff Law
  2018-10-08  7:46         ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Jeff Law @ 2018-10-07 19:52 UTC (permalink / raw)
  To: Nadav Amit, Richard Biener, Borislav Petkov, gcc, Michael Matz
  Cc: Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov

On 10/7/18 1:06 PM, Nadav Amit wrote:
> at 9:46 AM, Richard Biener <rguenther@suse.de> wrote:
> 
>> On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <namit@vmware.com> wrote:
>>> at 2:18 AM, Borislav Petkov <bp@alien8.de> wrote:
>>>
>>>> Hi people,
>>>>
>>>> this is an attempt to see whether gcc's inline asm heuristic when
>>>> estimating inline asm statements' cost for better inlining can be
>>>> improved.
>>>>
>>>> AFAIU, the problematic arises when one ends up using a lot of inline
>>>> asm statements in the kernel but due to the inline asm cost
>>> estimation
>>>> heuristic which counts lines, I think, for example like in this here
>>>> macro:
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
>>>> the resulting code ends up not inlining the functions themselves
>>> which
>>>> use this macro. I.e., you see a CALL <function> instead of its body
>>>> getting inlined directly.
>>>>
>>>> Even though it should be because the actual instructions are only a
>>>> couple in most cases and all those other directives end up in another
>>>> section anyway.
>>>>
>>>> The issue is explained below in the forwarded mail in a larger detail
>>>> too.
>>>>
>>>> Now, Richard suggested doing something like:
>>>>
>>>> 1) inline asm ("...")
>>>> 2) asm ("..." : : : : <size-expr>)
>>>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
>>>>
>>>> with which user can tell gcc what the size of that inline asm
>>> statement
>>>> is and thus allow for more precise cost estimation and in the end
>>> better
>>>> inlining.
>>>>
>>>> And FWIW 3) looks pretty straight-forward to me because attributes
>>> are
>>>> pretty common anyways.
>>>>
>>>> But I'm sure there are other options and I'm sure people will have
>>>> better/different ideas so feel free to chime in.
>>>
>>> Thanks for taking care of it. I would like to mention a second issue,
>>> since
>>> you may want to resolve both with a single solution: not inlining
>>> conditional __builtin_constant_p(), in which there are two code-paths -
>>> one
>>> for constants and one for variables.
>>>
>>> Consider for example the Linux kernel ilog2 macro, which has a
>>> condition
>>> based on __builtin_constant_p() (
>>> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
>>> ). The compiler mistakenly considers the “heavy” code-path that is
>>> supposed
>>> to be evaluated only in compilation time to evaluate the code size.
>>
>> But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do). 
>>
>> Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course. 
> 
> I understand that this is might not be the right way to implement macros
> such as ilog2() and test_bit(), but this code is around for some time.
That doesn't make it right -- and there's been numerous bogus bugs
reported against ilog2 because the authors of ilog2 haven't had a clear
understanding of the semantics of builtin_constant_p.


Jeff

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 15:53     ` Michael Matz
@ 2018-10-08  7:31       ` Segher Boessenkool
  2018-10-08  9:09         ` Richard Biener
       [not found]       ` <20181008061323.GB66819@gmail.com>
  1 sibling, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-08  7:31 UTC (permalink / raw)
  To: Michael Matz
  Cc: Borislav Petkov, gcc, Richard Biener, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

Hi!

On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > this is an attempt to see whether gcc's inline asm heuristic when
> > > estimating inline asm statements' cost for better inlining can be
> > > improved.
> > 
> > GCC already estimates the *size* of inline asm, and this is required
> > *for correctness*.  So any workaround that works against this will only
> > end in tears.
> 
> You're right and wrong.  GCC can't even estimate the size of mildly 
> complicated inline asms right now, so your claim of it being necessary for 
> correctness can't be true in this absolute form.  I know what you try to 
> say, but still, consider inline asms like this:
> 
>      insn1
>   .section bla
>      insn2
>   .previous
> 
> or
>    invoke_asm_macro foo,bar
> 
> in both cases GCCs size estimate will be wrong however you want to count 
> it.  This is actually the motivating example for the kernel guys, the 
> games they play within their inline asms make the estimates be wildly 
> wrong to a point it interacts with the inliner.

Right.  The manual says:

"""
Some targets require that GCC track the size of each instruction used
in order to generate correct code.  Because the final length of the
code produced by an @code{asm} statement is only known by the
assembler, GCC must make an estimate as to how big it will be.  It
does this by counting the number of instructions in the pattern of the
@code{asm} and multiplying that by the length of the longest
instruction supported by that processor.  (When working out the number
of instructions, it assumes that any occurrence of a newline or of
whatever statement separator character is supported by the assembler --
typically @samp{;} --- indicates the end of an instruction.)

Normally, GCC's estimate is adequate to ensure that correct
code is generated, but it is possible to confuse the compiler if you use
pseudo instructions or assembler macros that expand into multiple real
instructions, or if you use assembler directives that expand to more
space in the object file than is needed for a single instruction.
If this happens then the assembler may produce a diagnostic saying that
a label is unreachable.
"""

It *is* necessary for correctness, except you can do things that can
confuse the compiler and then you are on your own anyway.

> > So I guess the real issue is that the inline asm size estimate for x86 
> > isn't very good (since it has to be pessimistic, and x86 insns can be 
> > huge)?
> 
> No, see above, even if we were to improve the size estimates (e.g. based 
> on some average instruction size) the kernel examples would still be off 
> because they switch sections back and forth, use asm macros and computed 
> .fill directives and maybe further stuff.  GCC will never be able to 
> accurately calculate these sizes

What *is* such a size, anyway?  If it can be spread over multiple sections
(some of which support section merging), and you can have huge alignments,
etc.  What is needed here is not knowing the maximum size of the binary
output (however you want to define that), but some way for the compiler
to understand how bad it is to inline some assembler.  Maybe manual
direction, maybe just the current jeuristics can be tweaked a bit, maybe
we need to invent some attribute or two.

> (without an built-in assembler which hopefully noone proposes).

Not me, that's for sure.

> So, there is a case for extending the inline-asm facility to say 
> "size is complicated here, assume this for inline decisions".

Yeah, that's an option.  It may be too complicated though, or just not
useful in its generality, so that everyone will use "1" (or "1 normal
size instruction"), and then we are better off just making something
for _that_ (or making it the default).

> > > Now, Richard suggested doing something like:
> > > 
> > >  1) inline asm ("...")
> > 
> > What would the semantics of this be?
> 
> The size of the inline asm wouldn't be counted towards the inliner size 
> limits (or be counted as "1").

That sounds like a good option.

> > I don't like 2) either.  But 1) looks interesting, depends what its
> > semantics would be?  "Don't count this insn's size for inlining decisions",
> > maybe?
> 
> TBH, I like the inline asm (...) suggestion most currently, but what if we 
> want to add more attributes to asms?  We could add further special 
> keywords to the clobber list:
>   asm ("...." : : : "cc,memory,inline");
> sure, it might seem strange to "clobber" inline, but if we reinterpret the 
> clobber list as arbitrary set of attributes for this asm, it'd be fine.

All of a targets register names and alternative register names are
allowed in the clobber list.  Will that never conflict with an attribute
name?  We already *have* syntax for specifying attributes on an asm (on
*any* statement even), so mixing these two things has no advantage.

Both "cc" and "memory" have their own problems of course, adding more
things to this just feels bad.  It may not be so bad ;-)

> > Another option is to just force inlining for those few functions where 
> > GCC currently makes an inlining decision you don't like.  Or are there 
> > more than a few?
> 
> I think the examples I saw from Boris were all indirect inlines:
> 
>   static inline void foo() { asm("large-looking-but-small-asm"); }
>   static void bar1() { ... foo() ... }
>   static void bar2() { ... foo() ... }
>   void goo (void) { bar1(); }  // bar1 should have been inlined
> 
> So, while the immediate asm user was marked as always inline that in turn 
> caused users of it to become non-inlined.  I'm assuming the kernel guys 
> did proper measurements that they _really_ get some non-trivial speed 
> benefit by inlining bar1/bar2, but for some reasons (I didn't inquire) 
> didn't want to mark them all as inline as well.

Yeah that makes sense, like if this happens with the fixup stuff, it will
quickly spiral out of control.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-07 19:06       ` Nadav Amit
  2018-10-07 19:52         ` Jeff Law
@ 2018-10-08  7:46         ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-08  7:46 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Borislav Petkov, gcc, Michael Matz, Ingo Molnar, linux-kernel,
	x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria, Christopher Li,
	Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich, Josh Poimboeuf,
	Juergen Gross, Kate Stewart, Kees Cook, linux-sparse,
	Peter Zijlstra, Philippe Ombredanne, Thomas Gleixner,
	virtualization, Linus Torvalds, Chris Zankel, Max Filippov,
	linux-xtensa

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

On Sun, 7 Oct 2018, Nadav Amit wrote:

> at 9:46 AM, Richard Biener <rguenther@suse.de> wrote:
> 
> > On October 7, 2018 6:09:30 PM GMT+02:00, Nadav Amit <namit@vmware.com> wrote:
> >> at 2:18 AM, Borislav Petkov <bp@alien8.de> wrote:
> >> 
> >>> Hi people,
> >>> 
> >>> this is an attempt to see whether gcc's inline asm heuristic when
> >>> estimating inline asm statements' cost for better inlining can be
> >>> improved.
> >>> 
> >>> AFAIU, the problematic arises when one ends up using a lot of inline
> >>> asm statements in the kernel but due to the inline asm cost
> >> estimation
> >>> heuristic which counts lines, I think, for example like in this here
> >>> macro:
> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2Farch%2Fx86%2Finclude%2Fasm%2Fcpufeature.h%23n162&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975505381&amp;sdata=Nd0636K9Z1IsUs1RWSRAhVuVboLxlBCB4peiAMfmQzQ%3D&amp;reserved=0
> >>> the resulting code ends up not inlining the functions themselves
> >> which
> >>> use this macro. I.e., you see a CALL <function> instead of its body
> >>> getting inlined directly.
> >>> 
> >>> Even though it should be because the actual instructions are only a
> >>> couple in most cases and all those other directives end up in another
> >>> section anyway.
> >>> 
> >>> The issue is explained below in the forwarded mail in a larger detail
> >>> too.
> >>> 
> >>> Now, Richard suggested doing something like:
> >>> 
> >>> 1) inline asm ("...")
> >>> 2) asm ("..." : : : : <size-expr>)
> >>> 3) asm ("...") __attribute__((asm_size(<size-expr>)));
> >>> 
> >>> with which user can tell gcc what the size of that inline asm
> >> statement
> >>> is and thus allow for more precise cost estimation and in the end
> >> better
> >>> inlining.
> >>> 
> >>> And FWIW 3) looks pretty straight-forward to me because attributes
> >> are
> >>> pretty common anyways.
> >>> 
> >>> But I'm sure there are other options and I'm sure people will have
> >>> better/different ideas so feel free to chime in.
> >> 
> >> Thanks for taking care of it. I would like to mention a second issue,
> >> since
> >> you may want to resolve both with a single solution: not inlining
> >> conditional __builtin_constant_p(), in which there are two code-paths -
> >> one
> >> for constants and one for variables.
> >> 
> >> Consider for example the Linux kernel ilog2 macro, which has a
> >> condition
> >> based on __builtin_constant_p() (
> >> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv4.19-rc7%2Fsource%2Finclude%2Flinux%2Flog2.h%23L160&amp;data=02%7C01%7Cnamit%40vmware.com%7C860403cecb874db64b7e08d62c746f46%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C1%7C0%7C636745275975515386&amp;sdata=Hk39Za9%2FxcFyK0sGENB24d6QySjsDGzF%2FwqjnUEMiGk%3D&amp;reserved=0
> >> ). The compiler mistakenly considers the “heavy” code-path that is
> >> supposed
> >> to be evaluated only in compilation time to evaluate the code size.
> > 
> > But this is a misconception about __builtin_constant_p. It doesn't guard sth like 'constexpr' regions. If you try to use it with those semantics you'll fail (appearantly you do). 
> > 
> > Of course IPA CP code size estimates when seeing a constant fed to bcp might be not optimal, that's another issue of course. 
> 
> I understand that this is might not be the right way to implement macros
> such as ilog2() and test_bit(), but this code is around for some time.
> 
> I thought of using __builtin_choose_expr() instead of ternary operator, but
> this introduces a different problem, as the variable version is used instead
> of the constant one in many cases. From my brief experiments with llvm, it
> appears that llvm does not have both of these issues (wrong cost attributed
> to inline asm and conditions based on __builtin_constant_p()).
> 
> So what alternative do you propose to implement ilog2() like behavior? I was
> digging through the gcc code to find a workaround with no success.

1) Don't try to cheat the compilers constant propagation abilities
2) Use a language that allows this (C++)
3) Define (and implement) the corresponding GNU C extension

__builtin_constant_p() isn't the right fit (I wonder what it was
implemented for in the first place though...).

I suppose you want sth like

 if (__builtin_constant_p (x))
   return __constexpr ...;

or use a call and have constexpr functions.  Note it wouldn't be
C++-constexpr like since you want the constexpr evaluation to
happen very late in the compilation to benefit from optimizations
and you are fine with the non-constexpr path.

Properly defining a language extension is hard.

Richard.

> 
> Thanks,
> Nadav
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]         ` <20181008055838.GA66819@gmail.com>
@ 2018-10-08  7:53           ` Segher Boessenkool
  0 siblings, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-08  7:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Borislav Petkov, gcc, Richard Biener, Michael Matz, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Mon, Oct 08, 2018 at 07:58:38AM +0200, Ingo Molnar wrote:
> * Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > > > More precise *size* estimates, yes.  And if the user lies he should not
> > > > be surprised to get assembler errors, etc.
> > > 
> > > Yes.
> > > 
> > > Another option would be if gcc parses the inline asm directly and
> > > does a more precise size estimation. Which is a lot more involved and
> > > complicated solution so I guess we wanna look at the simpler ones first.
> > > 
> > > :-)
> > 
> > Which is *impossible* to do.  Inline assembler is free-form text.
> 
> "Impossible" is false: only under GCC's model and semantics of inline
> asm that is, and only under the (false) assumption that the semantics
> of the asm statement (which is a GCC extension to begin with) cannot
> be changed like it has been changed multiple times in the past.
> 
> "Difficult", "not worth our while", perhaps.

If we throw out our current definition of inline assembler, and of the
internal backend interfaces, then sure you can do it.  This of course
invalidates all code that uses GCC inline assembler, and all GCC backends
(both in-tree and out-of-tree, both current and historical).

If other compilers think everyone should rewrite all of their code because
those compiler do inline asm wro^H^H^Hdifferently, that is their problem;
GCC should not deny all history and screw over all its users.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]       ` <20181008061323.GB66819@gmail.com>
@ 2018-10-08  8:19         ` Segher Boessenkool
  0 siblings, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-08  8:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Michael Matz, Borislav Petkov, gcc, Richard Biener, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Mon, Oct 08, 2018 at 08:13:23AM +0200, Ingo Molnar wrote:
> * Michael Matz <matz@suse.de> wrote:
> > (without an built-in assembler which hopefully noone proposes).
> There are disadvantages (the main one is having to implement it), but a built-in assembler has 
> numerous advantages as well:
> 
>  - Better optimizations: for example -Os could more accurately estimate true instruction size.

GCC already knows the exact instruction size in almost all cases, for most
backends.  It is an advantage to not *have to* keep track of exact insn
size in all cases.

>  - Better inlining: as the examples in this thread are showing.

That's a red herring, the actual issue is inlining makes some spectacularly
wrong decisions for code involving asm currently.  That would not be solved
by outputting binary code instead of assembler code.  All those decisions
are done long before code is output.

>  - Better padding/alignment: right now GCC has no notion about the precise cache layout of the 
>    assembly code it generates and the code alignment options it has are crude.

This isn't true.  Maybe some targets do not care.

And of course GCC only knows this as far as it knows the alignments of the
sections it outputs into; for example, ASLR is a nice performance killer
at times.  And if your linker scripts align sections to less than a cache
line things do not look rosy either, etc.

> It got away with 
>    this so far because the x86 rule of thumb is that dense code is usually the right choice.

>  - Better compiler performance: it would be faster as well to immediately emit assembly
>    instructions, just like GCC's preprocessor library use speeds up compilation *significantly*
>    instead of creating a separate preprocessor task.

So how much faster do you think it would be?  Do you have actual numbers?
And no, -O0 does not count.

>  - Better future integration of assembly blocks: GCC could begin to actually understand the 
>    assembly statements in inline asm and allow more user-friendly extensions to its 
>    historically complex and difficult to master inline asm syntax.

If you want to add a different kind of inline assembler, you can already.
There is no need for outputting binary code for that.

> I mean, it's a fact that the GNU project has *already* defined their own assembly syntax which 
> departs from decades old platform assembly syntax

GCC's asm syntax is over three decades old itself.  When it was defined
all other C inline assembler syntaxes were much younger than that.  I don't
see what your argument is here.

> - and how the assembler is called by the 
> compiler is basically an implementation detail, not a conceptual choice.

But the *format* of the interface representation, in this case textual
assembler code, is quite fundamental.

> The random 
> multi-process unidirectional assembler choice of the past should not be treated as orthodoxy.

Then I have good news for you: no assembler works that way these days, and
that has been true for decades.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-08  7:31       ` Segher Boessenkool
@ 2018-10-08  9:09         ` Richard Biener
  2018-10-08 10:03           ` Segher Boessenkool
  2018-10-09 15:54           ` Segher Boessenkool
  0 siblings, 2 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-08  9:09 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Matz, Borislav Petkov, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Mon, 8 Oct 2018, Segher Boessenkool wrote:

> Hi!
> 
> On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > this is an attempt to see whether gcc's inline asm heuristic when
> > > > estimating inline asm statements' cost for better inlining can be
> > > > improved.
> > > 
> > > GCC already estimates the *size* of inline asm, and this is required
> > > *for correctness*.  So any workaround that works against this will only
> > > end in tears.
> > 
> > You're right and wrong.  GCC can't even estimate the size of mildly 
> > complicated inline asms right now, so your claim of it being necessary for 
> > correctness can't be true in this absolute form.  I know what you try to 
> > say, but still, consider inline asms like this:
> > 
> >      insn1
> >   .section bla
> >      insn2
> >   .previous
> > 
> > or
> >    invoke_asm_macro foo,bar
> > 
> > in both cases GCCs size estimate will be wrong however you want to count 
> > it.  This is actually the motivating example for the kernel guys, the 
> > games they play within their inline asms make the estimates be wildly 
> > wrong to a point it interacts with the inliner.
> 
> Right.  The manual says:
> 
> """
> Some targets require that GCC track the size of each instruction used
> in order to generate correct code.  Because the final length of the
> code produced by an @code{asm} statement is only known by the
> assembler, GCC must make an estimate as to how big it will be.  It
> does this by counting the number of instructions in the pattern of the
> @code{asm} and multiplying that by the length of the longest
> instruction supported by that processor.  (When working out the number
> of instructions, it assumes that any occurrence of a newline or of
> whatever statement separator character is supported by the assembler --
> typically @samp{;} --- indicates the end of an instruction.)
> 
> Normally, GCC's estimate is adequate to ensure that correct
> code is generated, but it is possible to confuse the compiler if you use
> pseudo instructions or assembler macros that expand into multiple real
> instructions, or if you use assembler directives that expand to more
> space in the object file than is needed for a single instruction.
> If this happens then the assembler may produce a diagnostic saying that
> a label is unreachable.
> """
> 
> It *is* necessary for correctness, except you can do things that can
> confuse the compiler and then you are on your own anyway.
> 
> > > So I guess the real issue is that the inline asm size estimate for x86 
> > > isn't very good (since it has to be pessimistic, and x86 insns can be 
> > > huge)?
> > 
> > No, see above, even if we were to improve the size estimates (e.g. based 
> > on some average instruction size) the kernel examples would still be off 
> > because they switch sections back and forth, use asm macros and computed 
> > .fill directives and maybe further stuff.  GCC will never be able to 
> > accurately calculate these sizes
> 
> What *is* such a size, anyway?  If it can be spread over multiple sections
> (some of which support section merging), and you can have huge alignments,
> etc.  What is needed here is not knowing the maximum size of the binary
> output (however you want to define that), but some way for the compiler
> to understand how bad it is to inline some assembler.  Maybe manual
> direction, maybe just the current jeuristics can be tweaked a bit, maybe
> we need to invent some attribute or two.
> 
> > (without an built-in assembler which hopefully noone proposes).
> 
> Not me, that's for sure.
> 
> > So, there is a case for extending the inline-asm facility to say 
> > "size is complicated here, assume this for inline decisions".
> 
> Yeah, that's an option.  It may be too complicated though, or just not
> useful in its generality, so that everyone will use "1" (or "1 normal
> size instruction"), and then we are better off just making something
> for _that_ (or making it the default).
> 
> > > > Now, Richard suggested doing something like:
> > > > 
> > > >  1) inline asm ("...")
> > > 
> > > What would the semantics of this be?
> > 
> > The size of the inline asm wouldn't be counted towards the inliner size 
> > limits (or be counted as "1").
> 
> That sounds like a good option.

Yes, I also like it for simplicity.  It also avoids the requirement
of translating the number (in bytes?) given by the user to
"number of GIMPLE instructions" as needed by the inliner.

> > > I don't like 2) either.  But 1) looks interesting, depends what its
> > > semantics would be?  "Don't count this insn's size for inlining decisions",
> > > maybe?
> > 
> > TBH, I like the inline asm (...) suggestion most currently, but what if we 
> > want to add more attributes to asms?  We could add further special 
> > keywords to the clobber list:
> >   asm ("...." : : : "cc,memory,inline");
> > sure, it might seem strange to "clobber" inline, but if we reinterpret the 
> > clobber list as arbitrary set of attributes for this asm, it'd be fine.
> 
> All of a targets register names and alternative register names are
> allowed in the clobber list.  Will that never conflict with an attribute
> name?  We already *have* syntax for specifying attributes on an asm (on
> *any* statement even), so mixing these two things has no advantage.

Heh, but I failed to make an example with attribute synatx working.
IIRC attributes do not work on stmts.  What could work is to use
a #pragma though.

Richard.

> Both "cc" and "memory" have their own problems of course, adding more
> things to this just feels bad.  It may not be so bad ;-)
> 
> > > Another option is to just force inlining for those few functions where 
> > > GCC currently makes an inlining decision you don't like.  Or are there 
> > > more than a few?
> > 
> > I think the examples I saw from Boris were all indirect inlines:
> > 
> >   static inline void foo() { asm("large-looking-but-small-asm"); }
> >   static void bar1() { ... foo() ... }
> >   static void bar2() { ... foo() ... }
> >   void goo (void) { bar1(); }  // bar1 should have been inlined
> > 
> > So, while the immediate asm user was marked as always inline that in turn 
> > caused users of it to become non-inlined.  I'm assuming the kernel guys 
> > did proper measurements that they _really_ get some non-trivial speed 
> > benefit by inlining bar1/bar2, but for some reasons (I didn't inquire) 
> > didn't want to mark them all as inline as well.
> 
> Yeah that makes sense, like if this happens with the fixup stuff, it will
> quickly spiral out of control.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-08  9:09         ` Richard Biener
@ 2018-10-08 10:03           ` Segher Boessenkool
  2018-10-09 15:54           ` Segher Boessenkool
  1 sibling, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-08 10:03 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Borislav Petkov, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > All of a targets register names and alternative register names are
> > allowed in the clobber list.  Will that never conflict with an attribute
> > name?  We already *have* syntax for specifying attributes on an asm (on
> > *any* statement even), so mixing these two things has no advantage.
> 
> Heh, but I failed to make an example with attribute synatx working.
> IIRC attributes do not work on stmts.  What could work is to use
> a #pragma though.

Apparently statement attributes currently(?) only work for null statements.
Oh well.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-08  9:09         ` Richard Biener
  2018-10-08 10:03           ` Segher Boessenkool
@ 2018-10-09 15:54           ` Segher Boessenkool
  2018-10-10  7:12             ` Richard Biener
                               ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-09 15:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Borislav Petkov, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > Now, Richard suggested doing something like:
> > > > > 
> > > > >  1) inline asm ("...")
> > > > 
> > > > What would the semantics of this be?
> > > 
> > > The size of the inline asm wouldn't be counted towards the inliner size 
> > > limits (or be counted as "1").
> > 
> > That sounds like a good option.
> 
> Yes, I also like it for simplicity.  It also avoids the requirement
> of translating the number (in bytes?) given by the user to
> "number of GIMPLE instructions" as needed by the inliner.

This patch implements this, for C only so far.  And the syntax is
"asm inline", which is more in line with other syntax.

How does this look?


Segher



diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1f173fc..94b1d41 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6287,7 +6287,7 @@ static tree
 c_parser_asm_statement (c_parser *parser)
 {
   tree quals, str, outputs, inputs, clobbers, labels, ret;
-  bool simple, is_goto;
+  bool simple, is_goto, is_inline;
   location_t asm_loc = c_parser_peek_token (parser)->location;
   int section, nsections;
 
@@ -6318,6 +6318,13 @@ c_parser_asm_statement (c_parser *parser)
       is_goto = true;
     }
 
+  is_inline = false;
+  if (c_parser_next_token_is_keyword (parser, RID_INLINE))
+    {
+      c_parser_consume_token (parser);
+      is_inline = true;
+    }
+
   /* ??? Follow the C++ parser rather than using the
      lex_untranslated_string kludge.  */
   parser->lex_untranslated_string = true;
@@ -6393,7 +6400,8 @@ c_parser_asm_statement (c_parser *parser)
     c_parser_skip_to_end_of_block_or_statement (parser);
 
   ret = build_asm_stmt (quals, build_asm_expr (asm_loc, str, outputs, inputs,
-					       clobbers, labels, simple));
+					       clobbers, labels, simple,
+					       is_inline));
 
  error:
   parser->lex_untranslated_string = false;
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 017c01c..f5629300 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -677,7 +677,8 @@ extern tree build_compound_literal (location_t, tree, tree, bool,
 extern void check_compound_literal_type (location_t, struct c_type_name *);
 extern tree c_start_case (location_t, location_t, tree, bool);
 extern void c_finish_case (tree, tree);
-extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool);
+extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool,
+			    bool);
 extern tree build_asm_stmt (tree, tree);
 extern int c_types_compatible_p (tree, tree);
 extern tree c_begin_compound_stmt (bool);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9d09b8d..e013100 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -10064,7 +10064,7 @@ build_asm_stmt (tree cv_qualifier, tree args)
    are subtly different.  We use a ASM_EXPR node to represent this.  */
 tree
 build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
-		tree clobbers, tree labels, bool simple)
+		tree clobbers, tree labels, bool simple, bool is_inline)
 {
   tree tail;
   tree args;
@@ -10182,6 +10182,7 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
      as volatile.  */
   ASM_INPUT_P (args) = simple;
   ASM_VOLATILE_P (args) = (noutputs == 0);
+  ASM_INLINE_P (args) = is_inline;
 
   return args;
 }
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index 83e2273..1a00fa3 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -2019,6 +2019,8 @@ dump_gimple_asm (pretty_printer *buffer, gasm *gs, int spc, dump_flags_t flags)
       pp_string (buffer, "__asm__");
       if (gimple_asm_volatile_p (gs))
 	pp_string (buffer, " __volatile__");
+      if (gimple_asm_inline_p (gs))
+	pp_string (buffer, " __inline__");
       if (gimple_asm_nlabels (gs))
 	pp_string (buffer, " goto");
       pp_string (buffer, "(\"");
diff --git a/gcc/gimple.h b/gcc/gimple.h
index a5dda93..8a58e07 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -137,6 +137,7 @@ enum gimple_rhs_class
 enum gf_mask {
     GF_ASM_INPUT		= 1 << 0,
     GF_ASM_VOLATILE		= 1 << 1,
+    GF_ASM_INLINE		= 1 << 2,
     GF_CALL_FROM_THUNK		= 1 << 0,
     GF_CALL_RETURN_SLOT_OPT	= 1 << 1,
     GF_CALL_TAILCALL		= 1 << 2,
@@ -3911,7 +3912,7 @@ gimple_asm_volatile_p (const gasm *asm_stmt)
 }
 
 
-/* If VOLATLE_P is true, mark asm statement ASM_STMT as volatile.  */
+/* If VOLATILE_P is true, mark asm statement ASM_STMT as volatile.  */
 
 static inline void
 gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
@@ -3923,6 +3924,27 @@ gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
 }
 
 
+/* Return true ASM_STMT ASM_STMT is an asm statement marked inline.  */
+
+static inline bool
+gimple_asm_inline_p (const gasm *asm_stmt)
+{
+  return (asm_stmt->subcode & GF_ASM_INLINE) != 0;
+}
+
+
+/* If INLINE_P is true, mark asm statement ASM_STMT as inline.  */
+
+static inline void
+gimple_asm_set_inline (gasm *asm_stmt, bool inline_p)
+{
+  if (inline_p)
+    asm_stmt->subcode |= GF_ASM_INLINE;
+  else
+    asm_stmt->subcode &= ~GF_ASM_INLINE;
+}
+
+
 /* If INPUT_P is true, mark asm ASM_STMT as an ASM_INPUT.  */
 
 static inline void
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 509fc2f..10b80f2 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6315,6 +6315,7 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 
       gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
       gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
+      gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
 
       gimplify_seq_add_stmt (pre_p, stmt);
     }
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index ba39ea3..5361139 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -993,6 +993,9 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
   if (gimple_asm_input_p (g1) != gimple_asm_input_p (g2))
     return false;
 
+  if (gimple_asm_inline_p (g1) != gimple_asm_inline_p (g2))
+    return false;
+
   if (gimple_asm_ninputs (g1) != gimple_asm_ninputs (g2))
     return false;
 
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 9134253..6b1d2ea 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4108,6 +4108,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
 	   with very long asm statements.  */
 	if (count > 1000)
 	  count = 1000;
+	if (gimple_asm_inline_p (as_a <gasm *> (stmt)))
+	  count = !!count;
 	return MAX (1, count);
       }
 
diff --git a/gcc/tree.h b/gcc/tree.h
index 2e45f9d..160b3a7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1245,6 +1245,9 @@ extern tree maybe_wrap_with_location (tree, location_t);
    ASM_OPERAND with no operands.  */
 #define ASM_INPUT_P(NODE) (ASM_EXPR_CHECK (NODE)->base.static_flag)
 #define ASM_VOLATILE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.public_flag)
+/* Nonzero if we want to consider this asm as minimum length and cost
+   for inlining decisions.  */
+#define ASM_INLINE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.protected_flag)
 
 /* COND_EXPR accessors.  */
 #define COND_EXPR_COND(NODE)	(TREE_OPERAND (COND_EXPR_CHECK (NODE), 0))

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-09 15:54           ` Segher Boessenkool
@ 2018-10-10  7:12             ` Richard Biener
  2018-10-10  7:54               ` Segher Boessenkool
       [not found]               ` <20181010072240.GB103159@gmail.com>
  2018-10-10 16:31             ` Nadav Amit
       [not found]             ` <CAK7LNARmrdtyE7JRAdJH_zbfvD_cej+TGLh+5KfT20o538KUcA@mail.gmail.com>
  2 siblings, 2 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-10  7:12 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Matz, Borislav Petkov, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Tue, 9 Oct 2018, Segher Boessenkool wrote:

> On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > > Now, Richard suggested doing something like:
> > > > > > 
> > > > > >  1) inline asm ("...")
> > > > > 
> > > > > What would the semantics of this be?
> > > > 
> > > > The size of the inline asm wouldn't be counted towards the inliner size 
> > > > limits (or be counted as "1").
> > > 
> > > That sounds like a good option.
> > 
> > Yes, I also like it for simplicity.  It also avoids the requirement
> > of translating the number (in bytes?) given by the user to
> > "number of GIMPLE instructions" as needed by the inliner.
> 
> This patch implements this, for C only so far.  And the syntax is
> "asm inline", which is more in line with other syntax.
> 
> How does this look?

Looks good.  A few nits - you need to document this in extend.texi, the
tree flag use needs documenting in tree-core.h, and we need a testcase
(I'd suggest one that shows we inline a function with "large" asm inline 
() even at -Os).

Oh, and I don't think we want C and C++ to diverge - so you need to
cook up C++ support as well.

Can kernel folks give this a second and third thought please so we
don't implement sth that in the end won't satisfy you guys?

Thanks for doing this,
Richard.

> 
> Segher
> 
> 
> 
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 1f173fc..94b1d41 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -6287,7 +6287,7 @@ static tree
>  c_parser_asm_statement (c_parser *parser)
>  {
>    tree quals, str, outputs, inputs, clobbers, labels, ret;
> -  bool simple, is_goto;
> +  bool simple, is_goto, is_inline;
>    location_t asm_loc = c_parser_peek_token (parser)->location;
>    int section, nsections;
>  
> @@ -6318,6 +6318,13 @@ c_parser_asm_statement (c_parser *parser)
>        is_goto = true;
>      }
>  
> +  is_inline = false;
> +  if (c_parser_next_token_is_keyword (parser, RID_INLINE))
> +    {
> +      c_parser_consume_token (parser);
> +      is_inline = true;
> +    }
> +
>    /* ??? Follow the C++ parser rather than using the
>       lex_untranslated_string kludge.  */
>    parser->lex_untranslated_string = true;
> @@ -6393,7 +6400,8 @@ c_parser_asm_statement (c_parser *parser)
>      c_parser_skip_to_end_of_block_or_statement (parser);
>  
>    ret = build_asm_stmt (quals, build_asm_expr (asm_loc, str, outputs, inputs,
> -					       clobbers, labels, simple));
> +					       clobbers, labels, simple,
> +					       is_inline));
>  
>   error:
>    parser->lex_untranslated_string = false;
> diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
> index 017c01c..f5629300 100644
> --- a/gcc/c/c-tree.h
> +++ b/gcc/c/c-tree.h
> @@ -677,7 +677,8 @@ extern tree build_compound_literal (location_t, tree, tree, bool,
>  extern void check_compound_literal_type (location_t, struct c_type_name *);
>  extern tree c_start_case (location_t, location_t, tree, bool);
>  extern void c_finish_case (tree, tree);
> -extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool);
> +extern tree build_asm_expr (location_t, tree, tree, tree, tree, tree, bool,
> +			    bool);
>  extern tree build_asm_stmt (tree, tree);
>  extern int c_types_compatible_p (tree, tree);
>  extern tree c_begin_compound_stmt (bool);
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index 9d09b8d..e013100 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -10064,7 +10064,7 @@ build_asm_stmt (tree cv_qualifier, tree args)
>     are subtly different.  We use a ASM_EXPR node to represent this.  */
>  tree
>  build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
> -		tree clobbers, tree labels, bool simple)
> +		tree clobbers, tree labels, bool simple, bool is_inline)
>  {
>    tree tail;
>    tree args;
> @@ -10182,6 +10182,7 @@ build_asm_expr (location_t loc, tree string, tree outputs, tree inputs,
>       as volatile.  */
>    ASM_INPUT_P (args) = simple;
>    ASM_VOLATILE_P (args) = (noutputs == 0);
> +  ASM_INLINE_P (args) = is_inline;
>  
>    return args;
>  }
> diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
> index 83e2273..1a00fa3 100644
> --- a/gcc/gimple-pretty-print.c
> +++ b/gcc/gimple-pretty-print.c
> @@ -2019,6 +2019,8 @@ dump_gimple_asm (pretty_printer *buffer, gasm *gs, int spc, dump_flags_t flags)
>        pp_string (buffer, "__asm__");
>        if (gimple_asm_volatile_p (gs))
>  	pp_string (buffer, " __volatile__");
> +      if (gimple_asm_inline_p (gs))
> +	pp_string (buffer, " __inline__");
>        if (gimple_asm_nlabels (gs))
>  	pp_string (buffer, " goto");
>        pp_string (buffer, "(\"");
> diff --git a/gcc/gimple.h b/gcc/gimple.h
> index a5dda93..8a58e07 100644
> --- a/gcc/gimple.h
> +++ b/gcc/gimple.h
> @@ -137,6 +137,7 @@ enum gimple_rhs_class
>  enum gf_mask {
>      GF_ASM_INPUT		= 1 << 0,
>      GF_ASM_VOLATILE		= 1 << 1,
> +    GF_ASM_INLINE		= 1 << 2,
>      GF_CALL_FROM_THUNK		= 1 << 0,
>      GF_CALL_RETURN_SLOT_OPT	= 1 << 1,
>      GF_CALL_TAILCALL		= 1 << 2,
> @@ -3911,7 +3912,7 @@ gimple_asm_volatile_p (const gasm *asm_stmt)
>  }
>  
>  
> -/* If VOLATLE_P is true, mark asm statement ASM_STMT as volatile.  */
> +/* If VOLATILE_P is true, mark asm statement ASM_STMT as volatile.  */
>  
>  static inline void
>  gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
> @@ -3923,6 +3924,27 @@ gimple_asm_set_volatile (gasm *asm_stmt, bool volatile_p)
>  }
>  
>  
> +/* Return true ASM_STMT ASM_STMT is an asm statement marked inline.  */
> +
> +static inline bool
> +gimple_asm_inline_p (const gasm *asm_stmt)
> +{
> +  return (asm_stmt->subcode & GF_ASM_INLINE) != 0;
> +}
> +
> +
> +/* If INLINE_P is true, mark asm statement ASM_STMT as inline.  */
> +
> +static inline void
> +gimple_asm_set_inline (gasm *asm_stmt, bool inline_p)
> +{
> +  if (inline_p)
> +    asm_stmt->subcode |= GF_ASM_INLINE;
> +  else
> +    asm_stmt->subcode &= ~GF_ASM_INLINE;
> +}
> +
> +
>  /* If INPUT_P is true, mark asm ASM_STMT as an ASM_INPUT.  */
>  
>  static inline void
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 509fc2f..10b80f2 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -6315,6 +6315,7 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
>  
>        gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
>        gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
> +      gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
>  
>        gimplify_seq_add_stmt (pre_p, stmt);
>      }
> diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
> index ba39ea3..5361139 100644
> --- a/gcc/ipa-icf-gimple.c
> +++ b/gcc/ipa-icf-gimple.c
> @@ -993,6 +993,9 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2)
>    if (gimple_asm_input_p (g1) != gimple_asm_input_p (g2))
>      return false;
>  
> +  if (gimple_asm_inline_p (g1) != gimple_asm_inline_p (g2))
> +    return false;
> +
>    if (gimple_asm_ninputs (g1) != gimple_asm_ninputs (g2))
>      return false;
>  
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 9134253..6b1d2ea 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -4108,6 +4108,8 @@ estimate_num_insns (gimple *stmt, eni_weights *weights)
>  	   with very long asm statements.  */
>  	if (count > 1000)
>  	  count = 1000;
> +	if (gimple_asm_inline_p (as_a <gasm *> (stmt)))
> +	  count = !!count;
>  	return MAX (1, count);
>        }
>  
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 2e45f9d..160b3a7 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -1245,6 +1245,9 @@ extern tree maybe_wrap_with_location (tree, location_t);
>     ASM_OPERAND with no operands.  */
>  #define ASM_INPUT_P(NODE) (ASM_EXPR_CHECK (NODE)->base.static_flag)
>  #define ASM_VOLATILE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.public_flag)
> +/* Nonzero if we want to consider this asm as minimum length and cost
> +   for inlining decisions.  */
> +#define ASM_INLINE_P(NODE) (ASM_EXPR_CHECK (NODE)->base.protected_flag)
>  
>  /* COND_EXPR accessors.  */
>  #define COND_EXPR_COND(NODE)	(TREE_OPERAND (COND_EXPR_CHECK (NODE), 0))
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10  7:12             ` Richard Biener
@ 2018-10-10  7:54               ` Segher Boessenkool
       [not found]               ` <20181010072240.GB103159@gmail.com>
  1 sibling, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-10  7:54 UTC (permalink / raw)
  To: Richard Biener
  Cc: Michael Matz, Borislav Petkov, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

On Wed, Oct 10, 2018 at 09:12:48AM +0200, Richard Biener wrote:
> On Tue, 9 Oct 2018, Segher Boessenkool wrote:
> > On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > > On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > > > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > > > Now, Richard suggested doing something like:
> > > > > > > 
> > > > > > >  1) inline asm ("...")
> > > > > > 
> > > > > > What would the semantics of this be?
> > > > > 
> > > > > The size of the inline asm wouldn't be counted towards the inliner size 
> > > > > limits (or be counted as "1").
> > > > 
> > > > That sounds like a good option.
> > > 
> > > Yes, I also like it for simplicity.  It also avoids the requirement
> > > of translating the number (in bytes?) given by the user to
> > > "number of GIMPLE instructions" as needed by the inliner.
> > 
> > This patch implements this, for C only so far.  And the syntax is
> > "asm inline", which is more in line with other syntax.
> > 
> > How does this look?
> 
> Looks good.  A few nits - you need to document this in extend.texi, the

Yup.

> tree flag use needs documenting in tree-core.h,

Ah yes.

> and we need a testcase
> (I'd suggest one that shows we inline a function with "large" asm inline 
> () even at -Os).

I have one.  Oh, and I probably should do a comment at the one line of
code that isn't just bookkeeping ;-)

> Oh, and I don't think we want C and C++ to diverge - so you need to
> cook up C++ support as well.

Right, that's why I said "C only so far".

> Can kernel folks give this a second and third thought please so we
> don't implement sth that in the end won't satisfy you guys?

Or actually try it out and see if it has the desired effect!  Nothing
beats field trials.

I'll do the C++ thing today hopefully, and send things to gcc-patches@.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]               ` <20181010072240.GB103159@gmail.com>
@ 2018-10-10  8:03                 ` Segher Boessenkool
  2018-10-10  8:19                   ` Borislav Petkov
  2018-10-10 10:29                   ` Richard Biener
  0 siblings, 2 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-10  8:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Richard Biener, Michael Matz, Borislav Petkov, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, Oct 10, 2018 at 09:22:40AM +0200, Ingo Molnar wrote:
> * Richard Biener <rguenther@suse.de> wrote:
> > Can kernel folks give this a second and third thought please so we
> > don't implement sth that in the end won't satisfy you guys?
> 
> So this basically passes '0 size' to the inliner, which should be better
> than passing in the explicit size, as we'd inevitably get it wrong
> in cases.

The code immediately after this makes it size 1, even for things like
asm(""), I suppose this works better for the inliner.  But that's a detail
(and it might change); the description says "consider this asm as minimum
length and cost for inlining decisions", which works for either 0 or 1.

> I also like 'size 0' for the reason that we tend to write assembly code
> and mark it 'inline' if we really think it matters to performance,
> so making it more likely to be inlined when used within another inline
> function is a plus as well.

You can think of it as meaning "we want this asm inlined always", and then
whether that actually happens depends on if the function around it is
inlined or not.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10  8:03                 ` Segher Boessenkool
@ 2018-10-10  8:19                   ` Borislav Petkov
  2018-10-10  8:36                     ` Richard Biener
  2018-10-10 18:55                     ` Segher Boessenkool
  2018-10-10 10:29                   ` Richard Biener
  1 sibling, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2018-10-10  8:19 UTC (permalink / raw)
  To: Segher Boessenkool, Ingo Molnar
  Cc: Richard Biener, Michael Matz, gcc, Nadav Amit, Ingo Molnar,
	linux-kernel, x86, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, Oct 10, 2018 at 03:03:25AM -0500, Segher Boessenkool wrote:
> The code immediately after this makes it size 1, even for things like
> asm(""), I suppose this works better for the inliner.  But that's a detail
> (and it might change); the description says "consider this asm as minimum
> length and cost for inlining decisions", which works for either 0 or 1.

Thanks for implementing this, much appreciated. If you need people to
test stuff, lemme know.

> You can think of it as meaning "we want this asm inlined always", and then
> whether that actually happens depends on if the function around it is
> inlined or not.

My only concern is how we would catch the other extremity where the
inline asm grows too big and we end up inlining it everywhere and thus
getting fat. The 0day bot already builds tinyconfigs but we should be
looking at vmlinux size growth too.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10  8:19                   ` Borislav Petkov
@ 2018-10-10  8:36                     ` Richard Biener
  2018-10-10 18:55                     ` Segher Boessenkool
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-10  8:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Segher Boessenkool, Ingo Molnar, Michael Matz, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, 10 Oct 2018, Borislav Petkov wrote:

> On Wed, Oct 10, 2018 at 03:03:25AM -0500, Segher Boessenkool wrote:
> > The code immediately after this makes it size 1, even for things like
> > asm(""), I suppose this works better for the inliner.  But that's a detail
> > (and it might change); the description says "consider this asm as minimum
> > length and cost for inlining decisions", which works for either 0 or 1.
> 
> Thanks for implementing this, much appreciated. If you need people to
> test stuff, lemme know.
> 
> > You can think of it as meaning "we want this asm inlined always", and then
> > whether that actually happens depends on if the function around it is
> > inlined or not.
> 
> My only concern is how we would catch the other extremity where the
> inline asm grows too big and we end up inlining it everywhere and thus
> getting fat. The 0day bot already builds tinyconfigs but we should be
> looking at vmlinux size growth too.

Well, it's like always-inline functions, you have to be careful
and _not_ put it everywhere...  It's also somewhat different to
always-inline functions as those lose their special-ness once
inlined (the inlined body is properly accounted for size).

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10  8:03                 ` Segher Boessenkool
  2018-10-10  8:19                   ` Borislav Petkov
@ 2018-10-10 10:29                   ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-10 10:29 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Ingo Molnar, Michael Matz, Borislav Petkov, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, 10 Oct 2018, Segher Boessenkool wrote:

> On Wed, Oct 10, 2018 at 09:22:40AM +0200, Ingo Molnar wrote:
> > * Richard Biener <rguenther@suse.de> wrote:
> > > Can kernel folks give this a second and third thought please so we
> > > don't implement sth that in the end won't satisfy you guys?
> > 
> > So this basically passes '0 size' to the inliner, which should be better
> > than passing in the explicit size, as we'd inevitably get it wrong
> > in cases.
> 
> The code immediately after this makes it size 1, even for things like
> asm(""), I suppose this works better for the inliner.  But that's a detail
> (and it might change); the description says "consider this asm as minimum
> length and cost for inlining decisions", which works for either 0 or 1.

It was made 1 as otherwise the inliner happily explodes on

void foo () { asm(""); foo(); }

> > I also like 'size 0' for the reason that we tend to write assembly code
> > and mark it 'inline' if we really think it matters to performance,
> > so making it more likely to be inlined when used within another inline
> > function is a plus as well.
> 
> You can think of it as meaning "we want this asm inlined always", and then
> whether that actually happens depends on if the function around it is
> inlined or not.

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-09 15:54           ` Segher Boessenkool
  2018-10-10  7:12             ` Richard Biener
@ 2018-10-10 16:31             ` Nadav Amit
  2018-10-10 19:31               ` Segher Boessenkool
  2018-10-11  7:05               ` Richard Biener
       [not found]             ` <CAK7LNARmrdtyE7JRAdJH_zbfvD_cej+TGLh+5KfT20o538KUcA@mail.gmail.com>
  2 siblings, 2 replies; 40+ messages in thread
From: Nadav Amit @ 2018-10-10 16:31 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener
  Cc: Michael Matz, Borislav Petkov, gcc, Ingo Molnar, LKML, X86 ML,
	Masahiro Yamada, Sam Ravnborg, Alok Kataria, Christopher Li,
	Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich, Josh Poimboeuf,
	Juergen Gross, Kate Stewart, Kees Cook, linux-sparse,
	Peter Zijlstra, Philippe Ombredanne, Thomas Gleixner,
	virtualization, Linus Torvalds, Chris Zankel, Max Filippov,
	linux-xtensa

at 7:53 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
>> On Mon, 8 Oct 2018, Segher Boessenkool wrote:
>>> On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
>>>> On Sun, 7 Oct 2018, Segher Boessenkool wrote:
>>>>> On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
>>>>>> Now, Richard suggested doing something like:
>>>>>> 
>>>>>> 1) inline asm ("...")
>>>>> 
>>>>> What would the semantics of this be?
>>>> 
>>>> The size of the inline asm wouldn't be counted towards the inliner size 
>>>> limits (or be counted as "1").
>>> 
>>> That sounds like a good option.
>> 
>> Yes, I also like it for simplicity.  It also avoids the requirement
>> of translating the number (in bytes?) given by the user to
>> "number of GIMPLE instructions" as needed by the inliner.
> 
> This patch implements this, for C only so far.  And the syntax is
> "asm inline", which is more in line with other syntax.
> 
> How does this look?

It looks good to me in general. I have a couple of reservations, but I
suspect you will not want to address them:

1. It is not backward compatible, requiring a C macro to wrap it, as the 
kernel might be built with different compilers.

2. It is specific to asm. I do not have in mind another use case (excluding
the __builtin_constant_p), but it would be nicer IMHO to have a builtin
saying “ignore the cost of this statement” for the matter of optimizations.

Regards,
Nadav

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10  8:19                   ` Borislav Petkov
  2018-10-10  8:36                     ` Richard Biener
@ 2018-10-10 18:55                     ` Segher Boessenkool
  2018-10-10 19:14                       ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-10 18:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, Richard Biener, Michael Matz, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, Oct 10, 2018 at 10:19:06AM +0200, Borislav Petkov wrote:
> On Wed, Oct 10, 2018 at 03:03:25AM -0500, Segher Boessenkool wrote:
> > The code immediately after this makes it size 1, even for things like
> > asm(""), I suppose this works better for the inliner.  But that's a detail
> > (and it might change); the description says "consider this asm as minimum
> > length and cost for inlining decisions", which works for either 0 or 1.
> 
> Thanks for implementing this, much appreciated. If you need people to
> test stuff, lemme know.

It would be great to hear from kernel people if it works adequately for
what you guys want it for :-)

> > You can think of it as meaning "we want this asm inlined always", and then
> > whether that actually happens depends on if the function around it is
> > inlined or not.
> 
> My only concern is how we would catch the other extremity where the
> inline asm grows too big and we end up inlining it everywhere and thus
> getting fat. The 0day bot already builds tinyconfigs but we should be
> looking at vmlinux size growth too.

But this isn't really different from other always_inline concerns afaics?
So you should be able to catch it the same way, too.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10 18:55                     ` Segher Boessenkool
@ 2018-10-10 19:14                       ` Borislav Petkov
  2018-10-13 19:33                         ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2018-10-10 19:14 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Ingo Molnar, Richard Biener, Michael Matz, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, Oct 10, 2018 at 01:54:33PM -0500, Segher Boessenkool wrote:
> It would be great to hear from kernel people if it works adequately for
> what you guys want it for :-)

Sure, ping me when you have the final version and I'll try to build gcc
with it and do some size comparisons.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10 16:31             ` Nadav Amit
@ 2018-10-10 19:31               ` Segher Boessenkool
  2018-10-11  7:05               ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-10 19:31 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Richard Biener, Michael Matz, Borislav Petkov, gcc, Ingo Molnar,
	LKML, X86 ML, Masahiro Yamada, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich,
	Josh Poimboeuf, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

Hi Nadav,

On Wed, Oct 10, 2018 at 04:31:41PM +0000, Nadav Amit wrote:
> at 7:53 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > How does this look?
> 
> It looks good to me in general. I have a couple of reservations, but I
> suspect you will not want to address them:
> 
> 1. It is not backward compatible, requiring a C macro to wrap it, as the 
> kernel might be built with different compilers.

How *could* it be backward compatible?  There should be an error or at least
a warning if the compiler does not support this, in general.

For the kernel, the kernel already has plenty of infrastructure to support
this (compiler.h etc.)  For other applications it is quite trivial, too.

> 2. It is specific to asm.

Yes, and that is on purpose.

> I do not have in mind another use case (excluding
> the __builtin_constant_p), but it would be nicer IMHO to have a builtin
> saying “ignore the cost of this statement” for the matter of optimizations.

That is a hundred or a thousand times more work to design and implement
(including testing etc.)  I'm not going to do it, but feel free to try
yourself!


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10 16:31             ` Nadav Amit
  2018-10-10 19:31               ` Segher Boessenkool
@ 2018-10-11  7:05               ` Richard Biener
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Biener @ 2018-10-11  7:05 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Segher Boessenkool, Michael Matz, Borislav Petkov, gcc,
	Ingo Molnar, LKML, X86 ML, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]

On Wed, 10 Oct 2018, Nadav Amit wrote:

> at 7:53 AM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> > On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> >> On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> >>> On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> >>>> On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> >>>>> On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> >>>>>> Now, Richard suggested doing something like:
> >>>>>> 
> >>>>>> 1) inline asm ("...")
> >>>>> 
> >>>>> What would the semantics of this be?
> >>>> 
> >>>> The size of the inline asm wouldn't be counted towards the inliner size 
> >>>> limits (or be counted as "1").
> >>> 
> >>> That sounds like a good option.
> >> 
> >> Yes, I also like it for simplicity.  It also avoids the requirement
> >> of translating the number (in bytes?) given by the user to
> >> "number of GIMPLE instructions" as needed by the inliner.
> > 
> > This patch implements this, for C only so far.  And the syntax is
> > "asm inline", which is more in line with other syntax.
> > 
> > How does this look?
> 
> It looks good to me in general. I have a couple of reservations, but I
> suspect you will not want to address them:
> 
> 1. It is not backward compatible, requiring a C macro to wrap it, as the 
> kernel might be built with different compilers.
> 
> 2. It is specific to asm. I do not have in mind another use case (excluding
> the __builtin_constant_p), but it would be nicer IMHO to have a builtin
> saying “ignore the cost of this statement” for the matter of optimizations.

The only easy possibility that comes to my mid is sth like

__attribute__((always_inline, zero_cost)) foo ()
{
  ... your stmts ...
}

and us, upon inlining, marking the inlined stmts properly.  That would
also work for the asm() case and it would be backwards compatible
(well, you'd get a diagnostic for the unknown zero_cost attribute).

There's the slight complication that if you have, say

 _1 = _2 * 3; // zero-cost
 _4 = _1 * 2;

and optimization ends up combining those to

 _4 = _2 * 6;

then is this stmt supposed to be zero-cost or not?  Compare that to

 _1 = _2 * 3;
 _4 = _1 * 2; // zero-cost

So outside of asm() there are new issues that come up with respect
to expected (cost) semantics.

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-10 19:14                       ` Borislav Petkov
@ 2018-10-13 19:33                         ` Borislav Petkov
       [not found]                           ` <alpine.LNX.2.20.13.1810132355180.13914@monopod.intra.ispras.ru>
                                             ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Borislav Petkov @ 2018-10-13 19:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Ingo Molnar, Richard Biener, Michael Matz, gcc, Nadav Amit,
	Ingo Molnar, linux-kernel, x86, Masahiro Yamada, Sam Ravnborg,
	Alok Kataria, Christopher Li, Greg Kroah-Hartman, H. Peter Anvin,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

Ok,

with Segher's help I've been playing with his patch ontop of bleeding
edge gcc 9 and here are my observations. Please double-check me for
booboos so that they can be addressed while there's time.

So here's what I see ontop of 4.19-rc7:

First marked the alternative asm() as inline and undeffed the "inline"
keyword. I need to do that because of the funky games we do with
"inline" redefinitions in include/linux/compiler_types.h.

And Segher hinted at either doing:

asm volatile inline(...

or

asm volatile __inline__(...

but both "inline" variants are defined as macros in that file.

Which means we either need to #undef inline before using it in asm() or
come up with something cleverer.

Anyway, did this:

---
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4cd6a3b71824..7c0639087da7 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -165,11 +165,13 @@ static inline int alternatives_text_reserved(void *start, void *end)
  * For non barrier like inlines please define new variants
  * without volatile and memory clobber.
  */
+
+#undef inline
 #define alternative(oldinstr, newinstr, feature)                       \
-       asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
+       asm volatile inline(ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
 
 #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
-       asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
+       asm volatile inline(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
 
 /*
  * Alternative inline assembly with input.
---

Build failed at link time with:

arch/x86/boot/compressed/cmdline.o: In function `native_save_fl':
cmdline.c:(.text+0x0): multiple definition of `native_save_fl'
arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1e0): first defined here
arch/x86/boot/compressed/cmdline.o: In function `native_restore_fl':
cmdline.c:(.text+0x10): multiple definition of `native_restore_fl'
arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1f0): first defined here
arch/x86/boot/compressed/error.o: In function `native_save_fl':
error.c:(.text+0x0): multiple definition of `native_save_fl'

which I had to fix with this:

---
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 15450a675031..0d772598c37c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -14,8 +14,7 @@
  */
 
 /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
-extern inline unsigned long native_save_fl(void);
-extern inline unsigned long native_save_fl(void)
+static inline unsigned long native_save_fl(void)
 {
        unsigned long flags;
 
@@ -33,8 +32,7 @@ ex
---

That "extern inline" declaration looks fishy to me anyway, maybe not really
needed ? Apparently, gcc < 4.9 complains with -Werror=missing-prototypes...

Then the build worked and the results look like this:

   text    data     bss     dec     hex filename
17287384        5040656 2019532 24347572        17383b4 vmlinux-before
17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version

so some inlining must be happening.

Then I did this:

---
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 9c5606d88f61..a0170344cf08 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -20,7 +20,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
        /* no memory constraint because it doesn't change any memory gcc knows
           about */
        stac();
-       asm volatile(
+       asm volatile inline(
                "       testq  %[size8],%[size8]\n"
                "       jz     4f\n"
                "0:     movq $0,(%[dst])\n"
---

to force inlining of a somewhat bigger asm() statement. And yap, more
got inlined:

   text    data     bss     dec     hex filename
17287384        5040656 2019532 24347572        17383b4 vmlinux-before
17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version
17288076        5040656 2015436 24344168        1737668 vmlinux-2nd-version__clear_user

so more stuff gets inlined.

Looking at the asm output, it had before:

---
clear_user:
# ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
#APP
# 15 "./arch/x86/include/asm/current.h" 1
        movq %gs:current_task,%rax      #, pfo_ret__
# 0 "" 2
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
#NO_APP
        movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
        movq    %rdi, %rax      # to, tmp93
        addq    %rsi, %rax      # n, tmp93
        jc      .L3     #,
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
        cmpq    %rax, %rdx      # tmp93, _1
        jb      .L3     #,
# arch/x86/lib/usercopy_64.c:52:                return __clear_user(to, n);
        jmp     __clear_user    #
---

note the JMP to __clear_user. After marking the asm() in __clear_user() as
inline, clear_user() inlines __clear_user() directly:

---
clear_user:
# ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
#APP
# 15 "./arch/x86/include/asm/current.h" 1
        movq %gs:current_task,%rax      #, pfo_ret__
# 0 "" 2
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
#NO_APP
        movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
        movq    %rdi, %rax      # to, tmp95
        addq    %rsi, %rax      # n, tmp95
        jc      .L8     #,
# arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
        cmpq    %rax, %rdx      # tmp95, _1
        jb      .L8     #,
# ./arch/x86/include/asm/smap.h:58:     alternative("", __stringify(__ASM_STAC), X86_FEATURE_SMAP);
...

this last line is the stac() macro which gets inlined due to the
alternative() inlined macro above.

So I guess this all looks like what we wanted.

And if this lands in gcc9, we would need to do a asm_volatile() macro
which is defined differently based on the compiler used.

Thoughts, suggestions, etc are most welcome.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]                           ` <alpine.LNX.2.20.13.1810132355180.13914@monopod.intra.ispras.ru>
@ 2018-10-13 21:31                             ` Borislav Petkov
  0 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2018-10-13 21:31 UTC (permalink / raw)
  To: Alexander Monakov
  Cc: Segher Boessenkool, Ingo Molnar, Richard Biener, Michael Matz,
	gcc, Nadav Amit, Ingo Molnar, linux-kernel, x86, Masahiro Yamada,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Jan Beulich, Josh Poimboeuf, Juergen Gross,
	Kate Stewart, Kees Cook, linux-sparse, Peter Zijlstra,
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa,
	Andrew Morton

On Sun, Oct 14, 2018 at 12:14:02AM +0300, Alexander Monakov wrote:
> I apologize for coming in late here with an alternative proposal, but would
> you be happy if GCC gave you a way to designate a portion of the asm template
> string that shouldn't be counted as its cost because it doesn't go into the
> .text section? This wouldn't interact with your redefinitions of the inline
> keyword, and you could do something like (assuming we go with %` ... %`
> delimiters)

I don't mind it but I see you guys are still discussing what would be
the better solution here, on the gcc ML. And we, as one user, are a
happy camper as long as it does what it is meant to do. But how the
feature looks like syntactically is something for gcc folk to decide as
they're going to support it for the foreseeable future and I'm very well
aware of how important it is for a supportable feature to be designed
properly.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-10-13 19:33                         ` Borislav Petkov
       [not found]                           ` <alpine.LNX.2.20.13.1810132355180.13914@monopod.intra.ispras.ru>
@ 2018-10-25 10:23                           ` Borislav Petkov
       [not found]                           ` <20181031125526.GA13219@hirez.programming.kicks-ass.net>
  2 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2018-10-25 10:23 UTC (permalink / raw)
  To: Segher Boessenkool, Ingo Molnar, Richard Biener, Michael Matz,
	gcc, Nadav Amit, Ingo Molnar, linux-kernel, x86, Masahiro Yamada,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Jan Beulich, Josh Poimboeuf, Juergen Gross,
	Kate Stewart, Kees Cook, linux-sparse, Peter Zijlstra,
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa,
	Andrew Morton

Ping.

This is a good point in time, methinks, where kernel folk on CC here
should have a look at this and speak up whether it is useful for us in
this form.

Frankly, I'm a bit unsure on the aspect of us using this and supporting
old compilers which don't have it and new compilers which do. Because
the old compilers should get to see the now upstreamed assembler macros
and the new compilers will simply inline the "asm volatile inline"
constructs. And how ugly that would become...

I guess we can try this with smaller macros first and keep them all
nicely hidden in header files.

On Sat, Oct 13, 2018 at 09:33:35PM +0200, Borislav Petkov wrote:
> Ok,
> 
> with Segher's help I've been playing with his patch ontop of bleeding
> edge gcc 9 and here are my observations. Please double-check me for
> booboos so that they can be addressed while there's time.
> 
> So here's what I see ontop of 4.19-rc7:
> 
> First marked the alternative asm() as inline and undeffed the "inline"
> keyword. I need to do that because of the funky games we do with
> "inline" redefinitions in include/linux/compiler_types.h.
> 
> And Segher hinted at either doing:
> 
> asm volatile inline(...
> 
> or
> 
> asm volatile __inline__(...
> 
> but both "inline" variants are defined as macros in that file.
> 
> Which means we either need to #undef inline before using it in asm() or
> come up with something cleverer.
> 
> Anyway, did this:
> 
> ---
> diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
> index 4cd6a3b71824..7c0639087da7 100644
> --- a/arch/x86/include/asm/alternative.h
> +++ b/arch/x86/include/asm/alternative.h
> @@ -165,11 +165,13 @@ static inline int alternatives_text_reserved(void *start, void *end)
>   * For non barrier like inlines please define new variants
>   * without volatile and memory clobber.
>   */
> +
> +#undef inline
>  #define alternative(oldinstr, newinstr, feature)                       \
> -       asm volatile (ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
> +       asm volatile inline(ALTERNATIVE(oldinstr, newinstr, feature) : : : "memory")
>  
>  #define alternative_2(oldinstr, newinstr1, feature1, newinstr2, feature2) \
> -       asm volatile(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
> +       asm volatile inline(ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2) ::: "memory")
>  
>  /*
>   * Alternative inline assembly with input.
> ---
> 
> Build failed at link time with:
> 
> arch/x86/boot/compressed/cmdline.o: In function `native_save_fl':
> cmdline.c:(.text+0x0): multiple definition of `native_save_fl'
> arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1e0): first defined here
> arch/x86/boot/compressed/cmdline.o: In function `native_restore_fl':
> cmdline.c:(.text+0x10): multiple definition of `native_restore_fl'
> arch/x86/boot/compressed/misc.o:misc.c:(.text+0x1f0): first defined here
> arch/x86/boot/compressed/error.o: In function `native_save_fl':
> error.c:(.text+0x0): multiple definition of `native_save_fl'
> 
> which I had to fix with this:
> 
> ---
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 15450a675031..0d772598c37c 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -14,8 +14,7 @@
>   */
>  
>  /* Declaration required for gcc < 4.9 to prevent -Werror=missing-prototypes */
> -extern inline unsigned long native_save_fl(void);
> -extern inline unsigned long native_save_fl(void)
> +static inline unsigned long native_save_fl(void)
>  {
>         unsigned long flags;
>  
> @@ -33,8 +32,7 @@ ex
> ---
> 
> That "extern inline" declaration looks fishy to me anyway, maybe not really
> needed ? Apparently, gcc < 4.9 complains with -Werror=missing-prototypes...
> 
> Then the build worked and the results look like this:
> 
>    text    data     bss     dec     hex filename
> 17287384        5040656 2019532 24347572        17383b4 vmlinux-before
> 17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version
> 
> so some inlining must be happening.
> 
> Then I did this:
> 
> ---
> diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
> index 9c5606d88f61..a0170344cf08 100644
> --- a/arch/x86/lib/usercopy_64.c
> +++ b/arch/x86/lib/usercopy_64.c
> @@ -20,7 +20,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
>         /* no memory constraint because it doesn't change any memory gcc knows
>            about */
>         stac();
> -       asm volatile(
> +       asm volatile inline(
>                 "       testq  %[size8],%[size8]\n"
>                 "       jz     4f\n"
>                 "0:     movq $0,(%[dst])\n"
> ---
> 
> to force inlining of a somewhat bigger asm() statement. And yap, more
> got inlined:
> 
>    text    data     bss     dec     hex filename
> 17287384        5040656 2019532 24347572        17383b4 vmlinux-before
> 17288020        5040656 2015436 24344112        1737630 vmlinux-2nd-version
> 17288076        5040656 2015436 24344168        1737668 vmlinux-2nd-version__clear_user
> 
> so more stuff gets inlined.
> 
> Looking at the asm output, it had before:
> 
> ---
> clear_user:
> # ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
> #APP
> # 15 "./arch/x86/include/asm/current.h" 1
>         movq %gs:current_task,%rax      #, pfo_ret__
> # 0 "" 2
> # arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
> #NO_APP
>         movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
>         movq    %rdi, %rax      # to, tmp93
>         addq    %rsi, %rax      # n, tmp93
>         jc      .L3     #,
> # arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
>         cmpq    %rax, %rdx      # tmp93, _1
>         jb      .L3     #,
> # arch/x86/lib/usercopy_64.c:52:                return __clear_user(to, n);
>         jmp     __clear_user    #
> ---
> 
> note the JMP to __clear_user. After marking the asm() in __clear_user() as
> inline, clear_user() inlines __clear_user() directly:
> 
> ---
> clear_user:
> # ./arch/x86/include/asm/current.h:15:  return this_cpu_read_stable(current_task);
> #APP
> # 15 "./arch/x86/include/asm/current.h" 1
>         movq %gs:current_task,%rax      #, pfo_ret__
> # 0 "" 2
> # arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
> #NO_APP
>         movq    2520(%rax), %rdx        # pfo_ret___12->thread.addr_limit.seg, _1
>         movq    %rdi, %rax      # to, tmp95
>         addq    %rsi, %rax      # n, tmp95
>         jc      .L8     #,
> # arch/x86/lib/usercopy_64.c:51:        if (access_ok(VERIFY_WRITE, to, n))
>         cmpq    %rax, %rdx      # tmp95, _1
>         jb      .L8     #,
> # ./arch/x86/include/asm/smap.h:58:     alternative("", __stringify(__ASM_STAC), X86_FEATURE_SMAP);
> ...
> 
> this last line is the stac() macro which gets inlined due to the
> alternative() inlined macro above.
> 
> So I guess this all looks like what we wanted.
> 
> And if this lands in gcc9, we would need to do a asm_volatile() macro
> which is defined differently based on the compiler used.
> 
> Thoughts, suggestions, etc are most welcome.
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]                           ` <20181031125526.GA13219@hirez.programming.kicks-ass.net>
@ 2018-10-31 22:04                             ` Segher Boessenkool
  2018-11-01  5:23                             ` Joe Perches
  1 sibling, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-10-31 22:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Ingo Molnar, Richard Biener, Michael Matz, gcc,
	Nadav Amit, Ingo Molnar, linux-kernel, x86, Masahiro Yamada,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Jan Beulich, Josh Poimboeuf, Juergen Gross,
	Kate Stewart, Kees Cook, linux-sparse, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, Oct 31, 2018 at 01:55:26PM +0100, Peter Zijlstra wrote:
> Anyway, with the below patch, I get:
> 
>    text    data     bss     dec     hex filename
> 17385183        5064780 1953892 24403855        1745f8f defconfig-build/vmlinux
> 17385678        5064780 1953892 24404350        174617e defconfig-build/vmlinux
> 
> Which shows we inline more (look for asm_volatile for the actual
> changes).
> 
> 
> So yes, this seems like something we could work with.

Great to hear.  Note that with my latest patches
(see https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01931.html ) there
is no required order "asm volatile inline" anymore, so you can just say
"asm inline volatile".  (And similar for "goto" as well).


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]                           ` <20181031125526.GA13219@hirez.programming.kicks-ass.net>
  2018-10-31 22:04                             ` Segher Boessenkool
@ 2018-11-01  5:23                             ` Joe Perches
       [not found]                               ` <20181101090114.GE3159@hirez.programming.kicks-ass.net>
  1 sibling, 1 reply; 40+ messages in thread
From: Joe Perches @ 2018-11-01  5:23 UTC (permalink / raw)
  To: Peter Zijlstra, Borislav Petkov
  Cc: Segher Boessenkool, Ingo Molnar, Richard Biener, Michael Matz,
	gcc, Nadav Amit, Ingo Molnar, linux-kernel, x86, Masahiro Yamada,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Jan Beulich, Josh Poimboeuf, Juergen Gross,
	Kate Stewart, Kees Cook, linux-sparse, Philippe Ombredanne,
	Thomas Gleixner, virtualization, Linus Torvalds, Chris Zankel,
	Max Filippov, linux-xtensa, Andrew Morton

On Wed, 2018-10-31 at 13:55 +0100, Peter Zijlstra wrote:
> 
> Anyway, with the below patch, I get:
> 
>    text    data     bss     dec     hex filename
> 17385183        5064780 1953892 24403855        1745f8f defconfig-build/vmlinux
> 17385678        5064780 1953892 24404350        174617e defconfig-build/vmlinux
> 
> Which shows we inline more (look for asm_volatile for the actual
> changes).
[]
>  scripts/checkpatch.pl                              |  8 ++---
>  scripts/genksyms/keywords.c                        |  4 +--
>  scripts/kernel-doc                                 |  4 +--

I believe these should be excluded from the conversions.

Other than that, generic conversion by script seems a good idea.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]                               ` <20181101090114.GE3159@hirez.programming.kicks-ass.net>
@ 2018-11-01  9:20                                 ` Joe Perches
  0 siblings, 0 replies; 40+ messages in thread
From: Joe Perches @ 2018-11-01  9:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Borislav Petkov, Segher Boessenkool, Ingo Molnar, Richard Biener,
	Michael Matz, gcc, Nadav Amit, Ingo Molnar, linux-kernel, x86,
	Masahiro Yamada, Sam Ravnborg, Alok Kataria, Christopher Li,
	Greg Kroah-Hartman, H. Peter Anvin, Jan Beulich, Josh Poimboeuf,
	Juergen Gross, Kate Stewart, Kees Cook, linux-sparse,
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa,
	Andrew Morton

On Thu, 2018-11-01 at 10:01 +0100, Peter Zijlstra wrote:
> On Wed, Oct 31, 2018 at 10:20:00PM -0700, Joe Perches wrote:
> > On Wed, 2018-10-31 at 13:55 +0100, Peter Zijlstra wrote:
> > > Anyway, with the below patch, I get:
> > > 
> > >    text    data     bss     dec     hex filename
> > > 17385183        5064780 1953892 24403855        1745f8f defconfig-build/vmlinux
> > > 17385678        5064780 1953892 24404350        174617e defconfig-build/vmlinux
> > > 
> > > Which shows we inline more (look for asm_volatile for the actual
> > > changes).
> > []
> > >  scripts/checkpatch.pl                              |  8 ++---
> > >  scripts/genksyms/keywords.c                        |  4 +--
> > >  scripts/kernel-doc                                 |  4 +--
> > 
> > I believe these should be excluded from the conversions.
> 
> Probably, yes. It compiled, which was all I cared about :-)
> 
> BTW, if we do that conversion, we should upgrade the checkpatch warn to
> an error I suppose.

More like remove altogether as __inline and __inline__
will no longer be #defined

$ git grep -P 'define\s+__inline'
include/linux/compiler_types.h:#define __inline__ inline
include/linux/compiler_types.h:#define __inline   inline

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]             ` <CAK7LNARmrdtyE7JRAdJH_zbfvD_cej+TGLh+5KfT20o538KUcA@mail.gmail.com>
@ 2018-11-29 13:09               ` Borislav Petkov via gcc
  2018-11-29 13:16                 ` Richard Biener
  2018-11-30  9:06               ` Segher Boessenkool
  1 sibling, 1 reply; 40+ messages in thread
From: Borislav Petkov via gcc @ 2018-11-29 13:09 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Segher Boessenkool, Nadav Amit, Ingo Molnar, H. Peter Anvin,
	rguenther, matz, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Thu, Nov 29, 2018 at 08:46:34PM +0900, Masahiro Yamada wrote:
> But, I'd like to ask if x86 people want to keep this macros.s approach.
> Revert 77b0bf55bc675 right now
> assuming the compiler will eventually solve the issue?

Yap, considering how elegant the compiler solution is and how much
problems this macros-in-asm thing causes, I think we should patch
out the latter and wait for gcc9. I mean, the savings are not so
mind-blowing to have to deal with the fallout.

But this is just my opinion.

Thx.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-11-29 13:09               ` Borislav Petkov via gcc
@ 2018-11-29 13:16                 ` Richard Biener
  2018-11-29 13:25                   ` Borislav Petkov via gcc
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Biener @ 2018-11-29 13:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Masahiro Yamada, Segher Boessenkool, Nadav Amit, Ingo Molnar,
	H. Peter Anvin, matz, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Thu, 29 Nov 2018, Borislav Petkov wrote:

> On Thu, Nov 29, 2018 at 08:46:34PM +0900, Masahiro Yamada wrote:
> > But, I'd like to ask if x86 people want to keep this macros.s approach.
> > Revert 77b0bf55bc675 right now
> > assuming the compiler will eventually solve the issue?
> 
> Yap, considering how elegant the compiler solution is and how much
> problems this macros-in-asm thing causes, I think we should patch
> out the latter and wait for gcc9. I mean, the savings are not so
> mind-blowing to have to deal with the fallout.
> 
> But this is just my opinion.

I'd be not opposed to backporting the asm inline support.

Of course we still have to be happy with it and install the patch ;)

Are you (kernel folks) happy with asm inline ()?

Richard.

> Thx.
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-11-29 13:16                 ` Richard Biener
@ 2018-11-29 13:25                   ` Borislav Petkov via gcc
  2018-11-29 13:52                     ` Richard Biener
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov via gcc @ 2018-11-29 13:25 UTC (permalink / raw)
  To: Richard Biener
  Cc: Masahiro Yamada, Segher Boessenkool, Nadav Amit, Ingo Molnar,
	H. Peter Anvin, matz, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Thu, Nov 29, 2018 at 02:09:25PM +0100, Richard Biener wrote:
> I'd be not opposed to backporting the asm inline support.

Even better! :-)

> Of course we still have to be happy with it and install the patch ;)
> 
> Are you (kernel folks) happy with asm inline ()?

Yes, I think we are:

https://lkml.kernel.org/r/20181031125526.GA13219@hirez.programming.kicks-ass.net

Thx.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-11-29 13:25                   ` Borislav Petkov via gcc
@ 2018-11-29 13:52                     ` Richard Biener
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Biener @ 2018-11-29 13:52 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Masahiro Yamada, Segher Boessenkool, Nadav Amit, Ingo Molnar,
	H. Peter Anvin, matz, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Thu, 29 Nov 2018, Borislav Petkov wrote:

> On Thu, Nov 29, 2018 at 02:09:25PM +0100, Richard Biener wrote:
> > I'd be not opposed to backporting the asm inline support.
> 
> Even better! :-)
> 
> > Of course we still have to be happy with it and install the patch ;)
> > 
> > Are you (kernel folks) happy with asm inline ()?
> 
> Yes, I think we are:
> 
> https://lkml.kernel.org/r/20181031125526.GA13219@hirez.programming.kicks-ass.net

OK, Segher - can you ping the latest version of the patch please?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
       [not found]             ` <CAK7LNARmrdtyE7JRAdJH_zbfvD_cej+TGLh+5KfT20o538KUcA@mail.gmail.com>
  2018-11-29 13:09               ` Borislav Petkov via gcc
@ 2018-11-30  9:06               ` Segher Boessenkool
  2018-11-30 10:04                 ` Boris Petkov via gcc
  1 sibling, 1 reply; 40+ messages in thread
From: Segher Boessenkool @ 2018-11-30  9:06 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Nadav Amit, Ingo Molnar, H. Peter Anvin, rguenther, matz,
	Borislav Petkov, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Thu, Nov 29, 2018 at 08:46:34PM +0900, Masahiro Yamada wrote:
> On Wed, Oct 10, 2018 at 1:14 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Mon, Oct 08, 2018 at 11:07:46AM +0200, Richard Biener wrote:
> > > On Mon, 8 Oct 2018, Segher Boessenkool wrote:
> > > > On Sun, Oct 07, 2018 at 03:53:26PM +0000, Michael Matz wrote:
> > > > > On Sun, 7 Oct 2018, Segher Boessenkool wrote:
> > > > > > On Sun, Oct 07, 2018 at 11:18:06AM +0200, Borislav Petkov wrote:
> > > > > > > Now, Richard suggested doing something like:
> > > > > > >
> > > > > > >  1) inline asm ("...")
> > > > > >
> > > > > > What would the semantics of this be?
> > > > >
> > > > > The size of the inline asm wouldn't be counted towards the inliner size
> > > > > limits (or be counted as "1").
> > > >
> > > > That sounds like a good option.
> > >
> > > Yes, I also like it for simplicity.  It also avoids the requirement
> > > of translating the number (in bytes?) given by the user to
> > > "number of GIMPLE instructions" as needed by the inliner.
> >
> > This patch implements this, for C only so far.  And the syntax is
> > "asm inline", which is more in line with other syntax.
> >
> > How does this look?
> 
> 
> Thank you very much for your work.
> 
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01932.html
> 
> How is the progress of this in GCC ML?

Latest patch was pinged a few times:
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg01569.html .

I'll ping it again.  Will fix the subject as well if I remember to, sigh.

> I am really hoping the issue will be solved by compiler
> instead of the in-kernel workaround.

This will only be fixed from GCC 9 on, if the compiler adopts it.  The
kernel wants to support ancient GCC, so it will need to have a workaround
for older GCC versions anyway.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-11-30  9:06               ` Segher Boessenkool
@ 2018-11-30 10:04                 ` Boris Petkov via gcc
  2018-12-02  6:03                   ` Segher Boessenkool
  0 siblings, 1 reply; 40+ messages in thread
From: Boris Petkov via gcc @ 2018-11-30 10:04 UTC (permalink / raw)
  To: Segher Boessenkool, Masahiro Yamada
  Cc: Nadav Amit, Ingo Molnar, H. Peter Anvin, rguenther, matz, gcc,
	Linux Kernel Mailing List, X86 ML, Sam Ravnborg, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, Jan Beulich, Josh Poimboeuf,
	Juergen Gross, Kate Stewart, Kees Cook, linux-sparse,
	Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On November 29, 2018 1:25:02 PM GMT+01:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>This will only be fixed from GCC 9 on, if the compiler adopts it.  The
>kernel wants to support ancient GCC, so it will need to have a
>workaround
>for older GCC versions anyway.

What about backporting it, like Richard says?


-- 
Sent from a small device: formatting sux and brevity is inevitable.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: PROPOSAL: Extend inline asm syntax with size spec
  2018-11-30 10:04                 ` Boris Petkov via gcc
@ 2018-12-02  6:03                   ` Segher Boessenkool
  0 siblings, 0 replies; 40+ messages in thread
From: Segher Boessenkool @ 2018-12-02  6:03 UTC (permalink / raw)
  To: Boris Petkov
  Cc: Masahiro Yamada, Nadav Amit, Ingo Molnar, H. Peter Anvin,
	rguenther, matz, gcc, Linux Kernel Mailing List, X86 ML,
	Sam Ravnborg, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	Jan Beulich, Josh Poimboeuf, Juergen Gross, Kate Stewart,
	Kees Cook, linux-sparse, Peter Zijlstra (Intel),
	Philippe Ombredanne, Thomas Gleixner, virtualization,
	Linus Torvalds, Chris Zankel, Max Filippov, linux-xtensa

On Fri, Nov 30, 2018 at 10:06:02AM +0100, Boris Petkov wrote:
> On November 29, 2018 1:25:02 PM GMT+01:00, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >This will only be fixed from GCC 9 on, if the compiler adopts it.  The
> >kernel wants to support ancient GCC, so it will need to have a
> >workaround
> >for older GCC versions anyway.
> 
> What about backporting it, like Richard says?

Let me first get it into GCC trunk :-)

It should backport fine, sure; and I'll work on that.


Segher

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2018-12-02  6:03 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20181003213100.189959-1-namit@vmware.com>
2018-10-07  9:46 ` PROPOSAL: Extend inline asm syntax with size spec Borislav Petkov
2018-10-07 13:23   ` Segher Boessenkool
2018-10-07 14:14     ` Borislav Petkov
2018-10-07 15:14       ` Segher Boessenkool
     [not found]         ` <20181008055838.GA66819@gmail.com>
2018-10-08  7:53           ` Segher Boessenkool
2018-10-07 15:53     ` Michael Matz
2018-10-08  7:31       ` Segher Boessenkool
2018-10-08  9:09         ` Richard Biener
2018-10-08 10:03           ` Segher Boessenkool
2018-10-09 15:54           ` Segher Boessenkool
2018-10-10  7:12             ` Richard Biener
2018-10-10  7:54               ` Segher Boessenkool
     [not found]               ` <20181010072240.GB103159@gmail.com>
2018-10-10  8:03                 ` Segher Boessenkool
2018-10-10  8:19                   ` Borislav Petkov
2018-10-10  8:36                     ` Richard Biener
2018-10-10 18:55                     ` Segher Boessenkool
2018-10-10 19:14                       ` Borislav Petkov
2018-10-13 19:33                         ` Borislav Petkov
     [not found]                           ` <alpine.LNX.2.20.13.1810132355180.13914@monopod.intra.ispras.ru>
2018-10-13 21:31                             ` Borislav Petkov
2018-10-25 10:23                           ` Borislav Petkov
     [not found]                           ` <20181031125526.GA13219@hirez.programming.kicks-ass.net>
2018-10-31 22:04                             ` Segher Boessenkool
2018-11-01  5:23                             ` Joe Perches
     [not found]                               ` <20181101090114.GE3159@hirez.programming.kicks-ass.net>
2018-11-01  9:20                                 ` Joe Perches
2018-10-10 10:29                   ` Richard Biener
2018-10-10 16:31             ` Nadav Amit
2018-10-10 19:31               ` Segher Boessenkool
2018-10-11  7:05               ` Richard Biener
     [not found]             ` <CAK7LNARmrdtyE7JRAdJH_zbfvD_cej+TGLh+5KfT20o538KUcA@mail.gmail.com>
2018-11-29 13:09               ` Borislav Petkov via gcc
2018-11-29 13:16                 ` Richard Biener
2018-11-29 13:25                   ` Borislav Petkov via gcc
2018-11-29 13:52                     ` Richard Biener
2018-11-30  9:06               ` Segher Boessenkool
2018-11-30 10:04                 ` Boris Petkov via gcc
2018-12-02  6:03                   ` Segher Boessenkool
     [not found]       ` <20181008061323.GB66819@gmail.com>
2018-10-08  8:19         ` Segher Boessenkool
     [not found]   ` <4F2F1BCE-7875-4160-9E1E-9F8EF962D989@vmware.com>
2018-10-07 16:13     ` [RESEND] " Nadav Amit
2018-10-07 16:46     ` Richard Biener
2018-10-07 19:06       ` Nadav Amit
2018-10-07 19:52         ` Jeff Law
2018-10-08  7:46         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).