public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: random commentary on -fsplit-stack (and a bug report)
       [not found] <74487365.8071330719690349.JavaMail.root@frigate>
@ 2012-03-03  7:11 ` Jay Freeman (saurik)
  2012-03-04 20:11   ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Jay Freeman (saurik) @ 2012-03-03  7:11 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

> > "Jay Freeman (saurik)" <saurik@saurik.com> writes:
> "Ian Lance Taylor" <iant@google.com> wrote:

> > After getting more sleep, I realize that this problem is actually much
> > more endemic than I had even previously thought. Most any vaguely
> > object-oriented library is going to have tons of function pointers in
> > it, and you often interact solely with those function pointers (as in,
> > you have no actual symbol references anywhere). A simple example: in
> > the case of C++, any call to a non-split-stack virtual function will
> > fail.
> 
> Certainly true in principle, but unlikely in practice.  Why would you
> compile part of your C++ program with split-stack and part without?
> Implementing child classes that define virtual methods for classes
> defined in a precompiled library seems like an unusual case to me.

That is actually incredible common, due to object-oriented GUI libraries such as Qt or MFC. Most development for systems like Symbian, or kernel drivers for Darwin (and I personally believe split stacks could be a great boon in kernel development) are also going to involve subclassing code compiled by someone else and overriding virtual methods. The Qt "Getting Started" one-pager actually does this.

That said, the specific situation I was describing was even simpler: just using a library compiled by someone else, not subclassing it. If you call a virtual method it will be a call via a function pointer without a symbol reference. Any and all C++ language bindings are thereby off-limits to code that has been compiled with -fsplit-stack until the function pointer issue is addressed.

> The abstraction break exists not because I thought it was a good idea,
> but because I couldn't see any other way to do it.  The split-stack
> system needs to work in a world with precompiled libraries and where
> people will not change their source code.  Any approach that requires
> people to rebuild the world, or to edit their source code, is a
> non-starter for me.  I don't mind if there is some more efficient
> mechanism which works if we require those steps, but I wanted a system
> that would work where we do not require them.  I'm willing to impose
> restrictions like "you must compile all your source code with
> -fsplit-stack;" I'm not willing to say "you must not use precompiled
> libraries."

I feel like there was some misunderstanding here, mostly because I agree with your constraints. ;P That said, I feel the current implementation does not quit provide this dream: the function-pointer restriction alone makes it impossible to use numerous standard precompiled libraries. FWIW, though: I certainly do not want to go in a direction that requires /more/ intervention than I feel like I am having to make currently, and I agree that it is possible with none and that should be the target.

> > Part of me (and I realize that this causes other tradeoffs, and I'm
> > therefore not even recommending it: more just musing) feels like the
> > notion of "supports split stack" is more of a calling convention. In
> > the same way that gcc currently supports regparm, stdcall, thiscall,
> > fastcall... it seems like it might simply be a new attribute (probably
> > orthogonal to the calling convention) a function can have (and would
> > not have by default): splitcall.
> 
> I'm fine with that, but it requires a source code change, so I want the
> system to work without it.

Accepted. I believe that the extension of this idea can actually be pulled off with compiler flags and a feature I'm calling "fallback symbols" in the linker, which the linker may already support (although a long time ago I spent a lot of time looking at the linker and don't remember it), but I think would be easy to add and something a lot of other projects an interesting tool), but I will not mention this idea again unless I have a solid proposal that actually manages to accomplish that.

> > Actually, thinking about it more: it seems like 99% of these problems
> > could be solved by providing a second symbol definition for the
> > split-stack prologue and binding that as part of the type
> > signature. So, you could either call the "original implementation" of
> > a function using its normal symbol, or you could call the split-stack
> > prologue version of the same function using one that had been mangled
> > with some prefix.
...
> It's an interesting idea, but my immediate reaction is to ask how this
> helps us in the world where we do not require source code changes.

So, imagine a more general linker feature (again, one which may already exist) that allowed you to have a "low-priority symbol" in your object file. As in, one which if you find a copy elsewhere the linker will use that one, but if it doesn't then it will "fall back" to using the one defined locally in this object file. I think that this fairly general mechanism is easy to specify and can probably be used for all sorts of other tasks that people may come up with; it also is not architecture-dependent (as parsing the instructions for split-stacks is).

Then, when compiling an object file with -fsplit-stack, you A) generate both the original symbols and the ".split.*" symbols, as I described in my e-mail and B) make the lookup of /all/ function symbols be to the ".split.*" varieties. At this point, of course, using libraries compiled without -fsplit-stack would be impossible. However, then we then C) additionally throw a "fall back" ".split.*" stub for every function that we call from the library whose implementation is two instructions: call __morestack_non_split, and then branch to the original symbol.

So, to be clear/explicit: if you compiled a library without -fsplit-stack and a client with it, the client's version of ".split.*" would be used; but, if you compiled the library with -fsplit-stack, then the linker would prefer those exported symbols, and bind to them instead of the low-priority local-only "fallback symbols" we included in our object file.

This scheme also has the benefit that there would be almost no penalty (I say "almost", as technically having the extra few instructions in the file affects the CPU's code cache by bloating the code segments slightly) for distributing -fsplit-stack compiled versions of libraries, including libgcc and lib{std,sup}c++. People consuming those libraries from clients that are not compiled with -fsplit-stack, even if they are using a linker without this feature, would not notice any difference in behavior or any extra code executed: things would just work perfectly.


In the mean time I've faxed in the FSF copyright assignment and am waiting for a reply on that. I am going to be putting together some patches to the implementation for the __{more,split}stack{,_*} soon. I haven't disappeared, or been scared off (far from it, in fact ;P), although this is honestly a side-project for me, so I can't look at it every day (which I only say to be explicit on expectations, not because I thought you actually were bothered that I didn't respond to your e-mail for a few days, as I know you are also busy yourself with many projects ;P).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: random commentary on -fsplit-stack (and a bug report)
  2012-03-03  7:11 ` random commentary on -fsplit-stack (and a bug report) Jay Freeman (saurik)
@ 2012-03-04 20:11   ` Ian Lance Taylor
  0 siblings, 0 replies; 5+ messages in thread
From: Ian Lance Taylor @ 2012-03-04 20:11 UTC (permalink / raw)
  To: Jay Freeman (saurik); +Cc: gcc

"Jay Freeman (saurik)" <saurik@saurik.com> writes:

> So, imagine a more general linker feature (again, one which may
> already exist) that allowed you to have a "low-priority symbol" in
> your object file. As in, one which if you find a copy elsewhere the
> linker will use that one, but if it doesn't then it will "fall back"
> to using the one defined locally in this object file. I think that
> this fairly general mechanism is easy to specify and can probably be
> used for all sorts of other tasks that people may come up with; it
> also is not architecture-dependent (as parsing the instructions for
> split-stacks is).

You can do this with most ELF linkers by defining the local symbol as a
weak global symbol.

> Then, when compiling an object file with -fsplit-stack, you A)
> generate both the original symbols and the ".split.*" symbols, as I
> described in my e-mail and B) make the lookup of /all/ function
> symbols be to the ".split.*" varieties. At this point, of course,
> using libraries compiled without -fsplit-stack would be
> impossible. However, then we then C) additionally throw a "fall back"
> ".split.*" stub for every function that we call from the library whose
> implementation is two instructions: call __morestack_non_split, and
> then branch to the original symbol.

Nice idea.  I like it.

It would be nice if the linker could discard unused stubs here.  The
stubs could go in their own section, of course, but it would be great if
the linker could discard them without using general garbage collection.
Of course this could be a special purpose linker feature, but I also
wonder if it could be more generalized somehow.

Ian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: random commentary on -fsplit-stack (and a bug report)
  2012-02-28 20:32 ` Ian Lance Taylor
@ 2012-02-28 22:47   ` Jay Freeman (saurik)
  0 siblings, 0 replies; 5+ messages in thread
From: Jay Freeman (saurik) @ 2012-02-28 22:47 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

> > "Jay Freeman (saurik)" <saurik@saurik.com>
> "Ian Lance Taylor" <iant@google.com>

> Thanks for the bug report and the analysis.  I think it does simply
> require an '&'.  That makes it analogous to the way
> __morestack_release_segments is used in generic-morestack-thread.c. 

The only reason I hesitated on that is that it might not make sense to update the pointer in the context. In my specific case, that will actually cause it to crash ;P, as while the current stack I'm calling __splitstack_releasecontext from is valid, the context pointer I'm passing is actually stored on the old stack, and will be unallocated by __morestack_releasse_segments.

I can always just change my code to copy the context to the other stack before calling __splitstack_releasecontext, however, so that isn't a problem for me. Though, I also wasn't certain what the releasecontext function actually wanted to do with that pointer, as I hadn't yet read much of the morestack code; I now see that it is just the head of a linked list, so yeah: passing the address out of the context seems fine.

> As you know, I wanted to allow for future expansion.  I agree that it
> would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
> off space for time, since it would mean that setcontext would have to
> walk up the list.  I think CURRENT_STACK is required for
> __splitstack_find_context.  And __splitstack_find_context is required
> for Go's garbage collector.  At least, it's not obvious to me how to
> avoid requiring CURRENT_STACK for that case.

The basis of that suggestion was not just that the items in the context could be removed, but that the underlying state used by split stacks might not need the values at all. In this case, I am not certain why __morestack_segments is needed: it seems to only come in to play when __morestack_current_segment is NULL (and I'm not certain how that would happen) and while deallocating dynamic blocks (which is already linear).

I might provide a patch to better describe what I mean by this. I've started the process of getting a copyright assignment in place (sent an e-mail to fsf-records@gnu.org per http://gcc.gnu.org/wiki/CopyrightAssignment).

> I agree.  Want to write a patch?  Or at least file a bug report.

Sure.

> [paragraph moved below]

> > 7) Using the linker to handle the transition between split-stack and
> > non-split-stack code seems like a good way to solve the problem of "we
> > need large stacks when hitting external code", but in staring at the
> > resulting code I have in my project I'm seeing that it isn't reliable:
> > if you have a pointer to a function the linker will not know what you
> > are calling. In my case, this is coming up often due to using
> > std::function.
> 
> Yes, good point.  I think I had some plan for handling that but I no
> longer recall what it was.

After getting more sleep, I realize that this problem is actually much more endemic than I had even previously thought. Most any vaguely object-oriented library is going to have tons of function pointers in it, and you often interact solely with those function pointers (as in, you have no actual symbol references anywhere). A simple example: in the case of C++, any call to a non-split-stack virtual function will fail.

"""Function pointers are a tricky case. In general we don't know whether a function pointer points to split-stack code. Therefore, all calls through a function pointer will be modified to call (or jump to) a special function __fnptr_morestack. This will use a target specific function calling sequence, and will be implemented as though it were itself a function call instruction. That is, all the parameters will be set up, and then the code will jump to __fnptr_morestack. The __fnptr_morestack function takes two parameters: the function pointer to call, and the number of bytes of arguments pushed on the stack. (This is not yet implemented.)"""

That paragraph is from your design document (SplitStacks on the GCC wiki). I presume that this solution would only work if __fnptr_morestack always assumed that the target did not support split-stack? Alternatively, I can see having that stub look at the function to see if its first instruction was a comparison to the TCB stack limit entry (using similar logic to that used by the linker)? [also, see below in this e-mail]

> > More awkwardly, split-stack functions that mention (but do not call)
> > non-split-stack functions (such as to return their address) are being
> > mis-flagged by the linker. Honestly, I question whether the linker
> > fundamentally has enough information about what is going on to be able
> > to make sufficiently accurate decisions with regards to stack
> > constraints to warrant the painful abstraction breakage that
> > split-stack uses. :(
> 
> Your're right that the linker doesn't really have enough information.
> But is a split-stack function that returns the address of a
> non-split-stack function really so frequent that it's worth worrying
> about?

I guess the question I have is: is one of the goals to make this option "safe to turn on for a random project"? Given the abstraction break that was made between the compiler and the linker, it would seem like this was a rather critically important goal (as now both the linker and the compiler are less modular and more difficult to modify), but in fact the result doesn't manage to solve seemingly simple corner cases.

The reason I'm running into these issues is not due to virtual dispatch (at least yet: this codebase was C 5 years ago, but is now being ported to C++), but instead due to higher-order functions. I'm finding myself in situations where std::function and std::bind are disconnecting the symbol references from the call sites sufficiently (even moving them to different stacks ;P) to cause the linker to make seemingly random decisions.

That said, I can demonstrate a really common idiom, from C (not C++), that is almost always going to involve non-split-stack code (as malloc and free are normally going to be in libc, compiled without -fsplit-stack), and that is morally equivalent to "returning a function pointer and using it later": data structures that keep information on a block of dynamically allocated memory and "how to free it". Here's a lame version:

struct String {
    const char *data;
    void (*free)(void *);
};

void ClearString(String *string) {
    if (string->data != NULL && string->free != NULL)
        string->free(string->data);

    string->data = NULL;
    string->free = NULL;
}

void SetString(String *string, const char *data, bool alloc) {
    ClearString(string);

    string->data = data;
    string->free = alloc ? &free : NULL;
}

void f(String *string) {
    SetString(string, "hello", true);
    ClearString(string); /* potential stack overflow */
}

(Incidentally, if you use std::vector with a custom allocator that has any kind of indirection in it, this is going to come up quite a lot. The code for vector instantiated over your allocator will be compiled as part of your code with -fsplit-stack, but if the memory allocator being used is something compiled without then you are going to end up with a really complex version of the above code and a stack overflow.)

[paragraph from above]
> It would certainly be possible for the compiler to arrange to allocate a
> large stack as it called the non-split-stack function.  Unfortunately, I
> don't see how the linker could do it.  And it's the linker, not the
> compiler, that knows that it is a call to a non-split-stack function.

However, the linker doesn't actually have any notion of "calls", which is what causes the previous problems. In a language like C++ (or even C) it isn't really true that a function that calls another function will go through a symbol reference to do it. Anyone who uses code that involves dlsym, higher-order functions, or polymorphic object-oriented libraries will run into cases that the current -fsplit-stack implementation doesn't even provide good (certainly not documented) workarounds for.

Part of me (and I realize that this causes other tradeoffs, and I'm therefore not even recommending it: more just musing) feels like the notion of "supports split stack" is more of a calling convention. In the same way that gcc currently supports regparm, stdcall, thiscall, fastcall... it seems like it might simply be a new attribute (probably orthogonal to the calling convention) a function can have (and would not have by default): splitcall.

In such an implementation, like many of the existing calling-convention related attributes, splitcall would be considered part of the type signature (and thereby would not be allowed to be put on a definition and not on the related prototype), and could be opted in for a large block of code using a #pragma or a compiler-switch. (Again: this is just musing. I haven't put much thought into whether this would actually be semantically reasonable yet.)

[see above in this e-mail] For cases where the compiler "simply doesn't know", the solution that was brought up for function pointers could be used: have a level of indirection in the calls that includes the number of arguments. That code could then read the target of the call to see whether the function at the other side looked like it supported split-stack, and if not it could allocate more stack at the time of that call.

The developer would now be put in the position of thinking about what they are calling sometimes (and making certain that their usage of the pragma and header files lined up), but honestly I already am having to think about that (due to the linker having both false positives and false negatives for all of the above reasons, whether the inliner conflating calls or function pointers obfuscating them), and I have no explicit mechanism to override it.

In fact, I almost want to say that the worst-case scenario in the "rely on the compiler" is the developer throwing up their hands in defeat and attempting to recompile "the world" (including libgcc, libsupc++, etc.) with -fsplit-stack... but that's where I already am at with the current linker-based implementation: the main/only way I'm going to be able to avoid having function pointers to non-split-stack code is to recompile every library I need with -fsplit-stack.

> > A specific idea that might help, however, is to set things up so that
> > the PLT actually handles the stack increases when you are linking to
> > functions that are in a dynamic library. That way, calls to open (for
> > example) would not cause the function that called it to suddenly
> > require a large stack, but instead only as control is transferred to
> > open would the stack size increase. (This might be quite complex,
> > though.)
> 
> Yes, again you have to know how many bytes of arguments were pushed on
> the stack.  You can pretty much know this for open, of course, but it's
> a lot more complex for printf (if printf were compiled in split-stack
> mode it would straightforward, but of course in this example it is
> not).
> 
> I agree that this could be a lot nicer.  It's a bit less important for
> Go because obviously the Go compiler is completely in control of all
> functions called by Go code.

In this model (still using the linker, but pushing the stack-split into or around the @plt stub function), I would have to propose that variadic functions are treated specially (possibly using a similar/identical setup to the one you were proposing for function pointers) where the argument count was also passed. This could be pushed onto the stack right before the call and popped/thrown off the stack first thing in the stub when not needed (which has the benefit of being portable between targets and not messing with the existing argument placement).

> > That said, I don't have a better solution to suggest right now (I
> > really want to say that having attributes available to declare
> > split-stack functionality in the code would be better, but that has
> > other ramifications), but I do have concerns that due to attempts to
> > keep the ABI fixed decisions made now (when there seem to only be a
> > single major user, Go) will lock in how the mechanism is capable of
> > functioning in the future.
> 
> I may misunderstand your suggestion, but I think that keeping the ABI
> fixed is a requirement.  Any ABI change would require rebuilding all
> libraries and changing the debugger.  The result would not be usable
> for most people.

What I meant by these "concerns" is that it seems like the current mechanism for -fsplit-stack is going to get locked into place (and be unable to change due to ABI breakage) in a state where even with the abstraction leak between the compiler and the linker (which I feel is quite costly) it doesn't really solve the problem for many users (and possibly even, any other than Go, which might have enough constraints to make this work).

However, this might not actually be that big of a concern, thinking about it more. As the existing implementation is, as we both believed, unlikely to be incorporated into the default library build, it is really then just a matter that -fsplit-stack has to exist with this implementation and Gold needs to continue to supporting it; gcc already has tons of ld-specific flags: this could just be another one. A later/different implementation of split-stack could be -fsplit-stack-ex or something, and existing independently and in parallel.

[vaguely in reply to everything above]

Actually, thinking about it more: it seems like 99% of these problems could be solved by providing a second symbol definition for the split-stack prologue and binding that as part of the type signature. So, you could either call the "original implementation" of a function using its normal symbol, or you could call the split-stack prologue version of the same function using one that had been mangled with some prefix.

extern "C" int test() {
    return 0xdeadbeef;
}

0000000000404920 <test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x11>
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test+0xb>
  40493f:       90                      nop

In this case (and yes: this is an example of a function that shouldn't need this prologue at all, but it was short ;P), the existing implementation of -fsplit-stack has modified the function to fundamentally check its stack. No matter how you attempt to call it, we now have to know whether the function supports the split-stack protocol using an out-of-line mechanism, and we cannot enforce our beliefs in the compiler: the linker is complete control of this decision. However, we could instead have it do this:

0000000000404920 <.split.test>:
  404920:       64 48 3b 24 25 70 00    cmp    %fs:0x70,%rsp
  404927:       00 00 
  404929:       72 06                   jb     404931 <test+0x6>
000000000040492b <test>:
  40492b:       b8 ef be ad de          mov    $0xdeadbeef,%eax
  404930:       c3                      retq   
  404931:       45 31 d2                xor    %r10d,%r10d
  404934:       45 31 db                xor    %r11d,%r11d
  404937:       e8 6d 6b 00 00          callq  40b4a9 <__morestack>
  40493c:       c3                      retq   
  40493d:       eb ec                   jmp    40492b <test>
  40493f:       90                      nop

Now the decision to call either test or .split.test becomes explicit. This would allow us to get a linker error if we made an incorrect decision in my earlier not-really-a-suggestion-more-of-a-musing of making this knowledge explicit in the compiler akin to a calling convention. If the compiler decided that something wasn't split-stack, then it would just handle allocating the larger stack before the call to the underlying function; or, if it decided the function was split-stack, the linker would enforce it, and the user would get a reasonable error.

This also has the amazing benefit that it no longer would extract a runtime performance cost for libraries such as libc, libsupc++, and libgcc to be compiled with -fsplit-stack. The existing people who were using the library would continue to call the original version of the code, and only people who were trying to opt-in to the split-stack universe would be using the new split-stack variations. You could imagine an entire distribution of Linux (or whatever) that was compiled with this feature active, and the result would only be a slight increase in code-size.

So: thoughts? I really do think that there is a way to do this -fsplit-stack feature that allows more people to use it and for it to "change everything" / "take over the world" ;P. In my eyes, doing that would require a) that there is no impediment to just compiling everything with -fsplit-stack and b) the functionality to work with a stock linker (as many platforms are not supported by Gold). I think that with some variation on some of my above implementation ideas, this feature could be done generically in the compiler (and libgcc) for all platforms.

(Obviously, though: this is something I've only been looking at for days, and only seriously thinking about the implementation concerns of for hours, so I could easily be overlooking something obvious that you thought about two years ago that makes any or all of these ideas untenable. I certainly will not be bothered to learn that this is all stupid, and in fact will highly appreciate the feedback. Again: thank you so much for even reading any of these thoughts in the first place. ;P)

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: random commentary on -fsplit-stack (and a bug report)
  2012-02-28  8:07 Jay Freeman (saurik)
@ 2012-02-28 20:32 ` Ian Lance Taylor
  2012-02-28 22:47   ` Jay Freeman (saurik)
  0 siblings, 1 reply; 5+ messages in thread
From: Ian Lance Taylor @ 2012-02-28 20:32 UTC (permalink / raw)
  To: Jay Freeman (saurik); +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 11559 bytes --]

"Jay Freeman (saurik)" <saurik@saurik.com> writes:

> As demonstrated by these snippets, __morestack_segments is a pointer
> to a stack_segment; it is being stored in the context as a void *, but
> is being removed from the context and being passed directly to
> __morestack_release_segments, which in turn expects a pointer to a
> pointer to a stack_segment, not just a bare pointer to a stack
> segment. Probably quite simple to fix (although might be more complex
> than just "add a &").

Thanks for the bug report and the analysis.  I think it does simply
require an '&'.  That makes it analogous to the way
__morestack_release_segments is used in generic-morestack-thread.c.  So,
fixed with the appended patch.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.


> 1) The current implementation (maybe this is intended to change?) uses
> mmap() to allocate stack segments, which means that every allocation
> involves a system call, a lock in the kernel on a slow data structure
> (anon_vma), and has some non-zero probability of ending up with a
> separate VMA (which is not only slow, but in my understanding uses up
> a limited resource: you can only have 64k VMAs per process).
>
> Is it possible to instead expose the functionality for allocating
> stack segments out of libgcc for easy replacement by coroutine
> runtimes? I would really love to be able to use my existing memory
> manager to allocate the stack segments. I realize that this allocation
> routine would need to be able to operate with almost no stack: that
> isn't a problem (I can always pivot to another stack if I need any
> stack).

Makes sense to me.  It could actually be used by Go as well, since libgo
has its own memory allocator.  Probably the best approach would be to
add another function, __splitstack_set_allocator or something, which
would set the allocator function.  The allocator would have to be
updated atomically, of course.  And, as you say, it would have to
operate with limited stack space.


> 2) I had seen a discussion on the mailing list regarding argument
> copying, and I must say I'm somewhat confused as to why it is
> sufficient to memcpy the arguments from the old stack to the new one:
> if I have an argument with a non-POD type that has a non-trivial copy
> constructor, it would seem like I need a copy operation to be able to
> use the object from the new stack (maybe, for example, it has an
> internal pointer).

Non-POD values are always passed by pointer.  They are not copied.  See
pass_by_reference in function.c.  Basically, the frontend is responsible
for calling the copy constructor where required.  This is because the
middle-end may want to shuffle the value around in various ways, and we
don't want it to call the copy constructor multiple times while doing
that.


> 3) If I have either blocked signals on my thread or have setup an
> alternate signal stack with sigaltstack, I can get away with
> super-tiny stacks. However, allocate_segment has a minimum stack size
> of MINSIGSTKSZ (I presume to allow for signals), which on some systems
> I use (such as Mac OS X) I've seen be set as high as 32kB. (Meanwhile,
> MINSIGSTKSZ on Linux is smaller than a page, so mmap() can't even
> allocate it.)

Makes sense to me.  There should be a way for the program to specify the
minimum stack segment size.


> 4) 10 64-bit words for the splitstack context is a really large amount
> of space. :( I don't even have that much CPU-state (there are only 8
> registers that really need to be saved when switching between
> coroutines). Considering the stack segments form a doubly-linked-list,
> it would seem like MORESTACK_SEGMENTS and CURRENT_SEGMENT are
> redundant. I also feel like CURRENT_STACK could be worked around
> fairly well.

As you know, I wanted to allow for future expansion.  I agree that it
would be possible to avoid storing MORESTACK_SEGMENTS--that would trade
off space for time, since it would mean that setcontext would have to
walk up the list.  I think CURRENT_STACK is required for
__splitstack_find_context.  And __splitstack_find_context is required
for Go's garbage collector.  At least, it's not obvious to me how to
avoid requiring CURRENT_STACK for that case.

Not sure what to do here.  Obviously you could cheat and use 7 words
instead of 10.


> 5) As currently implemented, the stack space check is added to every
> single function. However, some functions do not actually use the stack
> (or might even be avoiding memory accesses entirely). When I look at
> the disassembly of my project, I see many references to __morestack
> and "cmp %fs:0x70,%rsp" in functions that would otherwise be just a
> few instructions long. Functions that don't use stack should avoid the
> check.

I agree.  Want to write a patch?  Or at least file a bug report.


> 6) I have noticed that the purpose of having split stacks seems
> largely hobbled by the way the linker enforces humungous stacks on
> outgoing calls to non-split-stack code, even if that code isn't
> called. As an incredibly painful example: __splitstack_getcontext is
> not compiled with split-stack support, which means that the function I
> have to switch coroutines (called from every coroutine) allocates
> stack.

Yes, this is painful.

> To explain what I mean by "even if that code isn't called": my code
> hardly ever throws exception, but because I support them I end up with
> _Unwind_resume in most of my functions; I thereby get burned with
> giant stacks. It would seem more ideal (although I see how this would
> be much more difficult) if there were some way to only allocate the
> larger stack as the call is made to the non-split-stack function, not
> when entering the split-stack one.

It would certainly be possible for the compiler to arrange to allocate a
large stack as it called the non-split-stack function.  Unfortunately, I
don't see how the linker could do it.  And it's the linker, not the
compiler, that knows that it is a call to a non-split-stack function.
The linker could do it if the compiler could somehow indicate the number
of bytes of arguments that were pushed on the stack, but I don't see a
good way to do that offhand.

> A lot of these problems would be solved if libgcc (and whatever
> friends, such as libsupc++) were themselves compiled with
> -fsplit-stack. Of course, I can't imagine that anyone would want to
> pay the performance penalty for that globally ;P. So, is there some
> plan to either do that for the entire build, or to provide alternative
> versions of those libraries that can be linked to while using
> -fsplit-stack in your own code?

If you are building your own toolchain you could use multilib to get a
-fsplit-stack version of libgcc and libstdc++.  Or you could simply add
-fsplit-stack to CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET.  But I don't
see a clean way to handle this if you are not building your own
toolchain.  As you say, -fsplit-stack is unlikely to be incorporated
into the default library build any time soon.

> That said, I don't think that that entirely does away with this
> "uncalled function drags in stack requirements" problem, as I want to
> say the core issue comes down to how this interacts with the
> inliner. In many of these cases, the call to the non-split-stack
> function is in some leaf function of a giant call graph that was
> flattened to a single massive function during the optimization pass.
>
> The result is that if you ever interact with non-split-stack code
> anywhere, you really need to be quite explicit about __noinline__ to
> keep it from tainting the stack requirements of other functions. Part
> of me feels like there must be a better way of handling the stack
> expansions (such as by putting it at the call-site in situations like
> this), although I realize that might be difficult with the linker in
> charge of it.
>
> A specific idea that might help, however, is to set things up so that
> the PLT actually handles the stack increases when you are linking to
> functions that are in a dynamic library. That way, calls to open (for
> example) would not cause the function that called it to suddenly
> require a large stack, but instead only as control is transferred to
> open would the stack size increase. (This might be quite complex,
> though.)

Yes, again you have to know how many bytes of arguments were pushed on
the stack.  You can pretty much know this for open, of course, but it's
a lot more complex for printf (if printf were compiled in split-stack
mode it would straightforward, but of course in this example it is not).

I agree that this could be a lot nicer.  It's a bit less important for
Go because obviously the Go compiler is completely in control of all
functions called by Go code.



> 7) Using the linker to handle the transition between split-stack and
> non-split-stack code seems like a good way to solve the problem of "we
> need large stacks when hitting external code", but in staring at the
> resulting code I have in my project I'm seeing that it isn't reliable:
> if you have a pointer to a function the linker will not know what you
> are calling. In my case, this is coming up often due to using
> std::function.

Yes, good point.  I think I had some plan for handling that but I no
longer recall what it was.

> More awkwardly, split-stack functions that mention (but do not call)
> non-split-stack functions (such as to return their address) are being
> mis-flagged by the linker. Honestly, I question whether the linker
> fundamentally has enough information about what is going on to be able
> to make sufficiently accurate decisions with regards to stack
> constraints to warrant the painful abstraction breakage that
> split-stack uses. :(

Your're right that the linker doesn't really have enough information.
But is a split-stack function that returns the address of a
non-split-stack function really so frequent that it's worth worrying
about?

> That said, I don't have a better solution to suggest right now (I
> really want to say that having attributes available to declare
> split-stack functionality in the code would be better, but that has
> other ramifications), but I do have concerns that due to attempts to
> keep the ABI fixed decisions made now (when there seem to only be a
> single major user, Go) will lock in how the mechanism is capable of
> functioning in the future.

I may misunderstand your suggestion, but I think that keeping the ABI
fixed is a requirement.  Any ABI change would require rebuilding all
libraries and changing the debugger.  The result would not be usable for
most people.


> If some of these things are in the "yeah, changing that would be
> interesting, we are just don't have many people working on the
> feature", I'd be happy to throw some patches towards it. I hesitate to
> just start sending patches over the wall, however, without first doing
> some kind of verification that I have any clue what I'm doing; I
> certainly am not certain how things would be prioritized, or even
> really who is working on it. ;P

In general, patches would be great (sign the copyright assignment first,
of course).  You are correct that there aren't many people working on
the feature--I'm the only one.  And I'm not even really keeping up with
Go, much less the use of split stack code in other languages.

Thanks for the thoughtful comments.

Ian


2012-02-28  Ian Lance Taylor  <iant@google.com>

	* generic-morestack.c (__splitstack_releasecontext): Correct call
	to __morestack_release_segments.



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: patch --]
[-- Type: text/x-diff, Size: 565 bytes --]

Index: libgcc/generic-morestack.c
===================================================================
--- libgcc/generic-morestack.c	(revision 184633)
+++ libgcc/generic-morestack.c	(working copy)
@@ -1104,7 +1104,9 @@ __splitstack_resetcontext (void *context
 void
 __splitstack_releasecontext (void *context[10])
 {
-  __morestack_release_segments (context[MORESTACK_SEGMENTS], 1);
+  __morestack_release_segments (((struct stack_segment **)
+				 &context[MORESTACK_SEGMENTS]),
+				1);
 }
 
 /* Like __splitstack_block_signals, but operating on CONTEXT, rather

^ permalink raw reply	[flat|nested] 5+ messages in thread

* random commentary on -fsplit-stack (and a bug report)
@ 2012-02-28  8:07 Jay Freeman (saurik)
  2012-02-28 20:32 ` Ian Lance Taylor
  0 siblings, 1 reply; 5+ messages in thread
From: Jay Freeman (saurik) @ 2012-02-28  8:07 UTC (permalink / raw)
  To: gcc

Hello. A couple years ago I got really excited about the gcc "split stacks" feature that was being developed, and I recently noticed that it is ready to use now. I thereby have been spending the last few days trying it with one of my side-projects (currently just a toy, but something I hope to have in production one day: an event mediator that makes usage of light-weight coroutine-based threads to implement various protocols).

Yesterday, I integrated support for the new -fsplit-stacks libgcc __splitstack_*context functions (the ones that were added to allow coroutine libraries to save/restore the splitstack context; I've linked to the relevant mailing list threads below this paragraph), and I noticed that using __splitstack_releasecontext didn't actually seem to cause anything to get deallocated (watching with strace: mmap, no munmap).

http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg20517.html
http://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg21898.html

After staring at it some in gdb, I figured out why: a pointer is being passed as if it were a pointer to a pointer rather than a direct pointer, obscured and not found by the compiler because it is being cast and marshaled through the void *[10] array that is used to store split stack contexts (which, btw, might be better represented internally as a struct, to avoid issues like this ;P).

    223 __thread struct stack_segment *__morestack_segments
    224   __attribute__ ((visibility ("default")));
    ... ...
    992 __splitstack_getcontext (void *context[NUMBER_OFFSETS])
    993 {
    994   memset (context, 0, NUMBER_OFFSETS * sizeof (void *));
    995   context[MORESTACK_SEGMENTS] = (void *) __morestack_segments;
    ... ...
   1105 __splitstack_releasecontext (void *context[10])
   1106 {
   1107   __morestack_release_segments (context[MORESTACK_SEGMENTS], 1);
    ... ...
    441 struct dynamic_allocation_blocks *
    442 __morestack_release_segments (struct stack_segment **pp, int free_dynamic)

As demonstrated by these snippets, __morestack_segments is a pointer to a stack_segment; it is being stored in the context as a void *, but is being removed from the context and being passed directly to __morestack_release_segments, which in turn expects a pointer to a pointer to a stack_segment, not just a bare pointer to a stack segment. Probably quite simple to fix (although might be more complex than just "add a &").


While I am sending an e-mail regarding -fsplit-stack, though, I figured I would also mention some design issues I've noticed while using it. Some of these may just be "me being stupid" (as I've only been looking at this in depth over the last few days), but I at least have had this idea "on the back burner" for a long time now, and am actually integrating and consuming the APIs that are resulting. Feel free to ignore me.


1) The current implementation (maybe this is intended to change?) uses mmap() to allocate stack segments, which means that every allocation involves a system call, a lock in the kernel on a slow data structure (anon_vma), and has some non-zero probability of ending up with a separate VMA (which is not only slow, but in my understanding uses up a limited resource: you can only have 64k VMAs per process).

Is it possible to instead expose the functionality for allocating stack segments out of libgcc for easy replacement by coroutine runtimes? I would really love to be able to use my existing memory manager to allocate the stack segments. I realize that this allocation routine would need to be able to operate with almost no stack: that isn't a problem (I can always pivot to another stack if I need any stack).


2) I had seen a discussion on the mailing list regarding argument copying, and I must say I'm somewhat confused as to why it is sufficient to memcpy the arguments from the old stack to the new one: if I have an argument with a non-POD type that has a non-trivial copy constructor, it would seem like I need a copy operation to be able to use the object from the new stack (maybe, for example, it has an internal pointer).


3) If I have either blocked signals on my thread or have setup an alternate signal stack with sigaltstack, I can get away with super-tiny stacks. However, allocate_segment has a minimum stack size of MINSIGSTKSZ (I presume to allow for signals), which on some systems I use (such as Mac OS X) I've seen be set as high as 32kB. (Meanwhile, MINSIGSTKSZ on Linux is smaller than a page, so mmap() can't even allocate it.)


4) 10 64-bit words for the splitstack context is a really large amount of space. :( I don't even have that much CPU-state (there are only 8 registers that really need to be saved when switching between coroutines). Considering the stack segments form a doubly-linked-list, it would seem like MORESTACK_SEGMENTS and CURRENT_SEGMENT are redundant. I also feel like CURRENT_STACK could be worked around fairly well.


5) As currently implemented, the stack space check is added to every single function. However, some functions do not actually use the stack (or might even be avoiding memory accesses entirely). When I look at the disassembly of my project, I see many references to __morestack and "cmp    %fs:0x70,%rsp" in functions that would otherwise be just a few instructions long. Functions that don't use stack should avoid the check.


6) I have noticed that the purpose of having split stacks seems largely hobbled by the way the linker enforces humungous stacks on outgoing calls to non-split-stack code, even if that code isn't called. As an incredibly painful example: __splitstack_getcontext is not compiled with split-stack support, which means that the function I have to switch coroutines (called from every coroutine) allocates stack.

To explain what I mean by "even if that code isn't called": my code hardly ever throws exception, but because I support them I end up with _Unwind_resume in most of my functions; I thereby get burned with giant stacks. It would seem more ideal (although I see how this would be much more difficult) if there were some way to only allocate the larger stack as the call is made to the non-split-stack function, not when entering the split-stack one.

A lot of these problems would be solved if libgcc (and whatever friends, such as libsupc++) were themselves compiled with -fsplit-stack. Of course, I can't imagine that anyone would want to pay the performance penalty for that globally ;P. So, is there some plan to either do that for the entire build, or to provide alternative versions of those libraries that can be linked to while using -fsplit-stack in your own code?

That said, I don't think that that entirely does away with this "uncalled function drags in stack requirements" problem, as I want to say the core issue comes down to how this interacts with the inliner. In many of these cases, the call to the non-split-stack function is in some leaf function of a giant call graph that was flattened to a single massive function during the optimization pass.

The result is that if you ever interact with non-split-stack code anywhere, you really need to be quite explicit about __noinline__ to keep it from tainting the stack requirements of other functions. Part of me feels like there must be a better way of handling the stack expansions (such as by putting it at the call-site in situations like this), although I realize that might be difficult with the linker in charge of it.

A specific idea that might help, however, is to set things up so that the PLT actually handles the stack increases when you are linking to functions that are in a dynamic library. That way, calls to open (for example) would not cause the function that called it to suddenly require a large stack, but instead only as control is transferred to open would the stack size increase. (This might be quite complex, though.)


7) Using the linker to handle the transition between split-stack and non-split-stack code seems like a good way to solve the problem of "we need large stacks when hitting external code", but in staring at the resulting code I have in my project I'm seeing that it isn't reliable: if you have a pointer to a function the linker will not know what you are calling. In my case, this is coming up often due to using std::function.

More awkwardly, split-stack functions that mention (but do not call) non-split-stack functions (such as to return their address) are being mis-flagged by the linker. Honestly, I question whether the linker fundamentally has enough information about what is going on to be able to make sufficiently accurate decisions with regards to stack constraints to warrant the painful abstraction breakage that split-stack uses. :(

That said, I don't have a better solution to suggest right now (I really want to say that having attributes available to declare split-stack functionality in the code would be better, but that has other ramifications), but I do have concerns that due to attempts to keep the ABI fixed decisions made now (when there seem to only be a single major user, Go) will lock in how the mechanism is capable of functioning in the future.


Well, if you did read any of that, thanks for taking the time to do so. I really appreciate this feature you've been working on, and have been excited by it for a while now. When I ran into the aforementioned bug in the splitstack context implementation, I figured I'd send an e-mail that explained it, and I hope that I then didn't waste too much of peoples' time with my other split-stack related musings. ;P

If some of these things are in the "yeah, changing that would be interesting, we are just don't have many people working on the feature", I'd be happy to throw some patches towards it. I hesitate to just start sending patches over the wall, however, without first doing some kind of verification that I have any clue what I'm doing; I certainly am not certain how things would be prioritized, or even really who is working on it. ;P

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-03-04 20:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <74487365.8071330719690349.JavaMail.root@frigate>
2012-03-03  7:11 ` random commentary on -fsplit-stack (and a bug report) Jay Freeman (saurik)
2012-03-04 20:11   ` Ian Lance Taylor
2012-02-28  8:07 Jay Freeman (saurik)
2012-02-28 20:32 ` Ian Lance Taylor
2012-02-28 22:47   ` Jay Freeman (saurik)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).