* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
@ 2015-02-27 22:13 Manuel López-Ibáñez
2015-03-03 8:41 ` Chris Johns
0 siblings, 1 reply; 29+ messages in thread
From: Manuel López-Ibáñez @ 2015-02-27 22:13 UTC (permalink / raw)
To: gcc Mailing List, Joel Sherrill, Sandra Loosemore, Jakub Jelinek,
Chris Johns, David Malcolm, Jeff Prothero
On 02/19/15 14:56, Chris Johns wrote:
>
> My main concern is not knowing the trap has been added to the code. If I
> could build an application and audit it somehow then I can manage it. We
> have a similar issue with the possible use of FP registers being used in
> general code (ISR save/restore trade off).
>
> Can the ELF be annotated in some GCC specific way that makes it to the
> final executable to flag this is happening ? We can then create tools to
> audit the executables.
Simply ignore me if I'm misunderstanding the issue: Couldn't GCC
generate, instead of a trap, a call to a noinline noreturn cold weak
function __gcc_is_a_trap that by default calls the trap? Then, audit
tools can inspect the code and see if such a function call appears and
even override it with something else.
Chris, wouldn't that be enough for your purposes?
Cheers,
Manuel.
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-27 22:13 Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference Manuel López-Ibáñez @ 2015-03-03 8:41 ` Chris Johns 0 siblings, 0 replies; 29+ messages in thread From: Chris Johns @ 2015-03-03 8:41 UTC (permalink / raw) To: Manuel López-Ibáñez, gcc Mailing List, Joel Sherrill, Sandra Loosemore, Jakub Jelinek, David Malcolm, Jeff Prothero On 28/02/2015 9:12 am, Manuel López-Ibáñez wrote: > On 02/19/15 14:56, Chris Johns wrote: >> >> My main concern is not knowing the trap has been added to the code. If I >> could build an application and audit it somehow then I can manage it. We >> have a similar issue with the possible use of FP registers being used in >> general code (ISR save/restore trade off). >> >> Can the ELF be annotated in some GCC specific way that makes it to the >> final executable to flag this is happening ? We can then create tools to >> audit the executables. > > Simply ignore me if I'm misunderstanding the issue: Couldn't GCC > generate, instead of a trap, a call to a noinline noreturn cold weak > function __gcc_is_a_trap that by default calls the trap? Then, audit > tools can inspect the code and see if such a function call appears and > even override it with something else. Yes it could and this is a nice idea. > > Chris, wouldn't that be enough for your purposes? > I think it does because we can scan an executable for the call locations and audit them. Chris ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference
@ 2015-02-20 1:04 Jeff Prothero
0 siblings, 0 replies; 29+ messages in thread
From: Jeff Prothero @ 2015-02-20 1:04 UTC (permalink / raw)
To: gcc
(Thanks to everyone for the helpful feedback!)
Daniel Gutson wrote:
> what about then two warnings (disabled by default), one intended to
> tell the user each time the compiler removes a conditional
> (-fdelete-null-pointer-checks)
> and another intended to tell the user each time the compiler adds a
> trap due to dereference an address 0?
>
> E.g.
> -Wnull-pointer-check-deleted
> -Wnull-dereference-considered-erroneous
I very much like the idea of such warnings.
I'm not clear why one would not warn by default when detecting
non-standards-conformant code and producing code guaranteed not
to do what the programmer intended. But presumably most sane
engineers these days compile with -Wall. :-)
-Jeff
^ permalink raw reply [flat|nested] 29+ messages in thread
* Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference @ 2015-02-18 19:24 Jeff Prothero 2015-02-18 19:29 ` Jakub Jelinek ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Jeff Prothero @ 2015-02-18 19:24 UTC (permalink / raw) To: gcc Starting with gcc 4.9, -O2 implicitly invokes -fisolate-erroneous-paths-dereference: which https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html documents as Detect paths that trigger erroneous or undefined behavior due to dereferencing a null pointer. Isolate those paths from the main control flow and turn the statement with erroneous or undefined behavior into a trap. This flag is enabled by default at -O2 and higher. This results in a sizable number of previously working embedded programs mysteriously crashing when recompiled under gcc 4.9. The problem is that embedded programs will often have ram starting at address zero (think hardware-defined interrupt vectors, say) which gets initialized by code which the -fisolate-erroneous-paths-deference logic can recognize as reading and/or writing address zero. What happens then is that the previously running program compiles without any warnings, but then typically locks up mysteriously (often disabling the remote debug link) due to the trap not being gracefully handled by the embedded runtime. Granted, such code is out-of-spec wrt to C standards. None the less, the problem is quite painful to track down and unexpected. Is there any good reason the -fisolate-erroneous-paths-dereference logic could not issue a compiletime warning or error, instead of just silently generating code virtually certain to crash at runtime? Such a warning/error would save a lot of engineers significant amounts of time, energy and frustration tracking down this problem. I would like to think that the spirit of gcc is about helping engineers efficiently correct nonstandard pain, rather than inflicting maximal pain upon engineers violating C standards. :-) -Jeff BTW, I'd also be curious to know what is regarded as engineering best practice for writing a value to address zero when this is architecturally required by the hardware platform at hand. Obviously one can do various things to obscure the process sufficiently that the current gcc implementation won't detect it and complain, but as gcc gets smarter about optimization those are at risk of failing in a future release. It would be nice to have a guaranteed-to-work future-proof idiom for doing this. Do we have one, short of retreating to assembly code? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-18 19:24 Jeff Prothero @ 2015-02-18 19:29 ` Jakub Jelinek 2015-02-19 20:56 ` Sandra Loosemore 2015-02-18 19:30 ` Andrew Pinski 2015-02-20 9:30 ` Andrew Haley 2 siblings, 1 reply; 29+ messages in thread From: Jakub Jelinek @ 2015-02-18 19:29 UTC (permalink / raw) To: Jeff Prothero; +Cc: gcc On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: > Starting with gcc 4.9, -O2 implicitly invokes > > -fisolate-erroneous-paths-dereference: > > which > > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > documents as > > Detect paths that trigger erroneous or undefined behavior due to > dereferencing a null pointer. Isolate those paths from the main control > flow and turn the statement with erroneous or undefined behavior into a > trap. This flag is enabled by default at -O2 and higher. > > This results in a sizable number of previously working embedded programs mysteriously > crashing when recompiled under gcc 4.9. The problem is that embedded > programs will often have ram starting at address zero (think hardware-defined > interrupt vectors, say) which gets initialized by code which the > -fisolate-erroneous-paths-deference logic can recognize as reading and/or > writing address zero. If you have some pages mapped at address 0, you really should compile your code with -fno-delete-null-pointer-checks, otherwise you can run into tons of other issues. Also, there is -fsanitize=undefined that allows discovery of such invalid calls at runtime, though admittedly it isn't supported for all targets. Jakub ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-18 19:29 ` Jakub Jelinek @ 2015-02-19 20:56 ` Sandra Loosemore 2015-02-19 21:16 ` Daniel Gutson ` (2 more replies) 0 siblings, 3 replies; 29+ messages in thread From: Sandra Loosemore @ 2015-02-19 20:56 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Jeff Prothero, gcc Jakub Jelinek wrote: > > On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: >> Starting with gcc 4.9, -O2 implicitly invokes >> >> -fisolate-erroneous-paths-dereference: >> >> which >> >> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html >> >> documents as >> >> Detect paths that trigger erroneous or undefined behavior due to >> dereferencing a null pointer. Isolate those paths from the main control >> flow and turn the statement with erroneous or undefined behavior into a >> trap. This flag is enabled by default at -O2 and higher. >> >> This results in a sizable number of previously working embedded programs mysteriously >> crashing when recompiled under gcc 4.9. The problem is that embedded >> programs will often have ram starting at address zero (think hardware-defined >> interrupt vectors, say) which gets initialized by code which the >> -fisolate-erroneous-paths-deference logic can recognize as reading and/or >> writing address zero. > > If you have some pages mapped at address 0, you really should compile your > code with -fno-delete-null-pointer-checks, otherwise you can run into tons > of other issues. Hmmmm, Passing the additional option in user code would be one thing, but what about library code? E.g., using memcpy (either explicitly or implicitly for a structure copy)? It looks to me like cr16 and avr are currently the only architectures that disable flag_delete_null_pointer_checks entirely, but I am sure that this issue affects other embedded targets besides nios2, too. E.g. scanning Mentor's ARM board support library, I see a whole pile of devices that have memory mapped at address zero (TI Stellaris/Tiva, Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ....). Plus our simulator BSPs assume a flat address space starting at address 0. I can see both sides of the issue here.... On the one hand, you get better code for 99.99% of situations by enabling -fdelete-null-pointer-checks, but if it makes GCC unusable in the other .01% case, that is a problem for the users for whom that case is critical. :-S So I think Jeff's request here is something that deserves an answer: >> BTW, I'd also be curious to know what is regarded as engineering best >> practice for writing a value to address zero when this is architecturally >> required by the hardware platform at hand. Obviously one can do various >> things to obscure the process sufficiently that the current gcc implementation >> won't detect it and complain, but as gcc gets smarter about optimization >> those are at risk of failing in a future release. It would be nice to have >> a guaranteed-to-work future-proof idiom for doing this. Do we have one, short >> of retreating to assembly code? -Sandra ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 20:56 ` Sandra Loosemore @ 2015-02-19 21:16 ` Daniel Gutson 2015-02-19 22:10 ` Jakub Jelinek 2015-02-19 21:23 ` Joel Sherrill 2015-02-20 11:06 ` Florian Weimer 2 siblings, 1 reply; 29+ messages in thread From: Daniel Gutson @ 2015-02-19 21:16 UTC (permalink / raw) To: Sandra Loosemore; +Cc: Jakub Jelinek, Jeff Prothero, gcc Mailing List (Hi Sandra, so long!) On Thu, Feb 19, 2015 at 5:56 PM, Sandra Loosemore <sandra@codesourcery.com> wrote: > Jakub Jelinek wrote: >> >> >> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: >>> >>> Starting with gcc 4.9, -O2 implicitly invokes >>> >>> -fisolate-erroneous-paths-dereference: >>> >>> which >>> >>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html >>> >>> documents as >>> >>> Detect paths that trigger erroneous or undefined behavior due to >>> dereferencing a null pointer. Isolate those paths from the main >>> control >>> flow and turn the statement with erroneous or undefined behavior into >>> a >>> trap. This flag is enabled by default at -O2 and higher. >>> >>> This results in a sizable number of previously working embedded programs >>> mysteriously >>> crashing when recompiled under gcc 4.9. The problem is that embedded >>> programs will often have ram starting at address zero (think >>> hardware-defined >>> interrupt vectors, say) which gets initialized by code which the >>> -fisolate-erroneous-paths-deference logic can recognize as reading and/or >>> writing address zero. >> >> >> If you have some pages mapped at address 0, you really should compile your >> code with -fno-delete-null-pointer-checks, otherwise you can run into tons >> of other issues. > > > Hmmmm, Passing the additional option in user code would be one thing, but > what about library code? E.g., using memcpy (either explicitly or > implicitly for a structure copy)? > > It looks to me like cr16 and avr are currently the only architectures that > disable flag_delete_null_pointer_checks entirely, but I am sure that this > issue affects other embedded targets besides nios2, too. E.g. scanning > Mentor's ARM board support library, I see a whole pile of devices that have > memory mapped at address zero (TI Stellaris/Tiva, Energy Micro EFM32Gxxx, > Atmel AT91SAMxxx, ....). Plus our simulator BSPs assume a flat address > space starting at address 0. > > I can see both sides of the issue here.... On the one hand, you get better > code for 99.99% of situations by enabling -fdelete-null-pointer-checks, but > if it makes GCC unusable in the other .01% case, that is a problem for the > users for whom that case is critical. :-S So I think Jeff's request here > is something that deserves an answer: what about then two warnings (disabled by default), one intended to tell the user each time the compiler removes a conditional (-fdelete-null-pointer-checks) and another intended to tell the user each time the compiler adds a trap due to dereference an address 0? E.g. -Wnull-pointer-check-deleted -Wnull-dereference-considered-erroneous or alike ? Daniel. > > >>> BTW, I'd also be curious to know what is regarded as engineering best >>> practice for writing a value to address zero when this is architecturally >>> required by the hardware platform at hand. Obviously one can do various >>> things to obscure the process sufficiently that the current gcc >>> implementation >>> won't detect it and complain, but as gcc gets smarter about optimization >>> those are at risk of failing in a future release. It would be nice to >>> have >>> a guaranteed-to-work future-proof idiom for doing this. Do we have one, >>> short >>> of retreating to assembly code? > > > -Sandra > -- Daniel F. Gutson Chief Technology Officer, SPD San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54 351 4217888 / +54 351 4218211 Skype: dgutson LinkedIn: http://ar.linkedin.com/in/danielgutson ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 21:16 ` Daniel Gutson @ 2015-02-19 22:10 ` Jakub Jelinek 2015-02-19 22:26 ` Sandra Loosemore 0 siblings, 1 reply; 29+ messages in thread From: Jakub Jelinek @ 2015-02-19 22:10 UTC (permalink / raw) To: Daniel Gutson; +Cc: Sandra Loosemore, Jeff Prothero, gcc Mailing List On Thu, Feb 19, 2015 at 06:16:05PM -0300, Daniel Gutson wrote: > what about then two warnings (disabled by default), one intended to > tell the user each time the compiler removes a conditional > (-fdelete-null-pointer-checks) > and another intended to tell the user each time the compiler adds a trap due to > dereference an address 0? > > E.g. > -Wnull-pointer-check-deleted > -Wnull-dereference-considered-erroneous > > or alike That would be extremely difficult. The -fdelete-null-pointer-checks option is used in many places, like the path isolation, value range propagation, alias oracle, number of iteration analysis etc. E.g. in case of value range propagation, it is really hard to warn if something has been optimized some way because of it, because you really don't know the reason why after all the propagation some SSA_NAME got certain range, to warn you'd essentially have to do all of VRP twice, once with -fdelete-null-pointer-checks and once without, and then compare that when actually performing optimizations. But some optimizations are also done far later than directly in the VRP pass. If you have hw where NULL is mapped and you know your code violates the C/C++ standards by placing objects at that address, simply do use the option that is designed for that purpose. Jakub ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 22:10 ` Jakub Jelinek @ 2015-02-19 22:26 ` Sandra Loosemore 0 siblings, 0 replies; 29+ messages in thread From: Sandra Loosemore @ 2015-02-19 22:26 UTC (permalink / raw) To: Jakub Jelinek; +Cc: Daniel Gutson, Jeff Prothero, gcc Mailing List [-- Attachment #1: Type: text/plain, Size: 929 bytes --] On 02/19/2015 03:09 PM, Jakub Jelinek wrote: > > If you have hw where NULL is mapped and you know your code violates the > C/C++ standards by placing objects at that address, simply do use > the option that is designed for that purpose. As I pointed out earlier, though, that won't help you if your program (perhaps implicitly) uses library code that's been built without the magic option. I thought that declaring an object explicitly placed at address 0 as "volatile" ought to be sufficient to prevent writes to it from being optimized away and replaced with traps, but alas, that is not the case. Also, a structure copy to such a volatile object is implicitly invoking the regular memcpy function without complaining about casting away volatile-ness, as an explicit call would do. I've attached a couple quick test cases and .s output from arm-none-eabi-gcc -O2, using a mainline build from last night. -Sandra [-- Attachment #2: test.c --] [-- Type: text/x-csrc, Size: 261 bytes --] static int * const x0 = (int *) 0x00000000; static volatile int * const xv = (int *) 0x00000000; static int * const xn = (int *) 0x10000000; void f0 (int *data) { *x0 = *data; } void fv (int *data) { *xv = *data; } void fn (int *data) { *xn = *data; } [-- Attachment #3: test.s --] [-- Type: text/plain, Size: 1193 bytes --] .cpu arm7tdmi .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 1 .eabi_attribute 30, 2 .eabi_attribute 34, 0 .eabi_attribute 18, 4 .arm .syntax divided .file "test.c" .text .align 2 .global f0 .type f0, %function f0: @ Function supports interworking. @ Volatile: function does not return. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r3, #0 str r3, [r3] .inst 0xe7f000f0 .size f0, .-f0 .align 2 .global fv .type fv, %function fv: @ Function supports interworking. @ Volatile: function does not return. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r3, #0 str r3, [r3] .inst 0xe7f000f0 .size fv, .-fv .align 2 .global fn .type fn, %function fn: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. mov r3, #268435456 ldr r2, [r0] str r2, [r3] bx lr .size fn, .-fn .ident "GCC: (GNU) 5.0.0 20150219 (experimental)" [-- Attachment #4: test2.c --] [-- Type: text/x-csrc, Size: 370 bytes --] struct big { int a, b; char c [100]; }; static struct big * const x0 = (struct big *) 0x00000000; static volatile struct big * const xv = (struct big *) 0x00000000; static struct big * const xn = (struct big *) 0x10000000; void f0 (struct big *data) { *x0 = *data; } void fv (struct big *data) { *xv = *data; } void fn (struct big *data) { *xn = *data; } [-- Attachment #5: test2.s --] [-- Type: text/plain, Size: 1231 bytes --] .cpu arm7tdmi .fpu softvfp .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 1 .eabi_attribute 30, 2 .eabi_attribute 34, 0 .eabi_attribute 18, 4 .arm .syntax divided .file "test2.c" .text .align 2 .global f0 .type f0, %function f0: @ Function supports interworking. @ Volatile: function does not return. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 mov r1, r0 stmfd sp!, {r4, lr} mov r2, #108 mov r0, #0 bl memcpy .inst 0xe7f000f0 .size f0, .-f0 .align 2 .global fv .type fv, %function fv: @ Function supports interworking. @ Volatile: function does not return. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 mov r1, r0 stmfd sp!, {r4, lr} mov r2, #108 mov r0, #0 bl memcpy .inst 0xe7f000f0 .size fv, .-fv .align 2 .global fn .type fn, %function fn: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 stmfd sp!, {r4, lr} mov r1, r0 mov r2, #108 mov r0, #268435456 bl memcpy ldmfd sp!, {r4, lr} bx lr .size fn, .-fn .ident "GCC: (GNU) 5.0.0 20150219 (experimental)" ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 20:56 ` Sandra Loosemore 2015-02-19 21:16 ` Daniel Gutson @ 2015-02-19 21:23 ` Joel Sherrill 2015-02-19 21:56 ` Chris Johns 2015-02-20 11:06 ` Florian Weimer 2 siblings, 1 reply; 29+ messages in thread From: Joel Sherrill @ 2015-02-19 21:23 UTC (permalink / raw) To: Sandra Loosemore, Jakub Jelinek; +Cc: Jeff Prothero, gcc On 2/19/2015 2:56 PM, Sandra Loosemore wrote: > Jakub Jelinek wrote: >> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: >>> Starting with gcc 4.9, -O2 implicitly invokes >>> >>> -fisolate-erroneous-paths-dereference: >>> >>> which >>> >>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html >>> >>> documents as >>> >>> Detect paths that trigger erroneous or undefined behavior due to >>> dereferencing a null pointer. Isolate those paths from the main control >>> flow and turn the statement with erroneous or undefined behavior into a >>> trap. This flag is enabled by default at -O2 and higher. >>> >>> This results in a sizable number of previously working embedded programs mysteriously >>> crashing when recompiled under gcc 4.9. The problem is that embedded >>> programs will often have ram starting at address zero (think hardware-defined >>> interrupt vectors, say) which gets initialized by code which the >>> -fisolate-erroneous-paths-deference logic can recognize as reading and/or >>> writing address zero. >> If you have some pages mapped at address 0, you really should compile your >> code with -fno-delete-null-pointer-checks, otherwise you can run into tons >> of other issues. > Hmmmm, Passing the additional option in user code would be one thing, > but what about library code? E.g., using memcpy (either explicitly or > implicitly for a structure copy)? > > It looks to me like cr16 and avr are currently the only architectures > that disable flag_delete_null_pointer_checks entirely, but I am sure > that this issue affects other embedded targets besides nios2, too. E.g. > scanning Mentor's ARM board support library, I see a whole pile of > devices that have memory mapped at address zero (TI Stellaris/Tiva, > Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ....). Plus our simulator > BSPs assume a flat address space starting at address 0. I forwarded this to the RTEMS list and was promptly pointed to a patch on a Coldfire BSP where someone worked around this behavior. We are discussing how to deal with this. It is likely OK in user code but horrible in BSP and driver code. We don't have a solution ourselves. We just recognize it impacts a number of targets. > I can see both sides of the issue here.... On the one hand, you get > better code for 99.99% of situations by enabling > -fdelete-null-pointer-checks, but if it makes GCC unusable in the other > .01% case, that is a problem for the users for whom that case is > critical. :-S So I think Jeff's request here is something that > deserves an answer: Which side is benefiting? The embedded side or self-hosted side? What is the actual benefit? Can the option be disabled/enabled on a per target basis? I honestly could see disabling it for *-rtems* and on purely embedded architectures. >>> BTW, I'd also be curious to know what is regarded as engineering best >>> practice for writing a value to address zero when this is architecturally >>> required by the hardware platform at hand. Obviously one can do various >>> things to obscure the process sufficiently that the current gcc implementation >>> won't detect it and complain, but as gcc gets smarter about optimization >>> those are at risk of failing in a future release. It would be nice to have >>> a guaranteed-to-work future-proof idiom for doing this. Do we have one, short >>> of retreating to assembly code? Or GCC specific code. I considered a special memcpy clone for our BSPs which was not inlined and used a pragma to turn this optimization off. Then it would just be a matter of using it in the right places. But knowing all the right places is a hard problem by itself. :( > -Sandra > -- Joel Sherrill, Ph.D. Director of Research & Development joel.sherrill@OARcorp.com On-Line Applications Research Ask me about RTEMS: a free RTOS Huntsville AL 35805 Support Available (256) 722-9985 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 21:23 ` Joel Sherrill @ 2015-02-19 21:56 ` Chris Johns 2015-02-20 17:30 ` Jeff Law 0 siblings, 1 reply; 29+ messages in thread From: Chris Johns @ 2015-02-19 21:56 UTC (permalink / raw) To: Joel Sherrill, Sandra Loosemore, Jakub Jelinek; +Cc: Jeff Prothero, gcc On 20/02/2015 8:23 am, Joel Sherrill wrote: > > On 2/19/2015 2:56 PM, Sandra Loosemore wrote: >> Jakub Jelinek wrote: >>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: >>>> Starting with gcc 4.9, -O2 implicitly invokes >>>> >>>> -fisolate-erroneous-paths-dereference: >>>> >>>> which >>>> >>>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html >>>> >>>> documents as >>>> >>>> Detect paths that trigger erroneous or undefined behavior due to >>>> dereferencing a null pointer. Isolate those paths from the main control >>>> flow and turn the statement with erroneous or undefined behavior into a >>>> trap. This flag is enabled by default at -O2 and higher. >>>> >>>> This results in a sizable number of previously working embedded programs mysteriously >>>> crashing when recompiled under gcc 4.9. The problem is that embedded >>>> programs will often have ram starting at address zero (think hardware-defined >>>> interrupt vectors, say) which gets initialized by code which the >>>> -fisolate-erroneous-paths-deference logic can recognize as reading and/or >>>> writing address zero. >>> If you have some pages mapped at address 0, you really should compile your >>> code with -fno-delete-null-pointer-checks, otherwise you can run into tons >>> of other issues. >> Hmmmm, Passing the additional option in user code would be one thing, >> but what about library code? E.g., using memcpy (either explicitly or >> implicitly for a structure copy)? >> >> It looks to me like cr16 and avr are currently the only architectures >> that disable flag_delete_null_pointer_checks entirely, but I am sure >> that this issue affects other embedded targets besides nios2, too. E.g. >> scanning Mentor's ARM board support library, I see a whole pile of >> devices that have memory mapped at address zero (TI Stellaris/Tiva, >> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ....). Plus our simulator >> BSPs assume a flat address space starting at address 0. > I forwarded this to the RTEMS list and was promptly pointed to a patch > on a Coldfire BSP where someone worked around this behavior. > > We are discussing how to deal with this. It is likely OK in user code but > horrible in BSP and driver code. We don't have a solution ourselves. We > just recognize it impacts a number of targets. > My main concern is not knowing the trap has been added to the code. If I could build an application and audit it somehow then I can manage it. We have a similar issue with the possible use of FP registers being used in general code (ISR save/restore trade off). Can the ELF be annotated in some GCC specific way that makes it to the final executable to flag this is happening ? We can then create tools to audit the executables. Chris ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 21:56 ` Chris Johns @ 2015-02-20 17:30 ` Jeff Law 2015-02-26 16:23 ` David Malcolm 0 siblings, 1 reply; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:30 UTC (permalink / raw) To: Chris Johns, Joel Sherrill, Sandra Loosemore, Jakub Jelinek Cc: Jeff Prothero, gcc On 02/19/15 14:56, Chris Johns wrote: > On 20/02/2015 8:23 am, Joel Sherrill wrote: >> >> On 2/19/2015 2:56 PM, Sandra Loosemore wrote: >>> Jakub Jelinek wrote: >>>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: >>>>> Starting with gcc 4.9, -O2 implicitly invokes >>>>> >>>>> -fisolate-erroneous-paths-dereference: >>>>> >>>>> which >>>>> >>>>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html >>>>> >>>>> documents as >>>>> >>>>> Detect paths that trigger erroneous or undefined behavior due to >>>>> dereferencing a null pointer. Isolate those paths from the >>>>> main control >>>>> flow and turn the statement with erroneous or undefined >>>>> behavior into a >>>>> trap. This flag is enabled by default at -O2 and higher. >>>>> >>>>> This results in a sizable number of previously working embedded >>>>> programs mysteriously >>>>> crashing when recompiled under gcc 4.9. The problem is that embedded >>>>> programs will often have ram starting at address zero (think >>>>> hardware-defined >>>>> interrupt vectors, say) which gets initialized by code which the >>>>> -fisolate-erroneous-paths-deference logic can recognize as reading >>>>> and/or >>>>> writing address zero. >>>> If you have some pages mapped at address 0, you really should >>>> compile your >>>> code with -fno-delete-null-pointer-checks, otherwise you can run >>>> into tons >>>> of other issues. >>> Hmmmm, Passing the additional option in user code would be one thing, >>> but what about library code? E.g., using memcpy (either explicitly or >>> implicitly for a structure copy)? >>> >>> It looks to me like cr16 and avr are currently the only architectures >>> that disable flag_delete_null_pointer_checks entirely, but I am sure >>> that this issue affects other embedded targets besides nios2, too. E.g. >>> scanning Mentor's ARM board support library, I see a whole pile of >>> devices that have memory mapped at address zero (TI Stellaris/Tiva, >>> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ....). Plus our simulator >>> BSPs assume a flat address space starting at address 0. >> I forwarded this to the RTEMS list and was promptly pointed to a patch >> on a Coldfire BSP where someone worked around this behavior. >> >> We are discussing how to deal with this. It is likely OK in user code but >> horrible in BSP and driver code. We don't have a solution ourselves. We >> just recognize it impacts a number of targets. >> > > My main concern is not knowing the trap has been added to the code. If I > could build an application and audit it somehow then I can manage it. We > have a similar issue with the possible use of FP registers being used in > general code (ISR save/restore trade off). > > Can the ELF be annotated in some GCC specific way that makes it to the > final executable to flag this is happening ? We can then create tools to > audit the executables. Not really, for a variety of reasons. However, the compiler can do better for warning about some of these kinds of things -- but we certainly can't guarantee we catch all of them as there are cases where the point where we determine a property (such as non-nullness) may be very different from the point where we exploit that property. I did propose some patches to improve the warnings back in the 4.9 time frame, but they never got reviewed. See BZ 16351. We'll have to revisit them during the next open development period. Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:30 ` Jeff Law @ 2015-02-26 16:23 ` David Malcolm 0 siblings, 0 replies; 29+ messages in thread From: David Malcolm @ 2015-02-26 16:23 UTC (permalink / raw) To: Jeff Law Cc: Chris Johns, Joel Sherrill, Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On Fri, 2015-02-20 at 10:29 -0700, Jeff Law wrote: > On 02/19/15 14:56, Chris Johns wrote: > > On 20/02/2015 8:23 am, Joel Sherrill wrote: > >> > >> On 2/19/2015 2:56 PM, Sandra Loosemore wrote: > >>> Jakub Jelinek wrote: > >>>> On Wed, Feb 18, 2015 at 11:21:56AM -0800, Jeff Prothero wrote: > >>>>> Starting with gcc 4.9, -O2 implicitly invokes > >>>>> > >>>>> -fisolate-erroneous-paths-dereference: > >>>>> > >>>>> which > >>>>> > >>>>> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > >>>>> > >>>>> documents as > >>>>> > >>>>> Detect paths that trigger erroneous or undefined behavior due to > >>>>> dereferencing a null pointer. Isolate those paths from the > >>>>> main control > >>>>> flow and turn the statement with erroneous or undefined > >>>>> behavior into a > >>>>> trap. This flag is enabled by default at -O2 and higher. > >>>>> > >>>>> This results in a sizable number of previously working embedded > >>>>> programs mysteriously > >>>>> crashing when recompiled under gcc 4.9. The problem is that embedded > >>>>> programs will often have ram starting at address zero (think > >>>>> hardware-defined > >>>>> interrupt vectors, say) which gets initialized by code which the > >>>>> -fisolate-erroneous-paths-deference logic can recognize as reading > >>>>> and/or > >>>>> writing address zero. > >>>> If you have some pages mapped at address 0, you really should > >>>> compile your > >>>> code with -fno-delete-null-pointer-checks, otherwise you can run > >>>> into tons > >>>> of other issues. > >>> Hmmmm, Passing the additional option in user code would be one thing, > >>> but what about library code? E.g., using memcpy (either explicitly or > >>> implicitly for a structure copy)? > >>> > >>> It looks to me like cr16 and avr are currently the only architectures > >>> that disable flag_delete_null_pointer_checks entirely, but I am sure > >>> that this issue affects other embedded targets besides nios2, too. E.g. > >>> scanning Mentor's ARM board support library, I see a whole pile of > >>> devices that have memory mapped at address zero (TI Stellaris/Tiva, > >>> Energy Micro EFM32Gxxx, Atmel AT91SAMxxx, ....). Plus our simulator > >>> BSPs assume a flat address space starting at address 0. > >> I forwarded this to the RTEMS list and was promptly pointed to a patch > >> on a Coldfire BSP where someone worked around this behavior. > >> > >> We are discussing how to deal with this. It is likely OK in user code but > >> horrible in BSP and driver code. We don't have a solution ourselves. We > >> just recognize it impacts a number of targets. > >> > > > > My main concern is not knowing the trap has been added to the code. If I > > could build an application and audit it somehow then I can manage it. We > > have a similar issue with the possible use of FP registers being used in > > general code (ISR save/restore trade off). > > > > Can the ELF be annotated in some GCC specific way that makes it to the > > final executable to flag this is happening ? We can then create tools to > > audit the executables. > Not really, for a variety of reasons. Is information on this reaching the pass-specific dumpfile? I don't see any explicit dumping in gimple-ssa-isolate-paths.c, but I guess that insert_trap_and_remove_trailing_statements could log itself to the dumpfile, or use the statistics framework (which itself also reaches a dumpfile). Assuming the info is reaching a dumpfile, could gcc have an option to write its dumpfiles into a special ELF section in the .s, rather than to disk? Then (given a suitable new option to e.g. eu-readelf) you'd be able to read the dumpfiles from a .o file, and (handwaving about linkage) from an execuable or library. Not that I'm volunteering... > However, the compiler can do > better for warning about some of these kinds of things -- but we > certainly can't guarantee we catch all of them as there are cases where > the point where we determine a property (such as non-nullness) may be > very different from the point where we exploit that property. > > I did propose some patches to improve the warnings back in the 4.9 time > frame, but they never got reviewed. See BZ 16351. We'll have to > revisit them during the next open development period. > > Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-19 20:56 ` Sandra Loosemore 2015-02-19 21:16 ` Daniel Gutson 2015-02-19 21:23 ` Joel Sherrill @ 2015-02-20 11:06 ` Florian Weimer 2015-02-20 11:43 ` Jonathan Wakely ` (3 more replies) 2 siblings, 4 replies; 29+ messages in thread From: Florian Weimer @ 2015-02-20 11:06 UTC (permalink / raw) To: Sandra Loosemore, Jakub Jelinek; +Cc: Jeff Prothero, gcc On 02/19/2015 09:56 PM, Sandra Loosemore wrote: > Hmmmm, Passing the additional option in user code would be one thing, > but what about library code? E.g., using memcpy (either explicitly or > implicitly for a structure copy)? The memcpy problem isn't restricted to embedded architectures. size_t size; const unsigned char *source; std::vector<char> vec; ⦠vec.resize(size); memcpy(vec.data(), source, size); std::vector<T>::data() can return a null pointer if the vector is empty, which means that this code is invalid for empty inputs. I think the C standard is wrong here. We should extend it, as a QoI matter, and support null pointers for variable-length inputs and outputs if the size is 0. But I suspect this is still a minority view. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:06 ` Florian Weimer @ 2015-02-20 11:43 ` Jonathan Wakely 2015-02-20 12:05 ` Florian Weimer 2015-02-20 17:01 ` Jeff Law 2015-02-20 12:11 ` Jakub Jelinek ` (2 subsequent siblings) 3 siblings, 2 replies; 29+ messages in thread From: Jonathan Wakely @ 2015-02-20 11:43 UTC (permalink / raw) To: Florian Weimer; +Cc: Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On 20 February 2015 at 11:06, Florian Weimer wrote: > On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >> Hmmmm, Passing the additional option in user code would be one thing, >> but what about library code? E.g., using memcpy (either explicitly or >> implicitly for a structure copy)? > > The memcpy problem isn't restricted to embedded architectures. > > size_t size; > const unsigned char *source; > std::vector<char> vec; > … > vec.resize(size); > memcpy(vec.data(), source, size); > > std::vector<T>::data() can return a null pointer if the vector is empty, > which means that this code is invalid for empty inputs. > > I think the C standard is wrong here. We should extend it, as a QoI > matter, and support null pointers for variable-length inputs and outputs > if the size is 0. But I suspect this is still a minority view. I'm inclined to agree. Most developers aren't aware of the preconditions on memcpy, but GCC optimizes aggressively based on those preconditions, so we have a large and potentially dangerous gap between what developers expect and what actually happens. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:43 ` Jonathan Wakely @ 2015-02-20 12:05 ` Florian Weimer 2015-02-20 17:01 ` Jeff Law 1 sibling, 0 replies; 29+ messages in thread From: Florian Weimer @ 2015-02-20 12:05 UTC (permalink / raw) To: Jonathan Wakely; +Cc: Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On 02/20/2015 12:43 PM, Jonathan Wakely wrote: > On 20 February 2015 at 11:06, Florian Weimer wrote: >> On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >>> Hmmmm, Passing the additional option in user code would be one thing, >>> but what about library code? E.g., using memcpy (either explicitly or >>> implicitly for a structure copy)? >> >> The memcpy problem isn't restricted to embedded architectures. >> >> size_t size; >> const unsigned char *source; >> std::vector<char> vec; >> ⦠>> vec.resize(size); >> memcpy(vec.data(), source, size); >> >> std::vector<T>::data() can return a null pointer if the vector is empty, >> which means that this code is invalid for empty inputs. >> >> I think the C standard is wrong here. We should extend it, as a QoI >> matter, and support null pointers for variable-length inputs and outputs >> if the size is 0. But I suspect this is still a minority view. > > I'm inclined to agree. > > Most developers aren't aware of the preconditions on memcpy, but GCC > optimizes aggressively based on those preconditions, so we have a > large and potentially dangerous gap between what developers expect and > what actually happens. Maybe we can add, as a compromise, an always-inline wrapper like this? void *memcpy(void *dst, const *void src, size_t size) { if (__builtin_constant_p(size > 0) && size > 0) { // Or whatever else is needed as non-null assertions. *(char *)dst; *(const char *)src; } return memcpy_real(dst, src, size); // Without non-null assertion. } Then we'll still get the non-NULL optimization for the common positive size case. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:43 ` Jonathan Wakely 2015-02-20 12:05 ` Florian Weimer @ 2015-02-20 17:01 ` Jeff Law 2015-02-20 17:09 ` Florian Weimer 2015-02-20 18:01 ` Paul_Koning 1 sibling, 2 replies; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:01 UTC (permalink / raw) To: Jonathan Wakely, Florian Weimer Cc: Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On 02/20/15 04:43, Jonathan Wakely wrote: > On 20 February 2015 at 11:06, Florian Weimer wrote: >> On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >>> Hmmmm, Passing the additional option in user code would be one thing, >>> but what about library code? E.g., using memcpy (either explicitly or >>> implicitly for a structure copy)? >> >> The memcpy problem isn't restricted to embedded architectures. >> >> size_t size; >> const unsigned char *source; >> std::vector<char> vec; >> ⦠>> vec.resize(size); >> memcpy(vec.data(), source, size); >> >> std::vector<T>::data() can return a null pointer if the vector is empty, >> which means that this code is invalid for empty inputs. >> >> I think the C standard is wrong here. We should extend it, as a QoI >> matter, and support null pointers for variable-length inputs and outputs >> if the size is 0. But I suspect this is still a minority view. > > I'm inclined to agree. > > Most developers aren't aware of the preconditions on memcpy, but GCC > optimizes aggressively based on those preconditions, so we have a > large and potentially dangerous gap between what developers expect and > what actually happens. But that's always true -- this isn't any different than aliasing, arithmetic overflow, etc. The standards define the contract between the compiler/library implementors and the developers. Once the contract is broken, all bets are off. jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:01 ` Jeff Law @ 2015-02-20 17:09 ` Florian Weimer 2015-02-20 17:32 ` Jeff Law 2015-02-20 18:01 ` Paul_Koning 1 sibling, 1 reply; 29+ messages in thread From: Florian Weimer @ 2015-02-20 17:09 UTC (permalink / raw) To: Jeff Law, Jonathan Wakely Cc: Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On 02/20/2015 06:01 PM, Jeff Law wrote: > But that's always true -- this isn't any different than aliasing, > arithmetic overflow, etc. The standards define the contract between the > compiler/library implementors and the developers. Once the contract is > broken, all bets are off. What I don't like about this case (std::vector<T>::data() returning nullptr vs memcpy/memcmp/qsort non-null assertions) is that it is internally non-composing in a totally non-obvious way. data() is explicitly intended to cover interoperability with these older C functions, and it fails. But you are right about overflows. I think we should give up and just enable -fwrapv by default in Fedora and downstream. This issue has been explicitly documented since 2002 at least (explicitly with security-related checks in mind), and programmers still write overflow checks which are only correct with -fwrapv, and it passes code review. I fear that's not going to change, ever. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:09 ` Florian Weimer @ 2015-02-20 17:32 ` Jeff Law 0 siblings, 0 replies; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:32 UTC (permalink / raw) To: Florian Weimer, Jonathan Wakely Cc: Sandra Loosemore, Jakub Jelinek, Jeff Prothero, gcc On 02/20/15 10:09, Florian Weimer wrote: > On 02/20/2015 06:01 PM, Jeff Law wrote: > >> But that's always true -- this isn't any different than aliasing, >> arithmetic overflow, etc. The standards define the contract between the >> compiler/library implementors and the developers. Once the contract is >> broken, all bets are off. > > What I don't like about this case (std::vector<T>::data() returning > nullptr vs memcpy/memcmp/qsort non-null assertions) is that it is > internally non-composing in a totally non-obvious way. data() is > explicitly intended to cover interoperability with these older C > functions, and it fails. And that's precisely why I consider this class of issues the most problematical. Jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:01 ` Jeff Law 2015-02-20 17:09 ` Florian Weimer @ 2015-02-20 18:01 ` Paul_Koning 1 sibling, 0 replies; 29+ messages in thread From: Paul_Koning @ 2015-02-20 18:01 UTC (permalink / raw) To: law; +Cc: gcc > On Feb 20, 2015, at 12:01 PM, Jeff Law <law@redhat.com> wrote: > > On 02/20/15 04:43, Jonathan Wakely wrote: >>> ... >> >> I'm inclined to agree. >> >> Most developers aren't aware of the preconditions on memcpy, but GCC >> optimizes aggressively based on those preconditions, so we have a >> large and potentially dangerous gap between what developers expect and >> what actually happens. > But that's always true -- this isn't any different than aliasing, arithmetic overflow, etc. The standards define the contract between the compiler/library implementors and the developers. Once the contract is broken, all bets are off. True. The unfortunate problem with C, and even more so with C++, is that the contract is so large and complex that few, if any, are skilled enough language lawyers to know what exactly it says. For that matter, the contract (the standard) is large and complex enough that it has bugs and ambiguities, so the contract is not in fact precisely defined. There’s a nice paper that drives this home: http://people.csail.mit.edu/akcheung/papers/apsys12.pdf For example, while most people know about the “no overlaps” rule of memcpy, stuff like aliasing are far more obscure. Or the exact meaning (if there is one) of “volatile”. It also doesn’t help that a large fraction of the contract is unenforced. You only find out about it when a new version of the compiler starts using a particular rule to make an optimization that suddenly breaks 10 year old code. I remember some heated debates between Linux folks and compiler builders when such things strike the Linux kernel. paul ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:06 ` Florian Weimer 2015-02-20 11:43 ` Jonathan Wakely @ 2015-02-20 12:11 ` Jakub Jelinek 2015-02-20 17:03 ` Jeff Law 2015-02-20 12:13 ` Andrew Haley 2015-02-20 17:03 ` Jeff Law 3 siblings, 1 reply; 29+ messages in thread From: Jakub Jelinek @ 2015-02-20 12:11 UTC (permalink / raw) To: Florian Weimer; +Cc: Sandra Loosemore, Jeff Prothero, gcc On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote: > On 02/19/2015 09:56 PM, Sandra Loosemore wrote: > > Hmmmm, Passing the additional option in user code would be one thing, > > but what about library code? E.g., using memcpy (either explicitly or > > implicitly for a structure copy)? > > The memcpy problem isn't restricted to embedded architectures. > > size_t size; > const unsigned char *source; > std::vector<char> vec; > ⦠> vec.resize(size); > memcpy(vec.data(), source, size); > > std::vector<T>::data() can return a null pointer if the vector is empty, > which means that this code is invalid for empty inputs. > > I think the C standard is wrong here. We should extend it, as a QoI > matter, and support null pointers for variable-length inputs and outputs > if the size is 0. But I suspect this is still a minority view. I disagree. If you want a function that will have that different property, don't call it memcpy. Jakub ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 12:11 ` Jakub Jelinek @ 2015-02-20 17:03 ` Jeff Law 2015-03-03 19:57 ` Martin Sebor 0 siblings, 1 reply; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:03 UTC (permalink / raw) To: Jakub Jelinek, Florian Weimer; +Cc: Sandra Loosemore, Jeff Prothero, gcc On 02/20/15 05:10, Jakub Jelinek wrote: > On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote: >> On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >>> Hmmmm, Passing the additional option in user code would be one thing, >>> but what about library code? E.g., using memcpy (either explicitly or >>> implicitly for a structure copy)? >> >> The memcpy problem isn't restricted to embedded architectures. >> >> size_t size; >> const unsigned char *source; >> std::vector<char> vec; >> ⦠>> vec.resize(size); >> memcpy(vec.data(), source, size); >> >> std::vector<T>::data() can return a null pointer if the vector is empty, >> which means that this code is invalid for empty inputs. >> >> I think the C standard is wrong here. We should extend it, as a QoI >> matter, and support null pointers for variable-length inputs and outputs >> if the size is 0. But I suspect this is still a minority view. > > I disagree. If you want a function that will have that different property, > don't call it memcpy. Right. If someone wants to take it up with the Austin group, that's fine. But until/unless the Austin group blesses, I don't think we should extend as a QoI matter. jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:03 ` Jeff Law @ 2015-03-03 19:57 ` Martin Sebor 2015-03-03 23:38 ` Jeff Law 0 siblings, 1 reply; 29+ messages in thread From: Martin Sebor @ 2015-03-03 19:57 UTC (permalink / raw) To: Jeff Law, Jakub Jelinek, Florian Weimer Cc: Sandra Loosemore, Jeff Prothero, gcc On 02/20/2015 10:01 AM, Jeff Law wrote: > On 02/20/15 05:10, Jakub Jelinek wrote: >> On Fri, Feb 20, 2015 at 12:06:28PM +0100, Florian Weimer wrote: >>> On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >>>> Hmmmm, Passing the additional option in user code would be one thing, >>>> but what about library code? E.g., using memcpy (either explicitly or >>>> implicitly for a structure copy)? >>> >>> The memcpy problem isn't restricted to embedded architectures. >>> >>> size_t size; >>> const unsigned char *source; >>> std::vector<char> vec; >>> ⦠>>> vec.resize(size); >>> memcpy(vec.data(), source, size); >>> >>> std::vector<T>::data() can return a null pointer if the vector is empty, >>> which means that this code is invalid for empty inputs. >>> >>> I think the C standard is wrong here. We should extend it, as a QoI >>> matter, and support null pointers for variable-length inputs and outputs >>> if the size is 0. But I suspect this is still a minority view. >> >> I disagree. If you want a function that will have that different >> property, >> don't call it memcpy. > Right. If someone wants to take it up with the Austin group, that's > fine. But until/unless the Austin group blesses, I don't think we should > extend as a QoI matter. As a data point(*) it might be interesting to note that GCC itself relies on memcpy providing stronger guarantees than the C standard requires it to by emitting calls to the function for large structure self-assignments (which are strictly conforming, as discussed in bug 65029). Martin [*] IMO, one in favor of tightening up the memcpy specification to require implementations to provide the expected semantics. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-03-03 19:57 ` Martin Sebor @ 2015-03-03 23:38 ` Jeff Law 2015-03-04 12:36 ` Richard Biener 0 siblings, 1 reply; 29+ messages in thread From: Jeff Law @ 2015-03-03 23:38 UTC (permalink / raw) To: Martin Sebor, Jakub Jelinek, Florian Weimer Cc: Sandra Loosemore, Jeff Prothero, gcc On 03/03/15 12:57, Martin Sebor wrote: > > As a data point(*) it might be interesting to note that GCC itself > relies on memcpy providing stronger guarantees than the C standard > requires it to by emitting calls to the function for large structure > self-assignments (which are strictly conforming, as discussed in bug > 65029). Right. I actually spent quite a bit of time struggling with this a while back in a different context. The only case I could come up with where GCC would generate an overlapping memcpy was self assignment, but even that was bad and while we ultimately punted, I've always considered it a wart. [*] IMO, one in favor of tightening up the memcpy specification > to require implementations to provide the expected semantics. That works for me :-) The things done in glibc's memcpy are a bit on the absurd side and the pain caused by the changes over time is almost impossible to overstate. If the Austin group tightens memcpy to require fewer surprises I think most developers would ultimately be happy with the result -- a few would complain about the performance impacts for specific workloads, but I suspect they'd be in the minority. jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-03-03 23:38 ` Jeff Law @ 2015-03-04 12:36 ` Richard Biener 0 siblings, 0 replies; 29+ messages in thread From: Richard Biener @ 2015-03-04 12:36 UTC (permalink / raw) To: Jeff Law Cc: Martin Sebor, Jakub Jelinek, Florian Weimer, Sandra Loosemore, Jeff Prothero, GCC Development On Wed, Mar 4, 2015 at 12:38 AM, Jeff Law <law@redhat.com> wrote: > On 03/03/15 12:57, Martin Sebor wrote: >> >> >> As a data point(*) it might be interesting to note that GCC itself >> relies on memcpy providing stronger guarantees than the C standard >> requires it to by emitting calls to the function for large structure >> self-assignments (which are strictly conforming, as discussed in bug >> 65029). > > Right. I actually spent quite a bit of time struggling with this a while > back in a different context. The only case I could come up with where GCC > would generate an overlapping memcpy was self assignment, but even that was > bad and while we ultimately punted, I've always considered it a wart. ? struct A { int large[100]; }; void foo (struct A *x, struct A *y) { *x = *y; } call it as foo (&a, &a); (on x86 you need -mstringop-strategy=libcall, even at -O0, to emit a memcpy call) The self-assignment doesn't have to be visible to the compiler - so to fix this we'd have to assume pointer equality everywhere and either emit a conditional call to memcpy or always emit a call to memmove. Richard. > > [*] IMO, one in favor of tightening up the memcpy specification >> >> to require implementations to provide the expected semantics. > > That works for me :-) > > The things done in glibc's memcpy are a bit on the absurd side and the pain > caused by the changes over time is almost impossible to overstate. If the > Austin group tightens memcpy to require fewer surprises I think most > developers would ultimately be happy with the result -- a few would complain > about the performance impacts for specific workloads, but I suspect they'd > be in the minority. > > > jeff > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:06 ` Florian Weimer 2015-02-20 11:43 ` Jonathan Wakely 2015-02-20 12:11 ` Jakub Jelinek @ 2015-02-20 12:13 ` Andrew Haley 2015-02-20 17:03 ` Jeff Law 3 siblings, 0 replies; 29+ messages in thread From: Andrew Haley @ 2015-02-20 12:13 UTC (permalink / raw) To: Florian Weimer, Sandra Loosemore, Jakub Jelinek; +Cc: Jeff Prothero, gcc On 02/20/2015 11:06 AM, Florian Weimer wrote: > On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >> Hmmmm, Passing the additional option in user code would be one thing, >> but what about library code? E.g., using memcpy (either explicitly or >> implicitly for a structure copy)? > > The memcpy problem isn't restricted to embedded architectures. > > size_t size; > const unsigned char *source; > std::vector<char> vec; > ⦠> vec.resize(size); > memcpy(vec.data(), source, size); > > std::vector<T>::data() can return a null pointer if the vector is empty, > which means that this code is invalid for empty inputs. Sure, but if that's a bug then it's a bug in the definition of memcpy(), not in the definition of the properties of a null pointer. If the size is zero then it really shouldn't matter if the destination address is null. Andrew. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:06 ` Florian Weimer ` (2 preceding siblings ...) 2015-02-20 12:13 ` Andrew Haley @ 2015-02-20 17:03 ` Jeff Law 3 siblings, 0 replies; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:03 UTC (permalink / raw) To: Florian Weimer, Sandra Loosemore, Jakub Jelinek; +Cc: Jeff Prothero, gcc On 02/20/15 04:06, Florian Weimer wrote: > On 02/19/2015 09:56 PM, Sandra Loosemore wrote: >> Hmmmm, Passing the additional option in user code would be one thing, >> but what about library code? E.g., using memcpy (either explicitly or >> implicitly for a structure copy)? > > The memcpy problem isn't restricted to embedded architectures. > > size_t size; > const unsigned char *source; > std::vector<char> vec; > ⦠> vec.resize(size); > memcpy(vec.data(), source, size); > > std::vector<T>::data() can return a null pointer if the vector is empty, > which means that this code is invalid for empty inputs. > > I think the C standard is wrong here. We should extend it, as a QoI > matter, and support null pointers for variable-length inputs and outputs > if the size is 0. But I suspect this is still a minority view. And it's these kind of uses that scare me the most. jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-18 19:24 Jeff Prothero 2015-02-18 19:29 ` Jakub Jelinek @ 2015-02-18 19:30 ` Andrew Pinski 2015-02-20 9:30 ` Andrew Haley 2 siblings, 0 replies; 29+ messages in thread From: Andrew Pinski @ 2015-02-18 19:30 UTC (permalink / raw) To: Jeff Prothero; +Cc: GCC Mailing List On Wed, Feb 18, 2015 at 11:21 AM, Jeff Prothero <jprother@altera.com> wrote: > > Starting with gcc 4.9, -O2 implicitly invokes > > -fisolate-erroneous-paths-dereference: > > which > > https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html > > documents as > > Detect paths that trigger erroneous or undefined behavior due to > dereferencing a null pointer. Isolate those paths from the main control > flow and turn the statement with erroneous or undefined behavior into a > trap. This flag is enabled by default at -O2 and higher. > > This results in a sizable number of previously working embedded programs mysteriously > crashing when recompiled under gcc 4.9. The problem is that embedded > programs will often have ram starting at address zero (think hardware-defined > interrupt vectors, say) which gets initialized by code which the > -fisolate-erroneous-paths-deference logic can recognize as reading and/or > writing address zero. You should have used -fno-delete-null-pointer-checks which has been doing this optimization for a long time now, just it got better with -fisolate-erroneous-paths-dereference pass. Thanks, Andrew Pinski > > What happens then is that the previously running program compiles without > any warnings, but then typically locks up mysteriously (often disabling the > remote debug link) due to the trap not being gracefully handled by the > embedded runtime. > > Granted, such code is out-of-spec wrt to C standards. > > None the less, the problem is quite painful to track down and > unexpected. > > Is there any good reason the > > -fisolate-erroneous-paths-dereference > > logic could not issue a compiletime warning or error, instead of just > silently generating code virtually certain to crash at runtime? > > Such a warning/error would save a lot of engineers significant amounts > of time, energy and frustration tracking down this problem. > > I would like to think that the spirit of gcc is about helping engineers > efficiently correct nonstandard pain, rather than inflicting maximal > pain upon engineers violating C standards. :-) > > -Jeff > > BTW, I'd also be curious to know what is regarded as engineering best > practice for writing a value to address zero when this is architecturally > required by the hardware platform at hand. Obviously one can do various > things to obscure the process sufficiently that the current gcc implementation > won't detect it and complain, but as gcc gets smarter about optimization > those are at risk of failing in a future release. It would be nice to have > a guaranteed-to-work future-proof idiom for doing this. Do we have one, short > of retreating to assembly code? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-18 19:24 Jeff Prothero 2015-02-18 19:29 ` Jakub Jelinek 2015-02-18 19:30 ` Andrew Pinski @ 2015-02-20 9:30 ` Andrew Haley 2015-02-20 11:45 ` Florian Weimer 2 siblings, 1 reply; 29+ messages in thread From: Andrew Haley @ 2015-02-20 9:30 UTC (permalink / raw) To: Jeff Prothero, gcc On 18/02/15 19:21, Jeff Prothero wrote: > BTW, I'd also be curious to know what is regarded as engineering > best practice for writing a value to address zero when this is > architecturally required by the hardware platform at hand. > Obviously one can do various things to obscure the process > sufficiently that the current gcc implementation won't detect it and > complain, but as gcc gets smarter about optimization those are at > risk of failing in a future release. It would be nice to have a > guaranteed-to-work future-proof idiom for doing this. Do we have > one, short of retreating to assembly code? I doubt that such a thing is ever going to be safe. The idea that a null pointer points to nothing is so hard-baked into the design of C that you can't get away from it. Also, almost every C programmer and especially library writer "knows" that a null pointer points to nothing. The only way to initialize an interrupt vector table at address zero is to go outside the language. While it may be possible to use some proprietary GCC options to get around this, it's never going to be a good idea because some GCC author (or indeed library author) may make the "mistake" of assuming that null pointer point to nothing. Using C to initialize anything at address zero is so dangerous that we shouldn't tell people that they can use such-and-such a GCC option to support it; we should warn them never to do it, and provide as many warnings as we can. IMO, engineering best practice is to use assembly code. Andrew. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 9:30 ` Andrew Haley @ 2015-02-20 11:45 ` Florian Weimer 2015-02-20 17:01 ` Jeff Law 0 siblings, 1 reply; 29+ messages in thread From: Florian Weimer @ 2015-02-20 11:45 UTC (permalink / raw) To: Andrew Haley, Jeff Prothero, gcc On 02/20/2015 10:30 AM, Andrew Haley wrote: > I doubt that such a thing is ever going to be safe. The idea that a > null pointer points to nothing is so hard-baked into the design of C > that you can't get away from it. Also, almost every C programmer and > especially library writer "knows" that a null pointer points to > nothing. NULL pointer dereferences (or NULL pointer with small offsets) were common programming idioms in the DOS days because the interrupt vector table was located at this address. Quite a few systems once had a readable page zero, and (manual, I assume) optimizations for list traversal (p != NULL && p->next != NULL â p->next != NULL) were commonly used on these systems. I think the treatment of pointers not as addresses, but something that has type information and provenience associated with it, came much later, when most of the design was already settled. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 11:45 ` Florian Weimer @ 2015-02-20 17:01 ` Jeff Law 2015-02-20 18:07 ` Paul_Koning 0 siblings, 1 reply; 29+ messages in thread From: Jeff Law @ 2015-02-20 17:01 UTC (permalink / raw) To: Florian Weimer, Andrew Haley, Jeff Prothero, gcc On 02/20/15 04:45, Florian Weimer wrote: > On 02/20/2015 10:30 AM, Andrew Haley wrote: >> I doubt that such a thing is ever going to be safe. The idea that a >> null pointer points to nothing is so hard-baked into the design of C >> that you can't get away from it. Also, almost every C programmer and >> especially library writer "knows" that a null pointer points to >> nothing. > > NULL pointer dereferences (or NULL pointer with small offsets) were > common programming idioms in the DOS days because the interrupt vector > table was located at this address. Quite a few systems once had a > readable page zero, and (manual, I assume) optimizations for list > traversal (p != NULL && p->next != NULL â p->next != NULL) were commonly > used on these systems. True, but thankfully this isn't blessed anymore. > > I think the treatment of pointers not as addresses, but something that > has type information and provenience associated with it, came much > later, when most of the design was already settled. We still have targets where page0 is special. The H8 for example comes to mind. Folks regularly place data into page0 and mark it as special so the compiler emits more efficient sequences to access that data. Regardless, the right thing to do is to disable elimination of NULL pointer checks on targets where page 0 is mapped and thus a reference to *0 may not fault. In my mind this is an attribute of both the processor (see H8 above) and/or the target OS. On those targets the C-runtime had better also ensure that its headers aren't decorated with non-null attributes, particularly for the mem* and str* functions. jeff ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference 2015-02-20 17:01 ` Jeff Law @ 2015-02-20 18:07 ` Paul_Koning 0 siblings, 0 replies; 29+ messages in thread From: Paul_Koning @ 2015-02-20 18:07 UTC (permalink / raw) To: law; +Cc: fweimer, aph, jprother, gcc > On Feb 20, 2015, at 12:01 PM, Jeff Law <law@redhat.com> wrote: > > ... > Regardless, the right thing to do is to disable elimination of NULL pointer checks on targets where page 0 is mapped and thus a reference to *0 may not fault. In my mind this is an attribute of both the processor (see H8 above) and/or the target OS. > > On those targets the C-runtime had better also ensure that its headers aren't decorated with non-null attributes, particularly for the mem* and str* functions. pdp11 is an example of such a target, independent of OS (with only 8 pages, clearly no one is going to unmap page 0). Fortunately there one is unlikely to find C program data structures at 0 (instead, vectors if kernel, stack limit if user mode). So no fault is correct there, but no null (0 bits) pointer is also — for practical purposes — a valid assumption. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2015-03-04 12:36 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-02-27 22:13 Obscure crashes due to gcc 4.9 -O2 => -fisolate-erroneous-paths-dereference Manuel López-Ibáñez 2015-03-03 8:41 ` Chris Johns -- strict thread matches above, loose matches on Subject: below -- 2015-02-20 1:04 Jeff Prothero 2015-02-18 19:24 Jeff Prothero 2015-02-18 19:29 ` Jakub Jelinek 2015-02-19 20:56 ` Sandra Loosemore 2015-02-19 21:16 ` Daniel Gutson 2015-02-19 22:10 ` Jakub Jelinek 2015-02-19 22:26 ` Sandra Loosemore 2015-02-19 21:23 ` Joel Sherrill 2015-02-19 21:56 ` Chris Johns 2015-02-20 17:30 ` Jeff Law 2015-02-26 16:23 ` David Malcolm 2015-02-20 11:06 ` Florian Weimer 2015-02-20 11:43 ` Jonathan Wakely 2015-02-20 12:05 ` Florian Weimer 2015-02-20 17:01 ` Jeff Law 2015-02-20 17:09 ` Florian Weimer 2015-02-20 17:32 ` Jeff Law 2015-02-20 18:01 ` Paul_Koning 2015-02-20 12:11 ` Jakub Jelinek 2015-02-20 17:03 ` Jeff Law 2015-03-03 19:57 ` Martin Sebor 2015-03-03 23:38 ` Jeff Law 2015-03-04 12:36 ` Richard Biener 2015-02-20 12:13 ` Andrew Haley 2015-02-20 17:03 ` Jeff Law 2015-02-18 19:30 ` Andrew Pinski 2015-02-20 9:30 ` Andrew Haley 2015-02-20 11:45 ` Florian Weimer 2015-02-20 17:01 ` Jeff Law 2015-02-20 18:07 ` Paul_Koning
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).