public inbox for gcc-bugs@sourceware.org
* [Bug c/54231] New: LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-11 22:29 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54231

             Bug #: 54231
           Summary: LTO generates code for the wrong CPU if different
                    options used
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: thiago@kde.org


Created attachment 27992
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27992
Makefile

Summary:

Given the following code:

=====
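#include <stddef.h>
#include <immintrin.h>

/* BZERO is presumably defined by the build, e.g. to bzero_sse2 or
   bzero_avx (comment 10 does this with #define before #include).
   count is a number of 16-byte blocks; ptr must be 16-byte aligned,
   as _mm_stream_si128 requires. */
void BZERO(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i*)ptr, zero);
        ptr += 16;
    }
}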
=====

When this is compiled twice, once for SSE2 and once for AVX (so that the AVX
build gets VEX-prefixed code), GCC under LTO generates VEX-prefixed code for
both. See the attached Makefile.

Long description:

A library or program that detects at runtime whether certain CPU features, such
as AVX, are available may need to compile different compilation units with
different compiler flags. The example I am providing is a simple function that
zeroes out a 16-byte-aligned segment of memory. Both variants are built from
the same source file, compiled twice, but that does not seem to be relevant.

The idea is that each of these two functions would be called by a dispatcher
function, after verifying the result of CPUID.
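For illustration, a minimal C sketch of such a dispatcher. It assumes GCC 4.8 or later (or Clang) for __builtin_cpu_supports, and it uses per-function target attributes so that both variants fit in one file, whereas the report compiles one source file twice with different -m flags. The bodies and the cpu_has_avx helper are hypothetical, not the attached files; the _mm_sfence() call is an addition not present in the report's code.

```c
#include <stddef.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* SSE2 variant. count is a number of 16-byte blocks and ptr must be
   16-byte aligned, because _mm_stream_si128 requires an aligned address. */
__attribute__((target("sse2")))
static void bzero_sse2(char *ptr, size_t count)
{
    __m128i zero = _mm_setzero_si128();
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
    _mm_sfence(); /* order the non-temporal stores before later stores */
}

/* Same source, but the compiler may emit VEX-encoded (AVX) instructions. */
__attribute__((target("avx")))
static void bzero_avx(char *ptr, size_t count)
{
    __m128i zero = _mm_setzero_si128();
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
    _mm_sfence();
}

/* Run-time CPUID check via GCC's builtin (GCC 4.8+, also in Clang). */
static int cpu_has_avx(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx");
}
#else
/* Portable stand-ins so the sketch still builds on non-x86 targets. */
static void bzero_sse2(char *ptr, size_t count) { memset(ptr, 0, count * 16); }
static void bzero_avx(char *ptr, size_t count) { memset(ptr, 0, count * 16); }
static int cpu_has_avx(void) { return 0; }
#endif

/* The dispatcher: choose a variant only after checking the CPU. */
void my_bzero(char *ptr, size_t count)
{
    if (cpu_has_avx())
        bzero_avx(ptr, count);
    else
        bzero_sse2(ptr, count);
}
```

my_bzero(buf, n) zeroes n 16-byte blocks of a 16-byte-aligned buffer; the AVX variant is only ever executed after the CPU check has succeeded.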

However, if you compile the code with LTO (e.g., by running make CFLAGS=-flto
with the attached Makefile), GCC will apply the highest CPU setting to all
compilation units. This defeats the runtime detection technique: in this
example, both functions will contain AVX code, which would end up being run on
computers without AVX support.

This might be intentional. If so, please close this bug report.

However, I would recommend that the behaviour be fixed: the ability to use LTO
with different CPU settings would allow better inlining of these functions and
the elimination of unnecessary function calls. The bzero example is a good one.


* [Bug c/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-11 22:31 UTC
  To: gcc-bugs

--- Comment #1 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:30:50 UTC ---
Created attachment 27993
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27993
main.c


* [Bug c/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-11 22:33 UTC
  To: gcc-bugs

--- Comment #2 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:33:31 UTC ---
When adding the following source file to the library build:
====
#include <stdlib.h>
void bzero_sse2(char *, size_t);
void bzero_avx(char *, size_t);

extern int avx_supported;

void my_bzero(char *ptr, size_t n)
{
    if (avx_supported)
        bzero_avx(ptr, n);
    else
        bzero_sse2(ptr, n);
}
====

and compiling everything with -O2 -flto, GCC produces the following function:

00000000000002e0 <my_bzero>:
 2e0:   mov    0x200171(%rip),%rax        # 200458 <my_bzero+0x200178>
 2e7:   mov    (%rax),%eax
 2e9:   test   %eax,%eax
 2eb:   jne    310 <my_bzero+0x30>
 2ed:   test   %rsi,%rsi
 2f0:   vpxor  %xmm0,%xmm0,%xmm0
 2f4:   je     30e <my_bzero+0x2e>
 2f6:   nopw   %cs:0x0(%rax,%rax,1)
 300:   vmovntdq %xmm0,(%rdi)
 304:   add    $0x10,%rdi
 308:   sub    $0x1,%rsi
 30c:   jne    300 <my_bzero+0x20>
 30e:   repz retq 
 310:   test   %rsi,%rsi
 313:   je     30e <my_bzero+0x2e>
 315:   vpxor  %xmm0,%xmm0,%xmm0
 319:   nopl   0x0(%rax)
 320:   vmovntdq %xmm0,(%rdi)
 324:   add    $0x10,%rdi
 328:   sub    $0x1,%rsi
 32c:   jne    320 <my_bzero+0x40>
 32e:   repz retq 

As can be seen, VEX-prefixed instructions were used in both cases.


* [Bug c/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-11 22:36 UTC
  To: gcc-bugs

--- Comment #3 from Thiago Macieira <thiago at kde dot org> 2012-08-11 22:36:20 UTC ---
Another note: it appears the Intel compiler has the same bug. It produces the
following code when compiling with -O2 -ipo:


0000000000000340 <my_bzero>:
 340:   dec    %rsi
 343:   mov    0x2001ae(%rip),%rax        # 2004f8 <_DYNAMIC+0xe0>
 34a:   vpxor  %xmm0,%xmm0,%xmm0
 34e:   cmpl   $0x0,(%rax)
 351:   je     36c <my_bzero+0x2c>
 353:   cmp    $0xffffffffffffffff,%rsi
 357:   je     383 <my_bzero+0x43>
 359:   dec    %rsi
 35c:   vmovntdq %xmm0,(%rdi)
 360:   add    $0x10,%rdi
 364:   cmp    $0xffffffffffffffff,%rsi
 368:   jne    359 <my_bzero+0x19>
 36a:   jmp    383 <my_bzero+0x43>
 36c:   cmp    $0xffffffffffffffff,%rsi
 370:   je     383 <my_bzero+0x43>
 372:   dec    %rsi
 375:   vmovntdq %xmm0,(%rdi)
 379:   add    $0x10,%rdi
 37d:   cmp    $0xffffffffffffffff,%rsi
 381:   jne    372 <my_bzero+0x32>
 383:   retq   
 384:   nopl   0x0(%rax,%rax,1)
 389:   nopl   0x0(%rax)

Note, additionally, that there is an instruction-scheduling issue: a VPXOR
instruction was scheduled before the CPU-feature test, so it would execute (and
fault) even on a CPU without AVX support.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: pinskia at gcc dot gnu.org @ 2012-08-11 22:40 UTC
  To: gcc-bugs

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |lto
           Severity|normal                      |enhancement

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-08-11 22:39:48 UTC ---
Basically the target attribute should come into play but that is currently not
really supported even without LTO.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: steven at gcc dot gnu.org @ 2012-08-11 22:46 UTC
  To: gcc-bugs

--- Comment #5 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-11 22:46:31 UTC ---
"Fixing" this in the compiler isn't straight-forward. The _mm_stream functions
are just wrappers around builtin functions. It may work correctly if you put
the bzero functions in two separate files or call the builtins directly (a
variant of __builtin_ia32_movntdq in this case), but the way your BZERO is
defined, I don't think it will ever work.

Have you considered using ifunc?
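A minimal ifunc sketch, assuming GCC or Clang on an ELF target with glibc (ifunc needs dynamic-loader support); resolve_my_bzero and bzero_generic are hypothetical names. The resolver may run before constructors, which is why it calls __builtin_cpu_init() first.

```c
#include <stddef.h>
#include <string.h>

/* Baseline variant (hypothetical): plain memset of count 16-byte blocks. */
static void bzero_generic(char *ptr, size_t count)
{
    memset(ptr, 0, count * 16);
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* AVX variant; ptr must be 16-byte aligned for _mm_stream_si128. */
__attribute__((target("avx")))
static void bzero_avx(char *ptr, size_t count)
{
    __m128i zero = _mm_setzero_si128();
    while (count--) {
        _mm_stream_si128((__m128i *)ptr, zero);
        ptr += 16;
    }
    _mm_sfence();
}

typedef void bzero_fn(char *, size_t);

/* Resolver: called once, at relocation time, to bind my_bzero. It may
   run before constructors, so initialise the CPU model explicitly. */
static bzero_fn *resolve_my_bzero(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") ? bzero_avx : bzero_generic;
}

void my_bzero(char *ptr, size_t count)
    __attribute__((ifunc("resolve_my_bzero")));
#else
/* No ifunc support assumed: always use the baseline variant. */
void my_bzero(char *ptr, size_t count)
{
    bzero_generic(ptr, count);
}
#endif
```

As Thiago's reply points out, ifunc only moves the selection into the dynamic linker; each variant must still be compiled for its own ISA, so by itself it does not avoid the LTO problem reported here.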


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-11 23:23 UTC
  To: gcc-bugs

--- Comment #6 from Thiago Macieira <thiago at kde dot org> 2012-08-11 23:23:39 UTC ---
(In reply to comment #5)
> "Fixing" this in the compiler isn't straight-forward. The _mm_stream functions
> are just wrappers around builtin functions. It may work correctly if you put
> the bzero functions in two separate files or call the builtins directly (a
> variant of __builtin_ia32_movntdq in this case), but the way your BZERO is
> defined, I don't think it will ever work.

They *are* in separate files already. Calling the builtin directly instead of
the intrinsic wrapper might work, but I did not test it because it's not
acceptable, as the code would be GCC-specific.

> Have you considered using ifunc?

IFUNC is also irrelevant: in order to use it, I need to have two separate
source files which are compiled with different compiler settings, so we end up
where we started: the bzero_sse2() function will have AVX code.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: steven at gcc dot gnu.org @ 2012-08-12  0:28 UTC
  To: gcc-bugs

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-08-12
                 CC|                            |uros at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #7 from Steven Bosscher <steven at gcc dot gnu.org> 2012-08-12 00:27:46 UTC ---
Actually, using the builtins also doesn't work. The instruction patterns are
the same, and GCC's instruction recognizer (recog) picks the "best" available
one. E.g.:

#(insn:TI 14 12 27 3 (set (reg:V2DI 21 xmm0 [66])
#        (const_vector:V2DI [
#                (const_int 0 [0])
#                (const_int 0 [0])
#            ])) /home/stevenb/devel/build-test/gcc/include/emmintrin.h:1424 1111 {*avx_movv2di_internal}
#     (expr_list:REG_EQUIV (const_vector:V2DI [
#                (const_int 0 [0])
#                (const_int 0 [0])
#            ])
#        (nil)))
        vpxor   %xmm0, %xmm0, %xmm0     # 14    *avx_movv2di_internal/1 [length = 4]

vs.

#(insn:TI 14 12 27 3 (set (reg:V2DI 21 xmm0 [66])
#        (const_vector:V2DI [
#                (const_int 0 [0])
#                (const_int 0 [0])
#            ])) /home/stevenb/devel/build-test/gcc/include/emmintrin.h:1424 1124 {*movv2di_internal}
#     (expr_list:REG_EQUIV (const_vector:V2DI [
#                (const_int 0 [0])
#                (const_int 0 [0])
#            ])
#        (nil)))
        pxor    %xmm0, %xmm0    # 14    *movv2di_internal/1     [length = 4]

These insns just look the same to GCC, so even if the sse2 builtin expander is
used, the AVX instruction is selected.

Thus a bug, confirmed. Adding i386 guy to CC.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: rguenth at gcc dot gnu.org @ 2012-08-13  8:59 UTC
  To: gcc-bugs

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-13 08:59:18 UTC ---
If you do something like

 gcc -c t1.c -mavx -flto
 gcc -c t2.c -msse2 -flto
 gcc t1.o t2.o -flto

then the link step will use -mavx -msse2, that is, target options are
concatenated.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-13  9:45 UTC
  To: gcc-bugs

--- Comment #9 from Thiago Macieira <thiago at kde dot org> 2012-08-13 09:44:51 UTC ---
(In reply to comment #8)
> If you do something like
> 
>  gcc -c t1.c -mavx -flto
>  gcc -c t2.c -msse2 -flto
>  gcc t1.o t2.o -flto
> 
> then the link step will use -mavx -msse2, that is, target options are
> concatenated.

Indeed.

What I'm asking for is that each source file be compiled with its own target
options. I realise this is a request for enhancement, though.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-13  9:53 UTC
  To: gcc-bugs

--- Comment #10 from Thiago Macieira <thiago at kde dot org> 2012-08-13 09:53:32 UTC ---
Another test:

$ cat main_avx.c
#define BZERO bzero_avx
#pragma GCC target ("avx")
#include "main.c"

$ cat main_sse2.c
#define BZERO bzero_sse2
#pragma GCC target ("sse2")
#include "main.c"

$ cat main.c
#include <immintrin.h>

void BZERO(char *ptr, size_t count)
{
    __m128i zero = _mm_set1_epi8(0);
    while (count--) {
        _mm_stream_si128((__m128i*)ptr, zero);
        ptr += 16;
    }
}

$ gcc -flto -O2 -shared -o libtest.so main_avx.c main_sse2.c
$ objdump -Cdr --no-show-raw-insn libtest.so
[...]

0000000000000650 <bzero_sse2>:
 650:   test   %rsi,%rsi
 653:   pxor   %xmm0,%xmm0
 657:   je     66e <bzero_sse2+0x1e>
 659:   nopl   0x0(%rax)
 660:   movntdq %xmm0,(%rdi)
 664:   add    $0x10,%rdi
 668:   sub    $0x1,%rsi
 66c:   jne    660 <bzero_sse2+0x10>
 66e:   repz retq 

0000000000000670 <bzero_avx>:
 670:   test   %rsi,%rsi
 673:   pxor   %xmm0,%xmm0
 677:   je     68e <bzero_avx+0x1e>
 679:   nopl   0x0(%rax)
 680:   movntdq %xmm0,(%rdi)
 684:   add    $0x10,%rdi
 688:   sub    $0x1,%rsi
 68c:   jne    680 <bzero_avx+0x10>
 68e:   repz retq


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-13 10:13 UTC
  To: gcc-bugs

--- Comment #11 from Thiago Macieira <thiago at kde dot org> 2012-08-13 10:12:48 UTC ---
Attaching __attribute__((target("xxx"))) to the function does help.

It generates the following with the my_bzero function from comment 2:

00000000000002e0 <bzero_avx.2362>:
 2e0:   test   %rsi,%rsi
 2e3:   vpxor  %xmm0,%xmm0,%xmm0
 2e7:   je     2fe <bzero_avx.2362+0x1e>
 2e9:   nopl   0x0(%rax)
 2f0:   vmovntdq %xmm0,(%rdi)
 2f4:   add    $0x10,%rdi
 2f8:   sub    $0x1,%rsi
 2fc:   jne    2f0 <bzero_avx.2362+0x10>
 2fe:   repz retq 

0000000000000300 <my_bzero>:
 300:   mov    0x200171(%rip),%rax        # 200478 <my_bzero+0x200178>
 307:   mov    (%rax),%eax
 309:   test   %eax,%eax
 30b:   jne    330 <my_bzero+0x30>
 30d:   test   %rsi,%rsi
 310:   pxor   %xmm0,%xmm0
 314:   je     332 <my_bzero+0x32>
 316:   nopw   %cs:0x0(%rax,%rax,1)
 320:   movntdq %xmm0,(%rdi)
 324:   add    $0x10,%rdi
 328:   sub    $0x1,%rsi
 32c:   jne    320 <my_bzero+0x20>
 32e:   repz retq 
 330:   jmp    2e0 <bzero_avx.2362>
 332:   repz retq 


This workaround might be useful for me in a few places where the inlining
provided by LTO was desired (even though, in this example, the AVX variant is
exactly what it would be if no LTO had been used). But it is not workable
without major changes to the code if I have 400+ functions in a file, plus
possibly inline functions from headers, that need to be compiled this way.
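For files with many functions, GCC's #pragma GCC push_options, #pragma GCC target and #pragma GCC pop_options can apply a target setting to a whole region instead of annotating each function. A hedged sketch follows: it is GCC-only (it assumes Clang does not honour #pragma GCC target, so Clang falls back to the baseline loop), and scale_floats and its helpers are hypothetical. Comment 10 shows GCC 4.7 dropping pragma-selected options under -flto, so whether such a region survives LTO depends on the GCC version.

```c
#include <stddef.h>

/* Baseline version, compiled with whatever -m flags are on the command line. */
static void scale_baseline(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}

#if defined(__GNUC__) && !defined(__clang__) && \
    (defined(__x86_64__) || defined(__i386__))
#define HAVE_AVX_REGION 1
#pragma GCC push_options
#pragma GCC target("avx")
/* Every function defined from here to pop_options is compiled as if with
   -mavx; a real file could put hundreds of functions in this region. */
static void scale_avx(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; i++)
        v[i] *= k;
}
#pragma GCC pop_options /* restore the command-line target options */
#endif

/* Public entry point: use the AVX region only if the CPU supports it. */
void scale_floats(float *v, size_t n, float k)
{
#ifdef HAVE_AVX_REGION
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        scale_avx(v, n, k);
        return;
    }
#endif
    scale_baseline(v, n, k);
}
```

The region still needs a runtime check at its entry points, exactly like the per-function attribute version; the pragma only saves repeating the attribute on every function.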


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: rguenth at gcc dot gnu.org @ 2012-08-13 11:58 UTC
  To: gcc-bugs

--- Comment #12 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-08-13 11:58:33 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > If you do something like
> > 
> >  gcc -c t1.c -mavx -flto
> >  gcc -c t2.c -msse2 -flto
> >  gcc t1.o t2.o -flto
> > 
> > then the link step will use -mavx -msse2, that is, target options are
> > concatenated.
> 
> Indeed.
> 
> What I'm asking for is that each source file be compiled with its own target
> options. I realise this is a request for enhancement, though.

Yes, there are similar option-related bugs for this.  Note that somebody needs
to sit down and document the desired semantics of combining translation
units T1 and T2, compiled with different options OP1 and OP2, at link time with
options OP3, including which cross-file optimizations (inlining?) are
possible.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-08-13 12:13 UTC
  To: gcc-bugs

--- Comment #13 from Thiago Macieira <thiago at kde dot org> 2012-08-13 12:13:40 UTC ---
(In reply to comment #12)
> Yes, there are similar option-related bugs for this.  Note somebody needs
> to sit down and document the desired semantics of combining translation
> units T1 and T2, compiled with different options OP1 and OP2, at link-time with
> options OP3.  Desired semantics including which cross-file optimizations
> (inlining?) are possible.

From my (admittedly restricted) point of view, inlining should be possible
under the following conditions:
 - when inlining a function with a "lower" optimisation / target setting, apply
   the outer scope's setting to the inlined code
 - when inlining a function with a higher target requirement, inlining should
   be done only in the sense of partial function splitting, prologues,
   epilogues, constant propagation, etc.

In the case that I pasted, for example, I'd like GCC to realise that it has
already tested whether the counter variable is 0, and then forgo that test in
the inlined inner function.

In the worst case, simply forgo inlining completely. The code would then be no
worse than in the non-LTO case.


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: thiago at kde dot org @ 2012-09-12 13:02 UTC
  To: gcc-bugs

--- Comment #14 from Thiago Macieira <thiago at kde dot org> 2012-09-12 13:02:23 UTC ---
From GCC's own manual:

(Node "Function attributes"):

     On the 386/x86_64 and PowerPC backends, the inliner will not
     inline a function that has different target options than the
     caller, unless the callee has a subset of the target options of
     the caller.  For example a function declared with `target("sse3")'
     can inline a function with `target("sse2")', since `-msse3'
     implies `-msse2'.
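The subset rule quoted above can be modelled in a few lines of C. This is a simplified sketch with invented feature bits (ISA_SSE through ISA_AVX) and hypothetical with_implied() and can_inline() helpers, not GCC's implementation; in particular, the real implication chain from AVX down also passes through SSE4.2, SSE4.1 and SSSE3.

```c
#include <stdbool.h>

/* Hypothetical feature bits, a much-simplified stand-in for GCC's ISA flags. */
enum {
    ISA_SSE  = 1u << 0,
    ISA_SSE2 = 1u << 1,
    ISA_SSE3 = 1u << 2,
    ISA_AVX  = 1u << 3
};

/* Close a feature set under implication (e.g. -msse3 implies -msse2). */
static unsigned with_implied(unsigned isa)
{
    if (isa & ISA_AVX)
        isa |= ISA_SSE3;
    if (isa & ISA_SSE3)
        isa |= ISA_SSE2;
    if (isa & ISA_SSE2)
        isa |= ISA_SSE;
    return isa;
}

/* The manual's rule: the callee may be inlined only if its (implied)
   target options are a subset of the caller's. */
static bool can_inline(unsigned caller_isa, unsigned callee_isa)
{
    unsigned have = with_implied(caller_isa);
    unsigned need = with_implied(callee_isa);
    return (need & ~have) == 0;
}
```

Under this model an SSE2-only dispatcher cannot inline an AVX callee, which is consistent with comment 11's output, where my_bzero contains the inlined SSE2 loop but still jumps to bzero_avx.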


* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: pinskia at gcc dot gnu.org @ 2021-09-15  8:09 UTC
  To: gcc-bugs

--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect this has been fixed since maybe GCC 8 (maybe GCC 7).

* [Bug lto/54231] LTO generates code for the wrong CPU if different options used
From: rguenth at gcc dot gnu.org @ 2021-09-15  8:29 UTC
  To: gcc-bugs

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |7.2

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #15)
> I suspect this has been fixed since maybe GCC 8 (maybe GCC 7).

The use-case should now indeed work fine by means of recording all optimization
and target options per function and restricting inlining.  I think it was fixed
in GCC 7 or even earlier.
