public inbox for gcc-bugs@sourceware.org
* [Bug lto/95776] New: Reduce indirection with target_clones at link time (with LTO)
@ 2020-06-20  4:55 yyc1992 at gmail dot com
From: yyc1992 at gmail dot com @ 2020-06-20  4:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95776

            Bug ID: 95776
           Summary: Reduce indirection with target_clones at link time
                    (with LTO)
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Currently, if a function is not visible outside the final library (static,
internal, or hidden visibility), calls to it do not go through the PLT. They are
replaced with direct calls to the function.

With target_clones, the same is possible within a single compilation unit when
the callee is static. A caller that has the same cloning attribute simply calls
the corresponding clone directly, without indirection.

However, this stops working when the two are combined. Even with every option
and attribute that should help (hidden visibility, same compilation unit,
-Wl,-Bsymbolic, LTO), the call to the cloned function from a caller with a
matching cloning attribute still goes through the PLT.

Test code

```c
__attribute__((noinline,visibility("hidden"))) int f1(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline,visibility("hidden"),target_clones("default,avx2")))
int f2(int *p)
{
    asm volatile ("" :: "r"(p) : "memory");
    return *p;
}

__attribute__((noinline)) int g1(int *p)
{
    return f1(p);
}

__attribute__((noinline,target_clones("default,avx2"))) int g2(int *p)
{
    return f2(p);
}
```

Compiled with `-fPIC -flto -O3 -Wl,-Bsymbolic -shared`, `g1` calls `f1`
directly, whereas both clones of `g2` call `f2@plt`.

The same applies to inlining: target_clones prevents inlining even with LTO
enabled.

I assume this happens because the optimization can only be done at link time.
Either the link-time stage is not given enough information to determine that it
is safe, or it simply has not been implemented. Because it already works within
a single compilation unit, I assume it should be possible here as well.
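For comparison, here is a minimal sketch of the single-compilation-unit case that the report says already works. It uses a `static` callee with `target_clones`, called from a caller with the same clone list. The names `f3` and `g3` and the `CLONES` macro are hypothetical. The sketch assumes GCC on x86-64 with glibc (ifunc support). On other targets the macro expands to nothing, so the file still builds, just without clones.

```c
/* <stdio.h> is included so that glibc's __GLIBC__ macro is visible below. */
#include <stdio.h>

/* Assumption: GCC on x86-64 with glibc, as in the report. Elsewhere the
   hypothetical CLONES macro expands to nothing and no clones are made. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__GLIBC__)
#define CLONES __attribute__((target_clones("default,avx2")))
#else
#define CLONES
#endif

/* Static callee: not visible outside this translation unit. */
static __attribute__((noinline)) CLONES int f3(int *p)
{
    /* Same opaque barrier as f1/f2 in the test code. */
    __asm__ __volatile__ ("" :: "r"(p) : "memory");
    return *p;
}

/* Caller with the same clone list. Per the report, each clone of g3 is
   expected to call the corresponding clone of f3 without indirection. */
__attribute__((noinline)) CLONES int g3(int *p)
{
    return f3(p);
}
```

You can build this like the test code (for example with `-fPIC -O3 -shared`) and then compare the disassembly of the `g3` clones with that of the `g2` clones. Per the report, only the latter should go through the PLT.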
