public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
@ 2015-05-06  7:09 ` rahul.gundecha at gmail dot com
  2015-05-06 10:59 ` gcc at mattwhitlock dot name
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: rahul.gundecha at gmail dot com @ 2015-05-06  7:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Rahul <rahul.gundecha at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rahul.gundecha at gmail dot com

--- Comment #9 from Rahul <rahul.gundecha at gmail dot com> ---
I am also experiencing the same issue. Is there any solution for it?



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
  2015-05-06  7:09 ` [Bug middle-end/192] String literals don't obey -fdata-sections rahul.gundecha at gmail dot com
@ 2015-05-06 10:59 ` gcc at mattwhitlock dot name
  2015-05-06 16:07 ` gcc at mattwhitlock dot name
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: gcc at mattwhitlock dot name @ 2015-05-06 10:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #10 from Matt Whitlock <gcc at mattwhitlock dot name> ---
(In reply to Rahul from comment #9)
> I am also experiencing the same issue. Is there any solution for it?

You can wrap a preprocessor macro around string literals that you want to
subject to the linker's garbage collection:

  #include <stdio.h>

  #define GCSTR(str) ({ static const char __str[] = str; __str; })

  void hello() {
      puts(GCSTR("111")); // NOT in .rodata
      puts("222");        //     in .rodata
  }

  int main() {
      puts(GCSTR("333")); //     in .rodata
      puts("444");        //     in .rodata
      return 0;
  }

$ gcc -ffunction-sections -fdata-sections -Wl,--gc-sections -o gcstr gcstr.c

$ objdump -s -j .rodata gcstr

  gcstr:     file format elf64-x86-64

  Contents of section .rodata:
   4005fd 32323200 34343400 33333300           222.444.333.    

The downside of this strategy, however, is that these strings then become
ineligible for merging, so if you have multiple *reachable* occurrences of the
same GCSTR in your code, then you'll have multiple copies of the string data in
the .rodata section of your linked binary.

These redundant copies would not be present if the compiler were correctly
outputting literal-initialized constant character arrays to sections with the
"merge" and "strings" flags set (which it should do only if
-fmerge-all-constants is set). You can simulate how this could/should work by
editing the compiler's assembly output so that it sets the section flags
appropriately.

Given this program, gcstr.c:

  #include <stdio.h>

  #define GCSTR(str) ({ static const char __str[] = str; __str; })

  int main() {
      puts(GCSTR("111"));
      puts(GCSTR("111"));
      puts("111");
      return 0;
  }

Compile (but do not assemble) the program:

$ gcc -S -ffunction-sections -fdata-sections -fmerge-all-constants \
      -o gcstr.s gcstr.c

Edit the assembly code so that all .rodata.__str.* sections are declared with
the "merge" and "strings" flags and an entity size of 1:

$ sed -e 's/\(\.section\t\.rodata\.__str\..*\),"a",\(@progbits\)$/\1,"aMS",\2,1/' \
      -i gcstr.s

Now assemble and link the program:

$ gcc -Wl,--gc-sections -o gcstr gcstr.s

Dumping the .rodata section from the resulting executable reveals that the
linker did correctly perform string merging.

$ objdump -s -j .rodata gcstr

  gcstr:     file format elf64-x86-64

  Contents of section .rodata:
   40060d 31313100                             111.            

Compare the above objdump output to that which results when skipping the sed
step:

   40060d 31313100 31313100 31313100           111.111.111.    

The needed correction is that the compiler should, when -fmerge-all-constants
is set, emit literal-initialized constant character array data to a section
with flags "aMS" and entsize==sizeof(T), where T is the type of characters in
the array.

A further correction (and really the main request in this bug report) would be
for the compiler to emit string literals to discrete sections when
-fdata-sections is set.



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
  2015-05-06  7:09 ` [Bug middle-end/192] String literals don't obey -fdata-sections rahul.gundecha at gmail dot com
  2015-05-06 10:59 ` gcc at mattwhitlock dot name
@ 2015-05-06 16:07 ` gcc at mattwhitlock dot name
  2015-05-06 16:41 ` hjl.tools at gmail dot com
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: gcc at mattwhitlock dot name @ 2015-05-06 16:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #11 from Matt Whitlock <gcc at mattwhitlock dot name> ---
Created attachment 35479
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35479&action=edit
put string literals into unique sections when -fmerge-constants -fdata-sections

This patch puts each string literal into a (probably) unique section when
compiling with -fmerge-constants -fdata-sections. The section name is
constructed from the character width and string alignment (as before) plus a
32-bit hash of the string contents.

Consider the following program:

  #include <stdio.h>

  void used() {
      puts("keep me");
      puts("common");
      puts("string");
      puts("tail");
  }

  void not_used() {
      puts("toss me");
      puts("common");
      puts("ring");
      puts("entail");
  }

  int main() {
      used();
      return 0;
  }

$ gcc -ffunction-sections -fdata-sections -fmerge-constants \
      -Wl,--gc-sections -o test test.c

Compiling with an unpatched GCC produces a binary whose .rodata contains:

   40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
   40062d 7472696e 6700746f 7373206d 6500656e  tring.toss me.en
   40063d 7461696c 00                          tail.           

Compiling with a patched GCC produces a binary whose .rodata contains:

   40061d 6b656570 206d6500 636f6d6d 6f6e0073  keep me.common.s
   40062d 7472696e 67007461 696c00             tring.tail.



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-05-06 16:07 ` gcc at mattwhitlock dot name
@ 2015-05-06 16:41 ` hjl.tools at gmail dot com
  2015-05-06 23:26 ` gcc at mattwhitlock dot name
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: hjl.tools at gmail dot com @ 2015-05-06 16:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #12 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Matt Whitlock from comment #11)
> Created attachment 35479 [details]
> put string literals into unique sections when -fmerge-constants
> -fdata-sections
> 
> This patch puts each string literal into a (probably) unique section when
> compiling with -fmerge-constants -fdata-sections. The section name is
> constructed from the character width and string alignment (as before) plus a
> 32-bit hash of the string contents.

Would it be better to use an MD5 checksum of the string contents?



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-05-06 16:41 ` hjl.tools at gmail dot com
@ 2015-05-06 23:26 ` gcc at mattwhitlock dot name
  2015-05-07  7:19 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: gcc at mattwhitlock dot name @ 2015-05-06 23:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #13 from Matt Whitlock <gcc at mattwhitlock dot name> ---
(In reply to H.J. Lu from comment #12)
> Would it be better to use an MD5 checksum of the string contents?

MD5 would be slower for not much gain in uniqueness (assuming its output is
truncated to 32 bits). This application doesn't require a cryptographically
strong hash function, as the consequence of a collision is merely that a string
gets included in the binary when maybe it didn't need to be.

Actually, I would favor replacing the very old (1996) Lookup2 hash function
(implemented in libiberty/hashtab.c) with a more modern hash function, such as
MurmurHash3, CityHash, or even Lookup3, all of which are faster than Lookup2.

I would hesitate to use more than 32 bits, as the section names would start
getting rather long.



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2015-05-06 23:26 ` gcc at mattwhitlock dot name
@ 2015-05-07  7:19 ` jakub at gcc dot gnu.org
  2015-05-07 15:51 ` segher at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: jakub at gcc dot gnu.org @ 2015-05-07  7:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
This doesn't really look like a good idea to me.  Instead, perhaps ld's
--gc-sections or a new special option should just remove unused string literals
from mergeable sections.
With your patch, I bet you lose e.g. all tail merging.  Consider:
const char *used1 () { return "foo bar baz blah blah"; }
in one TU and
const char *used2 () { return "bar baz blah blah"; }
in another.  The linker necessarily knows which strings (or other data) in
mergeable sections are used and which are unused.



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2015-05-07  7:19 ` jakub at gcc dot gnu.org
@ 2015-05-07 15:51 ` segher at gcc dot gnu.org
  2015-05-08  2:05 ` gcc at mattwhitlock dot name
  2020-09-15 22:15 ` i at maskray dot me
  8 siblings, 0 replies; 13+ messages in thread
From: segher at gcc dot gnu.org @ 2015-05-07 15:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #16 from Segher Boessenkool <segher at gcc dot gnu.org> ---
Author: segher
Date: Thu May  7 15:51:01 2015
New Revision: 222880

URL: https://gcc.gnu.org/viewcvs?rev=222880&root=gcc&view=rev
Log:
        PR middle-end/192
        PR middle-end/54303
        * varasm.c (function_mergeable_rodata_prefix): New function.
        (mergeable_string_section): Use it.
        (mergeable_constant_section): Use it.

gcc/testsuite/
        * gcc.dg/fdata-sections-2.c: New file.

Added:
    trunk/gcc/testsuite/gcc.dg/fdata-sections-2.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/varasm.c



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2015-05-07 15:51 ` segher at gcc dot gnu.org
@ 2015-05-08  2:05 ` gcc at mattwhitlock dot name
  2020-09-15 22:15 ` i at maskray dot me
  8 siblings, 0 replies; 13+ messages in thread
From: gcc at mattwhitlock dot name @ 2015-05-08  2:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

--- Comment #17 from Matt Whitlock <gcc at mattwhitlock dot name> ---
(In reply to Segher Boessenkool from comment #16)

Thanks for the fix, Segher. Your patch seems more "right" than mine, although I
will point out that it doesn't precisely address this bug report, as it places
string literal data into unique sections only if -ffunction-sections is set,
whereas -fdata-sections has no impact. I can see arguments both ways, and
personally this distinction is irrelevant to me, as I always use both
-ffunction-sections and -fdata-sections, but the new behavior does seem
somewhat counter-intuitive to me.

Anyway, I tested your new patch (backported to GCC 4.9.2) with the use cases in
Comment 11 and Comment 15, and both produced the desired results (after I added
-ffunction-sections to the command lines in Comment 15). So I'm appeased.



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2015-05-08  2:05 ` gcc at mattwhitlock dot name
@ 2020-09-15 22:15 ` i at maskray dot me
  8 siblings, 0 replies; 13+ messages in thread
From: i at maskray dot me @ 2020-09-15 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192

Fangrui Song <i at maskray dot me> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |i at maskray dot me

--- Comment #19 from Fangrui Song <i at maskray dot me> ---
(In reply to Jakub Jelinek from comment #14)
> This doesn't really look like a good idea to me.  Instead, perhaps ld's
> --gc-sections or a new special option should just remove unused string
> literals from mergeable sections.
> With your patch, I bet you lose e.g. all tail merging.  Consider:
> const char *used1 () { return "foo bar baz blah blah"; }
> in one TU and
> const char *used2 () { return "bar baz blah blah"; }
> in another.  The linker necessarily knows which strings (or other data) in
> mergeable sections are used and which are unused.

I second Jakub's idea that the linker should perform the constant merge (which
is implemented in LLD): the cost of a section header (sizeof(Elf64_Shdr)=64) +
a section name (".rodata.xxx.str1.1") is quite large.

Created a GNU ld (and gold) feature request:
https://sourceware.org/bugzilla/show_bug.cgi?id=26622


* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <bug-192-865@http.gcc.gnu.org/bugzilla/>
@ 2007-04-02 20:27 ` maskva at searxhmash dot com
  0 siblings, 0 replies; 13+ messages in thread
From: maskva at searxhmash dot com @ 2007-04-02 20:27 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from maskva at searxhmash dot com  2007-04-02 21:27 -------
Created an attachment (id=13319)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13319&action=view)
ned


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <20000502194601.192.greyham@research.canon.com.au>
  2004-05-24  2:38 ` pinskia at gcc dot gnu dot org
  2004-10-13 13:40 ` pinskia at gcc dot gnu dot org
@ 2004-11-30  6:36 ` amodra at bigpond dot net dot au
  2 siblings, 0 replies; 13+ messages in thread
From: amodra at bigpond dot net dot au @ 2004-11-30  6:36 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From amodra at bigpond dot net dot au  2004-11-30 06:36 -------
This is true of other constants too.  For example, on powerpc-linux, compiling
the testcase in pr9571:
gcc -O2 -m32 -fdata-sections -fno-merge-constants -S /src/tmp/pr9571.c
gives:
        .file   "pr9571.c"
        .globl d
        .section        .sdata.d,"a",@progbits
        .align 3
        .type   d, @object
        .size   d, 8
d:
        .long   1074339512
        .long   1374389535
        .section        .rodata
        .align 3
.LC0:
        .long   1074339512
        .long   1374389535
        .section        ".text"
        .align 2
        .p2align 4,,15
        .globl f
        .type   f, @function
f:
        lis 9,.LC0@ha
        lfd 1,.LC0@l(9)
        blr
        .size   f,.-f
        .ident  "GCC: (GNU) 4.0.0 20041129 (experimental)"
        .section        .note.GNU-stack,"",@progbits

The duplication of the constant isn't ideal either.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <20000502194601.192.greyham@research.canon.com.au>
  2004-05-24  2:38 ` pinskia at gcc dot gnu dot org
@ 2004-10-13 13:40 ` pinskia at gcc dot gnu dot org
  2004-11-30  6:36 ` amodra at bigpond dot net dot au
  2 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-13 13:40 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |NEW


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192



* [Bug middle-end/192] String literals don't obey -fdata-sections
       [not found] <20000502194601.192.greyham@research.canon.com.au>
@ 2004-05-24  2:38 ` pinskia at gcc dot gnu dot org
  2004-10-13 13:40 ` pinskia at gcc dot gnu dot org
  2004-11-30  6:36 ` amodra at bigpond dot net dot au
  2 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-05-24  2:38 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.5.0                       |---


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=192



end of thread, other threads:[~2020-09-15 22:15 UTC | newest]

