public inbox for gcc-bugs@sourceware.org
* [Bug c/95052] New: Excess padding of partially initialized strings/char arrays
@ 2020-05-11 10:50 krzysztof.a.nowicki+gcc at gmail dot com
  2020-05-11 11:32 ` [Bug c/95052] " rguenth at gcc dot gnu.org
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: krzysztof.a.nowicki+gcc at gmail dot com @ 2020-05-11 10:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95052

            Bug ID: 95052
           Summary: Excess padding of partially initialized strings/char
                    arrays
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: krzysztof.a.nowicki+gcc at gmail dot com
  Target Milestone: ---

Created attachment 48506
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48506&action=edit
generated assembly (GCC 11.0 trunk, -Os -g0)

When compiling the following code with -Os:

  extern void func(char *buf, unsigned size);
  int main(int argc, char *argv[])
  {
    char str[1*1024*1024] =
      "fooiuhluhpiuhliuhliyfyukyfklyugkiuhpoipoipoipoipoipoipoipoipoipoipoipoipoimipoipiuhoulouihnliuhl";
    char arr[1*1024*1024] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7,
      8, 9, 0, 1, 6, 2, 3, 4, 5, 6, 7, 8, 9, 0, 3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1,
      2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
    func(str, sizeof(str));
    func(arr, sizeof(arr));
  }

GCC generates initializers for both local variables (str and arr) in the
.rodata section and at run-time initializes the explicit part of the variable
with the provided contents, and zero-inits the remainder.

Unfortunately the initializer stored in the .rodata section is padded up to the
target array size:

.LC0:
        .string "fooiuhluhpiuhliuhliyfyukyfklyugkiuhpoipoipoipoipoipoipoipoipoipoipoipoipoimipoipiuhoulouihnliuhl"
        .zero   1048479
.LC1:
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\006\002\003\004\005\006\007\b\t"
        .string "\003\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\006\002\003\004\005\006\007\b\t"
        .string "\003\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005\006\007\b\t"
        .string "\001\002\003\004\005"
        .zero   1048466

This causes the resulting binary to become unnecessarily large, even though the
zero padding is completely redundant (the run-time initializer code does not
copy these bytes to the target variable, but zero-initializes them).
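
To make the sizes concrete: each .zero operand in the listing is the array
size minus the bytes emitted explicitly before it. The following is only a
small sketch of that arithmetic; the helper name zero_padding is hypothetical
and not part of GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of padding bytes that follow the explicit
   part of an initializer when it is padded up to the full array size. */
static size_t zero_padding(size_t array_size, size_t explicit_bytes)
{
    /* The explicit part can never be larger than the array itself. */
    assert(explicit_bytes <= array_size);
    return array_size - explicit_bytes;
}
```

For a 1 MiB array (1048576 bytes), the 110 explicit bytes of .LC1 (counting
the NUL that terminates each .string) give 1048466, and the 97 bytes implied
for .LC0 give 1048479. Both match the listing, so almost 2 MiB of the binary
is zero padding.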

I suspect that this is caused by GCC not being able to distinguish between:
 - initialization of a global (or static local) variable,
 - initialization of a local variable

In the former case the contents of the variable live in the read/write data
section and are initialized by the compiler. In that case padding is necessary,
as any further changes to the variable will be done in-place.

In the latter case the contents of the variable live on the stack and are
initialized from a read-only copy in the read-only data section. In that case
only the non-zero explicitly initialized part needs to be stored; any padding
can be skipped as it will not be used.
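
A hand-written equivalent of the latter case may make the point clearer. This
is only a sketch of the behaviour described above, not the code GCC emits; the
names explicit_init and init_local are hypothetical, and a short string stands
in for the long initializer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the read-only initializer: only the explicit
   bytes (including the terminating NUL) are stored, with no padding. */
static const char explicit_init[] = "foo";

/* Initialize a buffer of `size` bytes the way the run-time code does:
   copy the explicit part, then zero-initialize the remainder. */
static void init_local(char *buf, size_t size)
{
    assert(size >= sizeof explicit_init);
    memcpy(buf, explicit_init, sizeof explicit_init);
    memset(buf + sizeof explicit_init, 0, size - sizeof explicit_init);
}
```

Since the remainder is written by the memset, the read-only copy never needs
more than sizeof explicit_init bytes, whatever the size of the target array.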

Whether this mis-optimization occurs depends on the compiler flags, the
architecture, and the sizes of the array and of its initialized part. GCC may
choose (and usually does) to initialize the variable with store instructions
that take immediate values instead, as this method is usually faster at the
cost of increased code size.


end of thread, 2023-07-07  8:52 UTC

Thread overview: 21+ messages
2020-05-11 10:50 [Bug c/95052] New: Excess padding of partially initialized strings/char arrays krzysztof.a.nowicki+gcc at gmail dot com
2020-05-11 11:32 ` [Bug c/95052] " rguenth at gcc dot gnu.org
2020-05-11 13:07 ` krzysztof.a.nowicki+gcc at gmail dot com
2020-05-11 13:18 ` krzysztof.a.nowicki+gcc at gmail dot com
2020-05-11 14:41 ` [Bug target/95052] " msebor at gcc dot gnu.org
2020-05-11 14:51 ` krzysztof.a.nowicki+gcc at gmail dot com
2020-05-11 19:06 ` [Bug middle-end/95052] " jakub at gcc dot gnu.org
2020-05-13  5:05 ` krzysztof.a.nowicki+gcc at gmail dot com
2020-05-13 12:33 ` [Bug middle-end/95052] [9/10/11 Regression] " pinskia at gcc dot gnu.org
2020-05-29  8:43 ` cvs-commit at gcc dot gnu.org
2020-05-29 13:27 ` mkuvyrkov at gcc dot gnu.org
2020-05-29 14:22 ` mkuvyrkov at gcc dot gnu.org
2020-05-29 14:26 ` mkuvyrkov at gcc dot gnu.org
2020-05-29 19:37 ` jakub at gcc dot gnu.org
2020-05-31  9:55 ` cvs-commit at gcc dot gnu.org
2020-08-03 11:10 ` ptsneves at gmail dot com
2021-01-14  8:46 ` [Bug middle-end/95052] [9/10 " rguenth at gcc dot gnu.org
2021-06-01  8:17 ` rguenth at gcc dot gnu.org
2022-05-27  9:42 ` [Bug middle-end/95052] [10 " rguenth at gcc dot gnu.org
2022-06-28 10:40 ` jakub at gcc dot gnu.org
2023-07-07  8:52 ` rguenth at gcc dot gnu.org
