public inbox for gcc@gcc.gnu.org
* [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
@ 2019-12-01 18:47 JeanHeyd Meneide
  2019-12-04 10:48 ` Richard Biener
  0 siblings, 1 reply; 3+ messages in thread
From: JeanHeyd Meneide @ 2019-12-01 18:47 UTC (permalink / raw)
  To: GCC Development

Dear GCC Community,

     I have a bit of a question. I recently fixed up and deployed 2
separate implementations of a paper I am working on for C and C++
Standardization called #embed (found here -
https://thephd.github.io/vendor/future_cxx/papers/source/C%20-%20embed
).

    I was trying to play with some of the semantics of implementing
this in GCC, beyond just doing plain expansion into a brace-init-list
of numbers. I noticed that large payloads (firmware binary images
used in resets, lossless music data, generated config data, etc.)
take quite a bit more memory and time to initialize and store in GCC;
Clang has much the same problem too, using ~2 GiB of memory to handle a
20 MiB array.
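
Concretely, the "plain expansion" baseline is a brace-init-list where
every byte of the file becomes its own integer token; a small sketch
(the array name and contents here are illustrative):

```c
/* What the byte-by-byte expansion looks like: each element below is
   parsed, type-checked, and stored as a separate tree node, which is
   where the compile time and memory go. */
static const unsigned char payload[] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a
    /* ...a real firmware image continues for millions of bytes... */
};
static const unsigned int payload_len = sizeof payload;
```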

    My current working knowledge is that it takes a very long time to
parse, verify, and create a VECTOR_CST to use for array
initialization. I am wondering if it might be a good idea for
optimization purposes to introduce a new way of representing a blob of
binary data.

     I was able to get extremely fast speeds in a test implementation
by making a special builtin called __builtin_array with keyword tag
RID_BUILTIN_ARRAY. The preprocessor directive would expand to
__builtin_array and one of its arguments was a string literal encoded
as base64 representing the original resource's binary data (so it
could survive re-preprocessing the file untouched). I would decode the
string literal and then build a static array of tree type STRING_CST
with the actual data.
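
The decode step can be sketched in plain C; this is only an
illustration of the idea, not the actual patch (the function names and
the exact handling of padding are assumptions):

```c
#include <stddef.h>

/* Map one base64 character to its 6-bit value, or -1 for '=' padding
   or any invalid character. */
static int b64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode n base64 characters from src into out (no whitespace
   allowed); returns the number of raw bytes produced. */
static size_t b64_decode(const char *src, size_t n, unsigned char *out) {
    size_t written = 0;
    unsigned buf = 0;
    int bits = 0;
    for (size_t i = 0; i < n; ++i) {
        int v = b64_value(src[i]);
        if (v < 0)
            break; /* '=' padding ends the data */
        buf = (buf << 6) | (unsigned)v;
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out[written++] = (unsigned char)(buf >> bits);
        }
    }
    return written;
}
```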

     It worked, but this approach required removing some type checks
in digest_init just to be able to fake up a proper initialization from
a string literal. It also could not initialize data beyond `unsigned
char`, as that is what I had pinned the array representation to upon
creation of the STRING_CST.

     I am new to this, and not really a compiler developer, so I was
wondering if anyone had any tips, tricks, or other places I should look
to learn something about better ways to do this or faster ways to
handle what are essentially large binary payloads! I appreciate any
feedback or thoughts tremendously.

Sincerely Yours,
JeanHeyd Meneide

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
  2019-12-01 18:47 [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling JeanHeyd Meneide
@ 2019-12-04 10:48 ` Richard Biener
  2019-12-09  3:42   ` JeanHeyd Meneide
  0 siblings, 1 reply; 3+ messages in thread
From: Richard Biener @ 2019-12-04 10:48 UTC (permalink / raw)
  To: JeanHeyd Meneide; +Cc: GCC Development

On Sun, Dec 1, 2019 at 7:47 PM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
>
> Dear GCC Community,
>
>      I have a bit of a question. I recently fixed up and deployed 2
> separate implementations of a paper I am working on for C and C++
> Standardization called #embed (found here -
> https://thephd.github.io/vendor/future_cxx/papers/source/C%20-%20embed
> ).
>
>     I was trying to play with some of the semantics of implementing
> this in GCC, beyond just doing plain expansion into a brace-init-list
> of numbers. I noticed that large payloads (firmware binary images
> used in resets, lossless music data, generated config data, etc.)
> take quite a bit more memory and time to initialize and store in GCC;
> Clang has much the same problem too, using ~2 GiB of memory to handle a
> 20 MiB array.
>
>     My current working knowledge is that it takes a very long time to
> parse, verify, and create a VECTOR_CST to use for array
> initialization. I am wondering if it might be a good idea for
> optimization purposes to introduce a new way of representing a blob of
> binary data.
>
>      I was able to get extremely fast speeds in a test implementation
> by making a special builtin called __builtin_array with keyword tag
> RID_BUILTIN_ARRAY. The preprocessor directive would expand to
> __builtin_array and one of its arguments was a string literal encoded
> as base64 representing the original resource's binary data (so it
> could survive re-preprocessing the file untouched). I would decode the
> string literal and then build a static array of tree type STRING_CST
> with the actual data.
>
>      It worked, but this approach required removing some type checks
> in digest_init just to be able to fake up a proper initialization from
> a string literal. It also could not initialize data beyond `unsigned
> char`, as that is what I had pinned the array representation to upon
> creation of the STRING_CST.

Using a STRING_CST is an interesting idea and probably works well
for most data.

>      I am new to this, and not really a compiler developer, so I was
> wondering if anyone had any tips, tricks, or other places I should look
> to learn something about better ways to do this or faster ways to
> handle what are essentially large binary payloads! I appreciate any
> feedback or thoughts tremendously.

I think most constraints come from designated initializer processing
and handling of symbolic constants that eventually generate relocations.
I once played with patches in this area, noting that we "waste"
one INTEGER_CST counting the element number for each actual
data element.  For contiguous elements this could be elided.

Note we also have "special" CONSTRUCTOR fields like
RANGE_EXPR for repetitive data.
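
At the source level the same idea shows up as GCC's range-designator
extension, where one entry stands for a whole run of identical
elements instead of one node per element; a tiny sketch:

```c
/* GNU C extension: [low ... high] designators.  Internally a run like
   this can be represented with a single RANGE_EXPR CONSTRUCTOR entry
   rather than 254 separate (index, value) pairs. */
static const unsigned char filler[256] = {
    [0]         = 0x7f,
    [1 ... 254] = 0xff,
    [255]       = 0x00,
};
```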

Since the large initializers are usually in static initializers
tied to variables another option is to replace the DECL_INITIAL
CONSTRUCTOR tree node with a new BINARY_BLOB
tree node containing a pointer to target encoded (compressed)
data.

Then the fastest alternative is to do as GCC did way back in the
past, emit the initializer to final assembly immediately and not
store it in memory (works only when designated initializer processing
isn't needed).  That removes the possibility of eliding the initializer
by constant folding or dead code removal of course.

For the programmer I'd advise to use separate objects for such data,
possibly compiling it with -O0.  With -O0 we could default to emitting
the data immediately then...
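
That separate-object advice looks like this in practice (the file
names and symbols here are only illustrative; the two pieces would
normally live in payload.h and payload.c):

```c
/* payload.h -- small interface the rest of the program includes. */
extern const unsigned char payload_data[];
extern const unsigned long payload_size;

/* payload.c -- holds only the blob; compile separately, e.g.
   `gcc -O0 -c payload.c`, so the optimizers never chew on it. */
const unsigned char payload_data[] = { 0xde, 0xad, 0xbe, 0xef };
const unsigned long payload_size = sizeof payload_data;
```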

Richard.


> Sincerely Yours,
> JeanHeyd Meneide

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [ C ] [ C++ ] Efficient Array Construction / Binary Payload Handling
  2019-12-04 10:48 ` Richard Biener
@ 2019-12-09  3:42   ` JeanHeyd Meneide
  0 siblings, 0 replies; 3+ messages in thread
From: JeanHeyd Meneide @ 2019-12-09  3:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Development

Dear Richard Biener,

On Wed, Dec 4, 2019 at 5:48 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Sun, Dec 1, 2019 at 7:47 PM JeanHeyd Meneide <phdofthehouse@gmail.com> wrote:
> >
> > ...
> >      It worked, but this approach required removing some type checks
> > in digest_init just to be able to fake up a proper initialization from
> > a string literal. It also could not initialize data beyond `unsigned
> > char`, as that is what I had pinned the array representation to upon
> > creation of the STRING_CST.
>
> Using a STRING_CST is an interesting idea and probably works well
> for most data.
>
> ...
>
> Note we also have "special" CONSTRUCTOR fields like
> RANGE_EXPR for repetitive data.
>
> Since the large initializers are usually in static initializers
> tied to variables another option is to replace the DECL_INITIAL
> CONSTRUCTOR tree node with a new BINARY_BLOB
> tree node containing a pointer to target encoded (compressed)
> data.

Thank you so much for your feedback! Your ideas really helped me out
here. I'm using a RANGE_EXPR with an INDEX of 2 operands that are the
min and max of the array, and a VALUE that is the binary data to pull
from. I coded special handling in digest_init for the C front end;
I'll likely have to add some additional magic for the C++
initialization rules too. Some preliminary testing with large binary
files went like so:

- 50 MB binary file, huge.bin
- xxd generated include file, huge.bin.h (N.B. took 302 MB)
- compile a file with no library dependencies, using the #embed
directive or just relying on the xxd file
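
For reference, the xxd-generated header is nothing but a giant
brace-init-list; its shape looks like this (contents abbreviated,
symbol names follow xxd -i's convention):

```c
/* Shape of huge.bin.h as emitted by `xxd -i huge.bin`; the real file
   holds ~50 million byte tokens, which is why it weighs 302 MB. */
unsigned char huge_bin[] = {
  0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07
  /* , ...continues for the rest of the file... */
};
unsigned int huge_bin_len = 8; /* xxd writes the real byte count */
```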

     It takes 11 seconds for the #embed compilation to chew through the
file and encode it in a special way so that it can survive external
tools applied between the preprocessor and the real compilation of the
file (e.g., a distcc or icecc workflow).

     It takes 621 seconds for the #include-based, xxd-like compilation.

I could get it even faster if I didn't have to do the encode/decode
step #embed uses to carry data between when it exits the
preprocessor and when it enters the actual C/C++ front ends. I know of
an implementation that avoids it, but because #embed is not standard I
have to respect that other tools won't know how to behave in the
presence of such a special secondary implementation, so my encoded
implementation is the one that will have to stand for now.

Thank you so much,
JeanHeyd

^ permalink raw reply	[flat|nested] 3+ messages in thread
