public inbox for gcc-help@gcc.gnu.org
From: David Brown <david@westcontrol.com>
To: Kai Song <kaisong1515@gmail.com>, gcc-help@gcc.gnu.org
Subject: Re: Compilation of lengthy C++ Files
Date: Thu, 19 Oct 2023 14:47:49 +0200
Message-ID: <fad81a33-6087-6c5f-8374-53bc060c6316@westcontrol.com>
In-Reply-To: <CAE37PZpFB5ggKPnqpgCJ8zWSuxULSO5itkZ88UHJ6GOh2C=wyA@mail.gmail.com>

On 18/10/2023 18:04, Kai Song via Gcc-help wrote:
> Dear GCC Developers,
> 
> I am unsuccessfully using g++ 12.0.4 to compile lengthy c++ codes. Those
> codes are automatically generated from my own code-generator tools that
> depend on parameters p.
> Typical applications are:
> - Taylor series of order p inserted into consistency conditions of
> numerical schemes, to determine optimal method parameters (think of, e.g.,
> Runge-Kutta methods)
> - recursive automatic code transformation (think of adjoints of adjoints of
> adjoints...) of recursion level p
> - Hilbert curves or other space-filling curves to generate code that
> simulates cache utilization in a Monte-Carlo context
> 
> I verify that for small p the codes compile and execute to the expected
> result. However, there is always a threshold for p so that the generated
> cpp file is so long that the compiler will just terminate after ~10min
> without monitor output but return the value +1.
> My cpp files range from 600k LOC up to 1Bio LOC. Often, the file comprises
> of one single c++ template class member function definition that relies on
> a few thousand lines of template-classes.
> 
> I would like to know:
> 1) Am I doing something wrong in that GCC should be able to compile lengthy
> codes?
> 2) Is it known that GCC is unable to compile lengthy codes?
> 3) Is it acknowledged that a compiler's ability to compile large files is
> relevant?
> 4) Are possible roots known for this inability and are these deliberate?
> 

I am curious to know why you are generating code like this.  I can see
how some code generators for mathematical code could easily produce
large amounts of code, but that is rarely ideal for real-world use.
Such flattened code can reduce overheads and improve optimisation
opportunities (inlining, constant folding, function cloning, etc.)
for small cases.  For larger cases, however, it becomes impractical to
compile, and the cost of cache misses outweighs the overhead of the
loops or recursion needed for a general solution.
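
As a minimal sketch of that trade-off, assuming a truncated Taylor
series of exp(x) as the generated expression (the function names are
hypothetical and not taken from the original generator): the flattened
form grows with the order p, while the loop form stays the same size
for any p.

```cpp
// Flattened form, as a generator might emit it for order p = 3.
// Every increase in p adds more source text to compile.
double taylor_exp_flat_p3(double x) {
    return 1.0 + x + x * x / 2.0 + x * x * x / 6.0;
}

// General form: one small loop covers any order p.
// Each term is derived from the previous one: x^k/k! = x^(k-1)/(k-1)! * x/k.
double taylor_exp_loop(double x, unsigned p) {
    double term = 1.0;  // k = 0 term
    double sum = 1.0;
    for (unsigned k = 1; k <= p; ++k) {
        term *= x / k;
        sum += term;
    }
    return sum;
}
```

Both functions compute the same truncated series; only the loop
version keeps the instruction footprint constant as p grows.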

Any compiler is going to be targeted and tuned towards "normal" or 
"typical" code.  That means primarily hand-written code, or smaller 
generated code.  I know that some systems generate very large functions 
or large files, but those are primarily C code, and the code is often 
very simple and "flat".  (Applications here include compilers that use C
as an intermediary target language, and simulators of various kinds.)  It
typically makes sense to disable certain optimisation passes for such
code, because a number of passes scale badly (quadratically or perhaps
worse) with function size.
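
One way to experiment with this, sketched here under assumptions, is
to lower the optimisation level for only the generated function using
GCC's `#pragma GCC optimize`.  The function `generated_kernel` is a
hypothetical stand-in for a large generated body, and the GCC manual
describes per-function optimisation settings as a debugging aid, so
for real builds passing a lower `-O` level for the whole generated
file is probably the more robust route.  Compiling with
`-ftime-report` shows where the compiler spends its time.

```cpp
// GCC-specific: save the current optimisation options, then lower the
// level for the functions that follow.
#pragma GCC push_options
#pragma GCC optimize ("O1")

// Hypothetical stand-in for a very large, flat generated function.
double generated_kernel(double x) {
    double t0 = x * x;
    double t1 = t0 + x;
    double t2 = t1 * t0;
    return t2 - t1;
}

// Restore the options given on the command line for the rest of the file.
#pragma GCC pop_options
```

Other compilers may ignore these pragmas, possibly with a warning; the
code itself stays valid C++.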

However, if you are generating huge templates in C++, you are going a 
big step beyond that - templates are, in a sense, code generators 
themselves that run at compile time as an interpreted meta-language.  I 
don't expect that there has been a deliberate decision to limit GCC's 
handling of larger files, but I can't imagine that huge templates are a 
major focus for the compiler development.  And I would expect enormous 
memory use and compile times even when it does work.
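
To illustrate what "code generators that run at compile time" means, a
small sketch (the `ExpTerms` template is hypothetical, not the
poster's code): the recursion below is expanded by the compiler, which
has to instantiate one specialisation per order before any machine
code exists.  GCC also limits the instantiation depth, adjustable with
`-ftemplate-depth=`.

```cpp
// Sum of x^k / k! for k = 0..K, expanded at compile time.
// `term` carries the most recent term back up the recursion.
template <unsigned K>
struct ExpTerms {
    static double sum(double x, double& term) {
        double s = ExpTerms<K - 1>::sum(x, term);  // instantiates K-1, K-2, ...
        term *= x / K;                             // x^K/K! from x^(K-1)/(K-1)!
        return s + term;
    }
};

// Base case ends the compile-time recursion.
template <>
struct ExpTerms<0> {
    static double sum(double, double& term) {
        term = 1.0;
        return 1.0;
    }
};

// Order-3 series: forces instantiation of ExpTerms<3>, <2>, <1>, <0>.
double taylor_exp_p3(double x) {
    double term = 0.0;
    return ExpTerms<3>::sum(x, term);
}
```

With a large order, or a large generated body inside each
specialisation, the compiler must hold all of these instantiations in
memory, which is in line with the memory and compile-time costs
described above.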


