public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib
@ 2022-11-22 21:39 aleks at physik dot tu-berlin.de
  2022-11-22 21:47 ` [Bug c/107827] " pinskia at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: aleks at physik dot tu-berlin.de @ 2022-11-22 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

            Bug ID: 107827
           Summary: switch table compression could shrink large tables
                    (but missing), testcase cerf-lib
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aleks at physik dot tu-berlin.de
  Target Milestone: ---

The cerf-lib (self-contained numeric library that provides an efficient and
accurate implementation of complex error functions) contains 2 switch-blocks
having about 100 entries. Each entry represents a piece-wise
taylor-polynomial-approximation (aiming nearly machine precision, type double).
https://jugit.fz-juelich.de/mlz/libcerf/-/tree/main/lib

Compiler Explorer (selected -O3 for x86-64-gcc-12) https://godbolt.org/
shows ASM output having huge jump-tables. Even code compiles fine on my PC with
gcc11, you have to shrink to 20 cases in Compiler-Explorer to see result (some
webservice limit, but this is not the issue here). Lets focus on 2 static
functions.

1. erfcx.c / static double erfcx_y100() / case (int) 0..99 (100 cases, no gap,
each case is a taylor-polynomial of grade 6). Gives Jump-Table of 100 x 8
bytes, plus code. The table could be easy compressed or replaced by linear
formula.
a) assumning no function ever exceeded 4 GB of code, it would be possible to
use only 4 byte offset (jumping-distance), and for 99% of funtions 65 KB range
would be enough (at least in O2 is makes sence, but also in O3 it could be
faster to decrease memory/cache usage - this could become a new option to
control it).
b) for this erfcx_y100-sample it would be even possible to calc the offset by
formula.
c) of course the parameters could be a separate lookup-table, but no speedup
for other reasons (I tried these days). gcc could detect the similarity
(idential cases beside the 7 hard-coded coefficients) which would be antother
change request..

sample (I cutted the coefficients numbers for readability)
switch ((int) y100) {
  case 0: {
    double t = 2*y100 - 1;
    return 0.7 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1 * t) *t) *t) *t) *t)
*t;
  }
  case 1: {
    double t = 2*y100 - 3;
    return 0.2 + (0.7 + (0.3 + (0.1 + (0.8 + (0.3 + 0.1 *t) *t) *t) *t) *t) *t;
  }
 .. until case 99:
  // instead 8 x 100 byte table it could be 2 x 100 byte, or 10 byte linear
equation.

2. im_w_of_x.c / static double w_im_y100() / case (int) 0..96 (97 cases, no
gap, similar polynomials between 6th to 8th grade, plus another unique final
formula). Here the code is (even neglecting the coefficients) not identical,
but still similar. Address-Offset-Compression could still be used (but not
linear all the way).
d) same as a) + b)
e) side note: " 2*y100 " seems no to be detected and calculated once at begin
of switch.
f) detection of similar code (same equation, different parameters) could be
used to decrease object-size (text) as only 4 kinds of equations are here, only
offset in some created "const double parameters[]"-array differs.

PS: no code is wrong or shows different results, this is a
ro-memory-consumption and timing issue only.

Thanks for reading.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug c/107827] switch table compression could shrink large tables (but missing), testcase cerf-lib
  2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
@ 2022-11-22 21:47 ` pinskia at gcc dot gnu.org
  2022-11-22 22:01 ` [Bug target/107827] " pinskia at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-22 21:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 53950
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53950&action=edit
testcase

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/107827] switch table compression could shrink large tables (but missing), testcase cerf-lib
  2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
  2022-11-22 21:47 ` [Bug c/107827] " pinskia at gcc dot gnu.org
@ 2022-11-22 22:01 ` pinskia at gcc dot gnu.org
  2022-11-22 22:05 ` pinskia at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-22 22:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|c                           |target
             Target|                            |x86_64-*-*
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2022-11-22

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
x86_64 target does not dojump table compression at all.
But aarch64 and arm targets do:
.L4:
        .2byte  (.L103 - .Lrtx4) / 4
        .2byte  (.L102 - .Lrtx4) / 4
        .2byte  (.L101 - .Lrtx4) / 4
        .2byte  (.L100 - .Lrtx4) / 4
        .2byte  (.L99 - .Lrtx4) / 4
        .2byte  (.L98 - .Lrtx4) / 4
        .2byte  (.L97 - .Lrtx4) / 4
        .2byte  (.L96 - .Lrtx4) / 4
        .2byte  (.L95 - .Lrtx4) / 4


Confirmed. Note the /4 can't be done for x86 but the rest I think can be done.
The /4 can't be done because some instructions one byte in length on x86_64.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/107827] switch table compression could shrink large tables (but missing), testcase cerf-lib
  2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
  2022-11-22 21:47 ` [Bug c/107827] " pinskia at gcc dot gnu.org
  2022-11-22 22:01 ` [Bug target/107827] " pinskia at gcc dot gnu.org
@ 2022-11-22 22:05 ` pinskia at gcc dot gnu.org
  2022-11-22 22:08 ` aleks at physik dot tu-berlin.de
  2022-11-22 22:13 ` aleks at physik dot tu-berlin.de
  4 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-11-22 22:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Interesting how clang does not do it either for x86_64 but does it also for
aarch64 ...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/107827] switch table compression could shrink large tables (but missing), testcase cerf-lib
  2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
                   ` (2 preceding siblings ...)
  2022-11-22 22:05 ` pinskia at gcc dot gnu.org
@ 2022-11-22 22:08 ` aleks at physik dot tu-berlin.de
  2022-11-22 22:13 ` aleks at physik dot tu-berlin.de
  4 siblings, 0 replies; 6+ messages in thread
From: aleks at physik dot tu-berlin.de @ 2022-11-22 22:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

--- Comment #4 from Alexander Kleinsorge <aleks at physik dot tu-berlin.de> ---
(In reply to Andrew Pinski from comment #2)
> x86_64 target does not dojump table compression at all.
> But aarch64 and arm targets do:
> .L4:
>         .2byte  (.L103 - .Lrtx4) / 4
>         .2byte  (.L102 - .Lrtx4) / 4
>         .2byte  (.L101 - .Lrtx4) / 4
>         .2byte  (.L100 - .Lrtx4) / 4
>         .2byte  (.L99 - .Lrtx4) / 4
>         .2byte  (.L98 - .Lrtx4) / 4
>         .2byte  (.L97 - .Lrtx4) / 4
>         .2byte  (.L96 - .Lrtx4) / 4
>         .2byte  (.L95 - .Lrtx4) / 4
> 
> 
> Confirmed. Note the /4 can't be done for x86 but the rest I think can be
> done. The /4 can't be done because some instructions one byte in length on
> x86_64.

I think by inserting 1 byte NOPs should be possible to get jump-point-alignment
https://www.felixcloutier.com/x86/nop
So the other 1 byte ops could be filled up for alignment.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/107827] switch table compression could shrink large tables (but missing), testcase cerf-lib
  2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
                   ` (3 preceding siblings ...)
  2022-11-22 22:08 ` aleks at physik dot tu-berlin.de
@ 2022-11-22 22:13 ` aleks at physik dot tu-berlin.de
  4 siblings, 0 replies; 6+ messages in thread
From: aleks at physik dot tu-berlin.de @ 2022-11-22 22:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107827

--- Comment #5 from Alexander Kleinsorge <aleks at physik dot tu-berlin.de> ---
And what about linear equation for same-sized cases (each case has same
code-size). Which should be possible in Case 1 (I think).

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-11-22 22:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22 21:39 [Bug c/107827] New: switch table compression could shrink large tables (but missing), testcase cerf-lib aleks at physik dot tu-berlin.de
2022-11-22 21:47 ` [Bug c/107827] " pinskia at gcc dot gnu.org
2022-11-22 22:01 ` [Bug target/107827] " pinskia at gcc dot gnu.org
2022-11-22 22:05 ` pinskia at gcc dot gnu.org
2022-11-22 22:08 ` aleks at physik dot tu-berlin.de
2022-11-22 22:13 ` aleks at physik dot tu-berlin.de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).