* slowdown with -std=gnu18 with respect to -std=c99
From: Paul Zimmermann @ 2022-05-03  8:28 UTC (permalink / raw)
To: gcc-help; +Cc: sibid, stephane.glondu

Hi,

I observe a slowdown of some code compiled with gcc when I use -std=gnu18
instead of -std=c99. My computer is an i5-4590, and I use gcc version
11.3.0 (Debian 11.3.0-1).

To reproduce:

$ git clone https://gitlab.inria.fr/core-math/core-math.git
$ cd core-math
$ CORE_MATH_PERF_MODE=rdtsc CFLAGS="-O3 -march=native -ffinite-math-only -std=gnu18" ./perf.sh exp10f
GNU libc version: 2.33
GNU libc release: release
31.746 11.780
$ CORE_MATH_PERF_MODE=rdtsc CFLAGS="-O3 -march=native -ffinite-math-only -std=c99" ./perf.sh exp10f
GNU libc version: 2.33
GNU libc release: release
21.514 11.751

The difference is seen between the first figures in each run (31.746 and
21.514), which indicate the average number of cycles of the exp10f
function from the core-math library.

The code is very simple (a few dozen lines):

https://gitlab.inria.fr/core-math/core-math/-/blob/master/src/binary32/exp10/exp10f.c

Some more remarks:

* this slowdown does not happen on all machines; for example it does not
  appear on an AMD EPYC 7282 with gcc version 10.2.1 (Debian 10.2.1-6).

* this slowdown disappears when I replace __builtin_expect(ex>(127+6), 0)
  by ex>(127+6) at line 45 of the code; however, that branch is never
  taken in the above experiment.

Does anyone have a clue?

Best regards,
Paul Zimmermann

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Alexander Monakov @ 2022-05-03  9:09 UTC (permalink / raw)
To: Paul Zimmermann; +Cc: gcc-help, stephane.glondu, sibid

On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:

> Does anyone have a clue?

I can reproduce a difference, but in my case it's simply because in
-std=gnuXX mode (as opposed to -std=cXX) GCC enables FMA contraction,
enabling the last few steps in the benchmarked function to use fma
instead of separate mul/add instructions.

(regarding __builtin_expect, it also makes a small difference in my case,
it seems GCC generates some redundant code without it, but the difference
is 10x smaller than what presence/absence of FMA gives)

I think you might be able to figure it out on your end if you run both
variants under 'perf stat', note how cycle count and instruction counts
change, and then look at disassembly to see what changed. You can use
'perf record' and 'perf report' to easily see the hot code path; if you
do that, I'd recommend to run it with the same sampling period in both
cases, e.g. like this:

  perf record -e instructions:P -c 500000 ./perf ...

Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Paul Zimmermann @ 2022-05-03 11:45 UTC (permalink / raw)
To: Alexander Monakov; +Cc: gcc-help, stephane.glondu, sibid

thank you very much Alexander.

> Date: Tue, 3 May 2022 12:09:32 +0300 (MSK)
> From: Alexander Monakov <amonakov@ispras.ru>
> cc: gcc-help@gcc.gnu.org, stephane.glondu@inria.fr, sibid@uvic.ca
>
> On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:
>
> > Does anyone have a clue?
>
> I can reproduce a difference, but in my case it's simply because in -std=gnuXX
> mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few
> steps in the benchmarked function to use fma instead of separate mul/add
> instructions.

but then you should get better (i.e. smaller) timings with -std=gnuXX than
with -std=cXX, instead of worse timings as we get?

> (regarding __builtin_expect, it also makes a small difference in my case,
> it seems GCC generates some redundant code without it, but the difference is
> 10x smaller than what presence/absence of FMA gives)
>
> I think you might be able to figure it out on your end if you run both variants
> under 'perf stat', note how cycle count and instruction counts change, and then
> look at disassembly to see what changed. You can use 'perf record' and 'perf
> report' to easily see the hot code path; if you do that, I'd recommend to run
> it with the same sampling period in both cases, e.g. like this:
>
>   perf record -e instructions:P -c 500000 ./perf ...

thank you, we'll investigate that.

Best regards,
Paul
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Alexander Monakov @ 2022-05-03 12:12 UTC (permalink / raw)
To: Paul Zimmermann; +Cc: gcc-help, stephane.glondu, sibid

On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:

> > I can reproduce a difference, but in my case it's simply because in -std=gnuXX
> > mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few
> > steps in the benchmarked function to use fma instead of separate mul/add
> > instructions.
>
> but then you should get better (i.e. smaller) timings with -std=gnuXX than
> with -std=cXX, instead of worse timings as we get?

Right, for me -std=gnuXX is faster. But for you it's slower by almost
1.5x; that's quite a lot and should be easy to spot in a 'perf report'
profile.

> >   perf record -e instructions:P -c 500000 ./perf ...
>
> thank you, we'll investigate that.

Good luck! I'm curious what you'll find, please let me know.

Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Stéphane Glondu @ 2022-05-05  8:57 UTC (permalink / raw)
To: Alexander Monakov, gcc-help; +Cc: sibid, Paul Zimmermann

Le 03/05/2022 à 11:09, Alexander Monakov a écrit :
>> Does anyone have a clue?
>
> I can reproduce a difference, but in my case it's simply because in -std=gnuXX
> mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few
> steps in the benchmarked function to use fma instead of separate mul/add
> instructions.
>
> (regarding __builtin_expect, it also makes a small difference in my case,
> it seems GCC generates some redundant code without it, but the difference is
> 10x smaller than what presence/absence of FMA gives)
>
> I think you might be able to figure it out on your end if you run both variants
> under 'perf stat', note how cycle count and instruction counts change, and then
> look at disassembly to see what changed. You can use 'perf record' and 'perf
> report' to easily see the hot code path; if you do that, I'd recommend to run
> it with the same sampling period in both cases, e.g. like this:
>
>   perf record -e instructions:P -c 500000 ./perf ...

I did that.

The hot code path corresponds to (from exp10f.c):

  double a = iln2h*z, ia = __builtin_floor(a), h = (a - ia) + iln2l*z;
  long i = ia, j = i&0xf, e = i - j;
  e >>= 4;
  double s = tb[j];
  b64u64_u su = {.u = (e + 0x3fful)<<52};
  s *= su.f;
  double h2 = h*h;
  double c0 = c[0] + h*c[1];
  double c2 = c[2] + h*c[3];
  double c4 = c[4] + h*c[5];
  c0 += h2*(c2 + h2*c4);
  double w = s*h;
  return s + w*c0;

With -std=c99, where the overall performance is 22 cycles, I get:

   4,03 │ 3a: vcvtss2sd  -0x4(%rsp),%xmm0,%xmm0
   0,01 │     vmulsd     ir.4+0x38,%xmm0,%xmm1
        │     vmulsd     ir.4+0x40,%xmm0,%xmm0
   0,01 │     lea        tb.1,%rdx
   3,06 │     vroundsd   $0x9,%xmm1,%xmm1,%xmm2
   0,03 │     vsubsd     %xmm2,%xmm1,%xmm1
  10,42 │     vcvttsd2si %xmm2,%rax
   0,01 │     vaddsd     %xmm0,%xmm1,%xmm1
        │     mov        %rax,%rcx
   0,02 │     vmulsd     ir.4+0x58,%xmm1,%xmm0
   0,38 │     vmulsd     %xmm1,%xmm1,%xmm5
   0,00 │     vmulsd     ir.4+0x68,%xmm1,%xmm4
        │     sar        $0x4,%rax
   0,00 │     add        $0x3ff,%rax
   1,17 │     vaddsd     ir.4+0x60,%xmm0,%xmm0
        │     shl        $0x34,%rax
   0,02 │     vaddsd     ir.4+0x70,%xmm4,%xmm4
   0,10 │     vmulsd     %xmm5,%xmm0,%xmm0
   0,85 │     vmulsd     ir.4+0x48,%xmm1,%xmm3
   0,00 │     and        $0xf,%ecx
        │     vmovq      %rax,%xmm6
   1,20 │     vmulsd     (%rdx,%rcx,8),%xmm6,%xmm2
   0,65 │     vaddsd     %xmm4,%xmm0,%xmm0
   0,00 │     vaddsd     ir.4+0x50,%xmm3,%xmm3
   3,49 │     vmulsd     %xmm5,%xmm0,%xmm0
  15,59 │     vmulsd     %xmm2,%xmm1,%xmm1
   4,61 │     vaddsd     %xmm3,%xmm0,%xmm0
  10,24 │     vmulsd     %xmm1,%xmm0,%xmm0
  11,31 │     vaddsd     %xmm2,%xmm0,%xmm0
  23,21 │     vcvtsd2ss  %xmm0,%xmm0,%xmm0
   0,00 │   ← ret

With -std=gnu18, where the overall performance is 36 cycles, I get:

   0,02 │ 3a: vcvtss2sd   -0x4(%rsp),%xmm1,%xmm1
   0,01 │     vmulsd      ir.4+0x40,%xmm1,%xmm0
        │     vmovsd      ir.4+0x60,%xmm5
        │     vmovsd      ir.4+0x50,%xmm4
        │     lea         tb.1,%rdx
   0,13 │     vroundsd    $0x9,%xmm0,%xmm0,%xmm2
   0,83 │     vsubsd      %xmm2,%xmm0,%xmm0
  28,99 │     vcvttsd2si  %xmm2,%rax
  63,49 │     vfmadd132sd 0x961(%rip),%xmm0,%xmm1
        │     vmovsd      ir.4+0x70,%xmm0
        │     mov         %rax,%rcx
        │     sar         $0x4,%rax
   2,73 │     add         $0x3ff,%rax
   1,99 │     vmulsd      %xmm1,%xmm1,%xmm3
   0,00 │     vfmadd213sd 0x95f(%rip),%xmm1,%xmm5
   0,00 │     vfmadd213sd 0x966(%rip),%xmm1,%xmm0
        │     shl         $0x34,%rax
        │     and         $0xf,%ecx
        │     vmovq       %rax,%xmm6
   0,17 │     vmulsd      (%rdx,%rcx,8),%xmm6,%xmm2
        │     vfmadd213sd 0x92c(%rip),%xmm1,%xmm4
   0,04 │     vfmadd132sd %xmm3,%xmm5,%xmm0
   0,64 │     vmulsd      %xmm2,%xmm1,%xmm1
   0,01 │     vfmadd132sd %xmm3,%xmm4,%xmm0
   0,46 │     vfmadd132sd %xmm1,%xmm2,%xmm0
   0,27 │     vcvtsd2ss   %xmm0,%xmm0,%xmm0
        │   ← ret

The distribution of time is very different between the two cases: in the
first, most of the time is spent at the end (computing w and the return
value, I suppose), whereas in the second, most of the time is spent in
the first multiply-and-add (computing h).

I do not understand this change of behaviour.

Cheers,

--
Stéphane
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Stéphane Glondu @ 2022-05-05 14:31 UTC (permalink / raw)
To: Alexander Monakov, gcc-help; +Cc: sibid, Paul Zimmermann

As additional data points, here is the performance with several versions
of gcc (as packaged in Debian testing/unstable):

            | gcc-9 | gcc-10 | gcc-11 | gcc-12 |
------------|-------|--------|--------|--------|
-std=c99    | 24    | 23.5   | 23     | 23     |
-std=gnu18  | 43    | 16.8   | 38     | 38     |

One can see that the performance stays relatively constant with
-std=c99, but varies significantly with -std=gnu18.

--
Stéphane
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Marc Glisse @ 2022-05-05 14:41 UTC (permalink / raw)
To: Stéphane Glondu; +Cc: Alexander Monakov, gcc-help, sibid, Paul Zimmermann

On Thu, 5 May 2022, Stéphane Glondu via Gcc-help wrote:

> As additional data points, the performance with several versions of gcc
> (as packaged in Debian testing/unstable):
>
>             | gcc-9 | gcc-10 | gcc-11 | gcc-12 |
> ------------|-------|--------|--------|--------|
> -std=c99    | 24    | 23.5   | 23     | 23     |
> -std=gnu18  | 43    | 16.8   | 38     | 38     |
>
> One can see that the performance stays relatively constant with
> -std=c99, but varies significantly with -std=gnu18.

Could you compare with c18 or gnu99, to determine if the issue is with c
vs gnu (most likely since fma seems important) or 99 vs 18?

--
Marc Glisse
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Alexander Monakov @ 2022-05-05 14:56 UTC (permalink / raw)
To: Marc Glisse via Gcc-help; Cc: Stéphane Glondu, Marc Glisse, sibid, Paul Zimmermann

On Thu, 5 May 2022, Marc Glisse via Gcc-help wrote:

> On Thu, 5 May 2022, Stéphane Glondu via Gcc-help wrote:
>
> > As additional data points, the performance with several versions of gcc
> > (as packaged in Debian testing/unstable):
> >
> >             | gcc-9 | gcc-10 | gcc-11 | gcc-12 |
> > ------------|-------|--------|--------|--------|
> > -std=c99    | 24    | 23.5   | 23     | 23     |
> > -std=gnu18  | 43    | 16.8   | 38     | 38     |
> >
> > One can see that the performance stays relatively constant with
> > -std=c99, but varies significantly with -std=gnu18.
>
> Could you compare with c18 or gnu99, to determine if the issue is with c
> vs gnu (most likely since fma seems important) or 99 vs 18?

Good point. Also could you please add latency metrics, I see that your
testing framework already exposes the '--latency' flag.

I could reproduce a similar though less dramatic slowdown and am
investigating.

Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Paul Zimmermann @ 2022-05-06  7:46 UTC (permalink / raw)
To: Alexander Monakov; +Cc: gcc-help, stephane.glondu, marc.glisse, sibid

Dear Alexander,

> Good point. Also could you please add latency metrics, I see that your testing
> framework already exposes the '--latency' flag.

here are latency metrics (still on the i5-4590):

            | gcc-9 | gcc-10 | gcc-11 |
------------|-------|--------|--------|
-std=c99    | 70.8  | 70.3   | 70.2   |
-std=gnu18  | 59.5  | 59.5   | 59.5   |

It thus seems the issue only appears for the reciprocal throughput.

Best regards,
Paul
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Alexander Monakov @ 2022-05-06  9:27 UTC (permalink / raw)
To: Paul Zimmermann; +Cc: gcc-help, stephane.glondu, marc.glisse, sibid

On Fri, 6 May 2022, Paul Zimmermann via Gcc-help wrote:

> here are latency metrics (still on i5-4590):
>
>             | gcc-9 | gcc-10 | gcc-11 |
> ------------|-------|--------|--------|
> -std=c99    | 70.8  | 70.3   | 70.2   |
> -std=gnu18  | 59.5  | 59.5   | 59.5   |
>
> It thus seems the issue only appears for the reciprocal throughput.

Thanks.

The primary issue here is a false dependency on the vcvtss2sd
instruction. In the snippet shown in Stéphane's email, the slower
variant begins with

  vcvtss2sd -0x4(%rsp),%xmm1,%xmm1

The cvtss2sd instruction is specified to leave the upper bits of the SSE
register unmodified, so here it merges the high bits of xmm1 with the
result of the float->double conversion (in the low bits) into a new
xmm1. Unless the CPU can track dependencies separately for vector
register components, it has to delay this instruction until the previous
computation that modified xmm1 has completed (AMD Zen 2 is an example of
a microarchitecture that apparently can).

This limits the degree to which separate cr_exp10f calls can overlap,
affecting throughput. In latency measurements, the calls are already
serialized by the dependency over xmm0, so the additional false
dependency does not matter.

(so fma is a "red herring": it's just that, depending on compiler
version and flags, register allocation will place the last assignment
into xmm1 differently)

If you want to experiment, you can hand-edit the assembly to replace the
problematic instruction with variants that avoid the false dependency,
such as

  vcvtss2sd %xmm0, %xmm0, %xmm1

or

  vpxor     %xmm1, %xmm1, %xmm1
  vcvtss2sd -0x4(%rsp),%xmm1,%xmm1

GCC has code to do this automatically, but for some reason it doesn't
work for your function. I have reported it to Bugzilla:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504

Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Paul Zimmermann @ 2022-05-07  6:11 UTC (permalink / raw)
To: Alexander Monakov; +Cc: gcc-help, stephane.glondu, marc.glisse, sibid

thank you very much Alexander for your analysis and the bugzilla report!

Paul

> Date: Fri, 6 May 2022 12:27:39 +0300 (MSK)
> From: Alexander Monakov <amonakov@ispras.ru>
>
> On Fri, 6 May 2022, Paul Zimmermann via Gcc-help wrote:
>
> > here are latency metrics (still on i5-4590):
> >
> >             | gcc-9 | gcc-10 | gcc-11 |
> > ------------|-------|--------|--------|
> > -std=c99    | 70.8  | 70.3   | 70.2   |
> > -std=gnu18  | 59.5  | 59.5   | 59.5   |
> >
> > It thus seems the issue only appears for the reciprocal throughput.
>
> Thanks.
>
> The primary issue here is a false dependency on the vcvtss2sd
> instruction. In the snippet shown in Stéphane's email, the slower
> variant begins with
>
>   vcvtss2sd -0x4(%rsp),%xmm1,%xmm1
>
> The cvtss2sd instruction is specified to leave the upper bits of the SSE
> register unmodified, so here it merges the high bits of xmm1 with the
> result of the float->double conversion (in the low bits) into a new
> xmm1. Unless the CPU can track dependencies separately for vector
> register components, it has to delay this instruction until the previous
> computation that modified xmm1 has completed (AMD Zen 2 is an example of
> a microarchitecture that apparently can).
>
> This limits the degree to which separate cr_exp10f calls can overlap,
> affecting throughput. In latency measurements, the calls are already
> serialized by the dependency over xmm0, so the additional false
> dependency does not matter.
>
> (so fma is a "red herring": it's just that, depending on compiler
> version and flags, register allocation will place the last assignment
> into xmm1 differently)
>
> If you want to experiment, you can hand-edit the assembly to replace the
> problematic instruction with variants that avoid the false dependency,
> such as
>
>   vcvtss2sd %xmm0, %xmm0, %xmm1
>
> or
>
>   vpxor     %xmm1, %xmm1, %xmm1
>   vcvtss2sd -0x4(%rsp),%xmm1,%xmm1
>
> GCC has code to do this automatically, but for some reason it doesn't
> work for your function. I have reported it to Bugzilla:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105504
>
> Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Alexander Monakov @ 2022-05-11 13:26 UTC (permalink / raw)
To: Paul Zimmermann; +Cc: gcc-help, stephane.glondu, marc.glisse, sibid

On Fri, 6 May 2022, Alexander Monakov wrote:

> The primary issue here is a false dependency on the vcvtss2sd
> instruction. In the snippet shown in Stéphane's email, the slower
> variant begins with
>
>   vcvtss2sd -0x4(%rsp),%xmm1,%xmm1
>
> The cvtss2sd instruction is specified to leave the upper bits of the SSE
> register unmodified, so here it merges the high bits of xmm1 with the
> result of the float->double conversion (in the low bits) into a new
> xmm1. Unless the CPU can track dependencies separately for vector
> register components, it has to delay this instruction until the previous
> computation that modified xmm1 has completed (AMD Zen 2 is an example of
> a microarchitecture that apparently can).

For future reference, my statement in parentheses was a bit inaccurate:
Zen 2 avoids the false dependency provided that xmm1 carries all-zeroes
in its high bits after being idiomatically zeroed (i.e. via pxor).
Thanks to Andreas Abel for pointing out that there's a limitation.

(nevertheless, the "blessed" state seemingly survives context switches,
so it's quite useful, including for this testcase)

Alexander
* Re: slowdown with -std=gnu18 with respect to -std=c99
From: Paul Zimmermann @ 2022-05-05 17:50 UTC (permalink / raw)
To: gcc-help; +Cc: stephane.glondu, amonakov, gcc-help, sibid

> Date: Thu, 5 May 2022 16:41:28 +0200 (CEST)
> From: Marc Glisse <marc.glisse@inria.fr>
>
> On Thu, 5 May 2022, Stéphane Glondu via Gcc-help wrote:
>
> > As additional data points, the performance with several versions of gcc
> > (as packaged in Debian testing/unstable):
> >
> >             | gcc-9 | gcc-10 | gcc-11 | gcc-12 |
> > ------------|-------|--------|--------|--------|
> > -std=c99    | 24    | 23.5   | 23     | 23     |
> > -std=gnu18  | 43    | 16.8   | 38     | 38     |
> >
> > One can see that the performance stays relatively constant with
> > -std=c99, but varies significantly with -std=gnu18.
>
> Could you compare with c18 or gnu99, to determine if the issue is with c
> vs gnu (most likely since fma seems important) or 99 vs 18?

yes, it is easy. On another i5:

            | gcc-9 | gcc-10 | gcc-11 |
------------|-------|--------|--------|
-std=c99    | 24.3  | 23.8   | 23.8   |
-std=c18    | 24.4  | 23.8   | 23.9   |
-std=gnu99  | 42.9  | 19.2   | 35.0   |
-std=gnu18  | 42.9  | 19.2   | 35.0   |

Thus the issue is definitely c vs gnu.

Paul