public inbox for gcc-help@gcc.gnu.org
* simple optimisation question
From: zamfofex @ 2024-04-09 17:26 UTC (permalink / raw)
  To: gcc-help

I was recently investigating what these two (relatively similar) functions would compile to:

- - - - -
int read0(int (*x)[12], int i, int j)
{
    return x[i][j];
}

int read1(int *x, int i, int j)
{
    return x[i * 12 + j];
}
- - - - -
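
A minimal sketch for checking that the two functions read the same element for every in-range index pair. The helper `count_mismatches`, the constants `ROWS` and `COLS`, and the sample data are hypothetical and not part of the original message; the sketch also assumes the usual practice of viewing one flat `int` buffer through both pointer types.

```c
enum { ROWS = 3, COLS = 12 };

/* The two functions from the message, unchanged. */
int read0(int (*x)[12], int i, int j)
{
    return x[i][j];
}

int read1(int *x, int i, int j)
{
    return x[i * 12 + j];
}

/* Hypothetical helper: counts the (i, j) pairs for which the two
   functions disagree, or disagree with the value stored there. */
int count_mismatches(void)
{
    static int flat[ROWS * COLS];
    int mismatches = 0;

    /* Give every element a distinct value. */
    for (int k = 0; k < ROWS * COLS; k++)
        flat[k] = k * 7 + 1;

    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            int a = read0((int (*)[12]) flat, i, j);
            int b = read1(flat, i, j);
            if (a != b || a != flat[i * 12 + j])
                mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches()` from a small `main` and checking for zero confirms only that the results agree; it says nothing about the generated code, which is what the Godbolt links below compare.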

I found (through Godbolt) the following behavior (all on x86-64):

- On GCC: The two functions compile to different assembly code regardless of the flags tested.

- On Clang: They compile to the same assembly code under ‘-m32 -Oz’.

- On Clang: They compile to different assembly code under other tested flags.

The flags I tested were ‘-O3’ vs. ‘-Oz’ and ‘-m32’ vs. none. (Four combinations per compiler.)

In GCC, the assembly code under ‘-m32 -Oz’, although different, was of the same size (in bytes, once assembled) for both functions. For ‘-Oz’ without ‘-m32’, the first one was larger.

Is this a missed size optimisation for x86-64? Even in the case where the assembly code is larger, the difference in run time seems unobservable. (Though I’d have imagined that the larger one would have been slower in each case.)

These are URLs for the tests I made:

- ‘gcc -O3’: https://godbolt.org/z/8T77ehaEE
- ‘gcc -Oz’: https://godbolt.org/z/1E1jMxY78
- ‘gcc -m32 -O3’: https://godbolt.org/z/P8G614xzv
- ‘gcc -m32 -Oz’: https://godbolt.org/z/TTEGKqzn1
- ‘clang -O3’: https://godbolt.org/z/nEPbvfsb6
- ‘clang -Oz’: https://godbolt.org/z/hv1z3dEcf
- ‘clang -m32 -O3’: https://godbolt.org/z/6snbhj68n
- ‘clang -m32 -Oz’: https://godbolt.org/z/EEK87dn3d


* Re: simple optimisation question
From: LIU Hao @ 2024-04-10  1:26 UTC (permalink / raw)
  To: zamfofex, gcc-help


On 2024-04-10 01:26, zamfofex wrote:
> The flags I tested were ‘-O3’ vs. ‘-Oz’ and ‘-m32’ vs. none. (Four combinations per compiler.)
> 
> In GCC, the assembly code, although different, under ‘-m32 -Oz’ was of the same size (in bytes, after assembled) for both functions. For ‘-Oz’ without ‘-m32’, the first one was larger.

The first function involves two sign-extension operations, as in

    char* p = (char*) x;
    return *(int*) (p + (ptrdiff_t) i * 48 + (ptrdiff_t) j * 4);

and the second one involves only one, as in

    char* p = (char*) x;
    return *(int*) (p + (ptrdiff_t) (i * 12 + j) * 4);
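
As a rough, compilable sketch of the two expansions: the function names `read0_bytes` and `read1_bytes` are hypothetical, the constants 48 and 4 assume a 4-byte `int` (checked with a static assertion), and the second form widens the whole `int` index `i * 12 + j` once before scaling it, which is one way to spell out the single sign-extension.

```c
#include <stddef.h>

/* Assumption: int is 4 bytes, so one row of 12 ints spans 48 bytes. */
_Static_assert(sizeof(int) == 4, "sketch assumes 4-byte int");

/* x[i][j] on int (*)[12]: i and j are each widened to pointer width
   before being scaled, hence two sign-extensions on x86-64. */
int read0_bytes(int (*x)[12], int i, int j)
{
    char *p = (char *) x;
    return *(int *) (p + (ptrdiff_t) i * 48 + (ptrdiff_t) j * 4);
}

/* x[i * 12 + j] on int *: the index is computed in int arithmetic
   and widened once, hence a single sign-extension. */
int read1_bytes(int *x, int i, int j)
{
    char *p = (char *) x;
    return *(int *) (p + (ptrdiff_t) (i * 12 + j) * 4);
}
```

For in-range indices both functions return the same element as `read0` and `read1`; the only difference is where the widening from `int` to pointer width takes place.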


For -m32 the assembly differs a little, but as far as I can tell there is almost no difference.


> Is this a missed size optimisation for x86-64? Even in the case where the assembly code is larger, the time performance difference seems unobservable. (Though I’d have imagined the larger one would have been slower in each case.)

Maybe. My suggestion is to avoid `int` as subscripts for x86-64, as it involves unnecessary 
sign-extensions.
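
One possible reading of this suggestion, as a sketch: take the subscripts as `ptrdiff_t` so that they are already pointer-width on x86-64. The names `read0_wide` and `read1_wide` are hypothetical, and the choice of `ptrdiff_t` (rather than, say, `size_t`) is an assumption; the message does not name a type.

```c
#include <stddef.h>

/* Same reads as read0/read1, but with pointer-width subscripts, so no
   widening from int should be needed in the address computation. */
int read0_wide(int (*x)[12], ptrdiff_t i, ptrdiff_t j)
{
    return x[i][j];
}

int read1_wide(int *x, ptrdiff_t i, ptrdiff_t j)
{
    return x[i * 12 + j];
}
```

Whether the two variants then compile to identical code under a given compiler and flag set is worth confirming on Godbolt rather than assuming.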

And it's not always the case that larger ones are slower. Intel CPUs recognize a lot of patterns to 
break dependencies (such as `xor eax, eax`, and similarly `xorps xmm0, xmm0`), which may make larger 
code faster.


-- 
Best regards,
LIU Hao

