* [Bug c++/109127] New: More advanced constexpr value compile time evaluation
@ 2023-03-14 11:10 dmitriy.ovdienko at gmail dot com
From: dmitriy.ovdienko at gmail dot com @ 2023-03-14 11:10 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109127
Bug ID: 109127
Summary: More advanced constexpr value compile time evaluation
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
Hello,
I'd like to propose an idea that could improve application performance.
The idea concerns `constexpr` math, which can be performed at compile
time. To some degree the C++ compiler already performs this optimization, but in
my more realistic example it does not, for some reason.
Let's start with a simple example that explains the idea and that works.
The following function serializes a `constexpr` unsigned into a string. It does
not work quite right, as the output is reversed, but we will get to that later.
```cpp
// The expected output is "543\0"
void foo1(char* ptr)
{
    constexpr unsigned Tag = 345;
    auto v = Tag;
    do
    {
        *ptr++ = (v % 10) + '0';
        v /= 10;
    }
    while(v);
    *ptr = 0;
}
```
The generated assembly is as follows:
```asm
foo1(char*):
mov eax, DWORD PTR .LC0[rip]
mov DWORD PTR [rdi], eax
ret
.LC0:
.byte 53
.byte 52
.byte 51
.byte 0
```
That is good enough, although I would replace the load from `.LC0` with an
immediate operand, so that the CPU does not have to access another memory
location. On little-endian x86 the bytes 53, 52, 51, 0 form the value 0x00333435:
```asm
mov eax, 0x00333435
; instead of
mov eax, DWORD PTR .LC0[rip]
```
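As a sanity check on the immediate value, here is a minimal sketch that packs the four bytes from `.LC0` into the 32-bit value a little-endian store would write. The helper `pack_le` is a hypothetical name introduced only for this illustration, and the check assumes an ASCII execution character set.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the report): builds the 32-bit value whose
// little-endian memory image is the byte sequence b0, b1, b2, b3.
constexpr std::uint32_t pack_le(unsigned char b0, unsigned char b1,
                                unsigned char b2, unsigned char b3)
{
    return  static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// '5' == 53, '4' == 52, '3' == 51 in ASCII, matching the .byte directives.
static_assert(pack_le('5', '4', '3', '\0') == 0x00333435u,
              "immediate for the string \"543\" on a little-endian target");
```

The lowest-addressed byte ends up in the least significant position, which is why the `'5'` appears last when the value is written in hexadecimal.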
Now I change the code a bit to use base-16 math. This is an intermediate step
toward the real code:
```cpp
void foo2(char* ptr)
{
    constexpr unsigned Tag = 0xF345;
    auto v = Tag;
    while(v != 0xF)
    {
        *ptr++ = (v % 16) + '0';
        v /= 16;
    }
    *ptr = 0;
}
```
The assembly is the same as above, which is good.
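To illustrate why both functions can fold to the same constant, the two loops can be rewritten as `constexpr` functions and compared in a `static_assert`. This is only a sketch: the `Digits` struct and the function names are hypothetical, and the base-16 variant assumes, as in `foo2`, that the top nibble of the input is the `0xF` stop value.

```cpp
// Hypothetical fixed-size buffer so that the result is usable in a constant
// expression; zero-initialized, so unused bytes stay '\0'.
struct Digits
{
    char buf[8];
};

// Same loop as foo1, writing into a local buffer instead of a char*.
constexpr Digits serialize_base10(unsigned v)
{
    Digits d{};
    char* ptr = d.buf;
    do
    {
        *ptr++ = static_cast<char>((v % 10) + '0');
        v /= 10;
    }
    while(v);
    *ptr = 0;
    return d;
}

// Same loop as foo2; assumes the top nibble of v is the 0xF stop value.
constexpr Digits serialize_base16(unsigned v)
{
    Digits d{};
    char* ptr = d.buf;
    while(v != 0xF)
    {
        *ptr++ = static_cast<char>((v % 16) + '0');
        v /= 16;
    }
    *ptr = 0;
    return d;
}

constexpr bool same_digits(const Digits& a, const Digits& b)
{
    for(int i = 0; i < 8; ++i)
        if(a.buf[i] != b.buf[i])
            return false;
    return true;
}

// Both loops produce the bytes '5', '4', '3', '\0'.
static_assert(same_digits(serialize_base10(345), serialize_base16(0xF345)),
              "foo1 and foo2 write the same bytes");
```

Because the `static_assert` requires a constant expression, the compiler has to evaluate both loops during compilation here, independently of optimization level.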
What does not work is reversing the digits first. In that case the compiler
does not perform all of the `constexpr` math at compile time:
```cpp
void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;
    // Convert 0x543 -> 0xF345
    auto v = Tag;
    auto reversed = 0xFu; // 0xF is a stop value
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    // Now serialize 0xF345 into "543\0"
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }
    *ptr = 0;
}
```
The assembly output is as follows:
```asm
foo3(char*):
mov eax, 62277
.L2:
mov edx, eax
add rdi, 1
shr eax, 4
and edx, 15
add edx, 48
mov BYTE PTR [rdi-1], dl
cmp eax, 15
jne .L2
mov BYTE PTR [rdi], 0
ret
```
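The assembly above still contains the `.L2` loop, which could have been
evaluated at compile time. The reversal loop was folded into the constant
62277 (0xF345), but the serialization loop was not.
The workaround is to force the compiler to compute the reversed value and
store it as `constexpr`: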
```cpp
constexpr unsigned reverse(unsigned v)
{
    auto reversed = 0xFu;
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

void foo3(char* ptr)
{
    constexpr unsigned Tag = 0x543;
    constexpr unsigned ReversedTag = reverse(Tag);
    auto reversed = ReversedTag;
    while(reversed != 0xF)
    {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }
    *ptr = 0;
}
```
The assembly is back to normal:
```asm
foo3(char*):
mov eax, DWORD PTR .LC0[rip]
mov DWORD PTR [rdi], eax
ret
.LC0:
.byte 53
.byte 52
.byte 51
.byte 0
```
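A further variation on the workaround, sketched here under assumptions, is to make the serialization itself a constant expression so that nothing is left for the optimizer to unroll. The names `TagText`, `reverse_nibbles`, `serialize`, and `foo4` are hypothetical, and I have not verified which assembly GCC emits for this version. Like the original code, it only produces sensible characters when every nibble of the tag is a decimal digit.

```cpp
#include <cstring>

// Hypothetical result type: up to 7 hex digits (the 0xF stop value occupies
// the top nibble of a 32-bit unsigned) plus the terminating '\0'.
struct TagText
{
    char buf[8];
    unsigned len; // number of bytes used, including the terminator
};

// Same logic as reverse() in the workaround above.
constexpr unsigned reverse_nibbles(unsigned v)
{
    auto reversed = 0xFu; // 0xF is a stop value
    while(v)
    {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

// Runs both loops inside one constexpr function.
constexpr TagText serialize(unsigned tag)
{
    TagText t{};
    auto reversed = reverse_nibbles(tag);
    while(reversed != 0xF)
    {
        t.buf[t.len++] = static_cast<char>((reversed % 16) + '0');
        reversed /= 16;
    }
    t.buf[t.len++] = '\0';
    return t;
}

static_assert(serialize(0x543).len == 4, "three digits plus terminator");

void foo4(char* ptr)
{
    // The initializer of a constexpr variable must be a constant expression,
    // so the whole string is computed during compilation.
    static constexpr TagText text = serialize(0x543);
    std::memcpy(ptr, text.buf, text.len);
}
```

Calling `foo4(out)` with a buffer of at least 4 bytes writes `"543\0"`, the same bytes as the workaround.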