Subject: [Bug c++/109127] New: More advanced constexpr value compile time evaluation
From: "dmitriy.ovdienko at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Date: Tue, 14 Mar 2023 11:10:50 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109127

            Bug ID: 109127
           Summary: More advanced constexpr value compile time evaluation
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitriy.ovdienko at gmail dot com
  Target Milestone: ---

Hello,

I'd like to report an idea that could improve application performance.
The idea concerns `constexpr` math that can be performed at compile time. To some degree the C++ compiler already performs this optimization, but in my more realistic code it does not, for some reason.

Let's start with a simple example that explains the idea and that does get optimized. The following function serializes a `constexpr` unsigned value into a string. It does not work quite right, because the output is reversed, but we will get to that later.

```cpp
// The expected output is "543\0"
void foo1(char* ptr) {
    constexpr unsigned Tag = 345;
    auto v = Tag;
    do {
        *ptr++ = (v % 10) + '0';
        v /= 10;
    } while (v);
    *ptr = 0;
}
```

The produced assembly is as follows:

```asm
foo1(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```

This is good enough. I would, however, replace the load from `.LC0` with a hardcoded 32-bit immediate, so the CPU does not have to access another memory location. Because x86-64 is little-endian, the bytes `53, 52, 51, 0` ("543\0") correspond to the value `0x00333435`:

```asm
        mov     eax, 0x00333435   ; instead of mov eax, DWORD PTR .LC0[rip]
```

Now I change the code a bit to use base-16 math. This is an intermediate step before we get to the real code:

```cpp
void foo2(char* ptr) {
    constexpr unsigned Tag = 0xF345;
    auto v = Tag;
    while (v != 0xF) {   // 0xF is a stop value
        *ptr++ = (v % 16) + '0';
        v /= 16;
    }
    *ptr = 0;
}
```

The assembly is the same as above, which is good.
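The immediate suggested for `foo1` depends on byte order: x86-64 loads the byte at the lowest address into the least significant position, so `"543\0"` (bytes `53, 52, 51, 0`) is the 32-bit value `0x00333435`, not `0x35343300`. A minimal sketch that checks this at compile time, assuming a C++11-or-later compiler and a little-endian target; `pack_le`, `kFoo1Imm` and `foo1_imm` are hypothetical names for illustration, not part of the report or of GCC:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Packs four bytes into a 32-bit value in little-endian order, i.e. the value
// an x86-64 `mov eax, DWORD PTR [...]` reads from those four bytes.
constexpr std::uint32_t pack_le(char b0, char b1, char b2, char b3) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(b0))
         | static_cast<std::uint32_t>(static_cast<unsigned char>(b1)) << 8
         | static_cast<std::uint32_t>(static_cast<unsigned char>(b2)) << 16
         | static_cast<std::uint32_t>(static_cast<unsigned char>(b3)) << 24;
}

// .LC0 holds the bytes 53, 52, 51, 0, i.e. "543\0".
constexpr std::uint32_t kFoo1Imm = pack_le('5', '4', '3', '\0');
static_assert(kFoo1Imm == 0x00333435u, "immediate is 0x00333435, not 0x35343300");
static_assert(kFoo1Imm == 3355701u, "same value in decimal");

// Hypothetical hand-written equivalent of foo1: one 4-byte store of the
// immediate. It produces "543\0" only on a little-endian target such as x86-64.
void foo1_imm(char* ptr) {
    std::memcpy(ptr, &kFoo1Imm, sizeof kFoo1Imm);
}
```

The `static_assert`s hold on any target, since `pack_le` only does shifts; only the byte layout written by `foo1_imm` depends on endianness.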
What does not work is reversing the order of the output digits: in that case the compiler no longer performs the `constexpr` math at compile time:

```cpp
void foo3(char* ptr) {
    constexpr unsigned Tag = 0x543;

    // Convert 0x543 -> 0xF345
    auto v = Tag;
    auto reversed = 0xFu;  // 0xF is a stop value
    while (v) {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }

    // Now serialize 0xF345 into "543\0"
    while (reversed != 0xF) {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }
    *ptr = 0;
}
```

The assembly output is as follows (62277 == 0xF345, the reversed tag):

```asm
foo3(char*):
        mov     eax, 62277
.L2:
        mov     edx, eax
        add     rdi, 1
        shr     eax, 4
        and     edx, 15
        add     edx, 48
        mov     BYTE PTR [rdi-1], dl
        cmp     eax, 15
        jne     .L2
        mov     BYTE PTR [rdi], 0
        ret
```

The first loop (the reversal) was folded into the constant 62277, but the `.L2` loop, which serializes the digits, still runs at run time even though it could be evaluated during compilation.

The workaround is to force the compiler to compute the reversed value at compile time by storing it in a `constexpr` variable, and to rewrite `foo3` as follows:

```cpp
constexpr unsigned reverse(unsigned v) {
    auto reversed = 0xFu;  // 0xF is a stop value
    while (v) {
        reversed <<= 4;
        reversed |= v & 0xFu;
        v >>= 4;
    }
    return reversed;
}

void foo3(char* ptr) {
    constexpr unsigned Tag = 0x543;
    constexpr unsigned ReversedTag = reverse(Tag);  // 0xF345, computed at compile time

    auto reversed = ReversedTag;
    while (reversed != 0xF) {
        *ptr++ = (reversed % 16) + '0';
        reversed /= 16;
    }
    *ptr = 0;
}
```

With this change the assembly is back to normal:

```asm
foo3(char*):
        mov     eax, DWORD PTR .LC0[rip]
        mov     DWORD PTR [rdi], eax
        ret
.LC0:
        .byte   53
        .byte   52
        .byte   51
        .byte   0
```
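A related workaround, sketched here as an assumption rather than something from the report, is to build the whole digit string in a `constexpr` function and only copy it at run time; this also removes the need for the 0xF stop value. It requires C++17 (for `constexpr` non-const `std::array::operator[]`), and like the report's code it assumes every nibble is 0-9, since `'0' + nibble` is not a hex digit for A-F. `hex_digit_count`, `make_tag_string` and `write_tag` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of hex digits in v (at least one, so that 0 yields "0").
constexpr std::size_t hex_digit_count(unsigned v) {
    std::size_t n = 1;
    while (v >>= 4) {
        ++n;
    }
    return n;
}

// Builds the NUL-terminated digit string for Tag at compile time,
// most significant nibble first (0x543 -> "543").
// Assumes every nibble is 0-9, as the report's code does.
template <unsigned Tag>
constexpr std::array<char, hex_digit_count(Tag) + 1> make_tag_string() {
    std::array<char, hex_digit_count(Tag) + 1> out{};  // all '\0' initially
    unsigned v = Tag;
    for (std::size_t i = hex_digit_count(Tag); i-- > 0;) {
        out[i] = static_cast<char>('0' + (v & 0xFu));
        v >>= 4;
    }
    return out;  // last element stays '\0'
}

// Hypothetical replacement for foo3: copies the precomputed string.
template <unsigned Tag>
void write_tag(char* ptr) {
    static constexpr auto kStr = make_tag_string<Tag>();
    std::memcpy(ptr, kStr.data(), kStr.size());
}

static_assert(make_tag_string<0x543>()[0] == '5', "most significant nibble first");
static_assert(make_tag_string<0x543>().size() == 4, "three digits plus NUL");
```

For example, `write_tag<0x543>(buf)` writes `"543"` plus a terminating NUL into `buf`. Because `kStr` is a `static constexpr` array, no digit loop is left for run time; the exact instructions GCC 12.2 emits for the `memcpy` have not been checked here.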