From: "jengelh at inai dot de"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107895] New: mt19937 bad performance on LP64
Date: Mon, 28 Nov 2022 10:36:09 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107895

            Bug ID: 107895
           Summary: mt19937 bad performance on LP64
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jengelh at inai dot de
  Target Milestone: ---

Input
=====
#include <random>
#include <unistd.h>
static std::mt19937 rng;
int main()
{
        for (size_t j = 0; j < 0x8000; ++j) {
                uint32_t numbers[65536];
                for (size_t i = 0; i < std::size(numbers); ++i)
                        numbers[i] = rng();
                // ensure number generation is not all optimized away
                write(STDOUT_FILENO, numbers, sizeof(numbers));
        }
}

Observed
========
Target: x86_64-suse-linux
gcc version 12.2.1 20221020 [revision
0aaef83351473e8f4eb774f8f999bbe87a4866d7] (SUSE Linux)

$ g++ x.cpp -O2 && time ./a.out >/dev/zero

                 -m32   -m64
===============  =====  ======
std::mt19937      3.9s  11.5s
std::mt19937_64  14.0s  11.6s
===============  =====  ======
error ±0.1s

With -ftree-loop-if-convert [Bug #80520], -m64 improves, but std::mt19937 is
still not at -m32 levels:

+-ftree-         -m32   -m64
===============  =====  ======
std::mt19937      3.9s   5.2s
std::mt19937_64  14.0s   5.4s
===============  =====  ======
error ±0.1s

Expected
========
Expected to see <= 4.7s on -m64 at all times.
(3.9s + ~20% margin for wider transfers CPU<->caches/RAM)

The -m64 versions should have roughly equal or faster runtime (because
more registers, more opportunities); concerns like https://gmplib.org/32vs64
apply to old CPUs, but I do not think they are indicative of how contemporary
x86_64 systems perform.

Additional information
======================
CPUs: "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz" [fam 6 model 140
stepping 1 microcode 0xa4] and "AMD Ryzen 7 3700X 8-Core Processor" [fam 23
model 113 stepping 0 microcode 0x8701013] (about 3.0 and 10.2 seconds runtime,
respectively)
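
The timings above come from time(1) and therefore include the write() calls. To separate generator cost from I/O, something along the following lines could be used. This is only a sketch and not part of the original test case: the helper names drain and seconds_for are made up for illustration, and the values are folded into a checksum instead of being stored into an array, so the loop the compiler sees differs from the one in the report and the numbers may not match the tables.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical helper (not from the report): draw `count` values from a
// default-seeded Engine and fold them into a checksum, so that the
// optimizer cannot discard the generation loop.
template <typename Engine>
std::uint64_t drain(std::size_t count)
{
    Engine rng;  // default seed, as in the report
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // The report stores every result into a uint32_t slot, so results
        // are truncated to 32 bits here as well.
        sum += static_cast<std::uint32_t>(rng());
    }
    return sum;
}

// Hypothetical helper: wall-clock seconds spent in drain<Engine>(count).
// The checksum is handed back so that the caller can print or compare it.
template <typename Engine>
double seconds_for(std::size_t count, std::uint64_t &checksum)
{
    const auto start = std::chrono::steady_clock::now();
    checksum = drain<Engine>(count);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_for<std::mt19937>(std::size_t{0x8000} * 65536, sum) and the std::mt19937_64 counterpart from main(), once built with -m32 and once with -m64, draws the same number of values as the test case. If the -m64 gap persists in this variant, that would suggest the write() calls are not the cause.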