From: "jengelh at inai dot de" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107895] New: mt19937 bad performance on LP64
Date: Mon, 28 Nov 2022 10:36:09 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107895

            Bug ID: 107895
           Summary: mt19937 bad performance on LP64
           Product: gcc
           Version: 12.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jengelh at inai dot de
  Target Milestone: ---

Input
=====

#include <cstdint>   // uint32_t
#include <iterator>  // std::size
#include <random>
#include <unistd.h>
static std::mt19937 rng;
int main() {
        for (size_t j = 0; j < 0x8000; ++j) {
                uint32_t numbers[65536];
                for (size_t i = 0; i < std::size(numbers); ++i)
                        numbers[i] = rng();
                // ensure number generation is not all optimized away
                write(STDOUT_FILENO, numbers, sizeof(numbers));
        }
}
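The results below also report std::mt19937_64, which the input above does not show. Assuming that row was produced by changing only the engine type (the 64-bit outputs are then truncated when stored into the uint32_t array), the reproducer can be parameterized by engine as sketched here. `fill_numbers` and `run_benchmark` are hypothetical names, not part of the original report.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <unistd.h>

// Hypothetical helper: store `count` consecutive outputs of `rng` into `out`,
// truncating to 32 bits exactly as assigning to the reproducer's uint32_t array does.
template <class Engine>
void fill_numbers(Engine& rng, std::uint32_t* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<std::uint32_t>(rng());
}

// Assumption: the std::mt19937_64 row differs from the input only in the engine type.
template <class Engine>
void run_benchmark() {
    static Engine rng;  // default-seeded, like the reproducer's global
    for (std::size_t j = 0; j < 0x8000; ++j) {
        std::uint32_t numbers[65536];
        fill_numbers(rng, numbers, std::size(numbers));
        // Write the data so the generation cannot be optimized away.
        if (write(STDOUT_FILENO, numbers, sizeof(numbers)) < 0)
            return;
    }
}
```

Under that assumption, a `main` that calls `run_benchmark<std::mt19937_64>();` would correspond to the second row, and `run_benchmark<std::mt19937>();` to the first.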


Observed
========

Target: x86_64-suse-linux
gcc version 12.2.1 20221020 [revision 0aaef83351473e8f4eb774f8f999bbe87a4866d7] (SUSE Linux)

$ g++ x.cpp -O2 && time ./a.out >/dev/zero

                  -m32    -m64
===============  =====  ======
std::mt19937      3.9s   11.5s
std::mt19937_64  14.0s   11.6s
===============  =====  ======
error ±0.1s

With -ftree-loop-if-convert [Bug #80520], -m64 improves, but still does not reach -m32 levels:

+-ftree-          -m32    -m64
===============  =====  ======
std::mt19937      3.9s    5.2s
std::mt19937_64  14.0s    5.4s
===============  =====  ======
error ±0.1s
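The effect of -ftree-loop-if-convert suggests that a data-dependent conditional in the generator matters. As a hedged illustration only, here is a textbook MT19937 written for this report (not libstdc++'s source; `Mt19937Sketch` and its `Branchless` parameter are hypothetical). The twist step XORs in the constant 0x9908b0df only when a word is odd; that choice can be written as a ternary, which a compiler may emit as a branch unless it if-converts it, or as a mask, which is straight-line code. Both forms produce the same sequence as std::mt19937.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical textbook MT19937 using the std::mt19937 parameters.
// Branchless selects how the conditional XOR in the twist step is written.
template <bool Branchless>
class Mt19937Sketch {
public:
    static constexpr std::size_t n = 624;
    static constexpr std::size_t m = 397;
    static constexpr std::uint32_t matrix_a = 0x9908b0dfu;
    static constexpr std::uint32_t upper_mask = 0x80000000u;
    static constexpr std::uint32_t lower_mask = 0x7fffffffu;

    explicit Mt19937Sketch(std::uint32_t seed = 5489u) {
        // Standard MT19937 state initialization from a single seed.
        state_[0] = seed;
        for (std::size_t i = 1; i < n; ++i)
            state_[i] = 1812433253u * (state_[i - 1] ^ (state_[i - 1] >> 30))
                        + static_cast<std::uint32_t>(i);
    }

    std::uint32_t operator()() {
        if (idx_ >= n)
            twist();
        std::uint32_t y = state_[idx_++];
        // Tempering, as specified for std::mt19937.
        y ^= y >> 11;
        y ^= (y << 7) & 0x9d2c5680u;
        y ^= (y << 15) & 0xefc60000u;
        y ^= y >> 18;
        return y;
    }

private:
    // Regenerate all n state words in place.
    void twist() {
        for (std::size_t k = 0; k < n; ++k) {
            const std::uint32_t y =
                (state_[k] & upper_mask) | (state_[(k + 1) % n] & lower_mask);
            std::uint32_t mag;
            if constexpr (Branchless)
                mag = (0u - (y & 1u)) & matrix_a;  // mask is all-ones iff y is odd
            else
                mag = (y & 1u) ? matrix_a : 0u;    // may be compiled to a branch
            state_[k] = state_[(k + m) % n] ^ (y >> 1) ^ mag;
        }
        idx_ = 0;
    }

    std::array<std::uint32_t, n> state_{};
    std::size_t idx_ = n;  // forces a twist on the first call
};
```

Compiling both instantiations with `g++ -O2 -S` and comparing the twist loops is one way to see whether the ternary became straight-line code; the sketch makes no claim about what GCC 12 actually emits. The `% n` indexing keeps it short at some cost in speed; optimized implementations usually split the loop in two instead.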


Expected
========

Expected to see <= 4.7s on -m64 in all cases (3.9s + ~20% margin for the wider
CPU<->cache/RAM transfers).
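For reference, the threshold and the -m64/-m32 ratios for the std::mt19937 row, computed from the measurements above:

```latex
\begin{aligned}
t_{\text{limit}} &= 3.9\,\text{s} \times 1.2 = 4.68\,\text{s} \approx 4.7\,\text{s}\\
\frac{t_{\text{-m64}}}{t_{\text{-m32}}} &= \frac{11.5\,\text{s}}{3.9\,\text{s}} \approx 2.95 \quad \text{(default)}\\
\frac{t_{\text{-m64}}}{t_{\text{-m32}}} &= \frac{5.2\,\text{s}}{3.9\,\text{s}} \approx 1.33 > 1.2 \quad \text{(with -ftree-loop-if-convert)}
\end{aligned}
```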

The -m64 builds should have roughly equal or faster runtime (more registers,
hence more optimization opportunities). Concerns like those at
https://gmplib.org/32vs64 apply to old CPUs, but I do not think they are
indicative of how contemporary x86_64 systems perform.


Additional information
======================

CPUs:
 "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz"
    [fam 6 model 140 stepping 1 microcode 0xa4] and
 "AMD Ryzen 7 3700X 8-Core Processor"
    [fam 23 model 113 stepping 0 microcode 0x8701013]
    (about 3.0 and 10.2 seconds runtime, respectively)