From: Shen-Ta Hsieh (謝昇達)
Date: Sun, 23 May 2021 08:41:50 +0800
Subject: Re: [PATCH v6 2/3] x86_64: roundeven with sse4.1 support
To: Paul Zimmermann
Cc: libc-alpha@sourceware.org
References: <20210522002227.2234377-1-ibmibmibm.tw@gmail.com> <20210522002227.2234377-2-ibmibmibm.tw@gmail.com>

Dear Paul,

On Sat, May 22, 2021 at 12:52 PM, Paul Zimmermann wrote:
>
> Dear Shen-Ta,
>
> > Here is a benchmark result on my AMD Ryzen 9 3900X system:
> >
> > * benchmark result before this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 3.77659e+09  | 3.77504e+09  |
> > | iterations | 3.97043e+08  | 4.36752e+08  |
> > | max        | 83.714       | 58.861       |
> > | min        | 7.144        | 6.27         |
> > | mean       | 9.51179      | 8.64345      |
> >
> > * benchmark result after this commit
> > |            | roundeven    | roundevenf   |
> > |------------|--------------|--------------|
> > | duration   | 3.76913e+09  | 3.76923e+09  |
> > | iterations | 5.55921e+08  | 5.64822e+08  |
> > | max        | 211.698      | 439.09       |
> > | min        | 6.498        | 6.422        |
> > | mean       | 6.77998      | 6.6733       |
>
> I wonder why the max times have increased by a factor 2.5 and 7.5.
> In my experiments I noticed that the "mean" time was quite stable,
> while the "max" time could vary a lot between different runs. Thus
> I usually run 5 times "make bench" and keep the smallest times:
>
> $ ./testrun.sh benchtests/bench-roundeven
>
> You can also try make USE_RDTSCP=1 bench (cf benchtests/README).
>
> And it would be nice to have figures on another hardware (for example Intel).
>
> Best regards,
> Paul Zimmermann
>

I ran the benchmark again following your advice; here are the results:

# AMD Ryzen 9 3900X 12-Core Processor

* benchmark result before this commit
|            | roundeven    | roundevenf   |
|------------|--------------|--------------|
| duration   | 3.75587e+09  | 3.75114e+09  |
| iterations | 3.93053e+08  | 4.35402e+08  |
| max        | 52.592       | 58.71        |
| min        | 7.98         | 7.22         |
| mean       | 9.55563      | 8.61535      |

* benchmark result after this commit
|            | roundeven    | roundevenf   |
|------------|--------------|--------------|
| duration   | 3.73815e+09  | 3.73738e+09  |
| iterations | 5.82692e+08  | 5.91498e+08  |
| max        | 56.468       | 51.642       |
| min        | 6.27         | 6.156        |
| mean       | 6.41532      | 6.3185       |

# Intel(R) Pentium(R) CPU D1508 @ 2.20GHz

* benchmark result before this commit
|            | roundeven    | roundevenf   |
|------------|--------------|--------------|
| duration   | 2.18208e+09  | 2.18258e+09  |
| iterations | 2.39932e+08  | 2.46924e+08  |
| max        | 96.378       | 98.035       |
| min        | 6.776        | 5.94         |
| mean       | 9.09456      | 8.83907      |

* benchmark result after this commit
|            | roundeven    | roundevenf   |
|------------|--------------|--------------|
| duration   | 2.17415e+09  | 2.17005e+09  |
| iterations | 3.56193e+08  | 4.09824e+08  |
| max        | 51.693       | 97.192       |
| min        | 5.926        | 5.093        |
| mean       | 6.10385      | 5.29507      |

I ran each test about 20 times and picked a stable result.
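
For a quick comparison, the ratio of the mean times before and after the commit can be computed from the tables above. This is only a minimal Python sketch: the `means` dictionary and the `speedup` helper are illustrative names chosen here, not part of glibc's benchtests, and the numbers are copied by hand from the "mean" rows.

```python
# Mean times copied from the "mean" rows of the tables above.
# Layout (hypothetical, for illustration): {cpu: {function: (before, after)}}
means = {
    "AMD Ryzen 9 3900X": {
        "roundeven": (9.55563, 6.41532),
        "roundevenf": (8.61535, 6.3185),
    },
    "Intel Pentium D1508": {
        "roundeven": (9.09456, 6.10385),
        "roundevenf": (8.83907, 5.29507),
    },
}


def speedup(before: float, after: float) -> float:
    """Return the ratio before / after; values above 1 mean the new code is faster."""
    return before / after


# Compute one ratio per (cpu, function) pair.
ratios = {
    (cpu, func): speedup(before, after)
    for cpu, funcs in means.items()
    for func, (before, after) in funcs.items()
}

for (cpu, func), ratio in ratios.items():
    print(f"{cpu:22s} {func:11s} {ratio:.2f}x")
```

On these figures the mean time improves by a factor of roughly 1.5 for roundeven on both machines, and by roughly 1.4 (AMD) and 1.7 (Intel) for roundevenf.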