From: Paul Zimmermann
To: Alexander Monakov
Cc: gcc-help@gcc.gnu.org, stephane.glondu@inria.fr, sibid@uvic.ca
Date: Tue, 03 May 2022 13:45:00 +0200
Subject: Re: slowdown with -std=gnu18 with respect to -std=c99
In-Reply-To: <9f7e3aa9-8d46-1fbb-75b-1c8ad9a667f@ispras.ru> (message from Alexander Monakov on Tue, 3 May 2022 12:09:32 +0300 (MSK))
List-Id: Gcc-help mailing list

Thank you very much, Alexander.

> Date: Tue, 3 May 2022 12:09:32 +0300 (MSK)
> From: Alexander Monakov
> cc: gcc-help@gcc.gnu.org, stephane.glondu@inria.fr, sibid@uvic.ca
>
> On Tue, 3 May 2022, Paul Zimmermann via Gcc-help wrote:
>
> > Does anyone have a clue?
>
> I can reproduce a difference, but in my case it's simply because in -std=gnuXX
> mode (as opposed to -std=cXX) GCC enables FMA contraction, enabling the last few
> steps in the benchmarked function to use fma instead of separate mul/add
> instructions.

But then you should get better (i.e. smaller) timings with -std=gnuXX than
with -std=cXX, instead of worse timings as we get?

> (regarding __builtin_expect, it also makes a small difference in my case,
> it seems GCC generates some redundant code without it, but the difference is
> 10x smaller than what presence/absence of FMA gives)
>
> I think you might be able to figure it out on your end if you run both variants
> under 'perf stat', note how cycle count and instruction counts change, and then
> look at disassembly to see what changed. You can use 'perf record' and 'perf
> report' to easily see the hot code path; if you do that, I'd recommend to run
> it with the same sampling period in both cases, e.g. like this:
>
>   perf record -e instructions:P -c 500000 ./perf ...

Thank you, we'll investigate that.

Best regards,
Paul
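
P.S. To check whether FMA contraction is really what differs between the two -std modes on a given machine, a small probe along the following lines might help. This is only a sketch, not the benchmarked code: the names mul_add and contraction_probe and the DEMO_MAIN guard are made up for illustration, and it assumes x86-64 with SSE2 double arithmetic (no x87 excess precision) and a CPU that has FMA instructions.

```c
#include <stdio.h>

/* a*b + c written as a separate multiply and add. Whether GCC emits a
   single fma instruction or a mul followed by an add depends on
   -ffp-contract and on the target having FMA (assumption: e.g.
   -march=native on a recent x86-64 CPU). */
double mul_add(double a, double b, double c)
{
    return a * b + c;
}

/* Inputs chosen so that the exact product is 1 - 2^-54, which rounds to
   1.0 in double precision. Assuming plain double arithmetic:
     - product rounded before the add (no contraction): result is 0.0
     - single rounding after the add (fused):           result is -0x1p-54 */
double contraction_probe(void)
{
    /* volatile keeps the compiler from folding the result at compile time */
    volatile double a = 1.0 + 0x1p-27;
    volatile double b = 1.0 - 0x1p-27;
    volatile double c = -1.0;
    return mul_add(a, b, c);
}

#ifdef DEMO_MAIN /* hypothetical guard, only to make the sketch runnable */
int main(void)
{
    double r = contraction_probe();
    printf("residual = %a (%s)\n", r,
           r == 0.0 ? "separate mul/add" : "fused");
    return 0;
}
#endif
```

Saved as, say, probe.c (a hypothetical file name), this can be built once with "gcc -O2 -march=native -std=gnu18 -DDEMO_MAIN probe.c" and once with -std=c99 in place of -std=gnu18. Adding -ffp-contract=off to the gnu18 build should make it behave like the c99 build in this respect, since GCC defaults to -ffp-contract=fast in the GNU modes and to -ffp-contract=off in the ISO modes.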