Date: Mon, 31 Jul 2023 09:40:18 +0000 (UTC)
From: Richard Biener
To: Uros Bizjak
cc: "gcc-patches@gcc.gnu.org", Jan Hubicka, Hongtao Liu
Subject: Re: [RFC PATCH] i386: Do not sanitize upper part of V2SFmode reg with -fno-trapping-math
 [PR110832]

On Sun, 30 Jul 2023, Uros Bizjak wrote:

> Also introduce -m[no-]mmxfp-with-sse option to disable trapping V2SF
> named patterns in order to avoid generation of partial vector V4SFmode
> trapping instructions.
>
> The new option is enabled by default, because even with sanitization,
> a small but consistent speed up of 2 to 3% with Polyhedron capacita
> benchmark can be achieved vs. scalar code.
>
> Using -fno-trapping-math improves Polyhedron capacita runtime 8 to 9%
> vs. scalar code. This is what clang does by default, as it defaults
> to -fno-trapping-math.

I like the new option.  Note that you lack invoke.texi documentation,
where I'd also elaborate a bit on the interaction with
-fno-trapping-math and the possible performance impact when NaNs or
denormals leak into the upper halves, and cross-reference -mdaz-ftz.

Thanks,
Richard.

> PR target/110832
>
> gcc/ChangeLog:
>
> * config/i386/i386.h (TARGET_MMXFP_WITH_SSE): New macro.
> * config/i386/i386.opt (mmmxfp-with-sse): New option.
> * config/i386/mmx.md (movq__to_sse): Do not sanitize
> upper part of V2SFmode register with -fno-trapping-math.
> (v2sf3): Enable for TARGET_MMXFP_WITH_SSE.
> (divv2sf3): Ditto.
> (v2sf3): Ditto.
> (sqrtv2sf2): Ditto.
> (*mmx_haddv2sf3_low): Ditto.
> (*mmx_hsubv2sf3_low): Ditto.
> (vec_addsubv2sf3): Ditto.
> (vec_cmpv2sfv2si): Ditto.
> (vcondv2sf): Ditto.
> (fmav2sf4): Ditto.
> (fmsv2sf4): Ditto.
> (fnmav2sf4): Ditto.
> (fnmsv2sf4): Ditto.
> (fix_truncv2sfv2si2): Ditto.
> (fixuns_truncv2sfv2si2): Ditto.
> (floatv2siv2sf2): Ditto.
> (floatunsv2siv2sf2): Ditto.
> (nearbyintv2sf2): Ditto.
> (rintv2sf2): Ditto.
> (lrintv2sfv2si2): Ditto.
> (ceilv2sf2): Ditto.
> (lceilv2sfv2si2): Ditto.
> (floorv2sf2): Ditto.
> (lfloorv2sfv2si2): Ditto.
> (btruncv2sf2): Ditto.
> (roundv2sf2): Ditto.
> (lroundv2sfv2si2): Ditto.
>
> Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>
> Uros.
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)