Message-ID: <95f2abba-afb4-bb73-a9f0-b1578b28713a@redhat.com>
Date: Wed, 5 Oct 2022 16:02:25 -0400
Subject: Re: [PATCH] middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic support
To: Jakub Jelinek
Cc: "Joseph S.
 Myers", Richard Biener, Jeff Law, Uros Bizjak, gcc-patches@gcc.gnu.org
References: <37522634-319a-b471-aa35-87e711b0479e@redhat.com> <55062a15-79a1-f8cf-ed20-25ca8ff42abe@redhat.com>
From: Jason Merrill

On 10/5/22 09:47, Jakub Jelinek wrote:
> On Tue, Oct 04, 2022 at 05:50:50PM -0400, Jason Merrill wrote:
>>> Another question is the suffixes of the builtins.  For now I have added
>>> bf16 suffix and enabled the builtins with !both_p, so one always needs to
>>> use __builtin_* form for them.  None of the GCC builtins end with b,
>>> so this isn't ambiguous with __builtin_*f16, but some libm functions do end
>>> with b, in particular ilogb, logb and f{??,??x}sub.  ilogb and the subs
>>> always have it, but is __builtin_logbf16 f16 suffixed logb or bf16 suffixed
>>> log?  Shall the builtins use f16b suffixes instead like the mangling does?
>>
>> Do we want bf16 builtins at all?  The impression I've gotten is that users
>> want computation to happen in SFmode and only later truncate back to BFmode.
>
> As I wrote earlier, I think we need at least one, __builtin_nans variant
> which would be used in libstdc++
> std::numeric_limits::signaling_NaN() implementation.
> I think
> std::numeric_limits::infinity() can be implemented as
> return (__bf16) __builtin_huge_valf ();
> and similarly
> std::numeric_limits::quiet_NaN() as
> return (__bf16) __builtin_nanf ("");
> but
> return (__bf16) __builtin_nansf ("");
> would lose the signaling NaN on the conversion and raise exception,
> and as the method is constexpr,
> union { unsigned short a; __bf16 b; } u = { 0x7f81 };
> return u.b;
> wouldn't work.  I can certainly restrict the builtins to the single
> one, but wonder whether the suffix for that builtin shouldn't be chosen
> such that eventually we could add more builtins if we need to
> and don't run into the log with bf16 suffix vs. logb with f16 suffix
> ambiguity.
> As you said, most of the libstdc++ overloads for std::bfloat16_t then
> can use float builtins or library calls under the hood, but std::nextafter
> is another case where I think we'll need to have something bfloat16_t
> specific, because float ulp isn't bfloat16_t ulp, the latter is much larger.

Makes sense.

> Based on what Joseph wrote, I'll add bf16/BF16 suffix support for C too
> in the next iteration (always with pedwarn in that case).
>
>>> @@ -5716,7 +5716,13 @@ emit_store_flag_1 (rtx target, enum rtx_
>>>  {
>>>    machine_mode optab_mode = mclass == MODE_CC ? CCmode : compare_mode;
>>>    icode = optab_handler (cstore_optab, optab_mode);
>>> -  if (icode != CODE_FOR_nothing)
>>> +  if (icode != CODE_FOR_nothing
>>> +      /* Don't consider [BH]Fmode as usable wider mode, as neither is
>>> +         a subset or superset of the other.  */
>>> +      && (compare_mode == mode
>>> +          || !SCALAR_FLOAT_MODE_P (compare_mode)
>>> +          || maybe_ne (GET_MODE_PRECISION (compare_mode),
>>> +                       GET_MODE_PRECISION (mode))))
>>
>> Why do you need to do this here (and in prepare_cmp_insn, and similarly in
>> can_compare_p)?  Shouldn't get_wider skip over modes that are not actually
>> wider?
>
> I'm afraid too many places rely on all modes of a certain class to be
> visible when walking from "narrowest" to "widest" mode, say
> FOR_EACH_MODE_IN_CLASS/FOR_EACH_MODE/FOR_EACH_MODE_UNTIL/FOR_EACH_WIDER_MODE
> etc. wouldn't work at all if GET_MODE_WIDER_MODE (BFmode) == SFmode
> && GET_MODE_WIDER_MODE (HFmode) == SFmode.

Yes, it seems they need to change now that their assumptions have been
violated.  I suppose FOR_EACH_MODE_IN_CLASS would need to change to not
use get_wider, and users of FOR_EACH_MODE/FOR_EACH_MODE_UNTIL need to
decide whether they want an iteration that uses get_wider (likely with a
new name) or not.

> Note, besides this GET_MODE_PRECISION (HFmode) == GET_MODE_PRECISION (BFmode)
> case, another set of modes which have the same size are powerpc*
> TFmode/IFmode/KFmode, but in that case it makes ugly hacks where it
> artificially lowers the precision of 2 of them:
> rs6000-modes.h:#define FLOAT_PRECISION_IFmode 128
> rs6000-modes.h:#define FLOAT_PRECISION_TFmode 127
> rs6000-modes.h:#define FLOAT_PRECISION_KFmode 126
> (and the middle-end then has to work around that mess).  Doing something
> similar wouldn't help the BFmode vs. HFmode case though, one of them would
> have wider precision and so e.g. C FE would then prefer it, but more
> importantly, as they are unordered modes where most of the optabs aren't
> implemented it is bad to pick optabs for the "wider" mode to handle the
> "narrower" one.  I think powerpc works because they define optabs for
> all the 3 modes when those modes are usable.
>
>     Jakub
>
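
Two of the points quoted above (that 0x7f81 is a signaling NaN pattern for
__bf16, and that a bfloat16 ulp is much larger than a float ulp, which is
why std::nextafter needs something bfloat16-specific) can be checked with
plain integer arithmetic.  The following is only a minimal sketch and is
not taken from the patch: every helper name in it is hypothetical, it
models a bfloat16 value by its raw 16-bit pattern rather than using __bf16,
and it assumes the common convention that a set top mantissa bit marks a
quiet NaN.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; none of these names exist in
// GCC or libstdc++.  A bfloat16 value is modelled by its raw bit pattern:
// 1 sign bit, 8 exponent bits, 7 explicit mantissa bits.

constexpr std::uint16_t bf16_exp_mask  = 0x7f80;  // all exponent bits
constexpr std::uint16_t bf16_mant_mask = 0x007f;  // all mantissa bits
constexpr std::uint16_t bf16_quiet_bit = 0x0040;  // top mantissa bit (assumed quiet flag)

// NaN: exponent all ones and a non-zero mantissa.
constexpr bool bf16_is_nan (std::uint16_t bits)
{
  return (bits & bf16_exp_mask) == bf16_exp_mask
         && (bits & bf16_mant_mask) != 0;
}

// Signaling NaN: a NaN whose quiet bit is clear.
constexpr bool bf16_is_signaling_nan (std::uint16_t bits)
{
  return bf16_is_nan (bits) && (bits & bf16_quiet_bit) == 0;
}

// bfloat16 has the layout of the upper half of an IEEE binary32, so
// widening the bit pattern is a 16-bit left shift.
constexpr std::uint32_t bf16_to_float_bits (std::uint16_t bits)
{
  return static_cast<std::uint32_t> (bits) << 16;
}

// Next representable bfloat16 above a positive, finite, non-maximal value.
constexpr std::uint16_t bf16_next_up (std::uint16_t bits)
{
  return static_cast<std::uint16_t> (bits + 1);
}

// How many float steps fit into one bfloat16 step at the given value.
constexpr std::uint32_t float_ulps_per_bf16_ulp (std::uint16_t bits)
{
  return bf16_to_float_bits (bf16_next_up (bits)) - bf16_to_float_bits (bits);
}

// Compile-time checks, usable in a constexpr context unlike the union trick.
static_assert (bf16_is_signaling_nan (0x7f81), "0x7f81 is a signaling NaN");
static_assert (!bf16_is_signaling_nan (0x7fc0), "0x7fc0 is a quiet NaN");
static_assert (!bf16_is_nan (0x7f80), "0x7f80 is +infinity");
static_assert (float_ulps_per_bf16_ulp (0x3f80) == 65536,
               "one bfloat16 step at 1.0 spans 2^16 float steps");
```

The last assertion is the reason a float nextafter cannot simply be reused:
stepping by one float ulp from 1.0 and converting back would not land on
the adjacent bfloat16 value.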