From: Adhemerval Zanella
To: Tom Honermann, Florian Weimer, Adhemerval Zanella via Libc-alpha
Cc: Joseph Myers
Date: Wed, 18 May 2022 14:39:31 -0300
Subject: Re: [PATCH 2/3]: C++20 P0482R6 and C2X N2653: Implement mbrtoc8, c8rtomb, char8_t
References: <57610f50-dd95-fd32-1102-5f1cda440891@honermann.net> <7c7e07a6-ebf0-71bc-cdc4-2f563b0e3b0f@linaro.org> <87h75ng7ye.fsf@oldenburg.str.redhat.com> <6b3dd2a0-80b9-a1bd-b157-c6f92cdf7fb2@linaro.org>
List-Id: Libc-alpha mailing list

On 18/05/2022 14:26, Tom Honermann wrote:
> On 5/18/22 12:17 PM, Adhemerval Zanella wrote:
>> On 18/05/2022 12:32, Tom Honermann wrote:
>>> On 5/17/22 5:33 PM, Florian Weimer wrote:
>>>> * Adhemerval Zanella via Libc-alpha:
>>>>
>>>>> On 17/05/2022 15:12, Joseph Myers wrote:
>>>>>> On Tue, 17 May 2022, Adhemerval Zanella wrote:
>>>>>>
>>>>>>>> +/* This is the private state used if PS is NULL.  */
>>>>>>>> +static mbstate_t state;
>>>>>>> Although it was done for other conversion interfaces, I wonder if we should
>>>>>>> keep supporting this mt-unsafe usage for newer ones.  It was not clear from
>>>>>> In C23 it's implementation-defined whether the internal state for such
>>>>>> functions has static or thread storage duration (see the general
>>>>>> introduction to the uchar.h functions).
>>>>>>
>>>>> Right, so glibc still needs to support either mode.
>>>> No, I think we can and should switch.  Maybe with new symbol versions
>>>> (but the same implementation) if we want to play it conservative.
>>>>
>>>> The intent in POSIX and C has been for a long time that thread-local
>>>> state is permitted for these functions, only the wording did not
>>>> technically allow it.  The phrase was “not required to avoid data
>>>> races”.  The problem is that it's possible to tell the difference
>>>> without data races (with external synchronization).  Many libcs already
>>>> use thread-local state for these functions, without any apparent ill
>>>> effects.
>>> I would err on the side of maintaining consistency across these functions for now and then transition them all at once if there is a desire to do so.
>> Do we need to keep distinct states for each one or can we use a shared
>> one?
>
> Each is required to maintain its own state. The relevant wording from C17 7.28.1 (Restartable multibyte/wide character conversion functions) paragraph 1 states:
>
> These functions have a parameter, ps, of type pointer to mbstate_t that points to an object that can completely describe the current conversion state of the associated multibyte character sequence, which the functions alter as necessary. *If ps is a null pointer, each function uses its own internal mbstate_t object instead, which is initialized at program startup to the initial conversion state; the functions are not required to avoid data races with other calls to the same function in this case.
> The implementation behaves as if no library function calls these functions with a null pointer for ps.*

Yeah, I forgot that I read this very sentence just yesterday while reviewing your patch.

So it is currently 88 bytes of per-thread state (104 counting mbrtoc8 and c8rtomb) if we make it thread-local. We could maybe lazily allocate the whole wcsmbs state, so that we keep just a pointer per thread. We already have an internal scheme to allocate in the TCB (tls-internal.h), so it does not exhaust the static TLS allocation.
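
To make the lazy-allocation idea a bit more concrete, here is a rough sketch of what I have in mind. It is not glibc code: struct wcsmbs_states, get_thread_states and my_mbrtowc are hypothetical names, it uses plain C11 _Thread_local instead of the tls-internal.h scheme, and it lists only two of the per-function states.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical container: one private state per conversion function,
   since each function must use its own internal mbstate_t.  */
struct wcsmbs_states
{
  mbstate_t mbrtowc_state;
  mbstate_t wcrtomb_state;
  /* ... one member for each remaining function ...  */
};

/* Only this pointer is kept in thread-local storage.  */
static _Thread_local struct wcsmbs_states *thread_states;

/* Return the calling thread's states, allocating them on first use.
   calloc zero-fills the block, and a zero-valued mbstate_t describes
   the initial conversion state.  Returns NULL if allocation fails.  */
static struct wcsmbs_states *
get_thread_states (void)
{
  if (thread_states == NULL)
    thread_states = calloc (1, sizeof *thread_states);
  return thread_states;
}

/* Hypothetical wrapper showing how a NULL PS could be resolved to the
   per-thread state before calling the real conversion function.  */
size_t
my_mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps)
{
  if (ps == NULL)
    {
      struct wcsmbs_states *st = get_thread_states ();
      if (st == NULL)
        return (size_t) -1;
      ps = &st->mbrtowc_state;
    }
  return mbrtowc (pwc, s, n, ps);
}
```

The sketch never frees the block, so a real version would still need to release it at thread exit, and it would have to decide what to report when the allocation fails.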