From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 0DA0B3858C98 for ; Thu, 4 Apr 2024 16:55:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0DA0B3858C98 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 0DA0B3858C98 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712249757; cv=none; b=V3iRfsTvzaalCiAlR7l/TQRuG3sJjhaZ6G3FbUgj9nEQkxO/OGJjOl0GpPcLtEwOEKcs1jd4lNsccDsRzgwkkNdKARK+sAO1B5LMDHQM2iyW1ljM1Qn/9ANjlyCJmtqiLSyfztP/9c6prbAHVLlfpJ7hNvvIMw0tHfLsNLXUXCI= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712249757; c=relaxed/simple; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=xP92EW229XmKP0TYenfdHKWEUpDdBBTMeScVSdSu8fmIvWOGHpl55Bj4uTxAmISNyx7uwkEgkFMRDMhmBVGujHQBrjURCbfPtWYVAoS9tc5eufC7LDHl5m852P+aKmXU+vcyPp4tcmm5pSjgFk3yhTz9HkxIpJJBDaFNtl277s8= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712249754; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; b=N/s1pBw+x6Bs8zc83GHdx+QzkQCyDFhr7v1SLM/vnqYqZIuMw8fU918KvyVeLrozdU3hxC +oG0RyKVRYkTvRJp7+WQxz4O8XESx6fkssCcN4neBZ8kbHPHi+OEriudvNylaKZIIyktr6 M5X225SHxUKclEikDPptbWbTGAboozs= Received: from mail-yb1-f198.google.com (mail-yb1-f198.google.com [209.85.219.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-259-OcTeFipxMD6P5u9CO3v0Hw-1; Thu, 04 Apr 2024 12:55:53 -0400 X-MC-Unique: OcTeFipxMD6P5u9CO3v0Hw-1 Received: by mail-yb1-f198.google.com with SMTP id 3f1490d57ef6-dc6dbdcfd39so2163828276.2 for ; Thu, 04 Apr 2024 09:55:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712249753; x=1712854553; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; b=OyneHi09t7LCbd2qtUQx5JRV43TFTxa7RRKKgG3EwJJHHXD3UY3Z34U/lFaXeq+eyg pIfaA0DmaS+MyMc91wF1ZtpYT814+WgSi5fqO33qwKjpvYtuonTxENjea/hsGDQScoCo dkQTSbXw1e0PtKsF5nYvcLdqhHtrw1zkHImOJMhVwuHxH9Rxq+9ATcvWPKR2kOf8GnZT vCFW11IqWYn/CGGKGyw/F+V8TxjTb9VCthakWl3PbUVk12Wdkx0p2YbzuhwQT7x3/QU6 SxR6rfpMgYmfgRpa/+QxS3ZJkIEV0d6K4nvssNiUsA6pFhXfjELu9+BIOl42u+xIIVJa /ZKw== X-Gm-Message-State: AOJu0YwhM/SxGcShGP6qhN2T8EoG+Lr6/hm97bU6mxykPrXIGbwrEUr8 +xnF2m7oX2LnwI7nJXz5aKblF4KeQLeIzVQfUb5g+T103VHROf5a2gVPOfGcdgcqG5+wC5Sdj2u 4AdGY+KcOYjGATIEhnqj9gmokTMEV7CRbybs3Kq5hV1Fs81322AypRx0sDLd3HOafyeVXdFKaoI aR4Ze/5S0lbXJ/0JFdMLIMdp+2z2ZJ0BKUutxB/g== X-Received: by 2002:a25:bb04:0:b0:dcc:1f6a:d755 with SMTP id z4-20020a25bb04000000b00dcc1f6ad755mr2998771ybg.39.1712249752749; Thu, 04 Apr 2024 09:55:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGJt1h1roF8IsxsAMasPISxu3wLTeR2FKcjnunpYP3tdAfym7rDyY8Cjc8WWgwnKTXXlwXFcZv2V/sTdcMpA5c= X-Received: by 2002:a25:bb04:0:b0:dcc:1f6a:d755 with SMTP id z4-20020a25bb04000000b00dcc1f6ad755mr2998758ybg.39.1712249752461; Thu, 04 Apr 2024 09:55:52 -0700 (PDT) MIME-Version: 1.0 References: <20240404153158.313297-1-jwakely@redhat.com> <19633866-184F-4E13-B05B-C3473946E2B9@googlemail.com> In-Reply-To: From: Jonathan Wakely Date: Thu, 4 Apr 2024 17:55:36 +0100 Message-ID: Subject: Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] To: Iain Sandoe Cc: "libstdc++" , GCC Patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely wrote: > > On Thu, 4 Apr 2024 at 16:40, Iain Sandoe wrote: > > > > > > > > > On 4 Apr 2024, at 16:29, Jonathan Wakely wrote: > > > > > > I would appreciate more eyes on this to confirm my conclusions about > > > negative int_type values, and the proposed fix, make sense. > > > > > > Tested x86_64-linux. > > > > > > -- >8 -- > > > > > > A negative value for the delim value passed to std::istream::ignore can > > > never match any character in the stream, because the comparison is done > > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never > > > returns negative values (except at EOF). The optimized version of > > > ignore for the std::istream specialization uses traits_type::find to > > > locate the delim character in the streambuf, which _can_ match a > > > negative delim on platforms where char is signed, but then we do another > > > comparison using eq_int_type which fails. The code then keeps looping > > > forever, with traits_type::find saying the character is present and > > > eq_int_type saying it's not. > > > > > > A possible fix would be to check with eq_int_type after a successful > > > find, to see whether we really have a match. However, that would be > > > suboptimal since we know that a negative delimiter will never match > > > using eq_int_type. So a better fix is to adjust the check at the top of > > > the function that handles delim==eof(), so that we treat all negative > > > delim values as equivalent to EOF. That way we don't bother using find > > > to search for something that will never match with eq_int_type. > > > > Is the corollary to this that a platform with signed chars can never use a > > negative value as a delimiter - since that we always be treated as EOF? > > That's what the C++ standard says (and is what libc++ does). > > The delimiter argument to ignore is an int_type, not a char. So > formally you should call it like: > > std::cin.ignore(n, std::istream::traits_type::to_int_type('a')); > > where to_int_type will cast to unsigned char and then to int, so that > no char can ever produce a negative value for that argument. > > If you happen to know that casting 'a' to unsigned char and then to > int doesn't change its value (because it's a 7-bit ASCII value), then > you can be lazy and do: > > std::cin.ignore(n, 'a'); > > That works fine. > > But if your delimiter character is the MSB set, *and* char is signed > on your platform, then you can't be lazy. The implicit conversion from > char to the stream's int_type is not the same as the result of calling > traits_type::to_int_type, and so these are NOT equivalent on a > platform with signed char: > std::cin.ignore(n, '\x80'); > std::cin.ignore(n, (unsigned char)'\x80'); > > The former is wrong, the latter is correct. > The former will never match a '\x80' in the stream, because the ignore > function will cast each char extracted from the stream to > (int)(unsigned char) and so never match -128. > > So the change to treat all negative values as EOF is just an > optimization. Since they can never match, there's no point searching > for them. Just skip n chars. And FWIW, I don't think libc++ does that optimization. They extract char-by-char and compare them, even though that can never match a negative value. So they get the same result as with my patch, but much slower ;-) The status quo is that libstdc++ loops forever given a negative delimiter. That's obviously wrong! We need to fix the buggy logic that loops forever, and the "correct" standard conforming fix means that we never match the negative value, because the chars are extracted from the stream as non-negative int_type values. As an optimization, we can skip the loop that keeps trying impossible matches, because we know it won't match. That's what my patch does. (It occurs to me now that we could also treat any delim value greater than 255 as EOF too, since that can't match either). Ulrich's suggestion is to allow the buggy user code to Just Work (for all cases except (char)-1 on a platform where char is signed). That is maybe the least surprising behaviour for users. On the other hand, I'm not sure we really want to support: cin.ignore(n, -10); cin.ignore(n, -999); cin.ignore(n, +999); What are these trying to do? Those are just nonsense arguments to this function. But if we try to make the testcase in the bug report Just Work, we end up also accepting (at least) the -10 case, because it's indistinguishable from the char '\xf6'. Depending how we do it, we might also make the other cases work, treating -999 as '\x19', and +999 as '\xe7'.