From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id C39603858C50 for ; Thu, 4 Apr 2024 16:55:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C39603858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org C39603858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712249757; cv=none; b=actMSJ91NcBH1tEoaYfNNhTuVvlBai/lmQnK/Xhl3NeTbeumrYGGDAsC25UCP1Agexkkbq4S0D735ho544jg7xolt741cN4fumYR8W3TfASTC+uIkL0z3rpJf1+w31SXWjZL1aZCXo+Se624voVFNQNOtawboPPG+c9Zj3p21wU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1712249757; c=relaxed/simple; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; h=DKIM-Signature:MIME-Version:From:Date:Message-ID:Subject:To; b=Wfl0D4687FLdneEE1j1U2ujBr4HdNkdFEzx8UzZRkb/VYhidwHe9jg5o3gNjYJtBhLsupW0OqB6v36RfORGiIcLlxmUDIfo4COf2mOr9QNvNwB9rID+VuK/f33A7TydZONkI+8twMpATWzhaOSYwR85RapsqwJO7cKTZ63yCPDo= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1712249755; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; b=ZSusWI9Y3IXq9q/CrB/EEZcx3WcndvNmRSaihBBf4bvee1hHJBpQ3pOM7t9KSBKcTQqZue NaENzFIoWq1z3rDET2hUL+4a1a0AVUovITtoHhdCvJ9M7Mf/pXy49Ew2YADArHYmwqQIRb 1IpT6y9XYA9O42wTh8re28wDzZNBRQQ= Received: from mail-yb1-f199.google.com (mail-yb1-f199.google.com [209.85.219.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-259-as4UPWlVM5aie0PwtuwQ6Q-1; Thu, 04 Apr 2024 12:55:53 -0400 X-MC-Unique: as4UPWlVM5aie0PwtuwQ6Q-1 Received: by mail-yb1-f199.google.com with SMTP id 3f1490d57ef6-dbf216080f5so2023513276.1 for ; Thu, 04 Apr 2024 09:55:53 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712249753; x=1712854553; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=93SbBmE9WW0yedyuTDpGVRazlppCet47j8sVm3HNxog=; b=pVfHLNVomJGeFE8CuRXuYF7VMxrCj9LRtgOWieLBFUs/Q/wFimjsGy3k6H9KyLQL77 OWFF2v6L3NL64eyU+X4fx1iWjGanIkX3j13mkRe7H+FDfe7S+xg/RJ/izzvSU4bnZjhA BgRqNllKd1agCQ2UMJxMx9TCsIohWZ95OY/Tvt4SXXG7/L+H32XpwjeQOwuqy8EQAUIi viPwHTlCnWMCMa33ym7N7lAzDiZvT5gfRaPMcBYT6cwbFx3IgFqeyd/7HHBVntfEv9Te jBo67YZT5YxAoprE1qhW7MsX8dLkhVejPuy1KfAdnBMe5HENpIzuzA6lbzfbtAPQdj4y gAmA== X-Forwarded-Encrypted: i=1; AJvYcCXFwkHPt4msNxnyrKmsp3ETMizW5pnjz6mi6OI7lNL5U84x7DHa6mos4WPGagJthxDWHXwws7rW/XWTzhQXHshYLJ/ge7S97w== X-Gm-Message-State: AOJu0YwyCHWYQm2RiumOaQMCXKv2Ts9WaQjh94HMJnL9bAUOqD1RO+8i YABuMAytb2eFFQl7MkvdbOwhIKvnwSF1UP2s/eZYU++6EgZtTIbLZnPKvERyDKpKIbzC59V343G jOYOTYQh89C4/QIN6tIu9HZKRT+754WNCdxMqoof+M+ky5+JoFiAKz4Jpch4BP6m48/fhKQfh92 yzHKviyVvzLAqm01VV5nyJWJ0aSXaN5Q== X-Received: by 2002:a25:bb04:0:b0:dcc:1f6a:d755 with SMTP id z4-20020a25bb04000000b00dcc1f6ad755mr2998769ybg.39.1712249752742; Thu, 04 Apr 2024 09:55:52 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGJt1h1roF8IsxsAMasPISxu3wLTeR2FKcjnunpYP3tdAfym7rDyY8Cjc8WWgwnKTXXlwXFcZv2V/sTdcMpA5c= X-Received: by 2002:a25:bb04:0:b0:dcc:1f6a:d755 with SMTP id z4-20020a25bb04000000b00dcc1f6ad755mr2998758ybg.39.1712249752461; Thu, 04 Apr 2024 09:55:52 -0700 (PDT) MIME-Version: 1.0 References: <20240404153158.313297-1-jwakely@redhat.com> <19633866-184F-4E13-B05B-C3473946E2B9@googlemail.com> In-Reply-To: From: Jonathan Wakely Date: Thu, 4 Apr 2024 17:55:36 +0100 Message-ID: Subject: Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] To: Iain Sandoe Cc: "libstdc++" , GCC Patches X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely wrote: > > On Thu, 4 Apr 2024 at 16:40, Iain Sandoe wrote: > > > > > > > > > On 4 Apr 2024, at 16:29, Jonathan Wakely wrote: > > > > > > I would appreciate more eyes on this to confirm my conclusions about > > > negative int_type values, and the proposed fix, make sense. > > > > > > Tested x86_64-linux. > > > > > > -- >8 -- > > > > > > A negative value for the delim value passed to std::istream::ignore can > > > never match any character in the stream, because the comparison is done > > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never > > > returns negative values (except at EOF). The optimized version of > > > ignore for the std::istream specialization uses traits_type::find to > > > locate the delim character in the streambuf, which _can_ match a > > > negative delim on platforms where char is signed, but then we do another > > > comparison using eq_int_type which fails. The code then keeps looping > > > forever, with traits_type::find saying the character is present and > > > eq_int_type saying it's not. > > > > > > A possible fix would be to check with eq_int_type after a successful > > > find, to see whether we really have a match. However, that would be > > > suboptimal since we know that a negative delimiter will never match > > > using eq_int_type. So a better fix is to adjust the check at the top of > > > the function that handles delim==eof(), so that we treat all negative > > > delim values as equivalent to EOF. That way we don't bother using find > > > to search for something that will never match with eq_int_type. > > > > Is the corollary to this that a platform with signed chars can never use a > > negative value as a delimiter - since that we always be treated as EOF? > > That's what the C++ standard says (and is what libc++ does). > > The delimiter argument to ignore is an int_type, not a char. So > formally you should call it like: > > std::cin.ignore(n, std::istream::traits_type::to_int_type('a')); > > where to_int_type will cast to unsigned char and then to int, so that > no char can ever produce a negative value for that argument. > > If you happen to know that casting 'a' to unsigned char and then to > int doesn't change its value (because it's a 7-bit ASCII value), then > you can be lazy and do: > > std::cin.ignore(n, 'a'); > > That works fine. > > But if your delimiter character is the MSB set, *and* char is signed > on your platform, then you can't be lazy. The implicit conversion from > char to the stream's int_type is not the same as the result of calling > traits_type::to_int_type, and so these are NOT equivalent on a > platform with signed char: > std::cin.ignore(n, '\x80'); > std::cin.ignore(n, (unsigned char)'\x80'); > > The former is wrong, the latter is correct. > The former will never match a '\x80' in the stream, because the ignore > function will cast each char extracted from the stream to > (int)(unsigned char) and so never match -128. > > So the change to treat all negative values as EOF is just an > optimization. Since they can never match, there's no point searching > for them. Just skip n chars. And FWIW, I don't think libc++ does that optimization. They extract char-by-char and compare them, even though that can never match a negative value. So they get the same result as with my patch, but much slower ;-) The status quo is that libstdc++ loops forever given a negative delimiter. That's obviously wrong! We need to fix the buggy logic that loops forever, and the "correct" standard conforming fix means that we never match the negative value, because the chars are extracted from the stream as non-negative int_type values. As an optimization, we can skip the loop that keeps trying impossible matches, because we know it won't match. That's what my patch does. (It occurs to me now that we could also treat any delim value greater than 255 as EOF too, since that can't match either). Ulrich's suggestion is to allow the buggy user code to Just Work (for all cases except (char)-1 on a platform where char is signed). That is maybe the least surprising behaviour for users. On the other hand, I'm not sure we really want to support: cin.ignore(n, -10); cin.ignore(n, -999); cin.ignore(n, +999); What are these trying to do? Those are just nonsense arguments to this function. But if we try to make the testcase in the bug report Just Work, we end up also accepting (at least) the -10 case, because it's indistinguishable from the char '\xf6'. Depending how we do it, we might also make the other cases work, treating -999 as '\x19', and +999 as '\xe7'.