From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id D48803851ABC for ; Mon, 5 Jun 2023 12:26:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D48803851ABC Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1685967999; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hCE/MNXdKvAXG4JsvmtsqVBgKmFzj63PS1KRxkm8qPM=; b=AkTzLGpoBS+pU2vjjP4D1IXJWTFE1sCdNf3Bau/UaLDf9Ro7u91dN+Sdwzzs7KBOkmsU3M 5hKvSofb85JemavtShlT7KTFiovV+YRtI5nPldqPvvHZOumEdf3LlvPteXs6yF8rn2K+Et gjes8pM2Bw8baFfaE5ITI3pEhEjdIk8= Received: from mail-wm1-f70.google.com (mail-wm1-f70.google.com [209.85.128.70]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-404-41HYv07NPjKvUMTgh5A6pA-1; Mon, 05 Jun 2023 08:26:38 -0400 X-MC-Unique: 41HYv07NPjKvUMTgh5A6pA-1 Received: by mail-wm1-f70.google.com with SMTP id 5b1f17b1804b1-3f5fc8581a9so22793435e9.0 for ; Mon, 05 Jun 2023 05:26:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685967997; x=1688559997; h=mime-version:message-id:date:references:in-reply-to:subject:cc:to :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=hCE/MNXdKvAXG4JsvmtsqVBgKmFzj63PS1KRxkm8qPM=; b=B1ouMd0pK7HkekmcJGDcEl5gYKg/r4z8S/uWwv4gLk35lb5m1cT0kmGF/FuJDuy4co v5H7e7p5GqV05IyDRXbULiuG/6SCv79d6w7wdGT4XUz9M4V21QEog6tgqAjoQwujL5hj Qt+7bPn67Cinmb2CEU9GHkhTFKkUA/eC2PGXi7GfBGhDMEy6+oQK4qTLLULQedCwu4o4 scGwgQXclSWz5DdHp4ba+ZR/9zDhodoG+JAmSMz1pgbB8/s5/rCw1IB7c/lKCz2Qhpcg Eq2OwOXQLslgy4vSXcmydVhjLyEldO9SMlRImtO5LeOfu/vwiT4A1Jnay1+zXLd8+qQx DWsQ== X-Gm-Message-State: AC+VfDzS2KBZxGaRbzePr76AfjstHBq14RNWJ4OWq6c6WVezD9ah+nUO rOVKhJOjt5ebej/VMoSbj/x1W0K4mOlrv+ikemBw7ayf+ukYKEUK3rEjK561N3ZwWXu5aSL7CVM INCeNZpYaaW2r/4qnZtHcePnPYhurLg== X-Received: by 2002:a7b:cc12:0:b0:3f6:6c0:7c9b with SMTP id f18-20020a7bcc12000000b003f606c07c9bmr6262374wmh.15.1685967997199; Mon, 05 Jun 2023 05:26:37 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6HqVTp7MLamYHK9q0gN7Ym2+G9qJq+nRLLuD6iBXCtatnc9srWlLRnxNBGXDfL9KLoq1bFpA== X-Received: by 2002:a7b:cc12:0:b0:3f6:6c0:7c9b with SMTP id f18-20020a7bcc12000000b003f606c07c9bmr6262364wmh.15.1685967996799; Mon, 05 Jun 2023 05:26:36 -0700 (PDT) Received: from localhost (11.72.115.87.dyn.plus.net. [87.115.72.11]) by smtp.gmail.com with ESMTPSA id b3-20020a5d4d83000000b0030c4d8930b1sm9620463wru.91.2023.06.05.05.26.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Jun 2023 05:26:36 -0700 (PDT) From: Andrew Burgess To: Simon Marchi , gdb-patches@sourceware.org Cc: Pedro Alves , Simon Marchi Subject: Re: [PATCHv3] gdb: building inferior strings from within GDB In-Reply-To: References: <75628df1fbd72166504c9b2a368170121b86e072.1680849242.git.aburgess@redhat.com> Date: Mon, 05 Jun 2023 13:26:35 +0100 Message-ID: <878rcy3xj8.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Simon Marchi writes: > On 5/24/23 10:10, Andrew Burgess wrote: >> History Of This Patch >> ===================== >> >> This commit aims to address PR gdb/21699. There have now been a >> couple of attempts to fix this issue. Simon originally posted two >> patches back in 2021: >> >> https://sourceware.org/pipermail/gdb-patches/2021-July/180894.html >> https://sourceware.org/pipermail/gdb-patches/2021-July/180896.html >> >> Before Pedro then posted a version of his own: >> >> https://sourceware.org/pipermail/gdb-patches/2021-July/180970.html >> >> After this the conversation halted. Then in 2023 I (Andrew) also took >> a look at this bug and posted two versions: >> >> https://sourceware.org/pipermail/gdb-patches/2023-April/198570.html >> https://sourceware.org/pipermail/gdb-patches/2023-April/198680.html >> >> The approach taken in my first patch was pretty similar to what Simon >> originally posted back in 2021. My second attempt was only a slight >> variation on the first. >> >> Pedro then pointed out his older patch, and so we arrive at this >> patch. The GDB changes here are mostly Pedro's work, but updated by >> me (Andrew), any mistakes are mine. >> >> The tests here are a combinations of everyone's work, and the commit >> message is new, but copies bits from everyone's earlier work. >> >> Problem Description >> =================== >> >> Bug PR gdb/21699 makes the observation that using $_as_string with >> GDB's printf can cause GDB to print unexpected data from the >> inferior. The reproducer is pretty simple: >> >> #include >> static char arena[100]; >> >> /* Override malloc() so value_coerce_to_target() gets a known >> pointer, and we know we"ll see an error if $_as_string() gives >> a string that isn't null terminated. */ >> void >> *malloc (size_t size) >> { >> memset (arena, 'x', sizeof (arena)); >> if (size > sizeof (arena)) >> return NULL; >> return arena; >> } >> >> int >> main () >> { >> return 0; >> } >> >> And then in a GDB session: >> >> $ gdb -q test >> Reading symbols from /tmp/test... >> (gdb) start >> Temporary breakpoint 1 at 0x4004c8: file test.c, line 17. >> Starting program: /tmp/test >> >> Temporary breakpoint 1, main () at test.c:17 >> 17 return 0; >> (gdb) printf "%s\n", $_as_string("hello") >> "hello"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx >> (gdb) quit >> >> The problem above is caused by how value_cstring is used within >> py-value.c, but once we understand the issue then it turns out that >> value_cstring is used in an unexpected way in many places within GDB. >> >> Within py-value.c we have a null-terminated C-style string. We then >> pass a pointer to this string, along with the length of this >> string (so not including the null-character) to value_cstring. >> >> In value_cstring GDB allocates an array value of the given character >> type, and copies in requested number of characters. However >> value_cstring does not add a null-character of its own. This means >> that the value created by calling value_cstring is only >> null-terminated if the null-character is included in the passed in >> length. In py-value.c this is not the case, and indeed, in most uses >> of value_cstring, this is not the case. >> >> When GDB tries to print one of these strings the value contents are >> pushed to the inferior, and then read back as a C-style string, that >> is, GDB reads inferior memory until it finds a null-terminator. For >> the py-value.c case, no null-terminator is pushed into the inferior, >> so GDB will continue reading inferior memory until a null-terminator >> is found, with unpredictable results. >> >> Patch Description >> ================= >> >> The first thing this patch does is better define what the arguments >> for the two function value_cstring and value_string should represent. >> The comments in the header file are updated to describe whether the >> length argument should, or should not, include a null-character. >> Also, the data argument is changed to type gdb_byte. The functions as >> they currently exist will handle wide-characters, in which case more >> than one 'char' would be needed for each character. As such using >> gdb_byte seems to make more sense. >> >> To avoid adding casts throughout GDB, I've also added an overload that >> still takes a 'char *', but asserts that the character type being used >> is of size '1'. >> >> The value_cstring function is now responsible for adding a null >> character at the end of the string value it creates. >> >> However, once we start looking at how value_cstring is used, we >> realise there's another, related, problem. Not every language's >> strings are null terminated. Fortran and Ada strings, for example, >> are just an array of characters, GDB already has the function >> value_string which can be used to create such values. >> >> Consider this example using current GDB: >> >> (gdb) set language ada >> (gdb) p $_gdb_setting("arch") >> $1 = (97, 117, 116, 111) >> (gdb) ptype $ >> type = array (1 .. 4) of char >> (gdb) p $_gdb_maint_setting("test-settings string") >> $2 = (0) >> (gdb) ptype $ >> type = array (1 .. 1) of char >> >> This shows two problems, first, the $_gdb_setting and >> $_gdb_maint_setting functions are calling value_cstring using the >> builtin_char character, rather than a language appropriate type. In >> the first call, the 'arch' case, the value_cstring call doesn't >> include the null character, so the returned array only contains the >> expected characters. But, in the $_gdb_maint_setting example we do >> end up including the null-character, even though this is not expected >> for Ada strings. >> >> This commit adds a new language method language_defn::value_string, >> this function takes a pointer and length and creates a language >> appropriate value that represents the string. For C, C++, etc this >> will be a null-terminated string (by calling value_cstring), and for >> Fortran and Ada this can be a bounded array of characters with no null >> terminator. Additionally, this new language_defn::value_string >> function is responsible for selecting a language appropriate character >> type. >> >> After this commit the only calls to value_cstring are from the C >> expression evaluator and from the default language_defn::value_string. >> >> And the only calls to value_string are from Fortan, Ada, and ObjectC >> related code. >> >> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=21699 >> >> Co-Authored-By: Simon Marchi >> Co-Authored-By: Andrew Burgess >> Co-Authored-By: Pedro Alves > > This LGTM: > > Approved-By: Simon Marchi > > Just one small question: > >> diff --git a/gdb/cli/cli-cmds.c b/gdb/cli/cli-cmds.c >> index b7b65303a0b..3fa833d596c 100644 >> --- a/gdb/cli/cli-cmds.c >> +++ b/gdb/cli/cli-cmds.c >> @@ -2316,11 +2316,9 @@ value_from_setting (const setting &var, struct gdbarch *gdbarch) >> } >> >> if (len > 0) >> - return value_cstring (value, len, >> - builtin_type (gdbarch)->builtin_char); >> + return current_language->value_string (gdbarch, value, len); >> else >> - return value_cstring ("", 1, >> - builtin_type (gdbarch)->builtin_char); >> + return current_language->value_string (gdbarch, "", 0); > > Do we still need the two calls, or can we just do with the first (even > when len is 0)? Same idea for str_value_from_setting. You are right. I merged these two calls, and the other two in str_value_from_setting, and pushed this patch. Thanks, Andrew