Date: Tue, 11 Jan 2022 13:10:45 +0000
From: Andrew Burgess
To: Simon Marchi
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH 4/4] gdb/python: handle non utf-8 characters when source highlighting
Message-ID: <20220111131045.GI622389@redhat.com>
In-Reply-To: <09a27370-a759-1f81-9db6-38ad2fd97ccd@polymtl.ca>

* Simon Marchi via Gdb-patches [2022-01-10 10:32:02 -0500]:

> > Unfortunately it's not as simple as bytes in bytes out. See:
> >
> >   https://pygments.org/docs/unicode/?highlight=encoding
> >
> > In summary, Pygments uses unicode internally, but has some logic for
> > guessing the encoding of the incoming bytes. This logic is better (I
> > claim) than GDB's hard-coded use of UTF-8. The link above outlines how
> > the guess is done in more detail.
> >
> > Pygments always returns a unicode object, which is one of the reasons
> > I have GDB handle both bytes and unicode being returned from the
> > colorize API. We could always make the API more restricted, and insist
> > on a bytes object being returned; this would just require us to
> > convert the output of Pygments to bytes before returning to GDB.
>
> Ok, so when does "colorize" return bytes?

(1) Python 2 (for now), and (2) Never, unless a user overrides
gdb.colorize.

Thanks,
Andrew
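
For illustration only, here is a minimal Python 3 sketch of the "accept both bytes and unicode" handling discussed above. It is not the code from the patch: the helper name to_bytes and its encoding parameter are hypothetical, and the choice of UTF-8 with backslashreplace is an assumption made for the example.

```python
def to_bytes(result, encoding="utf-8"):
    """Normalize a colorize-style return value to bytes.

    Hypothetical helper, not part of GDB.  It accepts the three kinds of
    value mentioned in the thread: None (no highlighting was done), bytes
    (for example from a user override), or str (what Pygments produces
    under Python 3).
    """
    if result is None:
        # Nothing was highlighted; pass that through unchanged.
        return None
    if isinstance(result, bytes):
        # Already bytes: return as-is, without guessing at an encoding.
        return result
    if isinstance(result, str):
        # Unicode text: encode it.  backslashreplace keeps the call from
        # raising on characters the target encoding cannot represent.
        return result.encode(encoding, errors="backslashreplace")
    raise TypeError("expected None, bytes or str, got %s"
                    % type(result).__name__)
```

With a helper like this, a restricted bytes-only API would only need one extra call on whatever Pygments returns, while a user override that already returns bytes would pass through untouched.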