From: Lewis Hyatt
To: gcc-patches@gcc.gnu.org
Cc: Lewis Hyatt, David Malcolm
Subject: [PATCH 0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens
Date: Fri, 4 Nov 2022 09:44:08 -0400

Hello-

In the past couple years there has been a ton of progress in fixing bugs
related to _Pragma, especially its use in the type of macros that many
projects like to implement for manipulating GCC diagnostic pragmas more
easily. For GCC 13 I have been going through the remaining open PRs,
fixing a couple and adding testcases for several that were already fixed.
I felt that made it a good time to overhaul one of the last remaining
issues with _Pragma processing, which is that we do not currently assign
good locations to the tokens involved. The locations are very important,
however, because that is how GCC diagnostic pragmas will ultimately
determine whether a given warning should or should not apply at a given
point.
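For anyone less familiar with the idiom, the kind of macro I mean looks
roughly like the sketch below. DO_PRAGMA follows the pattern shown in the
GCC manual; the WARN_PUSH_IGNORED and WARN_POP names and the answer()
function are made up for illustration and are not taken from any
particular project or from this patch set.

```c
/* Stringize the argument and hand it to _Pragma, so that a pragma can be
   produced by a macro (a plain #pragma cannot appear in a macro body). */
#define DO_PRAGMA(x) _Pragma(#x)

/* Hypothetical convenience wrappers; the names are illustrative only. */
#define WARN_PUSH_IGNORED(w)            \
    DO_PRAGMA(GCC diagnostic push)      \
    DO_PRAGMA(GCC diagnostic ignored w)
#define WARN_POP DO_PRAGMA(GCC diagnostic pop)

int answer(void)
{
    WARN_PUSH_IGNORED("-Wunused-variable")
    int scratch = 0; /* unused on purpose; the region above is meant to
                        silence -Wunused-variable for this declaration */
    WARN_POP
    return 42;
}
```

Each of the three pragmas here comes out of a _Pragma string produced by
macro expansion, which is exactly the case where the tokens need sensible
locations for the push/ignored/pop region to cover the declaration as
intended.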
Currently, the tokens inside a _Pragma string are all assigned the same
location as the _Pragma token itself, which is sufficient to make
diagnostic pragmas work correctly. It does produce somewhat inferior
diagnostics, though, since we do not point the user to which part of the
_Pragma string caused the problem; and if the _Pragma string was expanded
from a macro, we do not even point them to the string at all. Further, the
assignment of the fake location to the tokens inside the _Pragma string
takes place after all the tokens have been lexed -- consequently, if a
diagnostic is issued by libcpp during that process, it doesn't benefit
from the patched-up location and instead uses a bogus location.

As a quick example, compiling:

=====
_Pragma("GCC diagnostic ignored \"oops")
=====

produces:

=====
file:1:24: warning: missing terminating " character
    1 | _Pragma("GCC diagnostic ignored \"oops")
      |                        ^
=====

It is surprisingly involved to make that caret point to something
reasonable. The reason it points to the middle of nowhere is that the
current implementation of _Pragma in directives.cc:destringize_and_run()
does not touch the line_maps instance at all, and so does not inform it
where the tokens are coming from. But the line_maps API in fact does not
provide any way to handle this case, so this needs to be added first.

With all the changes in this patch set, we would output instead:

======
In buffer generated from file:1:
:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"oops")
      | ^~~~~~~
======

Treating the _Pragma like a macro expansion makes everything consistent
and solves a ton of problems; all the locations involved will just make
sense from the user's point of view.

Patches 1-3 are tiny bug fixes that I came across while working on the
new testcases.
I was a bit surprised that #1 and #3 especially did not have PRs open,
but I guess these small glitches have gone unnoticed so far.

Patch 4 is the largest one. It adds a new reason=LC_GEN for ordinary line
maps. These maps are just like normal ones, except the file name pointer
points not to a file name, but to the actual data in memory instead. This
is how we can issue diagnostics for code that did not appear in the user's
input, such as the de-stringized _Pragma string. The changes needed in
libcpp to support this concept are pretty small and straightforward. Most
of the changes outside of libcpp are in input.cc and
diagnostic-show-locus.cc, which need to learn how to obtain code from
LC_GEN maps. A lot of the remaining changes are in selftests that are
pretty sensitive to the internal implementation.

Patch 5 is a continuation of 4 that supports LC_GEN maps in less commonly
used places, such as the new SARIF output format, that also need to know
how to read source back from in-memory buffers in addition to files.

Patch 6 updates the implementation of _Pragma handling to use LC_GEN maps
and to create virtual locations for the tokens as in the example above. I
have also added support for the argument of the _Pragma to be a raw
string, as requested by PR83473, since this was easy to do while I was
there.

1/6: diagnostics: Fix macro tracking for ad-hoc locations
2/6: diagnostics: Use an inline function rather than hardcoding string
3/6: libcpp: Fix paste error with unknown pragma after macro expansion
4/6: diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
5/6: diagnostics: Support generated data in additional contexts
6/6: diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

Bootstrap and regtest all languages on x86-64 Linux looks good. I realize
it's near the end of stage 1 now, but I would still appreciate it very
much if this patch set could get reviewed.
For GCC 13, there have been several _Pragma-related bugs fixed
(especially PR53431), and addressing this location issue would tie it
together nicely.

Thanks very much!

-Lewis