public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <richard.guenther@gmail.com>
To: Jonathan Wakely <jwakely@redhat.com>
Cc: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libstdc++: Support link chains in std::chrono::tzdb::locate_zone [PR114770]
Date: Fri, 19 Apr 2024 08:14:15 +0200	[thread overview]
Message-ID: <CAFiYyc1dmm4H+pxZPTdkY+EK-iKWWPoKpMcrOurw1yoaRJ2E-A@mail.gmail.com> (raw)
In-Reply-To: <20240418163319.1279591-1-jwakely@redhat.com>

On Thu, Apr 18, 2024 at 6:34 PM Jonathan Wakely <jwakely@redhat.com> wrote:
>
> This would fix the but, how do people feel about it this close to the
> gcc-14 release?

Guess we'll have to fix it anyway, so why not now ... (what could go wrong..)

Richard.

> Tested x86_64-linux.
>
> -- >8 --
>
> Since 2022 the TZif format defined in the zic(8) man page has said that
> links can refer to other links, rather than only referring to a zone.
> This isn't supported by the C++20 spec, which assumes that the target()
> for a chrono::time_zone_link always names a chrono::time_zone, not
> another chrono::time_zone_link.
>
> This hasn't been a problem until now, because there are no entries in
> the tzdata file that chain links together. However, Debian Sid has
> changed the target of the Asia/Chungking link from the Asia/Shanghai
> zone to the Asia/Chongqing link, creating a link chain. The libstdc++
> code is unable to handle this, so chrono::locate_zone("Asia/Chungking")
> will fail with the tzdata.zi file from Debian Sid.
>
> It seems likely that the C++ spec will need a change to allow link
> chains, so that the original structure of the IANA database can be fully
> represented by chrono::tzdb. The alternative would be for chrono::tzdb
> to flatten all chains when loading the data, so that a link's target is
> always a zone, but this means throwing away information present in the
> tzdata.zi input file.
>
> In anticipation of a change to the spec, this commit adds support for
> chained links to libstdc++. When a name is found to be a link, we try to
> find its target in the list of zones as before, but now if the target
> isn't the name of a zone we don't fail. Instead we look for another link
> with that name, and keep doing that until we reach the end of the chain
> of links, and then look up the last target as a zone.
>
> This new logic would get stuck in a loop if the tzdata.zi file is buggy
> and defines a link chain that contains a cycle, e.g. two links that
> refer to each other. To deal with that unlikely case, we use the
> tortoise and hare algorithm to detect cycles in link chains, and throw
> an exception if we detect a cycle. Cycles in links should never happen,
> and it is expected that link chains will be short (if they occur at all)
> and so the code is optimized for short chains without cycles. Longer
> chains (four or more links) and cycles will do more work, but won't fail
> to resolve a chain or get stuck in a loop.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/114770
>         * src/c++20/tzdb.cc (do_locate_zone): Support links that have
>         another link as their target.
>         * testsuite/std/time/tzdb/links.cc: New test.
> ---
>  libstdc++-v3/src/c++20/tzdb.cc                |  57 ++++-
>  libstdc++-v3/testsuite/std/time/tzdb/links.cc | 215 ++++++++++++++++++
>  2 files changed, 268 insertions(+), 4 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/std/time/tzdb/links.cc
>
> diff --git a/libstdc++-v3/src/c++20/tzdb.cc b/libstdc++-v3/src/c++20/tzdb.cc
> index 639d1c440ba..c7c7cc9deee 100644
> --- a/libstdc++-v3/src/c++20/tzdb.cc
> +++ b/libstdc++-v3/src/c++20/tzdb.cc
> @@ -1599,7 +1599,7 @@ namespace std::chrono
>      const time_zone*
>      do_locate_zone(const vector<time_zone>& zones,
>                    const vector<time_zone_link>& links,
> -                  string_view tz_name) noexcept
> +                  string_view tz_name)
>      {
>        // Lambda mangling changed between -fabi-version=2 and -fabi-version=18
>        auto search = []<class Vec>(const Vec& v, string_view name) {
> @@ -1610,13 +1610,62 @@ namespace std::chrono
>         return ptr;
>        };
>
> +      // Search zones first.
>        if (auto tz = search(zones, tz_name))
>         return tz;
>
> +      // Search links second.
>        if (auto tz_l = search(links, tz_name))
> -       return search(zones, tz_l->target());
> +       {
> +         // Handle the common case of a link that has a zone as the target.
> +         if (auto tz = search(zones, tz_l->target())) [[likely]]
> +           return tz;
>
> -      return nullptr;
> +         // Either tz_l->target() doesn't exist, or we have a chain of links.
> +         // Use Floyd's cycle-finding algorithm to avoid infinite loops,
> +         // at the cost of extra lookups. In the common case we expect a
> +         // chain of links to be short so the loop won't run many times.
> +         // In particular, the duplicate lookups to move the tortoise
> +         // never happen unless the chain has four or more links.
> +         // When a chain contains a cycle we do multiple duplicate lookups,
> +         // but that case should never happen with correct tzdata.zi,
> +         // so there's no need to optimize cycle detection.
> +
> +         const time_zone_link* tortoise = tz_l;
> +         const time_zone_link* hare = search(links, tz_l->target());
> +         while (hare)
> +           {
> +             // Chains should be short, so first check if it ends here:
> +             if (auto tz = search(zones, hare->target())) [[likely]]
> +               return tz;
> +
> +             // Otherwise follow the chain:
> +             hare = search(links, hare->target());
> +             if (!hare)
> +               break;
> +
> +             // Again, first check if the chain ends at a zone here:
> +             if (auto tz = search(zones, hare->target())) [[likely]]
> +               return tz;
> +
> +             // Follow the chain again:
> +             hare = search(links, hare->target());
> +
> +             if (hare == tortoise)
> +               {
> +                 string_view err = "std::chrono::tzdb: link cycle: ";
> +                 string str;
> +                 str.reserve(err.size() + tz_name.size());
> +                 str += err;
> +                 str += tz_name;
> +                 __throw_runtime_error(str.c_str());
> +               }
> +             // Plod along the chain one step:
> +             tortoise = search(links, tortoise->target());
> +           }
> +       }
> +
> +      return nullptr; // not found
>      }
>    } // namespace
>
> @@ -1626,7 +1675,7 @@ namespace std::chrono
>    {
>      if (auto tz = do_locate_zone(zones, links, tz_name))
>        return tz;
> -    string_view err = "tzdb: cannot locate zone: ";
> +    string_view err = "std::chrono::tzdb: cannot locate zone: ";
>      string str;
>      str.reserve(err.size() + tz_name.size());
>      str += err;
> diff --git a/libstdc++-v3/testsuite/std/time/tzdb/links.cc b/libstdc++-v3/testsuite/std/time/tzdb/links.cc
> new file mode 100644
> index 00000000000..0ba214846c6
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/std/time/tzdb/links.cc
> @@ -0,0 +1,215 @@
> +// { dg-do run { target c++20 } }
> +// { dg-require-effective-target tzdb }
> +// { dg-require-effective-target cxx11_abi }
> +// { dg-xfail-run-if "no weak override on AIX" { powerpc-ibm-aix* } }
> +
> +#include <chrono>
> +#include <fstream>
> +#include <testsuite_hooks.h>
> +
> +static bool override_used = false;
> +
> +namespace __gnu_cxx
> +{
> +  const char* zoneinfo_dir_override() {
> +    override_used = true;
> +    return "./";
> +  }
> +}
> +
> +using namespace std::chrono;
> +
> +void
> +test_link_chains()
> +{
> +  std::ofstream("tzdata.zi") << R"(# version test_1
> +Link  Greenwich  G_M_T
> +Link  Etc/GMT    Greenwich
> +Zone  Etc/GMT  0  -  GMT
> +Zone  A_Zone   1  -  ZON
> +Link  A_Zone L1
> +Link  L1 L2
> +Link  L2 L3
> +Link  L3 L4
> +Link  L4 L5
> +Link  L5 L6
> +Link  L3 L7
> +)";
> +
> +  const auto& db = reload_tzdb();
> +  VERIFY( override_used ); // If this fails then XFAIL for the target.
> +  VERIFY( db.version == "test_1" );
> +
> +  // Simple case of a link with a zone as its target.
> +  VERIFY( locate_zone("Greenwich")->name() == "Etc/GMT" );
> +  // Chains of links, where the target may be another link.
> +  VERIFY( locate_zone("G_M_T")->name() == "Etc/GMT" );
> +  VERIFY( locate_zone("L1")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L2")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L3")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L4")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L5")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L6")->name() == "A_Zone" );
> +  VERIFY( locate_zone("L7")->name() == "A_Zone" );
> +}
> +
> +void
> +test_bad_links()
> +{
> +  // The zic(8) man page says
> +  // > the behavior is unspecified if multiple zone or link lines
> +  // > define the same name"
> +  // For libstdc++ the expected behaviour is described and tested below.
> +  std::ofstream("tzdata.zi") << R"(# version test_2
> +Zone A_Zone   1  -  ZA
> +Zone B_Zone   2  -  ZB
> +Link A_Zone B_Zone
> +Link B_Zone C_Link
> +Link C_Link D_Link
> +Link D_Link E_Link
> +)";
> +
> +  const auto& db2 = reload_tzdb();
> +  VERIFY( override_used ); // If this fails then XFAIL for the target.
> +  VERIFY( db2.version == "test_2" );
> +
> +  // The standard requires locate_zone(name) to search for a zone first,
> +  // so this finds the zone B_Zone, not the link that points to zone A_Zone.
> +  VERIFY( locate_zone("B_Zone")->name() == "B_Zone" );
> +  // And libstdc++ does the same at every step when following chained links:
> +  VERIFY( locate_zone("C_Link")->name() == "B_Zone" );
> +  VERIFY( locate_zone("D_Link")->name() == "B_Zone" );
> +  VERIFY( locate_zone("E_Link")->name() == "B_Zone" );
> +
> +  // The zic(8) man page says
> +  // > the behavior is unspecified if a chain of one or more links
> +  // > does not terminate in a Zone name.
> +  // For libstdc++ we throw std::runtime_error if locate_zone finds an
> +  // unterminated chain, including the case of a chain that includes a cycle.
> +  std::ofstream("tzdata.zi") << R"(# version test_3
> +Zone A_Zone   1  -  ZON
> +Link A_Zone GoodLink
> +Link No_Zone BadLink
> +Link LinkSelf LinkSelf
> +Link LinkSelf Link1
> +Link Link1 Link2
> +Link Cycle2_A Cycle2_B
> +Link Cycle2_B Cycle2_A
> +Link Cycle3_A Cycle3_B
> +Link Cycle3_B Cycle3_C
> +Link Cycle3_C Cycle3_A
> +Link Cycle3_C Cycle3_D
> +Link Cycle4_A Cycle4_B
> +Link Cycle4_B Cycle4_C
> +Link Cycle4_C Cycle4_D
> +Link Cycle4_D Cycle4_A
> +)";
> +
> +  const auto& db3 = reload_tzdb();
> +  VERIFY( db3.version == "test_3" );
> +
> +  // Lookup for valid links should still work even if other links are bad.
> +  VERIFY( locate_zone("GoodLink")->name() == "A_Zone" );
> +
> +#if __cpp_exceptions
> +  try {
> +    locate_zone("BadLink");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("cannot locate zone: BadLink") );
> +  }
> +
> +  // LinkSelf forms a link cycle with itself.
> +  try {
> +    locate_zone("LinkSelf");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: LinkSelf") );
> +  }
> +
> +  // Any chain that leads to LinkSelf reaches a cycle.
> +  try {
> +    locate_zone("Link1");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Link1") );
> +  }
> +
> +  try {
> +    locate_zone("Link2");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Link2") );
> +  }
> +
> +  // Cycle2_A and Cycle2_B form a cycle of length two.
> +  try {
> +    locate_zone("Cycle2_A");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle2_A") );
> +  }
> +
> +  try {
> +    locate_zone("Cycle2_B");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle2_B") );
> +  }
> +
> +  // Cycle3_A, Cycle3_B and Cycle3_C form a cycle of length three.
> +  try {
> +    locate_zone("Cycle3_A");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle3_A") );
> +  }
> +
> +  try {
> +    locate_zone("Cycle3_B");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle3_B") );
> +  }
> +
> +  try {
> +    locate_zone("Cycle3_C");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle3_C") );
> +  }
> +
> +  // Cycle3_D isn't part of the cycle, but it leads to it.
> +  try {
> +    locate_zone("Cycle3_D");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle3_D") );
> +  }
> +
> +  // Cycle4_* links form a cycle of length four.
> +  try {
> +    locate_zone("Cycle4_A");
> +    VERIFY( false );
> +  } catch (const std::runtime_error& e) {
> +    std::string_view what(e.what());
> +    VERIFY( what.ends_with("link cycle: Cycle4_A") );
> +  }
> +#endif
> +}
> +
> +int main()
> +{
> +  test_link_chains();
> +  test_bad_links();
> +}
> --
> 2.44.0
>

  parent reply	other threads:[~2024-04-19  6:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-18 16:32 Jonathan Wakely
2024-04-18 16:34 ` Jonathan Wakely
2024-04-19  6:14 ` Richard Biener [this message]
2024-04-19  9:08   ` Jonathan Wakely
2024-04-19 20:08     ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFiYyc1dmm4H+pxZPTdkY+EK-iKWWPoKpMcrOurw1yoaRJ2E-A@mail.gmail.com \
    --to=richard.guenther@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jwakely@redhat.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).