Skip to content

Revert "[Clang] [Diagnostics] Simplify filenames that contain '..'" #148367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 12, 2025

Conversation

Sirraide
Copy link
Member

@Sirraide Sirraide commented Jul 12, 2025

Revert #143520 for now since it’s causing issues for people who are using symlinks and prefer to preserve the original path (i.e. looks like we’ll have to make this configurable after all; I just need to figure out how to pass -no-canonical-prefixes down through the driver); I’m planning to refactor this a bit and reland it in a few days.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang-tools-extra clang-tidy clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jul 12, 2025
@Sirraide Sirraide merged commit 7b43c6c into main Jul 12, 2025
12 of 14 checks passed
@llvmbot
Copy link
Member

llvmbot commented Jul 12, 2025

@llvm/pr-subscribers-clang

Author: None (Sirraide)

Changes

Revert llvm/llvm-project#143520 for now since it’s causing issues for people who are using symlinks and prefer to preserve the original path (i.e. looks like we’ll have to make this configurable after all; I just need to figure out how to pass -no-canonical-prefixes down through the driver); I’m planning to refactor this a bit and reland it in a few days.


Full diff: https://github.com/llvm/llvm-project/pull/148367.diff

7 Files Affected:

  • (modified) clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp (+3-4)
  • (modified) clang/include/clang/Basic/SourceManager.h (-16)
  • (modified) clang/lib/Basic/SourceManager.cpp (-72)
  • (modified) clang/lib/Frontend/SARIFDiagnostic.cpp (+30-1)
  • (modified) clang/lib/Frontend/TextDiagnostic.cpp (+33-1)
  • (modified) clang/test/Frontend/absolute-paths.c (+3-3)
  • (removed) clang/test/Frontend/simplify-paths.c (-18)
diff --git a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
index 30bcdc3f87743..7efa7d070f69f 100644
--- a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
+++ b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
@@ -12,9 +12,8 @@
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' -quiet %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 
-// `-header-filter` operates on the actual file path that the user provided in
-// the #include directive; however, Clang's path name simplification causes the
-// path to be printed in canonicalised form here.
+// Check that `-header-filter` operates on the same file paths as paths in
+// diagnostics printed by ClangTidy.
 #include "dir1/dir2/../header_alias.h"
-// CHECK_HEADER_ALIAS: dir1/header.h:1:11: warning: single-argument constructors
+// CHECK_HEADER_ALIAS: dir1/dir2/../header_alias.h:1:11: warning: single-argument constructors
 // CHECK_HEADER-NOT: warning:
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 00da67fba669c..ed967fd47dc83 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -824,12 +824,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
 
   mutable std::unique_ptr<SrcMgr::SLocEntry> FakeSLocEntryForRecovery;
 
-  /// Cache for filenames used in diagnostics. See 'getNameForDiagnostic()'.
-  mutable llvm::StringMap<StringRef> DiagNames;
-
-  /// Allocator for absolute/short names.
-  mutable llvm::BumpPtrAllocator DiagNameAlloc;
-
   /// Lazily computed map of macro argument chunks to their expanded
   /// source location.
   using MacroArgsMap = std::map<unsigned, SourceLocation>;
@@ -1854,16 +1848,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
   /// \return Location of the top-level macro caller.
   SourceLocation getTopMacroCallerLoc(SourceLocation Loc) const;
 
-  /// Retrieve the name of a file suitable for diagnostics.
-  // FIXME: Passing in the DiagnosticOptions here is a workaround for the
-  // fact that installapi does some weird things with DiagnosticsEngines,
-  // which causes the 'Diag' member of SourceManager (or at least the
-  // DiagnosticsOptions member thereof) to be a dangling reference
-  // sometimes. We should probably fix that or decouple the two classes
-  // to avoid this issue entirely.
-  StringRef getNameForDiagnostic(StringRef Filename,
-                                 const DiagnosticOptions &Opts) const;
-
 private:
   friend class ASTReader;
   friend class ASTWriter;
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index e6b61302cc294..b2b1488f9dc8e 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -2390,75 +2390,3 @@ SourceManagerForFile::SourceManagerForFile(StringRef FileName,
   assert(ID.isValid());
   SourceMgr->setMainFileID(ID);
 }
-
-StringRef
-SourceManager::getNameForDiagnostic(StringRef Filename,
-                                    const DiagnosticOptions &Opts) const {
-  OptionalFileEntryRef File = getFileManager().getOptionalFileRef(Filename);
-  if (!File)
-    return Filename;
-
-  bool SimplifyPath = [&] {
-    if (Opts.AbsolutePath)
-      return true;
-
-    // Try to simplify paths that contain '..' in any case since paths to
-    // standard library headers especially tend to get quite long otherwise.
-    // Only do that for local filesystems though to avoid slowing down
-    // compilation too much.
-    if (!File->getName().contains(".."))
-      return false;
-
-    // If we're not on Windows, check if we're on a network file system and
-    // avoid simplifying the path in that case since that can be slow. On
-    // Windows, the check for a local filesystem is already slow, so skip it.
-#ifndef _WIN32
-    if (!llvm::sys::fs::is_local(File->getName()))
-      return false;
-#endif
-
-    return true;
-  }();
-
-  if (!SimplifyPath)
-    return Filename;
-
-  // This may involve computing canonical names, so cache the result.
-  StringRef &CacheEntry = DiagNames[Filename];
-  if (!CacheEntry.empty())
-    return CacheEntry;
-
-  // We want to print a simplified absolute path, i. e. without "dots".
-  //
-  // The hardest part here are the paths like "<part1>/<link>/../<part2>".
-  // On Unix-like systems, we cannot just collapse "<link>/..", because
-  // paths are resolved sequentially, and, thereby, the path
-  // "<part1>/<part2>" may point to a different location. That is why
-  // we use FileManager::getCanonicalName(), which expands all indirections
-  // with llvm::sys::fs::real_path() and caches the result.
-  //
-  // On the other hand, it would be better to preserve as much of the
-  // original path as possible, because that helps a user to recognize it.
-  // real_path() expands all links, which sometimes too much. Luckily,
-  // on Windows we can just use llvm::sys::path::remove_dots(), because,
-  // on that system, both aforementioned paths point to the same place.
-  SmallString<256> TempBuf;
-#ifdef _WIN32
-  TempBuf = File->getName();
-  llvm::sys::fs::make_absolute(TempBuf);
-  llvm::sys::path::native(TempBuf);
-  llvm::sys::path::remove_dots(TempBuf, /* remove_dot_dot */ true);
-#else
-  TempBuf = getFileManager().getCanonicalName(*File);
-#endif
-
-  // In some cases, the resolved path may actually end up being longer (e.g.
-  // if it was originally a relative path), so just retain whichever one
-  // ends up being shorter.
-  if (!Opts.AbsolutePath && TempBuf.size() > Filename.size())
-    CacheEntry = Filename;
-  else
-    CacheEntry = TempBuf.str().copy(DiagNameAlloc);
-
-  return CacheEntry;
-}
diff --git a/clang/lib/Frontend/SARIFDiagnostic.cpp b/clang/lib/Frontend/SARIFDiagnostic.cpp
index f9714018f7b46..ac27d7480de3e 100644
--- a/clang/lib/Frontend/SARIFDiagnostic.cpp
+++ b/clang/lib/Frontend/SARIFDiagnostic.cpp
@@ -163,7 +163,36 @@ SARIFDiagnostic::addDiagnosticLevelToRule(SarifRule Rule,
 
 llvm::StringRef SARIFDiagnostic::emitFilename(StringRef Filename,
                                               const SourceManager &SM) {
-  return SM.getNameForDiagnostic(Filename, DiagOpts);
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which is sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      SmallString<256> TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  return Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index e5bf86791c44a..ccdd59da89bd1 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -738,7 +738,39 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
 }
 
 void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
-  OS << SM.getNameForDiagnostic(Filename, DiagOpts);
+#ifdef _WIN32
+  SmallString<4096> TmpFilename;
+#endif
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  OS << Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/test/Frontend/absolute-paths.c b/clang/test/Frontend/absolute-paths.c
index e507f0d4c34b3..e06cf262dd8e2 100644
--- a/clang/test/Frontend/absolute-paths.c
+++ b/clang/test/Frontend/absolute-paths.c
@@ -8,9 +8,9 @@
 
 #include "absolute-paths.h"
 
-// Check that the bogus prefix we added is stripped out even if absolute paths
-// are disabled.
-// NORMAL-NOT: SystemHeaderPrefix
+// Check whether the diagnostic from the header above includes the dummy
+// directory in the path.
+// NORMAL: SystemHeaderPrefix
 // ABSOLUTE-NOT: SystemHeaderPrefix
 // CHECK: warning: non-void function does not return a value
 
diff --git a/clang/test/Frontend/simplify-paths.c b/clang/test/Frontend/simplify-paths.c
deleted file mode 100644
index 90af7c4ee3836..0000000000000
--- a/clang/test/Frontend/simplify-paths.c
+++ /dev/null
@@ -1,18 +0,0 @@
-// REQUIRES: shell
-
-// RUN: rm -rf %t
-// RUN: mkdir -p %t/a/b/
-// RUN: echo 'void x;' > %t/test.h
-// RUN: echo 'const void x;' > %t/header_with_a_really_long_name.h
-// RUN: ln -s %t/header_with_a_really_long_name.h %t/a/shorter_name.h
-//
-// RUN: %clang_cc1 -fsyntax-only -I %t %s 2> %t/output.txt || true
-// RUN: cat %t/output.txt | FileCheck %s
-
-// Check that we strip '..' by canonicalising the path...
-#include "a/b/../../test.h"
-// CHECK: simplify-paths.c.tmp/test.h:1:6: error: variable has incomplete type 'void'
-
-// ... but only if the resulting path is actually shorter.
-#include "a/b/../shorter_name.h"
-// CHECK: simplify-paths.c.tmp/a/b/../shorter_name.h:1:12: error: variable has incomplete type 'const void'

@Sirraide Sirraide deleted the revert-143520-perf/simplify-paths branch July 12, 2025 13:13
@llvmbot
Copy link
Member

llvmbot commented Jul 12, 2025

@llvm/pr-subscribers-clang-tools-extra

Author: None (Sirraide)

Changes

Revert llvm/llvm-project#143520 for now since it’s causing issues for people who are using symlinks and prefer to preserve the original path (i.e. looks like we’ll have to make this configurable after all; I just need to figure out how to pass -no-canonical-prefixes down through the driver); I’m planning to refactor this a bit and reland it in a few days.


Full diff: https://github.com/llvm/llvm-project/pull/148367.diff

7 Files Affected:

  • (modified) clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp (+3-4)
  • (modified) clang/include/clang/Basic/SourceManager.h (-16)
  • (modified) clang/lib/Basic/SourceManager.cpp (-72)
  • (modified) clang/lib/Frontend/SARIFDiagnostic.cpp (+30-1)
  • (modified) clang/lib/Frontend/TextDiagnostic.cpp (+33-1)
  • (modified) clang/test/Frontend/absolute-paths.c (+3-3)
  • (removed) clang/test/Frontend/simplify-paths.c (-18)
diff --git a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
index 30bcdc3f87743..7efa7d070f69f 100644
--- a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
+++ b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
@@ -12,9 +12,8 @@
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' -quiet %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 
-// `-header-filter` operates on the actual file path that the user provided in
-// the #include directive; however, Clang's path name simplification causes the
-// path to be printed in canonicalised form here.
+// Check that `-header-filter` operates on the same file paths as paths in
+// diagnostics printed by ClangTidy.
 #include "dir1/dir2/../header_alias.h"
-// CHECK_HEADER_ALIAS: dir1/header.h:1:11: warning: single-argument constructors
+// CHECK_HEADER_ALIAS: dir1/dir2/../header_alias.h:1:11: warning: single-argument constructors
 // CHECK_HEADER-NOT: warning:
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 00da67fba669c..ed967fd47dc83 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -824,12 +824,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
 
   mutable std::unique_ptr<SrcMgr::SLocEntry> FakeSLocEntryForRecovery;
 
-  /// Cache for filenames used in diagnostics. See 'getNameForDiagnostic()'.
-  mutable llvm::StringMap<StringRef> DiagNames;
-
-  /// Allocator for absolute/short names.
-  mutable llvm::BumpPtrAllocator DiagNameAlloc;
-
   /// Lazily computed map of macro argument chunks to their expanded
   /// source location.
   using MacroArgsMap = std::map<unsigned, SourceLocation>;
@@ -1854,16 +1848,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
   /// \return Location of the top-level macro caller.
   SourceLocation getTopMacroCallerLoc(SourceLocation Loc) const;
 
-  /// Retrieve the name of a file suitable for diagnostics.
-  // FIXME: Passing in the DiagnosticOptions here is a workaround for the
-  // fact that installapi does some weird things with DiagnosticsEngines,
-  // which causes the 'Diag' member of SourceManager (or at least the
-  // DiagnosticsOptions member thereof) to be a dangling reference
-  // sometimes. We should probably fix that or decouple the two classes
-  // to avoid this issue entirely.
-  StringRef getNameForDiagnostic(StringRef Filename,
-                                 const DiagnosticOptions &Opts) const;
-
 private:
   friend class ASTReader;
   friend class ASTWriter;
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index e6b61302cc294..b2b1488f9dc8e 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -2390,75 +2390,3 @@ SourceManagerForFile::SourceManagerForFile(StringRef FileName,
   assert(ID.isValid());
   SourceMgr->setMainFileID(ID);
 }
-
-StringRef
-SourceManager::getNameForDiagnostic(StringRef Filename,
-                                    const DiagnosticOptions &Opts) const {
-  OptionalFileEntryRef File = getFileManager().getOptionalFileRef(Filename);
-  if (!File)
-    return Filename;
-
-  bool SimplifyPath = [&] {
-    if (Opts.AbsolutePath)
-      return true;
-
-    // Try to simplify paths that contain '..' in any case since paths to
-    // standard library headers especially tend to get quite long otherwise.
-    // Only do that for local filesystems though to avoid slowing down
-    // compilation too much.
-    if (!File->getName().contains(".."))
-      return false;
-
-    // If we're not on Windows, check if we're on a network file system and
-    // avoid simplifying the path in that case since that can be slow. On
-    // Windows, the check for a local filesystem is already slow, so skip it.
-#ifndef _WIN32
-    if (!llvm::sys::fs::is_local(File->getName()))
-      return false;
-#endif
-
-    return true;
-  }();
-
-  if (!SimplifyPath)
-    return Filename;
-
-  // This may involve computing canonical names, so cache the result.
-  StringRef &CacheEntry = DiagNames[Filename];
-  if (!CacheEntry.empty())
-    return CacheEntry;
-
-  // We want to print a simplified absolute path, i. e. without "dots".
-  //
-  // The hardest part here are the paths like "<part1>/<link>/../<part2>".
-  // On Unix-like systems, we cannot just collapse "<link>/..", because
-  // paths are resolved sequentially, and, thereby, the path
-  // "<part1>/<part2>" may point to a different location. That is why
-  // we use FileManager::getCanonicalName(), which expands all indirections
-  // with llvm::sys::fs::real_path() and caches the result.
-  //
-  // On the other hand, it would be better to preserve as much of the
-  // original path as possible, because that helps a user to recognize it.
-  // real_path() expands all links, which sometimes too much. Luckily,
-  // on Windows we can just use llvm::sys::path::remove_dots(), because,
-  // on that system, both aforementioned paths point to the same place.
-  SmallString<256> TempBuf;
-#ifdef _WIN32
-  TempBuf = File->getName();
-  llvm::sys::fs::make_absolute(TempBuf);
-  llvm::sys::path::native(TempBuf);
-  llvm::sys::path::remove_dots(TempBuf, /* remove_dot_dot */ true);
-#else
-  TempBuf = getFileManager().getCanonicalName(*File);
-#endif
-
-  // In some cases, the resolved path may actually end up being longer (e.g.
-  // if it was originally a relative path), so just retain whichever one
-  // ends up being shorter.
-  if (!Opts.AbsolutePath && TempBuf.size() > Filename.size())
-    CacheEntry = Filename;
-  else
-    CacheEntry = TempBuf.str().copy(DiagNameAlloc);
-
-  return CacheEntry;
-}
diff --git a/clang/lib/Frontend/SARIFDiagnostic.cpp b/clang/lib/Frontend/SARIFDiagnostic.cpp
index f9714018f7b46..ac27d7480de3e 100644
--- a/clang/lib/Frontend/SARIFDiagnostic.cpp
+++ b/clang/lib/Frontend/SARIFDiagnostic.cpp
@@ -163,7 +163,36 @@ SARIFDiagnostic::addDiagnosticLevelToRule(SarifRule Rule,
 
 llvm::StringRef SARIFDiagnostic::emitFilename(StringRef Filename,
                                               const SourceManager &SM) {
-  return SM.getNameForDiagnostic(Filename, DiagOpts);
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which is sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      SmallString<256> TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  return Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index e5bf86791c44a..ccdd59da89bd1 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -738,7 +738,39 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
 }
 
 void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
-  OS << SM.getNameForDiagnostic(Filename, DiagOpts);
+#ifdef _WIN32
+  SmallString<4096> TmpFilename;
+#endif
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  OS << Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/test/Frontend/absolute-paths.c b/clang/test/Frontend/absolute-paths.c
index e507f0d4c34b3..e06cf262dd8e2 100644
--- a/clang/test/Frontend/absolute-paths.c
+++ b/clang/test/Frontend/absolute-paths.c
@@ -8,9 +8,9 @@
 
 #include "absolute-paths.h"
 
-// Check that the bogus prefix we added is stripped out even if absolute paths
-// are disabled.
-// NORMAL-NOT: SystemHeaderPrefix
+// Check whether the diagnostic from the header above includes the dummy
+// directory in the path.
+// NORMAL: SystemHeaderPrefix
 // ABSOLUTE-NOT: SystemHeaderPrefix
 // CHECK: warning: non-void function does not return a value
 
diff --git a/clang/test/Frontend/simplify-paths.c b/clang/test/Frontend/simplify-paths.c
deleted file mode 100644
index 90af7c4ee3836..0000000000000
--- a/clang/test/Frontend/simplify-paths.c
+++ /dev/null
@@ -1,18 +0,0 @@
-// REQUIRES: shell
-
-// RUN: rm -rf %t
-// RUN: mkdir -p %t/a/b/
-// RUN: echo 'void x;' > %t/test.h
-// RUN: echo 'const void x;' > %t/header_with_a_really_long_name.h
-// RUN: ln -s %t/header_with_a_really_long_name.h %t/a/shorter_name.h
-//
-// RUN: %clang_cc1 -fsyntax-only -I %t %s 2> %t/output.txt || true
-// RUN: cat %t/output.txt | FileCheck %s
-
-// Check that we strip '..' by canonicalising the path...
-#include "a/b/../../test.h"
-// CHECK: simplify-paths.c.tmp/test.h:1:6: error: variable has incomplete type 'void'
-
-// ... but only if the resulting path is actually shorter.
-#include "a/b/../shorter_name.h"
-// CHECK: simplify-paths.c.tmp/a/b/../shorter_name.h:1:12: error: variable has incomplete type 'const void'

@llvmbot
Copy link
Member

llvmbot commented Jul 12, 2025

@llvm/pr-subscribers-clang-tidy

Author: None (Sirraide)

Changes

Revert llvm/llvm-project#143520 for now since it’s causing issues for people who are using symlinks and prefer to preserve the original path (i.e. looks like we’ll have to make this configurable after all; I just need to figure out how to pass -no-canonical-prefixes down through the driver); I’m planning to refactor this a bit and reland it in a few days.


Full diff: https://github.com/llvm/llvm-project/pull/148367.diff

7 Files Affected:

  • (modified) clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp (+3-4)
  • (modified) clang/include/clang/Basic/SourceManager.h (-16)
  • (modified) clang/lib/Basic/SourceManager.cpp (-72)
  • (modified) clang/lib/Frontend/SARIFDiagnostic.cpp (+30-1)
  • (modified) clang/lib/Frontend/TextDiagnostic.cpp (+33-1)
  • (modified) clang/test/Frontend/absolute-paths.c (+3-3)
  • (removed) clang/test/Frontend/simplify-paths.c (-18)
diff --git a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
index 30bcdc3f87743..7efa7d070f69f 100644
--- a/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
+++ b/clang-tools-extra/test/clang-tidy/infrastructure/file-filter-symlinks.cpp
@@ -12,9 +12,8 @@
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 // RUN: clang-tidy -checks='-*,google-explicit-constructor' -header-filter='header\.h' -quiet %s -- -I %t 2>&1 | FileCheck --check-prefix=CHECK_HEADER %s
 
-// `-header-filter` operates on the actual file path that the user provided in
-// the #include directive; however, Clang's path name simplification causes the
-// path to be printed in canonicalised form here.
+// Check that `-header-filter` operates on the same file paths as paths in
+// diagnostics printed by ClangTidy.
 #include "dir1/dir2/../header_alias.h"
-// CHECK_HEADER_ALIAS: dir1/header.h:1:11: warning: single-argument constructors
+// CHECK_HEADER_ALIAS: dir1/dir2/../header_alias.h:1:11: warning: single-argument constructors
 // CHECK_HEADER-NOT: warning:
diff --git a/clang/include/clang/Basic/SourceManager.h b/clang/include/clang/Basic/SourceManager.h
index 00da67fba669c..ed967fd47dc83 100644
--- a/clang/include/clang/Basic/SourceManager.h
+++ b/clang/include/clang/Basic/SourceManager.h
@@ -824,12 +824,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
 
   mutable std::unique_ptr<SrcMgr::SLocEntry> FakeSLocEntryForRecovery;
 
-  /// Cache for filenames used in diagnostics. See 'getNameForDiagnostic()'.
-  mutable llvm::StringMap<StringRef> DiagNames;
-
-  /// Allocator for absolute/short names.
-  mutable llvm::BumpPtrAllocator DiagNameAlloc;
-
   /// Lazily computed map of macro argument chunks to their expanded
   /// source location.
   using MacroArgsMap = std::map<unsigned, SourceLocation>;
@@ -1854,16 +1848,6 @@ class SourceManager : public RefCountedBase<SourceManager> {
   /// \return Location of the top-level macro caller.
   SourceLocation getTopMacroCallerLoc(SourceLocation Loc) const;
 
-  /// Retrieve the name of a file suitable for diagnostics.
-  // FIXME: Passing in the DiagnosticOptions here is a workaround for the
-  // fact that installapi does some weird things with DiagnosticsEngines,
-  // which causes the 'Diag' member of SourceManager (or at least the
-  // DiagnosticsOptions member thereof) to be a dangling reference
-  // sometimes. We should probably fix that or decouple the two classes
-  // to avoid this issue entirely.
-  StringRef getNameForDiagnostic(StringRef Filename,
-                                 const DiagnosticOptions &Opts) const;
-
 private:
   friend class ASTReader;
   friend class ASTWriter;
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp
index e6b61302cc294..b2b1488f9dc8e 100644
--- a/clang/lib/Basic/SourceManager.cpp
+++ b/clang/lib/Basic/SourceManager.cpp
@@ -2390,75 +2390,3 @@ SourceManagerForFile::SourceManagerForFile(StringRef FileName,
   assert(ID.isValid());
   SourceMgr->setMainFileID(ID);
 }
-
-StringRef
-SourceManager::getNameForDiagnostic(StringRef Filename,
-                                    const DiagnosticOptions &Opts) const {
-  OptionalFileEntryRef File = getFileManager().getOptionalFileRef(Filename);
-  if (!File)
-    return Filename;
-
-  bool SimplifyPath = [&] {
-    if (Opts.AbsolutePath)
-      return true;
-
-    // Try to simplify paths that contain '..' in any case since paths to
-    // standard library headers especially tend to get quite long otherwise.
-    // Only do that for local filesystems though to avoid slowing down
-    // compilation too much.
-    if (!File->getName().contains(".."))
-      return false;
-
-    // If we're not on Windows, check if we're on a network file system and
-    // avoid simplifying the path in that case since that can be slow. On
-    // Windows, the check for a local filesystem is already slow, so skip it.
-#ifndef _WIN32
-    if (!llvm::sys::fs::is_local(File->getName()))
-      return false;
-#endif
-
-    return true;
-  }();
-
-  if (!SimplifyPath)
-    return Filename;
-
-  // This may involve computing canonical names, so cache the result.
-  StringRef &CacheEntry = DiagNames[Filename];
-  if (!CacheEntry.empty())
-    return CacheEntry;
-
-  // We want to print a simplified absolute path, i. e. without "dots".
-  //
-  // The hardest part here are the paths like "<part1>/<link>/../<part2>".
-  // On Unix-like systems, we cannot just collapse "<link>/..", because
-  // paths are resolved sequentially, and, thereby, the path
-  // "<part1>/<part2>" may point to a different location. That is why
-  // we use FileManager::getCanonicalName(), which expands all indirections
-  // with llvm::sys::fs::real_path() and caches the result.
-  //
-  // On the other hand, it would be better to preserve as much of the
-  // original path as possible, because that helps a user to recognize it.
-  // real_path() expands all links, which sometimes too much. Luckily,
-  // on Windows we can just use llvm::sys::path::remove_dots(), because,
-  // on that system, both aforementioned paths point to the same place.
-  SmallString<256> TempBuf;
-#ifdef _WIN32
-  TempBuf = File->getName();
-  llvm::sys::fs::make_absolute(TempBuf);
-  llvm::sys::path::native(TempBuf);
-  llvm::sys::path::remove_dots(TempBuf, /* remove_dot_dot */ true);
-#else
-  TempBuf = getFileManager().getCanonicalName(*File);
-#endif
-
-  // In some cases, the resolved path may actually end up being longer (e.g.
-  // if it was originally a relative path), so just retain whichever one
-  // ends up being shorter.
-  if (!Opts.AbsolutePath && TempBuf.size() > Filename.size())
-    CacheEntry = Filename;
-  else
-    CacheEntry = TempBuf.str().copy(DiagNameAlloc);
-
-  return CacheEntry;
-}
diff --git a/clang/lib/Frontend/SARIFDiagnostic.cpp b/clang/lib/Frontend/SARIFDiagnostic.cpp
index f9714018f7b46..ac27d7480de3e 100644
--- a/clang/lib/Frontend/SARIFDiagnostic.cpp
+++ b/clang/lib/Frontend/SARIFDiagnostic.cpp
@@ -163,7 +163,36 @@ SARIFDiagnostic::addDiagnosticLevelToRule(SarifRule Rule,
 
 llvm::StringRef SARIFDiagnostic::emitFilename(StringRef Filename,
                                               const SourceManager &SM) {
-  return SM.getNameForDiagnostic(Filename, DiagOpts);
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which is sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      SmallString<256> TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  return Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index e5bf86791c44a..ccdd59da89bd1 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -738,7 +738,39 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
 }
 
 void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
-  OS << SM.getNameForDiagnostic(Filename, DiagOpts);
+#ifdef _WIN32
+  SmallString<4096> TmpFilename;
+#endif
+  if (DiagOpts.AbsolutePath) {
+    auto File = SM.getFileManager().getOptionalFileRef(Filename);
+    if (File) {
+      // We want to print a simplified absolute path, i. e. without "dots".
+      //
+      // The hardest part here are the paths like "<part1>/<link>/../<part2>".
+      // On Unix-like systems, we cannot just collapse "<link>/..", because
+      // paths are resolved sequentially, and, thereby, the path
+      // "<part1>/<part2>" may point to a different location. That is why
+      // we use FileManager::getCanonicalName(), which expands all indirections
+      // with llvm::sys::fs::real_path() and caches the result.
+      //
+      // On the other hand, it would be better to preserve as much of the
+      // original path as possible, because that helps a user to recognize it.
+      // real_path() expands all links, which sometimes too much. Luckily,
+      // on Windows we can just use llvm::sys::path::remove_dots(), because,
+      // on that system, both aforementioned paths point to the same place.
+#ifdef _WIN32
+      TmpFilename = File->getName();
+      llvm::sys::fs::make_absolute(TmpFilename);
+      llvm::sys::path::native(TmpFilename);
+      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
+      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+#else
+      Filename = SM.getFileManager().getCanonicalName(*File);
+#endif
+    }
+  }
+
+  OS << Filename;
 }
 
 /// Print out the file/line/column information and include trace.
diff --git a/clang/test/Frontend/absolute-paths.c b/clang/test/Frontend/absolute-paths.c
index e507f0d4c34b3..e06cf262dd8e2 100644
--- a/clang/test/Frontend/absolute-paths.c
+++ b/clang/test/Frontend/absolute-paths.c
@@ -8,9 +8,9 @@
 
 #include "absolute-paths.h"
 
-// Check that the bogus prefix we added is stripped out even if absolute paths
-// are disabled.
-// NORMAL-NOT: SystemHeaderPrefix
+// Check whether the diagnostic from the header above includes the dummy
+// directory in the path.
+// NORMAL: SystemHeaderPrefix
 // ABSOLUTE-NOT: SystemHeaderPrefix
 // CHECK: warning: non-void function does not return a value
 
diff --git a/clang/test/Frontend/simplify-paths.c b/clang/test/Frontend/simplify-paths.c
deleted file mode 100644
index 90af7c4ee3836..0000000000000
--- a/clang/test/Frontend/simplify-paths.c
+++ /dev/null
@@ -1,18 +0,0 @@
-// REQUIRES: shell
-
-// RUN: rm -rf %t
-// RUN: mkdir -p %t/a/b/
-// RUN: echo 'void x;' > %t/test.h
-// RUN: echo 'const void x;' > %t/header_with_a_really_long_name.h
-// RUN: ln -s %t/header_with_a_really_long_name.h %t/a/shorter_name.h
-//
-// RUN: %clang_cc1 -fsyntax-only -I %t %s 2> %t/output.txt || true
-// RUN: cat %t/output.txt | FileCheck %s
-
-// Check that we strip '..' by canonicalising the path...
-#include "a/b/../../test.h"
-// CHECK: simplify-paths.c.tmp/test.h:1:6: error: variable has incomplete type 'void'
-
-// ... but only if the resulting path is actually shorter.
-#include "a/b/../shorter_name.h"
-// CHECK: simplify-paths.c.tmp/a/b/../shorter_name.h:1:12: error: variable has incomplete type 'const void'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category clang-tidy clang-tools-extra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants