@hjyamauchi hjyamauchi commented Sep 22, 2025

Fix the following CAS-related test on Windows:

Clang :: ClangScanDeps/include-tree-working-directory.c

@hjyamauchi
Author

@swift-ci please test llvm

@hjyamauchi
Author

The CI run hit an existing build failure:

[2025-09-22T22:28:55.278Z] FAILED: [code=1] tools/clang/bindings/python/tests/CMakeFiles/check-clang-python /home/build-user/llvm-project/build/tools/clang/bindings/python/tests/CMakeFiles/check-clang-python 
[2025-09-22T22:28:55.278Z] cd /home/build-user/llvm-project/clang/bindings/python && /home/build-user/cmake/bin/cmake -E env CLANG_NO_DEFAULT_CONFIG=1 CLANG_LIBRARY_PATH=/home/build-user/llvm-project/build/lib /usr/bin/python3.10 -m unittest discover
[2025-09-22T22:28:55.278Z] ......................................................................................................................................................................
[2025-09-22T22:28:55.278Z] ======================================================================
[2025-09-22T22:28:55.278Z] FAIL: test_all_variants (tests.cindex.test_enums.TestEnums) [<enum 'CursorKind'>]
[2025-09-22T22:28:55.278Z] Check that all libclang enum values are also defined in cindex.py
[2025-09-22T22:28:55.278Z] ----------------------------------------------------------------------
[2025-09-22T22:28:55.278Z] Traceback (most recent call last):
[2025-09-22T22:28:55.278Z]   File "/home/build-user/llvm-project/clang/bindings/python/tests/cindex/test_enums.py", line 87, in test_all_variants
[2025-09-22T22:28:55.278Z]     self.assertEqual(
[2025-09-22T22:28:55.278Z] AssertionError: {'CXCursor_ForgePtrExpr', 'CXCursor_LastExpr'} variants are missing. Please ensure these are defined in <enum 'CursorKind'> in cindex.py.
[2025-09-22T22:28:55.278Z] 
[2025-09-22T22:28:55.278Z] ----------------------------------------------------------------------
[2025-09-22T22:28:55.278Z] Ran 167 tests in 2.968s
[2025-09-22T22:28:55.278Z] 
[2025-09-22T22:28:55.278Z] FAILED (failures=1)

@hjyamauchi hjyamauchi marked this pull request as ready for review September 23, 2025 00:53
// We store the file path into CAS here. Canonicalize it for Windows
// to avoid cache misses due to slash differences
SmallString<256> Storage(Filename);
llvm::sys::path::make_preferred(Storage);

Can you say more about how we are ending up with mismatched slashes in this case? The intent here was to keep the paths in the include tree matching the paths that are used naturally by the compiler so that caching doesn't change behaviour (e.g. spelling of paths in diagnostics) unless you opt in to prefix mapping. For example, that is why we don't call remove_dots here when we store the path, but only later when we construct the VFS. I'm not necessarily opposed to your change, but I want to understand it better first.

Author

In the include-tree-working-directory.c test, the incoming Filename in the first clang-scan-deps invocation is

C:/Users/hiroshi/cas/llvm-project/build-debug/tools/clang/test/ClangScanDeps/Output/include-tree-working-directory.c.tmp\t.c # Note the last separator is a backslash

and in the second clang-scan-deps invocation, it's

C:/Users/hiroshi/cas/llvm-project/build-debug/tools/clang/test/ClangScanDeps/Output/include-tree-working-directory.c.tmp/t.c # Note the last separator is a slash

These follow from the compilation database (cdb) JSON templates below:

//--- cdb.json.template
[{
  "directory": "DIR/other",
  "command": "clang -c t.c -I relative -working-directory DIR -o t.o -MD -serialize-diagnostics t.dia",
  "file": "DIR/t.c"
}]
//--- cdb2.json.template
[{
  "directory": "DIR/other",
  "command": "clang -c DIR/t.c -I DIR/relative -working-directory DIR/other -o DIR/t2.o -MD -serialize-diagnostics DIR/t2.dia -fdebug-compilation-dir=DIR -fcoverage-compilation-dir=DIR",
  "file": "DIR/t.c"
}]

In the first invocation (cdb.json), t.c is a path relative to the working directory, and the path concatenation used the Windows-native backslash. In the second (cdb2.json), DIR/t.c is an absolute path with a forward slash as the last separator.

I guess that if we change DIR/t.c to ../t.c in the second JSON, we won't get a cache hit on POSIX either (right?)
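The mismatch described above can be reproduced outside of clang. The following is only a sketch: it uses Python's `ntpath` module to mimic Windows-style path joining, a hypothetical `base` directory in place of the real DIR, and a toy `key` function that hashes the exact spelling. It is not how the CAS computes its keys; it only illustrates that two spellings of the same file produce different keys unless they are normalized first.

```python
import hashlib
import ntpath  # Windows path semantics, importable on any platform

# Hypothetical stand-in for the test's DIR; the real value is machine-specific.
base = "C:/work/Output/include-tree-working-directory.c.tmp"

# First invocation: relative "t.c" joined onto the working directory.
# Windows-style joining inserts a backslash as the separator.
joined = ntpath.join(base, "t.c")

# Second invocation: absolute path spelled with forward slashes throughout.
spelled = base + "/t.c"


def key(path: str) -> str:
    """Toy cache key (an assumption): hash of the exact path spelling."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()


print(joined)
print(spelled)
print("same key as spelled:", key(joined) == key(spelled))
print("same after normpath:", ntpath.normpath(joined) == ntpath.normpath(spelled))
```

Only the last separator differs between the two strings, which is enough to change any key derived from the raw spelling.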

> I guess that if we change DIR/t.c to ../t.c in the second JSON, we won't get a cache hit on POSIX either (right?)

Right, we don't canonicalize paths in most cases by default. Note that it is technically incorrect to eliminate ../ from paths that might contain symlinks without checking that they still resolve to the same file. But your point also holds for ./ components, which we strip only on the VFS and lookup side and not from the stored include tree, so that we preserve the spelling of paths seen in the original compilation.

So we've made a tradeoff that favours preserving the same behaviour as non-caching builds (e.g. diagnostics and path macros have the right path spellings), but that could hurt cache hit rates. Maybe there is an argument to change that tradeoff? But we should be deliberate, and probably consistent about it between platforms (if we're going to turn / to \ then we probably also want to remove ./ components). @akyrtzi and @cachemeifyoucan may have opinions on this too.

Is it possible to only change the tests on Windows, or perhaps only change the tests and the VFS part of the IncludeTree code (similar to how we call remove_dots only in the VFS side of things)? That would be my preference at least as an immediate fix.
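The tradeoff described in this comment (store the original spelling, normalize only on the lookup side) can be sketched roughly as follows. This is a toy model, not the IncludeTree implementation: `ToyIncludeTree` and `remove_dot_components` are hypothetical names, paths are assumed to use forward slashes, and only `.` components are dropped, leaving `..` alone for the symlink reason given above.

```python
def remove_dot_components(path: str) -> str:
    """Drop "." components only; ".." is kept because of possible symlinks."""
    kept = [part for part in path.split("/") if part != "."]
    return "/".join(kept) or "."


class ToyIncludeTree:
    """Hypothetical store that keeps path spellings exactly as given."""

    def __init__(self):
        self.entries = []  # (spelling seen by the compiler, file contents)

    def add(self, spelling: str, contents: str) -> None:
        # Stored side: no normalization, so diagnostics and path macros
        # would still show the original spelling.
        self.entries.append((spelling, contents))

    def build_lookup(self) -> dict:
        # Lookup side: normalize keys so equivalent spellings find the entry.
        return {remove_dot_components(s): (s, c) for s, c in self.entries}


tree = ToyIncludeTree()
tree.add("/src/./include/a.h", "int a;")
lookup = tree.build_lookup()

# A differently spelled request still resolves, and the original spelling survives.
spelling, contents = lookup[remove_dot_components("/src/include/./a.h")]
print(spelling, contents)
```

Applying the same idea to separators would mean converting slashes only when building the lookup table, which matches the "change only the VFS part" option suggested above.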

Author

@hjyamauchi hjyamauchi Sep 25, 2025

How about simply disabling the include-tree-working-directory.c test on Windows and making no other changes (neither to the logic in IncludeTree.cpp nor to the other tests)? I think that in real use the spelling of paths will be consistent, so cache misses of this nature should be rare.

Author

Updated with a test-only fix for the include-tree-working-directory.c test for Windows. It only changes the test on Windows. Please take another look @benlangmuir @cachemeifyoucan

Clang :: ClangScanDeps/include-tree-working-directory.c
@hjyamauchi
Author

@swift-ci please test llvm

@cachemeifyoucan cachemeifyoucan left a comment

LGTM

@benlangmuir benlangmuir left a comment

I left some comments but I'm fine with merging this as-is to unblock the test on Windows

// RUN: clang-cas-test -cas %t/cas -print-include-tree @%t/tu.casid | %PathSanitizingFileCheck --sanitize PREFIX=%/t %s

// CHECK: [[PREFIX]]/t.c llvmcas://
// CHECK: PREFIX{{/|\\}}t.c llvmcas://

We have a number of tests that do something like pipe the output through sed 's:\\\\\?:/:g' to avoid needing regex for every CHECK line. Does that work here?

Author

Yes, that would probably work, but an earlier review comment suggested that using PathSanitizingFileCheck this way is more precise, so I'm following that style.
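The two styles discussed in this thread can be compared with a small sketch. It assumes the sed expression means "one backslash, optionally followed by a second, becomes a forward slash", and it uses Python's `re` module rather than sed or FileCheck; the sample lines and the `abc` id are made up for illustration.

```python
import re

# Hypothetical tool output with different separator spellings.
lines = [
    r"C:\build\tmp\t.c llvmcas://abc",
    r"C:\\build\\tmp\\t.c llvmcas://abc",  # backslashes already doubled
    "C:/build/tmp/t.c llvmcas://abc",
]


def normalize_slashes(text: str) -> str:
    """Style 1: rewrite the whole output once, then use plain CHECK lines."""
    return re.sub(r"\\\\?", "/", text)


normalized = [normalize_slashes(line) for line in lines]
print(normalized)

# Style 2: leave the output alone and accept either separator at the one
# place it matters, similar in spirit to PREFIX{{/|\\}}t.c in a CHECK line.
check = re.compile(r"PREFIX(/|\\)t\.c llvmcas://")
print(bool(check.search("PREFIX\\t.c llvmcas://abc")))
print(bool(check.search("PREFIX/t.c llvmcas://abc")))
```

The first style rewrites every backslash in the output, including any that are not path separators, while the second constrains only the separator named in the pattern. That difference is presumably what "more precise" refers to.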

("%{fs-src-root}", pathlib.Path(sourcedir).anchor),
("%{fs-tmp-root}", pathlib.Path(tmpBase).anchor),
("%{fs-sep}", os.path.sep),
("%{fs-sep-yaml}", "\\\\\\\\\\\\\\\\" if kIsWindows else "/"),

How did we get from 1 \ to 16 \s?

Author

Four levels of backslash escaping: one for this Python string, two for being quoted and being a replacement pattern in sed, and one for being a JSON string. Each level doubles the count, so 1 backslash becomes 2^4 = 16.
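The doubling can be checked with a few lines of Python. This is a sketch of the arithmetic only: the order of the layers in the list is an assumption, and `escape_backslashes` is a hypothetical helper standing in for whatever each layer actually does.

```python
def escape_backslashes(text: str) -> str:
    """Escape every backslash once, as a single quoting layer would require."""
    return text.replace("\\", "\\\\")


# Assumed order, from the innermost consumer outwards.
layers = [
    "JSON string",
    "sed replacement pattern",
    "quoting of the sed expression",
    "Python string literal",
]

text = "\\"  # the single backslash that should finally appear
counts = [text.count("\\")]
for layer in layers:
    text = escape_backslashes(text)
    counts.append(text.count("\\"))

print(counts)
```

Each pass doubles the number of backslashes, so four layers turn one backslash into sixteen characters in the source file.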

@hjyamauchi hjyamauchi merged commit 16f18a3 into swiftlang:next Sep 26, 2025
0 of 2 checks passed
3 participants