Skip to content

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Dec 11, 2023

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i => a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.

When the main file is preprocessed and we change `MainFileName` to the
original source file name (e.g. `a.i => a.c`), the source manager does
not contain `a.c`, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

```
% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"
```

Fix llvm#56378 by disassociating md5(a.i) with a.c.
@MaskRay MaskRay requested review from pogo59, dwblaikie and rnk December 11, 2023 03:55
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo labels Dec 11, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-clang-codegen

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i => a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i && clang -g -S a.i && clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-debuginfo

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i =&gt; a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i &amp;&amp; clang -g -S a.i &amp;&amp; clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

@llvmbot
Copy link
Member

llvmbot commented Dec 11, 2023

@llvm/pr-subscribers-clang

Author: Fangrui Song (MaskRay)

Changes

When the main file is preprocessed and we change MainFileName to the
original source file name (e.g. a.i =&gt; a.c), the source manager does
not contain a.c, but we incorrectly associate the DIFile(a.c) with
md5(a.i). This causes CGDebugInfo::emitFunctionStart to create a
duplicate DIFile and leads to a spurious "inconsistent use of MD5
checksums" warning.

% cat a.c
void f() {}
% clang -c -g a.c  # no warning
% clang -E a.c -o a.i &amp;&amp; clang -g -S a.i &amp;&amp; clang -g -c a.s
a.s:9:2: warning: inconsistent use of MD5 checksums
        .file   1 "a.c"
        ^
% grep DIFile a.ll
!1 = !DIFile(filename: "a.c", directory: "/tmp/c", checksumkind: CSK_MD5, checksum: "c5b2e246df7d5f53e176b097a0641c3d")
!11 = !DIFile(filename: "a.c", directory: "/tmp/c")
% grep 'file.*a.c' a.s
        .file   "a.c"
        .file   0 "/tmp/c" "a.c" md5 0x2d14ea70fee15102033eb8d899914cce
        .file   1 "a.c"

Fix #56378 by disassociating md5(a.i) with a.c.


Full diff: https://github.com/llvm/llvm-project/pull/75022.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+6-4)
  • (modified) clang/test/CodeGen/debug-info-preprocessed-file.i (+4)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 7cf661994a29c7..37d7a6755d3908 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -554,14 +554,16 @@ void CGDebugInfo::CreateCompileUnit() {
     // If the main file name provided is identical to the input file name, and
     // if the input file is a preprocessed source, use the module name for
     // debug info. The module name comes from the name specified in the first
-    // linemarker if the input is a preprocessed source.
+    // linemarker if the input is a preprocessed source. In this case we don't
+    // know the content to compute a checksum.
     if (MainFile->getName() == MainFileName &&
         FrontendOptions::getInputKindForExtension(
             MainFile->getName().rsplit('.').second)
-            .isPreprocessed())
+            .isPreprocessed()) {
       MainFileName = CGM.getModule().getName().str();
-
-    CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    } else {
+      CSKind = computeChecksum(SM.getMainFileID(), Checksum);
+    }
   }
 
   llvm::dwarf::SourceLanguage LangTag;
diff --git a/clang/test/CodeGen/debug-info-preprocessed-file.i b/clang/test/CodeGen/debug-info-preprocessed-file.i
index 23fd26525e3bf0..c8a2307d46c316 100644
--- a/clang/test/CodeGen/debug-info-preprocessed-file.i
+++ b/clang/test/CodeGen/debug-info-preprocessed-file.i
@@ -6,6 +6,10 @@
 # 1 "<built-in>" 2
 # 1 "preprocessed-input.c" 2
 
+/// The main file is preprocessed. We change it to preprocessed-input.c. Since
+/// the content is not available, we don't compute a checksum.
 // RUN: %clang -g -c -S -emit-llvm -o - %s | FileCheck %s
 // CHECK: !DICompileUnit(language: DW_LANG_C{{.*}}, file: ![[FILE:[0-9]+]]
 // CHECK: ![[FILE]] = !DIFile(filename: "/foo/bar/preprocessed-input.c"
+// CHECK-NOT: checksumkind:
+// CHECK-NOT: !DIFile(

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, thanks!

@MaskRay MaskRay merged commit 831484e into llvm:main Dec 12, 2023
@MaskRay MaskRay deleted the clang-debug-file0 branch December 12, 2023 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category debuginfo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang: "-save-temps -g" option leads to "warning: inconsistent use of MD5 checksums"
3 participants