Skip to content

Conversation

@llvm-beanz
Copy link
Contributor

@llvm-beanz llvm-beanz commented Oct 7, 2016

This code was added in the mega-commit 6670bb7 with the commit message "Rewrite the CMake build system", so I'm a little bit light on context here. The comment in the code was:

# Construct a unique name for the custom target.
# Use a hash so that the file name does not push the OS limits for filename
# length.

This seems bunk to me. Since the string being hashed is a filename it seems to me that we don't need a hash to stay under the OS filename size limits. Additionally since output files must be unique using the filename as the unique target name seems like a good idea to me. The only issue here is that custom target names can't contain '/'s. To workaround that we replace the '/' character with '-'.

This patch has the added benefit of having the full filename encoded into the target names, so if you need to debug the build dependency graph you can much more easily walk back from the generated build files to the place in CMake where it was generated.

This code was added in the mega-commit 6670bb7 with the commit message "Rewrite the CMake build system", so I'm a little bit light on context here. The comment in the code was:

# Construct a unique name for the custom target.
# Use a hash so that the file name does not push the OS limits for filename
# length.

This seems bunk to me. Since the string being hashed is a filename it seems to me that we don't need a hash to stay under the OS filename size limits. Additionally since output files must be unique using the filename as the unique target name seems like a good idea to me. The only issue here is that custom target names can't contain '/'s. To workaround that we replace the '/' character with '-'.

This patch has the added benefit of having the full filename encoded into the target names, so if you need to debug the build dependency graph you can much more easily walk back from the generated build files to the place in CMake where it was generated.
@llvm-beanz
Copy link
Contributor Author

@swift-ci please smoke test

@llvm-beanz
Copy link
Contributor Author

@swift-ci please smoke test Linux platform

@erg
Copy link
Contributor

erg commented Oct 7, 2016

Files named a/b-c and a-b/c will have the same representation a-b-c. This seems like it's asking for bugs somewhere down the line if we had a unique representation before?

@llvm-beanz
Copy link
Contributor Author

@erg we have no situations where that is the case today, and there is always the workaround of specifying a target name manually. This is really just the fallback case for when the caller doesn't provide a target name.

Longer term, I think we should actually force all callers to construct unique and sensible target names manually to prevent overlap and make them debuggable.

@erg
Copy link
Contributor

erg commented Oct 7, 2016

LGTM!

@llvm-beanz llvm-beanz merged commit 370697f into swiftlang:master Oct 7, 2016
@jrose-apple
Copy link
Contributor

I assume this is because CMake will output files of the form $TARGET-PostBuild-RelWithDebInfo.sh or similar. Just because the base name fits in a file doesn't mean the fully-specified name will.

…actually, it's worse than that: you're turning a path into a path component. That can definitely hit filename limits without too much trouble.

@llvm-beanz
Copy link
Contributor Author

@jrose-apple the Xcode generator is the only place that would be an issue. Those script files aren't created with other generators. This patch actually causes other issues with the Xcode generator, and I'll have a fix shortly.

@jrose-apple
Copy link
Contributor

Those particular ones aren't, but I have no confidence that CMake won't decide to generate files based on target names at some point, and I don't want to fall over because of it. I don't see any reason to actually change the behavior that's working.

@llvm-beanz
Copy link
Contributor Author

Not being able to debug the build system because the custom target names are unintelligible is a problem. I originally wrote this patch to find the dependency issues that were causing swift no-change rebuilds to execute hundreds of build steps because I was having too much trouble tracking it down through the mess of hash-named targets.

The latest PR removes common path prefixes, which actually makes the paths pretty short in practice. CMake might decide to generate files based on target names for more things in the future, but I think the risk of the target names we're generating now being larger than OS file component limits is pretty minimal, and the benefit of making the system more easily debuggable is worth it.

For context, the longest target name I see that was generated with this patch is 104 characters. OS X and Windows both set a 255 character limit on filename components, so we're still 151 characters short of having any problems (note this is an in-tree build where the path components are 12 characters larger than build-script builds).

If we encounter problems with this in the future I think the fix is pretty simple, just specify target names explicitly instead of allowing add_custom_command_target to add them for us.

@jrose-apple
Copy link
Contributor

It's never our machines I'm worried about. Buildbot paths have gotten well over 255 characters before, and I can certainly imagine someone's local build inside some container user having similar issues. But removing common path prefixes should take care of all that, so I feel a lot better.

@llvm-beanz
Copy link
Contributor Author

Cool.

In case I wasn't clear, 255 is the limit on components, not full filenames. Windows allows up to 32,767 characters in a full path on NTFS, and HFS+ has no limit on full path, just component length (most other *nix file systems have similar limits to HFS+).

@gottesmm
Copy link
Contributor

gottesmm commented Oct 9, 2016

@llvm-beanz I see what @erg is saying here. I would be fine with ignoring the a/b-c and a-b/c issue if there was an easy way to detect that we hit the problem. From what I can tell instead this would just be a landmine bug.

Question, wouldn't the following scheme work:

  1. Escape all '-' in all components by placing a '-' in front of it. Then we know that real dashes are '--'.
  2. Escape all '_' by placing a '_' in front of it. Then all real '_' are '__'.
  3. Then replace all '/' with '-_'.

You could use different characters than '_' or '-' if you wanted to. NOTE I just made this up on the fly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants