Merge pull request #31018 from gottesmm/pr-d6426d88ab60dd2b653c227ec20517db745d6131

shahmishal · web-flow · commit 8c42386429d8 · 2020-04-14T13:27:26.000-07:00
[doc] Describe how to use git-bisect in the presence of branch forwarding/feature branches.
diff --git a/docs/DebuggingTheCompiler.rst b/docs/DebuggingTheCompiler.rst
@@ -694,3 +694,65 @@ Multiple Logs at a Time
 Note, you can also turn on more than one log at a time as well, e.x.::
 
     (lldb) log enable -f /tmp/lldb-types-log.txt lldb types expression
+
+Using git-bisect in the presence of branch forwarding/feature branches
+======================================================================
+
+``git-bisect`` is a useful tool for finding where a regression was
+introduced. Sadly, ``git-bisect`` does not handle long-lived branches
+and will in fact choose commits from upstream branches that may be
+missing important content from the downstream branch. As an example,
+consider a situation where one has the following straw man commit flow
+graph::
+
+    github/master -> github/tensorflow
+
+In this case, if one attempts to use ``git-bisect`` on
+``github/tensorflow``, ``git-bisect`` will sometimes choose commits from
+``github/master``, resulting in one being unable to compile/test
+tensorflow-specific code that has not been upstreamed yet. Even worse,
+what if we are trying to bisect between two tags that were branched from
+``github/tensorflow`` and have had subsequent commits cherry-picked on
+top? Without loss of generality, let's call those two tags
+``tag-tensorflow-bad`` and ``tag-tensorflow-good``. Since both of
+these tags have had commits cherry-picked on top, they are technically
+not even on the ``github/tensorflow`` branch, but rather are in a
+certain sense tags of a feature branch from master/tensorflow. So
+``git-bisect`` doesn't even have a clear history to bisect on, in
+more ways than one.
+
+With those constraints in mind, we can bisect! We just need to be
+careful how we do it. Let's assume that we have a test script called
+``test.sh`` that signals failure through its exit code. With that in
+hand, we need to compute the least common ancestor of the good/bad
+commits. This is traditionally called the "merge base" of the
+commits. We can compute this like so::
+
+    TAG_MERGE_BASE=$(git merge-base tags/tag-tensorflow-bad tags/tag-tensorflow-good)
+
+Given that both tags were taken from the feature branch, the reader
+can prove to themselves that this commit is guaranteed to be on
+``github/tensorflow`` and not ``github/master`` since all commits from
+``github/master`` are forwarded using git merges.
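+
+As a quick sanity check of that claim, the following sketch tests
+whether a commit is reachable from the feature branch but not from the
+upstream branch. The helper name ``merge_base_on_feature_branch`` is
+hypothetical, and the usage line assumes a remote named ``github`` as
+in the graph above::
+
+    # Hypothetical helper: succeeds only if commit $1 is an ancestor of
+    # branch $2 (feature) and not an ancestor of branch $3 (upstream).
+    merge_base_on_feature_branch() {
+        git merge-base --is-ancestor "$1" "$2" &&
+            ! git merge-base --is-ancestor "$1" "$3"
+    }
+
+    # Usage:
+    #   merge_base_on_feature_branch "$TAG_MERGE_BASE" github/tensorflow github/master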
+
+Then let's assume that we checked out ``$TAG_MERGE_BASE``, ran
+``test.sh``, and did not hit any error. OK, we can now bisect. Sadly,
+as mentioned above, if we run ``git-bisect`` between ``$TAG_MERGE_BASE``
+and ``tags/tag-tensorflow-bad``, ``git-bisect`` will sometimes choose
+commits from ``github/master``, which would cause ``test.sh`` to fail
+if we are testing tensorflow-specific code! To work around this
+problem, we need to start our bisect and then tell ``git-bisect`` to
+ignore those commits by using the ``skip`` subcommand::
+
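+    # Start the bisect with the bad tag and the known-good merge base.
+    git bisect start tags/tag-tensorflow-bad "$TAG_MERGE_BASE"
+    # For each merge on the first-parent chain, skip the commits that the
+    # merge brought in from its second parent (the upstream side).
+    for rev in $(git rev-list "$TAG_MERGE_BASE"..tags/tag-tensorflow-bad --merges --first-parent); do
+        git rev-list "$rev^2" --not "$rev^"
+    done | xargs git bisect skip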
+
+Once this has been done, one uses ``git-bisect`` normally. One thing
+to be aware of is that ``git-bisect`` will return good/bad commits
+on the feature branch, and if one of those commits is a merge from the
+upstream branch, one will need to analyze the range of commits from
+upstream for the bad commit afterwards. The commit range in the merge
+should be relatively small, though, compared with the large git history
+one just bisected.
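+
+As a sketch of that last step, one can let ``git bisect run`` drive
+``test.sh`` and, if the first bad commit reported is a merge from
+upstream, list the commits that the merge brought in. The helper name
+``upstream_commits_in_merge`` and the ``<bad-merge>`` placeholder are
+hypothetical::
+
+    # Hypothetical helper: list the commits reachable from the second
+    # parent (upstream side) of merge $1 but not from its first parent.
+    upstream_commits_in_merge() {
+        git log --oneline "$1^2" --not "$1^1"
+    }
+
+    # Usage, after the skip step above:
+    #   git bisect run ./test.sh
+    #   upstream_commits_in_merge <bad-merge>
+    #   git bisect reset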