This repository was archived by the owner on Sep 11, 2020. It is now read-only.

Tree difference does not work correctly with unnormalized Unicode names #1057

@vmarkovtsev

The tree difference algorithm does not handle unnormalized Unicode names correctly (tree listing, however, handles them correctly). To reproduce:

git clone git://github.com/dotnet/cli /tmp/cli
go run bug.go /tmp/cli

bug.go:

package main

import (
	"os"
	"strings"

	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
	r, err := git.PlainOpen(os.Args[1])
	if err != nil {
		panic(err)
	}

	// First commit: list every file under TestAssets/TestProjects.
	c, err := r.CommitObject(plumbing.NewHash("55c59d621ea22921ecaabd99266d45a7921aab70"))
	if err != nil {
		panic(err)
	}
	t1, err := c.Tree()
	if err != nil {
		panic(err)
	}
	t1, err = t1.Tree("TestAssets/TestProjects")
	if err != nil {
		panic(err)
	}
	files := map[string]bool{}
	err = t1.Files().ForEach(func(f *object.File) error {
		files[f.Name] = true
		return nil
	})
	if err != nil {
		panic(err)
	}

	// Second commit: the same subtree one commit later.
	c, err = r.CommitObject(plumbing.NewHash("6fcbefa4f7a0016a68d3cda52779298a5cd20837"))
	if err != nil {
		panic(err)
	}
	t2, err := c.Tree()
	if err != nil {
		panic(err)
	}
	t2, err = t2.Tree("TestAssets/TestProjects")
	if err != nil {
		panic(err)
	}

	diff, err := object.DiffTree(t1, t2)
	if err != nil {
		panic(err)
	}
	for _, d := range diff {
		if strings.HasPrefix(d.To.Name, "TestAppWithUnico") &&
			strings.HasSuffix(d.To.Name, "Program.cs") {
			// Print the change, then whether the listing of the first
			// tree already contained this path.
			println(d.String())
			println(files[d.To.Name])
		}
	}
}

We see:

<Action: Insert, Path: TestAppWithUnicodéPath/Program.cs>
true

The expected output is empty.

Here is what is happening. 55c59d621ea22921ecaabd99266d45a7921aab70 and 6fcbefa4f7a0016a68d3cda52779298a5cd20837 are two consecutive commits.

cd /tmp/cli
git checkout 55c59d621ea22921ecaabd99266d45a7921aab70
echo TestAssets/TestProjects/TestAppWithUni*

git checkout 6fcbefa4f7a0016a68d3cda52779298a5cd20837
echo TestAssets/TestProjects/TestAppWithUni*

Output:

TestAssets/TestProjects/TestAppWithUnicodéPath

TestAssets/TestProjects/TestAppWithUnicodéPath TestAssets/TestProjects/TestAppWithUnicodéPath

After the second commit there are two directories whose names look almost identical. One name is in normalized Unicode, the other is not.
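To illustrate why two names that render the same can still be distinct paths, here is a minimal sketch using only the standard library. The two string literals are illustrative assumptions: one spells é as a single code point (U+00E9), the other as `e` followed by a combining acute accent (U+0301). Which spelling each directory in the dotnet/cli repository actually uses has not been verified here, and `sameBytes` is a hypothetical helper, not part of go-git.

```go
package main

import "fmt"

// Illustrative spellings of the same visible name (assumed, not read from
// the repository).
const (
	composed   = "TestAppWithUnicod\u00e9Path"  // é as one code point
	decomposed = "TestAppWithUnicode\u0301Path" // e + combining acute accent
)

// sameBytes is a hypothetical helper: it reports whether two names are
// byte-for-byte identical, which is how Git compares tree entry names.
func sameBytes(a, b string) bool {
	return a == b
}

func main() {
	// The names render alike but are different byte sequences.
	fmt.Println(sameBytes(composed, decomposed))

	// %+q escapes non-ASCII characters, making the difference visible.
	fmt.Printf("%+q\n%+q\n", composed, decomposed)
}
```

Because the two spellings differ at the byte level, any code that compares or sorts paths must treat them consistently as either equal or distinct; mixing the two views is one plausible way a listing and a diff could disagree.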
