Contribute, refactoring suggestions, modernize C#, DataSet, DataLoader #12

Hi @NiklasGustafsson, I would like to contribute to TorchSharp, as we are looking at using it to replace CNTK in our full end-to-end machine learning pipelines written in C#. I have been looking at this example repo in that regard, and it is a great starting place. I've mainly worked with image models, so I've been looking at the CIFAR10 example. I understand that these examples were created quickly and are bare-bones, and I would like to improve them. :)

For example, I have a few issues with the `Reader`s like `CIFAR10Reader`. They randomize data by pre-defining a fixed set of randomized batches, which is not how this is normally done: you would create new random batches for each epoch. Similarly, an epoch would usually be defined as iterating over the samples of the dataset **once** (nothing is standardized here and you really can do anything you'd like, so this is just IMHO), not by yielding each sample again for every transform and hence multiplying the number of samples per epoch, like:
```csharp
        public IEnumerable<(Tensor, Tensor)> Data()
        {
            for (var i = 0; i < data.Count; i++) {
                yield return (data[i], labels[i]);

                foreach (var tfrm in _transforms) {
                    yield return (tfrm.forward(data[i]), labels[i]);
                }
            }
        }
```
Also, you wouldn't transform (augment) the data if it is test data, although of course you can then simply not set the transforms.
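To make the point concrete, here is a minimal sketch of what I have in mind, assuming the transforms are meant to be chained and applied to each sample in place (as PyTorch does with composed transforms) rather than yielded as extra samples. `SampleReader`, `TData` and `TLabel` are hypothetical names, with `TData` standing in for the image tensor so the sketch does not depend on TorchSharp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reader sketch; TData stands in for an image tensor, TLabel for its label.
public sealed class SampleReader<TData, TLabel>
{
    private readonly IReadOnlyList<TData> _data;
    private readonly IReadOnlyList<TLabel> _labels;
    private readonly Func<TData, TData>[] _transforms;

    // Pass no transforms for test data, so samples are returned unchanged.
    public SampleReader(
        IReadOnlyList<TData> data,
        IReadOnlyList<TLabel> labels,
        params Func<TData, TData>[] transforms)
    {
        if (data.Count != labels.Count)
        {
            throw new ArgumentException("data and labels must have the same length.");
        }
        _data = data;
        _labels = labels;
        _transforms = transforms;
    }

    // Yields every sample exactly once, so one enumeration is one epoch
    // regardless of how many transforms are configured.
    public IEnumerable<(TData, TLabel)> Data()
    {
        for (var i = 0; i < _data.Count; i++)
        {
            var sample = _data[i];
            foreach (var transform in _transforms)
            {
                sample = transform(sample); // chain the transforms on the same sample
            }
            yield return (sample, _labels[i]);
        }
    }
}
```

With this shape the epoch length is always the dataset size, and a test reader is just a reader constructed without transforms.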

Anyway, I was thinking my first contribution could be to refactor the readers and implement concepts similar to PyTorch's `DataSet` and `DataLoader`. I have worked with this API but am not an expert, nor am I necessarily a fan of the Python APIs, but it seems you'd like TorchSharp to be similar to PyTorch, so basing it on that makes sense. Would that be of any interest?
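As a rough idea of the shape, here is a sketch under my own assumptions, not an existing TorchSharp API. `IDataSet<T>`, `ListDataSet<T>` and `DataLoader<T>` are hypothetical names, and the batches are plain lists of samples; a real version would presumably stack the samples of a batch into a single tensor:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface, loosely modeled on a map-style dataset: a count plus an indexer.
public interface IDataSet<T>
{
    int Count { get; }
    T this[int index] { get; }
}

// Hypothetical in-memory data set, only here to make the sketch usable.
public sealed class ListDataSet<T> : IDataSet<T>
{
    private readonly IReadOnlyList<T> _items;
    public ListDataSet(IReadOnlyList<T> items) => _items = items;
    public int Count => _items.Count;
    public T this[int index] => _items[index];
}

// Hypothetical loader: yields batches and draws a new sample order for every epoch.
public sealed class DataLoader<T>
{
    private readonly IDataSet<T> _dataSet;
    private readonly int _batchSize;
    private readonly bool _shuffle;
    private readonly Random _random;

    public DataLoader(IDataSet<T> dataSet, int batchSize, bool shuffle, int seed)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _dataSet = dataSet ?? throw new ArgumentNullException(nameof(dataSet));
        _batchSize = batchSize;
        _shuffle = shuffle;
        _random = new Random(seed); // seeded, so a run can be reproduced
    }

    // One full enumeration visits every sample exactly once, i.e. one epoch.
    public IEnumerable<IReadOnlyList<T>> Epoch()
    {
        var indices = new int[_dataSet.Count];
        for (var i = 0; i < indices.Length; i++) indices[i] = i;

        if (_shuffle)
        {
            // Fisher-Yates shuffle; the Random instance is shared across epochs,
            // so each epoch gets a different order.
            for (var i = indices.Length - 1; i > 0; i--)
            {
                var j = _random.Next(i + 1);
                (indices[i], indices[j]) = (indices[j], indices[i]);
            }
        }

        for (var start = 0; start < indices.Length; start += _batchSize)
        {
            // The last batch may be smaller than the batch size.
            var size = Math.Min(_batchSize, indices.Length - start);
            var batch = new T[size];
            for (var k = 0; k < size; k++) batch[k] = _dataSet[indices[start + k]];
            yield return batch;
        }
    }
}
```

A training loop would then call `Epoch()` once per epoch and iterate the returned batches, while a test loader would be created with `shuffle: false`.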

Before doing this, I would very much like to migrate this example repo to .NET 6 and C# 10, follow standard C# coding guidelines, and use modern language features, to really make the examples shine with regard to C#. Since performance is my passion, I'd also like the examples to at least minimally try to be efficient, even in cases where it does not matter so much.

Just as an example, I would replace the code below with a proper Fisher-Yates shuffle, which is easy to implement.
```csharp
Enumerable.Range(0, count).OrderBy(c => rnd.Next()).ToArray();
```
Reproducibility is important too, so every source of randomness should be seeded.
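For reference, a minimal sketch of the shuffle I mean. `Shuffler.ShuffledIndices` is a hypothetical helper name, and the caller is assumed to pass in a seeded `Random`:

```csharp
using System;

public static class Shuffler
{
    // Hypothetical helper (not part of TorchSharp): returns the indices 0..count-1
    // in random order using an in-place Fisher-Yates shuffle.
    public static int[] ShuffledIndices(int count, Random random)
    {
        var indices = new int[count];
        for (var i = 0; i < count; i++)
        {
            indices[i] = i;
        }

        // Walk from the end, swapping each element with a random one at or before it.
        for (var i = count - 1; i > 0; i--)
        {
            var j = random.Next(i + 1); // 0 <= j <= i
            (indices[i], indices[j]) = (indices[j], indices[i]);
        }

        return indices;
    }
}
```

This runs in linear time with a single allocation, whereas the `OrderBy` version sorts on random keys. Passing `new Random(seed)` with a fixed seed gives the same permutation on every run.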

Sorry, I am sure you know all this, but I wanted to first ask whether such changes are of interest and whether you agree with them.

To recap I propose:
 * Migrate to .NET 6 and C# 10
 * Refactor readers based on a DataSet and DataLoader concept (rough draft)
    * Address various minor issues as part of this

And we can take it from there.





