
Conversation

@hmalphettes


Setting a `mapHeaders` function in the options skips some columns (return `null` for a header to drop that column) and edits the printed headers line.

For example:
```
var writer = csv({
  mapHeaders: function (header) {
    return header === 'skipthisone' ? null
      : header.substring(0, 1).toUpperCase() + header.substring(1)
  }
})
```

This is the reverse of csv-parser's `mapHeaders`:
mafintosh/csv-parser#54
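
A minimal end-to-end sketch of how the option would be used, assuming it behaves as described above (the `skipthisone` column dropped, the other headers capitalised):
```
var csv = require('csv-write-stream')

var writer = csv({
  mapHeaders: function (header) {
    return header === 'skipthisone' ? null
      : header.substring(0, 1).toUpperCase() + header.substring(1)
  }
})

writer.pipe(process.stdout)
writer.write({hello: 'world', skipthisone: 'x', foo: 'bar'})
writer.end()
// expected output:
// Hello,Foo
// world,bar
```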
@notslang
Collaborator

notslang commented Aug 23, 2016

Having a function to edit the headers seems overly complex... you could just pipe the stream of objects into a transform that rewrites the keys or removes keys you don't want before sending them to csv-write-stream.

@hmalphettes
Author

hmalphettes commented Aug 24, 2016

@slang800 ok, that works too.

I would argue that, no matter what, that transform stream will cause some overhead,
and such a transform stream is almost as complex as csv-write-stream itself.

If one wants to optimise such a transform stream, one needs to redo the same type of work that the csv-write-stream compiler is doing:

  • extract the headers, transform them
  • compile a function that extracts the values from the stream of objects and pushes a transformed stream of rows.

So at this point all that is left is the CSV formatting, and one would end up with csv-write-stream (see the sketch below).

Hence I think there is value in handling this transform in the csv-write-stream itself.
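
To make that concrete, a rough sketch of the work such an optimised transform has to redo; `compileTransform` here is only an illustration of the idea, not csv-write-stream's actual `_compile`:
```
// Inspect the headers once, then reuse a single row-mapping function for every row.
function compileTransform (headers, mapHeader) {
  var kept = headers.filter(function (h) { return mapHeader(h) !== null })
  var headerLine = kept.map(mapHeader).join(',')
  function toRow (obj) {
    return kept.map(function (h) { return obj[h] == null ? '' : obj[h] }).join(',')
  }
  return {headerLine: headerLine, toRow: toRow}
}
```
All that is missing at that point is the escaping and the newline handling, i.e. the CSV formatting itself.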

@notslang
Collaborator

It's not complex and you shouldn't need to extract the headers or redo any work. It should look something like this:

```
var map = require('through2')

// objectMode transform: drop the unwanted column and capitalise the
// remaining keys before the objects reach csv-write-stream
var transform = map({objectMode: true}, function (obj, enc, cb) {
  var i, key, len, newKey, keys
  delete obj.skipthisone
  keys = Object.keys(obj)
  for (i = 0, len = keys.length; i < len; i++) {
    key = keys[i]
    newKey = key.substring(0, 1).toUpperCase() + key.substring(1)
    if (newKey !== key) { // don't clobber keys that are already capitalised
      obj[newKey] = obj[key]
      delete obj[key]
    }
  }
  cb(null, obj)
})
```

...Unless I'm totally misunderstanding what you want to do.
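
Wired up, it would look something like this, reusing the `transform` above; csv-write-stream then derives its headers from the already-rewritten keys:
```
var csvWriter = require('csv-write-stream')
var fs = require('fs')

// the key rewriting happens before the writer ever sees the objects
transform.pipe(csvWriter()).pipe(fs.createWriteStream('out.csv'))

transform.write({hello: 'world', skipthisone: 'x', foo: 'bar'})
transform.end()
// out.csv should then contain:
// Hello,Foo
// world,bar
```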

@hmalphettes
Author

hmalphettes commented Aug 24, 2016

Thanks @slang800

csv-write-stream could have been written in the same fashion as what you suggest:

```
this.push(Object.keys(record).map(k => {
  const value = record[k];
  if (typeof value === 'string') {
    return '"' + value.replace(/"/g, '""') + '"';
  }

  return value;
}).join() + '\n');
```

Instead, it produces a block of JavaScript code in the `_compile` method.

That is benchmarked to be much faster than making V8 re-discover the shape of the object with `Object.keys` or any other for-loop on every row.

For example, this is what it generates for `{hello: "world", foo: "bar", baz: "taco"}`:

```
function toRow(obj) {
var a0 = obj.hello == null ? "" : obj.hello
var a1 = obj.foo == null ? "" : obj.foo
var a2 = obj.baz == null ? "" : obj.baz
var result = (/[,\r\n"]/.test(a0) ? esc(a0+"") : a0)+","+(/[,\r\n"]/.test(a1) ? esc(a1+"") : a1)+","+(/[,\r\n"]/.test(a2) ? esc(a2+"") : a2)
return result +"\n"
}
```

So I am trying to make sure that transforming the headers and skipping columns won't be a perf bottleneck, by incorporating that into the writer.

This PR does that for an extra O(1) cost and adds zero overhead to the serialisation itself.
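
For illustration, assuming a `mapHeaders` that drops `foo` and capitalises the remaining headers, the compiled output for the same object would presumably look like this; the per-row function stays just as flat, only with the skipped column left out:
```
// printed headers line: Hello,Baz
function toRow(obj) {
var a0 = obj.hello == null ? "" : obj.hello
var a1 = obj.baz == null ? "" : obj.baz
var result = (/[,\r\n"]/.test(a0) ? esc(a0+"") : a0)+","+(/[,\r\n"]/.test(a1) ? esc(a1+"") : a1)
return result +"\n"
}
```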
