Skip to content

nbbrd/picocsv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

picocsv - unusual CSV library for Java

Download Changes Reproducible Builds

Picocsv is an unusual CSV library designed to be embedded in other libraries.
While it can be used directly, it's main purpose is to be the core foundation of those other libraries.
For a more user-friendly CSV library, you should have a look at the fast and well-documented FastCSV library.

👍 Key points:

  • lightweight library with no dependency (~25KB)
  • very fast (cf. benchmark) and efficient (no heap memory allocation)
  • designed to be embedded into other libraries as an external dependency or as a single-file source
  • has a module-info that makes it compatible with JPMS
  • compatible with GraalVM Native Image (genuine Java, no reflection, no bytecode manipulation)
  • can be easily shaded
  • Java 8 minimum requirement

🚀 Features:

Important

Note that the Csv.Format#acceptMissingField option must be set to false to closely follow the RFC4180 specification. The default value is currently true but will be reversed in the next major release.

Features

Read/Write

picocsv provides a low-level API to read and write CSV files from/to character streams.
This API follows the try-with-resources statement and closes the underlying character stream after use.

Reading character streams

The reading is done by the Csv.Reader class and has the following characteristics:

  • it is instantiated by the Csv.Reader.of(Csv.Format, Csv.ReaderOptions, java.io.Reader) factory method
  • its options are defined by the Csv.ReaderOptions class

Typical reader instantiation and usage:

try (java.io.Reader chars = ...) {
  try (Csv.Reader reader = Csv.Reader.of(Csv.Format.DEFAULT, Csv.ReaderOptions.DEFAULT, chars)) {
    ...
  }
}

Basic reading 1️⃣ of all fields 2️⃣ skipping comments 3️⃣:

while (reader.readLine()) {      // 1️⃣
  if (!reader.isComment()) {     // 3️⃣
    while (reader.readField()) { // 2️⃣
      CharSequence field = reader;
      ...
    }
  }
}

Configuring reading options:

Csv.ReaderOptions strict = Csv.ReaderOptions.builder().lenientSeparator(false).build();

Writing character streams

The writing is done by the Csv.Writer class and has the following characteristics:

  • it is instantiated by the Csv.Writer.of(Csv.Format, Csv.WriterOptions, java.io.Writer) factory method
  • its options are defined by the Csv.WriterOptions class

Typical writer instantiation and usage:

try (java.io.Writer chars = ...) {
  try (Csv.Writer writer = Csv.Writer.of(Csv.Format.DEFAULT, Csv.WriterOptions.DEFAULT, chars)) {
    ...
  }
}

Basic writing 1️⃣ of some fields 2️⃣ and comments 3️⃣:

writer.writeComment("Some comment"); // 3️⃣
writer.writeField("Some field");     // 2️⃣
writer.writeEndOfLine();             // 1️⃣

Configuring writing options:

Csv.WriterOptions customOptions = Csv.WriterOptions.builder().maxCharsPerField(1024).build();

Null-free API

picocsv provides a null-free API that accepts null parameters and returns non-null values.

writer.writeComment(null); // same as `csv.writeComment("")`
writer.writeField(null); // same as `csv.writeField("")`

Custom formats

Custom formats are defined by the Csv.Format object:

Option Description Default Value
#separator Line separator \r\n
#delimiter Field delimiter ,
#quote Quoting character "
#comment Comment character #
Csv.Format tsv = Csv.Format.builder().delimiter('\t').build();
Csv.Format embedded = Csv.Format.builder().delimiter('=').separator(",").build();

Cookbook

Readable/Appendable

picocsv only supports java.io.Reader/java.io.Writer as input/output for performance reasons. However, it is still possible to use Readable/Appendable by wrapping them in adapters.

See Cookbook#asCharReader(Readable) and Cookbook#asCharWriter(Appendable).

Disabling comments

Comments can be disabled by setting the Csv.Format#comment option to the null character \0.

Csv.Format noComment = Csv.Format.builder().comment('\0').build();

Note

Note that this might lead to problems since binary data is allowed in RFC-4180-bis. It will be fixed in a future release.

Skipping comments

Comments can be skipped by using the Csv.Reader#isComment() method.

while (reader.readLine()) {
    if (!reader.isComment()) {
        while (reader.readField()) { ... }
    }
}

See Cookbook#skipComments(Csv.Reader).

Skipping empty lines

Empty lines are valid lines represented by a single empty field in RFC-4180.
However, it is still possible to skip them by using the Csv.Format#acceptMissingField option.

Csv.Format format = Csv.Format.builder().acceptMissingField(true).build();
try (Csv.Reader reader = ...) {
    while (reader.readLine()) {
        if (!reader.readField()) {
            continue; // 💡 line without field => empty line
        }
        do { ... } while (reader.readField());
    }
}

Skipping fields

Fields can be skipped by reading them without using their value. The underlying implementation does not allocate heap memory to parse fields and provides access to those fields through a CharSequence interface. Therefore, the string value creation is delayed until it is actually needed, reducing memory usage and garbage collection.

try (Csv.Reader reader = ...) {
    while (reader.readLine()) {
        reader.readField(); // 💡 read field but do not use it => skip it
        while (!reader.readField()) {
          String field = reader.toString(); // use the field value
        }
    }
}

See Cookbook#skipFields(Csv.Reader, int).

Setup

Maven setup:

<dependency>
    <groupId>com.github.nbbrd.picocsv</groupId>
    <artifactId>picocsv</artifactId>
    <version>LATEST_VERSION</version>
</dependency>

Developing

This project is written in Java and uses Apache Maven as a build tool.
It requires Java 8 as minimum version and all its dependencies are hosted on Maven Central.

The code can be build using any IDE or by just type-in the following commands in a terminal:

git clone https://github.com/nbbrd/picocsv.git
cd picocsv
mvn clean install

Contributing

Any contribution is welcome and should be done through pull requests and/or issues.

Licensing

The code of this project is licensed under the European Union Public Licence (EUPL).