Picocsv is an unusual CSV library designed to be embedded in other libraries.
While it can be used directly, its main purpose is to be the core foundation of those other libraries.
For a more user-friendly CSV library, you should have a look at the fast and well-documented FastCSV library.
👍 Key points:
- lightweight library with no dependency (~25KB)
- very fast (cf. benchmark) and efficient (no heap memory allocation)
- designed to be embedded into other libraries as an external dependency or as a single-file source
- has a module-info that makes it compatible with JPMS
- compatible with GraalVM Native Image (genuine Java, no reflection, no bytecode manipulation)
- can be easily shaded
- Java 8 minimum requirement
🚀 Features:
- reads/writes CSV from/to character streams
- provides a minimalist null-free low-level API
- does not interpret content
- does not correct invalid files
- follows the RFC4180 specification
- supports custom line separator, field delimiter, quoting character and comment character
- supports custom quoting strategy
- supports Unicode characters
Important
Note that the Csv.Format#acceptMissingField option must be set to false to closely follow the RFC4180 specification.
The default value is currently true but will be reversed in the next major release.
picocsv provides a low-level API to read and write CSV files from/to character streams.
This API is designed to be used with the try-with-resources statement and closes the underlying character stream after use.
Reading is done by the Csv.Reader class, which has the following characteristics:
- it is instantiated by the Csv.Reader.of(Csv.Format, Csv.ReaderOptions, java.io.Reader) factory method
- its options are defined by the Csv.ReaderOptions class
Typical reader instantiation and usage:
try (java.io.Reader chars = ...) {
    try (Csv.Reader reader = Csv.Reader.of(Csv.Format.DEFAULT, Csv.ReaderOptions.DEFAULT, chars)) {
        ...
    }
}
Basic reading 1️⃣ of all fields 2️⃣ skipping comments 3️⃣:
while (reader.readLine()) { // 1️⃣
    if (!reader.isComment()) { // 3️⃣
        while (reader.readField()) { // 2️⃣
            CharSequence field = reader;
            ...
        }
    }
}
Configuring reading options:
Csv.ReaderOptions strict = Csv.ReaderOptions.builder().lenientSeparator(false).build();
Writing is done by the Csv.Writer class, which has the following characteristics:
- it is instantiated by the Csv.Writer.of(Csv.Format, Csv.WriterOptions, java.io.Writer) factory method
- its options are defined by the Csv.WriterOptions class
Typical writer instantiation and usage:
try (java.io.Writer chars = ...) {
    try (Csv.Writer writer = Csv.Writer.of(Csv.Format.DEFAULT, Csv.WriterOptions.DEFAULT, chars)) {
        ...
    }
}
Basic writing 1️⃣ of some fields 2️⃣ and comments 3️⃣:
writer.writeComment("Some comment"); // 3️⃣
writer.writeField("Some field"); // 2️⃣
writer.writeEndOfLine(); // 1️⃣
Configuring writing options:
Csv.WriterOptions customOptions = Csv.WriterOptions.builder().maxCharsPerField(1024).build();
picocsv provides a null-free API that accepts null parameters and returns non-null values.
writer.writeComment(null); // same as `writer.writeComment("")`
writer.writeField(null); // same as `writer.writeField("")`
Custom formats are defined by the Csv.Format object:
Option | Description | Default Value |
---|---|---|
#separator | Line separator | \r\n |
#delimiter | Field delimiter | , |
#quote | Quoting character | " |
#comment | Comment character | # |
Csv.Format tsv = Csv.Format.builder().delimiter('\t').build();
Csv.Format embedded = Csv.Format.builder().delimiter('=').separator(",").build();
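To make the roles of the field delimiter and quoting character more concrete, here is a minimal sketch of the RFC 4180 quoting rule in plain Java. It is not picocsv's implementation: the class Rfc4180QuotingSketch and its methods are hypothetical names, and the sketch only checks for CR and LF, ignoring custom line separators and comment characters.

```java
// Hypothetical illustration of RFC 4180 quoting; not part of picocsv.
final class Rfc4180QuotingSketch {

    private Rfc4180QuotingSketch() {
    }

    // A field needs quotes if it contains the delimiter, the quote character, CR or LF.
    static boolean needsQuoting(CharSequence field, char delimiter, char quote) {
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == delimiter || c == quote || c == '\r' || c == '\n') {
                return true;
            }
        }
        return false;
    }

    // Encodes a single field: encloses it in quotes when needed and doubles embedded quotes.
    static String encodeField(CharSequence field, char delimiter, char quote) {
        if (!needsQuoting(field, delimiter, quote)) {
            return field.toString();
        }
        StringBuilder result = new StringBuilder(field.length() + 2);
        result.append(quote);
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == quote) {
                result.append(quote); // an embedded quote is escaped by doubling it
            }
            result.append(c);
        }
        result.append(quote);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeField("plain", ',', '"'));
        System.out.println(encodeField("a,b", ',', '"'));
        System.out.println(encodeField("a,b", '\t', '"'));
    }
}
```

With a tab delimiter, as in the tsv format above, a comma no longer forces quoting, which is why the delimiter is passed as a parameter.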
picocsv only supports java.io.Reader/java.io.Writer as input/output for performance reasons.
However, it is still possible to use Readable/Appendable by wrapping them in adapters.
See Cookbook#asCharReader(Readable) and Cookbook#asCharWriter(Appendable).
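For illustration, such adapters could look roughly like the sketch below, which uses only the JDK. This is an assumption about the general shape, not the actual Cookbook code: the class name CharAdaptersSketch is hypothetical and the method bodies are one possible implementation.

```java
import java.io.Closeable;
import java.io.Flushable;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.CharBuffer;

// Hypothetical helper class; one possible way to write such adapters.
final class CharAdaptersSketch {

    private CharAdaptersSketch() {
    }

    // Exposes any Readable as a java.io.Reader.
    static Reader asCharReader(Readable readable) {
        return new Reader() {
            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                // Wrap the target slice so that the Readable fills it in place
                return readable.read(CharBuffer.wrap(cbuf, off, len));
            }

            @Override
            public void close() throws IOException {
                if (readable instanceof Closeable) {
                    ((Closeable) readable).close();
                }
            }
        };
    }

    // Exposes any Appendable as a java.io.Writer.
    static Writer asCharWriter(Appendable appendable) {
        return new Writer() {
            @Override
            public void write(char[] cbuf, int off, int len) throws IOException {
                // A wrapped CharBuffer is a CharSequence view over the source slice
                appendable.append(CharBuffer.wrap(cbuf, off, len));
            }

            @Override
            public void flush() throws IOException {
                if (appendable instanceof Flushable) {
                    ((Flushable) appendable).flush();
                }
            }

            @Override
            public void close() throws IOException {
                if (appendable instanceof Closeable) {
                    ((Closeable) appendable).close();
                }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        StringBuilder output = new StringBuilder();
        // CharBuffer implements Readable and StringBuilder implements Appendable
        try (Reader reader = asCharReader(CharBuffer.wrap("a,b\r\n"));
             Writer writer = asCharWriter(output)) {
            char[] buffer = new char[16];
            int count;
            while ((count = reader.read(buffer, 0, buffer.length)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
        System.out.print(output);
    }
}
```

The resulting Reader and Writer can then be passed to the Csv.Reader.of and Csv.Writer.of factory methods described earlier.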
Comments can be disabled by setting the Csv.Format#comment option to the null character \0.
Csv.Format noComment = Csv.Format.builder().comment('\0').build();
Note
Note that this might lead to problems since binary data is allowed in RFC-4180-bis. It will be fixed in a future release.
Comments can be skipped by using the Csv.Reader#isComment() method.
while (reader.readLine()) {
    if (!reader.isComment()) {
        while (reader.readField()) { ... }
    }
}
See Cookbook#skipComments(Csv.Reader).
Empty lines are valid lines represented by a single empty field in RFC-4180.
However, it is still possible to skip them by using the Csv.Format#acceptMissingField option.
Csv.Format format = Csv.Format.builder().acceptMissingField(true).build();
try (Csv.Reader reader = ...) {
    while (reader.readLine()) {
        if (!reader.readField()) {
            continue; // 💡 line without field => empty line
        }
        do { ... } while (reader.readField());
    }
}
Fields can be skipped by reading them without using their value.
The underlying implementation does not allocate heap memory to parse fields and provides access to those fields through a CharSequence interface.
Therefore, the string value creation is delayed until it is actually needed, reducing memory usage and garbage collection.
try (Csv.Reader reader = ...) {
    while (reader.readLine()) {
        reader.readField(); // 💡 read field but do not use it => skip it
        while (reader.readField()) {
            String field = reader.toString(); // use the field value
        }
    }
}
See Cookbook#skipFields(Csv.Reader, int).
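Because fields are exposed as a CharSequence, some checks can be done without ever creating a String. The sketch below shows two such checks in plain Java. The class FieldInspectionSketch and its methods are hypothetical helpers, not part of picocsv, and they work on any CharSequence.

```java
// Hypothetical helpers that inspect a field without converting it to a String.
final class FieldInspectionSketch {

    private FieldInspectionSketch() {
    }

    // Compares the two sequences char by char; no String is created.
    static boolean sameContent(CharSequence field, CharSequence expected) {
        if (field.length() != expected.length()) {
            return false;
        }
        for (int i = 0; i < field.length(); i++) {
            if (field.charAt(i) != expected.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Parses a decimal number of 1 to 9 digits, or returns -1 if the field is anything else.
    static int parseSmallUnsignedInt(CharSequence field) {
        int length = field.length();
        if (length == 0 || length > 9) {
            return -1; // 9 decimal digits always fit in an int
        }
        int result = 0;
        for (int i = 0; i < length; i++) {
            char c = field.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
            result = result * 10 + (c - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        CharSequence field = new StringBuilder("2024"); // stand-in for a parsed field
        System.out.println(sameContent(field, "2024"));
        System.out.println(parseSmallUnsignedInt(field));
    }
}
```

Inside a reading loop, the reader itself would be passed as the field argument, and toString() would only be called for the fields whose value must be kept.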
Maven setup:
<dependency>
    <groupId>com.github.nbbrd.picocsv</groupId>
    <artifactId>picocsv</artifactId>
    <version>LATEST_VERSION</version>
</dependency>
This project is written in Java and uses Apache Maven as a build tool.
It requires Java 8 as minimum version and all its dependencies are hosted on Maven Central.
The code can be built using any IDE or by typing the following commands in a terminal:
git clone https://github.com/nbbrd/picocsv.git
cd picocsv
mvn clean install
Any contribution is welcome and should be done through pull requests and/or issues.
The code of this project is licensed under the European Union Public Licence (EUPL).