fix: fix csv file collections #3513
Conversation
@Jasonzyt is attempting to deploy a commit to the NuxtLabs Team on Vercel. A member of the Team first needs to authorize it.
Thanks for the PR @Jasonzyt

```ts
people: defineCollection({
  type: "data",
  source: "org/people.csv",
  schema: z.object({
    "name": z.string(),
    "email": z.string().email(),
  }),
}),
```

It is important to keep a 1-to-1, predictable mapping between files in the content directory and documents in the database. Splitting CSV files outside of the collection source logic breaks this predictability.
In my case, I only need single-file CSV support. My PR can implement that, but tbh some of the logic needs to be optimized.

I'll update your PR.

@Jasonzyt Could you check and test the behavior?
```ts
const { queries, hash } = generateCollectionInsert(collection, parsedContent)
list.push([key, queries, hash])
}
const { queries, hash } = generateCollectionInsert(collection, parsedContent)
```
I'm not sure why you reverted this.
Since CSV files contain multiple rows, and each row should be treated as an independent ParsedContent, they require special handling: a CSV file should not be processed as a single ParsedContent. In contrast, formats like Markdown/JSON/YML contain only one ParsedContent per file, so the two cases must be handled differently.
And the facts prove this point: after installing the latest package, errors occurred.
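To make the per-row idea concrete, here is a small standalone sketch (not the PR's actual code): each CSV data row becomes its own document, keyed like `file.csv#row`. `csvToDocuments` and the key scheme are illustrative assumptions.

```ts
// Minimal sketch: split a CSV file into one document per data row.
// csvToDocuments is a hypothetical helper, not part of @nuxt/content.
function csvToDocuments(path: string, raw: string): Array<Record<string, string>> {
  const [header, ...rows] = raw.trim().split('\n')
  const fields = header!.split(',')
  return rows.map((row, i) => {
    const values = row.split(',')
    // Key each row as `<file>#<rowNumber>`, mirroring the keys discussed in this PR
    const doc: Record<string, string> = { id: `${path}#${i + 1}` }
    fields.forEach((field, j) => { doc[field] = values[j] ?? '' })
    return doc
  })
}

console.log(csvToDocuments('org/people.csv', 'name,email\nJohn,[email protected]'))
// [{ id: 'org/people.csv#1', name: 'John', email: '[email protected]' }]
```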
CSV and other formats must be handled separately. Maybe we can find a better way to do this.
We don't need to do this. Content now has defineCSVSource, which reads a CSV file and generates a document for each row.
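For context, a minimal sketch of the end-user shape this describes: pointing a collection at a single CSV file that then yields one document per row. Whether a plain string source routes through defineCSVSource internally is an assumption here; the collection name and schema are illustrative.

```ts
// content.config.ts -- a sketch under the assumption that a plain .csv source
// is expanded to one document per row (via defineCSVSource internally).
import { defineContentConfig, defineCollection, z } from '@nuxt/content'

export default defineContentConfig({
  collections: {
    people: defineCollection({
      type: 'data',
      source: 'org/people.csv', // single CSV file; each data row becomes a document
      schema: z.object({
        name: z.string(),
        email: z.string().email(),
      }),
    }),
  },
})
```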
In my case:

```ts
const { data } = await useAsyncData("albumsMeta", () => {
  return queryCollection("albumsMeta").order("updated", "DESC").all();
});
```

returns the wrong result. This is exactly the bug mentioned in #3511.
@Jasonzyt could you share your reproduction repository, or provide a minimal one?

Here's my repo: https://github.com/Jasonzyt/gallery

@Jasonzyt There was a mistake in a string; it should be good now. Try with

I installed
Edit:

Please solve this problem urgently. I have projects to submit.

@Jasonzyt I checked with your project and it works as expected with single-file sources. Try with
Note: I had to remove styles and some other deps because I was facing an issue with
Okay, I found the problem:

```diff
export default defineContentConfig({
  collections: {
    albumsMeta: defineCollection({
      type: "data",
-     source: "test/**.csv",
+     source: "test.csv",
      schema: z.object({
        id: z.string(),
        name: z.string(),
        description: z.string(),
        cover: z.string(),
        updated: z.string(),
        urlFormat: z.string(),
      }),
    }),
    ...defineAlbumCollections(),
  },
});
```

Now everything works well! @farnabaz Thanks for the review! I think we should complete the docs before merging.
@farnabaz It seems there's a bug when parsing multiple .csv files.

@farnabaz Okay, now I understand everything. Sorry for misunderstanding your newly updated docs.
```ts
return new Promise((resolve) => {
  const csvKeys: string[] = []
  let count = 0
  createReadStream(join(resolvedSource.cwd, fixed, keys[0]!))
    .on('data', function (chunk) {
      for (let i = 0; i < chunk.length; i += 1)
        if (chunk[i] == 10) {
          if (count > 0) { // count === 0 is CSV header row and should not be included
            csvKeys.push(`${keys[0]}#${count}`)
          }
          count += 1
        }
    })
    .on('end', () => resolve(csvKeys))
})
```
CSV files without trailing newlines will have their last data row missing from the collection. The getKeys function only generates row keys when it encounters newline characters, so the final row is skipped if the file doesn't end with a newline.
Patch Details
```diff
diff --git a/src/utils/source.ts b/src/utils/source.ts
index 801f27c0..2fd55da6 100644
--- a/src/utils/source.ts
+++ b/src/utils/source.ts
@@ -134,17 +134,26 @@ export function defineCSVSource(source: CollectionSource): ResolvedCollectionSou
       return new Promise((resolve) => {
         const csvKeys: string[] = []
         let count = 0
+        let lastByteWasNewline = true
         createReadStream(join(resolvedSource.cwd, fixed, keys[0]!))
           .on('data', function (chunk) {
-            for (let i = 0; i < chunk.length; i += 1)
+            for (let i = 0; i < chunk.length; i += 1) {
+              lastByteWasNewline = (chunk[i] == 10)
               if (chunk[i] == 10) {
                 if (count > 0) { // count === 0 is CSV header row and should not be included
                   csvKeys.push(`${keys[0]}#${count}`)
                 }
                 count += 1
               }
+            }
           })
-          .on('end', () => resolve(csvKeys))
+          .on('end', () => {
+            // If the file doesn't end with a newline and we have at least one data row, add the last row
+            if (!lastByteWasNewline && count > 0) {
+              csvKeys.push(`${keys[0]}#${count}`)
+            }
+            resolve(csvKeys)
+          })
       })
     },
     getItem: async (key) => {
```
Analysis
CSV files without trailing newlines have their last row missing from collection
What fails: In the defineCSVSource() function in src/utils/source.ts, the getKeys implementation only generates row keys when it encounters newline characters. This causes the final data row to be skipped if the CSV file doesn't end with a newline character.
How to reproduce:

```sh
# Create a CSV file without a trailing newline
# (printf interprets \n and adds no trailing newline, unlike plain echo)
printf 'name,email\nJohn,[email protected]\nJane,[email protected]' > test.csv

# Use defineCSVSource to read the keys
# Expected: ['test.csv#1', 'test.csv#2']
# Actual:   ['test.csv#1'] (Jane's row is missing)
```

Result: the getKeys() function returns only 1 row key instead of 2. When the file has 2 data rows but no trailing newline, only the first row is accessible; the second row is silently dropped from the collection.
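For illustration, a self-contained sketch that reproduces the off-by-one without streams; `keysFromCsv` is a stand-in for the real getKeys, and the file contents mirror the repro above:

```ts
// Standalone sketch mirroring the buggy getKeys counting logic on a Buffer.
// keysFromCsv is illustrative, not the module's actual function.
import { Buffer } from 'node:buffer'

function keysFromCsv(name: string, bytes: Buffer): string[] {
  const keys: string[] = []
  let count = 0
  for (const byte of bytes) {
    if (byte === 10) { // '\n'
      if (count > 0) keys.push(`${name}#${count}`) // count === 0 is the header row
      count += 1
    }
  }
  return keys
}

// No trailing newline: the last data row never hits a '\n', so it is dropped.
console.log(keysFromCsv('test.csv',
  Buffer.from('name,email\nJohn,[email protected]\nJane,[email protected]')))
// -> ['test.csv#1']

// With a trailing newline, both rows are keyed as expected.
console.log(keysFromCsv('test.csv',
  Buffer.from('name,email\nJohn,[email protected]\nJane,[email protected]\n')))
// -> ['test.csv#1', 'test.csv#2']
```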
Expected behavior: Both rows should be included in the collection regardless of whether the file ends with a newline. CSV files without trailing newlines are common and are valid per RFC 4180 (CSV specification).
Root cause: The algorithm increments a counter each time it encounters a newline (byte value 10), but it only pushes a key when the counter is > 0. If the file doesn't end with a newline, the stream ends without triggering the final key generation for the last row.
Fix: Track whether the last byte processed was a newline character (lastByteWasNewline variable). In the 'end' event handler, if the file doesn't end with a newline and we have at least one data row (count > 0), push an additional key for the final incomplete row.

🔗 Linked issue

Close #3511

❓ Type of change

📚 Description

Currently, CSV collections don't behave as described in the documentation. Instead of allowing each CSV file to contain multiple entries (as the documentation states), the current implementation treats each CSV file as a single piece of content.

📝 Checklist