diff --git a/.claude/claude.md b/.claude/claude.md
new file mode 100644
index 000000000..ec78701a6
--- /dev/null
+++ b/.claude/claude.md
@@ -0,0 +1,61 @@
+## SQL
+
+CRITICAL: SQL correctness is paramount in dbplyr. Before implementing SQL for any backend, you MUST use the **sql-research** skill to research syntax and behavior. Only implement after completing research and documentation.
+
+## R package development
+
+### Key commands
+
+```
+# To run code
+Rscript -e "devtools::load_all(); code"
+
+# To run all tests
+Rscript -e "devtools::test()"
+
+# To run tests for R/{name}.R
+Rscript -e "devtools::test(filter = '{name}', reporter = 'llm')"
+
+# To document the package
+Rscript -e "devtools::document()"
+
+# To check pkgdown documentation
+Rscript -e "pkgdown::check_pkgdown()"
+
+# To format code
+air format .
+```
+
+### Coding
+
+* Always run `air format .` after generating code
+* Use the base pipe operator (`|>`) not the magrittr pipe (`%>%`)
+
+### Testing
+
+- Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`.
+- All new code should have an accompanying test.
+- If there are existing tests, place new tests next to similar existing tests.
+- Strive to keep your tests minimal with few comments.
+
+### Documentation
+
+- Every user-facing function should be exported and have roxygen2 documentation.
+- Wrap roxygen comments at 80 characters.
+- Internal functions should not have roxygen documentation.
+- Whenever you add a new documentation topic, also add the topic to `_pkgdown.yml`.
+- Use `pkgdown::check_pkgdown()` to check that all topics are included in the reference index.
+
+### Writing
+
+- Use sentence case for headings.
+
+### Proofreading
+
+If the user asks you to proofread a file, act as an expert proofreader and editor with a deep understanding of clear, engaging, and well-structured writing.
+
+Work paragraph by paragraph, always starting by making a TODO list that includes individual items for each top-level heading.
+
+Fix spelling, grammar, and other minor problems without asking the user. Label any unclear, confusing, or ambiguous sentences with a FIXME comment.
+
+Only report what you have changed.
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 000000000..1caff0c58
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,15 @@
+{
+  "$schema": "https://json.schemastore.org/claude-code-settings.json",
+  "permissions": {
+    "allow": [
+      "Bash(find:*)",
+      "Bash(Rscript:*)",
+      "Bash(rm:*)",
+      "Bash(air format:*)",
+      "Edit(/**)",
+      "WebSearch",
+      "WebFetch"
+    ],
+    "deny": []
+  }
+}
diff --git a/.claude/skills/create-skill/SKILL.md b/.claude/skills/create-skill/SKILL.md
new file mode 100644
index 000000000..1c99ccc95
--- /dev/null
+++ b/.claude/skills/create-skill/SKILL.md
@@ -0,0 +1,291 @@
+---
+name: create-skill
+description: Guide for creating new Claude Code skills. Use when you need to create a new skill to package expertise or workflow into a reusable capability that Claude can automatically invoke.
+---
+
+# Create Skill
+
+Use this skill when creating new Claude Code skills that package expertise, workflows, or domain knowledge into reusable capabilities.
+
+## Overview
+
+Skills are autonomous capabilities that Claude Code can invoke automatically based on the user's request. Each skill consists of a SKILL.md file with YAML frontmatter and markdown instructions, plus optional supporting files.
+
+## Skill structure
+
+### Directory layout
+
+```
+.claude/skills/
+└── your-skill-name/
+    ├── SKILL.md           # Required: Main skill definition
+    ├── reference.md       # Optional: Additional documentation
+    ├── examples.md        # Optional: Example usage
+    ├── templates/         # Optional: Template files
+    │   └── template.txt
+    └── scripts/           # Optional: Helper scripts
+        └── helper.py
+```
+
+### SKILL.md format
+
+Every SKILL.md file must have:
+
+1. **YAML frontmatter** (required)
+2. **Markdown content** with instructions
+
+#### YAML frontmatter
+
+```yaml
+---
+name: skill-name
+description: Brief description of what the skill does and when to use it (max 1024 chars)
+---
+```
+
+**Requirements:**
+- `name`: lowercase letters, numbers, and hyphens only (max 64 characters)
+- `description`: Clear description for Claude to understand when to invoke this skill
+  - Should explain WHAT the skill does
+  - Should explain WHEN to use it
+  - The description is critical for discoverability
+
+#### Markdown content
+
+Structure your skill instructions clearly:
+
+```markdown
+# Skill Name
+
+Brief introduction of when to use this skill.
+
+## Overview
+
+High-level explanation of what this skill does.
+
+## Workflow
+
+### Step 1: First step
+- Details
+- Instructions
+
+### Step 2: Second step
+- More details
+
+## Key concepts
+
+Important concepts the user needs to understand.
+
+## Examples
+
+Concrete examples showing how to use the skill.
+
+## Checklist
+
+- [ ] Verification steps
+- [ ] Required actions
+```
+
+## Creating a new skill
+
+### 1. Identify the need
+
+Create a skill when:
+- You have a repeating workflow that requires multiple steps
+- You want to package domain expertise (like SQL translation, code review patterns)
+- You need to ensure consistent processes are followed
+- You want to make complex tasks accessible
+
+### 2. Design the skill
+
+Plan your skill by answering:
+- **What**: What does this skill do?
+- **When**: When should Claude invoke it?
+- **How**: What are the step-by-step instructions?
+- **Why**: What expertise or knowledge does it encode?
+
+### 3. Write the SKILL.md
+
+**YAML frontmatter:**
+- Choose a descriptive `name` (kebab-case)
+- Write a clear `description` that helps Claude understand when to use it
+- The description should be specific enough to avoid false positives
+
+**Content structure:**
+- Start with a brief introduction
+- Break down the workflow into clear, numbered steps
+- Include examples from the actual codebase when relevant
+- Provide a checklist for verification
+- Keep instructions concise but complete
+
+### 4. Add supporting files (optional)
+
+If your skill needs:
+- **Templates**: Add to `templates/` directory
+- **Scripts**: Add to `scripts/` directory
+- **Reference docs**: Create `reference.md`
+- **Examples**: Create `examples.md`
+
+### 5. Test the skill
+
+Test that Claude invokes your skill by:
+1. Asking a question that should trigger the skill
+2. Verifying Claude loads and follows the skill instructions
+3. Checking that the workflow produces correct results
+
+Iterate on the description if Claude doesn't invoke it at the right times.
+
+## Best practices
+
+### Description writing
+
+✅ **Good descriptions:**
+- "Guide for adding SQL function translations to dbplyr backends. Use when implementing new database-specific R-to-SQL translations."
+- "Code review checklist for security vulnerabilities. Use after writing authentication, database, or API code."
+
+❌ **Bad descriptions:**
+- "SQL stuff" (too vague)
+- "Use this for everything related to databases" (too broad)
+
+### Instruction writing
+
+- **Be specific**: Provide concrete steps, not vague guidance
+- **Be concise**: Remove unnecessary words
+- **Use examples**: Show, don't just tell
+- **Reference real files**: Point to actual codebase examples
+- **Include verification**: Add checklists to ensure completeness
+
+### Naming conventions
+
+- Use kebab-case for skill names
+- Choose names that describe the action or domain
+- Examples: `sql-translation`, `create-skill`, `review-security`, `deploy-production`
+
+## Skill invocation
+
+### How skills work
+
+1. **Discovery**: Claude reads the skill's description
+2. **Decision**: Claude decides if the skill matches the user's request
+3. **Loading**: The SKILL.md file is loaded into the conversation context
+4. **Execution**: Claude follows the skill's instructions
+5. **Context**: Supporting files are available if referenced
+
+### Automatic vs manual invocation
+
+- **Automatic** (preferred): Claude invokes based on description match
+- **Manual**: User explicitly requests the skill (not common)
+
+Most skills should be designed for automatic invocation.
+
+## Examples
+
+### Minimal skill
+
+```yaml
+---
+name: format-code
+description: Format code using air format. Use after writing or modifying R code files.
+---
+
+# Format Code
+
+Run `air format .` to format all R code in the project.
+
+## Checklist
+
+- [ ] Run `air format .`
+- [ ] Verify no formatting errors
+```
+
+### Workflow skill
+
+````yaml
+---
+name: add-test
+description: Add tests for new R functions. Use when creating new functions in R/ directory.
+---
+
+# Add Test
+
+Add tests for new R functions following dbplyr conventions.
+
+## Workflow
+
+### 1. Identify test file
+- Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`
+
+### 2. Write tests
+- Place new tests next to similar existing tests
+- Keep tests minimal with few comments
+- Use `expect_snapshot()` for SQL translation tests
+
+### 3. Run tests
+```bash
+Rscript -e "devtools::test(filter = '{name}', reporter = 'llm')"
+```
+
+## Checklist
+
+- [ ] Created/updated test file
+- [ ] Tests are minimal and focused
+- [ ] All tests pass
+````
+
+## Common patterns
+
+### Research workflows
+
+For skills that require research before implementation:
+1. Specify search steps with WebSearch
+2. Require documentation with citations
+3. Only implement after research is complete
+
+See `sql-translation` skill for an example.
+
+### Multi-step processes
+
+For complex workflows:
+1. Break into numbered steps
+2. Use subsections for each step
+3. Include verification at each stage
+4. Provide a final checklist
+
+### Domain expertise
+
+For packaging specialized knowledge:
+1. Explain key concepts upfront
+2. Provide reference information
+3. Include decision trees or flowcharts
+4. Link to external documentation
+
+## Troubleshooting
+
+**Skill not being invoked:**
+- Check description clarity
+- Make description more specific
+- Verify YAML syntax
+
+**Skill invoked at wrong times:**
+- Description too broad
+- Add specifics about when NOT to use it
+
+**Instructions unclear:**
+- Add more concrete examples
+- Break down complex steps
+- Reference actual files from the codebase
+
+## Checklist
+
+Before completing a new skill:
+
+- [ ] Created `.claude/skills/{skill-name}/` directory
+- [ ] Created `SKILL.md` with YAML frontmatter
+- [ ] `name` field uses kebab-case (lowercase, hyphens only)
+- [ ] `description` clearly explains what and when (max 1024 chars)
+- [ ] Content has clear structure with sections
+- [ ] Workflow broken into numbered steps
+- [ ] Examples included where helpful
+- [ ] Checklist provided for verification
+- [ ] Tested that Claude invokes the skill correctly
+- [ ] Supporting files added if needed
diff --git a/.claude/skills/sql-research/SKILL.md b/.claude/skills/sql-research/SKILL.md
new file mode 100644
index 000000000..0ba260cf3
--- /dev/null
+++ b/.claude/skills/sql-research/SKILL.md
@@ -0,0 +1,187 @@
+---
+name: sql-research
+description: Guide for researching SQL syntax and behavior for database backends. Use when you need to research how a SQL function, command, or feature works in a specific database before implementing it in dbplyr.
+---
+
+# SQL Research Skill
+
+Use this skill when researching SQL syntax and behavior for any database backend before implementing translations or features in dbplyr.
+
+## When to use this skill
+
+- Before implementing any SQL translation for a database backend
+- When you need to understand SQL syntax, behavior, or edge cases
+- When documenting database-specific SQL features
+- Before writing SQL-generating code in dbplyr
+
+## Critical principle
+
+**SQL correctness is paramount in dbplyr.** You MUST complete research and documentation BEFORE implementing any SQL-related code.
+
+## Research workflow
+
+### 1. Search for official documentation
+
+Use WebSearch to find official documentation for "{dialect} {function/command}":
+
+- **Prioritize official database documentation** and reputable sources
+- Search for syntax, behavior, edge cases, and version-specific differences
+- Look for:
+  - Function signatures and argument types
+  - Return types and behavior
+  - NULL handling
+  - Type coercion rules
+  - Limitations or restrictions
+  - Differences across database versions
+
+### 2. Document your findings
+
+Create `research/{dialect}-{command}.md` with the following structure:
+
+```markdown
+# {Dialect} - {Function/Command}
+
+## Summary
+[1-2 sentence summary focused on R-to-SQL translation]
+
+## Syntax
+[Minimal syntax examples from official sources]
+
+## Key behaviors
+[Only behaviors that matter for dbplyr translation]
+
+## Limitations
+[Only restrictions that affect dbplyr usage]
+
+## Sources
+- [Source name](URL)
+- [Source name](URL)
+```
+
+**Documentation guidelines:**
+- Keep it minimal and focused on dbplyr use cases
+- Include only what's relevant to translating R code to SQL
+- ALL citations with URLs are REQUIRED (no exceptions)
+- NO comparisons with other databases
+- Use concrete examples from official sources
+- Keep it as concise as possible
+
+### 3. Verify your research
+
+Cross-reference multiple sources when:
+- Documentation seems incomplete or unclear
+- Behavior differs across database versions
+- Edge cases aren't well documented
+- Official docs contradict community sources
+
+**Best practices:**
+- Check at least 2-3 authoritative sources
+- Note any version-specific differences
+- Document uncertainties or ambiguities
+- When in doubt, test with actual database if possible
+
+### 4. Proceed to implementation
+
+Only after completing research and documentation should you:
+- Implement SQL translations
+- Write SQL-generating code
+- Add tests for the functionality
+
+## Example research files
+
+### Minimal example
+
+```markdown
+# PostgreSQL - POSITION
+
+## Summary
+Returns the starting position of a substring within a string (1-indexed).
+
+## Syntax
+POSITION(substring IN string)
+
+## Key behaviors
+- Returns integer position (1-indexed)
+- Returns 0 if substring not found
+- Case-sensitive by default
+- NULL if any argument is NULL
+
+## Sources
+- [PostgreSQL String Functions](https://www.postgresql.org/docs/current/functions-string.html)
+```
+
+### Complex example
+
+```markdown
+# SQL Server - STRING_AGG
+
+## Summary
+Concatenates string values with a specified separator, optionally ordering results.
+
+## Syntax
+STRING_AGG(expression, separator) [WITHIN GROUP (ORDER BY order_expression)]
+
+## Key behaviors
+- Available in SQL Server 2017+; the `WITHIN GROUP` clause requires compatibility level 110+
+- Returns NULL for empty groups
+- Separator must be a literal or variable, not an expression
+- WITHIN GROUP clause is optional but commonly used for deterministic ordering
+- Maximum output length is 2GB
+
+## Limitations
+- Not available in SQL Server 2016 or earlier
+- Cannot use with DISTINCT (use subquery instead)
+- Separator cannot be a computed expression
+
+## Sources
+- [SQL Server STRING_AGG](https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql)
+- [Compatibility requirements](https://docs.microsoft.com/en-us/sql/t-sql/functions/string-agg-transact-sql#compatibility-support)
+```
+
+## Common research patterns
+
+### String functions
+- Character encoding and collation
+- 0-indexed vs 1-indexed positions
+- NULL handling
+- Regular expression support and syntax
+
+### Date/time functions
+- Date/time types and precision
+- Timezone handling
+- Format strings and conventions
+- Interval arithmetic
+
+### Aggregate functions
+- NULL handling in aggregates
+- Empty group behavior
+- DISTINCT support
+- Window function variants
+
+### Window functions
+- OVER clause syntax
+- Frame specifications (ROWS vs RANGE)
+- Partitioning and ordering
+- Function-specific restrictions
+
+## Checklist
+
+Before completing SQL research:
+
+- [ ] Searched official database documentation
+- [ ] Identified syntax and key behaviors
+- [ ] Documented edge cases and limitations
+- [ ] Created research file in `research/{dialect}-{function}.md`
+- [ ] Included ALL source URLs
+- [ ] Kept documentation minimal and focused
+- [ ] Cross-referenced multiple sources if needed
+- [ ] Ready to proceed with implementation
+
+## Tips
+
+- **Start broad, then narrow**: Search for the general command first, then dig into specifics
+- **Use official docs first**: Official documentation is most authoritative
+- **Check version availability**: Many SQL features are version-specific
+- **Note NULL behavior**: NULL handling often differs across databases
+- **Document what matters**: Focus on dbplyr translation needs, not general SQL education
+- **Keep it short**: Research docs should be scannable reference material, not tutorials
diff --git a/.claude/skills/sql-translation/SKILL.md b/.claude/skills/sql-translation/SKILL.md
new file mode 100644
index 000000000..b463e03f8
--- /dev/null
+++ b/.claude/skills/sql-translation/SKILL.md
@@ -0,0 +1,188 @@
+---
+name: sql-translation
+description: Guide for adding SQL function translations to dbplyr backends. Use when implementing new database-specific R-to-SQL translations for functions like string manipulation, date/time, aggregates, or window functions.
+---
+
+# SQL Translation Skill
+
+Use this skill when adding new SQL function translations for a specific database backend.
+
+## Overview
+
+This skill guides you through adding SQL translations to dbplyr. SQL translations convert R functions to their SQL equivalents for different database backends.
+
+## Workflow
+
+### 1. Research SQL (CRITICAL - ALWAYS FIRST)
+
+Before implementing any SQL translation, you MUST research the SQL syntax and behavior using the **sql-research** skill. See that skill for the complete research workflow.
+
+**Quick summary:**
+- Search official documentation for "{dialect} {function}"
+- Document findings in `research/{dialect}-{function}.md`
+- Include all source URLs
+- Only proceed to implementation after completing research
+
+### 2. Identify the backend file
+
+SQL translations are defined in backend-specific files:
+- `R/backend-sqlite.R` - SQLite
+- `R/backend-postgres.R` - PostgreSQL
+- `R/backend-mysql.R` - MySQL
+- `R/backend-mssql.R` - MS SQL Server
+- etc.
+
+### 3. Add translation
+
+Translations are added to the `sql_translation()` method for the connection class. This method returns a `sql_variant()` with three components:
+
+**Scalar translations** (for mutate/filter):
+```r
+sql_translator(.parent = base_scalar,
+  # Simple function name mapping
+  log10 = function(x) sql_expr(log(!!x)),
+
+  # Function with different arguments
+  round = function(x, digits = 0L) {
+    digits <- as.integer(digits)
+    sql_expr(round(((!!x)) %::% numeric, !!digits))
+  },
+
+  # String concatenation
+  paste0 = sql_paste(""),
+
+  # Complex logic
+  grepl = function(pattern, x, ignore.case = FALSE) {
+    if (ignore.case) {
+      sql_expr(((!!x)) %~*% ((!!pattern)))
+    } else {
+      sql_expr(((!!x)) %~% ((!!pattern)))
+    }
+  }
+)
+```
+
+**Aggregate translations** (for summarise):
+```r
+sql_translator(.parent = base_agg,
+  sd = sql_aggregate("STDEV", "sd"),
+  median = sql_aggregate("MEDIAN"),
+  quantile = sql_not_supported("quantile")
+)
+```
+
+**Window translations** (for mutate with groups):
+```r
+sql_translator(.parent = base_win,
+  sd = win_aggregate("STDEV"),
+  median = win_absent("median"),
+  quantile = sql_not_supported("quantile")
+)
+```
+
+### 4. Helper functions
+
+Common translation patterns:
+
+- `sql_expr()` - Build SQL expressions with `!!` for interpolation
+- `sql_cast(type)` - Type casting (e.g., `sql_cast("REAL")`)
+- `sql_aggregate(sql_name, r_name)` - Simple aggregates
+- `sql_paste(sep)` - String concatenation
+- `sql_not_supported(name)` - Mark unsupported functions
+- `win_aggregate(sql_name)` - Window aggregates
+- `win_absent(name)` - Window functions not supported
+
+### 5. Test the translation
+
+**Interactive testing:**
+```bash
+Rscript -e "devtools::load_all(); library(dplyr, warn.conflicts = FALSE);
+  translate_sql(your_function(x), con = simulate_yourdb())"
+```
+
+**Write tests:**
+- Tests for `R/{name}.R` go in `tests/testthat/test-{name}.R`
+- Place new tests next to similar existing tests
+- Keep tests minimal with few comments
+
+Example test:
+```r
+test_that("backend_name translates function_name correctly", {
+  lf <- lazy_frame(x = 1, con = simulate_backend())
+
+  expect_snapshot(
+    lf |> mutate(y = your_function(x))
+  )
+})
+```
+
+### 6. Document the translation
+
+**Update backend documentation:**
+- Edit the `@description` section in the backend file (e.g., `R/backend-postgres.R`)
+- List key translation differences
+- Add examples to `@examples` if helpful
+
+**Example:**
+```r
+#' Backend: PostgreSQL
+#'
+#' @description
+#' See `vignette("translation-function")` and `vignette("translation-verb")` for
+#' details of overall translation technology. Key differences for this backend
+#' are:
+#'
+#' * Many stringr functions
+#' * lubridate date-time extraction functions
+#' * Your new translation
+```
+
+### 7. Format and check
+
+```bash
+# Format code
+air format .
+
+# Run relevant tests
+Rscript -e "devtools::test(filter = 'backend-name', reporter = 'llm')"
+
+# Regenerate documentation
+Rscript -e "devtools::document()"
+```
+
+## Key concepts
+
+**Parent translators:**
+- `base_scalar` - Common scalar functions (math, string, logical)
+- `base_agg` - Common aggregates (sum, mean, min, max)
+- `base_win` - Common window functions
+
+**SQL expression building:**
+- Use `sql_expr()` to build SQL
+- Use `!!` to interpolate R variables
+- Use `%as%` for AS, `%::%` for ::, etc.
+
+**Argument handling:**
+- Check arguments with `check_bool()`, `check_unsupported_arg()`
+- Convert R types appropriately (e.g., `as.integer()`)
+- Handle optional arguments with defaults
+
+## Resources
+
+See also:
+- `vignette("translation-function")` - Function translation overview
+- `vignette("new-backend")` - Creating new backends
+- Existing backend files for examples
+
+## Checklist
+
+Before completing a SQL translation:
+
+- [ ] Researched SQL syntax in official documentation
+- [ ] Created research file in `research/{dialect}-{function}.md`
+- [ ] Added translation to appropriate `sql_translator()` section
+- [ ] Tested translation interactively
+- [ ] Added/updated tests
+- [ ] Updated backend documentation
+- [ ] Ran `air format .`
+- [ ] Verified tests pass
diff --git a/research/mysql-infinity.md b/research/mysql-infinity.md
new file mode 100644
index 000000000..beb76d727
--- /dev/null
+++ b/research/mysql-infinity.md
@@ -0,0 +1,67 @@
+# MySQL - Infinity
+
+## Summary
+
+MySQL does **not support** IEEE 754 infinity values (positive or negative infinity) for FLOAT or DOUBLE data types. The SQL standard defines infinity and NaN as invalid values for these types, and MySQL adheres to this standard.
+
+## Behavior
+
+### Arithmetic operations
+
+Division operations that would produce infinity in IEEE 754 return NULL instead:
+
+```sql
+SELECT 1/0, 0/0;
+-- Result: NULL, NULL (not Inf, NaN)
+```
+
+### INSERT operations
+
+Attempting to insert infinity values results in conversion to 0.0 or errors:
+
+- Inserting string literals 'Inf', '+Inf', '-Inf', or 'Infinity' typically results in 0.0 being stored
+- Depending on SQL mode, may generate Error Code 1265: "Data truncated for column" warnings
+- In strict SQL modes, the INSERT may fail entirely
+
+### SELECT operations (platform-dependent)
+
+When very large values are stored (e.g., `1e+52`), behavior is platform-dependent:
+
+- Some platforms: SELECT returns `inf` and `-inf`
+- Other platforms: SELECT returns `0` and `-0`
+
+This inconsistency is documented in the MySQL reference manual as platform or implementation dependent behavior.
+
+## Workarounds
+
+Since MySQL lacks native infinity support, common workarounds include:
+
+1. **Large numbers**: Use arbitrarily large values (e.g., `1e308` for positive, `-1e308` for negative)
+2. **Language constants**: Use `Double.MAX_VALUE` / `-Double.MAX_VALUE` from application code (Java's `Double.MIN_VALUE` is the smallest positive value, not the most negative)
+3. **NULL values**: Represent infinity as NULL with additional flag columns
+4. **Separate column**: Add a VARCHAR column to store text representations ('Infinity', '-Infinity') while the numeric column stores NULL
+5. **Application layer**: Convert infinity values to/from alternative representations in application code before storage and after retrieval
+
+## Recommendations
+
+For applications requiring exact numeric values or special value handling:
+
+- Use DECIMAL type for exact values (no special values, but no approximation either)
+- Handle infinity cases at the application layer before inserting into MySQL
+- Document clearly how infinity is represented in your schema
+
+## Contrast with other databases
+
+Some database systems do support infinity:
+
+- **PostgreSQL**: Supports 'Infinity', '-Infinity', and 'NaN' for DOUBLE PRECISION and REAL types
+- **Apache Doris**: Supports Inf, -Inf, and NaN conforming to IEEE 754
+
+## Sources
+
+- [MySQL 8.4 Reference Manual: Floating-Point Types](https://dev.mysql.com/doc/refman/8.4/en/floating-point-types.html)
+- [MySQL 8.4 Reference Manual: Problems with Floating-Point Values](https://dev.mysql.com/doc/refman/8.4/en/problems-with-float.html)
+- [MySQL Bug #80953: Documentation over Infinity vs NaN](https://bugs.mysql.com/bug.php?id=80953)
+- [Stack Overflow: MySQL IEEE floating point NaN, PositiveInfinity, NegativeInfinity](https://stackoverflow.com/questions/41936403/mysql-ieee-floating-point-nan-positiveinfinity-negativeinfinity)
+- [Database Administrators Stack Exchange: Does MySQL support '+/- infinity' numeric value?](https://dba.stackexchange.com/questions/338224/does-mysql-support-infinity-numeric-value)
+- [Stack Overflow: Storing Double.POSITIVE_INFINITY in MySQL](https://stackoverflow.com/questions/5807749/storing-double-positive-infinity-in-mysql-ejb-entity-jboss)
diff --git a/research/redshift-is-not-distinct-on.md b/research/redshift-is-not-distinct-on.md
new file mode 100644
index 000000000..1b835fdf5
--- /dev/null
+++ b/research/redshift-is-not-distinct-on.md
@@ -0,0 +1,50 @@
+# Redshift - IS NOT DISTINCT FROM in ON clause
+
+## Summary
+
+Redshift does NOT support the `IS NOT DISTINCT FROM` operator in JOIN ON clauses (or anywhere else). This PostgreSQL operator is not available in Redshift, requiring workarounds for null-safe comparisons in joins.
+
+## Syntax
+
+NOT SUPPORTED in Redshift:
+```sql
+-- This does NOT work in Redshift
+SELECT *
+FROM a
+JOIN b ON a.col IS NOT DISTINCT FROM b.col
+```
+
+WORKAROUND - Explicit NULL check:
+```sql
+SELECT *
+FROM a
+JOIN b ON (a.col = b.col) OR (a.col IS NULL AND b.col IS NULL)
+```
+
+WORKAROUND - Using COALESCE:
+```sql
+SELECT *
+FROM a
+JOIN b ON COALESCE(a.col, 'sentinel_value') = COALESCE(b.col, 'sentinel_value')
+```
+
+## Key behaviors
+
+- Standard equality (`=`) in Redshift returns NULL when either operand is NULL
+- NULL does not equal NULL in standard SQL comparisons
+- Redshift does not match rows in joins when join columns contain NULL, even if both sides are NULL
+- For multi-column joins, each column needs its own NULL handling
+
+## Limitations
+
+- No native null-safe comparison operator
+- Must use verbose workarounds that increase query complexity
+- COALESCE approach requires choosing a sentinel value that won't occur naturally in the data
+- Explicit NULL check pattern becomes unwieldy with multiple join columns
+
+## Sources
+
+- [Amazon Redshift: Comparison condition](https://docs.aws.amazon.com/redshift/latest/dg/r_comparison_condition.html)
+- [Amazon Redshift: Nulls](https://docs.aws.amazon.com/redshift/latest/dg/r_Nulls.html)
+- [PopSQL: How to Compare Two Values When One is Null in Redshift](https://popsql.com/learn-sql/redshift/how-to-compare-two-values-when-one-is-null-in-redshift)
+- [Stack Overflow: Redshift left outer join is leaving out nulls](https://stackoverflow.com/questions/28080883/redshift-left-outer-join-is-leaving-out-nulls)
diff --git a/research/redshift-to_date.md b/research/redshift-to_date.md
new file mode 100644
index 000000000..626682b4f
--- /dev/null
+++ b/research/redshift-to_date.md
@@ -0,0 +1,37 @@
+# Redshift - TO_DATE
+
+## Summary
+Converts a string to the DATE type using a format string; relevant to dbplyr's `date_build()` translation.
+
+## Syntax
+```sql
+TO_DATE(string, format)
+TO_DATE(string, format, is_strict)
+```
+
+## Parameters
+- `string`: Character string to convert
+- `format`: String literal defining date format (e.g., 'YYYY-MM-DD')
+- `is_strict` (optional): Boolean for error handling (default FALSE allows overflow)
+
+## Key behaviors
+- Returns DATE type
+- Does not support format string with Q (Quarter number)
+- With `is_strict=FALSE` (default), accepts overflow values (e.g., '2001-06-31' becomes '2001-07-01')
+- With `is_strict=TRUE`, raises error for out-of-range dates
+
+## Examples
+```sql
+-- Basic conversion
+to_date('02 Oct 2001', 'DD Mon YYYY') -- Returns 2001-10-02
+
+-- Using YYYY-MM-DD format
+to_date('2001-10-02', 'YYYY-MM-DD')
+
+-- String concatenation with CAST and || operator
+TO_DATE(CAST(2020 AS TEXT) || '-' || CAST(1 AS TEXT) || '-' || CAST(15 AS TEXT), 'YYYY-MM-DD')
+```
+
+## Sources
+- [TO_DATE function - Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_TO_DATE_function.html)
+- [+ (Concatenation) operator - Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/r_DATE-CONCATENATE_function.html)
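
As a rough illustration of the string-building pattern in the last example of `research/redshift-to_date.md`, here is a minimal base R sketch. The helper name `redshift_date_build_sql()` is hypothetical and is not part of dbplyr or of this patch; it only assembles the SQL text, assuming the year, month, and day arrive as plain numbers or column names.

```r
# Hypothetical helper (an assumption for illustration, not dbplyr API):
# assembles the TO_DATE() call from the research note as a plain string.
redshift_date_build_sql <- function(year, month, day) {
  # Wrap each component in CAST(... AS TEXT) so it can be concatenated
  cast_text <- function(x) paste0("CAST(", x, " AS TEXT)")

  # Join the three casts with the || operator and a '-' separator
  date_string <- paste(
    cast_text(year),
    cast_text(month),
    cast_text(day),
    sep = " || '-' || "
  )

  # Parse the concatenated string with an explicit format
  paste0("TO_DATE(", date_string, ", 'YYYY-MM-DD')")
}

redshift_date_build_sql(2020, 1, 15)
```

In an actual backend the same expression would more likely be built with `sql_expr()` inside `sql_translator()`, as the sql-translation skill describes; plain string pasting is used here only to keep the sketch free of dependencies.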