From 326f11104ec2e4ba7729e7ff3deff424d0eb68d0 Mon Sep 17 00:00:00 2001
From: RAJPUT MIHIKA AJITSINGH <163915078+mihika632@users.noreply.github.com>
Date: Mon, 6 Oct 2025 01:19:57 +0530
Subject: [PATCH 1/3] Add SQL Injection guide for beginners

This guide provides a comprehensive overview of SQL injection, its risks,
prevention techniques, and secure coding practices for developers.
---
 ebook/en/content/sql_injection.md | 296 ++++++++++++++++++++++++++++++
 1 file changed, 296 insertions(+)
 create mode 100644 ebook/en/content/sql_injection.md

diff --git a/ebook/en/content/sql_injection.md b/ebook/en/content/sql_injection.md
new file mode 100644
index 0000000..7c65cc6
--- /dev/null
+++ b/ebook/en/content/sql_injection.md
@@ -0,0 +1,296 @@
+# SQL Injection: A Beginner's Understanding
+
+> **Purpose**: This guide gives beginners a clear, safe, and practical overview of what SQL injection (SQLi) is, how it happens at a conceptual level, and, most importantly, how to prevent and detect it. It focuses on defensive, responsible practices and does not provide exploit instructions.
+
+## 📋 Table of Contents
+
+1. [What is SQL Injection?](#1-what-is-sql-injection)
+2. [Why it matters (risks & impact)](#2-why-it-matters-risks--impact)
+3. [High-level types of SQLi (conceptual)](#3-high-level-types-of-sqli-conceptual)
+4. [How SQLi happens: a conceptual example](#4-how-sqli-happens-a-conceptual-example)
+5. [Vulnerable vs. secure code (safe examples)](#5-vulnerable-vs-secure-code-safe-examples)
+6. [Top prevention techniques (practical, prioritized)](#6-top-prevention-techniques-practical-prioritized)
+7. [Detection, logging, and monitoring](#7-detection-logging-and-monitoring)
+8. [Secure coding checklist (practical, copyable)](#8-secure-coding-checklist-practical-copyable)
+9. [Responsible testing and legal/ethical notes](#9-responsible-testing-and-legalethical-notes)
+10. [Learning resources & next steps](#10-learning-resources--next-steps)
+
+---
+
+## 1. What is SQL Injection?
+
+**SQL Injection** is a class of security vulnerability in which untrusted input changes the structure or execution of a SQL query. When input is handled improperly, an attacker can cause the database to run unintended commands or reveal data.
+
+> **The core problem**: Untrusted data mixed with query structure.
+
+---
+
+## 2. Why it matters (risks & impact)
+
+SQL injection can lead to serious consequences:
+
+- 🔓 **Data leakage**: Personal or sensitive data exposure
+- 🗑️ **Data modification or deletion**: Unauthorized changes to critical information
+- 🚪 **Authentication bypass**: Circumventing login mechanisms
+- 💻 **Complete system compromise**: If database credentials or OS-level commands are exposed
+- ⚖️ **Legal, financial, and reputational consequences**: Regulatory fines and brand damage
+
+---
+
+## 3. High-level types of SQLi (conceptual)
+
+Understanding these categories helps defenders know what to look for in logs and responses:
+
+### 🎯 **In-band SQLi**
+The same channel is used both to inject and to read results (e.g., error messages).
+
+### 🔍 **Blind SQLi**
+The attacker infers information from true/false behavior or response timing when responses don't directly reveal data.
+
+### 📡 **Out-of-band SQLi**
+Uses alternate channels to retrieve results (rare and environment-dependent).
+
+---
+
+## 4. How SQLi happens: a conceptual example
+
+Imagine an application that builds a query by concatenating user input directly into SQL:
+
+```sql
+-- ❌ Conceptual (dangerous) pattern:
+query = "SELECT * FROM users WHERE username = '" + userInput + "'";
+```
+
+If `userInput` contains special characters or SQL fragments, it can change the intended query structure.
+
+> **Key idea**: The vulnerability isn't in the database; it's in how the application constructs the query from untrusted input.
+
+---
+
+## 5. Vulnerable vs. secure code (safe examples)
+
+### ❌ **Vulnerable pattern** (conceptual demonstration)
+
+```javascript
+// Do NOT use: concatenating raw user input into SQL
+const username = getRequestParam("username");
+const sql = "SELECT * FROM users WHERE username = '" + username + "'";
+db.execute(sql);
+```
+
+*This shows the pattern that causes risk, but does not provide exploit strings.*
+
+### ✅ **Secure pattern**: Parameterized queries / prepared statements
+
+Use parameters so input never changes query structure:
+
+```javascript
+// ✅ Preferred: parameterized query
+const username = getRequestParam("username");
+const sql = "SELECT * FROM users WHERE username = ?";
+db.execute(sql, [username]); // driver sends query & data separately
+```
+
+### 🛡️ **Additional secure practice**: ORM / query builders
+
+A well-maintained ORM or query builder, used correctly, typically avoids manual string building, but it still requires care (e.g., avoid raw SQL APIs unless necessary).
+
+**Examples:**
+
+```python
+# Python with SQLAlchemy ORM
+user = session.query(User).filter(User.username == username).first()
+```
+
+```javascript
+// Node.js with Sequelize ORM
+const user = await User.findOne({ where: { username: username } });
+```
+
+```java
+// Java with JPA/Hibernate
+User user = entityManager.createQuery(
+        "SELECT u FROM User u WHERE u.username = :username", User.class)
+    .setParameter("username", username)
+    .getSingleResult();
+```
+
+---
+
+## 6. Top prevention techniques (practical, prioritized)
+
+### 🥇 **Primary Defenses**
+
+1. **Use parameterized queries / prepared statements everywhere**
+   - Most effective single defense
+   - Separates code from data completely
+
+2. **Use stored procedures with parameters**
+   - Only if they use parameters and don't build SQL strings unsafely
+   - Centralized business logic
+
+3. **Use least privilege database accounts**
+   - App account should only have needed permissions
+   - Separate accounts for different application functions
+
+### 🥈 **Secondary Defenses**
+
+4. **Input validation and normalization**
+   - Enforce allowed formats (e.g., email regex, numeric ranges)
+   - Use as safety net, not primary defense
+
+5. **Proper escaping**
+   - Only when parameterization is not possible
+   - Must use the DB driver's proper escaping functions
+
+6. **Use allowlists for critical parameters**
+   - Especially for sort column names and table names, which cannot be bound as parameters
+   - Validate against known good values
+
+### 🥉 **Supporting Defenses**
+
+7. **Use ORMs/query builders carefully**
+   - Avoid raw query concatenation
+   - Understand the SQL your ORM generates
+
+8. **Centralize DB access logic**
+   - Reduce code duplication and mistakes
+   - Easier to audit and maintain
+
+9. **Web Application Firewalls (WAFs)**
+   - Additional layer, not replacement for secure code
+   - Can detect and block common patterns
+
+10. **Keep everything updated**
+   - Patch drivers, ORMs, and frameworks regularly
+   - Monitor security advisories
+
+---
+
+## 7. Detection, logging, and monitoring
+
+### 📊 **Logging Strategy**
+
+- ✅ Log failed queries and unusual DB errors
+- ⚠️ Be careful not to log sensitive data
+- 📈 Monitor query patterns and latency spikes
+- 🚨 Look for unusual application errors or stack traces
+
+### 🔍 **Monitoring Indicators**
+
+Anomalies that can indicate probing attempts include:
+- High volume of failed parameterized queries
+- Unexpected error types or frequencies
+- Unusual database access patterns
+
+### 🛠️ **Testing Tools**
+
+- **SAST (Static Application Security Testing)**: During development
+- **DAST (Dynamic Application Security Testing)**: In testing environments
+- **Database activity monitoring**: Production environments
+
+---
+
+## 8. Secure coding checklist (practical, copyable)
+
+### 🔐 **Development Checklist**
+
+- [ ] All DB queries use parameterized queries or prepared statements
+- [ ] No raw concatenation of user input into SQL strings
+- [ ] Inputs validated with allowlists for format and length
+- [ ] App DB user has minimal privileges (no DROP/ALTER unless required)
+- [ ] Error messages returned to users are generic
+- [ ] Detailed errors logged only server-side
+- [ ] Sensitive data in logs is redacted
+
+### 🚀 **Deployment Checklist**
+
+- [ ] Regular dependency and runtime patching is scheduled
+- [ ] Security tests run in CI (SAST/DAST) before production deploy
+- [ ] Production WAF or equivalent protections in place
+- [ ] Incident response plan includes data breach steps
+
+---
+
+## 9. Responsible testing and legal/ethical notes
+
+### ⚖️ **Legal Considerations**
+
+> **⚠️ Warning**: Never test for SQLi on systems you don't own or lack explicit permission to test. Unauthorized security testing is illegal in many jurisdictions.
+
+### 🧪 **Safe Testing Practices**
+
+- ✅ Use staging/test environments that mirror production
+- ✅ Follow a responsible disclosure process
+- ✅ Coordinate with application owners
+- ✅ Use automated tools for defensive remediation only
+
+### 🎯 **Recommended Testing Environments**
+
+- Local VMs designed for learning
+- Intentionally vulnerable applications you own
+- Dedicated security testing labs
+- Authorized penetration testing environments
+
+---
+
+## 10. Learning resources & next steps
+
+### 📚 **Essential Reading**
+
+- 🌐 **[OWASP Top Ten](https://owasp.org/Top10/)**: Read the Injection category and remediation guidance
+- 📋 **[OWASP SQL Injection Prevention Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/SQL_Injection_Prevention_Cheat_Sheet.html)**: Defensive best practices
+- 🔧 **Secure coding guides** for your language/framework
+
+### 💻 **Platform-Specific Resources**
+
+#### Python
+- [Python DB-API 2.0](https://peps.python.org/pep-0249/) - Parameterized queries
+- [SQLAlchemy Security](https://docs.sqlalchemy.org/en/14/core/security.html)
+
+#### Java
+- [JDBC PreparedStatement](https://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html)
+- [Hibernate ORM documentation](https://hibernate.org/orm/documentation/5.6/)
+
+#### Node.js
+- [node-postgres Parameterized Queries](https://node-postgres.com/features/queries)
+- [Sequelize Raw Queries (replacements and bind parameters)](https://sequelize.org/docs/v6/core-concepts/raw-queries/)
+
+#### PHP
+- [PDO Prepared Statements](https://www.php.net/manual/en/pdo.prepared-statements.php)
+- [mysqli Prepared Statements](https://www.php.net/manual/en/mysqli.quickstart.prepared-statements.php)
+
+### 🏃‍♂️ **Next Steps**
+
+1. **Learn to configure secure DB drivers** and connection libraries
+2. **Practice secure coding** in controlled lab environments
+3. **Set up automated security testing** in your CI/CD pipeline
+4. **Implement monitoring and alerting** for your applications
+5. **Create incident response procedures** for security events
+
+---
+
+## 🎯 Quick Takeaway
+
+> **Don't concatenate. Parameterize.** That's the shortest path to preventing SQL injection.
+
+Combine parameterization with:
+- 🔒 Least privilege
+- ✅ Input allowlists
+- 📊 Proper logging
+- 🧪 Regular testing
+
+...to build a secure application.
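As a rough illustration of the takeaway, the sketch below uses Python's standard `sqlite3` module to combine a parameterized query with an allowlist for a sort column. The `users` table, the `find_users` helper, and `ALLOWED_SORT_COLUMNS` are hypothetical names chosen for this example; they are not part of the guide, and other drivers use different placeholder styles.

```python
import sqlite3

# Hypothetical allowlist: identifiers such as column names cannot be bound
# as parameters, so they are checked against known-good values instead.
ALLOWED_SORT_COLUMNS = {"username", "created_at"}


def find_users(conn, username, sort_by="username"):
    """Look up users by name; the value is bound, the column is allowlisted."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # The ? placeholder keeps the value out of the SQL text entirely.
    sql = (
        "SELECT username, created_at FROM users "
        f"WHERE username = ? ORDER BY {sort_by}"
    )
    return conn.execute(sql, (username,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "2025-01-01"), ("O'Brien", "2025-02-01")],
)

# A name containing a quote character is handled as plain data.
rows = find_users(conn, "O'Brien")
print(rows)
```

The value travels separately from the SQL text, so a quote inside it cannot end the string literal. The sort column is interpolated only after it has matched the allowlist, which is the usual fallback for parts of a query that placeholders cannot cover.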
+
+---
+
+## 📞 Additional Resources
+
+- **OWASP**: [https://owasp.org](https://owasp.org)
+- **SANS Secure Coding**: [https://www.sans.org/white-papers/](https://www.sans.org/white-papers/)
+- **NIST Cybersecurity Framework**: [https://www.nist.gov/cyberframework](https://www.nist.gov/cyberframework)
+
+---
+
+*Last updated: October 2025*
+*This guide focuses on defensive practices and responsible security education.*

From 4d58fdc903630a48bc7d27dc54ab527c853e8f5f Mon Sep 17 00:00:00 2001
From: mihika632 <163915078+mihika632@users.noreply.github.com>
Date: Sat, 11 Oct 2025 23:09:03 +0530
Subject: [PATCH 2/3] added dml

---
 ebook/en/content/DML.md | 459 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 459 insertions(+)
 create mode 100644 ebook/en/content/DML.md

diff --git a/ebook/en/content/DML.md b/ebook/en/content/DML.md
new file mode 100644
index 0000000..79b0687
--- /dev/null
+++ b/ebook/en/content/DML.md
@@ -0,0 +1,459 @@
+---
+title: Data Manipulation Language (DML)
+summary: Comprehensive guide to SQL Data Manipulation Language - SELECT, INSERT, UPDATE, DELETE, MERGE, transactions, performance, and best practices.
+---
+
+# Data Manipulation Language (DML)
+
+Data Manipulation Language (DML) in SQL is the set of commands used to retrieve, insert, update, and delete data stored in relational databases. This chapter covers core DML statements, advanced usage patterns, transaction control, performance considerations, and common pitfalls.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [SELECT - Querying Data](#select---querying-data)
+  - [Basic SELECT](#basic-select)
+  - [Filtering: WHERE](#filtering-where)
+  - [Projection: Choosing columns](#projection-choosing-columns)
+  - [Sorting: ORDER BY](#sorting-order-by)
+  - [Limiting & Paging: LIMIT / OFFSET](#limiting--paging-limit--offset)
+  - [Aggregate Functions: GROUP BY, HAVING](#aggregate-functions-group-by-having)
+  - [Joins: INNER, LEFT, RIGHT, FULL, CROSS](#joins-inner-left-right-full-cross)
+  - [Subqueries & CTEs](#subqueries--ctes)
+  - [Window Functions](#window-functions)
+- [INSERT - Adding Data](#insert---adding-data)
+  - [Single-row INSERT](#single-row-insert)
+  - [Multi-row INSERT](#multi-row-insert)
+  - [INSERT ... SELECT](#insert--select)
+  - [Upsert / INSERT ... ON CONFLICT / REPLACE / MERGE](#upsert--insert--on-conflict--replace--merge)
+- [UPDATE - Modifying Data](#update---modifying-data)
+  - [Basic UPDATE](#basic-update)
+  - [UPDATE with JOINs](#update-with-joins)
+  - [UPDATE with CTEs](#update-with-ctes)
+  - [Performance & Safety Tips](#performance--safety-tips)
+- [DELETE - Removing Data](#delete---removing-data)
+  - [DELETE with WHERE](#delete-with-where)
+  - [DELETE with JOINs / Subqueries](#delete-with-joins--subqueries)
+  - [TRUNCATE vs DELETE](#truncate-vs-delete)
+- [MERGE - Conditional INSERT/UPDATE (SQL:2003)](#merge---conditional-insertupdate-sql2003)
+- [Transactions & Concurrency](#transactions--concurrency)
+  - [BEGIN / COMMIT / ROLLBACK](#begin--commit--rollback)
+  - [Isolation Levels](#isolation-levels)
+  - [Locking Considerations](#locking-considerations)
+- [Performance Considerations](#performance-considerations)
+  - [Indexes and DML](#indexes-and-dml)
+  - [Bulk operations & batching](#bulk-operations--batching)
+  - [Explain / Execution Plans](#explain--execution-plans)
+- [Security & Best Practices](#security--best-practices)
+- [Common Pitfalls](#common-pitfalls)
+- [Examples & Exercises](#examples--exercises)
+- [References & Further Reading](#references--further-reading)
+
+---
+
+## Overview
+
+DML is concerned with the data inside tables. The most common operations are:
+
+- `SELECT`: Read/query data
+- `INSERT`: Add new rows
+- `UPDATE`: Modify existing rows
+- `DELETE`: Remove rows
+- `MERGE`: Conditional insert or update (SQL standard; syntax varies)
+
+Unlike Data Definition Language (DDL) commands (`CREATE`, `ALTER`, `DROP`), which change schema objects, DML operates on the data itself and normally runs under transaction control.
+
+---
+
+## SELECT - Querying Data
+
+### Basic SELECT
+
+```sql
+SELECT column1, column2
+FROM table_name;
+```
+
+Select all columns:
+
+```sql
+SELECT *
+FROM employees;
+```
+
+Avoid `SELECT *` in production queries: it fetches columns you may not need and can hurt performance.
+
+### Filtering: WHERE
+
+```sql
+SELECT id, name, salary
+FROM employees
+WHERE department = 'sales' AND salary > 50000;
+```
+
+Operators: `=`, `!=`/`<>`, `<`, `>`, `<=`, `>=`, `BETWEEN`, `IN`, `LIKE`, `IS NULL`.
+
+### Projection: Choosing columns
+
+Select only necessary columns to reduce I/O and network overhead.
+
+### Sorting: ORDER BY
+
+```sql
+SELECT id, name, salary
+FROM employees
+ORDER BY salary DESC, name ASC;
+```
+
+### Limiting & Paging: LIMIT / OFFSET
+
+Postgres/MySQL:
+
+```sql
+SELECT * FROM employees
+ORDER BY id
+LIMIT 10 OFFSET 20; -- page 3 if page size is 10
+```
+
+SQL Server uses `OFFSET ... FETCH` or `TOP` for similar behavior.
+
+### Aggregate Functions: GROUP BY, HAVING
+
+```sql
+SELECT department, COUNT(*) AS cnt, AVG(salary) AS avg_sal
+FROM employees
+GROUP BY department
+HAVING AVG(salary) > 50000;
+```
+
+`HAVING` filters groups after aggregation, whereas `WHERE` filters rows before grouping; use `HAVING` for conditions on aggregates.
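To make the difference between filtering rows and filtering groups concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `employees` rows are invented sample data, and the only change from the query above is the added `ORDER BY` for a stable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "sales", 60000),
        ("Bob", "sales", 50000),
        ("Cy", "ops", 40000),
        ("Di", "ops", 45000),
        ("Eve", "eng", 90000),
    ],
)

# HAVING filters whole groups after aggregation.
having_rows = conn.execute(
    """
    SELECT department, COUNT(*) AS cnt, AVG(salary) AS avg_sal
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
    ORDER BY department
    """
).fetchall()

# WHERE filters individual rows before they are grouped.
where_rows = conn.execute(
    """
    SELECT department, COUNT(*) AS cnt, AVG(salary) AS avg_sal
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    ORDER BY department
    """
).fetchall()

print(having_rows)  # [('eng', 1, 90000.0), ('sales', 2, 55000.0)]
print(where_rows)   # [('eng', 1, 90000.0), ('sales', 1, 60000.0)]
```

With `HAVING`, the sales group keeps both of its rows and is judged on its average; with `WHERE`, Bob's row is dropped before the group is formed, so the count and average for sales change.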
+
+### Joins: INNER, LEFT, RIGHT, FULL, CROSS
+
+Inner join:
+
+```sql
+SELECT e.name, d.name AS dept
+FROM employees e
+JOIN departments d ON e.dept_id = d.id;
+```
+
+Left (outer) join:
+
+```sql
+SELECT e.name, d.name AS dept
+FROM employees e
+LEFT JOIN departments d ON e.dept_id = d.id;
+```
+
+Right and full outer joins follow the same pattern; note that not every RDBMS supports `FULL OUTER JOIN` (MySQL, for example, does not).
+
+Cross join (cartesian product):
+
+```sql
+SELECT * FROM colors CROSS JOIN sizes;
+```
+
+### Subqueries & CTEs
+
+Subquery example:
+
+```sql
+SELECT name FROM employees
+WHERE dept_id = (SELECT id FROM departments WHERE name = 'Sales');
+```
+
+CTE (Common Table Expression):
+
+```sql
+WITH top_sales AS (
+  SELECT * FROM orders WHERE amount > 10000
+)
+SELECT * FROM top_sales WHERE created_at > '2024-01-01';
+```
+
+CTEs improve readability and can be recursive.
+
+### Window Functions
+
+Window functions compute aggregates and rankings across related rows without collapsing them:
+
+```sql
+SELECT id, name, salary,
+  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
+FROM employees;
+```
+
+---
+
+## INSERT - Adding Data
+
+### Single-row INSERT
+
+```sql
+INSERT INTO employees (name, dept_id, salary)
+VALUES ('Alice', 3, 70000);
+```
+
+### Multi-row INSERT
+
+```sql
+INSERT INTO employees (name, dept_id, salary) VALUES
+('Bob', 2, 50000),
+('Carol', 1, 60000);
+```
+
+### INSERT ... SELECT
+
+Copy rows between tables:
+
+```sql
+INSERT INTO employees_archive (id, name, dept_id, salary)
+SELECT id, name, dept_id, salary
+FROM employees
+WHERE created_at < '2023-01-01';
+```
+
+### Upsert / INSERT ... ON CONFLICT / REPLACE / MERGE
+
+Postgres `ON CONFLICT`:
+
+```sql
+INSERT INTO products (id, name, qty)
+VALUES (1, 'widget', 10)
+ON CONFLICT (id) DO UPDATE SET qty = products.qty + EXCLUDED.qty;
+```
+
+MySQL `ON DUPLICATE KEY UPDATE` (the `VALUES()` function used here is deprecated as of MySQL 8.0.20 in favor of a row alias):
+
+```sql
+INSERT INTO products (id, name, qty)
+VALUES (1, 'widget', 10)
+ON DUPLICATE KEY UPDATE qty = qty + VALUES(qty);
+```
+
+SQL Server / standard `MERGE`:
+
+```sql
+MERGE INTO products AS tgt
+USING (VALUES (1, 'widget', 10)) AS src(id, name, qty)
+ON tgt.id = src.id
+WHEN MATCHED THEN UPDATE SET qty = tgt.qty + src.qty
+WHEN NOT MATCHED THEN INSERT (id, name, qty) VALUES (src.id, src.name, src.qty);
+```
+
+Use upserts carefully: consider concurrency and race conditions.
+
+---
+
+## UPDATE - Modifying Data
+
+### Basic UPDATE
+
+```sql
+UPDATE employees
+SET salary = salary * 1.05
+WHERE performance_rating = 'A';
+```
+
+**Warning**: Without a `WHERE` clause, `UPDATE` modifies every row.
+
+### UPDATE with JOINs
+
+Postgres syntax using `FROM`:
+
+```sql
+UPDATE employees e
+SET dept_name = d.name
+FROM departments d
+WHERE e.dept_id = d.id;
+```
+
+MySQL supports multi-table `UPDATE` with `JOIN`:
+
+```sql
+UPDATE employees e
+JOIN departments d ON e.dept_id = d.id
+SET e.dept_name = d.name;
+```
+
+### UPDATE with CTEs
+
+You can compute complex row sets with CTEs and update the matching rows (Postgres syntax):
+
+```sql
+WITH mgr_salaries AS (
+  SELECT id, salary FROM employees WHERE is_manager = true
+)
+UPDATE employees e
+SET salary = e.salary * 1.10 -- qualified: mgr_salaries also has a salary column
+FROM mgr_salaries m
+WHERE e.id = m.id;
+```
+
+### Performance & Safety Tips
+
+- Run a `SELECT` with the same `WHERE` clause first to confirm which rows an `UPDATE` will change.
+- Use transactions for multi-step updates.
+- Consider row-limiting updates for very large tables (batching).
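The first two tips can be combined into one small routine. The sketch below, which assumes Python's standard `sqlite3` module and an invented `employees` table, previews the affected row count with the same `WHERE` clause and then refuses to commit if the `UPDATE` touches a different number of rows. It is one possible safeguard, not a prescribed workflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, salary INTEGER, performance_rating TEXT)"
)
conn.executemany(
    "INSERT INTO employees (salary, performance_rating) VALUES (?, ?)",
    [(50000, "A"), (60000, "B"), (70000, "A")],
)
conn.commit()

# One WHERE clause and one parameter tuple, shared by the preview and the UPDATE.
where_clause = "performance_rating = ?"
params = ("A",)

# Step 1: preview how many rows the UPDATE would touch.
expected = conn.execute(
    f"SELECT COUNT(*) FROM employees WHERE {where_clause}", params
).fetchone()[0]

# Step 2: run the UPDATE in a transaction; `with conn` commits on success
# and rolls back if an exception is raised inside the block.
with conn:
    cur = conn.execute(
        # 5% raise written with integer arithmetic to keep the sample exact
        f"UPDATE employees SET salary = salary + salary / 20 WHERE {where_clause}",
        params,
    )
    if cur.rowcount != expected:
        raise RuntimeError(f"expected {expected} rows, updated {cur.rowcount}")

salaries = [row[0] for row in conn.execute("SELECT salary FROM employees ORDER BY id")]
print(expected, salaries)  # 2 [52500, 60000, 73500]
```

Sharing `where_clause` and `params` between the two statements avoids the common mistake of previewing one predicate and executing another.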
+
+---
+
+## DELETE - Removing Data
+
+### DELETE with WHERE
+
+```sql
+DELETE FROM employees WHERE created_at < '2019-01-01';
+```
+
+### DELETE with JOINs / Subqueries
+
+Postgres syntax using `USING`:
+
+```sql
+DELETE FROM employees e
+USING departments d
+WHERE e.dept_id = d.id AND d.name = 'Obsolete';
+```
+
+Or with subquery:
+
+```sql
+DELETE FROM employees WHERE dept_id IN (SELECT id FROM departments WHERE is_active = false);
+```
+
+### TRUNCATE vs DELETE
+
+- `DELETE` removes rows individually, fires row-level triggers, and can be rolled back inside a transaction.
+- `TRUNCATE` is faster and releases the table's storage; it bypasses row-level `DELETE` triggers, and whether it can be rolled back depends on the system (PostgreSQL and SQL Server allow it inside a transaction; MySQL and Oracle commit implicitly).
+
+Use `TRUNCATE` when you need an efficient table wipe and don't need per-row triggers or transactional rollback.
+
+---
+
+## MERGE - Conditional INSERT/UPDATE (SQL:2003)
+
+`MERGE` combines `INSERT` and `UPDATE` logic based on matching keys. It's handy for data warehousing and ETL tasks. Syntax varies by vendor.
+
+Basic pattern:
+
+```sql
+MERGE INTO target AS t
+USING source AS s
+ON (t.key = s.key)
+WHEN MATCHED THEN UPDATE SET ...
+WHEN NOT MATCHED THEN INSERT (...)
+;
+```
+
+Be cautious: `MERGE` has had implementation bugs in some RDBMS historically; test carefully.
+
+---
+
+## Transactions & Concurrency
+
+### BEGIN / COMMIT / ROLLBACK
+
+```sql
+BEGIN;
+UPDATE accounts SET balance = balance - 100 WHERE id = 1;
+UPDATE accounts SET balance = balance + 100 WHERE id = 2;
+COMMIT; -- or ROLLBACK;
+```
+
+Transactions ensure atomicity: either all changes in the transaction commit or none do.
+
+### Isolation Levels
+
+Common levels:
+- `READ UNCOMMITTED`: dirty reads allowed
+- `READ COMMITTED`: no dirty reads; only committed data is visible
+- `REPEATABLE READ`: rows already read return the same values when read again within the transaction
+- `SERIALIZABLE`: highest isolation; transactions behave as if run one at a time, which may reduce concurrency
+
+Choose the level that balances correctness and performance for your workload.
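The atomicity described under BEGIN / COMMIT / ROLLBACK can be tried out with Python's standard `sqlite3` module. In this minimal sketch the `accounts` table, its `CHECK` constraint, and the `transfer` and `balances` helpers are assumptions made for the demonstration; the credit is deliberately applied before the debit so that a failed debit has something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft fail at the database level.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates commit or neither does."""
    with conn:  # COMMIT on success, ROLLBACK if an exception escapes the block
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )


def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


transfer(conn, src=1, dst=2, amount=30)
after_ok = balances(conn)

try:
    transfer(conn, src=1, dst=2, amount=500)  # debit would violate the CHECK
except sqlite3.IntegrityError:
    pass  # the credit to account 2 was rolled back together with the debit
after_failed = balances(conn)

print(after_ok, after_failed)  # {1: 70, 2: 80} {1: 70, 2: 80}
```

The second transfer leaves both balances untouched even though its first `UPDATE` succeeded, which is the all-or-nothing behavior a transaction is meant to provide.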
+
+### Locking Considerations
+
+- Row-level locks are common; table locks may be taken for some operations.
+- Long-running transactions increase lock contention and risk deadlocks.
+- Use short transactions and batch updates to minimize locks.
+
+---
+
+## Performance Considerations
+
+### Indexes and DML
+
+- Indexes speed up `SELECT` but slow `INSERT`/`UPDATE`/`DELETE` because indexes must be maintained.
+- For heavy write workloads, consider minimizing indexes or using bulk-loading techniques.
+
+### Bulk operations & batching
+
+- Insert/update in batches (e.g., 1,000 rows at a time) to reduce transaction overhead.
+- Use native bulk loaders (e.g., `COPY` in Postgres, `LOAD DATA INFILE` in MySQL).
+
+### Explain / Execution Plans
+
+- Use `EXPLAIN`/`EXPLAIN ANALYZE` to investigate query performance.
+- Check for sequential scans, index usage, join algorithms, and estimated vs actual rows.
+
+---
+
+## Security & Best Practices
+
+- Use parameterized queries / prepared statements to prevent SQL injection.
+- Principle of least privilege for DB users.
+- Avoid constructing SQL with string concatenation of user input.
+- Sanitize and validate inputs; however, do not rely on input validation alone for security.
+
+---
+
+## Common Pitfalls
+
+- Running `UPDATE`/`DELETE` without `WHERE`.
+- Using `SELECT *` in production queries.
+- Neglecting transaction boundaries.
+- Over-indexing write-heavy tables.
+- Misusing `MERGE` without understanding concurrency semantics.
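As a small illustration of the advice under Explain / Execution Plans, the sketch below compares a query plan before and after adding an index. It assumes Python's standard `sqlite3` module and uses SQLite's `EXPLAIN QUERY PLAN`, which differs from the `EXPLAIN`/`EXPLAIN ANALYZE` output of Postgres or MySQL; the table, the `plan` helper, and the index name `idx_employees_dept` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)"
)

QUERY = "SELECT name FROM employees WHERE dept_id = ?"


def plan(conn, sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail


plan_before = plan(conn, QUERY, (3,))
conn.execute("CREATE INDEX idx_employees_dept ON employees (dept_id)")
plan_after = plan(conn, QUERY, (3,))

print(plan_before)  # a full-table SCAN (exact wording varies by SQLite version)
print(plan_after)   # a SEARCH that names idx_employees_dept
```

The same index that turns the scan into a search also has to be maintained on every `INSERT`, `UPDATE`, and `DELETE`, which is the trade-off noted under Indexes and DML.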
+
+---
+
+## Examples & Exercises
+
+### Example 1: Find top 3 earning employees per department (Postgres `DISTINCT ON` vs Window)
+
+```sql
+-- Not a solution: DISTINCT ON keeps one row per department (the top earner),
+-- and LIMIT 3 then caps the whole result at three rows.
+SELECT DISTINCT ON (department) department, id, name, salary
+FROM employees
+ORDER BY department, salary DESC
+LIMIT 3;
+
+-- Correct using window functions
+SELECT department, id, name, salary
+FROM (
+  SELECT department, id, name, salary,
+    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
+  FROM employees
+) t
+WHERE rn <= 3;
+```
+
+### Exercise 1:
+
+Write an `UPDATE` statement to increase salaries by 10% for employees with `performance_rating = 'A'` but only for departments where average salary is below 70000.
+
+### Exercise 2:
+
+Using `MERGE` or `ON CONFLICT`, write a statement to update product stock when shipments arrive; insert new product row if not present.
+
+---
+
+## References & Further Reading
+
+- PostgreSQL Documentation, DML: https://www.postgresql.org/docs/current/dml.html
+- MySQL Documentation, DML: https://dev.mysql.com/doc/refman/en/data-manipulation.html
+- SQL Standard (ISO/IEC 9075), MERGE
+- Use `EXPLAIN` and `EXPLAIN ANALYZE` for query tuning
+
+---
+
+*Last updated: October 2025*

From 8eb391f2d1d13e8735eab2fc4f2c27f6ec9e63d5 Mon Sep 17 00:00:00 2001
From: mihika632 <163915078+mihika632@users.noreply.github.com>
Date: Sat, 11 Oct 2025 23:18:10 +0530
Subject: [PATCH 3/3] added sql null functions guide for beginners

---
 ebook/en/content/sql-null-functions.md | 231 +++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 ebook/en/content/sql-null-functions.md

diff --git a/ebook/en/content/sql-null-functions.md b/ebook/en/content/sql-null-functions.md
new file mode 100644
index 0000000..ea2476d
--- /dev/null
+++ b/ebook/en/content/sql-null-functions.md
@@ -0,0 +1,231 @@
+## SQL NULLs and related functions
+
+This document explains SQL NULL semantics and the most useful functions/operators for handling NULL values across common SQL dialects (PostgreSQL, MySQL, SQL Server, Oracle). It includes clear examples, common pitfalls, performance notes, and practice exercises.
+
+### What is NULL?
+
+NULL represents the absence of a value: unknown or not applicable. It is not the same as an empty string `''`, zero `0`, or a boolean `FALSE`.
+
+- NULL means "unknown"; comparing anything with NULL using the normal comparison operators yields `NULL`, so a `WHERE` clause filters the row out just as it would for `FALSE`.
+- SQL uses three-valued logic: `TRUE`, `FALSE`, and `UNKNOWN` (the latter arises from expressions involving NULL).
+
+Example:
+
+```sql
+SELECT NULL = NULL AS equal, NULL IS NULL AS is_null;
+-- equal => NULL (UNKNOWN), is_null => true
+```
+
+### Checking for NULL: `IS NULL` / `IS NOT NULL`
+
+Use `IS NULL` and `IS NOT NULL` to test for NULL explicitly. Do not use `= NULL` or `<> NULL`.
+
+```sql
+SELECT * FROM users WHERE last_login IS NULL; -- users who never logged in
+SELECT * FROM users WHERE email IS NOT NULL; -- users who provided email
+```
+
+### `COALESCE`: choose the first non-NULL value
+
+`COALESCE(expr1, expr2, ..., exprN)` returns the first non-NULL expression from the argument list.
+
+Use cases:
+- Provide default values when columns may be NULL.
+- Replace chained `CASE` expressions.
+
+Examples:
+
+```sql
+SELECT COALESCE(phone_mobile, phone_home, phone_work, 'no-phone') AS contact
+FROM contacts;
+
+-- Using COALESCE in aggregation contexts
+SELECT department_id, COALESCE(SUM(bonus), 0) AS total_bonus
+FROM employees
+GROUP BY department_id;
+```
+
+Notes:
+- In most DBMSs `COALESCE` stops evaluating once it finds a non-NULL argument, so later arguments with side effects (e.g., volatile functions) may never run.
+
+### `NULLIF`: return NULL when two values are equal
+
+`NULLIF(expr1, expr2)` returns `NULL` if `expr1 = expr2`, otherwise returns `expr1`.
+
+Use cases:
+- Convert sentinel values to NULL (e.g., `-1` or empty string indicating missing value).
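The behaviors described so far can be checked quickly with Python's standard `sqlite3` module. This is a sketch against SQLite only, where booleans come back as `1`/`0` and NULL as Python `None`; the `users` table is invented sample data, and other dialects may display results differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT NULL = NULL,                 -- comparison with NULL is UNKNOWN
           NULL IS NULL,                -- IS NULL gives a definite answer
           COALESCE(NULL, NULL, 'x'),   -- first non-NULL argument
           NULLIF(0, 0),                -- equal arguments become NULL
           NULLIF(5, 0)                 -- otherwise the first argument
    """
).fetchone()
print(row)  # (None, 1, 'x', None, 5)

conn.execute("CREATE TABLE users (id INTEGER, last_login TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, None), (2, "2025-01-01")])

# `= NULL` never matches; `IS NULL` finds the row with the missing value.
eq_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE last_login = NULL"
).fetchone()[0]
is_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE last_login IS NULL"
).fetchone()[0]
print(eq_count, is_count)  # 0 1
```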
+
+Example:
+
+```sql
+SELECT NULLIF(salary, 0) AS salary_or_null FROM payroll;
+-- If salary equals 0, returns NULL; otherwise returns salary.
+```
+
+### `IS DISTINCT FROM` / `IS NOT DISTINCT FROM` (null-safe equality)
+
+Some SQL dialects (Postgres, newer SQLite, and others) support `IS DISTINCT FROM` and `IS NOT DISTINCT FROM`. These perform null-safe comparisons where NULL is treated as a comparable value.
+
+```sql
+-- null-safe equality: true when both are null or when they are equal
+SELECT a, b FROM t WHERE a IS NOT DISTINCT FROM b;
+```
+
+Note: MySQL has the `<=>` operator for null-safe equality (e.g., `a <=> b`). SQL Server supports `IS [NOT] DISTINCT FROM` only from SQL Server 2022 onward; earlier versions need explicit `IS NULL` checks.
+
+### Dialect differences and functions
+
+- PostgreSQL: full support for `COALESCE`, `NULLIF`, `IS DISTINCT FROM`, `IS NOT DISTINCT FROM`.
+- MySQL: supports `COALESCE`, `NULLIF`, and the null-safe equality operator `<=>` (e.g., `a <=> b`).
+- SQL Server (T-SQL): supports `COALESCE`, `NULLIF`, and the T-SQL-specific `ISNULL(expr, replacement)`, which returns the replacement value if the expression is NULL. `ISNULL` takes exactly two arguments and returns the type of the first, while `COALESCE` applies data type precedence across all of its arguments.
+- Oracle: supports `COALESCE` and `NULLIF`, plus `NVL(expr, replacement)` (similar to `ISNULL`). `NVL` returns `replacement` when `expr` is NULL.
+
+Comparison notes between `COALESCE` and dialect-specific helpers:
+
+- `COALESCE` is SQL-standard and preferable for portability.
+- `ISNULL` (SQL Server) and `NVL` (Oracle) are vendor-specific and sometimes slightly different in type coercion rules and evaluation order.
+
+### Aggregates and NULLs
+
+- Aggregation functions (`SUM`, `AVG`, `COUNT`, `MAX`, `MIN`) ignore NULL values; the exception is `COUNT(*)`, which counts rows.
+
+Examples:
+
+```sql
+SELECT SUM(amount) FROM payments; -- ignores NULL amounts
+SELECT COUNT(amount) FROM payments; -- counts only rows where amount is not NULL
+SELECT COUNT(*) FROM payments; -- counts all rows
+```
+
+If you want an aggregate to report zero instead of NULL, wrap it as `COALESCE(SUM(x), 0)`, which also covers empty or all-NULL input. `SUM(COALESCE(x, 0))` treats NULL rows as zero but still returns NULL when there are no rows at all.
+
+### NULLs in ORDER BY and GROUP BY
+
+- Ordering: database systems differ in where NULLs sort by default (PostgreSQL and Oracle place them last for `ASC`; MySQL, SQL Server, and SQLite place them first), so state `NULLS FIRST` / `NULLS LAST` explicitly where the dialect supports it.
+
+```sql
+SELECT id, score FROM leaderboard ORDER BY score DESC NULLS LAST;
+```
+
+- GROUP BY: `GROUP BY` treats NULLs as equal, so all rows with a NULL key fall into a single group.
+
+### Handling NULLs in JOINs
+
+- LEFT/RIGHT/INNER JOIN behavior is not changed by NULLs directly, but matching on columns that can be NULL requires null-safe comparisons or `IS NULL` checks.
+
+Example problem: find pairs where a.value = b.value but treat NULL=NULL as a match.
+
+Postgres solution using `IS NOT DISTINCT FROM`:
+
+```sql
+SELECT a.id, b.id
+FROM a JOIN b ON a.value IS NOT DISTINCT FROM b.value;
+```
+
+MySQL solution using `<=>`:
+
+```sql
+SELECT a.id, b.id
+FROM a JOIN b ON a.value <=> b.value;
+```
+
+SQL Server workaround using explicit `IS NULL` checks:
+
+```sql
+SELECT a.id, b.id
+FROM a JOIN b
+  ON (a.value = b.value) OR (a.value IS NULL AND b.value IS NULL);
+```
+
+### Null propagation and expressions
+
+- Arithmetic or string expressions where one operand is NULL yield NULL (e.g., `1 + NULL` → NULL).
+- Use `COALESCE` or explicit `IS NULL` checks to provide defaults.
+
+Example:
+
+```sql
+SELECT price * COALESCE(quantity, 0) AS total
+FROM sales;
+```
+
+### Performance considerations
+
+- Index usage: testing for `IS NULL` can use an index in many DBMSs, but complex expressions with `COALESCE` or function-wrapped columns may prevent index usage (depends on DBMS and expression). Use generated/functional indexes if needed.
+- Avoid wrapping indexed columns in functions in WHERE/JOIN predicates if you want to keep index seeks.
+- Statistics and histograms might treat NULLs specially; be aware of the distribution of NULLs when planning queries.
+
+### Best practices
+
+1. Prefer `IS NULL` / `IS NOT NULL` for explicit null checks.
+2. Use `COALESCE` for portable defaulting; use `ISNULL` / `NVL` only when targeting specific DBs.
+3. Avoid relying on `= NULL` or `<> NULL` (they don't behave as expected).
+4. When comparing nullable columns for equality in JOINs, prefer null-safe comparisons (`IS NOT DISTINCT FROM` or `<=>`), or add explicit `IS NULL` checks.
+5. Be explicit about NULL ordering with `NULLS FIRST` / `NULLS LAST` if sorting matters.
+
+### Examples and recipes
+
+1) Replace missing text with a placeholder:
+
+```sql
+SELECT id, COALESCE(notes, '[no notes]') AS notes
+FROM tasks;
+```
+
+2) Safe division (avoid division by NULL or zero); `a / NULLIF(b, 0)` is an equivalent shorter form:
+
+```sql
+SELECT a, b,
+  CASE
+    WHEN COALESCE(b, 0) = 0 THEN NULL
+    ELSE a / b
+  END AS ratio
+FROM t;
+```
+
+3) Deduplicate rows treating NULLs as equal using `DISTINCT ON` (Postgres example):
+
+```sql
+SELECT DISTINCT ON (col1, col2) *
+FROM items
+ORDER BY col1, col2, created_at DESC;
+-- Note: DISTINCT / GROUP BY treat NULLs as equal for grouping
+```
+
+4) Convert sentinel values to NULL then aggregate:
+
+```sql
+SELECT customer_id, SUM(COALESCE(NULLIF(points, -1), 0)) AS points_sum
+FROM loyalty
+GROUP BY customer_id;
+```
+
+### Common pitfalls & gotchas
+
+- Using `=` or `<>` with NULL yields `UNKNOWN` (treated as false in WHERE): `WHERE x = NULL` won't match anything.
+- `COALESCE` returns the first non-NULL value and uses the type precedence rules of the DBMS to determine the result type.
+- Vendor functions (`ISNULL`, `NVL`) may have different type coercion rules or evaluation semantics.
+
+### Exercises
+
+1. Given a `products` table with columns `(id, price, discount)`, write a query that returns the effective price using `discount` when present, otherwise `price`. If both `price` and `discount` are NULL, return 0.
+
+2. You have two tables `a(id, value)` and `b(id, value)`. Write a query that returns pairs of ids where the values are equal or both NULL.
+
+3. Given a `sales` table `(id, amount, discount)` where `discount` is NULL when no discount applies, write a query to compute total revenue treating NULL discount as 0 and excluding rows where `amount` is NULL.
+
+4. (Advanced) Explain why `COALESCE(f(), g())` might call only `f()` in some DBMS and how to ensure both are safe to call.
+
+### References
+
+- SQL standard: COALESCE, NULLIF
+- PostgreSQL docs: NULL handling and `IS DISTINCT FROM`
+- MySQL docs: `<=>` operator and NULL handling
+- SQL Server docs: `ISNULL`, `NULL` semantics