Last updated on

SQL

SQL 是 Structured Query Language 的简称

💾 Relational Database Management Systems (RDBMS) Basics

Relational Database Management Systems (RDBMS) are a primary method for storing application data, providing a structure for persistent storage on disk to ensure data integrity even if the computer crashes (unlike data stored in RAM).

The main motivation for using RDBMS is to organize very large amounts of data efficiently for reading and writing.

🌳 B+ Tree Data Structure

RDBMS typically uses a B+ Tree data structure for efficient indexing and storage:

  • M-way Tree: Unlike a binary tree where each node has two children, a B+ Tree is an M-way tree, where each node can have $M$ children. This reduces the height of the tree, which minimizes the number of disk reads/writes needed to find data.
  • Keys and Data:
    • Non-Leaf Nodes (Internal Nodes) store keys used only for navigation to find the correct data path.
    • Leaf Nodes store all the actual data records.
  • Sequential Access: All leaf nodes are linked together in a sorted (sequential) fashion. This is a key feature, allowing for highly efficient range queries (retrieving a range of values) in addition to point lookups.

🔑 Indexes

An Index is a key value chosen to sort the data within the B+ Tree. The purpose of an index is to enable finding and reading data as quickly as possible. For example, in a phone book table, the name column would be the logical index for searching alphabetically.


🏛️ RDBMS Core Concepts (Tables and Constraints)

Relational databases organize data into tables, which represent a collection of related data and adhere to a strict structure defined by a schema.

📝 Table Structure and Schema

  • Schema: Defines the structure of the table, specifying the name and data type (e.g., INT, VARCHAR(100)) for every field (column).
  • Records/Rows: Each instance of data that conforms to the schema is called a row or record.

🔗 Relationships and Constraints

RDBMS excels at enforcing data integrity and relationships between tables using various constraints:

  • Primary Key: A column (or set of columns) that uniquely identifies every single row in a table. It is mandatory for every table.
  • Foreign Key Constraint: Establishes a relationship where a key value in one table must correspond to an existing key value in another table (e.g., a phone_number in the homes table must exist in the people table). This enforces referential integrity.
  • Other Constraints: Rules like NOT NULL (a column must always have a value) or UNIQUE (values in a column must be unique across all rows).

🤝 Joins

The relationships between tables allow for Joins, which combine rows from two or more tables based on a related column (like matching a foreign key). This enables powerful querying, such as finding a person’s name by looking up their phone number in a separate homes table.


🛡️ Trade-offs: The ACID Properties

The system design trade-offs of relational databases are best summarized by the ACID properties, which are the fundamental guarantee of data integrity in RDBMS.

PropertyFull NameDescriptionKey Concept
AAtomicityA database transaction (a collection of one or more operations) is treated as a single, indivisible unit: All or Nothing. If any part of the transaction fails, the entire transaction is rolled back (none of it is applied).Prevents partial updates (e.g., money leaving one account but never arriving at the destination).
CConsistencyEnsures that any transaction brings the database from one valid state to another. All operations must obey the predefined constraints (like NOT NULL or Foreign Key rules).Guarantees that the data always follows the structural and relational rules set by the schema.
IIsolationMultiple concurrent transactions must appear to the user as if they were executed serially (one after the other). The intermediate state of one transaction is invisible to others.Prevents concurrency problems like Dirty Reads (reading data that hasn’t been committed yet), ensuring correct results when multiple users are modifying data.
DDurabilityOnce a transaction has been successfully committed, the changes are permanent and must persist even in the event of a system failure (e.g., power loss).This is why RDBMS must store data on disk, as data in RAM is volatile.

🔄 Transactions

A Transaction is a sequence of SQL operations that begins with a BEGIN statement and concludes with a COMMIT statement (for success) or a ROLLBACK (for failure). Only committed transactions are made permanent (Durable).

  • Trade-Off: Maintaining ACID compliance—especially Isolation and Consistency—is expensive in terms of computation and overhead, sometimes requiring sacrificing a degree of speed to ensure data integrity. This is the core trade-off when compared to non-ACID (NoSQL) databases.