Database technologies
Agenda
DBMS vs RDBMS
MySQL: Introduction, Installation.
SQL
CREATE TABLE, MySQL data types
SELECT with LIMIT, ORDER, WHERE, GROUP BY, HAVING
INSERT, UPDATE, DELETE
Joins, Sub-queries
Transaction & Locking
GRANT & REVOKE
MySQL programming (PSM)
Stored procedure
Cursors
Functions
Triggers
DBMS
Any enterprise application need to manage data.
In early days of software development, programmers store data into files and does operation on it. However data is highly application specific.
Even today many software manage their data in custom formats e.g. Tally, Address book, etc.
As data management became more common, DBMS systems were developed to handle the data. This enabled developers to focus on the business logic e.g. FoxPro, DBase, Excel, etc.
At least CRUD (Create, Retrieve, Update and Delete) operations are supported by all databases.
Traditional databases are file based, less secure, single-user, non- distributed, manage less amount of data (MB), complicated relation management, file-locking and need number of lines of code to use in applications.
RDBMS
RDBMS is relational DBMS.
It organizes data into Tables, rows and columns. The tables are related to each other.
RDBMS follow table structure, more secure, multi-user, server-client architecture, server side processing, clustering support, manage huge data (TB), built-in relational capabilities, table-locking or row-locking and can be easily integrated with applications.
e.g. DB2, Oracle, MS-SQL, MySQL, MS-Access, SQLite, …
RDBMS design is based on Codd’s rules developed at IBM (in 1970).
1_rdbms.png
SQL
Clients send SQL queries to RDBMS server and operations are performed accordingly.
Originally it was named as RQBE (Relational Query By Example).
SQL is ANSI standardised in 1987 and then revised multiple times adding new features. Recent revision in 2016.
SQL is case insensitive.
There are five major categories:
DDL: Data Definition Language e.g. CREATE, ALTER, DROP, RENAME.
DML: Data Manipulation Language e.g. INSERT, UPDATE, DELETE.
DQL: Data Query Language e.g. SELECT.
DCL: Data Control Language e.g. CREATE USER, GRANT, REVOKE.
TCL: Transaction Control Language e.g. SAVEPOINT, COMMIT, ROLLBACK.
Table & column names allows alphabets, digits & few special symbols.
If name contains special symbols then it should be back-quotes.
e.g. Tbl1,
T1#
,T2$
etc. Names can be max 30 chars long.
MySQL
Developed by Michael Widenius in 1995. It is named after his daughter name Myia.
Sun Microsystems acquired MySQL in 2008.
Oracle acquired Sun Microsystem in 2010.
MySQL is free and open-source database under GPL. However some enterprise modules are close sourced and available only under commercial version of MySQL.
MariaDB is completely open-source clone of MySQL.
MySQL support multiple database storage and processing engines.
MySQL versions:
< 5.5: MyISAM storage engine
5.5: InnoDb storage engine
5.6: SQL Query optimizer improved, memcached style NoSQL
5.7: Windowing functions, JSON data type added for flexible schema
8.0: CTE, NoSQL document store.
MySQL is database of year 2019 (in database engine ranking).
Getting started
root login can be used to perform CRUD as well as admin operations.
It is recommended to create users for performing non-admin tasks.
mysql> CREATE DATABASE db;
mysql> SHOW DATABASES;
mysql> CREATE USER dbuser@localhost IDENTIFIED BY 'dbpass';
mysql> SELECT user, host FROM mysql.user;
mysql> GRANT ALL PRIVILEGES ON db.* TO dbuser@localhost;
mysql> FLUSH PRIVILEGES;
mysql> EXIT;
terminal> mysql –u dbuser –pdbpass
mysql> SHOW DATABASES;
mysql> SELECT USER(), DATABASE();
mysql> USE db;
mysql> SHOW TABLES;
mysql> CREATE TABLE student(id INT, name VARCHAR(20), marks DOUBLE);
mysql> INSERT INTO student VALUE
mysql> SELECT * FROM student;
Database logical layout
Database/schema is like a namespace/container that stores all db objects related to a project.
It contains tables, constraints, relations, stored procedures, functions, triggers, ...
There are some system databases e.g. mysql, performance_schema, information_schema, sys, ... They contains db internal/system information.
e.g. SELECT user, host FROM mysql.user;
A database contains one or more tables.
Tables have multiple columns.
Each column is associated with a data-type.
Columns may have zero or more constraints.
The data in table is in multiple rows.
Each row have multiple values (as per columns).
Database physical layout
In MySQL, the data is stored on disk in its data directory i.e. /var/lib/mysql
Each database/schema is a separate sub-directory in data dir.
Each table in the db, is a file on disk.
e.g. student table in current db is stored in file /var/lib/mysql/db/student.ibd.
Data is stored in binary format.
A file may not be contiguously stored on hard disk.
Data rows are not contiguous. They are scattered in the hard disk.
In one row, all fields are consecutive.
When records are selected, they are selected in any order.
MySQL data types
RDBMS have similar data types (but not same).
MySQL data types can be categorised as follows
Numeric types (Integers)
TINYINT (1 byte), SMALLINT (2 byte), MEDIUMINT (3 byte), INT (4 byte), BIGINT (8 byte), BIT(n bits)
integer types can signed (default) or unsigned.
Numeric types (Floating point)
approx. precision – FLOAT (4 byte), DOUBLE (8 byte) | DECIMAL(m, n) – exact precision
Date/Time types
DATE, TIME, DATETIME, TIMESTAMP, YEAR
String types – size = number of chars * size of char
CHAR(1-255) – Fixed length, Very fast access.
VARCHAR(1-65535) – Variable length, Stores length + chars.
TINYTEXT (255), TEXT (64K), MEDIUMTEXT (16M), LONGTEXT (4G) – Variable length, Slower access.
Binary types – size = number of bytes
BINARY, VARBINARY, TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB
Miscellaneous types
ENUM, SET
CHAR vs VARCHAR vs TEXT
CHAR
Fixed inline storage.
If smaller data is given, rest of space is unused.
Very fast access.
VARCHAR
Variable inline storage.
Stores length and characters.
Slower access than CHAR.
TEXT
Variable external storage.
Very slow access.
Not ideal for indexing.
CREATE TABLE temp(c1 CHAR(4), c2 VARCHAR(4), c3 TEXT(4));
DESC temp;
INSERT INTO temp VALUES('abcd', 'abcd', 'abcdef');
INSERT – DML
Insert a new row (all columns, fixed order).
INSERT INTO table VALUES (v1, v2, v3);
Insert a new row (specific columns, arbitrary order).
INSERT INTO table(c3, c1, c2) VALUES (v3, v1, v2);
INSERT INTO table(c1, c2) VALUES (v1, v2);
Missing columns data is NULL.
NULL is special value and it is not stored in database.
Insert multiple rows.
INSERT INTO table VALUES (av1, av2, av3), (bv1, bv2, bv3), (cv1, cv2, cv3).
Insert rows from another table.
INSERT INTO table SELECT c1, c2, c3 FROM another-table;
INSERT INTO table (c1,c2) SELECT c1, c2 FROM another-table;
SQL scripts
SQL script is multiple SQL queries written into a .sql file.
SQL scripts are mainly used while database backup and restore operations.
SQL scripts can be executed from terminal as:
terminal> mysql –u user –ppassword db < /path/to/sqlfile
SQL scripts can be executed from command line as:
mysql> SOURCE /path/to/sqlfile
Note that SOURCE is MySQL CLI client command.
It reads commands one by one from the script and execute them on server
SELECT – DQL
Select all columns (in fixed order).
SELECT * FROM table;
Select specific columns / in arbitrary order.
SELECT c1, c2, c3 FROM table;
Column alias
SELECT c1 AS col1, c2 col2 FROM table;
Computed columns.
SELECT c1, c2, c3, expr1, expr2 FROM table;
SELECT c1, CASE WHEN condition1 THEN value1, WHEN condition2 THEN value2, … ELSE valuen END FROM table;
Distinct values in column.
SELECT DISTINCT c1 FROM table;
SELECT DISTINCT c1, c2 FROM table;
Select limited rows.
SELECT * FROM table LIMIT n;
SELECT * FROM table LIMIT m, n;
SELECT – DQL – ORDER BY
In db rows are scattered on disk. Hence may not be fetched in a fixed order.
Select rows in asc order.
SELECT * FROM table ORDER BY c1;
SELECT * FROM table ORDER BY c2 ASC;
Select rows in desc order.
SELECT * FROM table ORDER BY c3 DESC;
Select rows sorted on multiple columns.
SELECT * FROM table ORDER BY c1, c2;
SELECT * FROM table ORDER BY c1 ASC, c2 DESC;
SELECT * FROM table ORDER BY c1 DESC, c2 DESC;
Select top or bottom n rows.
SELECT * FROM table ORDER BY c1 ASC LIMIT n;
SELECT * FROM table ORDER BY c1 DESC LIMIT n;
SELECT * FROM table ORDER BY c1 ASC LIMIT m, n;
SELECT – DQL – WHERE
It is always good idea to fetch only required rows (to reduce network traffic).
The WHERE clause is used to specify the condition, which records to be fetched.
Relational operators
<, >, <=, >=, =, != or <>
NULL related operators
NULL is special value and cannot be compared using relational operators.
IS NULL or <=>, IS NOT NULL.
Logical operators
AND, OR, NOT
AND, OR, NOT
BETWEEN operator (include both ends)
c1 BETWEEN val1 AND val2
IN operator (equality check with multiple values)
c1 IN (val1, val2, val3)
LIKE operator (similar strings)
c1 LIKE ‘pattern’.
% represent any number of any characters.
_ represent any single character.
UPDATE – DML
To change one or more rows in a table.
Update row(s) single column.
UPDATE table SET c2=new-value WHERE c1=some-value;
Update multiple columns.
UPDATE table SET c2=new-value, c3=new-value WHERE c1=some-value;
Update all rows single column.
UPDATE table SET c2=new-value;
DELETE – DML vs TRUNCATE – DDL vs DROP – DDL
DELETE
To delete one or more rows in a table.
Delete row(s)
DELETE FROM table WHERE c1=value;
Delete all rows
DELETE FROM table
TRUNCATE
Delete all rows.
TRUNCATE TABLE table;
Truncate is faster than DELETE.
DROP
Delete all rows as well as table structure.
DROP TABLE table;
DROP TABLE table IF EXISTS;
Delete database/schema.
DROP DATABASE db;
Seeking HELP
HELP is client command to seek help on commands/functions.
HELP SELECT;
HELP Functions;
HELP SIGN;
DUAL table
A dummy/in-memory a table having single row & single column.
It is used for arbitrary calculations, testing functions, etc.
SELECT 2 + 3 * 4 FROM DUAL;
SELECT NOW() FROM DUAL;
SELECT USER(), DATABASE() FROM DUAL;
In MySQL, DUAL keyword is optional.
SELECT 2 + 3 * 4;
SELECT NOW();
SELECT USER(), DATABASE();
SQL functions
RDBMS provides many built-in functions to process the data.
These functions can be classified as:
Single row functions
One row input produce one row output.
e.g. ABS(), CONCAT(), IFNULL(), …
Multi-row or Group functions
Values from multiple rows are aggregated to single value.
e.g. SUM(), MIN(), MAX(), …
These functions can also be categorized based on data types or usage.
Numeric functions
String functions
Date and Time functions
Control flow functions
Information functions
Miscellaneous functions
Numeric & String functions
ABS()
POWER()
ROUND(), FLOOR(), CEIL()
ASCII(), CHAR()
CONCAT()
SUBSTRING()
LOWER(), UPPER()
TRIM(), LTRIM(), RTRIM()
LPAD(), RPAD()
REGEXP_LIKE()
Date-Time and Information functions
VERSION()
USER(), DATABASE()
MySQL supports multiple date time related data types
DATE (3), TIME (3), DATETIME (5), TIMESTAMP (4), YEAR (1)
SYSDATE(), NOW()
DATE(), TIME()
DAYOFMONTH(), MONTH(), YEAR(), HOUR(), MINUTE(), SECOND(), …
DATEDIFF(), DATE_ADD(), TIMEDIFF()
MAKEDATE(), MAKETIME()
Control and NULL and List functions
NULL is special value in RDBMS that represents absence of value in that column.
NULL values do not work with relational operators and need to use special operators.
Most of functions return NULL if NULL value is passed as one of its argument.
ISNULL()
IFNULL()
NULLIF()
COALESCE()
GREATEST(), LEAST()
IF(condition, true-value, false-value)
Group functions
Work on group of rows of table.
Input to function is data from multiple rows & then output is single row. Hence these functions are called as "Multi Row Function“ or "Group Functions“.
These functions are used to perform aggregate ops like sum, avg, max, min, count or std dev, etc. Hence these fns are also called as "Aggregate Functions".
Example: SUM(), AVG(), MAX(), MIN(), COUNT().
NULL values are ignored by group functions.
Limitations of GROUP functions:
Cannot select group function along with a column.
Cannot select group function along with a single row fn.
Cannot use group function in WHERE clause/condition.
Cannot nest a group function in another group fn.
GROUP BY clause
GROUP BY is used for analysis of data i.e. generating reports & charts.
When GROUP BY single column, generated output can be used to plot 2-D chart. When GROUP BY two column, generated output can be used toplot 3-D chart and so on.
GROUP BY queries are also called as Multi-dimensional / Spatial queries.
Syntactical Characteristics:
If a column is used for GROUP BY, then it may or may not be used in SELECT clause.
If a column is in SELECT, it must be in GROUP BY.
When GROUP BY query is fired on database server, it does following:
Load data from server disk into server RAM.
Sort data on group by columns.
Group similar records by group columns.
Perform given aggregate ops on each column.
Send result to client.
# Transaction
Transaction is set of DML queries executed as a single unit.
Transaction examples
accounts table [id, type, balance]
UPDATE accounts SET balance=balance-1000 WHERE id = 1;
UPDATE accounts SET balance=balance+1000 WHERE id = 2;
RDBMS transaction have ACID properties.
Atomicity
All queries are executed as a single unit. If any query is failed, other queries are discarded.
Consistency
When transaction is completed, all clients see the same data.
Isolation
Multiple transactions (by same or multiple clients) are processed concurrently.
Durable
When transaction is completed, all data is saved on disk.
Transaction management
START TRANSACTION;
COMMIT WORK;
START TRANSACTION;
ROLLBACK WORK;
In MySQL autocommit variable is by default 1. So each DML command is auto- committed into database.
SELECT @@autocommit;
Changing autocommit to 0, will create new transaction immediately after current transaction is completed. This setting can be made permanent in config file.
SET autocommit=0;
Save-point is state of database tables (data) at the moment (within a transaction).
It is advised to create save-points at end of each logical section of work.
Database user may choose to rollback to any of the save-point.
Transaction management with Save-points
Commit always commit the whole transaction.
ROLLBACK or COMMIT clears all save-points.
Transaction is set of DML statements.
If any DDL statement is executed, current transaction is automatically committed.
Any power failure, system or network failure automatically rollback current state.
Transactions are isolated from each other and are consistent.
Row locking
When an user update or delete a row (within a transaction), that row is locked and becomes read-only for other users.
The other users see old row values, until transaction is committed by first user.
If other users try to modify or delete such locked row, their transaction processing is blocked until row is unlocked.
Other users can INSERT into that table. Also they can UPDATE or DELETE other rows.
The locks are automatically released when COMMIT/ROLLBACK is done by the user.
This whole process is done automatically in MySQL. It is called as "OPTIMISTIC LOCKING".
Manually locking the row in advanced before issuing UPDATE or DELETE is known as "PESSIMISTIC LOCKING".
This is done by appending FOR UPDATE to the SELECT query.
It will lock all selected rows, until transaction is committed or rollbacked.
If these rows are already locked by another users, the SELECT operationm is blocked until rows lock is released.
By default MySQL does table locking. Row locking is possible only when table is indexed on the column.
Entity Relations
To avoid redundancy of the data, data should be organized into multiple tables so that tables are related to each other.
The relations can be one of the following
One to One
One to Many
Many to One
Many to Many
Entity relations is outcome of Normalization process.
Join statements are used to SELECT data from multiple tables using single query.
Typical RDBMS supports following types of joins:
Cross Join
Inner Join
Left Outer Join
Right Outer Join
Full Outer Join
Self join
Cross Joins
Compares each row of Table1 with every row of Table2.
Yields all possible combinations of Table1 and Table2.
In MySQL, The larger table is referred as "Driving Table", while smaller table is referred as "Driven Table". Each row of Driving table is combined with every row of Driven table.
Cross join is the fastest join, because there is no condition check involved.
Inner Join
The inner JOIN is used to return rows from both tables that satisfy the join condition.
Non-matching rows from both tables are skipped.
If join condition contains equality check, it is referred as equi-join; otherwise it is non-equi-join.
Left Outer Join
Left outer join is used to return matching rows from both tables along with additional rows in left table.
Corresponding to additional rows in left table, right table values are taken as NULL.
Corresponding to additional rows in left table, right table values are taken as NULL.
Right Outer Join
Right outer join is used to return matching rows from both tables along with additional rows in right table.
Corresponding to additional rows in right table, left table values are taken as NULL.
OUTER keyword is optional.
Full Outer Join
Full join is used to return matching rows from both tables along with additional rows in both tables.
Corresponding to additional rows in left or right table, opposite table values are taken as NULL.
Full outer join is not supported in MySQL, but can be simulated using set operators.
Set Operators
UNION operator is used to combine results of two queries. The common data is taken only once. It can be used to simulate full outer join.
UNION ALL operator is used to combine results of two queries. Common data is repeated.
Self Joins
When join is done on same table, then it is known as "Self Join". The both columns in condition belong to the same table.
Self join may be an inner join or outer join.
Multi-table Joins
Sub queries
Sub-query is query within query. Typically it work with SELECT statements.
Output of inner query is used as input to outer query.
If no optimization is enabled, for each row of outer query result, sub-query is executed once. This reduce performance of sub-query.
Single row sub-query
Sub-query returns single row.
Usually it is compared in outer query using relational operators.
Multi-row sub-query
Sub-query returns multiple rows.
Usually it is compared in outer query using operators like IN, ANY or ALL.
IN operator checks for equality with results from sub-queries.
ANY operator compares with one of the result from sub-queries.
ALL operator compares with all the results from sub-queries.
Correlated sub-query
If number of results from sub-query are reduced, query performance will increase.
This can be done by adding criteria (WHERE clause) in sub-query based on outer query row.
Typically correlated sub-query use IN, ALL, ANY and EXISTS operators.
Sub queries with UPDATE and DELETE are not supported in all RDBMS.
In MySQL, Sub-queries in UPDATE/DELETE is allowed, but sub-query should not SELECT from the same table, on which UPDATE/DELETE operation is in progress.
Views
RDBMS view represents view (projection) of the data.
View is based on SELECT statement.
Typically it is restricted view of the data (limited rows or columns) from one or more tables (joins and/or sub-queries) or summary of the data (grouping).
Data of view is not stored on server hard-disk; but its SELECT statement is stored in compiled form. It speed up execution of view.
Views are of two types: Simple view and Complex view
Usually if view contains computed columns, group by, joins or sub-queries, then the views are said to be complex. DML operations are not supported on these views.
DML operations on view affects underlying table.
View can be created with CHECK OPTION to ensure that DML operations can be performed only the data visible in that view.
Views can be differentiated with: SHOW FULL TABLES.
Views can be dropped with DROP VIEW statement.
View can be based on another view.
Applications of views
Security: Providing limited access to the data.
Hide source code of the table.
Simplifies complex queries.
Data Control Language
Security is built-in feature of any RDBMS. It is implemented in terms of permissions (a.k.a. privileges).
There are two types of privileges.
System privileges
Privileges for certain commands i.e. CREATE, ALTER, DROP, ...
Typically these privileges are given to the database administrator or higher authority user.
Object privileges
RDBMS objects are table, view, stored procedure, function, triggers, …
Can perform operations on the objects i.e. INSERT, UPDATE, DELETE, SELECT, CALL, ...
Typically these privileges are given to the database users.
Permissions are given to user using GRANT command.
GRANT CREATE ON db.* TO user@host;
GRANT CREATE ON . TO user1@host, user2@host;
GRANT SELECT ON db.table TO user@host;
GRANT SELECT, INSERT, UPDATE ON db.table TO user@host;
GRANT ALL ON db.* TO user@host;
By default one user cannot give permissions to other user. This can be enabled using WITH GRANT OPTION.
GRANT ALL ON . TO user@host WITH GRANT OPTION;
Permissions assigned to any user can be withdrawn using REVOKE command.
REVOKE SELECT, INSERT ON db.table FROM user@host;
Permissions can be activated by FLUSH PRIVILEGES.
System GRANT tables are reloaded by this command. Auto done after GRANT, REVOKE.
Command is necessary is GRANT tables are modified using DML operations.
Index
Index enable faster searching in tables by indexed columns.
CREATE INDEX idx_name ON table(column);
One table can have multiple indexes on different columns/order.
Typically indexes are stored as some data structure (like BTREE or HASH) on disk.
Indexes are updated during DML operations. So DML operation are slower on indexed tables.
Index can be ASC or DESC.
It cause storage of key values in respective order (MySQL 8.x onwards).
ASC/DESC index is used by optimizer on ORDER BY queries.
There are four types of indexes:
Simple index
CREATE INDEX idx_name ON table(column [ASC|DESC]);
Unique index
CREATE UNIQUE INDEX idx_name ON table(column [ASC|DESC]);
Doesn’t allow duplicate values.
Composite index
CREATE INDEX idx_name ON table(column1 [ASC|DESC], column2 [ASC|DESC]);
Composite index can also be unique. Do not allow duplicate combination of columns.
Clustered index
PRIMARY index automatically created on Primary key for row lookup.
If primary key is not available, hidden index is created on synthetic column.
It is maintained in tabular form and its reference is used in other indexes.
Indexes should be created on shorter (INT, CHAR, …) columns to save disk space.
Few RDBMS do not allow indexes on external columns i.e. TEXT, BLOB.
Last updated