UNIT-IV: Transaction Processing Concepts
Transaction processing and Concurrency Control: Definition of Transaction, Desirable ACID
properties, overview of serializability, serializable and non serializable transactions
Concurrency Control: Definition of concurrency, lost update, dirty read and incorrect summary
problems due to concurrency
The concept of transaction provides a mechanism for describing logical units of database
processing. Transaction processing systems are systems with large databases and hundreds of
concurrent users that are executing database transactions. Examples of such systems include
systems for reservations, banking, credit card processing, stock markets, supermarket
checkout, and other similar systems. They require high availability and fast response time for
hundreds of concurrent users.
The main concept needed in transaction processing systems is the transaction, which is
used to represent a logical unit of database processing that must be completed in its entirety to
ensure correctness.
The concurrency control problem occurs when multiple transactions submitted by various users
interfere with one another in a way that produces incorrect results.
Introduction to Transaction Processing
Single-User Versus Multiuser Systems
A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if
many users can use the system—and hence access the database—concurrently.
 Most DBMS are multiuser (e.g., airline reservation system).
 Multiprogramming operating systems allow the computer to execute multiple
programs (or processes) at the same time (with a single CPU, the concurrent
execution of processes is actually interleaved).
 If the computer has multiple hardware processors (CPUs), parallel processing of
multiple processes is possible.
Introduction to Transactions, Read and Write Operations & DBMS Buffers
A Transaction is a logical unit of database processing that includes one or more access
operations (e.g., insertion, deletion, modification, or retrieval operations).
The database operations that form a transaction can either be embedded within an application
program or they can be specified interactively via a high-level query language such as SQL. One
way of specifying the transaction boundaries is by specifying explicit begin transaction and end
transaction statements in an application program; in this case, all database access operations
between the two are considered as forming one transaction.
A single application program may contain more than one transaction if it contains several
transaction boundaries.
If the database operations in a transaction do not update the database but only retrieve data, the
transaction is called a read-only transaction.
SIMPLE MODEL OF A DATABASE (for purposes of discussing transactions):
 A database is a collection of named data items.
 Granularity: the size of a data item is called its granularity. A data item can be a field of
some record in the database, or it may be a larger unit such as a record or even a whole
disk block.
 Basic operations are read and write
o read_item(X): Reads a database item named X into a program variable. To
simplify our notation, we assume that the program variable is also named X.
o write_item(X): Writes the value of program variable X into the database item
named X.
 read_item(X) command includes the following steps:
o Find the address of the disk block that contains item X.
o Copy that disk block into a buffer in main memory (if that disk block is not
already in some main memory buffer).
o Copy item X from the buffer to the program variable named X.
 write_item(X) command includes the following steps:
o Find the address of the disk block that contains item X.
o Copy that disk block into a buffer in main memory (if that disk block is not
already in some main memory buffer).
o Copy item X from the program variable named X into its correct location in the
buffer.
o Store the updated block from the buffer back to disk (either immediately or at
some later point in time).
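The read_item and write_item steps above can be sketched in Python as follows. This is only an illustrative model: the class name SimpleBufferManager, the dictionary standing in for the disk, and the flush method are assumptions made for this example, not part of any real DBMS.

```python
class SimpleBufferManager:
    """Toy model of the read_item / write_item steps (hypothetical names)."""

    def __init__(self, disk):
        self.disk = disk          # block id -> {item name: value}, stands in for the disk
        self.buffers = {}         # block id -> in-memory copy of that block
        self.dirty = set()        # blocks changed in memory but not yet written back
        # Lookup table used to "find the address of the disk block" for an item.
        self.item_to_block = {
            name: block_id for block_id, block in disk.items() for name in block
        }

    def _fetch(self, name):
        block_id = self.item_to_block[name]        # find the block holding the item
        if block_id not in self.buffers:           # copy it in only if not buffered yet
            self.buffers[block_id] = dict(self.disk[block_id])
        return block_id

    def read_item(self, name):
        block_id = self._fetch(name)
        return self.buffers[block_id][name]        # buffer -> program variable

    def write_item(self, name, value):
        block_id = self._fetch(name)
        self.buffers[block_id][name] = value       # program variable -> buffer
        self.dirty.add(block_id)                   # written to disk "at some later point"

    def flush(self):
        for block_id in self.dirty:                # store updated blocks back to disk
            self.disk[block_id] = dict(self.buffers[block_id])
        self.dirty.clear()


disk = {0: {"X": 80, "Y": 20}}                     # hypothetical contents
manager = SimpleBufferManager(disk)
x = manager.read_item("X")
manager.write_item("X", x - 5)
value_on_disk_before_flush = disk[0]["X"]
manager.flush()
value_on_disk_after_flush = disk[0]["X"]
```

In this sketch the disk copy changes only when flush is called, which mirrors the "either immediately or at some later point in time" wording of the last write_item step.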
Example: Let T1 be a transaction that transfers N = 50 from Account X to Account Y. This can be
defined in fig(a) as:
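As a rough sketch, such a transfer can be written in Python in terms of read_item and write_item. The dictionary used as the database and the starting balances are assumptions made for illustration.

```python
def read_item(db, name):
    """Read database item `name` into a program variable."""
    return db[name]


def write_item(db, name, value):
    """Write the value of a program variable into database item `name`."""
    db[name] = value


def t1_transfer(db, n):
    """Transfer n from account X to account Y."""
    x = read_item(db, "X")
    x = x - n
    write_item(db, "X", x)
    y = read_item(db, "Y")
    y = y + n
    write_item(db, "Y", y)


accounts = {"X": 200, "Y": 100}   # hypothetical starting balances
t1_transfer(accounts, 50)
print(accounts)                   # {'X': 150, 'Y': 150}
```

The total X + Y is the same before and after the transaction, which is the consistency condition a transfer is expected to preserve.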
Desirable Properties of Transactions (ACID properties)
To ensure integrity of data, we require that the database system maintains the following
properties of the transactions. These are often called the ACID properties. The following are the
ACID properties:
1. Atomicity: A transaction is an atomic unit of processing; it is either performed in its
entirety or not performed at all.
2. Consistency preservation: A transaction is consistency preserving if its complete
execution takes the database from one consistent state to another.
3. Isolation: A transaction should appear as though it is being executed in isolation from
other transactions. That is, the execution of a transaction should not be interfered with by
any other transactions executing concurrently.
4. Durability or permanency: The changes applied to the database by a committed
transaction must persist in the database. These changes must not be lost because of any
failure.
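Atomicity can be observed with Python's built-in sqlite3 module. The sketch below is an illustration only: the account table, the non-negative balance rule, and the transfer function are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # hypothetical consistency rule
)
conn.execute("INSERT INTO account VALUES ('X', 80), ('Y', 20)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "X", "Y", 50)        # succeeds: X = 30, Y = 70
failed = transfer(conn, "X", "Y", 100)   # would make X negative, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to Y is executed before the debit fails, yet Y is unchanged afterwards: the transaction is performed in its entirety or not at all.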
Transaction States
A transaction is an atomic unit of work that is either completed in its entirety or not done at all.
For recovery purposes, the system needs to keep track of when the transaction starts, terminates,
and commits or aborts (see below). Hence, the recovery manager keeps track of which of the
following states a transaction is in:
 Active, the initial state; the transaction stays in this state while it is executing.
 Partially committed, after the final statement has been executed.
 Failed, after the discovery that normal execution can no longer proceed.
 Aborted, after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
 Committed, after successful completion.
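The states listed above can be modelled as a small state machine. The list gives only the states, so the allowed transitions in this sketch (and the names State, ALLOWED, and Transaction) are assumptions based on the usual textbook state diagram.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Assumed transitions; the text names the states but not the arrows between them.
ALLOWED = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: set(),
    State.COMMITTED: set(),
}


class Transaction:
    def __init__(self):
        self.state = State.ACTIVE          # the initial state

    def move_to(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state


t = Transaction()
t.move_to(State.PARTIALLY_COMMITTED)       # final statement executed
t.move_to(State.COMMITTED)                 # successful completion
```

Committed and aborted have no outgoing transitions here, reflecting that both are terminal states in this model.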
Overview of Schedule, Serializability, Serializable and Non-serializable transactions
Schedules (Histories) of Transactions
When transactions are executing concurrently in an interleaved fashion, then the order of
execution of operations from the various transactions is known as a schedule (or history).
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the operations of the
transactions subject to the constraint that, for each transaction Ti that participates in S, the
operations of Ti in S must appear in the same order in which they occur in Ti.
For the purpose of recovery and concurrency control, we are mainly interested in the read_item
and write_item operations of the transactions, as well as the commit and abort operations. A
shorthand notation for describing a schedule uses the symbols r, w, c, and a for the operations
read_item, write_item, commit, and abort, respectively.
For example, the schedules of Figures (a), (b), (c), and (d) shown above, which we shall call
Sa, Sb, Sc, and Sd, can be written as follows in this notation:
 Sa: r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X);
 Sb: r2(X); w2(X); r1(X); w1(X); r1(Y); w1(Y);
 Sc: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
 Sd: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;
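The shorthand notation is regular enough to be parsed mechanically. The following sketch turns a schedule string into a list of (operation, transaction, item) tuples and extracts the operations of a single transaction; the function names parse_schedule and projection are assumptions made for this example.

```python
import re


def parse_schedule(text):
    """Parse e.g. 'r1(X); w1(X); a1;' into [('r', 1, 'X'), ('w', 1, 'X'), ('a', 1, None)]."""
    ops = []
    for token in text.split(";"):
        token = token.strip()
        if not token:
            continue
        match = re.fullmatch(r"([rwca])(\d+)(?:\((\w+)\))?", token)
        if match is None:
            raise ValueError(f"unrecognised operation: {token!r}")
        kind, txn, item = match.groups()
        ops.append((kind, int(txn), item))   # item is None for commit and abort
    return ops


def projection(ops, txn):
    """Operations of one transaction, in the order they appear in the schedule."""
    return [(kind, item) for kind, t, item in ops if t == txn]


sa = parse_schedule("r1(X); w1(X); r1(Y); w1(Y); r2(X); w2(X);")
sc = parse_schedule("r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);")
sd = parse_schedule("r1(X); w1(X); r2(X); w2(X); r1(Y); a1;")

# Sa and Sc interleave differently, but each transaction's own order is the same.
same_order = all(projection(sa, t) == projection(sc, t) for t in (1, 2))
print(same_order)   # True
```

The final check illustrates the constraint in the definition of a schedule: interleaving may change, but the order of operations within each transaction may not.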
Two operations in a schedule are said to conflict if they satisfy all three of the following
conditions:
 they belong to different transactions;
 they access the same item X; and
 at least one of the operations is a write_item(X).
For example, in schedule Sc, the operations r1(X) and w2(X) conflict, as do the operations r2(X)
and w1(X), and the operations w1(X) and w2(X).
However, the operations r1(X) and r2(X) do not conflict, since they are both read operations; the
operations w2(X) and w1(Y) do not conflict, because they operate on distinct data items X and Y;
and the operations r1(X) and w1(X) do not conflict, because they belong to the same transaction.
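The three conditions translate directly into a small predicate. In this sketch each operation is an (operation, transaction, item) tuple; that representation and the name conflicts are assumptions made for illustration.

```python
from itertools import combinations


def conflicts(op_a, op_b):
    """True if the two operations satisfy all three conflict conditions."""
    kind_a, txn_a, item_a = op_a
    kind_b, txn_b, item_b = op_b
    return (
        txn_a != txn_b                    # different transactions
        and item_a is not None
        and item_a == item_b              # same data item
        and "w" in (kind_a, kind_b)       # at least one is a write
    )


# Schedule Sc: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
sc = [
    ("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
    ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y"),
]

conflicting_pairs = [(a, b) for a, b in combinations(sc, 2) if conflicts(a, b)]
for a, b in conflicting_pairs:
    print(a, b)
```

For Sc this yields exactly the three pairs named above: r1(X) with w2(X), r2(X) with w1(X), and w1(X) with w2(X).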
Serializability of Schedules
 The concept of serializability of schedules is used to identify which schedules are
correct when transaction executions have interleaving of their operations in the schedules.
Two transactions T1 and T2 execute serially in a schedule S if S either:
i) executes all the operations of transaction T1 (in sequence) followed by all the
operations of transaction T2 (in sequence), or
ii) executes all the operations of transaction T2 (in sequence) followed by all the
operations of transaction T1 (in sequence).
Otherwise, they execute concurrently.
 A schedule S is serial if for any transaction Ti executing in S, all the operations in Ti are
executed consecutively in S; otherwise, S is called nonserial. In other words, only one
transaction at a time is executed in S, and there is no interleaving. For two transactions
there are two serial schedules: i) T1 followed by T2, and ii) T2 followed by T1.
 A schedule S of n transactions is serializable if it is equivalent to some serial schedule of
the same n transactions.
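These definitions can be checked mechanically. The text does not say which notion of equivalence is meant, so the sketch below assumes conflict equivalence and tests it with a precedence graph: a schedule is conflict serializable if that graph has no cycle. The function names and the third schedule, s_interleaved, are assumptions made for this example.

```python
def is_serial(schedule):
    """True if each transaction's operations are consecutive (no interleaving)."""
    finished, current = set(), None
    for _kind, txn, _item in schedule:
        if txn != current:
            if txn in finished:            # a transaction resumes after another ran
                return False
            if current is not None:
                finished.add(current)
            current = txn
    return True


def precedence_edges(schedule):
    """Edge (Ti, Tj) for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (kind_a, txn_a, item_a) in enumerate(schedule):
        for kind_b, txn_b, item_b in schedule[i + 1:]:
            if txn_a != txn_b and item_a == item_b and "w" in (kind_a, kind_b):
                edges.add((txn_a, txn_b))
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph is acyclic (assumes conflict equivalence)."""
    edges = precedence_edges(schedule)
    nodes = {txn for _kind, txn, _item in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining node can go first.
        sources = {n for n in nodes
                   if not any(b == n and a in nodes for a, b in edges)}
        if not sources:
            return False                   # every remaining node lies on a cycle
        nodes -= sources
    return True


sa = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"),
      ("r", 2, "X"), ("w", 2, "X")]
sc = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
      ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]
s_interleaved = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"),
                 ("w", 2, "X"), ("r", 1, "Y"), ("w", 1, "Y")]

for name, s in (("Sa", sa), ("Sc", sc), ("interleaved", s_interleaved)):
    print(name, is_serial(s), is_conflict_serializable(s))
```

Under this assumption Sa is serial, Sc is neither serial nor serializable (its graph contains the cycle T1 to T2 to T1), and the hypothetical interleaved schedule is nonserial but still serializable.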
A schedule S of n transactions T1, T2, ..., Tn, is said to be a complete schedule if the following
conditions hold:
 The operations in S are exactly those operations in T1, T2, ..., Tn, including a commit or
abort operation as the last operation for each transaction in the schedule.
 For any pair of operations from the same transaction Ti, their order of appearance in S is
the same as their order of appearance in Ti.
 For any two conflicting operations, one of the two must occur before the other in the
schedule.
CONCURRENCY CONTROL
Concurrency in a DBMS
Concurrent execution of user programs is essential for good DBMS performance. Because disk
accesses are frequent, and relatively slow, it is important to keep the CPU busy all the time by
working on several user programs concurrently. Concurrency is achieved by the DBMS, which
interleaves actions (reads/writes of DB objects) of various transactions. Each transaction must
leave the database in a consistent state.
Advantages:
- increase the throughput of the system.
- minimize response/waiting time for each transaction.
Why Concurrency Control Is Needed
Concurrency control is mainly concerned with the database access commands in a transaction.
Transactions submitted by the various users may execute concurrently and may access and
update the same database items. If this concurrent execution is uncontrolled, it may lead to
problems, such as an inconsistent database.
Several problems can occur when concurrent transactions execute in an uncontrolled manner.
Some of these problems are:
i. The Lost Update Problem
ii. The Dirty Read Problem
iii. The Incorrect Summary Problem
Consider the example in Figures 1(a) and 1(b), where T1 is a transaction that transfers N = 50
from Account X to Account Y, and T2 is a simple transaction that adds M to Account X,
the same account referenced in T1.
i) The Lost Update Problem : This problem occurs when two transactions that access the same
database items have their operations interleaved in a way that makes the value of some database
item incorrect.
Suppose that transactions T1 and T2 are submitted at approximately the same time, and suppose that their
operations are interleaved as shown in Figure (a) below; then the final value of item X is
incorrect, because T2 reads the value of X before T1 changes it in the database, and hence the updated
value resulting from T1 is lost. For example, if initially X = 80, N = 5, and M = 4, then the final
result should be X = 79; but in the interleaving of operations shown in Figure (a), it is X = 84
because the update in T1 that removed 5 from X was lost.
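The arithmetic of this example can be replayed step by step. The sketch below hard-codes one interleaving of T1 and T2 (the order r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y)) and compares it with a serial run; the starting value of Y and the function names are assumptions made for illustration.

```python
def run_interleaved(db, n, m):
    """T1 and T2 interleaved so that T2 overwrites T1's update of X."""
    x1 = db["X"]          # T1: read_item(X)
    x1 = x1 - n           # T1: X := X - N
    x2 = db["X"]          # T2: read_item(X), still the old value
    x2 = x2 + m           # T2: X := X + M
    db["X"] = x1          # T1: write_item(X)
    y1 = db["Y"]          # T1: read_item(Y)
    db["X"] = x2          # T2: write_item(X), T1's update is lost here
    y1 = y1 + n           # T1: Y := Y + N
    db["Y"] = y1          # T1: write_item(Y)
    return db


def run_serial(db, n, m):
    """T1 runs to completion, then T2."""
    db["X"] = db["X"] - n
    db["Y"] = db["Y"] + n
    db["X"] = db["X"] + m
    return db


lost = run_interleaved({"X": 80, "Y": 20}, n=5, m=4)    # Y = 20 is hypothetical
correct = run_serial({"X": 80, "Y": 20}, n=5, m=4)
print(lost["X"], correct["X"])    # 84 79
```

Both runs leave Y at the same value; only X differs, by exactly the N = 5 that T1 subtracted and T2 then overwrote.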
ii) The Temporary Update (or Dirty Read) Problem
 This problem occurs when one transaction updates a database item and then the
transaction fails for some reason.
 The updated item is accessed by another transaction before it is changed back
to its original value.
Figure (b) below shows an example where T1 updates item X and then fails before completion, so
the system must change X back to its original value. Before it can do so, however, transaction
T2 reads the "temporary" value of X, which will not be recorded permanently in the database
because of the failure. The value of item X that is read by T2 is called dirty data, because it has been
created by a transaction that has not completed and committed yet; hence, this problem is also
known as the dirty read problem.
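A few lines of Python are enough to show where the dirty value comes from. The numbers (X = 80, N = 5, M = 4) and the use of a saved before-image to undo the failed update are assumptions made for this sketch.

```python
db = {"X": 80}                  # committed state (hypothetical starting value)
N, M = 5, 4

before_image = db["X"]          # saved so that T1's change can be undone
db["X"] = db["X"] - N           # T1: read_item(X); X := X - N; write_item(X)

dirty_value = db["X"]           # T2: read_item(X) sees T1's uncommitted value
t2_result = dirty_value + M     # T2: X := X + M, computed from dirty data

db["X"] = before_image          # T1 fails; the system restores the old X

print(dirty_value, t2_result, db["X"])   # 75 79 80
```

T2 has worked with 75, a value that was never part of any committed database state; had it read the committed value it would have computed 84.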
iii) The Incorrect Summary Problem
If one transaction is calculating an aggregate summary function on a number of records while
other transactions are updating some of these records, the aggregate function may calculate some
values before they are updated and others after they are updated.
For example, suppose that a transaction T3 is calculating the total of the amounts in all the
accounts; meanwhile, transaction T1 is executing. If the interleaving of operations shown in Figure
(c) occurs, the result of T3 will be off by an amount N because T3 reads the value of X after N has
been subtracted from it but reads the value of Y before N has been added to it.
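The effect can be reproduced with a short sketch in which a summing transaction runs between the two halves of the transfer T1. The third account A and all starting balances are assumptions made for illustration.

```python
db = {"A": 10, "X": 80, "Y": 20}     # hypothetical balances
N = 5
true_total = sum(db.values())        # 110, unchanged by a transfer between accounts

total = 0
total += db["A"]                     # summary: read_item(A)
db["X"] = db["X"] - N                # T1: X := X - N; write_item(X)
total += db["X"]                     # summary reads X after N was subtracted
total += db["Y"]                     # summary reads Y before N is added
db["Y"] = db["Y"] + N                # T1: Y := Y + N; write_item(Y)

print(true_total, total, true_total - total)   # 110 105 5
```

The reported total is too small by exactly N, even though the database itself ends in a consistent state with the correct overall sum.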
