Normalization in DBMS

 Normalization in DBMS

Normalization is the process or rules for minimizing data redundancy (duplicates or repetition) in relation or set of relations. 

It divides larger tables into smaller tables and links or connects them by relationships.

Redundancy creates the cause of insertion, deletion, and update anomalies.

Insert anomalies − Insert data in a record that does not exist at all.

Deletion anomalies − Delete a record, but parts of it were undeleted because the data is also saved somewhere else. 

Update anomalies –  due to scattered copies of data in the database, when updating a data item in the database but its old values are present at several places in the database, so old and new values find at a time in a database which is wrong and database in an inconsistent state. 


Normalization is a method to remove all these anomalies and bring the database to a consistent state.

This process divides into various Normal forms.


Types of normal forms:

I-NF

II- NF

III- NF

BCNF

IV-NF

V- NF


First Normal Form (1NF)

o A relation must contain an atomic value.

o Table attribute cannot hold multiple values and must hold only a single-valued attribute.

o First normal form does not allow the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is unnormalized form

EMPLOYEE table:

 

EMP_ID

EMP_NAME

EMP_PHONE

EMP_STATE

14

John

7272826385,
9064738238

UP

20

Harry

8574783832

Bihar

12

Sam

7390372389,
8589830302

Punjab

 

Normalization form of EMPLOYEE into 1NF:

EMP_ID

EMP_NAME

EMP_PHONE

EMP_STATE

14

John

7272826385

UP

14

John

9064738238

UP

20

Harry

8574783832

Bihar

12

Sam

7390372389

Punjab

12

Sam

8589830302

Punjab

 ============================================================

Second Normal Form (2NF)

  • Relational must be in 1NF.
  • All non-key attributes are fully functional and dependent on the primary key.
  • Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime attribute.
  • Non-prime attribute − An attribute, which is not a part of the prime key, is known as non-prime attribute.

Example:- Student_Project (stu_id, proj_id, stu_name, proj_name)

There are two prime key attributes Stu_ID and Proj_ID and non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both. Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in the Second Normal Form.

 

Student(stu_id, stu_name, proj_id)

Project (proj_id, proj_name)

 

According to II-NF, break this the relation into two 1. Student and 2. Project.

These relations remove the partial dependency.

 

o    No Partial Dependency, i.e.no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.

o    Partial Dependency – If the proper subset of candidate key determines a non-prime attribute, it is called partial dependency

 

Example 1 – Consider table-3 as following below.

STUD_NO                   

COURSE_NO

COURSE_FEE

1                    

C1                 

1000

2                    

C2                

 1500

1                    

C4                 

2000

4                    

C3                 

1000

4                    

C1                 

1000

2                    

C5                 

2000

 

COURSE_FEE is a non-prime attribute, not belong to the one only candidate key {STUD_NO, COURSE_NO}. But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, it is a Non-prime attribute COURSE_FEE.

 

Example: TEACHER table

TEACHER_ID

SUBJECT

TEACHER_AGE

25

Chemistry

30

25

Biology

30

47

English

35

83

Math

38

83

Computer

38

TEACHER_AGE is the non-prime attribute, dependent on TEACHER_ID .

To convert the given table into 2NF, decompose into two tables:

TEACHER_DETAIL table:

TEACHER_ID

TEACHER_AGE

25

30

47

35

83

38

TEACHER_SUBJECT table:

TEACHER_ID

SUBJECT

25

Chemistry

25

Biology

47

English

83

Math

83

Computer

======================================================================

Third Normal Form (3NF)

  • A relation is in 2NF and does not contain any transitive partial dependency of non-prime attributes.
  • 3NF reduces data duplication. It maintains data integrity.
  • Relation has only single candidate keys( i.e. every candidate key consists of only 1 attribute)
  • Holds at least one of the following conditions for every non-trivial function dependency X → Y.
  1. X is a super key.
  2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

 

Example:

EMPLOYEE_DETAIL table:

EMP_ID

EMP_NAME

EMP_ZIP

EMP_STATE

EMP_CITY

222

Harry

201010

UP

Noida

333

Stephan

02228

US

Boston

444

Lan

60007

US

Chicago

555

Katharine

06389

UK

Norwich

666

John

462007

MP

Bhopal

Super key in the table above:

    1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on  

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

EMP_STATE & EMP_CITY dependent on EMP_ZIP .

EMP_ZIP dependent on EMP_ID.

The non-prime attributes (EMP_STATE, EMP_CITY) are transitively dependent on the super key(EMP_ID). It violates the rule of the third normal form, so split this relation into two sub relations EMPLOYEE table, EMPLOYEE_ZIP table.

 

EMPLOYEE table:

EMP_ID

EMP_NAME

EMP_ZIP

222

Harry

201010

333

Stephan

02228

444

Lan

60007

555

Katharine

06389

666

John

462007

 

EMPLOYEE_ZIP table:

EMP_ZIP

EMP_STATE

EMP_CITY

201010

UP

Noida

02228

US

Boston

60007

US

Chicago

06389

UK

Norwich

462007

MP

Bhopal

 

======================================================================

Boyce Codd normal form (BCNF)

  • The advance version of 3NF.
  • It is stricter than 3NF.
  • Every functional dependency X → Y, X is the super key of the table.

 

Example: EMPLOYEE table:

EMP_ID

EMP_COUNTRY

EMP_DEPT

DEPT_TYPE

EMP_DEPT_NO

264

India

Designing

D394

283

264

India

Testing

D394

300

364

UK

Stores

D283

232

364

UK

Developing

D283

549

In the above table Functional dependencies are as follows:

  1. EMP_ID  →  EMP_COUNTRY  
  2. EMP_DEPT  →   {DEPT_TYPE, EMP_DEPT_NO}  

 

Candidate key: {EMP-ID, EMP-DEPT}

To convert the given table into BCNF, decompose it into three tables: EMP_COUNTRY, EMP_DEPT, EMP_DEPT_MAPPING.

 

EMP_COUNTRY table:

EMP_ID

EMP_COUNTRY

264

India

264

India

 

EMP_DEPT table:

EMP_DEPT

DEPT_TYPE

EMP_DEPT_NO

Designing

D394

283

Testing

D394

300

Stores

D283

232

Developing

D283

549

 

EMP_DEPT_MAPPING table:

EMP_ID

EMP_DEPT

D394

283

D394

300

D283

232

D283

549

 

Functional dependencies:

  1. EMP_ID   →    EMP_COUNTRY  
  2. EMP_DEPT   →   {DEPT_TYPE, EMP_DEPT_NO}  

 

Candidate keys:

1.    first table: EMP_ID

2.    second table: EMP_DEPT

3.    third table: {EMP_ID, EMP_DEPT}

 

Now, this is in BCNF because the left side part of both the functional dependencies is a key.

1.    BCNF is free from redundancy.

2.    If a relation is in BCNF, then 3NF is also satisfied.

3.     If all attributes of relation are prime attributes, then the relation is always in 3NF.

4.    Every Binary Relation ( only 2 attributes ) is always in BCNF.

 

================================================================

Summary

Normal Form

Description

1NF

A relation contains an atomic value.

2NF

In this relation, all non-key attributes are fully functional and dependent on the primary key. (full functional dependency)

3NF

A relation is not in transition dependency. (remove transition dependency)

BCNF

In this relation, all attributes of the relation are prime attribute. (no candidate key)

 

=========================================================== 

Post a Comment

0 Comments