Friday, April 6, 2012

DATABASE NORMALIZATION

Database normalization is defined as the rules that help in building relational database. In the database building process, the redundant data is eliminated and the related data is stored in the table. Basically, this process is used to organize data so that its performance is optimum. The process of normalization involves breaking a single table into two or more tables and defining the relationships between them. The purpose of isolating data in more than one table is to add, delete, and modify fields in one table and then propagate the  data through the rest of the database.


Normalization guidelines are divided into five normal forms:
  • 1st Normal Form
  • 2nd Normal Form
  • 3rd Normal Form
  • 4th Normal Form
  • 5th Normal Form
First Normal Form
The first normal form describes the basic rules to normalize a database. It follows the rules listed below:
  • Remove duplicate columns from a table.
  • Segregate tables with a specific group of data.


Consider the following Student table:


StudentID Course
12345 4100,4200,4300
54321 2100,2200,2300


In Student table, the Course field is a multi-valued attribute and has no single value for any field. Now, consider this Student table:


StudentID Course1 Course2 Course3
12345 4100 4200 4300
54321 2100 2200 2300


The Course1, Course2, Course3 fields represent repeating groups and to brought it to First Normal Form we will store the data in following ways.


StudentID Course
12345 4100
12345 4200
12345 4300
54321 2100
54321 2200
54321 2300


In the first two designs of the data, the students enrolled for a certain course is difficult. Say I want to do the following:


Tell me all of the students enrolled in course 4100.  In the first design, you'll have to pull all of the course data and parse it somehow. And in the second design, you'll have to check 3 different fields for course 4100. In the final design, a simple Select StudentID from Student where Course=100.

Second Normal Form
The second normal form is defined to remove duplicate data from a table. These are as follows:
  1. It includes all rules of the first normal form.
  2. Remove the data from the table that applies to multiple rows and then arrange it in the other table by creating relations.

Third Normal Form

The third normal form is defined to remove the columns that do not depend upon constraints. It has the following rules:
  1. It includes all rules of the second normal form.
  2. It removes those columns that do not depend upon the Primary keys.


No comments:

Post a Comment