Normalization is the process of organizing data in a database to eliminate redundancy and ensure data integrity. The primary goal is to reduce the chances of anomalies when inserting, updating, or deleting data. The process involves dividing data into tables and defining relationships between them.
In First Normal Form (1NF), data is organized into tables with no repeating groups, and each column must contain atomic (indivisible) values. The following table violates 1NF because the Courses column holds multiple values:
| Student ID | Courses          |
|------------|------------------|
| 1          | Math, Science    |
| 2          | English, History |
To satisfy 1NF, the table is split into one row per (Student ID, Course) pair:
| Student ID | Course  |
|------------|---------|
| 1          | Math    |
| 1          | Science |
| 2          | English |
| 2          | History |
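The 1NF conversion above can be sketched in a few lines of Python, assuming the denormalized data is held as (student ID, comma-separated courses) pairs:

```python
# Denormalized rows: the Courses cell holds a repeating group.
unnormalized = [
    (1, "Math, Science"),
    (2, "English, History"),
]

# 1NF: split each multi-valued Courses cell into one (student_id, course)
# row, so every column holds a single atomic value.
first_normal_form = [
    (student_id, course.strip())
    for student_id, courses in unnormalized
    for course in courses.split(",")
]

print(first_normal_form)
# [(1, 'Math'), (1, 'Science'), (2, 'English'), (2, 'History')]
```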
In 2NF, the table must be in 1NF and all non-key attributes must be fully functionally dependent on the primary key. This eliminates partial dependency. Example:
For example, if a table with the composite key (Student ID, Course) also stores Student Name, the name depends on only part of the key (Student ID). Eliminating this partial dependency means moving student details into their own table, keyed by Student ID, and keeping the enrollment facts in a table keyed by the full composite key.
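One way to sketch this 2NF decomposition is with SQLite (via Python's built-in sqlite3 module). The Student Name column and the sample names are hypothetical additions for illustration; the point is that name depends only on Student ID, so it moves to its own table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Student details depend only on Student ID.
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- The enrollment fact is identified by the full composite key.
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course     TEXT NOT NULL,
        PRIMARY KEY (student_id, course)
    );
""")
conn.execute("INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace')")
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [(1, "Math"), (1, "Science"), (2, "English"), (2, "History")])

# Each student's name is now stored once, however many courses they take.
rows = conn.execute("""
    SELECT s.name, e.course
    FROM students s
    JOIN enrollments e ON e.student_id = s.student_id
    ORDER BY s.student_id, e.course
""").fetchall()
print(rows)
# [('Ada', 'Math'), ('Ada', 'Science'), ('Grace', 'English'), ('Grace', 'History')]
```

Because the name lives in one place, updating it is a single-row change rather than an update to every enrollment row.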
In 3NF, the table must be in 2NF, and there must be no transitive dependency (i.e., non-key attributes should not depend on other non-key attributes). This ensures that each column is only dependent on the primary key.
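A minimal sketch of a 3NF decomposition, using a hypothetical transitive dependency: suppose each student row also carried Department and Department Head, where Student ID determines Department and Department determines Department Head. The head depends on a non-key attribute, so it belongs in its own table keyed by department:

```python
# 2NF table with a transitive dependency (hypothetical columns):
# (student_id, name, department, department_head)
students_2nf = [
    (1, "Ada",   "Science", "Curie"),
    (2, "Grace", "Arts",    "Austen"),
    (3, "Alan",  "Science", "Curie"),  # head repeated for every Science student
]

# 3NF: the department head moves to a departments table keyed by
# the department, so it is stored once per department.
students_3nf = [(sid, name, dept) for sid, name, dept, _ in students_2nf]
departments = {dept: head for _, _, dept, head in students_2nf}

print(students_3nf)
# [(1, 'Ada', 'Science'), (2, 'Grace', 'Arts'), (3, 'Alan', 'Science')]
print(departments)
# {'Science': 'Curie', 'Arts': 'Austen'}
```

After the split, changing a department head touches one row in `departments` instead of every student in that department.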
By applying 1NF, 2NF, and 3NF, a database minimizes redundancy, preserves data integrity, and becomes less prone to insertion, update, and deletion anomalies, making it more efficient and maintainable.