Thursday, July 28, 2011

Third Normal Form

Third Normal Form

By Tom Thomson, 2011/07/28


Third Normal Form (3NF) is the last of the three usually remembered normal forms defined back in 1971. Like second normal form (2NF), its purpose is to ensure that the database schema definition prevents certain errors from occurring.

The idea of third normal form is that the value of a non-key attribute should not be deducible from the values of other non-key attributes. A non-key attribute is an attribute that isn't in any candidate key. A candidate key is a set of attributes that could be used as the primary key.

Like 2NF, 3NF is about real world properties of the objects being modelled by data in the database. It is an example of a class of real world business rules being encoded in the schema definition so that the database can enforce them directly.

Definition of 3NF

The easiest definition to understand is

A relation R is in 3NF if and only if (i) it is in 2NF and (ii) whenever X is a set of attributes of R and A is a non-key attribute of R that is not in X, the value of A can be determined from the values of X only if X includes a candidate key.

Actually, the first clause of this could refer to 1NF instead of 2NF, because given 1NF it's clear that (ii) implies (i); but it's traditional to say for each normal form that it requires the next lower normal form, to avoid messy proofs.

That was not the original definition, although it defines exactly the same property. The original definition said that the relation must be in 2NF and must not have any non-trivial transitive functional dependencies. Using that terminology would require too much definition for an introductory note like this.

Why go to 3NF?

To see what sort of problems can be prevented by using 3NF, we need to define some business rules and show how they can be violated in a table that doesn't conform to 3NF but not when the schema is changed so that all tables conform to 3NF.

We'll use the same Pest Control Product company as in the 2NF article, but look at part of the sales operation rather than development this time. The Sales Division operates several subdivisions, determined by the type of pest that a product is used for. Three of these are the Insects group, the Weeds group, and the Rodents group. The groups don't share storage depots because some years back some insecticide accidentally got contaminated by weed killer and the resulting compensation costs were rather large when the roses died along with the greenfly. The company has decided that only one group will have a storage depot in any given city, and that each group will have a single telephone ordering system with a single toll-free number (which doesn't call a fixed location: it calls the depot belonging to the group that is nearest to the calling phone's area code). The CFO's Personal Assistant has designed thrown together a set of tables to support this, part of which is the Sales_Depot table, which at some point in time holds the following data (amongst others: neither all columns nor all rows are shown).

Sales_Depot Table
City Group Address Order_phone
Toytown Insects 40 Tree Rd 800765
Bigtown Rodents 3 Main St 800210
Oldtown Weeds The Orchard 800284
Newtown Insects 1 New Way 800765
Anytown Rodents 4 Old Rd 800210
Erehwon Weeds 2 South St 800284

That all looks OK, until one day the phone system is upgraded to use 7 digit numbers instead of 6 digits, and everyone has to change. The company headquarters "carefully" checks that all the new numbers work before updating the database and telling the depots to advise their customers of the new numbers. But "carefully" was't carefully enough. When someone is calling out the new numbers from a list for someone else to type them in, the Rodents group number for Bigtown gets misheard, with a "01" instead of a "10" somewhere in it. The testing is done only using the data for the first city (in alphabetical order) in each group, so for the Rodents group Anytown is tested but Bigtown isn't and the error isn't spotted. Here is how that part of the table ends up, with the incorrect data highlighted in yellow:

Sales_depot Table (new phone numbers)
City Group Address Order_Phone
Toytown Insects 40 Tree Rd 8007650
Bigtown Rodents 3 Main St 8002010
Oldtown Weeds The Orchard 8002840
Newtown Insects 1 New Way 8007650
Anytown Rodents 4 Old Rd 8002100
Erehwon Weeds 2 South St 8002840

Therefore, 8002010 is the number sent out to customers of the Rodents group in Bigtown in a notice advising that the new number is available, and that some time after that, the old number will stop working. Of course all the customers carry on using the old number, until it stops working; and then they have a new number that doesn't work as expected (it is the number of a pet shop specialising in white mice). This is far from good for sales, for customer relations, or for the company's reputation as a careful and competent supplier.

The business rule states that a group has only one order phone, so the order phone number can be deduced from the Group attribute of the row, but here we have two rows with the same group but different order phones - the business rule has been broken. This can only happen because the order phone depends directly on the group according to the business rule, but the group attribute is not part of any candidate key so that means that the table is not in 3rd normal form. The two rows for the Rodents group have different phone numbers, which violates the business rule but does not violate any key constraints. The same order phone information has to be recorded more than once, in however many rows refer to the group that uses that phone. This failure to achieve 3NF means that the datum can end up being different in two places, despite the business rule saying it must not be. This is unwanted and dangerous redundancy, and permits inconsistencies that can have undesirable consequences.

The 3NF Version

To get this part of the schema into 3NF, it is necessary to move the data that is dependent on something other than the key into a separate table where what it is dependent on is the key. Removing that column from the Sales-Depot table leaves this (before the phone numbers update)

Sales_Depot Table
City Group Address
Toytown Insects 30 Tree Rd
Bigtown Rodents 3 Main St
Oldtown Weeds The Orchard
Newtown Insects 1 New Way
Anytown Rodents 4 Old Rd
Erehwon Weeds 2 South St

A new table, the Sales-Group table is needed to hold the Order_phone information; this looks like this

Before phone system upgrade
Group Order_phone
Insects 800765
Rodents 800210
Weeds 800284

After phone system upgrade
Group Order_phone
Insects 8007650
Rodents 8002100
Weeds 8002840


Now when the phone number update takes place, the order phone number for the Rodents sales group is only in one place in the database, so inconsistencies like the one described above can no longer occur. It is either right or wrong, and because it is in only one place it can't be right in one place but wrong in another.

Of course, errors can be avoided by insisting that things are updated only using stored procedures that update this column for every row containing the affected group but that means winning the fight to ban ad-hoc updates, which is often unwinnable. Perhaps they can be avoided by having a trigger that updates all the remaining rows that contain this number whenever one of them is updated, but that requires you to make sure RECURSIVE TRIGGERS is OFF, which may not be acceptable. Of course these approaches only work when there not too many rows. If a large number of rows are affected, there may be a performance issue as well. Besides, it is usually a bad idea to write extra code to fix problems that wouldn't exist if you chose the right table structures.

Another thing that isn't made clear by a small scale example like this one is that going to 3NF often makes a significant reduction in the size of the data, because it reduces data redundancy. With real world data, enforcing 3NF can sometimes deliver extra performance because the IO volume is vastly reduced and, when store is in short supply, it gives even more extra performance because working sets are reduced.

Source : http://www.sqlservercentral.com/articles/Normalization/74437/

Wednesday, July 27, 2011

Surah Yunus [Jonah] Ayat 96-97

Ayat 96. Truly! Those, against whom the Word (Wrath) of your Lord has been justified, will not believe.

Ayat 97. Even if every sign should come to them, - until they see the painful torment.


Hazrat Abu Bakr As-Siddiq (May Allah be pleased with him) reported: "O you people! You recite this Verse: `O you who believe! Take care of your own selves. If you follow the (right) guidance [and enjoin what is right (Islamic Monotheism and all that Islam orders one to do) and forbid what is wrong (polytheism, disbelief and all that Islam has forbidden)] no hurt can come to you from those who are in error.' (5:105) But I have heard Messenger of Allah [SAWW](PBUH) saying: "When people see an oppressor but do not prevent him from (doing evil), it is likely that Allah will punish them all.''

[Abu Dawud Hadith # 4338 and At-Tirmidhi Hadith # 2168].


Lesson : as mentioned above in Surah Yunus Ayat 97. "Even if every sign should come to them, - until they see the painful torment."

This Hadith brings out the following points: What Abu Bakr As-Siddiq (May Allah be pleased with him) had stated was that the Ayah mentioned in this Hadith was generally understood to mean that if a person is on the Straight Path then it is not necessary for him to preach virtue and forbid vice because in that case he neither has the fear of being harmed by the disobedience and sins of others nor is he answerable to Allah for their misdeeds. Thus, what it implies is that Muslim Ummah is not responsible for the enforcement of Islamic Shari`ah in the world and all it is concerned with is its own improvement only. By narrating this Hadith, Abu Bakr As-Siddiq (May Allah be pleased with him) refuted this interpretation of the Ayah and made it clear that it is the duty of the Ummah and each of its member to make best possible efforts to check the vices prevalent in the society. So much so that if a Muslim has the ability to check the vices by means of force or speech but he does not do it, his negligence in this regard will become a cause of Divine wrath and punishment. The true significance of this ayah is that if, in spite of the best efforts on the part of the pious ones there is no improvement, and people do not abstain from sins and disobedience of Allah, then the pious persons will not be held responsible for what the defaulters do, nor would they suffer on this account. This interpretation of the ¢yah removes the conflict that seems to exist between the meaning of the Ayah and the Hadith.

Surah Yunus [Jonah] Ayat 96-97

Ayat 96. Truly! Those, against whom the Word (Wrath) of your Lord has been justified, will not believe.

Ayat 97. Even if every sign should come to them, - until they see the painful torment.


Hazrat Abu Bakr As-Siddiq (May Allah be pleased with him) reported: "O you people! You recite this Verse: `O you who believe! Take care of your own selves. If you follow the (right) guidance [and enjoin what is right (Islamic Monotheism and all that Islam orders one to do) and forbid what is wrong (polytheism, disbelief and all that Islam has forbidden)] no hurt can come to you from those who are in error.' (5:105) But I have heard Messenger of Allah [SAWW](PBUH) saying: "When people see an oppressor but do not prevent him from (doing evil), it is likely that Allah will punish them all.''

[Abu Dawud Hadith # 4338 and At-Tirmidhi Hadith # 2168].


Lesson : as mentioned above in Surah Yunus Ayat 97. "Even if every sign should come to them, - until they see the painful torment."

This Hadith brings out the following points: What Abu Bakr As-Siddiq (May Allah be pleased with him) had stated was that the Ayah mentioned in this Hadith was generally understood to mean that if a person is on the Straight Path then it is not necessary for him to preach virtue and forbid vice because in that case he neither has the fear of being harmed by the disobedience and sins of others nor is he answerable to Allah for their misdeeds. Thus, what it implies is that Muslim Ummah is not responsible for the enforcement of Islamic Shari`ah in the world and all it is concerned with is its own improvement only. By narrating this Hadith, Abu Bakr As-Siddiq (May Allah be pleased with him) refuted this interpretation of the Ayah and made it clear that it is the duty of the Ummah and each of its member to make best possible efforts to check the vices prevalent in the society. So much so that if a Muslim has the ability to check the vices by means of force or speech but he does not do it, his negligence in this regard will become a cause of Divine wrath and punishment. The true significance of this ayah is that if, in spite of the best efforts on the part of the pious ones there is no improvement, and people do not abstain from sins and disobedience of Allah, then the pious persons will not be held responsible for what the defaulters do, nor would they suffer on this account. This interpretation of the ¢yah removes the conflict that seems to exist between the meaning of the Ayah and the Hadith.

Tuesday, July 26, 2011

The future of IT will be reduced to three kinds of jobs

July 22, 2011, 10:04 PM PDT

Takeaway: The IT profession and the IT job market are in the midst of seismic changes that are going to shift the focus to three types of jobs.

There’s a general anxiety that has settled over much of the IT profession in recent years. It’s a stark contrast to the situation just over a decade ago. At the end of the 1990s, IT pros were the belles of the ball. The IT labor shortage regularly made headlines and IT pros were able to command excellent salaries by getting training and certification, job hopping, and, in many cases, being the only qualified candidate for a key position in a thinly-stretched job market. At the time, IT was held up as one of the professions of the future, where more and more of the best jobs would be migrating as computer-automated processes replaced manual ones.

Unfortunately, that idea of the future has disappeared, or at least morphed into something much different.

The glory days when IT pros could name their ticket evaporated when the Y2K crisis passed and then the dot com implosion happened. Suddenly, companies didn’t need as many coders on staff. Suddenly, there were a lot fewer startups buying servers and hiring sysadmins to run them.

Around the same time, there was also a general backlash against IT in corporate America. Many companies had been throwing nearly-endless amounts of money at IT projects in the belief that tech was the answer to all problems. Because IT had driven major productivity improvements during the 1990s, a lot of companies over-invested in IT and tried to take it too far too fast. As a result, there were a lot of very large, very expensive IT projects that crashed and burned.

When the recession of 2001 hit, these massively overbuilt IT departments were huge targets for budget cuts and many of them got hit hard. As the recession dragged out in 2002 and 2003, IT pros mostly told each other that they needed to ride out the storm and that things would bounce back. But, a strange thing happened. IT budgets remained flat year after year. The rebound never happened.

Fast forward to 2011. Most IT departments are a shadow of their former selves. They’ve drastically reduced the number of tech support professionals, or outsourced the help desk entirely. They have a lot fewer administrators running around to manage the network and the servers, or they’ve outsourced much of the data center altogether. These were the jobs that were at the center of the IT pro boom in 1999. Today, they haven’t totally disappeared, but there certainly isn’t a shortage of available workers or a high demand for those skill sets.

That’s because the IT environment has changed dramatically. More and more of traditional software has moved to the web, or at least to internal servers and served through a web browser. Many technophobic Baby Boomers have left the workforce and been replaced by Millennials who not only don’t need as much tech support, but often want to choose their own equipment and view the IT department as an obstacle to productivity. In other words, today’s users don’t need as much help as they used to. Cynical IT pros will argue this until they are blue in the face, but it’s true. Most workers have now been using technology for a decade or more and have become more proficient than they were a decade ago. Plus, the software itself has gotten better. It’s still horribly imperfect, but it’s better.

So where does that leave today’s IT professionals? Where will the IT jobs of the future be?

1. Consultants

Let’s face it, all but the largest enterprises would prefer to not to have any IT professionals on staff, or at least as few as possible. It’s nothing personal against geeks, it’s just that IT pros are expensive and when IT departments get too big and centralized they tend to become experts at saying, “No.” They block more progress than they enable. As a result, we’re going to see most of traditional IT administration and support functions outsourced to third-party consultants. This includes a wide range from huge multi-national consultancies to the one person consultancy who serves as the rented IT department for local SMBs. I’m also lumping in companies like IBM, HP, Amazon AWS, and Rackspace, who will rent out both data center capacity and IT professionals to help deploy, manage, and troubleshoot solutions. Many of the IT administrators and support professionals who currently work directly for corporations will transition to working for big vendors or consultancies in the future as companies switch to purchasing IT services on an as-needed basis in order to lower costs, get a higher level of expertise, and get 24/7/365 coverage.

2. Project managers

Most of the IT workers that survive and remain as employees in traditional companies will be project managers. They will not be part of a centralized IT department, but will be spread out in the various business units and departments. They will be business analysts who will help the company leaders and managers make good technology decisions. They will gather business requirements and communicate with stakeholders about the technology solutions they need, and will also be proactive in looking for new technologies that can transform the business. These project managers will also serve as the company’s point of contact with technology vendors and consultants. If you look closely, you can already see a lot of current IT managers morphing in this direction.

3. Developers

By far, the area where the largest number of IT jobs is going to move is into developer, programmer, and coder jobs. While IT used to be about managing and deploying hardware and software, it’s going to increasingly be about web-based applications that will be expected to work smoothly, be self-evident, and require very little training or intervention from tech support. The other piece of the pie will be mobile applications — both native apps and mobile web apps. As I wrote in my article, We’re entering the decade of the developer, the current changes in IT are “shifting more of the power in the tech industry away from those who deploy and support apps to those who build them.” This trend is already underway and it’s only going to accelerate over the next decade.


source: http://www.techrepublic.com/blog/hiner/the-future-of-it-will-be-reduced-to-three-kinds-of-jobs/8717?tag=nl.e101

Sunday, July 17, 2011

Second Normal Form

Second Normal Form
By Tom Thomson, 14/07/2011

Second normal form (2NF) requires first normal form and one extra condition: if X is a set of prime attributes and A is a non-prime attribute and the value of A is determined by the values of X then X contains the whole of at least one candidate key. An attribute is prime if it is part of any candidate key, and non-prime otherwise. A Candidate Key is a set of columns that could reasonably be used as the Primary Key. The Primary Key is a set of columns which don't permit NULLs and which is guaranteed not to have the same set of values in two different rows, and for which no subset has these properties. This ensures that certain kinds of incorrectness can't creep into the database through erroneous inserts or updates though it cannot, of course, prevent all erroneous inserts and updates.

The extra condition for 2NF is sometimes informally expressed as saying that the value of every non-key column depends on the whole of the key. Saying that each column depends on the whole key means that the real business entities which the database models are such that the non-key attributes depend on all the key attributes, not on just some of them. This is an example of a real-world business rule being modelled in the structure of the database in order to ensure that certain errors can't occur, which is what all of 2nd, 3rd, 4th and 5th normal forms are intended to do.

A table not in 2NF

To see the kind of errors that are prevented by the use of 2NF we have to look at what can go wrong when a table is not in 2NF, which means we have to look at what kind of business rule can be violated if such a table is used. We will state a business rule and show how an update can cause a table not in 2NF to violate this rule, and then show how to change the table structure so that the rule cannot be violated.

We will use a Current_Assignment table, a table which shows for each employee working on a project what proportion of his time he spends on that project. This allows that time to be charged to the project. As charges are made in money terms, we need to know what the employee costs (salary and overhead). In this company salary is determined by the Pay Grade of the employee and his time in that grade (measured in years), and overheads are charged at 100% of salary. It is of course an absolute rule of the business that an employee can have, at any given time, only one Pay Grade and only one time in grade, and that every project that uses an employee in a given pay period sees the same charging rate for that employee.

The DBA at this company (actually the CFO's Personal Assistant, whose qualification for the DBA role is that once, a quarter of a century ago, she wrote a dbase II query) decides to put all the data needed into the Current_Assignment table, as this seems the simplest solution. She scripts the table like this:

CREATE TABLE Current_Assignment (
EmployeeID
int NOT Null,
ProjectID
int NOT Null,
TimeProportion
NOT Null, -- percent of employee's time allocated to project
PayGrade
int NOT Null,
YearsInGrade
int NOT Null,
CONSTRAINT PK_Current_Assignment Primary Key Clustered
(ProjectID, EmployeeID)
)

Since an employee can be working on several projects, and each project can be using several employees, (EmployeeID ,ProjectID) is the only candidate key; so it has been assigned as the primary key.

Here the PayGrade and TimeInGrade columns are dependent on only part of the key (the EmployeeID); they are not dependent on all of it, because they have to be the same for a given employee whichever project he works on: so the table is not in 2NF. Let's see how a problem can arise as a result.

At some point in time, part of the contents of the table are as follows:

Current_Assignment Table
EmployeeID ProjectID TimeProportion PayGrade YearsInGrade
Bill Hacker New MouseTrap 50 15 3
Buck Bossman New MouseTrap 40 21 1
Penny Drudge New MouseTrap 100 12 2
Bill Hacker Better Flypaper 50 15 3
Buck Bossman Better Flypaper 50 21 1
David Drudge Better Flypaper 100 13 1
Charlie Fixit Better Flypaper 10 12 3
Buck Bossman Roach Eliminator 10 21 1
Charlie Fixit Roach Eliminator 90 12 3
Joe Bottom Roach Eliminator 100 9 12

Employee Buck Bossman (the group manager in charge of the projects and people shown) decides to give employee Charlie Fixit a pay raise, and updates the table so that the row with primary key (Roach Eliminator,Charlie Fixit) has PayGrade 13 and YearsInGrade 0. This happens because Buck associates Charlie with the Roach Eliminator Project, where he spends 90% of his time, and forgets that he also does some work on the Better Flypaper project. So now Charlie has different Pay Grade and Years in Grade in different rows of the table.

When a query is made of the updated table to discover Charlie Fixit's PayGrade and YearsInGrade, there are two rows with two different values, and it is a toss-up which pair of values will be delivered as the result. That's rather unfortunate if this is to be used to look up an actual salary in a table mapping grade and years in grade to money (Charlie will not be pleased if he gets paid at his old, lower, rate this month). When Charlie's time is charged to projects, Better Flypaper and Roach Eliminator will see different charge rates for the same person. The update has introduced a serious error into the database, violating a simple business rule.

Fixing the problem

If the schema is modified so that the table is in 2NF, this error can no longer occur. To effect this, the information which is dependent on only part of the key must be split out into a separate table, where that part key is the whole key. The new table definition scripts are

CREATE TABLE PayGrade_Detail (
EmployeeID
int NOT Null Primary Key Clustered,
PayGrade
int NOT Null,
TimeInGrade
int NOT Null,
)
CREATE TABLE Current_Assignment (
EmployeeID
int NOT Null REFERENCES PayGrade_Details (EmployeeID),
ProjectID
int NOT Null,
TimeProportion
NOT Null, -- percent of employee's time allocated to project
CONSTRAINT PK_Current_Assignment Primary Key Clustered
(ProjectID, EmployeeID)
)

The content corresponding to that shown above will now be

Current_Assignment
EmployeeID ProjectID TimeProportion
Bill Hacker New MouseTrap 50
Buck Bossman New MouseTrap 40
Penny Drudge New MouseTrap 100
Bill Hacker Better Flypaper 50
Buck Bossman Better Flypaper 50
David Drudge Better Flypaper 100
Charlie Fixit Better Flypaper 20
Buck Bossman Roach Eliminator 10
Charlie Fixit Roach Eliminator 80
Joe Bottom Roach Eliminator 100


PayGrade_Details Table
EmployeeID PayGrade YearsInGrade
Bill Hacker 15 3
Buck Bossman 21 1
Penny Drudge 12 2
David Drudge 13 1
Charlie Fixit 12 3
Joe Bottom 9 12


PayGrade and YearsInGrade are now in a table where the only candidate key is (EmployeeID); and both are dependent on that key, not just on some part of it (in this case that's obvious: the key doesn't have multiple parts). So with this pair of tables instead of the original single table, all tables are in 2NF.

As a result, the update to change the PayGrade for employee Charlie now affects only one row, so it can't introduce contradictory information; the information is in one place only, so it can't have different values in different places. Changing to 2NF has eliminated some error-prone redundancy, and it is no longer possible to make the data fail to conform to the business rule that could be broken before.

Of course in this example there may well be an employee table into which the PayGrade information can be moved, instead of creating a new table; and people often think that 2NF is a lot of fuss about nothing when given examples like this, often because "no experienced DBA would make an error like that" and just as often because "you can fix it trivially by writing a single update statement which updates all the rows that should be affected"; the first is just plain wrong: people do make this mistake, and the second is not so easy if the app is to display a view on a screen and update is by overtype. If the view is a not-2NF relational view built by joining the two 2NF tables, an update through the view does indeed solve the problem, but that requires the base tables to be in 2NF.

Multiple candidate keys

The example above is simple partly because there is only one Candidate Key in the table. When checking for violations of 2NF it is essential to look at all Candidate Keys, not just the Primary Key, because the same error-prone redundancy can arise there too. If there is a UNIQUE constraint or index on a set of columns that don't permit NULL values that indicates that there is a Candidate Key other than the Primary Key. Of course the DBA may forget to specify a UNIQUE constraint when some column set is actually unique (and free of NULLs) according to the business rules (this is bad schema design - such constraints should always be specified to let the database prevent what errors it can, and having appropriate constraints can ensure that database operations can't cause this column set to contain duplicates) but that column set is still a Candidate Key because the business rules say it is, even though the constraint wasn't specified in the schema.