RDBMS = DBMS + REFERENTIAL INTEGRITY
For example, in Figure 1.1 above, every Person should have an Address. This is referential integrity between the Person and Address tables. If you break this referential integrity in a plain DBMS or in flat files, nothing will complain; an RDBMS, however, will not allow you to save the data if you have defined the relational integrity between Person and Address. These relations are defined by using "Foreign Keys" in any RDBMS.
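As a sketch of how such a relation is declared, here is minimal T-SQL (the table and column names are illustrative, not taken from Figure 1.1):

```sql
CREATE TABLE Person (
    PersonID   INT PRIMARY KEY,
    PersonName VARCHAR(50) NOT NULL
);

CREATE TABLE Address (
    AddressID INT PRIMARY KEY,
    PersonID  INT NOT NULL,
    City      VARCHAR(50),
    -- the foreign key enforces referential integrity: no Address row
    -- may reference a PersonID that does not exist in Person
    CONSTRAINT FK_Address_Person FOREIGN KEY (PersonID)
        REFERENCES Person (PersonID)
);
```

With this in place, an INSERT into Address with a non-existent PersonID fails with a constraint violation instead of silently saving orphan data.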
Many DBMS companies claimed that their DBMS products were RDBMS compliant, but according to industry rules and regulations, a DBMS is truly an RDBMS only if it fulfills the twelve Codd rules. Almost all major DBMS products (SQL SERVER, ORACLE, etc.) fulfill the twelve Codd rules to a large extent and are considered true RDBMSs.
Note: One of the biggest debates is whether Microsoft Access is an RDBMS. We will be answering this question in a later section.
Twist: Does SQL SERVER support all the twelve CODD rules?
Note: This question can only be asked under two conditions: the interviewer is expecting you to be at a DBA job, or you are a complete fresher. And, not to mention the last one: he treats the Codd rules as a religion. We will try to answer this question from the perspective of SQL SERVER.
In 1985, Dr. E. F. Codd laid down twelve rules which a DBMS should adhere to in order to get the logo of a true RDBMS.
Rule 1: Information Rule
"All information in a relational database is represented explicitly at the logical level and in exactly one way - by values in tables."
In SQL SERVER, all data exists in tables and is accessed only by querying the tables.
Rule 2: Guaranteed Access Rule
"Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."
In flat files, we have to parse the file and know the exact location of field values. But if a DBMS is truly an RDBMS, you can access the value by specifying the table name and field name, for instance: Customers.Fields['Customer Name'].
SQL SERVER also satisfies this rule. In ADO.NET we can access field information using table name and field names.
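In SQL itself, the same guarantee is expressed by combining table name, column name, and primary key value; a sketch against a hypothetical Customers table:

```sql
-- table name + column name + primary key value locate exactly one datum
SELECT CustomerName
FROM   Customers
WHERE  CustomerID = 42;   -- primary key value
```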
Rule 3: Systematic Treatment of Null Values
"Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type."
In SQL SERVER, if no data exists for a field, a NULL value is assigned to it. Note that NULL in SQL SERVER does not represent spaces, blanks, or a zero value; it is a distinct representation of missing information, and thus SQL SERVER satisfies Rule 3 of Codd.
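A small sketch of this distinction (the Customers table and MiddleName column are hypothetical):

```sql
-- NULL never equals anything, not even itself; comparisons yield UNKNOWN
SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'unknown' END;  -- 'unknown'

-- missing values must be tested with IS NULL, not with = '' or = 0
SELECT * FROM Customers WHERE MiddleName IS NULL;  -- rows with missing data
SELECT * FROM Customers WHERE MiddleName = '';     -- rows with empty strings only
```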
Rule 4: Dynamic On-line Catalog Based on the Relational Model
"The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data."
The Data Dictionary is held within the RDBMS. Thus, there is no need for off-line volumes to tell you the structure of the database.
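Because the catalog is itself relational, you can interrogate it with ordinary SELECT statements, for example:

```sql
-- list user tables and their columns from the system catalog
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM   INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```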
Rule 5: Comprehensive Data Sub-language Rule
"A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all the following items:
SQL SERVER uses SQL to query and manipulate data, which has a well-defined syntax and is being accepted as an international standard for RDBMS.
Note: According to this rule, Codd only mentioned that some such language should be present; it is not necessary that it be SQL. Before the mid-80's, database vendors were each providing their own flavor of syntax, until ANSI SQL arrived in 1986 to standardize this variation between vendors. As ANSI SQL is quite limited, every vendor, including Microsoft, introduced additional SQL syntax on top of its ANSI SQL support. So you will see SQL syntax varying from vendor to vendor.
Rule 6: View-updating Rule
"All views that are theoretically updatable are also updatable by the system."
In SQL SERVER, views can be updated not only by the user but also by SQL SERVER itself.
Rule 7: High-level Insert, Update and Delete
"The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data, but also to the insertion, update and deletion of data."
SQL SERVER allows you to update views that in turn affect the base tables.
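A minimal sketch of an updatable view (the Customers table is hypothetical):

```sql
CREATE VIEW IndianCustomers AS
    SELECT CustomerID, CustomerName, Country
    FROM   Customers
    WHERE  Country = 'India';
GO

-- updating through the view changes the underlying base table
UPDATE IndianCustomers
SET    CustomerName = 'Shiv'
WHERE  CustomerID = 1;
```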
Rule 8: Physical Data Independence
"Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods."
An application program (C#, VB.NET, VB6, VC++, etc.) does not need to be aware of where the SQL SERVER data is physically stored or what type of protocol it is using; the database connection string encapsulates everything.
Rule 9: Logical Data Independence
"Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit un-impairment are made to the base tables."
Application programs written in C# or VB.NET do not need to know about structural changes in the SQL SERVER database, for example the addition of a new field.
Rule 10: Integrity Independence
"Integrity constraints specific to a particular relational database must be definable in the relational data sub-language and storable in the catalog, not in the application programs."
In SQL SERVER, you can specify data types (bit, int, etc.) and constraints, which put data-type and integrity checks inside SQL SERVER rather than in application programs.
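For example, integrity rules can be declared in the catalog rather than coded in the application (table and column names are illustrative):

```sql
CREATE TABLE Orders (
    OrderID   INT PRIMARY KEY,
    Qty       INT NOT NULL CHECK (Qty > 0),       -- domain rule lives in the catalog
    OrderDate DATETIME NOT NULL DEFAULT GETDATE() -- default also lives in the catalog
);
-- an INSERT with Qty = 0 is rejected by the engine itself,
-- no matter which application program issues it
```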
Rule 11: Distribution Independence
"A relational DBMS has distribution independence."
SQL SERVER can spread across more than one physical computer and across several networks; but from the application program's point of view, there is no big difference beyond specifying the SQL SERVER name and the computer on which it is located.
Rule 12: Non-subversion Rule
"If a relational system has a low-level (single-record-at-a-time) language, that low level cannot be used to subvert or bypass the integrity Rules and constraints expressed in the higher level relational language (multiple-records-at-a-time)."
In SQL SERVER, whatever integrity rules are applied to each record also apply when you process a group of records from an application program in any other language (for example C#, VB.NET, J#, etc.).
Readers can see from the above explanation that SQL SERVER satisfies all the Codd rules. Some database gurus consider SQL SERVER as not being a true RDBMS, but that's a matter of debate.
Access fulfills all the rules of Codd, so from this point of view, yes, it is truly an RDBMS. However, many people contradict this, as a large community of Microsoft professionals thinks that Access is not an RDBMS.
As mentioned before, Access fulfills all the CODD rules and behaves as a true RDBMS. But there's a huge difference from an architecture perspective, due to which many developers prefer to use SQL SERVER as the major database rather than Access. Following is the list of architecture differences between them:
In mainframe architecture, all the processing happens on the central host server. The user interacts through a dumb terminal that only sends keystrokes and information to the host. The advantage of this type of architecture is that the clients need minimal configuration. The disadvantage is that you need a robust central host server, like a mainframe.
In file sharing architecture, which the Access database follows, all the data is sent to the client terminal and then processed there. For instance, if you want to see the customers who stay in India, in file sharing architecture all customer records are sent to the client PC, regardless of whether the customer belongs to India or not. On the client PC, the customer records from India are sorted/filtered out and displayed; in short, all processing logic happens on the client PC. Therefore, in this architecture the client PC needs a heavy configuration, and network traffic increases as a lot of data is sent to the client PC. The advantage of this architecture, however, is that your server can be of a low configuration.
Figure 1.2: File Server Architecture of Access
In client-server architecture, the above limitation of the file server architecture is removed. In client-server architecture, you have two entities: the client and the database server. The file server is now replaced by a database server. The database server takes up the load of processing any database-related activity, and the client handles the validation aspects. As the work is distributed between the entities, scalability and reliability increase. Second, network traffic also comes down compared to the file server. For example, if you request the customers from India, the database server will sort/filter and send only the Indian customer details to the client, bringing down network traffic tremendously. SQL SERVER follows the client-server architecture.
Figure 1.3: Client Server Architecture of SQL SERVER
The second issue is reliability. In Access, the client interacts directly with the Access file; if there is a problem in the middle of a transaction, there is a chance the Access file gets corrupted. But in SQL SERVER, the engine sits between the client and the database, so in case of any problem in the middle of a transaction, it can revert to its original state.
Note: SQL SERVER maintains a transaction log, by which you can revert to the original state in case of a crash.
When your application has to cater to huge load, a highly transactional environment, and high concurrency, then it's better to go for SQL SERVER or MSDE.
But when it comes to cost and support, Access stands better than SQL SERVER. In the case of SQL SERVER, you have to pay per client license, whereas the Access runtime is free.
Summarizing: SQL SERVER gains points in terms of network traffic, reliability and scalability whereas Access gains points in terms of cost factor.
MSDE is a royalty free, redistributable and cut short version of the giant SQL SERVER database. It is primarily provided as a low cost option for developers who need a database server, which can easily be shipped and installed. It can serve as a good alternative for Microsoft Access database as it overcomes quite a few problems which Access has.
Below is a complete list, which can give you a good idea of the differences:
Size of database: Microsoft Access and MSDE have a limitation of 2 GB, while SQL SERVER has a limit of 1,048,516 TB.
Performance degrades in MSDE 2000 when the number of concurrent operations reaches eight or more. This does not mean you cannot have more than eight concurrent operations, but performance degrades. The eight-connection performance degradation is implemented using the SQL SERVER 2000 workload governor (we will look at how it works in more detail later). In comparison, SQL SERVER 2000 allows 32,767 concurrent connections.
MSDE does not provide OLAP and Data warehousing capabilities.
MSDE does not have support facility for SQL mail.
MSDE 2000 does not have GUI administrative tools such as Enterprise Manager, Query Analyzer, or Profiler. But there are roundabout ways by which you can manage MSDE 2000:
The old command-line utility OSQL.EXE.
VS.NET IDE Server Explorer: Inside the VS.NET IDE, you have functionality which gives you a nice GUI administrative tool to manage MSDE.
SQL SERVER WEB Data administrator installs a web based GUI which you can use to manage your database.
SQL-DMO objects can be used to build your own custom administrative UI.
There are many third party tools, which provide administrative capability GUI, which is out of scope of the book as it is only meant for interview questions.
MSDE does not support Full text search.
Summarizing: There are two major differences: The first is the size limitation (2 GB) of the database and second is the concurrent connections (eight concurrent connections) which are limited by using the workload governor. During an interview, this answer will suffice if the interviewer is really testing your knowledge.
Twist: What is the difference between SQL SERVER Express 2005 and MSDE 2000?
Note: Normally comparison is when the product is migrating from one version to other version. When SQL SERVER 7.0 was migrating to SQL 2000, asking differences was one of the favorite questions.
SQL SERVER Express edition is a scaled down version of SQL SERVER 2005 and the next evolution of MSDE.
Listed below are some major differences between them:
MSDE's maximum database size is 2 GB, while SQL SERVER Express allows 4 GB.
In terms of programming language support, MSDE has only T-SQL, but SQL SERVER Express has T-SQL and .NET. In SQL SERVER Express 2005, you can write your stored procedures using .NET.
SQL SERVER Express does not have the connection limitation which MSDE had, controlled through the workload governor.
There was no XCOPY support for MSDE; SQL SERVER Express has it.
DTS is not present in SQL SERVER Express, while MSDE has it.
SQL SERVER Express has reporting services while MSDE does not.
SQL SERVER Express has native XML support and MSDE does not.
Note: Native XML support means that in SQL SERVER 2005:
You can create a field with the XML data type.
You can provide a SCHEMA for SQL SERVER fields with the XML data type.
You can use new XML manipulation techniques like XQUERY (also called XML Query).
There is a complete chapter on SQL SERVER XML Support, so till then this will suffice.
Summarizing: The major differences are the database size (2 GB vs. 4 GB), support for .NET in stored procedures, and native support for XML. This much can convince the interviewer that you are clear about the differences.
The workload governor limits the performance of the SQL SERVER Desktop Engine (MSDE) if the SQL engine receives more load than MSDE is meant for. MSDE was always meant for trial purposes and non-critical projects. Microsoft always wanted companies to buy the full-blown version of SQL SERVER, so in order to put a limitation on MSDE performance and the number of connections, they introduced the workload governor.
The workload governor sits between the client and the database engine and counts the number of connections per database instance. If the workload governor finds that the number of connections exceeds eight, it starts stalling the connections and slowing down the database engine.
Note: It does not limit the number of connections, but makes connection requests go slow. By default, 32,767 connections are allowed both for SQL SERVER and MSDE; the governor just makes the database engine go slow above eight connections.
Twist: What is the difference between Yukon and SQL SERVER 2000?
Note: This question will be one of the favorites during SQL SERVER interviews. I have marked the points which should be mentioned by programmers as PG and those for Database Administrators as DBA.
Following are some major differences between the two versions:
(PG) The most significant change is the .NET integration with SQL SERVER 2005. Stored procedures, user-defined functions, triggers, aggregates, and user-defined types can now be written using your own favorite .NET language (VB.NET, C#, J# etc.). This support was not there in SQL SERVER 2000 where the only language was T-SQL. In SQL 2005, you have support for two languages T-SQL and .NET.
(PG) SQL SERVER 2005 has Reporting Services for reports, which is a newly integrated feature; for SQL SERVER 2000 it was a separate installation.
(PG) SQL SERVER 2005 has introduced two new data types: varbinary(max) and XML. If you remember, in SQL SERVER 2000 we had the image and text data types. The problem with the image and text data types is that they assign the same amount of storage irrespective of the actual data size. This problem is solved using varbinary(max), whose storage depends on the amount of data. The other new data type, XML, enables you to store XML documents and perform schema verification. In SQL SERVER 2000, developers used the text data type, and all validation had to be done programmatically.
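A sketch of the two new data types in a table definition (the table and column names are illustrative):

```sql
CREATE TABLE Documents (
    DocID   INT PRIMARY KEY,
    Content VARBINARY(MAX),  -- replaces image; storage grows with the actual data
    Meta    XML              -- stores XML documents; can be validated against a schema
);
```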
(PG) SQL SERVER 2005 can now process direct incoming HTTP requests without the IIS Web server. In addition, stored procedure invocation is enabled using the SOAP protocol.
(PG) An asynchronous mechanism is introduced using server events. In the server event model, the server posts an event to the SQL Broker service; later, the client can come back and retrieve the status by querying the broker.
For huge databases, SQL SERVER 2005 provides a cool feature called "Data Partitioning". In data partitioning, you break a single database object, such as a table or an index, into multiple pieces. But to the client application accessing the single database object, the "partitioning" is transparent.
In SQL SERVER 2000, if you rebuilt clustered indexes, even the non-clustered indexes were rebuilt. But in SQL SERVER 2005, rebuilding the clustered indexes does not rebuild the non-clustered indexes.
Bulk data uploading in SQL SERVER 2000 was done using BCP (Bulk Copy Program) format files. Now, in SQL SERVER 2005, bulk data uploading can also use an XML file format.
In SQL SERVER 2000 there was a maximum of 16 instances, but in 2005 you can have up to 50 instances.
SQL SERVER 2005 has support for "Multiple Active Result Sets", also called "MARS". In the previous version, SQL SERVER 2000, you could only have one result set per connection. Now, over one SQL connection, you can query and have multiple result sets open.
In previous versions, such as SQL SERVER 2000, the system catalog was stored in the master database. In SQL SERVER 2005, it is stored in the Resource database and exposed through sys objects. You cannot access the catalog tables directly the way we accessed the master database in the older version.
This is one of the hardware benefits which SQL SERVER 2005 has over SQL SERVER 2000: support for hyper-threading. WINDOWS 2003 supports hyper-threading, and SQL SERVER 2005 can take advantage of the feature, unlike SQL SERVER 2000, which did not support hyper-threading.
Note: Hyper threading is a technology developed by INTEL which creates two logical processors on a single physical hardware processor.
SMO will be used for SQL Server Management.
AMO (Analysis Management Objects) is used to manage Analysis Services servers, data sources, cubes, dimensions, measures, and data mining models. You can map AMO in the new SQL SERVER to DSO (Decision Support Objects) in the old SQL SERVER.
Replication is now managed by RMO (Replication Management Objects).
Note: SMO, AMO, and RMO all use the .NET Framework.
SQL SERVER 2005 uses the current user's execution context to check rights, rather than the ownership chain, which was how it was done in SQL SERVER 2000.
Note: There is a question on this later; see the execution context questions.
In previous versions of SQL SERVER, the schema and the user name were the same, but in the current version the schema is separated from the user. Now a user owns a schema.
Note: There are questions on this; refer to "Schema" later.
Note: Below are some GUI changes.
Query analyzer is now replaced by query editor.
Business Intelligence development studio will be used to create Business intelligence solutions.
The OSQL and ISQL command-line utilities are replaced by the SQLCMD utility.
SQL SERVER Enterprise Manager is now replaced by SQL SERVER Management Studio.
SQL SERVER Service Manager, which was running in the system tray, is now replaced by SQL Computer Manager.
Database mirror concept is supported in SQL SERVER 2005, which was not present in SQL SERVER 2000.
In SQL SERVER 2005, indexes can be rebuilt online while the database is in actual production. If you look back at SQL SERVER 2000, you could not do insert, update, and delete operations while indexes were being built.
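In SQL SERVER 2005 syntax, an online rebuild looks roughly like this (the index and table names are illustrative):

```sql
-- rebuild without blocking reads/writes on the underlying table
-- (the ONLINE option requires Enterprise Edition)
ALTER INDEX IX_Customers_Name ON Customers
REBUILD WITH (ONLINE = ON);
```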
(PG) Other than the Serializable, Repeatable Read, Read Committed, and Read Uncommitted isolation levels, there is one more new isolation level: "Snapshot Isolation".
Note: We will see the "Snapshot Isolation level" in detail in the coming questions.
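As a quick preview, snapshot isolation must first be enabled at the database level and then requested per session (the database and table names here are illustrative):

```sql
ALTER DATABASE SalesDB SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    -- readers see a consistent snapshot of the data
    -- and do not block concurrent writers
    SELECT * FROM Orders;
COMMIT;
```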
Summarizing: The most significant differences between SQL SERVER 2000 and SQL SERVER 2005 are the .NET integration, the Snapshot isolation level, native XML support, the handling of HTTP requests, Web service support, and data partitioning. You do not have to say all the above points during an interview; a sweet summary and you will rock.
An E-R diagram, also termed an Entity-Relationship diagram, shows the relationships between the various tables in the database. Example: the Customer and Addresses tables have a one-to-many relationship (i.e., one customer can have multiple addresses); this can be shown using an ER diagram. ER diagrams are drawn during the initial stages of a project to forecast how the database structure will shape up. Below is a screenshot of a sample ER diagram for 'Asset Management', which ships free with Access.
Figure 1.4: Asset management ER diagram.
There are three major relationship models:
Figure 1.5: One-to-One relationship ER diagram
In this relationship, many records in one table correspond to one record in another table. Example: one customer can have multiple sales, so there exists a one-to-many relationship between the Customer and Sales tables.
Similarly, in the sample database, an Asset can have multiple related records, so the Asset entity has a one-to-many relationship with them, as the ER model below shows.
Figure 1.6: One-to-Many Relationship ER diagram
In this, one record in one table corresponds to many rows in another table, and vice versa. For instance, in a company, one employee can have many skills, like Java, C#, etc., and one skill can also belong to many employees.
Given below is a sample many-to-many relationship. One employee can have knowledge of multiple technologies. So, in order to implement this, we have one more table, Employee Technology, which is linked to the primary keys of the Employee and Technology tables.
Figure 1.7: Many-to-Many Relationship ER diagram
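The junction-table idea above can be sketched in T-SQL like this (the names are illustrative):

```sql
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,
    EmpName    VARCHAR(50)
);

CREATE TABLE Technology (
    TechnologyID INT PRIMARY KEY,
    TechName     VARCHAR(50)
);

-- the linking table resolves the many-to-many relationship:
-- each row pairs one employee with one technology
CREATE TABLE EmployeeTechnology (
    EmployeeID   INT REFERENCES Employee (EmployeeID),
    TechnologyID INT REFERENCES Technology (TechnologyID),
    PRIMARY KEY (EmployeeID, TechnologyID)
);
```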
Note: A regular .NET programmer working on projects often stumbles on this question, which is quite understandable. The bad part is that the interviewer may take this as a very basic question that must be answered, and it can be a turning point for the interview. So let's cram it.
Normalization is a set of rules that has been established to aid in the design of tables that are meant to be connected through relationships.
Benefits of Normalizing your database include:
Avoiding repetitive entries
Reducing required storage space
Preventing the need to restructure existing tables to accommodate new data
Increased speed and flexibility of queries, sorts, and summaries
Note: During an interview, people expect to answer a maximum of three normal forms and that's what is expected practically. Actually you can normalize database to fifth normal form. But believe this book, answering three normal forms will put you in a decent shape during an interview.
The three normal forms are as follows:
First Normal Form
For a table to be in first normal form, data must be broken up into the smallest units possible. In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields.
Figure 1.8: Repeating groups example
In the above example, the city1 and city2 fields are repeating. In order for this table to be in first normal form, you have to modify the table structure as follows. Also note that the Customer Name is now broken down into first name and last name (first normal form data should be broken down to the smallest unit).
Figure 1.9: Customer table normalized to first normal form
Second Normal Form
The second normal form states that each field in a multiple field primary key table must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key.
In the above table, the city field is not related to any primary key field.
Figure 1.10: Normalized customer table.
Figure 1.11: City is now shifted to a different master table.
That takes our database to a second normal form.
Third Normal Form
A non-key field should not depend on another non-key field. The Total field is dependent on the Unit Price and Qty fields.
Figure 1.12: Fill third normal form
So now the Total field is removed; it is simply the multiplication of Unit Price * Qty and can be derived whenever needed.
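Rather than storing the derived Total, it can be declared as a computed column so the engine derives it on demand (the table and column names are illustrative):

```sql
CREATE TABLE SalesLine (
    LineID    INT PRIMARY KEY,
    UnitPrice MONEY NOT NULL,
    Qty       INT   NOT NULL,
    Total AS (UnitPrice * Qty)  -- derived at query time, never stored redundantly
);
```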
Denormalization is the process of putting one fact in numerous places (it is the reverse of normalization). Only one valid reason exists for denormalizing a relational design: to enhance performance. The price paid for this performance is increased redundancy in the database.
Note: Whenever the interviewer is trying to go above the third normal form, there can be two reasons, ego or to fail you. Three normal forms are really enough, practically anything more than that is an overdose.
For a table to be in fourth normal form, it should not contain two or more independent multi-valued facts about an entity, and it should satisfy third normal form.
So let us try to see what multi-valued facts are. If there are two or more many-to-many relationships on one entity and they tend to come together in one place, this is termed "multi-valued facts".
Figure 1.13: Multi-valued facts
In the above table, you can see that there are two many-to-many relationships on Product, including one with Location (in short, multi-valued facts). In order for the above example to satisfy fourth normal form, both many-to-many relationships should go into different tables.
Figure 1.14: Normalized to Fourth Normal form.
Note: UUUHHH if you get this question after joining the company, do ask him if he himself really uses it?
Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.
For instance, consider a Product which can be manufactured by various companies; in order to sell the Product, a Dealer should be registered with the Company. So these three entities (Product, Dealer, and Company) have a mutual relationship.
Figure 1.15: Not in Fifth Normal Form.
The above table shows some sample data. If you observe closely, a single record is created from several smaller pieces of information. For instance:
JM Associate can sell sweets under the following two conditions:
These two smaller bits of information form one record of the above table. So, in order for the above information to be in "Fifth Normal Form", all the smaller pieces of information should be kept in three different places. Below is the complete fifth normal form of the database.
Figure 1.16: Complete Fifth Normal Form
Note: There is a huge similarity between the fourth and fifth normal forms, i.e., they both address the problem of multi-valued facts.
In "fifth normal form", the multi-valued facts are interlinked, while in "fourth normal form" the values are independent. For instance, in the fourth normal form example above, Product and Location are not linked, while in the fifth normal form example, the Product, Dealer, and Company entities are completely linked.
Note: Arrrrggghhh, yes, there exists a sixth normal form also. But note, guys, you can skip this statement. Just in case you want to impress the interviewer...
If you want a relational system in conjunction with time, you use sixth normal form. At this moment SQL Server does not support it directly.
Twist: What is the relationship between Extent and Page?
An extent is the basic unit of storage that provides space for tables. Every extent has a number of data pages. As new records are inserted, new data pages are allocated. There are eight data pages in an extent. So as soon as the eight pages are consumed, a new extent with data pages is allocated.
While an extent is the basic unit of storage from the database's point of view, a page is the unit of allocation within an extent.
A page has three important sections: the page header, data rows, and the row offset array. The page header has information like the timestamp, next page number, previous page number, etc. The data rows are where your actual row data is stored. For every data row, there is a row offset which points to that data row.
Figure 1.17: General view of an Extent
Pages are contained in an extent. Every extent has eight data pages. But all eight data pages are not created at once; they are created as data demands. So when a page becomes full, a new page is created; this process is called a "Page Split".
Any SQL Server database is associated with two kinds of files: *.mdf and *.ldf. The *.mdf file is the actual physical database file where your data is finally stored. The *.ldf (log) file records the transactions applied to the database since the last time data was committed.
Figure 1.18: MDF and LDF files.
(Q) What is Collation in SQL Server?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence, with options for specifying case-sensitivity, accent marks, kana character types, and character width.
Figure 1.19: Collation according to language
Note: Different languages will have different sort orders.
Case sensitivity: If A and a, B and b, etc. are treated in the same way, then the collation is case-insensitive. A computer treats A and a differently because it uses ASCII codes to differentiate the input. The ASCII value of A is 65, while a is 97. The ASCII value of B is 66, and b is 98.
Accent sensitivity: If a and á, o and ó are treated in the same way, then the collation is accent-insensitive. A computer treats a and á differently because it uses character codes to differentiate the input. The code for a is 97 and for á is 225. The code for o is 111 and for ó is 243.
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana sensitive.
When a single-byte character (half-width) and the same character represented as a double-byte character (full-width) are treated differently, then it is width sensitive.
Yes, you can specify a different collation sequence for each of the two entities separately.
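For instance, collation can be set per column, independently of the database collation, or even per comparison (the table and column names are illustrative):

```sql
CREATE TABLE Names (
    NameCS VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CS_AS,  -- case-sensitive
    NameCI VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS   -- case-insensitive
);

-- or override the collation for a single comparison
SELECT CASE WHEN 'A' = 'a' COLLATE SQL_Latin1_General_CP1_CS_AS
            THEN 'same' ELSE 'different' END;   -- 'different'
```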