Friday, 24 March 2017

Differentiating a DBMS from a File System

Some of my readers asked me a question: what is the difference between a DBMS and a file system, and which one should be used where?

This topic has been discussed in forums countless times, yet many newcomers still get stuck on the terms. They often ask why they should invest time and energy in learning something new when they can already store data using their favourite programming language. Some Java developers also think that Serialization is there to save them, so why learn a DBMS on top of that for development work?

The simplest answer is that a DBMS is a wrapper over a file system. This wrapper provides functionality beyond storing, retrieving and manipulating data.

Keeping this in mind, let me define what a file system, a database and a DBMS are:

File System: A file system is the mechanism an operating system uses to organise, store and retrieve data through logical grouping. From a programming language, we ask the operating system for access to the file system so that we can use it for our own purposes.

Database: A database is an organised collection of data. In principle, any organised collection of data qualifies as a database. In practice, a database is data together with its metadata.

DBMS: A DBMS is an application that interacts with a database and helps us with various data-related tasks.

That's one part of it. Very simple, isn't it?

Actually, no. A DBMS is an application that deals with multiple subsystems. Let us look at what a DBMS comprises:

  1. At the bare minimum, a backing file system where all the data resides.
  2. A network system which connects to different clients, who essentially request CRUD operations.
  3. A query language processing system. This subsystem takes care of the standard communication between any programming language and the database itself.
  4. A data management system. This is the part that actually deals with the data and processes it accordingly.
These are the four major components. With them, a database is functional. Beyond these basics, a DBMS also provides features such as security, concurrency control, reporting and administration.

Within a database server, many components run in their own space, and they all work together to perform the operations requested by a client.
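
To make these components a little more concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. SQLite is an embedded DBMS, so there is no network component here; only the backing storage, the query language processing and the data management are visible. The table name, column names and sample rows are made up for illustration.

```python
import sqlite3

# An in-memory database; pass a file path instead to persist data on disk.
conn = sqlite3.connect(":memory:")

# The query language subsystem parses and executes these SQL statements.
conn.execute("CREATE TABLE readers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO readers (name) VALUES (?)",
    [("Asha",), ("Ravi",), ("Meera",)],
)
conn.commit()

# The data management subsystem decides how to locate the matching row.
row = conn.execute(
    "SELECT id, name FROM readers WHERE name = ?", ("Ravi",)
).fetchone()
print(row)
conn.close()
```

Notice that the application only states what it wants; how the row is stored and found is left entirely to the DBMS.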

Now, for a moment, let's think of a situation where no database exists. What would that system look like?

You have one application which has to store some data and retrieve it later, or you want to share the data across multiple machines. How would you do that?

Well, let's take this part by part.
  • Write an application program which deals with storing the data.
  • Write another application program which reads the data.
  • As you can read, write and modify the data on the go, you need another application to keep track of reads and writes. If required, it can also block one or more clients from read/write operations.
  • You need another application which keeps track of memory and disk usage. Without this module, you are likely to end up with memory overflow, heavy disk usage, or both.
  • You need another application to deal with sharing the data across clients.
Now think of the complexity of even a very basic database. These are the bare minimum features a DBMS must have.
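
The first few items above can be sketched in a handful of lines. This is only a rough illustration, not a real storage engine: the FileStore class, its method names and the one-JSON-record-per-line layout are all assumptions made for the example, and the lock only coordinates threads inside a single process.

```python
import json
import os
import tempfile
import threading


class FileStore:
    """Hypothetical append-only record store backed by one text file."""

    def __init__(self, path):
        self.path = path
        # One lock serialises readers and writers inside this process only.
        self._lock = threading.Lock()

    def write(self, record):
        # Store each record as one JSON document per line.
        with self._lock, open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def read_all(self):
        with self._lock:
            if not os.path.exists(self.path):
                return []
            with open(self.path, encoding="utf-8") as f:
                return [json.loads(line) for line in f if line.strip()]

    def size_on_disk(self):
        # Crude disk-usage tracking: the size of the backing file in bytes.
        return os.path.getsize(self.path) if os.path.exists(self.path) else 0


store_dir = tempfile.mkdtemp()
store = FileStore(os.path.join(store_dir, "records.jsonl"))
store.write({"id": 1, "name": "Asha"})
store.write({"id": 2, "name": "Ravi"})
records = store.read_all()
print(records)
```

Even this toy version says nothing about sharing data across machines, updating a record in place, or coordinating several processes, which is exactly where the complexity starts to pile up.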

Now you have completed the basics of a DBMS. Let's dig deeper and think of some more features:
  • You don't want all the data to be shared with all the clients. So, you need authentication and authorization.
  • When you are dealing with data security, you must not ignore data integrity.
  • When data security and data integrity come into the picture, you need to think of concurrency.
  • Once all these are done, you need to think of reporting on the usage of the DBMS.
Once you are done with all of the above, you must integrate everything under a single hood so that it operates as one system.
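
As a point of comparison, this is roughly what data integrity looks like when a DBMS already provides it. The sketch below again uses SQLite via Python's sqlite3 module; the accounts table, the transfer function and the sample balances are hypothetical and exist only for this example. A CHECK constraint rejects negative balances, and the transaction makes sure a half-finished transfer is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint refused a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, rejected, balances)
```

The second transfer credits bob first and only then fails on alice's balance, yet bob's balance is unchanged afterwards. Getting the same guarantee with plain files means writing that rollback logic yourself.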

So you have implemented all of this and your system works fine. Now comes the biggest nightmare...

You ended up creating a very large file by appending your data to it. Your system now looks up the file, opens it and reads the records one by one, sequentially. Finding a row takes almost 2-3 minutes. Or maybe your system hangs while querying a particular row. Or, worst of all, it crashes. How will you handle this?
  • You decide to split the data into chunks spread across multiple files. That means adding yet another application which takes care of data storage and retrieval using chunks.
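
One possible way to do such chunking is sketched below. It assumes every record has a string key and picks the chunk file from a checksum of that key, so a lookup scans one small file instead of the whole data set. The function names, the chunk count and the file naming are all made up for illustration.

```python
import json
import os
import tempfile
import zlib

NUM_CHUNKS = 4  # assumed chunk count, chosen only for illustration


def chunk_path(base_dir, key):
    # zlib.crc32 is stable across runs, unlike the built-in hash() for strings.
    index = zlib.crc32(key.encode("utf-8")) % NUM_CHUNKS
    return os.path.join(base_dir, f"chunk_{index}.jsonl")


def put(base_dir, key, value):
    # Append the record to the chunk file that owns this key.
    with open(chunk_path(base_dir, key), "a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")


def get(base_dir, key):
    path = chunk_path(base_dir, key)
    if not os.path.exists(path):
        return None
    found = None
    # Only this one chunk is scanned, not the entire data set.
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["key"] == key:
                found = record["value"]  # the latest write wins
    return found


data_dir = tempfile.mkdtemp()
for i in range(100):
    put(data_dir, f"user{i}", {"n": i})
result = get(data_dir, "user42")
print(result)
```

This only shortens the scan by a constant factor. Range queries, rebalancing when a chunk grows too large, and lookups by anything other than the key are all still unsolved.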
What next?

Phew...

I cannot think about this any more...

And a DBMS handles many more features than the ones mentioned above.

Still, you might think: OK, I don't need all this. I have a small use case and really don't want to maintain another layer just for a DBMS.

For a smaller use case, handling the data with the file system or network transfer is a good choice. There is no silver bullet for deciding how much data you can manage with a file system or network transfer, but as a general suggestion, answering the following questions will help you find what suits your need: file system, main memory, network transfer, DBMS, or a combination of them.
  1. How much data do you have to handle?
  2. How many applications use the same data?
  3. How do the applications exchange the data?
  4. How likely is it that the data will be modified?
  5. How secure does the data need to be?
  6. What happens if a data mismatch occurs in the system?
Generally, answering these questions will tell you whether you really need a DBMS or not.
