Monday 24 December 2018

What is HDFS?

HDFS (Hadoop Distributed File System) is the primary storage layer of Hadoop. It is designed to store very large files on a cluster of commodity hardware. HDFS works best with a small number of large files rather than a huge number of small files. It stores data reliably even in the case of hardware failure, and it provides high-throughput access to application data by serving reads and writes in parallel across the cluster.
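
To make this concrete, here is a minimal sketch of how an application talks to HDFS through the Java FileSystem API. The NameNode address (hdfs://namenode:9000) and the file path /user/demo/hello.txt are assumptions; replace them with the values for your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHelloWorld {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address; use your cluster's fs.defaultFS
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);

        // Write a small file into HDFS (hypothetical path)
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello HDFS");
        }

        // Read the same file back
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}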


Components of HDFS are:

  • NameNode - It works as the master in a Hadoop cluster. The NameNode stores metadata such as the number of blocks, their replicas, and other details. It manages the filesystem namespace and executes filesystem operations such as opening, closing, and renaming files and directories.

  • DataNode - It works as a slave in the Hadoop cluster and is responsible for storing the actual data. A DataNode serves read and write requests from HDFS clients.

  • Block - HDFS stores data in blocks. A block is the smallest unit of data that the filesystem stores. By default the HDFS block size is 128 MB, which can be configured as per requirement (see the sketch after this list).
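
As a rough illustration of configuring the block size, the FileSystem.create overload lets a client request a per-file block size and replication factor when writing. The NameNode address, path, and the 256 MB figure below are assumptions for the example only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/bigfile.dat");   // hypothetical path

        long blockSize = 256L * 1024 * 1024; // 256 MB instead of the 128 MB default
        short replication = 3;               // default replication factor
        int bufferSize = 4096;

        // Create the file with an explicit block size and replication factor
        try (FSDataOutputStream out =
                 fs.create(file, true, bufferSize, replication, blockSize)) {
            out.writeBytes("data written with a custom block size");
        }

        fs.close();
    }
}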

HDFS is highly fault tolerant, and replication of data is what makes this possible. The default replication factor is 3, so each block is stored three times. If any machine in the cluster goes down due to unfavorable conditions, the client can still read the data from another machine that holds a copy of the same block.
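
For example, a client can inspect and change the replication factor of an existing file. This is only a sketch; the NameNode address and file path are assumptions, and the factor of 5 is purely illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/hello.txt");     // assumed existing file

        // Read the current replication factor from the file's metadata
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Ask the NameNode to keep 5 copies of each block of this file
        fs.setReplication(file, (short) 5);

        fs.close();
    }
}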