Where is the data stored in HDFS and how? Where is metadata about the file system stored?
Question
Where is the data stored in HDFS, and how? Where is metadata about the file system stored in HDFS? A single NameNode tracks where data is housed in the cluster of servers, known as DataNodes. Data is stored in data blocks on the DataNodes. HDFS replicates those data blocks, usually 128 MB in size, and distributes them so they are replicated within multiple nodes across the cluster.
How does HDFS realize "reliability"? (Mention two reliability features of HDFS.)
What is a heartbeat / BlockReport with respect to the HDFS architecture?
What is the typical block size in HDFS? Why is fault tolerance indispensable (important) in HDFS?

Explanation / Answer
B.
There are many features of HDFS that make it suitable for storing large sets of data.
Fault Tolerance and Reliability: first and foremost, the data is replicated and
stored on multiple nodes in a Hadoop cluster to achieve fault tolerance and reliability (see the sketch after this list).
High Throughput: the data is transferred by multiple nodes in parallel, which gives
high throughput; in other words, data is processed in parallel.
Data Integrity: HDFS internally checks (via checksums) whether the stored data is correct or not.
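As a minimal sketch of the replication feature, the snippet below uses the standard Hadoop FileSystem API to read and change a file's replication factor; the path /data/example.txt is a made-up example and the cluster settings are assumed to come from the default configuration files.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster described by core-site.xml / hdfs-site.xml on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/example.txt");   // hypothetical file

        // Read the replication factor recorded in the NameNode metadata.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication: " + status.getReplication());

        // Ask HDFS to keep 3 copies of every block of this file;
        // the NameNode schedules the extra copies on other DataNodes.
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}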
C)
Heartbeat:
Several things can cause loss of connectivity between name and data nodes.
Therefore, each data node sends periodic heartbeat messages to its name node, so the
latter can detect loss of connectivity if it stops receiving them.
The NameNode marks DataNodes that stop responding to heartbeats as dead and stops
sending further requests to them. Data stored on a dead node is no longer available
to an HDFS client from that node; it is effectively removed from the system until
its blocks are re-replicated elsewhere.
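To make the timeout idea concrete, here is a simplified, hypothetical sketch (not the actual NameNode code) of how a coordinator could mark nodes dead when heartbeats stop arriving; the 10-minute threshold is only illustrative of HDFS's default, which is on the order of ten minutes.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of heartbeat-based liveness tracking.
public class HeartbeatMonitor {
    // Dead-node threshold; illustrative value only.
    private static final long TIMEOUT_MS = 10 * 60 * 1000L;

    // Last heartbeat time per DataNode, keyed by a node identifier.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a heartbeat message arrives from a DataNode.
    public void onHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, System.currentTimeMillis());
    }

    // Periodically scan for nodes whose heartbeats have stopped.
    public void checkLiveness() {
        long now = System.currentTimeMillis();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() > TIMEOUT_MS) {
                System.out.println(e.getKey() + " marked dead; "
                        + "its blocks must be re-replicated elsewhere.");
            }
        }
    }
}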
Blockreport:
A block report is a periodic message in which each DataNode sends the NameNode the list of
all blocks it currently stores; the NameNode uses these reports to keep its block-to-DataNode
mapping up to date and to detect missing or under-replicated blocks.
HDFS data blocks might not always be placed uniformly across DataNodes, meaning that the space
on one or more DataNodes can be under-utilized. HDFS therefore supports rebalancing data blocks
using various models: if the free space on a DataNode falls low, one model might move data blocks
from one DataNode to another automatically; another model may dynamically create additional
replicas and rebalance the other data blocks if a sudden increase in demand for a given file occurs.
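As a purely illustrative (hypothetical) sketch of what a NameNode-like service might do with block reports, the snippet below rebuilds a block-to-DataNode map from reported block IDs and flags blocks with fewer replicas than a target; the class, method names, and target value are all assumptions, not HDFS internals.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: consuming block reports to find under-replicated blocks.
public class BlockReportProcessor {
    private static final int TARGET_REPLICATION = 3;

    // blockId -> set of DataNodes currently reporting that block.
    private final Map<Long, Set<String>> blockMap = new HashMap<>();

    // Process one block report: the full list of block IDs held by one DataNode.
    public void onBlockReport(String dataNodeId, List<Long> reportedBlockIds) {
        for (long blockId : reportedBlockIds) {
            blockMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(dataNodeId);
        }
    }

    // Flag blocks that have fewer replicas than the target so they can be re-replicated.
    public void reportUnderReplicatedBlocks() {
        for (Map.Entry<Long, Set<String>> e : blockMap.entrySet()) {
            if (e.getValue().size() < TARGET_REPLICATION) {
                System.out.println("Block " + e.getKey() + " has only "
                        + e.getValue().size() + " replica(s).");
            }
        }
    }
}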
D)
Block size in HDFS:
A typical block size that you’d see in a file system under Linux is 4 KB, whereas a typical block size in Hadoop is 128 MB.
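To illustrate, the sketch below reads the cluster's configured default block size and creates a file with an explicit block size through the standard Hadoop FileSystem API; the path and the 256 MB figure are arbitrary examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/big-file.bin");   // hypothetical path

        // The cluster-wide default (dfs.blocksize), typically 128 MB.
        System.out.println("Default block size: " + fs.getDefaultBlockSize(file));

        // Create a file with an explicit 256 MB block size and 3 replicas.
        long blockSize = 256L * 1024 * 1024;
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 3, blockSize)) {
            out.writeUTF("example payload");
        }
        fs.close();
    }
}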
Fault tolerance importance in HDFS:
Fault tolerance is the ability of a system to keep working correctly and not lose any data even if some components of that system fail. It is very difficult to achieve 100 percent tolerance, so the goal of fault tolerance is to plan for all common failures; in particular, it is important to eliminate single points of failure (SPOF) when managing fault tolerance.
A.
Data is stored in data blocks on the DataNodes, and HDFS replicates those data blocks across the cluster.
An HDFS instance is divided into two kinds of components:
the NameNode, which maintains the metadata used to track the placement of physical data across the Hadoop instance, and
the DataNodes, which actually store the data.
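To see this split in practice, the sketch below asks for a file's block locations through the standard FileSystem API: the host names printed come from the NameNode's metadata, while the blocks themselves sit on those DataNodes. The file path is a made-up example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/example.txt");   // hypothetical file

        // The NameNode answers from its metadata: which DataNodes hold which block.
        FileStatus status = fs.getFileStatus(file);
        BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation block : blocks) {
            System.out.println("offset " + block.getOffset()
                    + ", length " + block.getLength()
                    + ", hosts " + String.join(",", block.getHosts()));
        }
        fs.close();
    }
}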