HDFS (Hadoop Distributed File System)
Hadoop was designed specifically for UNIX-like systems, so in that sense it is platform dependent.
How does the Unix OS manage and process data?
Existing file systems work on fixed-size chunks (blocks); ext3, ext4, and XFS are some of the file systems currently in use.
The default block size at the OS level is 4 KB; it can be configured larger and varies between file systems.
The disk hardware, in turn, transfers data in 512-byte sectors. Once data lands in the OS's 4 KB blocks, it is written out to the disk 512 bytes at a time.
HDFS is a wrapper on top of the existing Unix file system. The default block size in HDFS is 64 MB (configurable). When a client submits data, it is first held in the HDFS buffer, then handed to the OS buffer, and from the OS buffer it is written to disk 512 bytes at a time:
Data from Front End
        |
HDFS buffer (64 MB blocks)
        |
OS buffer (4 KB blocks)
        |
DISK (written 512 bytes at a time)
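To make this concrete, here is a minimal sketch using the standard Hadoop FileSystem Java API of a client writing a file; the client just streams bytes, and the HDFS client library and the OS take care of the 64 MB / 4 KB / 512-byte layers underneath. The fs.defaultFS address and the path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");   // placeholder NameNode address

        FileSystem fs = FileSystem.get(conf);

        // The client simply streams bytes; HDFS splits them into 64 MB blocks,
        // the OS buffers them in 4 KB pages, and the disk writes 512-byte sectors.
        try (FSDataOutputStream out = fs.create(new Path("/data/sample.txt"))) {
            out.writeBytes("data from the front end\n");
        }
    }
}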
So disk space is not wasted. Many of us assume that if the block size is 64 MB (it's configurable) and we store a 32 MB file, the remaining 32 MB is wasted; it is not, because an HDFS block only occupies as much underlying storage as the data it actually holds.
Files in HDFS are “write once”; no random writes to files are allowed.
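A quick way to convince yourself of this is to compare a file's length, its nominal block size, and the space it actually consumes. A sketch, assuming a 32 MB file already sits at the hypothetical path /data/32mb.bin:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockUsageSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/32mb.bin");          // hypothetical 32 MB file

        FileStatus st = fs.getFileStatus(p);
        ContentSummary cs = fs.getContentSummary(p);

        System.out.println("file length    : " + st.getLen());        // ~32 MB
        System.out.println("block size     : " + st.getBlockSize());  // 64 MB (nominal)
        // space consumed = file length x replication, NOT block size x replication
        System.out.println("space consumed : " + cs.getSpaceConsumed());
    }
}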
If the block size is configurable, then why don't I just pick 256 MB as the block size? What's wrong with that?
Disk seek time: seek time is the amount of time it takes a hard drive's read/write head to find the physical location of a piece of data on the disk. Picking 64 MB, 128 MB, or 256 MB as the block size will not help by itself if your disk seek time is slow; the point of a large block is to keep seek time a small fraction of the overall transfer time.
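If you do decide on a larger block size, it is just a configuration value. A minimal sketch: set dfs.blocksize for the client/cluster, or pass a per-file block size when creating the file (the 256 MB value and the path here are only for illustration).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default block size, normally set in hdfs-site.xml
        // (the property is dfs.blocksize in Hadoop 2, dfs.block.size in older releases)
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);   // 256 MB

        FileSystem fs = FileSystem.get(conf);

        // Or choose a block size for one specific file at create time:
        long blockSize = 256L * 1024 * 1024;
        fs.create(new Path("/data/big-file.bin"),
                  true,                 // overwrite
                  4096,                 // io buffer size
                  (short) 3,            // replication factor
                  blockSize)
          .close();
    }
}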
The HDFS daemons (background processes) are three, namely:
–NameNode (master)
–DataNode (slave)
–Secondary NameNode (master)
NameNode – NN (master)
–Stores all metadata
•Information about file ownership and permissions
•Information about file locations in HDFS
•Names of the individual blocks
•Locations of the blocks
–Holds metadata in memory for fast access
–Changes to the metadata are made in RAM and are also written to a log file on disk called Edit Logs
–Metadata is stored on disk and read when the NN starts up
–Collects block reports from DataNodes to learn block locations
–Re-replicates missing blocks (under-replicated blocks)
–Authorization and authentication
–Single Point Of Failure (SPOF)
DataNode – DN (slave)
–Actual contents of the files are stored as blocks on the slave nodes
–Each block is stored on multiple different nodes for redundancy (Block Replication)
–Communicates with the NN and periodically sends block reports
–Clients get block locations from the NN and then read and write the blocks directly from/to the DataNodes (see the sketch after this list)
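A rough sketch of that interaction through the Java API: the FileStatus and block locations below come from the NameNode's metadata, while the actual bytes (the open/read) are streamed from the DataNodes. The path is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadPathSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/sample.txt");        // placeholder path

        // Metadata: answered by the NameNode
        FileStatus st = fs.getFileStatus(p);
        System.out.println("owner/permissions: " + st.getOwner() + " " + st.getPermission());
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.println("block at offset " + loc.getOffset()
                    + " lives on " + String.join(",", loc.getHosts()));
        }

        // Data: streamed directly from the DataNodes holding the blocks
        try (FSDataInputStream in = fs.open(p)) {
            System.out.println("first byte: " + in.read());
        }
    }
}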
Secondary NameNode – 2NN (master)
–Secondary NameNode (2NN) is not a failover NameNode
–Periodically combines the prior file system snapshot (fsimage) and the edit log into a new snapshot
–New snapshot is transmitted back to the NameNode
Whenever a client submits a file to store on HDFS, the first contact is with the NameNode. The NameNode already has information about all the DataNodes in the cluster. Based on the file size, the file is split into small chunks (blocks) and each block is stored on a different DataNode. For example, a 200 MB file with a 64 MB block size is split into 64 + 64 + 64 + 8 MB blocks. A DataNode here is just a commodity server, seen as a collection of memory chunks/blocks. The NameNode maintains the block locations, DataNode addresses, and so on; that is, the metadata.
With this we have achieved distributed storage (distributed parallel processing will be covered in the MapReduce topic).
Another advantage of Hadoop we mentioned is fault tolerance. To achieve this, each block of a file is stored on 3 different DataNodes (block replication). If any machine fails, we can still fetch the block from another DataNode. (3 replicas is the default, and it is configurable.)
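The replication factor is just another setting. A small sketch using the dfs.replication property, plus a per-file override; the path is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");   // client/cluster default (3 is already the default)

        FileSystem fs = FileSystem.get(conf);

        // Change the replication factor of an existing file:
        fs.setReplication(new Path("/data/sample.txt"), (short) 2);
    }
}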
The block is first copied into the memory of the first DataNode in the replica list and the client's connection is closed; that DataNode then writes the block to disk and forwards it to the next replica DataNode, which repeats the process for the rest of the replicas.
Under-replicated: if replication was not fully successful, the NameNode requests additional copies until the replication factor is reached.
When retrieving a file, the client first fetches the metadata from the NameNode and then contacts the DataNodes directly to collect the data.
So, ever wonder how our dear NameNode (the master) knows whether the DataNodes (slaves) are available?
Every 3 seconds (configurable), each slave node sends a message to the master announcing its availability; this process is called the heartbeat. If the master does not receive a heartbeat from a slave node, it considers that slave no longer available and stops sending it data to store. Only once the slave sends heartbeats again is it considered available.
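You would normally check this with the hdfs dfsadmin -report command; the Java sketch below does roughly the same through DistributedFileSystem.getDataNodeStats(), which reports the DataNodes the NameNode currently knows about based on the heartbeats it has received.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DataNodeReportSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        if (fs instanceof DistributedFileSystem) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // DataNodes as seen by the NameNode, i.e. the ones still sending heartbeats
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getHostName()
                        + " capacity=" + dn.getCapacity()
                        + " remaining=" + dn.getRemaining());
            }
        }
    }
}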
By now you know the metadata is precious: if you lose the metadata, it's game over. To keep the game going, the NameNode periodically (again configurable) writes all metadata changes to files on disk called edit logs. Writing one huge edit log would slow the NameNode down (write optimisation), so the log is split into small files; but then reading all of those small edit logs at startup becomes its own performance problem (read optimisation).
So there is a background process (the Secondary NameNode) that merges all those small edit logs into one bigger snapshot. That bigger snapshot is nothing but the fsimage, which is handed back to the NameNode and stored on its disk.
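How often that merge (checkpoint) happens is configurable too. A sketch, assuming the Hadoop 2 property names; these values would normally live in hdfs-site.xml rather than code:

import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least once an hour...
        conf.setLong("dfs.namenode.checkpoint.period", 3600);   // seconds
        // ...or sooner, once this many edit-log transactions accumulate
        conf.setLong("dfs.namenode.checkpoint.txns", 1000000);
        System.out.println("checkpoint period (s): "
                + conf.getLong("dfs.namenode.checkpoint.period", 3600));
    }
}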
So the metadata is taken care of; but dude, what if the NameNode itself fails?
In Hadoop 2 this is addressed with a concept called the Journal Node.
Journal Nodes keep a shared copy of the NameNode's edit logs, so that a standby NameNode can take over if the active NameNode fails (NameNode High Availability).
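A rough sketch of the key HA-related properties (normally set in hdfs-site.xml; the nameservice name "mycluster" and the host names are placeholders):

import org.apache.hadoop.conf.Configuration;

public class HaConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "mycluster");                         // placeholder nameservice
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");                 // active + standby NameNodes
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "host1:8020");  // placeholder hosts
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "host2:8020");
        // The edit logs are written to a quorum of Journal Nodes shared by both NameNodes
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
    }
}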