Hadoop Distributed File System (HDFS) Architectural Documentation

 

4         Communication among HDFS elements.

This section describes the communication among the HDFS elements. The application process has two elements for which we will discuss the communication—the application code and the client library. We will also discuss the communication between the various processes. Each of the three elements of HDFS is described in a different section, see NameNode Decomposition, Client decomposition, and DataNode Block Management.

4.1       Application  code <-> Client

HDFS provides a Java API for applications to use. The API that the application uses is described in http://hadoop.apache.org/core/docs/current/api/. Fundamentally, the application uses the standard java.io interface.  A C language wrapper for this Java API is also available.

The client and the application code are bound into the same address space.


4.2       Client <-> NameNode

The connection between NameNode and the client is governed by the Client Protocol documented in …\hdfs\protocol\ClientProtocol.java.  A client establishes a connection to a configurable TCP port on the NameNode machine. This is an RPC connection. The communication between the Client and the NameNode uses the ClientProtocol.

The major functions of the Client Protocol are:

  1. Create. Create a new file in the name space.
  2. Append. After a file has been created and closed, it is still possible to add data to the end of the file. This is the purpose of the append function.
  3. Add block. Add a new block to an existing file.
  4. Complete (close). The client has finished writing to this file.
  5. Read. A client wishes to read from the file.
  6. Error reporting. Report bad blocks detected by the client
  7. Lease management. Leases are created automatically on writing and renewed explicitly by the client.
  8. Bookkeeping. Affect the state of the NameNode, check on DataNodes, get a list of files in a particular directory.
  9. Directory management – rename, delete, copy

4.3       Client <-> DataNode

A client communicates with a DataNode directly to transfer (send/receive) data using the DataTransferProtocol, defined in DataTransferProtocol.java.  For performance purposes this protocol is a streaming protocol, not RPC. The client buffers data until a full block (the default is 64 Mbytes) has been created and then the block is streamed to the DataNode.

The DataTransferProtocol defines operations to read a block (opReadBlock()), write a block (opWriteBlock()), replace a block (opReplaceBlock()), copy a block (opCopyBlock()), and to get a block’s Checksum (opBlockChecksum()).

4.4       NameNode <-> DataNode

All communication between Namenode and Datanode is initiated by the Datanode, and responded to by the Namenode. The Namenode never initiates communication to the Datanode, although Namenode responses may include commands to the Datanode that cause it to send further communications.

DataNode sends information to NameNode through four major interfaces defined in the DataNodeProtocol. These four are

  1. DataNode Registration. The DataNode informs NameNode of its existence. NameNode returns its registration id. This registration id is a parameter of other DataNode functions.

Registration is triggered when a new DataNode is initiated, an old one is re-initiated, or when a new NameNode is initiated.

  1. DataNode sends heartbeat. The DataNode sends a heartbeat message every few seconds. This includes some information statistics about capacity and current activity. NameNode returns a list of block oriented commands for DataNode to execute. These commands primarily consist of instructions to transfer blocks to other DataNodes for replication purposes, or instructions to delete blocks.  The NameNode can also command an immediate Block Report from the DataNode, but this is only done to recover from severe problems.
  2. DataNode sends block report. DataNode periodically reports the blocks contained in its storage.  The period is typically configured to hourly.
  3. DataNode notifies BlockReceived. DataNode reports that it has received a new block, either from a Client (during file write) or from another DataNode (during replication).  It reports each block immediately upon receipt.

 

In addition, there are several bookkeeping interfaces such as verify request, verify version, and get names or locations of various files or images.