Hadoop Distributed File System (HDFS) Architectural Documentation


1       Introduction

2       HDFS Assumptions and Goals
    2.1 Hardware Failures
    2.2 Streaming Data Access
    2.3 Large Data Sets
    2.4 Simple Coherency Model

3       Overview of the HDFS Architecture
    3.1 HDFS Files
    3.2 Block Allocation

4       Communication Among HDFS Elements
    4.1 Application Code <-> Client
    4.2 Client <-> NameNode
    4.3 Client <-> DataNode
    4.4 NameNode <-> DataNode

5       Decomposition and Basic Concepts of HDFS Elements
    5.1 Client
    5.2 NameNode Decomposition
    5.3 DataNode Block Management

6       Use Cases
    6.1 Create
    6.2 Write
    6.3 Read
    6.4 Complete

7       Module View
    7.1 Module Descriptions
    7.2 Modularity Risks


Executive Summary

This document captures the major architectural decisions in HDFS 0.21. Its purpose is to provide a guide to the overall structure of the HDFS code, so that contributors can better understand how the changes they are considering can be made and what the consequences of those changes would be.

The audience for this report is both contributors, who will use the document to gain an understanding of the structure of HDFS and its design rationale, and committers, who will use the document to reason about future changes and who will update it as the system evolves.