×
思维导图备注
Hadoop_TheDefinitiveGuide
首页
收藏书籍
阅读记录
书签管理
我的书签
添加书签
移除书签
8. MapReduce Types and Formats
浏览
9
扫码
小字体
中字体
大字体
2022-01-24 09:47:16
请
登录
再阅读
上一篇:
下一篇:
Hadoop: The Definitive Guide
Dedication
Foreword
Preface
Administrative Notes
What’s New in the Fourth Edition?
What’s New in the Third Edition?
What’s New in the Second Edition?
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
I. Hadoop Fundamentals
1. Meet Hadoop
Data!
Data Storage and Analysis
Querying All Your Data
Beyond Batch
Comparison with Other Systems
A Brief History of Apache Hadoop
What’s in This Book?
2. MapReduce
A Weather Dataset
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Scaling Out
Hadoop Streaming
3. The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
The Command-Line Interface
Hadoop Filesystems
The Java Interface
Data Flow
Parallel Copying with distcp
4. YARN
Anatomy of a YARN Application Run
YARN Compared to MapReduce 1
Scheduling in YARN
Further Reading
5. Hadoop I/O
Data Integrity
Compression
Serialization
File-Based Data Structures
II. MapReduce
6. Developing a MapReduce Application
The Configuration API
Setting Up the Development Environment
Writing a Unit Test with MRUnit
Running Locally on Test Data
Running on a Cluster
Tuning a Job
MapReduce Workflows
7. How MapReduce Works
Anatomy of a MapReduce Job Run
Failures
Shuffle and Sort
Task Execution
8. MapReduce Types and Formats
MapReduce Types
Input Formats
Output Formats
9. MapReduce Features
Counters
Sorting
Joins
Side Data Distribution
MapReduce Library Classes
III. Hadoop Operations
10. Setting Up a Hadoop Cluster
Cluster Specification
Cluster Setup and Installation
Hadoop Configuration
Security
Benchmarking a Hadoop Cluster
11. Administering Hadoop
HDFS
Monitoring
Maintenance
IV. Related Projects
12. Avro
Avro Data Types and Schemas
In-Memory Serialization and Deserialization
Avro Datafiles
Interoperability
Schema Resolution
Sort Order
Avro MapReduce
Sorting Using Avro MapReduce
Avro in Other Languages
13. Parquet
Data Model
Parquet File Format
Parquet Configuration
Writing and Reading Parquet Files
Parquet MapReduce
14. Flume
Installing Flume
An Example
Transactions and Reliability
The HDFS Sink
Fan Out
Distribution: Agent Tiers
Sink Groups
Integrating Flume with Applications
Component Catalog
Further Reading
15. Sqoop
Getting Sqoop
Sqoop Connectors
A Sample Import
Generated Code
Imports: A Deeper Look
Working with Imported Data
Importing Large Objects
Performing an Export
Exports: A Deeper Look
Further Reading
16. Pig
Installing and Running Pig
An Example
Comparison with Databases
Pig Latin
User-Defined Functions
Data Processing Operators
Pig in Practice
Further Reading
17. Hive
Installing Hive
An Example
Running Hive
Comparison with Traditional Databases
HiveQL
Tables
Querying Data
User-Defined Functions
Further Reading
18. Crunch
An Example
The Core Crunch API
Pipeline Execution
Crunch Libraries
Further Reading
19. Spark
Installing Spark
An Example
Resilient Distributed Datasets
Shared Variables
Anatomy of a Spark Job Run
Executors and Cluster Managers
Further Reading
20. HBase
HBasics
Concepts
Installation
Clients
Building an Online Query Application
HBase Versus RDBMS
Praxis
Further Reading
21. ZooKeeper
Installing and Running ZooKeeper
An Example
The ZooKeeper Service
Building Applications with ZooKeeper
ZooKeeper in Production
Further Reading
V. Case Studies
22. Composable Data at Cerner
From CPUs to Semantic Integration
Enter Apache Crunch
Building a Complete Picture
Integrating Healthcare Data
Composability over Frameworks
Moving Forward
23. Biological Data Science: Saving Lives with Software
The Structure of DNA
The Genetic Code: Turning DNA Letters into Proteins
Thinking of DNA as Source Code
The Human Genome Project and Reference Genomes
Sequencing and Aligning DNA
ADAM, A Scalable Genome Analysis Platform
From Personalized Ads to Personalized Medicine
Join In
24. Cascading
Fields, Tuples, and Pipes
Operations
Taps, Schemes, and Flows
Cascading in Practice
Flexibility
Hadoop and Cascading at ShareThis
Summary
A. Installing Apache Hadoop
Prerequisites
Installation
Configuration
B. Cloudera’s Distribution Including Apache Hadoop
C. Preparing the NCDC Weather Data
D. The Old and New Java MapReduce APIs
Index
Colophon
Copyright
暂无相关搜索结果!
×
二维码
手机扫一扫,轻松掌上学
×
《Hadoop_TheDefinitiveGuide》电子书下载
请下载您需要的格式的电子书,随时随地,享受学习的乐趣!
EPUB 电子书
×
书签列表
×
阅读记录
阅读进度:
0.00%
(
0/0
)
重置阅读进度