Hadoop_TheDefinitiveGuide
Writing and Reading Parquet Files

2022-01-24 09:47:17