Introduction to Cloudera CDH

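After a little over 30 days of working, somewhat haphazardly, through a big data course, I have gone from being completely lost to knowing a thing or two, so I want to organize what I have learned.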

CDH is a Hadoop distribution; its counterparts at the same level are HDP (Hortonworks Data Platform) and plain Apache Hadoop.

This post covers the concepts around CDH.

CDH is a product released by the company Cloudera, whose official site is https://www.cloudera.com/

The official documentation at https://www.cloudera.com/documentation.html shows that CDH is one member of the Cloudera Enterprise product family.

See the Introduction in the Cloudera Enterprise documentation (5.12 is the latest version at the time of writing): https://www.cloudera.com/documentation/enterprise/latest/topics/introduction.html

Cloudera provides a scalable, flexible, integrated
platform that makes it easy to manage rapidly increasing volumes and varieties
of data in your enterprise. Cloudera products and solutions enable you to
deploy and manage Apache Hadoop and related projects, manipulate and analyze
your data, and keep that data secure and protected.

Cloudera provides the following products
and tools:

  • CDH: The Cloudera distribution of Apache Hadoop and other related
    open-source projects, including Apache Impala (incubating) and Cloudera
    Search. CDH also provides security and integration with numerous hardware
    and software solutions.
  • Apache Impala (incubating): A massively parallel processing SQL engine for
    interactive analytics and business intelligence. Its highly optimized
    architecture makes it ideally suited for traditional BI-style queries with
    joins, aggregations, and subqueries. It can query Hadoop data files from a
    variety of sources, including those produced by MapReduce jobs or loaded
    into Hive tables. The YARN resource management component lets Impala
    coexist on clusters running batch workloads concurrently with Impala SQL
    queries. You can manage Impala alongside other Hadoop components through
    the Cloudera Manager user interface, and secure its data through the
    Sentry authorization framework.
  • Cloudera Search: Provides near real-time access to data stored in or
    ingested into Hadoop and HBase. Search provides near real-time indexing,
    batch indexing, full-text exploration and navigated drill-down, as well
    as a simple, full-text interface that requires no SQL or programming
    skills. Fully integrated in the data-processing platform, Search uses the
    flexible, scalable, and robust storage system included with CDH. This
    eliminates the need to move large data sets across infrastructures to
    perform business tasks.
  • Cloudera Manager: A sophisticated application used to deploy, manage,
    monitor, and diagnose issues with your CDH deployments. Cloudera Manager
    provides the Admin Console, a web-based user interface that makes
    administration of your enterprise data simple and straightforward. It
    also includes the Cloudera Manager API, which you can use to obtain
    cluster health information and metrics, as well as configure Cloudera
    Manager.
  • Cloudera Navigator: End-to-end data management and security for the CDH
    platform. Cloudera Navigator Data Management enables administrators, data
    managers, and analysts to explore vast data collections in Hadoop.
    Cloudera Navigator Encrypt simplifies the storage and management of
    encryption keys. The robust auditing, data management, lineage
    management, lifecycle management, and encryption key management in
    Cloudera Navigator allow enterprises to adhere to stringent compliance
    and regulatory requirements.
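
The Cloudera Manager bullet above mentions an API for reading cluster health. As a rough illustration only, here is a minimal Python sketch of what querying it could look like. The host name, the credentials, the API version `v17`, and the `entityStatus` field are assumptions made for this example and should be checked against your own Cloudera Manager release; 7180 is the usual non-TLS port of the Admin Console.

```python
import base64
import json
import urllib.request


def clusters_url(host, port=7180, api_version="v17"):
    """Build the URL of the 'list clusters' endpoint.

    api_version is an assumption; use the version your CM release supports.
    """
    return f"http://{host}:{port}/api/{api_version}/clusters"


def summarize_clusters(payload):
    """Map cluster name -> health status from a decoded JSON response.

    Assumes the response has an 'items' list whose entries carry
    'name' and (on newer API versions) 'entityStatus'.
    """
    return {
        cluster["name"]: cluster.get("entityStatus", "UNKNOWN")
        for cluster in payload.get("items", [])
    }


def fetch_clusters(host, user, password, **url_options):
    """Call the endpoint with HTTP basic auth and return the decoded JSON."""
    request = urllib.request.Request(clusters_url(host, **url_options))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a reachable Cloudera Manager instance, `summarize_clusters(fetch_clusters("cm.example.com", "admin", "admin"))` would return one entry per cluster; the host and credentials here are placeholders, not real values.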

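From this description, Cloudera provides the following products and tools: CDH, Apache Impala, Cloudera Search, Cloudera Manager, and Cloudera Navigator. Of these, CDH already includes Apache Impala and Cloudera Search, so in short Cloudera offers three main pieces: CDH, Cloudera Manager, and Cloudera Navigator.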

The later sections of the documentation give a brief introduction to each of the three.

CDH Overview

CDH delivers the core elements of Hadoop

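The Introduction notes that detailed information about the individual CDH components is beyond the scope of the Cloudera documentation. I will write up how to use each component as I work with it.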

Cloudera Manager 5 Overview

With Cloudera Manager, you can easily
deploy and centrally operate the complete CDH stack and other managed services.

Put simply, Cloudera Manager (CM) simplifies installing and managing CDH.

Terminology

[Figure: Terminology section of the Cloudera Manager 5 Overview; image not preserved]

Architecture

[Figure: Architecture section of the Cloudera Manager 5 Overview; image not preserved]

Cloudera Navigator Data Management Overview

Cloudera Navigator Data Management is a
complete solution for data governance, auditing, and related data management
tasks that is fully integrated with the Hadoop platform.

That explanation is rather abstract; one answer in the FAQ later in the documentation puts it more plainly:

Is Cloudera Navigator a module of Cloudera Manager?

Not exactly. Cloudera Navigator is installed separately, after Cloudera
Manager is installed, and it interacts behind the scenes with Cloudera Manager
to deliver some of its core functionality. Cloudera Manager is used by cluster
administrators to manage the cluster and all its services. Cloudera Navigator
is used by administrators but also by security and governance teams, data
stewards, and others to audit, trace data lineage from source raw data through
final form, and perform other comprehensive data governance and stewardship
tasks.

If you have no need for data security, auditing, and similar features, you can skip installing Cloudera Navigator.

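With the concepts around CDH in place, I am getting ready to install it. Installation will get its own post. There are plenty of installation guides online, but I plan to follow the steps in the official documentation: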

https://www.cloudera.com/documentation/enterprise/latest/topics/introduction.html

Please credit the original source when reposting: Harries Blog™ » Introduction to Cloudera CDH
