An Overview of the hadoop2 Package Structure and Package Functions

WBOY

Jun 07, 2016, 03:42 PM


1. Overview

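The design philosophy of hadoop2 is a big step forward from hadoop1, which is, after all, several years old. Software design theory has moved quickly in those years, producing approaches such as domain-driven models, event-driven models, and state lifecycle management, along with many open-source solutions, a large share of which originated in the Apache community. hadoop2 adopts a Maven project structure and replaces the former single project with a multi-project layout. There are currently about 45 projects (counted by the number of pom.xml files); whether that number grows or some projects get merged depends on how the Hadoop open-source community evolves. More projects is not automatically better: the largest build I have seen had several hundred projects, and a refresh in Eclipse took several hours, which is a serious drag on development efficiency. A typical codebase has around 10 projects. At roughly 45, hadoop2 feels a little heavy to me, although it does layer its projects sensibly, and that layering is very clear.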
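The project count above is simply the number of pom.xml files in the source tree. A minimal sketch of that count follows, assuming a local checkout of the source; the function name `count_maven_projects` and the list of skipped directories are illustrative choices, not part of the original analysis.

```python
import os

# Directories that never contain Maven modules (assumption: build output
# and version-control metadata); they are pruned from the walk.
SKIPPED_DIRS = {"target", ".svn", ".git"}

def count_maven_projects(root):
    """Count directories under root (inclusive) that contain a pom.xml."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIPPED_DIRS]
        if "pom.xml" in filenames:
            count += 1
    return count
```

Calling `count_maven_projects("path/to/release-2.0.0-alpha")` on a checkout should give a figure comparable to the one quoted above, though the exact number depends on the release and on which directories are skipped.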

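As for the benefits of splitting into multiple projects, I see two main ones. First, dependencies between projects are one-directional and functionality is isolated between them. This differs from packages, which can depend on each other, so isolation there rests entirely on the designer's discipline. Second, the code is easier to manage and easier to develop.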
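The one-directional property can be checked mechanically: a set of project dependencies is one-directional exactly when it admits a build order, that is, when the dependency graph has no cycle. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The edges in `EXAMPLE_DEPS` are assumptions made up for illustration; they are not read from the article's figures or from the real pom.xml files.

```python
from graphlib import TopologicalSorter, CycleError

def build_order(deps):
    """Return the projects with dependencies first, or None if deps has a cycle.

    deps maps each project name to the list of projects it depends on.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

# Hypothetical edges, for illustration only.
EXAMPLE_DEPS = {
    "hadoop-common-project": [],
    "hadoop-hdfs-project": ["hadoop-common-project"],
    "hadoop-mapreduce": ["hadoop-hdfs-project"],
    "hadoop-tools": ["hadoop-mapreduce"],
}
```

`build_order(EXAMPLE_DEPS)` yields an order that starts with `hadoop-common-project`, while a mutually dependent pair such as `{"a": ["b"], "b": ["a"]}` yields `None`. Maven enforces the same rule between modules, which is why this kind of isolation holds at the project level but not at the package level.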

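2. Analysis of the hadoop2 projects

We analyze release-2.0.0-alpha next. The source comes from http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.0-alpha; the packages in other releases will differ somewhat.

The analysis tool is structure101, which you can look up online.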

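Layer 1: hadoop consists of four main parts: hadoop-common-project, hadoop-hdfs-project, hadoop-mapreduce, and hadoop-tools. The dependencies among them are shown in Figure 1:

Figure 1

The function of each part is evident from its name, and the dependencies are clean. We will walk through them from the bottom up.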

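Layer 2: Figure 2 shows the structure of the second layer.

Figure 2

Here we can see the sub-projects under each of the four parts. Within hadoop-mapreduce, the hadoop-yarn project has one upward dependency on hadoop-mapreduce-client; it is declared in pom.xml with scope test. Even so, I think it is wrong and should be removed. The sub-projects under hadoop-tools are independent of one another, because each is a self-contained tool package.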
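A test-scoped dependency like the one described above can be found by reading the `<scope>` element of each `<dependency>` in a pom.xml. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. `SAMPLE_POM` is a hypothetical fragment written for this example, not the actual hadoop-yarn pom.xml, and the function only looks at the top-level `<dependencies>` block (not `<dependencyManagement>` or profiles).

```python
import xml.etree.ElementTree as ET

# Standard namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

def dependencies_by_scope(pom_xml):
    """Map each scope to the artifactIds declared with it in a pom.xml string."""
    root = ET.fromstring(pom_xml)
    result = {}
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", namespaces=POM_NS)
        # Maven treats a missing <scope> as "compile".
        scope = dep.findtext("m:scope", default="compile", namespaces=POM_NS)
        result.setdefault(scope, []).append(artifact)
    return result

# Hypothetical POM fragment, for illustration only.
SAMPLE_POM = """
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client</artifactId>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
""".strip()
```

With this sample, `dependencies_by_scope(SAMPLE_POM)["test"]` lists `hadoop-mapreduce-client`. A test-scoped dependency is only on the classpath when tests are compiled and run, so it does not affect the shipped artifact, but structure analysis tools still draw it as an edge between the two projects.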

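Layer 3: Next we look at hadoop-yarn and hadoop-mapreduce-client, as shown in Figure 3:

Figure 3

The structure of hadoop-yarn is quite clear. hadoop-mapreduce-client contains 6 projects, but it is still fairly easy to follow.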

Layer 4: Now look at the packages under hadoop-yarn-server; see Figure 4.

Figure 4

In this figure we see hadoop-yarn-server-nodemanager and hadoop-yarn-server-resourcemanager. On the compute side, these are the two main daemons.
