search
HomeDatabaseMysql TutorialHadoop简单介绍

Hadoop简单介绍

Jun 07, 2016 pm 02:58 PM
hadooptwointroduceSimplesolve

Hadoop简单介绍 一、Hadoop要解决的两个问题: 首先我们撇开Hadoop的历史、概念,我们先了解Hadoop是用来干啥的。 Hadoop解决两个问题: 1.海量数据存储 HDFS 2.海量的数据分析 MapReduce 二、Hadoop历史: 2002年的apache项目Nutch 2003年Google发表了关于G

Hadoop简单介绍

 

一、Hadoop要解决的两个问题:

首先我们撇开Hadoop的历史、概念,我们先了解Hadoop是用来干啥的。

    Hadoop解决两个问题:

    1.海量数据存储 HDFS

    2.海量的数据分析 MapReduce

二、Hadoop历史:

2002年的apache项目Nutch

2003年Google发表了关于GFS的论文

2004年Nutch的开发者开发了NDFS

2004年Google发表了关于MapReduce的论文

2005年MapR被引入了NDFS

2006年改名为Hadoop,NDFS创始人加入了yahoo,yahoo成立了一个专门的小组发展Hadoop

三、学习Hadoop的目的:

Hadoop是IT行业一个新的热点,是云计算的一个具体实现

Hadoop本身具有很高的技术含量,是IT工程师学习的首选

四、HDFS设计目标:

1.Very large files

2.Streaming data access

      write-once read-many-times

3.Commodity hardware

五、Hadoop不适合的场景:

1.low-latency data access

2.Lots of small files

3.Multiple writers,arbitrary file modifications

六、HDFS架构:

(1)假设有一个 600G的文件a.txt,由于我们的Hadoop默认一个块的大小是64M,故将这600G文件以64M为一块分别存储到所有的集群的主机上,这样我们的读取速度将会大大提高。

(2)同一个文件块在不同的节点中有多个副本,这样当集群里某一文件块损坏或者数据丢失时,会在另外一个节点得到补充。另外这些副本和原本都是在一个配置文件里配置的,Hadoop会根据配置信息自动寻找备份的内容块。

(3)刚刚我们提到的配置文件,我们需要一个集中的地方保存文件的分块信息:

  /home/asdf/a.txt.part1,3,(dm1,dm2,dm3)

  /home/asdf/a.txt.part2,3,(dm2,dm3,dm4)

  /home/asdf/a.txt.part3,3,(dm6,dm11,dm28)

  这里边的3是指加上备份有三份。

(4)Block:一个文件分块,默认64M

          NameNode:保存整个文件系统的目录信息,文件信息以及文件相应的分块信息。

          DataNode:用于存储Blocks

          HDFS的HA策略:NameNode一旦宕机,整个文件系统将无法工作。  如果NameNode中的数据丢失,整个文件系统也就丢失了。 2.x开始,HDFS支持NameNode的active-standy模式。

  

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
What are some tools you can use to monitor MySQL performance?What are some tools you can use to monitor MySQL performance?Apr 23, 2025 am 12:21 AM

How to effectively monitor MySQL performance? Use tools such as mysqladmin, SHOWGLOBALSTATUS, PerconaMonitoring and Management (PMM), and MySQL EnterpriseMonitor. 1. Use mysqladmin to view the number of connections. 2. Use SHOWGLOBALSTATUS to view the query number. 3.PMM provides detailed performance data and graphical interface. 4.MySQLEnterpriseMonitor provides rich monitoring functions and alarm mechanisms.

How does MySQL differ from SQL Server?How does MySQL differ from SQL Server?Apr 23, 2025 am 12:20 AM

The difference between MySQL and SQLServer is: 1) MySQL is open source and suitable for web and embedded systems, 2) SQLServer is a commercial product of Microsoft and is suitable for enterprise-level applications. There are significant differences between the two in storage engine, performance optimization and application scenarios. When choosing, you need to consider project size and future scalability.

In what scenarios might you choose SQL Server over MySQL?In what scenarios might you choose SQL Server over MySQL?Apr 23, 2025 am 12:20 AM

In enterprise-level application scenarios that require high availability, advanced security and good integration, SQLServer should be chosen instead of MySQL. 1) SQLServer provides enterprise-level features such as high availability and advanced security. 2) It is closely integrated with Microsoft ecosystems such as VisualStudio and PowerBI. 3) SQLServer performs excellent in performance optimization and supports memory-optimized tables and column storage indexes.

How does MySQL handle character sets and collations?How does MySQL handle character sets and collations?Apr 23, 2025 am 12:19 AM

MySQLmanagescharactersetsandcollationsbyusingUTF-8asthedefault,allowingconfigurationatdatabase,table,andcolumnlevels,andrequiringcarefulalignmenttoavoidmismatches.1)Setdefaultcharactersetandcollationforadatabase.2)Configurecharactersetandcollationfor

What are triggers in MySQL?What are triggers in MySQL?Apr 23, 2025 am 12:11 AM

A MySQL trigger is an automatically executed stored procedure associated with a table that is used to perform a series of operations when a specific data operation is performed. 1) Trigger definition and function: used for data verification, logging, etc. 2) Working principle: It is divided into BEFORE and AFTER, and supports row-level triggering. 3) Example of use: Can be used to record salary changes or update inventory. 4) Debugging skills: Use SHOWTRIGGERS and SHOWCREATETRIGGER commands. 5) Performance optimization: Avoid complex operations, use indexes, and manage transactions.

How do you create and manage user accounts in MySQL?How do you create and manage user accounts in MySQL?Apr 22, 2025 pm 06:05 PM

The steps to create and manage user accounts in MySQL are as follows: 1. Create a user: Use CREATEUSER'newuser'@'localhost'IDENTIFIEDBY'password'; 2. Assign permissions: Use GRANTSELECT, INSERT, UPDATEONmydatabase.TO'newuser'@'localhost'; 3. Fix permission error: Use REVOKEALLPRIVILEGESONmydatabase.FROM'newuser'@'localhost'; then reassign permissions; 4. Optimization permissions: Use SHOWGRA

How does MySQL differ from Oracle?How does MySQL differ from Oracle?Apr 22, 2025 pm 05:57 PM

MySQL is suitable for rapid development and small and medium-sized applications, while Oracle is suitable for large enterprises and high availability needs. 1) MySQL is open source and easy to use, suitable for web applications and small and medium-sized enterprises. 2) Oracle is powerful and suitable for large enterprises and government agencies. 3) MySQL supports a variety of storage engines, and Oracle provides rich enterprise-level functions.

What are the disadvantages of using MySQL compared to other relational databases?What are the disadvantages of using MySQL compared to other relational databases?Apr 22, 2025 pm 05:49 PM

The disadvantages of MySQL compared to other relational databases include: 1. Performance issues: You may encounter bottlenecks when processing large-scale data, and PostgreSQL performs better in complex queries and big data processing. 2. Scalability: The horizontal scaling ability is not as good as Google Spanner and Amazon Aurora. 3. Functional limitations: Not as good as PostgreSQL and Oracle in advanced functions, some functions require more custom code and maintenance.

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SAP NetWeaver Server Adapter for Eclipse

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

ZendStudio 13.5.1 Mac

ZendStudio 13.5.1 Mac

Powerful PHP integrated development environment

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version