A brief discussion on what Hadoop is and its learning route-HTML Tutorial-php.cn

巴扎黑

Mar 14, 2017 am 09:46 AM

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data, which suits applications with large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as streams.
The core of the Hadoop framework's design is HDFS and MapReduce. HDFS provides storage for massive data, and MapReduce provides computation over that data. In short, Hadoop is storage plus computation.
The name Hadoop is not an abbreviation but an invented name. The project's creator, Doug Cutting, explained that it is the name his child gave to a stuffed yellow toy elephant.
Hadoop is a distributed computing platform that users can easily set up and use. Users can easily develop and run applications that process massive amounts of data on Hadoop. Its main advantages are:
1. High reliability: Hadoop stores and processes data bit by bit in a way that can be trusted.
2. High scalability: Hadoop distributes data and computing tasks across clusters of available computers, and these clusters can easily be expanded to thousands of nodes.
3. Efficiency: Hadoop can move data dynamically between nodes and keep the load on each node balanced, so processing is very fast.
4. High fault tolerance: Hadoop automatically keeps multiple copies of data and automatically reassigns failed tasks.
5. Low cost: Compared with all-in-one appliances, commercial data warehouses, and data marts such as QlikView and Yonghong Z-Suite, Hadoop is open source, so a project's software cost is greatly reduced.
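The fault-tolerance point in the list above (failed tasks are automatically reassigned) can be sketched with a toy scheduler. This is only an illustration under stated assumptions: the function names `assign_tasks` and `reassign_after_failure`, the node names, and the least-loaded policy are all hypothetical and are not Hadoop's actual scheduling logic.

```python
def assign_tasks(tasks, nodes):
    """Assign tasks to nodes round-robin; returns {node: [tasks]}."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment


def reassign_after_failure(assignment, failed_node):
    """Move the failed node's tasks onto the least-loaded surviving nodes."""
    survivors = {n: list(t) for n, t in assignment.items() if n != failed_node}
    if not survivors:
        raise RuntimeError("no surviving nodes to take over the tasks")
    for task in assignment.get(failed_node, []):
        # Pick the node with the fewest tasks; ties are broken by node name
        # so the result is deterministic.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(task)
    return survivors


# Hypothetical cluster: six tasks spread over three nodes.
tasks = [f"task-{i}" for i in range(6)]
nodes = ["node-a", "node-b", "node-c"]
before = assign_tasks(tasks, nodes)

# Simulate node-b failing: its tasks are handed to the remaining nodes.
after = reassign_after_failure(before, "node-b")
for node, node_tasks in after.items():
    print(node, node_tasks)
```

No task is lost when a node fails: every task that was on the failed node ends up on a surviving node, which is the property the list describes.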
Hadoop's framework is written in Java, so it is well suited to running on Linux production platforms. Applications on Hadoop can also be written in other languages, such as C++.
The significance of Hadoop big data processing
Hadoop is widely used in big data processing because of its natural advantages in data extraction, transformation and loading (ETL). Hadoop's distributed architecture places the processing engine as close to the storage as possible, which suits batch operations such as ETL because their results can go directly to storage. Hadoop's MapReduce breaks a single job into pieces, sends the pieces (Map) to multiple nodes, and then combines the results (Reduce) into a single data set that is loaded into the data warehouse.
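The Map and Reduce steps described above can be sketched in a few lines of plain Python. This is a single-process illustration of the programming model only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example, and a real cluster would run the map and reduce calls on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: each line could be processed by a different node.
lines = ["Hadoop stores data", "Hadoop processes data"]
counts = word_count(lines)
print(counts)
```

Each call to `map_phase` depends only on its own line, which is what lets the work be split across nodes; the reduce step then merges the partial results into one data set.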
Hadoop learning route information from the PHP Chinese website:
1. Hadoop Common: The module at the bottom of the Hadoop system. It provides tools for the Hadoop sub-projects, such as configuration file handling and log operations.
2. HDFS: A distributed file system that provides high-throughput access to application data. To external clients, HDFS looks like a traditional hierarchical file system: files can be created, deleted, moved, renamed, and so on. However, the HDFS architecture is built on a specific set of nodes (see Figure 1), which follows from its own characteristics. These nodes are the NameNode (only one), which provides metadata services inside HDFS, and the DataNodes, which provide storage blocks to HDFS. Because only one NameNode exists, it is a single point of failure, which is a drawback of HDFS. Files stored in HDFS are divided into blocks, and these blocks are replicated to multiple computers (DataNodes). This is very different from a traditional RAID architecture. The block size (usually 64 MB) and the number of replicas of each block are determined by the client when the file is created. The NameNode controls all file operations. All communication within HDFS is based on the standard TCP/IP protocol.
3. MapReduce: A software framework for distributed processing of massive data on computing clusters.
4. Avro: A project led by Doug Cutting, mainly responsible for data serialization. It is somewhat similar to Google's Protocol Buffers and Facebook's Thrift. Avro was planned for use in Hadoop's RPC, to make communication in Hadoop's RPC module faster and its data structures more compact.
5. Hive: Similar to CloudBase, it is software built on the Hadoop distributed computing platform that provides the SQL functionality of a data warehouse. It simplifies summarization and ad hoc querying of massive data stored in Hadoop. Hive provides a query language, QL, which is based on SQL and very convenient to use.
6. HBase: Built on the Hadoop Distributed File System, it is an open source, scalable distributed database based on a column storage model, and it supports storing structured data in large tables.
7. Pig: A high-level data flow language and execution framework for parallel computing. Its SQL-like language is a high-level query language built on MapReduce. It compiles operations into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
8. ZooKeeper: An open source implementation of Google's Chubby. It is a reliable coordination system for large-scale distributed systems. Its functions include configuration maintenance, name service, distributed synchronization and group service. The goal of ZooKeeper is to encapsulate complex and error-prone key services and give users a simple, easy-to-use interface and a system with efficient performance and stable functionality.
9. Chukwa: A data collection system for managing large-scale distributed systems, contributed by Yahoo.
10. Cassandra: A scalable multi-master database with no single point of failure.
11. Mahout: A scalable machine learning and data mining library.
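The HDFS entry in the list above says that files are divided into blocks and that each block is copied to several DataNodes. The arithmetic can be sketched as follows. This is a simplified illustration: the 64 MB block size comes from the text, while the replication factor of 3, the DataNode names, and the round-robin placement are assumptions made for the example. Real HDFS placement also takes the rack layout into account, which this sketch ignores.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the block size mentioned in the text
REPLICATION = 3                # assumed replication factor, for illustration


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block only holds the leftover bytes.
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin."""
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(replication)
        ]
    return placement


# Hypothetical example: a 200 MB file stored on a four-node cluster.
blocks = split_into_blocks(200 * 1024 * 1024)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
for block_id, holders in placement.items():
    print(block_id, blocks[block_id], holders)
```

Because every block lives on several DataNodes, losing one DataNode leaves at least two copies of each block it held, which is how HDFS differs from keeping redundancy inside a single RAID array.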
Hadoop's initial design goals were high reliability, high scalability, high fault tolerance and efficiency. These inherent design advantages made Hadoop popular with many large companies as soon as it appeared, and they also attracted widespread attention from the research community. Hadoop technology is now widely used in the Internet field.
The above is an introduction to what Hadoop is and to the Hadoop learning route. For more news and information about Hadoop, follow the platform's official website, WeChat account and other channels. As an online IT career learning and education platform, it offers an authoritative big data Hadoop training course and video tutorial system: an adaptive Hadoop online video course recorded by a gold-medal lecturer, intended to take you from beginner to proficient in the practical Hadoop skills needed for big data development.
