MongoDB Connector for Hadoop

Jun 07, 2016 pm 04:31 PM

By Mike O’Brien, MongoDB Kernel Tools Lead and maintainer of Mongo-Hadoop, the Hadoop Adapter for MongoDB

Hadoop is a powerful, JVM-based platform for running Map/Reduce jobs on clusters of many machines, and it excels at doing analytics and processing tasks on very large data sets.

Since MongoDB excels at storing large operational data sets for applications, it makes sense to use the two together: MongoDB for storage and querying, and Hadoop for batch processing.

The MongoDB Connector for Hadoop

We recently released version 1.1 of the MongoDB Connector for Hadoop. The connector makes it easy to use MongoDB databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. The connector inspects the data and computes input splits, so Hadoop can process the splits in parallel and work through very large data sets quickly.
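As a rough illustration of pointing a job at MongoDB for both input and output, the sketch below builds `-D` options for a Hadoop command line. The property names `mongo.input.uri` and `mongo.output.uri` are given here as recalled from the connector's configuration documentation and should be checked against the release in use; the helper `hadoop_options`, the host, and the `demo.events` / `demo.event_counts` namespaces are hypothetical.

```python
def hadoop_options(input_uri, output_uri):
    """Build generic -D options for a Hadoop job (hypothetical helper).

    Assumption: the connector reads its source and destination from the
    mongo.input.uri and mongo.output.uri properties; verify these names
    against the connector version you run.
    """
    props = {
        "mongo.input.uri": input_uri,
        "mongo.output.uri": output_uri,
    }
    # Flatten {"k": "v"} into ["-D", "k=v", ...] in insertion order.
    return [arg for key, value in props.items() for arg in ("-D", f"{key}={value}")]


# Hypothetical database and collection names, for illustration only.
args = hadoop_options(
    "mongodb://localhost:27017/demo.events",
    "mongodb://localhost:27017/demo.event_counts",
)
print(" ".join(args))
```

The resulting list could be spliced into whatever command launches the job; the rest of that command depends on the job type and is left out here.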

The MongoDB Connector for Hadoop also includes support for Pig and Hive, which let you run sophisticated Map/Reduce workflows by writing short scripts.

Pig is a high-level scripting language for data analysis and for building Map/Reduce workflows.
Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.

Hadoop Streaming is also supported, so Map/Reduce functions can be written in languages other than Java. Right now the MongoDB Connector for Hadoop supports streaming in Ruby, Node.js and Python.
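To give a feel for what a streaming job looks like in Python, here is a minimal sketch of a mapper and reducer that operate on documents as plain dictionaries. It does not use the connector's own streaming library; `mapper`, `reducer`, `run_streaming`, and the `lang` field are hypothetical names, and the small driver only imitates the sort-and-group step that Hadoop performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(documents):
    """Emit one {"_id": key, "count": 1} record per input document."""
    for doc in documents:
        yield {"_id": doc["lang"], "count": 1}


def reducer(key, values):
    """Sum the counts of all records that share the same key."""
    return {"_id": key, "count": sum(v["count"] for v in values)}


def run_streaming(documents):
    """Stand-in for the shuffle: sort mapper output by key, then reduce per key."""
    mapped = sorted(mapper(documents), key=itemgetter("_id"))
    return [reducer(key, list(group))
            for key, group in groupby(mapped, key=itemgetter("_id"))]


# Hypothetical input documents.
docs = [{"lang": "python"}, {"lang": "ruby"}, {"lang": "python"}]
results = run_streaming(docs)
print(results)
```

In a real streaming job the documents would arrive from MongoDB or a .bson file and the reducer output would be written back by the connector; only the two functions would be user code.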

How it Works

The adapter examines the MongoDB collection and calculates a set of splits from the data.
Each split is assigned to a node in the Hadoop cluster.
In parallel, the Hadoop nodes pull the data for their splits from MongoDB (or from BSON files) and process it locally.
Hadoop merges the results and streams the output back to MongoDB or to BSON files.
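The four steps above can be imitated in a few lines of plain Python. This is only a toy model under stated assumptions: an in-memory dictionary stands in for the collection, threads stand in for cluster nodes, and splits are fixed-size key ranges, which is not necessarily how the connector chooses them. All function names and the `city` field are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_splits(keys, split_size):
    """Step 1: cut the sorted key space into contiguous chunks."""
    ordered = sorted(keys)
    return [ordered[i:i + split_size] for i in range(0, len(ordered), split_size)]


def process_split(collection, split_keys):
    """Steps 2-3: a worker reads only its own split and counts locally."""
    counts = Counter()
    for key in split_keys:
        counts[collection[key]["city"]] += 1
    return counts


def run_job(collection, split_size=2, workers=3):
    """Step 4: merge the per-split partial results into one output."""
    splits = compute_splits(collection.keys(), split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: process_split(collection, s), splits))
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)


# Hypothetical documents keyed by _id.
collection = {
    1: {"city": "NYC"}, 2: {"city": "SF"}, 3: {"city": "NYC"},
    4: {"city": "Austin"}, 5: {"city": "SF"}, 6: {"city": "NYC"},
}
totals = run_job(collection)
print(totals)
```

Because each worker touches only the keys in its split, the partial counts can be computed independently and combined afterwards, which is the property that lets the real job scale across machines.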

I’ll be giving an hour-long webinar on what’s new with the Mongo-Hadoop integration. The webinar will cover:

Using Java MapReduce with the MongoDB Connector for Hadoop
Using Hadoop Streaming for other non-JVM languages
Writing Pig Scripts with the MongoDB Connector for Hadoop
Using MongoDB and Hadoop with Elastic MapReduce to kick off your Hadoop jobs easily
Overview of MongoUpdateWritable: using the output of a Hadoop job to modify an existing output collection

The webinar will be offered twice on August 8:

8 am PDT / 11 am EDT / 3 pm UTC
11 am PDT / 2 pm EDT / 6 pm UTC

Update: Watch the webinar recording

Original source: MongoDB Connector for Hadoop. Thanks to the original author for sharing.