MongoDB Connector for Hadoop

by Mike O’Brien, MongoDB Kernel Tools Lead and maintainer of Mongo-Hadoop, the Hadoop Adapter for MongoDB

Hadoop is a powerful, JVM-based platform for running Map/Reduce jobs on clusters of many machines, and it excels at doing analytics and processing tasks on very large data sets.

Since MongoDB excels at storing large operational data sets for applications, it makes sense to explore using the two together: MongoDB for storage and querying, and Hadoop for batch processing.

The MongoDB Connector for Hadoop

We recently released version 1.1 of the MongoDB Connector for Hadoop. The connector makes it easy to use MongoDB databases, or MongoDB backup files in .bson format, as the input source or output destination for Hadoop Map/Reduce jobs. By inspecting the data and computing input splits, it lets Hadoop process the data in parallel, so that very large datasets can be processed quickly.

The MongoDB Connector for Hadoop also includes support for Pig and Hive, which allow sophisticated Map/Reduce workflows to be executed by writing short, simple scripts.

  • Pig is a high-level scripting language for data analysis and building Map/Reduce workflows.
  • Hive is a SQL-like language for ad-hoc queries and analysis of data sets on Hadoop-compatible file systems.

Hadoop streaming is also supported, so map/reduce functions can be written in languages other than Java. Right now the MongoDB Connector for Hadoop supports streaming in Ruby, Node.js, and Python.
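
To give a feel for what map and reduce functions look like outside Java, here is a minimal Python sketch that counts documents per time zone. The function shapes, the `time_zone` field, and the `run_locally` driver are assumptions made for illustration; they stand in for Hadoop and are not the connector's actual streaming API.

```python
from collections import defaultdict


def mapper(documents):
    """Emit one {'_id': key, 'count': 1} record per input document."""
    for doc in documents:
        yield {"_id": doc["time_zone"], "count": 1}


def reducer(key, values):
    """Sum the counts that the mapper emitted for a single key."""
    return {"_id": key, "count": sum(v["count"] for v in values)}


def run_locally(documents):
    """Hypothetical stand-in for Hadoop: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for record in mapper(documents):
        grouped[record["_id"]].append(record)
    # Sort keys so the result order is deterministic.
    return [reducer(key, grouped[key]) for key in sorted(grouped)]


if __name__ == "__main__":
    sample = [{"time_zone": "UTC"}, {"time_zone": "PST"}, {"time_zone": "UTC"}]
    for result in run_locally(sample):
        print(result)
```

In a real streaming job, Hadoop takes over the grouping and distribution that `run_locally` fakes here, and the connector handles reading and writing the documents.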

How it Works

  • The adapter examines the MongoDB collection and calculates a set of splits from the data.
  • Each split is assigned to a node in the Hadoop cluster.
  • In parallel, Hadoop nodes pull the data for their splits from MongoDB (or BSON files) and process it locally.
  • Hadoop merges the results and streams the output back to MongoDB or BSON.
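
The flow above can be imitated in a few lines of plain Python. This is only a toy model under stated assumptions: an in-memory list stands in for the collection, fixed-size slices stand in for splits, and a thread pool stands in for the cluster nodes. The helper names (`compute_splits`, `process_split`, `run_job`) and the `category` field are hypothetical, and the connector's real split calculation is more involved than simple slicing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def compute_splits(documents, split_size):
    """Cut the 'collection' into contiguous chunks, one per split."""
    return [documents[i:i + split_size]
            for i in range(0, len(documents), split_size)]


def process_split(split):
    """Work done by one node: count documents per category in its own split."""
    return Counter(doc["category"] for doc in split)


def run_job(documents, split_size=2, workers=3):
    """Compute splits, process them in parallel, then merge the partial results."""
    splits = compute_splits(documents, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_split, splits))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


if __name__ == "__main__":
    docs = [{"category": c} for c in ["a", "b", "a", "c", "a"]]
    print(run_job(docs))
```

The point of the sketch is the shape of the job: because each split is processed independently, the work parallelizes, and only the small partial results need to be merged at the end.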

I’ll be giving an hour-long webinar on what’s new with the Mongo-Hadoop integration. The webinar will cover:

  • Using Java MapReduce with the MongoDB Connector for Hadoop
  • Using Hadoop Streaming for other non-JVM languages
  • Writing Pig Scripts with the MongoDB Connector for Hadoop
  • MongoDB and Hadoop usage with Elastic MapReduce to easily kick off your Hadoop jobs
  • Overview of MongoUpdateWritable: using the result output from Hadoop to modify an existing output collection

The webinar will be offered twice on August 8:

  • 8 am PDT / 11 am EDT / 3 pm UTC
  • 11 am PDT / 2 pm EDT / 6 pm UTC

Register for the Webinar on August 8

Update: Watch the webinar recording
