The Platform and Infrastructure team at eBay Inc. is happy to announce the open-sourcing of Oink – a self-service solution to Apache Pig. Pig and Hadoop overview Apache Pig?is a platform for analyzing large data sets. It uses a high-level
The Platform and Infrastructure team at eBay Inc. is happy to announce the open-sourcing of Oink – a self-service solution to Apache Pig.
Pig and Hadoop overview
Apache Pig?is a platform for analyzing large data sets. It uses a high-level language for expressing data analysis programs, coupled with the infrastructure for evaluating these programs. Pig abstracts the Map/Reduce paradigm, making it very easy for users to write complex tasks using Pig’s language, called Pig Latin. Because execution of tasks can be optimized automatically, Pig Latin allows users to focus on semantics rather than efficiency. Another key benefit of Pig Latin is extensibility:? users can do special-purpose processing by creating their own functions.
Apache Hadoop and Pig provide an excellent platform for extracting and analyzing data from very large application logs. At eBay, we on the Platform and Infrastructure team are responsible for storing TBs of logs that are generated every day from thousands of eBay application servers. Hadoop and Pig offer us an array of tools to search and view logs and to generate reports on application behavior. As the logs are available in Hadoop, engineers (users of applications) also have the ability to use Hadoop and Pig to do custom processing, such as Pig scripting to extract useful information.
The problem
Today, Pig is primarily used through the command line to spawn jobs. This model wasn’t well suited to the Platform team at eBay, as the cluster that housed the application logs was shared with other teams. This situation created a number of issues:
- Governance – In a shared-cluster scenario, governance is critically important to attain. Pig scripts and requests of one customer should not impact those of other customers and stakeholders of the cluster. In addition, providing CLI access would make governance difficult in terms of controlling the number of job submissions.
- Scalability – CLI access to all Pig customers created another challenge:? scalability. Onboarding customers takes time and is a cumbersome process.
- Change management – No easy means existed to upgrade or modify common libraries.
Hence, we needed a solution that acted as a gateway to Pig job submission, provided QoS, and abstracted the user from cluster configuration.
The solution:? Oink
Oink solves the above challenges not only by allowing execution of Pig requests through a REST interface, but also by enabling users to register jars, view the status of Pig requests, view Pig request output, and even cancel a running Pig request. With the REST interface, the user has a cleaner way to submit Pig requests compared to CLI access. Oink serves as a single point of entry for Pig requests, thereby facilitating rate limiting and QoS enforcement for different customers.
Oink runs as a servlet inside a web container and allows users to run multiple requests in parallel within a single JVM instance. This capability was not supported initially, but rather required the help of the patch found in PIG-3866. This patch provides multi-tenant environment support so that different users can share the same instance.
With Oink, eBay’s Platform and Infrastructure team has been able to onboard 100-plus different use cases onto its cluster. Currently, more than 6000 Pig jobs run every day without any manual intervention from the team.
Special thanks to Vijay Samuel, Ruchir Shah, Mahesh Somani, and Raju Kolluru for open-sourcing Oink. If you have any queries related to Oink, please submit an issue through GitHub.
原文地址:Oink : Making Pig Self-Service, 感谢原作者分享。

本文討論了使用MySQL的Alter Table語句修改表,包括添加/刪除列,重命名表/列以及更改列數據類型。

文章討論了為MySQL配置SSL/TLS加密,包括證書生成和驗證。主要問題是使用自簽名證書的安全含義。[角色計數:159]

文章討論了流行的MySQL GUI工具,例如MySQL Workbench和PhpMyAdmin,比較了它們對初學者和高級用戶的功能和適合性。[159個字符]

本文討論了使用Drop Table語句在MySQL中放下表,並強調了預防措施和風險。它強調,沒有備份,該動作是不可逆轉的,詳細介紹了恢復方法和潛在的生產環境危害。

本文討論了在PostgreSQL,MySQL和MongoDB等各個數據庫中的JSON列上創建索引,以增強查詢性能。它解釋了索引特定的JSON路徑的語法和好處,並列出了支持的數據庫系統。

文章討論了使用準備好的語句,輸入驗證和強密碼策略確保針對SQL注入和蠻力攻擊的MySQL。(159個字符)


熱AI工具

Undresser.AI Undress
人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover
用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool
免費脫衣圖片

Clothoff.io
AI脫衣器

AI Hentai Generator
免費產生 AI 無盡。

熱門文章

熱工具

WebStorm Mac版
好用的JavaScript開發工具

mPDF
mPDF是一個PHP庫,可以從UTF-8編碼的HTML產生PDF檔案。原作者Ian Back編寫mPDF以從他的網站上「即時」輸出PDF文件,並處理不同的語言。與原始腳本如HTML2FPDF相比,它的速度較慢,並且在使用Unicode字體時產生的檔案較大,但支援CSS樣式等,並進行了大量增強。支援幾乎所有語言,包括RTL(阿拉伯語和希伯來語)和CJK(中日韓)。支援嵌套的區塊級元素(如P、DIV),

Dreamweaver CS6
視覺化網頁開發工具

PhpStorm Mac 版本
最新(2018.2.1 )專業的PHP整合開發工具

MinGW - Minimalist GNU for Windows
這個專案正在遷移到osdn.net/projects/mingw的過程中,你可以繼續在那裡關注我們。 MinGW:GNU編譯器集合(GCC)的本機Windows移植版本,可自由分發的導入函式庫和用於建置本機Windows應用程式的頭檔;包括對MSVC執行時間的擴展,以支援C99功能。 MinGW的所有軟體都可以在64位元Windows平台上運作。