
Scaling Big Data Mining Infrastructure at Twitter

Posted by WBOY: 2016-06-07 16:36:16

I almost always enjoy the lessons-learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at HadoopSummit. Besides the technical and practical details, there are two things I’d highlight:

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

Your boss says something vague
You think very hard on how to move the needle
Where’s the data?
What’s in this dataset?
What’s all the f#$#$ crap in the data?
Clean the data
Run some off-the-shelf data mining algorithm
…
Productionize, act on the insight
Rinse, repeat

Enjoy!

[Embedded slides: Scaling Big Data Mining Infrastructure: Twitter Experience]

Original title and link: Scaling Big Data Mining Infrastructure at Twitter (NoSQL database©myNoSQL)

Original article: Scaling Big Data Mining Infrastructure at Twitter. Thanks to the original author for sharing.

