
Scaling Big Data Mining Infrastructure at Twitter

Posted by WBOY on 2016-06-07 16:36:16

I almost always enjoy the lessons-learned-style presentations by people from Twitter. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two things worth highlighting:

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

Your boss says something vague
You think very hard on how to move the needle
Where’s the data?
What’s in this dataset?
What’s all the f#$#$ crap in the data?
Clean the data
Run some off-the-shelf data mining algorithm
…
Productionize, act on the insight
Rinse, repeat

Enjoy!

[Embedded slides: Scaling Big Data Mining Infrastructure Twitter Experience]

Original title and link: Scaling Big Data Mining Infrastructure at Twitter (NoSQL database©myNoSQL)

Source: Scaling Big Data Mining Infrastructure at Twitter. Thanks to the original author for sharing.

