Scaling Big Data Mining Infrastructure at Twitter

WBOY | Original | 2016-06-07 16:36:16 | 979 views

I almost always enjoy the lessons-learned-style presentations from Twitter’s people. The slides below, by Jimmy Lin and Dmitriy Ryaboy, were presented at Hadoop Summit. Besides the technical and practical details, there are two things to note:

DJ Patil: “It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data”

and then the reality check:

Your boss says something vague
You think very hard on how to move the needle
Where’s the data?
What’s in this dataset?
What’s all the f#$#$ crap in the data?
Clean the data
Run some off-the-shelf data mining algorithm
…
Productionize, act on the insight
Rinse, repeat

Enjoy!

Slides: Scaling Big Data Mining Infrastructure: The Twitter Experience

Original title and link: Scaling Big Data Mining Infrastructure at Twitter (NoSQL database © myNoSQL)

Original article: Scaling Big Data Mining Infrastructure at Twitter. Thanks to the original author for sharing.
