python - When scraping data with scrapy, should it go straight into the main database or into a temporary database first?

Question

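My website is used to display some information. I plan to scrape data from a few sites using scrapy, but I have a concern: what if the other side notices that I am scraping and alters their data in some way, for example by making the content excessively long, changing the encoding, or changing something else? Then my scraper would blindly...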

PHPz · Answer

In the end I decided to go with option 2.

高洛峰 · Answer

You could pair scrapy with redis: write the scraped content to redis first as a staging cache, then import it from redis into mongodb or a similar store.

During the import you can apply rules that filter out records that do not meet your requirements, which should cover your concern. The staging data in redis and the main table can also have different structures.
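As a rough sketch of what such import rules might look like, the snippet below validates staged records before they reach the main store. The field names, the length limits, and the functions `validate` and `import_staged` are all hypothetical, chosen only for illustration. It assumes each staged record is a JSON document stored as raw bytes, which is how it might come back from a redis list. The redis read and the mongodb write are deliberately left out, so `staged` is any iterable of bytes and `insert` is any callable.

```python
import json

# Hypothetical limits and field names, used only for illustration.
MAX_TITLE_LEN = 200
MAX_BODY_LEN = 20_000
REQUIRED_FIELDS = ("url", "title", "body")


def validate(raw: bytes):
    """Return a cleaned dict if the staged record passes every rule, else None."""
    try:
        # Rejects bytes that are not valid UTF-8 (e.g. the source site
        # switched encoding) as well as payloads that are not valid JSON.
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(record, dict):
        return None
    # Every required field must be a non-empty string.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return None
    # Reject suspiciously long content.
    if len(record["title"]) > MAX_TITLE_LEN or len(record["body"]) > MAX_BODY_LEN:
        return None
    # Keep only the known fields, so the main table can have its own structure.
    return {field: record[field].strip() for field in REQUIRED_FIELDS}


def import_staged(staged, insert):
    """Pass each valid staged record to `insert`; return how many were rejected."""
    rejected = 0
    for raw in staged:
        clean = validate(raw)
        if clean is None:
            rejected += 1
        else:
            insert(clean)
    return rejected
```

In a real setup, `staged` would presumably be drained from redis and `insert` would write to the main database. For a quick local check, a plain list works: `import_staged(records, main_rows.append)`.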
