大乐透python预测程序_scrapy框架爬取大乐透数据

[Python] 纯文本查看复制代码# -*- coding: utf-8 -*-# Scrapy settings for lottery_spider project## For simplicity, this file contains only settings considered important or# commonly used. You can find more sett

weixin_39649660

487人浏览 · 2020-11-28 21:11:01

weixin_39649660 · 2020-11-28 21:11:01 发布

[Python] 纯文本查看复制代码# -*- coding: utf-8 -*-

# Scrapy settings for lottery_spider project

# For simplicity, this file contains only settings considered important or

# commonly used. You can find more settings consulting the documentation:

# https://docs.scrapy.org/en/latest/topics/settings.html

# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html

# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'lottery_spider'

SPIDER_MODULES = ['lottery_spider.spiders']

NEWSPIDER_MODULE = 'lottery_spider.spiders'

# Crawl responsibly by identifying yourself (and your website) on the user-agent

#USER_AGENT = 'lottery_spider (+http://www.yourdomain.com)'

# Obey robots.txt rules

ROBOTSTXT_OBEY = False #False，不去寻找网站设置的rebots.txt文件；

# Configure maximum concurrent requests performed by Scrapy (default: 16)

#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)

# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay

# See also autothrottle settings and docs

DOWNLOAD_DELAY = 1 #配置爬虫速度，1秒一次

# The download delay setting will honor only one of:

#CONCURRENT_REQUESTS_PER_DOMAIN = 16

#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)

#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)

#TELNETCONSOLE_ENABLED = False

# Override the default request headers:

DEFAULT_REQUEST_HEADERS = { #配置爬虫的请求头，模拟浏览器请求；

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

'Accept-Language': 'en',

'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'

}

# Enable or disable spider middlewares

# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html

#SPIDER_MIDDLEWARES = {

# 'lottery_spider.middlewares.LotterySpiderSpiderMiddleware': 543,

# Enable or disable downloader middlewares

# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html

#DOWNLOADER_MIDDLEWARES = {

# 'lottery_spider.middlewares.LotterySpiderDownloaderMiddleware': 543,

# Enable or disable extensions

# See https://docs.scrapy.org/en/latest/topics/extensions.html

#EXTENSIONS = {

# 'scrapy.extensions.telnet.TelnetConsole': None,

# Configure item pipelines

# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html

ITEM_PIPELINES = { #取消此配置的注释，让pipelines.py可以运行；

'lottery_spider.pipelines.LotterySpiderPipeline': 300,

}

# Enable and configure the AutoThrottle extension (disabled by default)

# See https://docs.scrapy.org/en/latest/topics/autothrottle.html

#AUTOTHROTTLE_ENABLED = True

# The initial download delay

#AUTOTHROTTLE_START_DELAY = 5

# The maximum download delay to be set in case of high latencies

#AUTOTHROTTLE_MAX_DELAY = 60

# The average number of requests Scrapy should be sending in parallel to

# each remote server

#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0

# Enable showing throttling stats for every response received:

#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)

# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings

#HTTPCACHE_ENABLED = True

#HTTPCACHE_EXPIRATION_SECS = 0

#HTTPCACHE_DIR = 'httpcache'

#HTTPCACHE_IGNORE_HTTP_CODES = []

#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

开放原子开发者工作坊

开放原子开发者工作坊旨在鼓励更多人参与开源活动，与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动，如meetup、训练营等，主打技术交流，干货满满，真诚地邀请各位开发者共同参与！

更多推荐

OpenLoong项目通过技术监督委员会（TOC）评审

开放原子开发者工作坊

开发者谈开源：KWDB开源数据库的未来路径与生态构建实践

开放原子开发者工作坊

开发者谈开源：洞悉协作创新背后的机遇与挑战

近日，在2024开放原子开发者大会暨首届开源技术学术大会开幕式上，开放原子开源基金会与openKylin、EasyAda、KWDB开源项目举行捐赠签约仪式。一场捐赠签约仪式，让三个开源项目及其背后的开发者们受到瞩目。本次，我们与“龘”（EasyAda）核心维护者王伶卓开启了对话。

开放原子开发者工作坊

所有评论(0)

查看更多评论

weixin_39649660

@weixin_39649660

已为社区贡献8条内容