Pitfalls of installing the Python crawler framework Scrapy (repost, with cleanup)
Views: 5058
Published: 2019-06-12

This post is about 10,953 characters; estimated reading time 36 minutes.

Environment: CentOS 6.0

Required software: python 2.7, pip, scrapy


1. Upgrade Python

  • Download and build Python 2.7:
#wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
#tar -zxvf Python-2.7.10.tgz
#cd Python-2.7.10
#./configure
#make all
#make install
#make clean
#make distclean
  • Check the Python version:
#python --version

It still reports 2.6.

  • Repoint the python command:
#mv /usr/bin/python /usr/bin/python2.6.6_bak
#ln -s /usr/local/bin/python2.7 /usr/bin/python
  • Check the version again:
# python --version
Python 2.7.10
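One caveat the post skips: on CentOS 6, yum itself is written for the system Python 2.6, so repointing /usr/bin/python like this can break it. The usual fix is to pin yum's shebang back to python2.6. The sketch below demonstrates the sed edit on a scratch copy (/tmp/yum.demo is a stand-in; on a real box you would apply the same sed to /usr/bin/yum):

```shell
# Demonstrate the shebang fix on a scratch file standing in for /usr/bin/yum.
printf '#!/usr/bin/python\nimport sys\n' > /tmp/yum.demo
# Pin the interpreter to the system Python 2.6 so yum keeps working.
sed -i '1s|^#!/usr/bin/python$|#!/usr/bin/python2.6|' /tmp/yum.demo
head -1 /tmp/yum.demo   # -> #!/usr/bin/python2.6
```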

That completes the Python upgrade; on to installing scrapy. pip install scrapy still errored out:

Collecting Twisted>=10.0.0 (from scrapy)
  Could not find a version that satisfies the requirement Twisted>=10.0.0 (from scrapy) (from versions: )
No matching distribution found for Twisted>=10.0.0 (from scrapy)

Twisted is missing, so install it.

2. Install Twisted

  • Download Twisted (https://pypi.python.org/packages/source/T/Twisted/Twisted-15.2.1.tar.bz2#md5=4be066a899c714e18af1ecfcb01cfef7)
  • Install:
tar -jxvf Twisted-15.2.1.tar.bz2
cd Twisted-15.2.1
python setup.py install
  • Verify the installation:
python
Python 2.7.10 (default, Jun  5 2015, 17:56:24)
[GCC 4.4.4 20100726 (Red Hat 4.4.4-13)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import twisted
>>>

This shows Twisted is installed. So, pip install scrapy again; it still failed.

3. Install libxslt, libxml2 and xslt-config

Collecting libxlst
  Could not find a version that satisfies the requirement libxlst (from versions: )
No matching distribution found for libxlst
Collecting libxml2
  Could not find a version that satisfies the requirement libxml2 (from versions: )
No matching distribution found for libxml2

These are C libraries, not pip packages, so build them from source:

wget http://xmlsoft.org/sources/libxslt-1.1.28.tar.gz
tar -zxvf libxslt-1.1.28.tar.gz
cd libxslt-1.1.28/
./configure
make
make install
wget ftp://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz
tar -zxvf libxml2-git-snapshot.tar.gz
cd libxml2-2.9.2/
./configure
make
make install

With those installed, pip install scrapy once more; the lucky star had still not arrived.

4. Install cryptography

Failed building wheel for cryptography

Download cryptography (https://pypi.python.org/packages/source/c/cryptography/cryptography-0.4.tar.gz)

Install:

tar -zxvf cryptography-0.4.tar.gz
cd cryptography-0.4
python setup.py build
python setup.py install

The install errored out with:

No package 'libffi' found

So download, build and install libffi:

wget ftp://sourceware.org/pub/libffi/libffi-3.2.1.tar.gz
tar -zxvf libffi-3.2.1.tar.gz
cd libffi-3.2.1
./configure
make
make install

But the same error came back:

Package libffi was not found in the pkg-config search path.
Perhaps you should add the directory containing `libffi.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libffi' found

So set PKG_CONFIG_PATH:

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
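Before retrying the build, it is worth checking that the variable actually contains the new directory. A quick sketch (assuming, as above, that the libffi build installed libffi.pc into /usr/local/lib/pkgconfig):

```shell
# Prepend the new pkg-config directory and confirm it is in the search path.
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
echo "$PKG_CONFIG_PATH" | tr ':' '\n' | grep -x /usr/local/lib/pkgconfig
# If libffi.pc landed there, `pkg-config --exists libffi && echo found`
# should now print "found".
```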

Then install scrapy again:

pip install scrapy

Where had Lady Luck gone?

ImportError: libffi.so.6: cannot open shared object file: No such file or directory

So:

whereis libffi
libffi: /usr/local/lib/libffi.a /usr/local/lib/libffi.la /usr/local/lib/libffi.so

The library itself is installed. After some searching around, it turned out that LD_LIBRARY_PATH was not set, so:
export LD_LIBRARY_PATH=/usr/local/lib
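Note that this export only lasts for the current shell and overwrites any existing value. A persistent alternative (my suggestion, not from the original post; requires root) is to register the directory with the dynamic linker cache instead:

```shell
# Register /usr/local/lib with the dynamic linker permanently
# (file name usr-local.conf is arbitrary).
echo '/usr/local/lib' > /etc/ld.so.conf.d/usr-local.conf
ldconfig
# Confirm the entry was written:
grep -x '/usr/local/lib' /etc/ld.so.conf.d/usr-local.conf
```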

Then install cryptography-0.4 again:

cd cryptography-0.4
python setup.py build
python setup.py install

This time it installed cleanly, with no errors.

5. Install scrapy again

pip install scrapy

Watching the progress messages:

Building wheels for collected packages: cryptography
  Running setup.py bdist_wheel for cryptography

It sat at this point for a long while. Was Lady Luck finally arriving? After some waiting:

Requirement already satisfied (use --upgrade to upgrade): zope.interface>=3.6.0 in /usr/local/lib/python2.7/site-packages/zope.interface-4.1.2-py2.7-linux-i686.egg (from Twisted>=10.0.0->scrapy)
Collecting cryptography>=0.7 (from pyOpenSSL->scrapy)
  Using cached cryptography-0.9.tar.gz
Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/local/lib/python2.7/site-packages (from zope.interface>=3.6.0->Twisted>=10.0.0->scrapy)
Requirement already satisfied (use --upgrade to upgrade): idna in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): pyasn1 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): enum34 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): ipaddress in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): cffi>=0.8 in /usr/local/lib/python2.7/site-packages (from cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): ordereddict in /usr/local/lib/python2.7/site-packages (from enum34->cryptography>=0.7->pyOpenSSL->scrapy)
Requirement already satisfied (use --upgrade to upgrade): pycparser in /usr/local/lib/python2.7/site-packages (from cffi>=0.8->cryptography>=0.7->pyOpenSSL->scrapy)
Building wheels for collected packages: cryptography
  Running setup.py bdist_wheel for cryptography
  Stored in directory: /root/.cache/pip/wheels/d7/64/02/7258f08eae0b9c930c04209959c9a0794b9729c2b64258117e
Successfully built cryptography
Installing collected packages: cryptography
  Found existing installation: cryptography 0.4
    Uninstalling cryptography-0.4:
      Successfully uninstalled cryptography-0.4
Successfully installed cryptography-0.9

Seeing that output, I nearly wept with joy; cue the full award-acceptance speech. It had finally installed successfully.

6. Test scrapy

Create a test script (the body of this heredoc was lost when the post was reposted; only the truncated command survives):

cat > myspider.py <
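Since the original script did not survive, here is my guess at a minimal equivalent, consistent with the run log below (a single request, a single 200 response, nothing scraped). The class name, spider name and start URL are assumptions, not the original:

```shell
# Reconstructed stand-in for the lost test script; names and URL are guesses.
cat > myspider.py <<'EOF'
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['http://blog.scrapinghub.com']

    def parse(self, response):
        # The log shows only one page being fetched, so parse()
        # does not need to extract anything.
        pass
EOF
```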

Check that the script runs:

scrapy runspider myspider.py
2015-06-06 20:25:16 [scrapy] INFO: Scrapy 1.0.0rc2 started (bot: scrapybot)
2015-06-06 20:25:16 [scrapy] INFO: Optional features available: ssl, http11
2015-06-06 20:25:16 [scrapy] INFO: Overridden settings: {}
2015-06-06 20:25:16 [py.warnings] WARNING: :0: UserWarning: You do not have a working installation of the service_identity module: 'No module named service_identity'.  Please install it from and make sure all of its dependencies are satisfied. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification. Many valid certificate/hostname mappings may be rejected.
2015-06-06 20:25:16 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2015-06-06 20:25:16 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-06 20:25:16 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-06 20:25:16 [scrapy] INFO: Enabled item pipelines: 
2015-06-06 20:25:16 [scrapy] INFO: Spider opened
2015-06-06 20:25:16 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-06-06 20:25:16 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-06-06 20:25:17 [scrapy] DEBUG: Crawled (200) (referer: None)
2015-06-06 20:25:17 [scrapy] INFO: Closing spider (finished)
2015-06-06 20:25:17 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 226,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 5383,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2015, 6, 6, 12, 25, 17, 310084),
 'log_count/DEBUG': 2,
 'log_count/INFO': 7,
 'log_count/WARNING': 1,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2015, 6, 6, 12, 25, 16, 863599)}
2015-06-06 20:25:17 [scrapy] INFO: Spider closed (finished)
It ran normally (quiet inner celebration, ^_^).

7. Create your own scrapy project (this was in a new shell session)

scrapy startproject tutorial

It output:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==1.0.0rc2', 'console_scripts', 'scrapy')()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 552, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2672, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2345, in load
    return self.resolve()
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2351, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/__init__.py", line 48, in <module>
    from scrapy.spiders import Spider
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/spiders/__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/http/__init__.py", line 11, in <module>
    from scrapy.http.request.form import FormRequest
  File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/http/request/form.py", line 9, in <module>
    import lxml.html
  File "/usr/local/lib/python2.7/site-packages/lxml/html/__init__.py", line 42, in <module>
    from lxml import etree
ImportError: /usr/lib/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/site-packages/lxml/etree.so)

More silent cursing. Why was it failing again? Had it learned magic tricks? Looking at it calmly: ImportError: /usr/lib/libxml2.so.2: version `LIBXML2_2.9.0' not found (required by /usr/local/lib/python2.7/site-packages/lxml/etree.so). That felt familiar. Isn't it just like the earlier ImportError: libffi.so.6: cannot open shared object file: No such file or directory? So:

8. Add the environment variable

export LD_LIBRARY_PATH=/usr/local/lib

Run it again:

scrapy startproject tutorial

It output:

[root@bogon scrapy]# scrapy startproject tutorial
2015-06-06 20:35:43 [scrapy] INFO: Scrapy 1.0.0rc2 started (bot: scrapybot)
2015-06-06 20:35:43 [scrapy] INFO: Optional features available: ssl, http11
2015-06-06 20:35:43 [scrapy] INFO: Overridden settings: {}
New Scrapy project 'tutorial' created in:
    /root/scrapy/tutorial
You can start your first spider with:
    cd tutorial
    scrapy genspider example example.com

Finally, success. Evidently scrapy needs the LD_LIBRARY_PATH environment variable at run time, so consider adding it to the environment permanently:

vi /etc/profile

Add the line export LD_LIBRARY_PATH=/usr/local/lib (the earlier PKG_CONFIG_PATH setting can also go in: export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH).

Save, then reload the profile and check that nothing errors:

source /etc/profile

Open a new session and run:

scrapy runspider myspider.py

It ran normally, so the LD_LIBRARY_PATH setting is taking effect. With that, scrapy is officially installed.

To check the version, run scrapy version; it reported "Scrapy 1.0.0rc2".

9. Problems with Python 2.7 on CentOS with the sqlite3 module

ln -s /usr/lib64/python2.6/lib-dynload/_sqlite3.so /usr/local/lib/python2.7/lib-dynload/
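The symlink above borrows the sqlite3 C extension from the system Python 2.6. After linking it in, it is worth confirming the module actually imports. A quick check (the post's box used `python`, i.e. the new 2.7; `python3` is shown here only so the same one-liner also works on modern systems):

```shell
# Confirm the sqlite3 module loads and report the bundled SQLite version.
python3 -c 'import sqlite3; print("sqlite3 ok:", sqlite3.sqlite_version)'
```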

 

10. References

http://scrapy.org/

http://doc.scrapy.org/en/master/

http://blog.csdn.net/slvher/article/details/42346887

http://blog.csdn.net/niying/article/details/27103081

http://www.cnblogs.com/xiaoruoen/archive/2013/02/27/2933854.html


Reposted from: https://www.cnblogs.com/tatamizzz/p/5588047.html
