While scraping a web page, I ran into the following error:
Traceback (most recent call last):
  File "D:/pythonDevelop/spider/pic_grab.py", line 14, in <module>
    print(get("http://pp.163.com/longer-yowoo/pp/10069141.html"))
  File "D:/pythonDevelop/spider/pic_grab.py", line 8, in get
    content = response.read().decode("utf8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 69: invalid continuation byte
The scraping code and the URL are as follows:
#!python
# encoding: utf-8
from urllib.request import urlopen


def get(url):
    response = urlopen(url)
    # This is the line that fails: the page is not UTF-8 encoded.
    content = response.read().decode("utf8")
    response.close()
    return content


if __name__ == '__main__':
    print(get("http://pp.163.com/longer-yowoo/pp/10069141.html"))
The error message indicates that the page is not encoded in UTF-8. So how can we confirm which encoding a page uses? There are several ways:
The page's encoding was detected as gbk. After changing the decoding accordingly, the script worked.
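As a rough illustration of how the encoding could be confirmed in code, here is a minimal sketch that looks first at the charset in the HTTP `Content-Type` header and then at a `<meta>` charset declaration in the HTML. The helper name `detect_encoding`, the 4096-byte search window, and the UTF-8 fallback are my own assumptions, not part of the original script; a byte-level guesser such as the third-party chardet package would be another option.

```python
import re
from typing import Optional
from urllib.request import urlopen

# Matches declarations such as <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
META_CHARSET = re.compile(rb"""<meta[^>]*?charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def detect_encoding(raw: bytes, header_charset: Optional[str] = None,
                    default: str = "utf-8") -> str:
    """Guess a page's encoding (hypothetical helper, for illustration only)."""
    # 1. Prefer the charset announced in the HTTP Content-Type header.
    if header_charset:
        return header_charset.lower()
    # 2. Otherwise look for a <meta> declaration near the top of the page.
    match = META_CHARSET.search(raw[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Fall back to an assumed default.
    return default


def get(url: str) -> str:
    """Fetch a page and decode it with the detected encoding."""
    with urlopen(url) as response:
        raw = response.read()
        # Returns None when the header carries no charset parameter.
        header_charset = response.headers.get_content_charset()
    # errors="replace" keeps a wrong guess from raising UnicodeDecodeError.
    return raw.decode(detect_encoding(raw, header_charset), errors="replace")
```

With this in place, the original `print(get(...))` call can stay unchanged. Whether the header or the `<meta>` tag should win when they disagree is a judgment call; this sketch simply trusts the header first.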