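Purpose of this code: use Python and the re regular-expression module to scrape the images and titles from the movie site https://www.88ys.cc.
The steps are as follows:
1. Open https://www.88ys.cc in a browser, press F12 and then F5, and inspect the page source. The image information sits in img tags, as shown in the figure below: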
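2. Write the source code in PyCharm. The scraping approach is, in order:
① Import the required libraries
② Send a GET request to the site and extract the img tag information with a regular expression
③ Send a GET request to each image URL found in the img tags
④ Read the response content of each image URL and write it to the local disk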
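Step ② can be tried offline before touching the network. The following is a minimal sketch that applies the same regular expression as the full script to a small, made-up HTML fragment. The `SAMPLE_HTML` string, its URLs and its titles are hypothetical; the sketch only assumes that the page keeps the image URL in `data-original` and the title in `alt`, as the pattern implies.

```python
import re

# Hypothetical fragment that mimics the img tags on the page (not real site data).
SAMPLE_HTML = '''
<a href="/a/1.html"><img class="lazy" data-original="https://example.com/img/1.jpg" alt="Movie A" src="/blank.gif"></a>
<a href="/a/2.html"><img class="lazy" data-original="https://example.com/img/2.jpg" alt="Movie B" src="/blank.gif"></a>
'''

# Same pattern as in the full script: group 1 is the image URL, group 2 is the title.
IMG_PATTERN = re.compile(r'<img.*?data-original="(.*?)".*?alt="(.*?)".*?>')


def extract_images(html):
    """Return a list of (image_url, title) tuples found in html."""
    return IMG_PATTERN.findall(html)


if __name__ == "__main__":
    for url, title in extract_images(SAMPLE_HTML):
        print(title, url)
```

Because `.` does not match a newline by default, a single match cannot span several lines of the page source, and the pattern relies on `data-original` appearing before `alt` inside the tag.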
The complete code is as follows:
import re

import requests


def down_movie():
    fronturl = "https://www.88ys.cc"  # site to request
    # Request headers: present the script as a regular browser
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"}
    response = requests.get(fronturl, headers=headers)  # GET the home page
    response.encoding = response.apparent_encoding  # guess the encoding from the response body
    html = response.text  # decoded page source
    # Capture data-original (image URL) and alt (title) from every img tag
    pattern1 = re.compile(r'<img.*?data-original="(.*?)".*?alt="(.*?)".*?>')
    result1 = re.findall(pattern1, html)  # list of (image URL, title) tuples
    print(result1)
    path = "F:/PPT图片/"  # target directory (an absolute path; it must already exist)
    try:
        for info in result1:
            abspath = path + info[1] + ".jpg"  # full path of the saved image
            respons1 = requests.get(url=info[0], headers=headers)  # GET the image URL
            content1 = respons1.content  # raw bytes of the image
            with open(abspath, "wb") as f:  # the with block closes the file automatically
                f.write(content1)
            print(info[1] + "爬取结束!")  # "<title> scraping finished!"
    except (requests.RequestException, OSError) as exc:  # network or file-system error
        print("爬取失败", exc)  # "scraping failed"
    finally:
        print("爬取结束")  # "scraping finished"


if __name__ == '__main__':
    down_movie()
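The script builds each file name directly from the title and always appends `.jpg`. Titles may contain characters that Windows reserves in file names (such as `:` or `?`), and not every image URL ends in `.jpg` (one URL in the output below ends in `.jpeg`). The following is a sketch of a helper that handles both cases. `build_image_path`, the `_` replacement character and the sample arguments are assumptions of this example, not part of the original script.

```python
import os
import re
from urllib.parse import urlparse

# Characters that Windows does not allow in file names.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def build_image_path(directory, title, url, default_ext=".jpg"):
    """Build a save path from a title and an image URL (hypothetical helper)."""
    # Replace every reserved character with an underscore.
    safe_title = INVALID_CHARS.sub("_", title).strip()
    # Take the extension from the URL path; fall back to default_ext if there is none.
    ext = os.path.splitext(urlparse(url).path)[1] or default_ext
    return os.path.join(directory, safe_title + ext)


if __name__ == "__main__":
    print(build_image_path("images", "Movie: Part 2", "https://example.com/img/2.jpeg"))
```

In the download loop, a call such as `build_image_path(path, info[1], info[0])` could stand in for `path + info[1] + ".jpg"`.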
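Running the code produces the following output (the script prints its status messages in Chinese; 爬取结束 means "scraping finished"):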
[('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-01/156463206966589138.jpg', '斗罗大陆'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157175340201253261.jpg', '谍战深海之惊蛰'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157131840172771316.jpg', '没有秘密的你'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157184940206374838.jpg', '初恋那件小事'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156967440136423509.jpg', '少年的你'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157358280762433964.jpg', '从前有座灵剑山'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-07/156460309738657638.jpg', '西行纪'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157244160139809543.jpg', '恋恋江湖'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157192200201346124.jpg', '奔腾年代'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-04/201904091554804565.jpg', '海贼王'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157106520226666247.jpg', '明月照我心'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157097340237842775.jpg', '光荣时代'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-12/156460175032735944.jpg', '知否知否应是绿肥红瘦'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-04/201904091554804826.jpg', '火影忍者'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-06/15616401227.jpg', '陈情令'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157192381429409255.jpg', '海棠经雨胭脂透'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2017-12/156466538939744812.jpg', '鸡毛飞上天'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156852780379388757.jpg', '小丑'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157133460399133921.jpg', '纪实72小时[中国版]第二季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157443120140083190.jpg', '快递员'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157413780207147028.jpg', '伯纳黛特你去了哪'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157442640172273840.jpg', '极速车王'),
('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156941700425569619.jpg', '攀登者'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157441320153968369.jpg', '冰雪奇缘2'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157442460202497933.jpg', '灼人秘密'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156888840279653945.jpg', '第一滴血5:最后的血'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157347660377461609.jpg', '准备好了没'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157397280201892499.jpg', '她的马拉松'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157159140214033865.jpg', '豫见后来'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-05/15567227211.jpg', '再次,春天'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157383000329459880.jpg', '融和不容易'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156965460173069300.jpg', '完美之声'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157442040194416688.jpg', '幻想工程故事第一季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157225680237185889.jpg', '弗莱彻夫人'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157202460143333469.jpg', '威尔和格蕾丝第十一季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157206720638521143.jpg', '麻辣拳拳'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157274820199477392.jpg', '为全人类第一季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157084744569146690.jpg', '邪恶第一季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156958020171257769.jpg', '小谢尔顿第三季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156958140199227376.jpg', '邪恶'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157192381429409255.jpg', '海棠经雨胭脂透'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157373605533945523.jpg', '暹罗密码'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-02/156463167610965616.jpg', '鲁豫有约'),
('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157383780195638013.jpg', '妻子的浪漫旅行3秘密版'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-08/156596580161731115.jpg', '女儿们的恋爱第二季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157209780387061079.jpg', '这样唱好美'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157080300200297873.jpg', '演员请就位'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-08/156551882022473729.jpg', '寻情记'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-08/156516853083984691.jpeg', '欢乐集结号'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157330082598259219.jpg', '演技派'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-04/15552969053.jpg', '非常静距离2019'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157330082429548175.jpg', '超次元对决'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-06/156460342783985821.jpg', '麻辣天后传'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157259940281771556.jpg', '明星大侦探第五季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2017-12/1564649546781312966.jpg', '穿越时空的少女'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2017-12/1564649486952820077.jpg', '红辣椒'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-09/156960900227092302.jpg', '银河英雄传说 Die Neue These 星乱 第1章'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2017-12/156465170660546513.jpg', '探险活宝第四季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2017-12/156465169684085056.jpg', '探险活宝第一季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157199940129581641.jpg', '心理测量者3'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-10/157136580261439591.jpg', '非枪人生'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-08/156717360136876855.jpg', '霹雳靖玄录下阕'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-08/156613020314206013.jpg', '独步星海'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-04/15563647241.jpg', '逆天邪神第一季'),
('https://cdn.aqdstatic.com:966/88ys/upload/vod/2018-06/156460346288157244.jpg', '蜘蛛侠第二季'), ('https://cdn.aqdstatic.com:966/88ys/upload/vod/2019-11/157442041597075155.jpg', '全金属狂潮3')]
斗罗大陆爬取结束!
谍战深海之惊蛰爬取结束!
没有秘密的你爬取结束!
初恋那件小事爬取结束!
少年的你爬取结束!
从前有座灵剑山爬取结束!
西行纪爬取结束!
恋恋江湖爬取结束!
奔腾年代爬取结束!
海贼王爬取结束!
明月照我心爬取结束!
光荣时代爬取结束!
知否知否应是绿肥红瘦爬取结束!
火影忍者爬取结束!
陈情令爬取结束!
海棠经雨胭脂透爬取结束!
鸡毛飞上天爬取结束!
小丑爬取结束!
纪实72小时[中国版]第二季爬取结束!
快递员爬取结束!
伯纳黛特你去了哪爬取结束!
极速车王爬取结束!
攀登者爬取结束!
冰雪奇缘2爬取结束!
灼人秘密爬取结束!
第一滴血5:最后的血爬取结束!
准备好了没爬取结束!
她的马拉松爬取结束!
豫见后来爬取结束!
再次,春天爬取结束!
融和不容易爬取结束!
完美之声爬取结束!
幻想工程故事第一季爬取结束!
弗莱彻夫人爬取结束!
威尔和格蕾丝第十一季爬取结束!
麻辣拳拳爬取结束!
为全人类第一季爬取结束!
邪恶第一季爬取结束!
小谢尔顿第三季爬取结束!
邪恶爬取结束!
海棠经雨胭脂透爬取结束!
暹罗密码爬取结束!
鲁豫有约爬取结束!
妻子的浪漫旅行3秘密版爬取结束!
女儿们的恋爱第二季爬取结束!
这样唱好美爬取结束!
演员请就位爬取结束!
寻情记爬取结束!
欢乐集结号爬取结束!
演技派爬取结束!
非常静距离2019爬取结束!
超次元对决爬取结束!
麻辣天后传爬取结束!
明星大侦探第五季爬取结束!
穿越时空的少女爬取结束!
红辣椒爬取结束!
银河英雄传说 Die Neue These 星乱 第1章爬取结束!
探险活宝第四季爬取结束!
探险活宝第一季爬取结束!
心理测量者3爬取结束!
非枪人生爬取结束!
霹雳靖玄录下阕爬取结束!
独步星海爬取结束!
逆天邪神第一季爬取结束!
蜘蛛侠第二季爬取结束!
全金属狂潮3爬取结束!
爬取结束
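The printed list contains 海棠经雨胭脂透 twice with the same URL, so that image is requested and written twice. A minimal sketch of order-preserving de-duplication that could be applied to the regex results before the download loop follows. `unique_items` is a hypothetical helper name and the sample data is made up.

```python
def unique_items(items):
    """Return items with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    sample = [
        ("https://example.com/img/1.jpg", "Movie A"),
        ("https://example.com/img/2.jpg", "Movie B"),
        ("https://example.com/img/1.jpg", "Movie A"),  # duplicate of the first entry
    ]
    print(unique_items(sample))
```

This works here because the regex results are tuples of strings, which are hashable.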
The code and a screenshot of its output are shown below:
The images saved to the local machine are shown below: