Romance of the Three Kingdoms as TXT: scraping the novel into a text file with Python

2024-04-01 18:36:41

Site scraped: the full online text of Romance of the Three Kingdoms (《三国演义》) in the historical classics section of 诗词名句网 (shicimingju.com)

Locate the relevant part of the page's HTML:
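As a rough illustration of what to look for, the sketch below runs the same CSS selector the script uses against a small sample page. The markup in SAMPLE_INDEX is hypothetical: it is inferred from the selector '.book-mulu > ul > li > a' and is not copied from the live site, and the helper name extract_chapter_links is made up for this example. It uses the built-in 'html.parser' backend so that lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, reduced to the parts the selector depends on.
# The real page may carry more attributes and nesting.
SAMPLE_INDEX = """
<div class="book-mulu">
  <ul>
    <li><a href="/book/sanguoyanyi/1.html">Chapter 1</a></li>
    <li><a href="/book/sanguoyanyi/2.html">Chapter 2</a></li>
  </ul>
</div>
"""


def extract_chapter_links(html):
    """Return (title, href) pairs for every chapter link in the table of contents."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select(".book-mulu > ul > li > a")
    return [(a.get_text(strip=True), a["href"]) for a in links]


chapter_links = extract_chapter_links(SAMPLE_INDEX)
for title, href in chapter_links:
    print(title, href)
```

If the selector returns an empty list on the real page, the class name or nesting has probably changed, and the selector needs to be adjusted in the browser's developer tools first.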

Libraries to import:

import requests
from bs4 import BeautifulSoup

The code is as follows:

# Output file that collects every chapter
fp = open('./sanguo.txt', 'w', encoding='utf-8')

# Send a browser-like User-Agent with each request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"
}
main_url = "https://www.shicimingju.com/book/sanguoyanyi.html"

# Fetch the table-of-contents page and let requests guess its encoding
page_text = requests.get(url=main_url, headers=headers)
page_text.encoding = page_text.apparent_encoding
page_text = page_text.text

# Each chapter is an <a> inside the .book-mulu list
soup = BeautifulSoup(page_text, 'lxml')
a_list = soup.select('.book-mulu > ul > li > a')

for a in a_list:
    title = a.string
    # The href is site-relative, so prepend the host
    detail_url = 'https://www.shicimingju.com' + a['href']

    # Fetch the chapter page
    page_text_detail = requests.get(detail_url, headers=headers)
    page_text_detail.encoding = page_text_detail.apparent_encoding
    page_text_detail = page_text_detail.text

    # The chapter body lives in <div class="chapter_content">
    soup = BeautifulSoup(page_text_detail, 'lxml')
    div_tag = soup.find('div', class_='chapter_content')
    content = div_tag.text

    fp.write(title + ':' + content + '\n')
    print(title, 'saved successfully!')

fp.close()
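The script builds each chapter URL by concatenating the host with a['href'], which only works while the links are site-relative paths beginning with "/". A slightly more forgiving alternative, sketched here with the standard library's urljoin, resolves the link against the index page's URL instead. The helper name chapter_url and the sample href values are hypothetical and only illustrate the three shapes a link could take.

```python
from urllib.parse import urljoin

INDEX_URL = "https://www.shicimingju.com/book/sanguoyanyi.html"


def chapter_url(href, base=INDEX_URL):
    """Resolve a link found on the index page against the index page's URL."""
    return urljoin(base, href)


# Hypothetical href values: site-relative, page-relative, and already absolute.
print(chapter_url("/book/sanguoyanyi/1.html"))
print(chapter_url("sanguoyanyi/1.html"))
print(chapter_url("https://www.shicimingju.com/book/sanguoyanyi/1.html"))
```

All three calls resolve to the same address, so the loop would keep working even if the site changed how it writes its links.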

The output is as follows: