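Catching up on today's homework: a scraper built with etree. I'm preparing for IELTS at the moment, so I'm scraping some English example sentences.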

爱必应

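This is a very good English-learning site that I have used for a long time, and until now I always took notes by hand.
Lately I've been practicing my Python, so I can build up my English for the exam at the same time.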

[Python]

import csv

import requests
from lxml import etree

url = "https://www.youdict.com/w/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}


def sentence_download(url_word, headers):
    res = requests.get(url_word, headers=headers)
    # The encoding defaults to ISO-8859-1 here, but apparent_encoding detects utf-8,
    # so assign the detected value to res.encoding so that the Chinese text decodes correctly.
    res.encoding = res.apparent_encoding
    html = etree.HTML(res.text)
    # XPath positions are 1-based, so dl[0] never matches anything; start at 1.
    for i in range(1, 6):
        i = str(i)
        contain1 = html.xpath('//div[@class="row"]/dl[' + i + ']/dt/text()[1]')
        contain2 = html.xpath('//div[@class="row"]/dl[' + i + ']/dt/b/text()')
        contain3 = html.xpath('//div[@class="row"]/dl[' + i + ']/dt/text()[2]')
        # Text before the bold word + the bold word + text after it
        sentence = ''.join(contain1 + contain2 + contain3)
        print(sentence)


if __name__ == '__main__':
    # wordlist.csv needs a header row with a "word" column
    with open("wordlist.csv", "r") as wordlist:
        for word in csv.DictReader(wordlist):
            url_word = url + word["word"]
            # "例句" means "example sentences"
            print("{}例句".format(word["word"]))
            # print(word["word"]+"例句")
            sentence_download(url_word, headers)
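To make the encoding comment in the code a bit more concrete, here is a minimal sketch that needs no network access. It only assumes that the server sends UTF-8 bytes and that the response is first decoded as ISO-8859-1; the sample text is made up for illustration and is not taken from the site.

```python
# Illustration only: the sample text is made up, not scraped from the site.
text = "amid 例句"
raw = text.encode("utf-8")         # the bytes a UTF-8 page would send

wrong = raw.decode("iso-8859-1")   # what you see if ISO-8859-1 is assumed
right = raw.decode("utf-8")        # what you see with the detected encoding

print(repr(wrong))  # ASCII part survives, the Chinese part turns into mojibake
print(right)        # the original text comes back intact
```

Each Chinese character takes three bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, which is why the garbled version comes out longer than the original.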

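But I still ran into some problems:
1. My Python isn't written very cleanly. I see the experts here organizing their code into functions and so on, and I'm trying to do the same. I'd appreciate any pointers.
2. Why does the output I print contain such large gaps of whitespace?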

For example:

[Plain Text]

amid例句
1. They announced, amid much ballyhoo, that they had made a breakthrough.
2. Dr Amid was assisted by a young Asian nurse.
3. Amid the trees the sea mist was dripping.
4. Children were changing classrooms amid laughter and shouts.
5. Dr Amid probed around the sensitive area.

That is pretty unpleasant to look at. So please don't hold back, everyone: come and roast me.
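One possible explanation for the gaps is that the text nodes returned by text() still carry the page's own indentation and line breaks, and ''.join() keeps all of it. Below is a minimal sketch of collapsing that whitespace before printing. The helper name clean_sentence and the three sample fragments are hypothetical; the fragments are only a guess at the shape of what the XPath queries return, not real output from the site.

```python
def clean_sentence(parts):
    """Join XPath text fragments and collapse every run of whitespace."""
    joined = "".join(parts)
    # str.split() with no argument splits on any run of spaces, tabs or newlines,
    # and drops leading/trailing whitespace, so re-joining with one space tidies it up.
    return " ".join(joined.split())


# Hypothetical fragments shaped like the results of text()[1], b/text() and text()[2]
contain1 = ["\n        1. Dr "]
contain2 = ["Amid"]
contain3 = [" was assisted by a young Asian nurse.\n      "]

cleaned = clean_sentence(contain1 + contain2 + contain3)
print(cleaned)
```

In the scraper this would mean printing clean_sentence(contain1 + contain2 + contain3) instead of the raw joined string. XPath's own normalize-space() function might be another way to get a similar result, but I have not tried it against this page.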
