import os
import re
import time

import requests as rq

# TODO: consider adding support for filtering by multiple keywords.

keyword = input('Safebooru Search: ')
pages = int(input('How many pages do you want to search? (about 40 pics per page): '))

url = 'https://safebooru.org/index.php'
# Browser-like request headers. requests fills in Host, Connection and
# Accept-Encoding itself, and header values must be strings.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Mobile Safari/537.36',
}
# Capture the thumbnail path (e.g. "1234/thumbnail_<hash>.jpg") without any
# "?<id>" suffix; the src may start with "//" or with an explicit scheme.
matcher = re.compile(r'src="(?:https?:)?//safebooru\.org/thumbnails/([^"?]+)')

img_url = []
for page_index in range(1, pages + 1):
    # Query-string parameters for the post list. pid is the offset of the
    # first post shown; the script assumes 40 posts per list page.
    params = {
        'page': 'post',
        's': 'list',
        'tags': keyword,
        'pid': (page_index - 1) * 40,
    }
    try:
        # Pass params and headers by keyword: the second positional argument
        # of requests.get() is params, so headers must not go there.
        res = rq.get(url, params=params, headers=headers, timeout=30)
        res.raise_for_status()
        res.encoding = 'utf-8'
        img_url += matcher.findall(res.text)
        print('---------------------------------')
        print('Got page %d of %d.' % (page_index, pages))
    except rq.exceptions.RequestException as e:
        print('Error: %s. Skipping to the next page.' % e)
    if page_index < pages:
        print('Next page starts in 3 secs...')
        time.sleep(3)
print('Found %d pics in total.' % len(img_url))
print('---------------------------------')
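
# Turn each thumbnail path into the URL of its larger "sample" version:
# 1234/thumbnail_<hash>.jpg -> https://safebooru.org//samples/1234/sample_<hash>.jpg
new_img_url = []
for each in img_url:
    new_each = each.replace('thumbnail', 'sample')
    new_img_url.append('https://safebooru.org//samples/' + new_each)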

# Make sure the output folder exists.
os.makedirs('./pic', exist_ok=True)
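
saved = 0
for img_counter, each in enumerate(new_img_url):
    try:
        img_page = rq.get(each, timeout=30)
        if img_page.status_code == 404:
            # No sample version exists, so fall back to the original file:
            # //samples/1234/sample_<hash>.jpg -> //images/1234/<hash>.jpg
            # This assumes the original is also a .jpg; if it is not, this
            # request fails too and the image is skipped below.
            new_url = each.replace('sample_', '').replace('samples', 'images')
            img_page = rq.get(new_url, timeout=30)
        # Do not save an HTML error page as a .jpg.
        img_page.raise_for_status()
        with open('./pic/' + str(img_counter) + '.jpg', 'wb') as img:
            img.write(img_page.content)
        saved += 1
        print('%d of %d pics are done!' % (img_counter + 1, len(new_img_url)))
    except rq.exceptions.RequestException as e:
        print('Error: %s. Skipping to the next image URL.' % e)
print('---------------------------------')
print('Done: saved %d of %d pics.' % (saved, len(new_img_url)))
# Keep the console window open (portable replacement for Windows' "pause").
input('Press Enter to exit...')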