Those interesting and powerful Python libraries
Apr 27, 2023 08:49 PM

Python has long been famous for its rich ecosystem of third-party libraries. Today I will introduce some very nice ones that are fun, interesting, and powerful!

Data Collection

In today's Internet era, data is extremely important, so let's start with several excellent data collection projects.

AKShare

AKShare is a Python-based financial data interface library. It collects fundamental data, real-time and historical market data, and derivatives data for financial products such as stocks, futures, options, funds, foreign exchange, bonds, indices, and cryptocurrencies. It also provides a set of tools that covers everything from data cleaning to data storage, and it is intended mainly for academic research.

import akshare as ak

stock_zh_a_hist_df = ak.stock_zh_a_hist(symbol="000001", period="daily", start_date="20170301", end_date='20210907', adjust="")
print(stock_zh_a_hist_df)

Output:

            日期     开盘     收盘     最高  ...    振幅    涨跌幅    涨跌额    换手率
0     2017-03-01   9.49   9.49   9.55  ...  0.84   0.11   0.01   0.21
1     2017-03-02   9.51   9.43   9.54  ...  1.26  -0.63  -0.06   0.24
2     2017-03-03   9.41   9.40   9.43  ...  0.74  -0.32  -0.03   0.20
3     2017-03-06   9.40   9.45   9.46  ...  0.74   0.53   0.05   0.24
4     2017-03-07   9.44   9.45   9.46  ...  0.63   0.00   0.00   0.17
...          ...    ...    ...    ...  ...   ...    ...    ...    ...
1100  2021-09-01  17.48  17.88  17.92  ...  5.11   0.45   0.08   1.19
1101  2021-09-02  18.00  18.40  18.78  ...  5.48   2.91   0.52   1.25
1102  2021-09-03  18.50  18.04  18.50  ...  4.35  -1.96  -0.36   0.72
1103  2021-09-06  17.93  18.45  18.60  ...  4.55   2.27   0.41   0.78
1104  2021-09-07  18.60  19.24  19.56  ...  6.56   4.28   0.79   0.84
[1105 rows x 11 columns]

(Column names: 日期 date, 开盘 open, 收盘 close, 最高 high, 振幅 amplitude, 涨跌幅 percent change, 涨跌额 price change, 换手率 turnover rate. pandas hides the middle columns behind "...".)
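The 涨跌额 and 涨跌幅 columns can be reproduced from consecutive closing prices. This sketch rests on two assumptions. First, 涨跌额 is the close minus the previous close. Second, 涨跌幅 is that change as a percentage of the previous close. The rows for 2021-09-01 through 2021-09-07 above are consistent with this reading. The price_change helper is hypothetical and not part of AKShare.

```python
from typing import Tuple


def price_change(prev_close: float, close: float) -> Tuple[float, float]:
    """Return (change, percent change vs. the previous close), each rounded to 2 decimals."""
    change = close - prev_close
    pct = change / prev_close * 100
    return round(change, 2), round(pct, 2)


# Closing prices for 2021-09-01 and 2021-09-02, taken from the output above
print(price_change(17.88, 18.40))  # (0.52, 2.91)
```

Applying it to 18.40 followed by 18.04 gives (-0.36, -1.96), which matches the 2021-09-03 row.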

https://github.com/akfamily/akshare

TuShare

TuShare is a tool for collecting, cleaning, processing, and storing financial data such as stock and futures data. It meets the data needs of quantitative finance analysts and of people learning data analysis. Its strengths are broad data coverage, simple API calls, and fast responses.

Note, however, that some of its features are paid, so choose accordingly.

import tushare as ts

ts.get_hist_data('600848')  # fetch all available data in one call

Output:

             open   high  close    low    volume  p_change    ma5
date
2012-01-11  6.880  7.380  7.060  6.880  14129.96      2.62  7.060
2012-01-12  7.050  7.100  6.980  6.900   7895.19     -1.13  7.020
2012-01-13  6.950  7.000  6.700  6.690   6611.87     -4.01  6.913
2012-01-16  6.680  6.750  6.510  6.480   2941.63     -2.84  6.813
2012-01-17  6.660  6.880  6.860  6.460   8642.57      5.38  6.822
2012-01-18  7.000  7.300  6.890  6.880  13075.40      0.44  6.788
2012-01-19  6.690  6.950  6.890  6.680   6117.32      0.00  6.770
2012-01-20  6.870  7.080  7.010  6.870   6813.09      1.74  6.832

             ma10   ma20     v_ma5    v_ma10    v_ma20  turnover
date
2012-01-11  7.060  7.060  14129.96  14129.96  14129.96      0.48
2012-01-12  7.020  7.020  11012.58  11012.58  11012.58      0.27
2012-01-13  6.913  6.913   9545.67   9545.67   9545.67      0.23
2012-01-16  6.813  6.813   7894.66   7894.66   7894.66      0.10
2012-01-17  6.822  6.822   8044.24   8044.24   8044.24      0.30
2012-01-18  6.833  6.833   7833.33   8882.77   8882.77      0.45
2012-01-19  6.841  6.841   7477.76   8487.71   8487.71      0.21
2012-01-20  6.863  6.863   7518.00   8278.38   8278.38      0.23
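Judging from the sample rows, ma5, ma10 and ma20 look like trailing averages of close, and v_ma5, v_ma10 and v_ma20 look like the same averages applied to volume. The first rows seem to average over however many days exist so far. For example, ma5 on 2012-01-13 equals (7.060 + 6.980 + 6.700) / 3 ≈ 6.913. This is our reading of the output, not TuShare's documented definition. The trailing_mean helper below is hypothetical, and its results should match the ma5 column up to rounding.

```python
from typing import List, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[float]:
    """Mean of the last `window` values at each row; early rows use only the rows available."""
    means = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1) : i + 1]
        means.append(sum(recent) / len(recent))
    return means


# close column for 2012-01-11 .. 2012-01-20, taken from the output above
closes = [7.060, 6.980, 6.700, 6.510, 6.860, 6.890, 6.890, 7.010]
print([f"{m:.3f}" for m in trailing_mean(closes, 5)])
```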

https://github.com/waditu/tushare

GoPUP

GoPUP collects data only from public sources and does not touch any personal, private, or non-public data. As with TuShare, however, some interfaces require you to register for a token before you can use them.

import gopup as gp
df = gp.weibo_index(word="疫情", time_type="1hour")  # search keyword 疫情 means "epidemic"
print(df)

Output:

                        疫情
index
2022-12-17 18:15:00    18544
2022-12-17 18:20:00    14927
2022-12-17 18:25:00    13004
2022-12-17 18:30:00    13145
2022-12-17 18:35:00    13485
2022-12-17 18:40:00    14091
2022-12-17 18:45:00    14265
2022-12-17 18:50:00    14115
2022-12-17 18:55:00    15313
2022-12-17 19:00:00    14346
2022-12-17 19:05:00    14457
2022-12-17 19:10:00    13495
2022-12-17 19:15:00    14133

https://github.com/justinzm/gopup

GeneralNewsExtractor

This project is a Python text extractor based on the paper "Webpage Text Extraction Method Based on Text and Symbol Density". It pulls the body text, author, title, and other fields out of a news page's HTML.

>>> from gne import GeneralNewsExtractor

>>> html = '''<rendered web page HTML source>'''

>>> extractor = GeneralNewsExtractor()
>>> result = extractor.extract(html, noise_node_list=['//div[@class="comment-list"]'])  # XPaths of noise nodes to skip; this one is only an example
>>> print(result)

Output:

{"title": "xxxx", "publish_time": "2019-09-10 11:12:13", "author": "yyy", "content": "zzzz", "images": ["/xxx.jpg", "/yyy.png"]}

[Figure: news page extraction example]
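To give a feel for the density idea, here is a heavily simplified sketch that uses only the standard library. For each div, article or section block, it computes non-link characters per non-link descendant tag. It then weights that density by log(1 + punctuation count) and returns the text of the best-scoring block. The scoring formula is our own simplification, not the exact formula from the paper or GNE's implementation. All names here are hypothetical.

```python
import math
from html.parser import HTMLParser

PUNCTUATION = set(",.!?;:，。！？；：、")
CANDIDATE_TAGS = {"div", "article", "section"}
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta"}


class DensityParser(HTMLParser):
    """Collect text, link-text, tag and punctuation counts for every candidate block."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open elements
        self.candidates = []  # closed div/article/section elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        for node in self.stack:  # every open ancestor gains one descendant tag
            node["tags"] += 1
            node["link_tags"] += int(tag == "a")
        self.stack.append({"tag": tag, "text": [], "chars": 0, "link_chars": 0,
                           "tags": 0, "link_tags": 0, "punct": 0})

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        in_link = any(node["tag"] == "a" for node in self.stack)
        punct = sum(ch in PUNCTUATION for ch in text)
        for node in self.stack:  # credit the text to all enclosing elements
            node["text"].append(text)
            node["chars"] += len(text)
            node["link_chars"] += len(text) if in_link else 0
            node["punct"] += punct

    def handle_endtag(self, tag):
        for i in range(len(self.stack) - 1, -1, -1):  # tolerate unclosed children
            if self.stack[i]["tag"] == tag:
                node = self.stack[i]
                del self.stack[i:]
                if tag in CANDIDATE_TAGS:
                    self.candidates.append(node)
                return


def score(node):
    # text density: non-link characters per non-link descendant tag
    density = (node["chars"] - node["link_chars"]) / max(1, node["tags"] - node["link_tags"])
    # body text usually contains punctuation; menus and link lists usually do not
    return density * math.log(1 + node["punct"])


def extract_main_text(html):
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    if not parser.candidates:
        return ""
    return " ".join(max(parser.candidates, key=score)["text"])


sample = "<div><div><a href='/'>Home</a></div><div><p>Body text, with punctuation.</p></div></div>"
print(extract_main_text(sample))  # Body text, with punctuation.
```

The navigation block scores zero because all of its text sits inside links. The outer wrapper loses because its many descendant tags dilute its density.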

https://github.com/GeneralNewsExtractor/GeneralNewsExtractor

Crawler

Web crawling is another major application area for Python, and many people get started with the language by writing crawlers. Let's look at some excellent crawler projects.

playwright-python

An open-source browser automation tool from Microsoft that lets you drive a browser from Python. It supports Chromium, Firefox, and WebKit on Linux, macOS, and Windows.

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    for browser_type in [p.chromium, p.firefox, p.webkit]:
        browser = browser_type.launch()
        page = browser.new_page()
        page.goto('http://whatsmyuseragent.org/')
        page.screenshot(path=f'example-{browser_type.name}.png')
        browser.close()

https://github.com/microsoft/playwright-python

awesome-python-login-model

This project collects login methods for many large websites, along with crawler programs for some of them. The login methods include logging in with selenium and simulating the login directly by replaying requests captured with a packet-capture tool. It helps beginners study and write crawlers.

However, crawlers are notoriously demanding to maintain, and this project has not been updated for a long time. It is therefore doubtful whether all of its login methods still work. Use them with care, or treat them as a starting point for writing your own.
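The packet-capture style of simulated login mentioned above boils down to two steps: replay the login request you observed in the browser's developer tools, then keep the cookies it returns. This is a minimal standard-library sketch. It assumes a plain form-encoded POST login with no CAPTCHA, CSRF token, or JavaScript-side encryption. Real sites often use these, and that is exactly what projects like this one work around. The URL, field names, and helper names are placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(url, fields, headers=None):
    """Rebuild a captured login POST: same URL, same form fields, same headers."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    return request


def make_session():
    """An opener that stores cookies, so the session cookie set by the login
    response is sent automatically with later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Hypothetical endpoint and field names: copy the real ones from your captured request
request = build_login_request(
    "https://example.com/login",
    {"username": "alice", "password": "secret"},
    {"User-Agent": "Mozilla/5.0"},
)
opener, jar = make_session()
# response = opener.open(request)  # would send the login; cookies then end up in `jar`
print(request.get_method(), request.data)  # POST b'username=alice&password=secret'
```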


https://github.com/Kr1s77/awesome-python-login-model

DecryptLogin

Unlike the previous project, this one is still being updated. It also simulates logging in to major websites and remains very valuable for beginners.

from DecryptLogin import login

# instantiate the Login class
lg = login.Login()
# call the provided API function to log in to the target website (e.g., twitter)
infos_return, session = lg.twitter(username='Your Username', password='Your Password')

https://github.com/CharlesPikachu/DecryptLogin

Scylla

Scylla is a high-quality free proxy IP pool tool. It currently supports only Python 3.6. Once it is running, pool statistics are available from its local API:

http://localhost:8899/api/v1/stats

Output:

{
"median": 181.2566407083,
"valid_count": 1780,
"total_count": 9528,
"mean": 174.3290085201
}
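Because the stats endpoint returns plain JSON, it is easy to monitor from a script. Here is a small standard-library sketch. Reading valid_count / total_count as "share of collected proxies that are currently valid" is our interpretation of the field names, not documented behavior. The helper names are hypothetical.

```python
import json
import urllib.request

STATS_URL = "http://localhost:8899/api/v1/stats"


def fetch_stats(url=STATS_URL, timeout=5):
    """Fetch the statistics JSON from a running Scylla instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def valid_ratio(stats):
    """Share of collected proxies that are currently marked valid."""
    return stats["valid_count"] / stats["total_count"]


# The sample response shown above, so the demo runs without a live server
sample = json.loads(
    '{"median": 181.2566407083, "valid_count": 1780, "total_count": 9528, "mean": 174.3290085201}'
)
print(f"{valid_ratio(sample):.1%} of proxies are valid")  # 18.7% of proxies are valid
```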

https://github.com/imWildCat/scylla

ProxyPool

ProxyPool is a crawler proxy IP pool. Its main job is to collect free proxies published online on a schedule, verify them, and store them in a database. It also re-verifies the stored proxies regularly to keep them usable. You can use it through an API or a CLI, and you can add proxy sources to increase the quality and quantity of IPs in the pool. The design documentation is detailed and the module structure is clean and easy to follow. That also makes it a good project for crawler beginners to learn from.

import requests

def get_proxy():
    return requests.get("http://127.0.0.1:5010/get/").json()

def delete_proxy(proxy):
    requests.get("http://127.0.0.1:5010/delete/?proxy={}".format(proxy))

# your spider code

def getHtml():
    # ....
    retry_count = 5
    proxy = get_proxy().get("proxy")
    while retry_count > 0:
        try:
            # access the page through the proxy
            html = requests.get('http://www.example.com', proxies={"http": "http://{}".format(proxy)})
            return html
        except Exception:
            retry_count -= 1
    # every retry failed: remove this proxy from the pool
    delete_proxy(proxy)
    return None
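The collect, verify, store, and re-verify cycle described above can be sketched as a small in-memory pool. This is our own illustration, not ProxyPool's code. It assumes a scoring scheme in which a proxy that keeps failing checks is eventually evicted. The liveness check is passed in as a function, so a real HTTP test can be plugged in. The class name and scores are hypothetical, and the real project keeps its proxies in a database.

```python
import random
from typing import Callable, Dict, Iterable, Optional


class ScoredProxyPool:
    """Proxies start with a score; each failed check lowers it, and a proxy is dropped at zero."""

    def __init__(self, initial_score: int = 3):
        self.initial_score = initial_score
        self.scores: Dict[str, int] = {}

    def add(self, proxies: Iterable[str]) -> None:
        """Store newly collected proxies without resetting ones already in the pool."""
        for proxy in proxies:
            self.scores.setdefault(proxy, self.initial_score)

    def verify_all(self, is_alive: Callable[[str], bool]) -> None:
        """Re-check every stored proxy; meant to be called periodically, e.g. by a scheduler."""
        for proxy in list(self.scores):
            if is_alive(proxy):
                self.scores[proxy] = self.initial_score
            else:
                self.scores[proxy] -= 1
                if self.scores[proxy] <= 0:
                    del self.scores[proxy]

    def get(self) -> Optional[str]:
        """Return a random usable proxy, or None when the pool is empty."""
        return random.choice(list(self.scores)) if self.scores else None

    def delete(self, proxy: str) -> None:
        self.scores.pop(proxy, None)


pool = ScoredProxyPool(initial_score=2)
pool.add(["192.0.2.10:8080", "198.51.100.7:3128"])  # documentation-range example addresses
dead = {"198.51.100.7:3128"}
for _ in range(2):
    pool.verify_all(lambda p: p not in dead)  # stand-in for a real HTTP check
print(sorted(pool.scores))  # ['192.0.2.10:8080']
```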

https://github.com/jhao104/proxy_pool

getproxy

getproxy is a program that crawls proxy-listing websites to obtain HTTP/HTTPS proxies. Its data is updated every 15 minutes.

(test2.7) ➜~ getproxy
INFO:getproxy.getproxy:[*] Init
INFO:getproxy.getproxy:[*] Current Ip Address: 1.1.1.1
INFO:getproxy.getproxy:[*] Load input proxies
INFO:getproxy.getproxy:[*] Validate input proxies
INFO:getproxy.getproxy:[*] Load plugins
INFO:getproxy.getproxy:[*] Grab proxies
INFO:getproxy.getproxy:[*] Validate web proxies
INFO:getproxy.getproxy:[*] Check 6666 proxies, Got 666 valid proxies
...

https://github.com/fate0/getproxy

freeproxy

freeproxy is another project for scraping free proxies. It can crawl many proxy websites and is easy to use.

from freeproxy import freeproxy

proxy_sources = ['proxylistplus', 'kuaidaili']
fp_client = freeproxy.FreeProxy(proxy_sources=proxy_sources)
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
response = fp_client.get('https://space.bilibili.com/406756145', headers=headers)
print(response.text)

https://github.com/CharlesPikachu/freeproxy

fake-useragent

fake-useragent disguises your browser identity (the User-Agent string) and is often used in crawlers. The project's code is small, so you can read it to see how ua.random returns a random browser identity.

from fake_useragent import UserAgent
ua = UserAgent()

ua.ie
# Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US);
ua.msie
# Mozilla/5.0 (compatible; MSIE 10.0; Macintosh; Intel Mac OS X 10_7_3; Trident/6.0)
ua['Internet Explorer']
# Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; GTB7.4; InfoPath.2; SV1; .NET CLR 3.3.69573; WOW64; en-US)
ua.opera
# Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11
ua.chrome
# Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.2 (KHTML, like Gecko) Chrome/22.0.1216.0 Safari/537.2
ua.google
# Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13
ua['google chrome']
# Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11
ua.firefox
# Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1
ua.ff
# Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:15.0) Gecko/20100101 Firefox/15.0.1
ua.safari
# Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25

# and the best one, get a random browser user-agent string
ua.random
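At its core, ua.random picks one string from a list of real browser user-agents. Here is a standard-library sketch of that idea, with optional weighting. The helper and the weights are our own, not fake-useragent's code. The short list is copied from the comments above, whereas the library itself draws from a much larger collection.

```python
import random
from typing import Optional, Sequence


def random_user_agent(agents: Sequence[str], weights: Optional[Sequence[float]] = None,
                      rng: Optional[random.Random] = None) -> str:
    """Pick one user-agent string, optionally weighted (e.g. by how common each browser is)."""
    rng = rng or random.Random()
    return rng.choices(agents, weights=weights, k=1)[0]


AGENTS = [
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64; rv:16.0.1) Gecko/20121011 Firefox/16.0.1",
    "Opera/9.80 (X11; Linux i686; U; ru) Presto/2.8.131 Version/11.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/537.13 "
    "(KHTML, like Gecko) Chrome/24.0.1290.1 Safari/537.13",
]

# Send the chosen string as the User-Agent header of each crawler request
print(random_user_agent(AGENTS, weights=[3, 1, 6]))
```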

https://github.com/fake-useragent/fake-useragent

Web

Python has plenty of excellent, mature web libraries, such as Django and Flask. Everyone already knows them, so I won't cover them here. Instead, I'll introduce a few niche but easy-to-use ones.

streamlit

streamlit is a Python framework for quickly turning data into visual, interactive pages, so you can turn your data into charts in minutes.

import streamlit as st

x = st.slider('Select a value')
st.write(x, 'squared is', x * x)

Output: [screenshot of the rendered page]

https://github.com/streamlit/streamlit

wagtail

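wagtail is a powerful open-source Django CMS (content management system). First, the project is actively updated and iterated on. Second, all the features mentioned on its homepage are free, with no tricks that lock features behind a paywall. It focuses on content management and does not constrain your front-end implementation.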


https://github.com/wagtail/wagtail

fastapi

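A high-performance web framework based on Python 3.6+. True to its name, writing APIs with FastAPI is fast and debugging is convenient. It builds on recent advances in Python, such as type hints, to make web development faster and more capable.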

from typing import Union

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}
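FastAPI reads the type hints in a handler's signature to decide how to parse each parameter. Here, item_id comes from the path and is converted to int, and q is an optional query parameter. The toy dispatcher below imitates that idea with only the standard library so you can see the mechanism. It is not how FastAPI is implemented (FastAPI builds on Starlette and Pydantic), and get, convert, and dispatch are hypothetical names.

```python
import inspect
import re
from typing import Union, get_type_hints
from urllib.parse import parse_qs, urlsplit

ROUTES = []  # list of (compiled path regex, handler)


def get(path):
    """Register a handler; '{name}' path segments become named regex groups."""
    pattern = re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path) + "$")

    def decorator(func):
        ROUTES.append((pattern, func))
        return func

    return decorator


def convert(raw, annotation):
    """Convert a raw string using the parameter's type hint."""
    if getattr(annotation, "__origin__", None) is Union:  # e.g. Union[str, None]
        annotation = next(a for a in annotation.__args__ if a is not type(None))
    return annotation(raw) if annotation in (int, float, str) else raw


def dispatch(url):
    parts = urlsplit(url)
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    for pattern, func in ROUTES:
        match = pattern.match(parts.path)
        if match is None:
            continue
        hints = get_type_hints(func)
        kwargs = {}
        for name, param in inspect.signature(func).parameters.items():
            # path parameters take priority, then query parameters
            raw = match.groupdict().get(name, query.get(name))
            if raw is not None:
                kwargs[name] = convert(raw, hints.get(name, str))
            elif param.default is not inspect.Parameter.empty:
                kwargs[name] = param.default
            else:
                raise ValueError(f"missing required parameter: {name}")
        return func(**kwargs)
    raise LookupError(f"no route for {parts.path}")


@get("/items/{item_id}")
def read_item(item_id: int, q: Union[str, None] = None):
    return {"item_id": item_id, "q": q}


print(dispatch("/items/5?q=somequery"))  # {'item_id': 5, 'q': 'somequery'}
```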

https://github.com/tiangolo/fastapi

django-blog-tutorial

A Django tutorial that walks you step by step through building a personal blog system from scratch, so you master Django development techniques through hands-on practice.

https://github.com/jukanntenn/django-blog-tutorial

dash

dash is a web framework geared specifically toward machine learning, and with it you can quickly build a machine learning app.


https://github.com/plotly/dash

PyWebIO

Another excellent Python web framework. It lets you build an entire web page without writing any front-end code, which is really convenient.


https://github.com/pywebio/PyWebIO

Python Tutorials

practical-python

A hugely popular Python learning resource. The tutorial is written in Markdown, which makes it very approachable.

https://github.com/dabeaz-course/practical-python

learn-python3

A Python 3 tutorial in the form of Jupyter notebooks, which are easy to run and read. It also includes exercises and is beginner-friendly.

https://github.com/jerry-git/learn-python3

python-guide

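An introductory Python guide by kennethreitz, the author of the Requests library. It goes beyond syntax to cover project structure, code style, advanced topics, tools, and much more. Come and learn from a master!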

https://github.com/realpython/python-guide

Others

pytools

A toolbox-style project written by an expert developer, containing many fun little tools.

[Screenshot: part of the tool list. It shows only the tip of the iceberg; explore the project yourself to see everything.]

import random
from pytools import pytools

tool_client = pytools.pytools()
all_supports = tool_client.getallsupported()
tool_client.execute(random.choice(list(all_supports.values())))

https://github.com/CharlesPikachu/pytools

amazing-qr

Generates animated, colorful QR codes in all sorts of styles. A really fun library.

# -n sets the output file name, -d sets the output directory
amzqr https://github.com -n github_qr.jpg -d .../paths/

https://github.com/x-hw/amazing-qr

sh

sh is a mature replacement for subprocess. It lets you call any program as if it were a function.

$> ./run.sh FunctionalTests.test_unicode_arg
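With sh itself you would write, for example, from sh import ls and then ls("-l"). To show the mechanism behind "calling a program like a function", here is a standard-library sketch built on subprocess. The Command and Shell classes are hypothetical and far simpler than sh, which offers many more features.

```python
import shutil
import subprocess
import sys


class Command:
    """Calling the object runs the program and returns its standard output."""

    def __init__(self, path):
        self.path = path

    def __call__(self, *args):
        completed = subprocess.run(
            [self.path, *map(str, args)],
            capture_output=True, text=True, check=True,  # raise on a non-zero exit code
        )
        return completed.stdout


class Shell:
    """Attribute access looks a program up on PATH, e.g. Shell().ls -> Command('/bin/ls')."""

    def __getattr__(self, name):
        path = shutil.which(name)
        if path is None:
            raise AttributeError(f"program not found on PATH: {name}")
        return Command(path)


python = Command(sys.executable)  # the current interpreter, available on every platform
print(python("-c", "print(6 * 7)"), end="")  # 42
```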

https://github.com/amoffat/sh

tqdm

A powerful, fast, and extensible progress bar library for Python.

from tqdm import tqdm
for i in tqdm(range(10000)):
    ...
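tqdm's basic trick is to wrap an iterable in a generator that redraws a status line each time an item is consumed, using a carriage return to return to the start of the line. Below is a bare-bones standard-library sketch of that idea. It is not tqdm's implementation, which also estimates remaining time, limits how often it refreshes, and much more. The name progress is hypothetical.

```python
import sys
import time


def progress(iterable, total=None, width=30, stream=None):
    """Yield items from `iterable`, redrawing a one-line text progress bar after each."""
    if stream is None:
        stream = sys.stderr
    total = len(iterable) if total is None else total  # needs len() unless total is given
    start = time.monotonic()
    for done, item in enumerate(iterable, start=1):
        yield item
        filled = width * done // total
        bar = "#" * filled + " " * (width - filled)
        elapsed = time.monotonic() - start
        # "\r" moves back to the start of the line so the bar is redrawn in place
        stream.write(f"\r{done / total:4.0%}|{bar}| {done}/{total} [{elapsed:.1f}s]")
        stream.flush()
    stream.write("\n")


for i in progress(range(50)):
    time.sleep(0.01)  # stand-in for real work
```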

https://github.com/tqdm/tqdm

loguru

A library that makes logging in Python simple.

from loguru import logger

logger.debug("That's it, beautiful and simple logging!")
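For comparison, getting similar output from the standard-library logging module takes a few lines of setup, and that is the boilerplate loguru removes. The format string below only approximates loguru's default layout, and make_logger is a hypothetical helper.

```python
import logging


def make_logger(name="app", stream=None):
    """Configure a logger that writes 'time | level | location - message' lines."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(
        "%(asctime)s | %(levelname)-8s | %(name)s:%(funcName)s:%(lineno)d - %(message)s"
    ))
    logger.handlers[:] = [handler]  # replace, so calling twice does not duplicate lines
    logger.propagate = False        # do not also pass records up to the root logger
    return logger


logger = make_logger()
logger.debug("That's it, beautiful and simple logging!")
```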

https://github.com/Delgan/loguru

click

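A third-party Python library for quickly building command-line interfaces. It supports decorator-based commands, many parameter types, automatically generated help text, and more.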

import click

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
@click.option("--name", prompt="Your name", help="The person to greet.")
def hello(count, name):
    """Simple program that greets NAME for a total of COUNT times."""
    for _ in range(count):
        click.echo(f"Hello, {name}!")

if __name__ == '__main__':
    hello()

Output:

$ python hello.py --count=3
Your name: Click
Hello, Click!
Hello, Click!
Hello, Click!
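For comparison, here is roughly the same command written with the standard library's argparse, which shows what click's decorators save you. It is only a sketch. argparse has no built-in equivalent of click's prompt= option, so --name is simply required here. Passing an argument list to hello is our own choice, made so the function is easy to call from code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description="Simple program that greets NAME for a total of COUNT times."
    )
    parser.add_argument("--count", type=int, default=1, help="Number of greetings.")
    parser.add_argument("--name", required=True, help="The person to greet.")
    return parser


def hello(argv=None):
    """Parse argv (sys.argv[1:] when None) and print one greeting per count."""
    args = build_parser().parse_args(argv)
    greetings = [f"Hello, {args.name}!" for _ in range(args.count)]
    print("\n".join(greetings))
    return greetings


hello(["--count", "3", "--name", "Click"])
```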

KeymouseGo

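A lightweight, portable (no installation needed) keyboard-and-mouse macro recorder written in Python. It records your mouse and keyboard actions, replays them automatically, and lets you set how many times the replay runs. For simple, monotonous, repetitive tasks it saves a lot of effort: record the actions once and leave the rest to KeymouseGo.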
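The record-once, replay-N-times workflow can be modelled as a list of events, each stored with the delay since the previous one. This is a standard-library sketch of that data structure, not KeymouseGo's code. The event strings and the handler callback are placeholders, because actually capturing and injecting mouse and keyboard input requires OS-level libraries.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Recording:
    """Events paired with the delay (in seconds) since the previous event."""
    events: List[Tuple[float, str]] = field(default_factory=list)
    last_time: Optional[float] = None

    def record(self, event: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        delay = 0.0 if self.last_time is None else now - self.last_time
        self.events.append((delay, event))
        self.last_time = now

    def replay(self, handler: Callable[[str], None], times: int = 1,
               sleep: Callable[[float], None] = time.sleep) -> None:
        """Send every event to `handler` with the original timing, `times` times over."""
        for _ in range(times):
            for delay, event in self.events:
                sleep(delay)
                handler(event)


rec = Recording()
rec.record("move 100,200", now=0.0)
rec.record("left click", now=0.25)
rec.record("type hello", now=1.0)
rec.replay(print, times=2, sleep=lambda seconds: None)  # skip real waiting in this demo
```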


https://github.com/taojy123/KeymouseGo

This article is reproduced from 51CTO.COM.