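Problem:
You have a text field where users can enter arbitrary text, and you need to extract every YouTube video URL together with its video ID.
Solution:
To extract YouTube video IDs from a string with a regular expression, follow these steps:
Define the regular expression pattern: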
https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*
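Explanation:
The pattern matches an http or https URL on youtu.be, youtube.com, or youtube-nocookie.com, with an optional subdomain such as www. The single capturing group ([\w-]{11}) holds the 11-character video ID, and the lookahead (?=[^\w-]|$) requires the ID to end at that point. The negative lookahead rejects a URL that is followed by a closing quote and tag end, or by </a>, which skips URLs that are already linked in HTML. The trailing [?=&+%\w.-]* consumes the rest of the query string. Because the subdomain class [0-9A-Z-] lists only uppercase letters, the pattern has to be applied case-insensitively.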
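To make the individual parts easier to follow, the same pattern can be written in verbose form. This is only a sketch: the name YOUTUBE_ID_RE is chosen here for illustration, and the re.IGNORECASE flag is an addition that the one-line pattern does not carry by itself.

```python
import re

# The pattern from above, split with re.VERBOSE so each part can be annotated.
# re.IGNORECASE is an assumption added here: the class [0-9A-Z-] lists only
# uppercase letters, so a lowercase subdomain would not match without it.
YOUTUBE_ID_RE = re.compile(
    r"""
    https?:\/\/                  # scheme: http or https
    (?:[0-9A-Z-]+\.)?            # optional subdomain, for example www.
    (?:
        youtu\.be\/              # short link: the ID follows the slash
      |
        youtube(?:-nocookie)?\.com   # youtube.com or youtube-nocookie.com
        \S*?                     # path or query text, as little as possible
        [^\w\s-]                 # one separator character just before the ID
    )
    ([\w-]{11})                  # group 1: the 11-character video ID
    (?=[^\w-]|$)                 # the ID must end here
    (?!                          # reject URLs that sit inside HTML markup:
        [?=&+%\w.-]*             #   the rest of the URL ...
        (?:
            ['"][^<>]*>          #   ... then a closing quote and a tag end
          |
            </a>                 #   ... or a closing anchor tag
        )
    )
    [?=&+%\w.-]*                 # consume the remaining query string
    """,
    re.IGNORECASE | re.VERBOSE,
)

plain = "see https://www.youtube.com/watch?v=DUQi_R4SgWo now"
linked = '<a href="https://www.youtube.com/watch?v=DUQi_R4SgWo">video</a>'
print(YOUTUBE_ID_RE.findall(plain))
print(YOUTUBE_ID_RE.findall(linked))
```

The first call should yield the ID from the plain URL, while the second should yield nothing because the URL is already wrapped in an anchor tag.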
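Parse the text with the regular expression:
Use the re.findall function to search the text for all YouTube video URLs. Pass re.IGNORECASE so that lowercase subdomains such as www. match, and write the pattern as a triple-quoted raw string because it contains both quote characters.
import re

def find_video_ids(text):
    # Triple-quoted raw string: the pattern itself contains both ' and ".
    pattern = r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*'''
    # re.IGNORECASE lets [0-9A-Z-] match lowercase subdomains such as "www.".
    return re.findall(pattern, text, re.IGNORECASE)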
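Extract the video IDs:
Because the pattern contains exactly one capturing group, re.findall returns the text captured by that group for each match, that is, the 11-character video IDs rather than the full URLs. No slicing is needed to get the IDs.
def get_video_ids(text):
    # find_video_ids already returns the captured IDs, one per matched URL.
    return find_video_ids(text)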
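The problem statement asks for the URLs as well as their IDs. One way to get both is re.finditer, which exposes the whole match and the captured group side by side. The following is a minimal sketch; the names PATTERN, find_video_urls_and_ids, and sample are hypothetical and not part of the original solution.

```python
import re

# Same pattern as in the article, compiled once. re.IGNORECASE is needed so
# that lowercase subdomains such as "www." match the [0-9A-Z-] class.
PATTERN = re.compile(
    r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*''',
    re.IGNORECASE,
)

def find_video_urls_and_ids(text):
    """Return (url, video_id) tuples in order of appearance."""
    # group(0) is the full matched URL, group(1) is the captured video ID.
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(text)]

sample = (
    "First https://youtu.be/DUQi_R4SgWo then "
    "https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu done"
)
pairs = find_video_urls_and_ids(sample)
for url, video_id in pairs:
    print(video_id, url)
```

Each tuple pairs the matched URL, including any trailing query parameters, with its ID.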
Example:
text = """
Lorem Ipsum is simply dummy text.
https://www.youtube.com/watch?v=DUQi_R4SgWo
of the printing and typesetting industry. Lorem Ipsum has been the industry's
standard dummy text ever since the 1500s, when an unknown printer took a galley
of type and scrambled it to make a type specimen book. It has survived not only
five centuries, but also the leap into electronic typesetting, remaining
essentially unchanged.
https://www.youtube.com/watch?v=A_6gNZCkajU&feature=relmfu
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software like
Aldus PageMaker including versions of Lorem Ipsum."""

video_ids = get_video_ids(text)
print(video_ids)
# Output: ['DUQi_R4SgWo', 'A_6gNZCkajU']
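Beyond watch URLs, the pattern is also meant to cover short links, embed URLs, and the youtube-nocookie.com domain, and to skip URLs that are already linked in HTML. The sketch below checks a few such shapes; the sample strings and the names YOUTUBE_ID_RE, samples, and results are made up for illustration.

```python
import re

# Same pattern as in the article, with re.IGNORECASE added so that lowercase
# subdomains such as "www." match the [0-9A-Z-] class.
YOUTUBE_ID_RE = re.compile(
    r'''https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube(?:-nocookie)?\.com\S*?[^\w\s-])([\w-]{11})(?=[^\w-]|$)(?![?=&+%\w.-]*(?:['"][^<>]*>|</a>))[?=&+%\w.-]*''',
    re.IGNORECASE,
)

# Hypothetical inputs mapped to the IDs each one is expected to yield.
samples = {
    "http://youtu.be/DUQi_R4SgWo": ["DUQi_R4SgWo"],
    "https://www.youtube.com/embed/DUQi_R4SgWo": ["DUQi_R4SgWo"],
    "https://www.youtube-nocookie.com/embed/A_6gNZCkajU?rel=0": ["A_6gNZCkajU"],
    "https://www.youtube.com/watch?feature=player_embedded&v=A_6gNZCkajU": ["A_6gNZCkajU"],
    # Already linked in HTML, so the negative lookahead rejects it.
    '<a href="https://www.youtube.com/watch?v=DUQi_R4SgWo">video</a>': [],
}

results = {s: YOUTUBE_ID_RE.findall(s) for s in samples}
for s, ids in results.items():
    print(ids, "<-", s)
```

Note that the pattern accepts any 11-character run of word characters or hyphens in the ID position, so it does not verify that a video with that ID actually exists.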