YouTube Data API: get the title and transcript of each video in a playlist

2023-12-24

I am trying to get the title and transcript of every video in a playlist. This is my code so far:

from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi

api_key = "************************"

#1.query API 

rq = build("youtube", "v3", developerKey=api_key).playlistItems().list(
        part="contentDetails, snippet",
        playlistId="PL590CCC2BC5AF3BC1",
        maxResults=39,        
        ).execute()
        
#2.Create a list with video Ids and Titles

vid_ids = []
vid_title = []
for item in rq["items"]:
    vid_ids.append(item["contentDetails"]["videoId"])
    vid_title.append(item["snippet"]["title"])

#3.Get transcripts

srt = YouTubeTranscriptApi.get_transcripts(vid_ids)

#4.For each video id extract the Key:"text" from a list of dictionaries 

for i in vid_ids:
    get_key_text = " ".join([a_dict["text"] for a_dict in srt[0][i]])       

#5.Create a dictionary with the title and transcript for each video id

pl_dict = dict(zip(vid_title,get_key_text))

#6.Print each transcript under its title

for key, value in pl_dict.items():
    print(key,"\n", value)

When I run it, I get this result:

Lec 1 | MIT 18.01 Single Variable Calculus, Fall 2007 
 T
Lec 2 | MIT 18.01 Single Variable Calculus, Fall 2007 
 h
Lec 3 | MIT 18.01 Single Variable Calculus, Fall 2007 
 e
Lec 4 | MIT 18.01 Single Variable Calculus, Fall 2007 
  
Lec 5 | MIT 18.01 Single Variable Calculus, Fall 2007 
 f
Lec 6 | MIT 18.01 Single Variable Calculus, Fall 2007 
 o

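In step 3 I fetch the transcripts, and the result stored in the variable srt is a tuple of type tuple[dict, list]. The dictionary maps each video ID to a list of dictionaries (one list per video), and each of those dictionaries has 3 keys: "text", "start" and "duration".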
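To make that shape concrete, here is a small sketch that uses a hand-built stand-in for srt instead of calling the API. The video IDs and transcript text are made up, and treating the second tuple element as the list of IDs for which no transcript could be fetched is an assumption about the library, not something shown in the question.

```python
# Hand-built stand-in for the tuple described above (no network call).
# The IDs "vid_a" / "vid_b" and all text values are hypothetical.
srt = (
    {
        "vid_a": [
            {"text": "Hello", "start": 0.0, "duration": 1.5},
            {"text": "world", "start": 1.5, "duration": 2.0},
        ],
        "vid_b": [
            {"text": "Second video", "start": 0.0, "duration": 3.0},
        ],
    },
    [],  # assumed: IDs whose transcripts could not be retrieved
)

transcripts_by_id, unretrievable = srt

# Joining the "text" values of one video gives that video's full transcript.
first_transcript = " ".join(seg["text"] for seg in transcripts_by_id["vid_a"])
print(first_transcript)
```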

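In step 4 I extract the "text" key from all the dictionaries, join the values (without the double quotes), and store the result in the variable get_key_text. The problem is that this variable stores the transcripts of all the videos in one single string, so in the next step, when I try to print each title with its transcript, I only get the first character, then the second character, and so on.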

What should I do to print each transcript after its title?

Explanation of the current output

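get_key_text is a single string (a sequence of characters; here its value is something like 'The following content is provided under ...'), while vid_title is a list of strings (here something like ['Lec 1 | MIT 18.01 Single Variable Calculus, Fall 2007', 'Lec 2 | MIT 18.01 Single Variable Calculus, Fall 2007', ...]). Because the loop overwrites get_key_text on every iteration, after the loop it holds only the transcript of the last video. zip(vid_title, get_key_text) then walks through that string one character at a time, so each title is paired with a single character and you get something like [('Lec 1 | ...', 'T'), ('Lec 2 | ...', 'h'), ...].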
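The effect is easy to reproduce in isolation. The following minimal sketch uses shortened, made-up titles and a short sample string rather than real API data:

```python
# Shortened, hypothetical titles and a sample transcript string.
titles = ["Lec 1", "Lec 2", "Lec 3"]
text = "The following content"

# zip() iterates over the string character by character and stops
# when the shorter input (the list of three titles) is exhausted.
pairs = dict(zip(titles, text))
print(pairs)  # {'Lec 1': 'T', 'Lec 2': 'h', 'Lec 3': 'e'}
```

Each title receives exactly one character, which matches the T, h, e, ... pattern in the output above.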

Solution:

So you should change:

for i in vid_ids:
    get_key_text = " ".join([a_dict["text"] for a_dict in srt[0][i]])

to:

get_key_text = [" ".join([a_dict["text"] for a_dict in srt[0][i]]) for i in vid_ids]
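As a quick check, here is a sketch of steps 4 to 6 with the list comprehension in place. It runs against mock data shaped like the srt tuple from the question, so the IDs, titles and transcript text are all hypothetical and no API call is made.

```python
# Mock data standing in for steps 1-3; every value here is hypothetical.
vid_ids = ["id_1", "id_2"]
vid_title = ["Lec 1", "Lec 2"]
srt = (
    {
        "id_1": [
            {"text": "The following", "start": 0.0, "duration": 2.0},
            {"text": "content", "start": 2.0, "duration": 1.0},
        ],
        "id_2": [
            {"text": "Welcome back", "start": 0.0, "duration": 2.5},
        ],
    },
    [],
)

# Step 4: build one joined transcript string per video, in vid_ids order.
get_key_text = [" ".join(a_dict["text"] for a_dict in srt[0][i]) for i in vid_ids]

# Step 5: pair each title with its full transcript.
pl_dict = dict(zip(vid_title, get_key_text))

# Step 6: print each transcript under its title.
for key, value in pl_dict.items():
    print(key, "\n", value)
```

One caveat: a dictionary keyed by title keeps only one entry if two videos share the same title. Iterating over zip(vid_title, get_key_text) directly avoids that.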
