How I Automated My YouTube Shorts Channel with AI and RPA

Christine

May 13, 2025

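Hi, I'm Christine. This March I set out on an ambitious project: automating a YouTube Shorts channel with AI and RPA. The theme? AI-generated animal stories. Why that theme? Animals connect emotionally with viewers, and in the short-video era emotional connection drives views and engagement.

But there was one big problem: making videos by hand is extremely time-consuming. Sourcing material, editing, and publishing took hours per video, so I decided to go all in on automation.

Over the May holiday I documented my entire automation process. In this post I'll walk you through:

- My end-to-end automation strategy, from finding reference videos to generating the final visuals.
- How to use my scripts, with step-by-step guidance so you can implement the system or adapt it to your own needs.

This framework isn't limited to animal content. Once you've mastered the process, you can apply it to all kinds of AI video themes.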

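Core Strategy: Rebuild, Refine, and Automate

To be honest, my approach to video creation is inspired by the top performers in my niche. But I don't copy. I analyze, break down, and rebuild with improvements.

The workflow has seven main steps:

1. Identify top-performing Shorts to use as references
2. Break those videos down into storyboard frames
3. Write an AI prompt (for image generation) for each frame
4. Modify elements in the prompts to create a unique version
5. Generate an image for each frame
6. Write video-generation prompts for those images
7. Stitch everything together in an editor

Steps 5 and 7 are not fully automated yet. Everything else is handled entirely by RPA (Robotic Process Automation) using Automa on Chrome, including running parallel sessions through a fingerprint browser.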

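Step-by-Step Breakdown

1. Finding Reference Videos

My script scrapes data from YouTube Shorts with a single hotkey (Ctrl + Alt + S) and works on both individual videos and entire channels. The data goes straight into a spreadsheet, saving time and clicks.

⚠️ Pro tip: use a secondary account for bulk scraping to keep your main account out of risk.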
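
The scraper itself is an Automa workflow, and its internals are not shown in this post. As a rough illustration of the "straight into a spreadsheet" idea, here is a minimal Python sketch that validates a Shorts URL and appends one row to a CSV file. The function names, the column layout, the host list, and the 11-character ID pattern are all assumptions made for this example, not part of my published scripts.

```python
import csv
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: Shorts URLs look like /shorts/<11-character id>.
SHORTS_PATH = re.compile(r"^/shorts/([A-Za-z0-9_-]{11})/?$")
ALLOWED_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def shorts_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube Shorts URL, or None if it is not one."""
    parsed = urlparse(url)
    if parsed.hostname not in ALLOWED_HOSTS:
        return None
    match = SHORTS_PATH.match(parsed.path)
    return match.group(1) if match else None


def append_reference(csv_path: str, url: str, title: str, views: int) -> bool:
    """Append one reference video to the spreadsheet; skip anything that is not a Shorts URL."""
    video_id = shorts_video_id(url)
    if video_id is None:
        return False
    # Hypothetical column layout: id, url, title, view count.
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([video_id, url, title, views])
    return True
```

Keeping the video ID in its own column makes it easy to drop duplicates later when the same Short is scraped from both a channel page and a direct link.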

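2. Extracting the Storyboard with Gemini 2.5 Pro

I use Gemini 2.5 Pro in Google AI Studio to break a video into scenes. It analyzes the visuals and produces a frame-by-frame prompt for image generation.

Step-by-Step Guide

Step 1: Open Google AI Studio

- Go to https://aistudio.google.com/prompts/new_chat
- Sign in with your Google account.
- In the dropdown menu at the top right, select Gemini 2.5 Pro (Flash Experimental) or the latest available model.

🔒 If you are blocked from analyzing a YouTube video directly, use a browser extension or a tool such as 4K Video Downloader to save the video locally, then upload the file to Gemini.

Step 2: Load the Video into Gemini

Option A: Use a YouTube link

Paste the URL of a publicly accessible YouTube Shorts video.

Option B: Upload a file

If external access is blocked, click the paperclip 📎 icon to upload a local video file.

To get high-quality output from Dreamina (the image generator), I use an optimized prompt structure:

Camera angle, scene setting, main character description, action, facial expression, supporting characters, background, time of day, and so on.

This structure keeps each prompt clear for the AI model and keeps the frames consistent with one another.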

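Field	Description	Example
Camera angle	Viewpoint (e.g., side view, low angle)	"Side angle"
Main character environment	Where they are	"On a rainy cliff edge"
Main character description	Physical traits	"A man in a white T-shirt and jeans"
Main action	What they are doing	"Holding up a crying baby"
Facial expression	Emotion, visible reaction	"Angry expression"
Supporting character	Optional: who else is present	"A police officer running toward them"
Supporting action	What they are doing	"Shouting"
Supporting expression	Their emotion	"Serious"
Background	The setting behind the characters	"A waterfall and misty mountains"
Additional details	Visual effects or atmosphere	"Heavy rain, waves crashing on the shore"
Time	When the event takes place	"At dusk"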
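
To make the field structure concrete, here is a small Python sketch that assembles one prompt from these fields. The `Shot` class and its attribute names are hypothetical. The joining rule (commas inside a block, periods between blocks, empty fields omitted) is borrowed from the output format in the editing guidelines later in this post, so treat it as one reasonable reading rather than the exact prompt Gemini produces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    """One storyboard frame; attribute names are illustrative, not from the original scripts."""
    camera_angle: str
    environment: str
    character: str
    action: str
    expression: Optional[str] = None
    supporting_character: Optional[str] = None
    supporting_action: Optional[str] = None
    supporting_expression: Optional[str] = None
    background: Optional[str] = None
    details: Optional[str] = None
    time_of_day: Optional[str] = None

    def to_prompt(self) -> str:
        def block(*parts: Optional[str]) -> str:
            # Commas separate details inside one block; missing parts are skipped.
            return ", ".join(p for p in parts if p)

        blocks = [
            self.camera_angle,
            block(self.environment, self.character, self.action, self.expression),
            block(self.supporting_character, self.supporting_action,
                  self.supporting_expression),
            self.background,
            self.details,
            self.time_of_day,
        ]
        # Periods separate the major blocks; empty blocks are dropped entirely.
        return " ".join(f"{b.rstrip('.')}." for b in blocks if b)


shot = Shot(
    camera_angle="Side angle",
    environment="On a rainy cliff edge",
    character="a man in a white T-shirt and jeans",
    action="holding up a crying baby",
    expression="angry expression",
    time_of_day="At dusk",
)
print(shot.to_prompt())
```

Filling the same `Shot` fields for every frame is one way to keep character descriptions word-for-word identical across a storyboard.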

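3. Rewriting Prompts to Avoid Plagiarism

Want to make sure your version is original? I set up a second Gemini assistant that adjusts the core characters, locations, and story elements while preserving the emotional arc.

For example, you can turn a scene of a dog rescuing a baby on a stormy beach into a golden retriever doing the same in a flooded city. The plot stays the same but the visual setting changes, which makes the storyboard reusable across many themes.

📘 Gemini's Final Instruction Set: Storyboard Prompt Modification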

Prompt Editing Guidelines (Simplified and Localized)

1. Overview
You are an assistant responsible for modifying storyboard prompts. Your job is to replace specific characters (e.g., protagonist, animal, villain) or environments (e.g., cliff, forest) based on user instructions, while keeping the story intact.

2. Core Principle
Do not change the core narrative. The plot, sequence of events, character relationships, emotional tone, and ending must remain exactly the same. Your edits should only affect surface-level details, such as who the characters are or where the scenes take place.

3. Input Format
You will be given a list of prompts, typically numbered (e.g., "Prompt 1", "Prompt 2", etc.). Each prompt is a Chinese-language paragraph describing a visual scene.

4. Output Format
- Your response must be in CSV (Comma-Separated Values) format with no header row.
- Each line must contain two fields:
  (1) Shot number (e.g., 1, 2, 3...)
  (2) The modified Chinese prompt as a natural-language paragraph.
- The paragraph must be enclosed in English double quotation marks (" ").
- The prompt structure should follow this format:

  [Camera Angle]. [Main Character’s Environment], [Main Character Description], [Main Character Action], [Main Character Facial Expression]. (Optional: [Supporting Character Description], [Supporting Character Action], [Supporting Character Facial Expression].) [Background Description]. [Additional Visual Elements]. [Time of Day].

- Use periods to separate major blocks of visual information.
- Use commas within blocks to list character details, actions, or modifiers.
- If a particular category (e.g., facial expression, supporting characters) doesn’t apply to a scene, omit it without leaving blank fields.

5. Character Replacement Rules
5.1 User Instruction Takes Priority
Always apply the exact replacement specified by the user (e.g., “Replace pug with golden retriever puppy”).

5.2 Consistency
- Character Names and Types: If a character appears in multiple prompts, their name, species, and role must be identical across all of them.
- Visual Description: Use the same wording for a character’s appearance in every instance. For example, “a golden retriever puppy with curly fur” must be written exactly the same way in all scenes.
- Scene Descriptions: If you replace a location (e.g., cliff → jungle), update all prompts that reference it to use the new scene consistently.

5.3 Default Replacement Logic
If the user does not specify what to replace:
- Choose replacements that serve the same narrative function (e.g., an animal saving a child should still be an animal capable of that action).
- Adjust physical actions to match the new subject (e.g., a robot cannot cry—use “flashing red lights” instead of “crying”).
- Respect ethnic or character attributes if mentioned (e.g., “a European man” must appear as such in every prompt).
- Always include quantity markers in Chinese (e.g., “一个婴儿”, “一名警察”).
- Limit each character to one clear, visual facial expression per prompt.

5.4 Scene Replacement Logic
- If you change a scene (e.g., cliff → jungle), ensure all environmental elements match the new setting (e.g., “crashing waves” → “dense fog”, “rocky ledge” → “muddy slope”).
- Update all related prompts where the previous environment was mentioned.
- Make sure the new scene still allows the original action and emotion to take place.

5.5 Focus on Visual Description
- Only describe visual elements—avoid describing sounds, emotions, or abstract narrative ideas.
- If necessary, convert sound into visual equivalents (e.g., “siren sound” → “flashing red light”).

5.6 Do Not Modify
- The storyline
- The order of scenes
- Core emotional tone
- Camera angles
- Lighting or atmosphere unless the scene change logically affects it
- Objects or details unrelated to the replaced subject or environment

6. Collaboration and Clarification
If any instruction is unclear (e.g., ambiguous character roles or scene context), request clarification before editing. Do not make assumptions.

7. Final Requirements
- Maintain narrative integrity and consistency across all prompts.
- Use structured, clean natural-language Chinese paragraphs.
- Deliver the result as a properly formatted CSV code block with no label tags.
- Each paragraph should be self-contained and visually descriptive.

End of Guidelines

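Core principle: keep the plot intact and replace only characters or settings

This prompt system is very easy to use. Just feed the image-generation prompts from Step 2 into Gemini.

🔄 How it works:

- Copy the prompts you generated in Step 2 and paste them into Gemini.
- Specify the element to replace, for example, "Replace the small dog with a golden retriever puppy."
- Gemini outputs a revised prompt set with the updated characters or settings.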
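
Because the guidelines ask for headerless two-column CSV, the revised prompts are easy to check before they go to the image generator. The sketch below is one possible validation step, not part of my Automa scripts. The function name `parse_storyboard_csv` and the rule that shot numbers must run 1, 2, 3, ... without gaps are assumptions for this example.

```python
import csv
from typing import List, Tuple


def parse_storyboard_csv(text: str) -> List[Tuple[int, str]]:
    """Parse 'shot number, "prompt"' rows and check that the numbering is sequential."""
    # Drop Markdown code-fence lines in case the model wraps its answer in one.
    lines = [ln for ln in text.splitlines() if not ln.strip().startswith("```")]
    rows: List[Tuple[int, str]] = []
    # skipinitialspace lets rows like: 2, "prompt" parse as two fields.
    for record in csv.reader(lines, skipinitialspace=True):
        if not record:
            continue  # ignore blank lines
        if len(record) != 2:
            raise ValueError(f"expected 2 fields, got {len(record)}: {record!r}")
        rows.append((int(record[0].strip()), record[1].strip()))
    if [number for number, _ in rows] != list(range(1, len(rows) + 1)):
        raise ValueError("shot numbers are not sequential starting from 1")
    return rows
```

A malformed row raises `ValueError`, which is a convenient point to re-run the Gemini step instead of passing a broken storyboard downstream.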

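💡 Why it matters

The magic of this method lies in what it does not change: the plot stays the same. Gemini adjusts only surface-level elements such as the subject or the environment. That means:

- You can reuse the same storyboard structure to create multiple variants.
- All versions are compatible with the same video-generation prompts.
- You save time while producing varied content from a single base script.

I tested this myself by generating six different versions with exactly the same video-generation instructions, and the results were consistent every time.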

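4. Generating Images with Dreamina

Dreamina (the international AI image tool from CapCut) lets you generate images for free. My RPA script logs in, submits the prompts, and downloads the images automatically. A Python tool I wrote then renames all the images in sequence (1.jpg, 2.jpg, ...) so they slot straight into the next step.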
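
The renaming tool is not reproduced in this post, so the following is only a minimal sketch of how sequential renaming can be done. It assumes the download order can be recovered from each file's modification time (with the file name as a tie-breaker). The function name and the temporary-name prefix are made up for the example.

```python
from pathlib import Path
from typing import List


def rename_sequentially(folder: str, ext: str = ".jpg") -> List[str]:
    """Rename every *ext file in folder to 1.jpg, 2.jpg, ... in download order."""
    files = sorted(
        (p for p in Path(folder).iterdir()
         if p.is_file() and p.suffix.lower() == ext),
        # Assumption: older modification time means downloaded earlier.
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    # Pass 1: move everything to temporary names so that an existing
    # "2.jpg" is never overwritten by another file being renamed to "2.jpg".
    staged = []
    for index, path in enumerate(files, start=1):
        temp = path.with_name(f"__renaming_{index}{ext}")
        path.rename(temp)
        staged.append(temp)
    # Pass 2: assign the final sequential names.
    final_names = []
    for index, temp in enumerate(staged, start=1):
        final = temp.with_name(f"{index}{ext}")
        temp.rename(final)
        final_names.append(final.name)
    return final_names
```

The two-pass rename matters when the folder already contains numbered files from an earlier run. Files with other extensions are left untouched.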

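5. Writing Prompts for Video Generation

I use the Dreamina prompts as input to generate video descriptions for Kling (可灵, Kuaishou's AI video generator). The prompts follow a specific format:

- Camera movement (e.g., handheld, zoom in)
- Subject action (e.g., "the puppy swims toward the child")
- Environmental effects (e.g., "storm waves crashing on the shore")

Note: out of every 10 prompts, about 6 currently produce a usable video. This is still a work in progress.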

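6. Generating Videos with Kling

This step is semi-automated. I wrote scripts that register new Kling accounts, enter the prompts, and download the finished videos. Logging in still has to be done by hand in order to solve the CAPTCHA.

Each account can generate up to 8 videos. Once you are logged in, everything else is script-driven, from creation to download.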
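
Those two numbers (roughly 6 usable videos per 10 prompts, and 8 videos per account) are enough for a back-of-the-envelope estimate of how many accounts a storyboard needs. The helper below is a hypothetical planning aid. It treats the 6-in-10 rate as an average, so real runs will land above or below the estimate.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, without floating-point error."""
    return -(-a // b)


def plan_kling_batch(shots: int, usable: int = 6, out_of: int = 10,
                     videos_per_account: int = 8) -> Tuple[int, int]:
    """Estimate (generation attempts, accounts) needed to get `shots` usable clips."""
    if shots < 0 or not 0 < usable <= out_of or videos_per_account <= 0:
        raise ValueError("invalid planning parameters")
    # On average, out_of attempts yield `usable` good clips.
    attempts = ceil_div(shots * out_of, usable)
    accounts = ceil_div(attempts, videos_per_account)
    return attempts, accounts


attempts, accounts = plan_kling_batch(10)
print(f"10 shots: about {attempts} attempts across {accounts} accounts")
```

For a 10-shot storyboard this gives about 17 attempts, which means three accounts at 8 videos each.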

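Bonus: The Complete Automa Script Suite

To tie all of this together, I use a full set of scripts built on Automa 1.28. With the right setup, you can:

- Scrape Shorts videos
- Parse video scenes with Gemini
- Rebuild prompts with replacement characters
- Generate images automatically in Dreamina
- Generate videos automatically in Kling
- Export results in CSV format

I've also created templates and sample workflows to keep onboarding time to a minimum. The initial setup may feel complicated, but once it is in place, production becomes effortless.

You can access the automation scripts in the following GitHub repository: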

https://github.com/liuyinjiwen06/youtube_automation

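Summary

By combining AI and RPA, I cut production time dramatically while keeping creative control. This workflow helps me:

- Maximize content output with minimal effort
- Scale a single script into multiple variants
- Reuse ideas across multiple channels and niches

This system isn't limited to AI animal stories. Whether you make ASMR, history shorts, or motivational content, the approach adapts well.

If you're exploring YouTube automation, I hope this tutorial saves you time and frustration. If you get stuck or are just curious, feel free to reach out. I'm happy to share more!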
