How I Automated My YouTube Shorts Channel with AI and RPA

Christine Williams

May 13, 2025

AddSubtitle gives brands and creators full control over how their message meets the world. Subtitles, voiceover, and translation—all in one tool to speed up your video workflow. 

Hi, I’m Christine, and in March this year, I started a bold journey—automating a YouTube Shorts channel using AI and RPA. The niche? AI-generated animal stories. Why this niche? Because animals resonate emotionally with audiences, and in the age of short-form content, emotional connection drives views and engagement.

But there was one big problem: producing videos manually is a time sink. Sourcing footage, editing, and publishing takes hours per video. That’s when I decided to go all-in on automation.

During the May holiday, I documented my full automation process. In this blog, I’ll walk you through:

  1. My end-to-end automation strategy – from finding reference videos to generating final visual assets.

  2. How to use my scripts – with step-by-step guidance, so you can implement or adapt the system for your own use.

This framework doesn’t just work for animal content. Master this process, and you can apply it across various AI video niches.

The Core Strategy: Recreate, Refine, and Automate

Let’s be honest—my video creation method is inspired by the best performers in my niche. But I don’t copy; I analyze, deconstruct, and recreate with enhancements.

The pipeline consists of 7 major steps:

  1. Identify top-performing Shorts as references

  2. Break down those videos into storyboard frames

  3. Write AI prompts for each frame (image generation)

  4. Modify elements in prompts to create a unique version

  5. Generate images for each frame

  6. Write video generation prompts for those images

  7. Stitch everything together in an editor

Steps 5 and 7 aren't fully automated yet, but the rest? Entirely handled by RPA (Robotic Process Automation) using Automa in Chrome, including multi-threading via fingerprint browsers.

Step-by-Step Breakdown

1. Sourcing Reference Videos

My script scrapes data from YouTube Shorts with a single hotkey (Ctrl + Alt + S), and supports both single videos and entire channels. The data goes straight into a spreadsheet, saving time and clicks.

⚠️ Pro tip: Use a secondary account for batch scraping to avoid risk.
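Once the scrape lands in a spreadsheet, picking references is just a sort. Below is a minimal sketch (not my Automa workflow) of how the exported file could be filtered in Python; the column names (url, title, views) are assumptions, so adjust them to whatever your export actually contains.

```python
# Minimal sketch: load the scraped spreadsheet (exported as CSV) and pick the
# top-performing Shorts as references. Column names are assumptions.
import csv

def top_shorts(path: str, n: int = 10) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    # View counts may be exported with thousands separators, so strip commas.
    rows.sort(key=lambda r: int(r["views"].replace(",", "")), reverse=True)
    return rows[:n]

# Example usage (file name is hypothetical):
# for row in top_shorts("shorts_scrape.csv"):
#     print(row["views"], row["url"])
```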

2. Extracting Storyboards with Gemini 2.5 Pro

I use Google AI Studio with Gemini 2.5 Pro to break videos into scenes. It analyzes visuals and generates frame-by-frame prompts for image generation.

Step-by-Step Guide

Step 1: Open Google AI Studio
  1. Visit https://aistudio.google.com/prompts/new_chat

  2. Log in with your Google account.

  3. In the top-right dropdown, choose Gemini 2.5 Pro (Flash Experimental) or the latest available model.

🔒 If you’re blocked from analyzing a YouTube video directly, use a browser extension or tool (e.g. 4K Video Downloader) to save the video locally, then upload the file directly into Gemini.

Step 2: Load Your Video into Gemini

Option A: Use YouTube Link

Paste the URL of a publicly accessible YouTube Shorts video.

Option B: Upload a File

If external access is blocked, click the paperclip 📎 icon to upload a local video file.
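If you would rather script this step than work in the AI Studio UI, here is a minimal sketch using the google-generativeai Python package. The model id and file names are assumptions; use whichever model is actually available to you.

```python
# Minimal sketch: upload a locally saved Shorts video to Gemini and ask for a
# scene-by-scene storyboard. Model id and file names are assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder, not a real key

# Upload the video saved with a downloader (e.g. 4K Video Downloader).
video = genai.upload_file(path="reference_short.mp4")

# Video uploads are processed asynchronously; poll until the file is ready.
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-2.5-pro")  # assumed model id
response = model.generate_content([
    video,
    "Break this video into scenes and write one image-generation prompt per scene.",
])
print(response.text)
```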

To ensure high-quality output with Dreamina (an image generator), I use a refined prompt structure:

Camera Angle, Scene Setting, Main Character Description, Action, Facial Expression, Supporting Characters, Background, Time of Day, etc.

This structure ensures clarity for the AI model and consistency across frames.

| Field | Description | Example |
| --- | --- | --- |
| Camera Angle | The viewpoint (e.g., side view, low angle) | "Side angle" |
| Main Character’s Environment | Where they are | "On a rainy cliff edge" |
| Main Character Description | Physical traits | "A man in a white T-shirt and jeans" |
| Main Action | What they’re doing | "Holding up a crying baby" |
| Facial Expression | Emotion, visible reaction | "Angry expression" |
| Supporting Characters | Optional: who else is there | "A police officer running toward them" |
| Supporting Action | What they’re doing | "Shouting" |
| Supporting Expression | Their emotion | "Serious" |
| Background | The setting behind the characters | "Waterfall and misty mountain" |
| Additional Details | Visual effects or atmosphere | "Heavy rain, crashing waves" |
| Time of Day | When it's happening | "At dusk" |

3. Rewriting Prompts to Avoid Plagiarism

Want to make sure your version is original? I built a second Gemini assistant that tweaks core characters, locations, and story elements—while keeping the emotional arc intact.

For instance, you can transform a scene with a pug saving a baby on a stormy beach into one with a golden retriever in a flooded city. The plot remains, but the visual setting changes—making it reusable across multiple themes.

📘 Final Instruction Set for Gemini: Storyboard Prompt Modification

Prompt Editing Guidelines (Simplified and Localized)

1. Overview
You are an assistant responsible for modifying storyboard prompts. Your job is to replace specific characters (e.g., protagonist, animal, villain) or environments (e.g., cliff, forest) based on user instructions, while keeping the story intact.

2. Core Principle
Do not change the core narrative. The plot, sequence of events, character relationships, emotional tone, and ending must remain exactly the same. Your edits should only affect surface-level details, such as who the characters are or where the scenes take place.

3. Input Format
You will be given a list of prompts, typically numbered (e.g., "Prompt 1", "Prompt 2", etc.). Each prompt is a Chinese-language paragraph describing a visual scene.

4. Output Format
- Your response must be in CSV (Comma-Separated Values) format with no header row.
- Each line must contain two fields:
  (1) Shot number (e.g., 1, 2, 3...)
  (2) The modified Chinese prompt as a natural-language paragraph.
- The paragraph must be enclosed in English double quotation marks (" ").
- The prompt structure should follow this format:

  [Camera Angle]. [Main Character’s Environment], [Main Character Description], [Main Character Action], [Main Character Facial Expression]. (Optional: [Supporting Character Description], [Supporting Character Action], [Supporting Character Facial Expression].) [Background Description]. [Additional Visual Elements]. [Time of Day].

- Use periods to separate major blocks of visual information.
- Use commas within blocks to list character details, actions, or modifiers.
- If a particular category (e.g., facial expression, supporting characters) doesn’t apply to a scene, omit it without leaving blank fields.

5. Character Replacement Rules
5.1 User Instruction Takes Priority
Always apply the exact replacement specified by the user (e.g., “Replace pug with golden retriever puppy”).

5.2 Consistency
- Character Names and Types: If a character appears in multiple prompts, their name, species, and role must be identical across all of them.
- Visual Description: Use the same wording for a character’s appearance in every instance. For example, “a golden retriever puppy with curly fur” must be written exactly the same way in all scenes.
- Scene Descriptions: If you replace a location (e.g., cliff → jungle), update all prompts that reference it to use the new scene consistently.

5.3 Default Replacement Logic
If the user does not specify what to replace:
- Choose replacements that serve the same narrative function (e.g., an animal saving a child should still be an animal capable of that action).
- Adjust physical actions to match the new subject (e.g., a robot cannot cry—use “flashing red lights” instead of “crying”).
- Respect ethnic or character attributes if mentioned (e.g., “a European man” must appear as such in every prompt).
- Always include quantity markers in Chinese (e.g., “一个婴儿” (one baby), “一名警察” (one police officer)).
- Limit each character to one clear, visual facial expression per prompt.

5.4 Scene Replacement Logic
- If you change a scene (e.g., cliff → jungle), ensure all environmental elements match the new setting (e.g., “crashing waves” → “dense fog”, “rocky ledge” → “muddy slope”).
- Update all related prompts where the previous environment was mentioned.
- Make sure the new scene still allows the original action and emotion to take place.

5.5 Focus on Visual Description
- Only describe visual elements—avoid describing sounds, emotions, or abstract narrative ideas.
- If necessary, convert sound into visual equivalents (e.g., “siren sound” → “flashing red light”).

5.6 Do Not Modify
- The storyline
- The order of scenes
- Core emotional tone
- Camera angles
- Lighting or atmosphere unless the scene change logically affects it
- Objects or details unrelated to the replaced subject or environment

6. Collaboration and Clarification
If any instruction is unclear (e.g., ambiguous character roles or scene context), request clarification before editing. Do not make assumptions.

7. Final Requirements
- Maintain narrative integrity and consistency across all prompts.
- Use structured, clean natural-language Chinese paragraphs.
- Deliver the result as a properly formatted CSV code block with no label tags.
- Each paragraph should be self-contained and visually descriptive.

End of Guidelines

Core Principle: Keep the Plot Intact — Only Swap Characters or Scenes

This prompt system is incredibly easy to use. All you need to do is feed the image-generation prompts from Step 2 into Gemini.

🔄 How It Works:

  1. Copy and paste the prompts you generated in Step 2 into Gemini.

  2. Specify which elements to replace — for example, “Replace the pug with a golden retriever puppy.”

  3. Gemini will output a revised set of prompts with updated characters or settings, returned as the headerless CSV described in the guidelines (a sketch for loading that CSV follows this list).
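Since the assistant is instructed to return a two-column CSV (shot number, quoted prompt paragraph), the output is easy to feed into the rest of the pipeline. Here is a minimal sketch of loading it back into Python; the file name is hypothetical.

```python
# Minimal sketch: read the headerless two-column CSV returned by the
# modification assistant (shot number, modified prompt paragraph).
import csv

def load_modified_prompts(path: str) -> dict[int, str]:
    prompts = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) != 2:
                continue  # skip malformed lines
            shot, prompt = row
            prompts[int(shot)] = prompt.strip()
    return prompts

# Example usage (file name is hypothetical):
# prompts = load_modified_prompts("golden_retriever_version.csv")
```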

💡 Why This Matters

The magic of this method lies in what it doesn’t change: the storyline remains untouched. Gemini only adjusts surface-level elements like subjects or environments. This means:

  • You can reuse the same storyboard structure to create multiple variations.

  • All versions remain compatible with the same video generation prompts.

  • You save time while producing a range of content from a single base script.

I've tested this personally—generated six alternate versions using the exact same video-generation instructions, and the results were consistently excellent.

4. Generating Images with Dreamina

Dreamina (CapCut’s international AI image tool) allows free image generation. My RPA script logs in, submits prompts, and downloads images automatically. All images are then renamed in sequence (1.jpg, 2.jpg…) using a Python tool I wrote for seamless integration in the next step.
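My exact tool isn’t shown here, but a minimal sketch of that renaming step looks like this. It assumes all downloads sit in one folder and that download order (file modification time) matches storyboard order; it copies into a clean output folder to avoid overwriting anything.

```python
# Minimal sketch: copy downloaded Dreamina images into a clean folder,
# renamed in storyboard order as 1.jpg, 2.jpg, ...
import shutil
from pathlib import Path

def rename_in_sequence(src_folder: str, dst_folder: str) -> None:
    src, dst = Path(src_folder), Path(dst_folder)
    dst.mkdir(exist_ok=True)
    # Assumes download time reflects the order the prompts were submitted in.
    images = sorted(src.glob("*.jpg"), key=lambda p: p.stat().st_mtime)
    for i, img in enumerate(images, start=1):
        shutil.copy2(img, dst / f"{i}.jpg")

# Example usage (folder names are hypothetical):
# rename_in_sequence("dreamina_downloads", "storyboard_frames")
```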

5. Writing Prompts for Video Generation

I use the Dreamina prompts as input to generate video descriptions for Kling (可灵), Kuaishou’s AI video generator. Prompts follow a specific format (a sketch after this list shows one way to assemble them):

  • Camera movement (e.g. handheld, zoom-in)

  • Subject action (e.g. "the puppy swims towards the child")

  • Environmental effects (e.g. "stormy waves crashing")

Note: Out of 10 prompts, around 6 result in usable videos currently—still a work in progress.
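As a sketch of that format in practice, the snippet below pairs each renamed frame (1.jpg, 2.jpg, …) with a video prompt built from the three parts listed above and writes a job list for the RPA to work through. The file layout and field names are assumptions, not my exact script.

```python
# Minimal sketch: build a Kling job list pairing each frame image with a
# video prompt composed of camera movement, subject action, and environment.
import csv
from pathlib import Path

def build_job_list(image_dir: str, motions: list[dict], out_csv: str) -> None:
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for i, m in enumerate(motions, start=1):
            prompt = f"{m['camera']}, {m['action']}, {m['environment']}"
            writer.writerow([str(Path(image_dir) / f"{i}.jpg"), prompt])

motions = [
    {"camera": "handheld zoom-in",
     "action": "the puppy swims towards the child",
     "environment": "stormy waves crashing"},
]
# Example usage (paths are hypothetical):
# build_job_list("storyboard_frames", motions, "kling_jobs.csv")
```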

6. Video Generation with Kling

This step is semi-automated. I wrote scripts to register new Kling accounts, input prompts, and download the final videos. Manual login is required due to CAPTCHA.

Each account generates up to 8 videos. Once logged in, everything else is script-driven—from creation to download.

Bonus: Full Automa Script Suite

To tie everything together, I use a full suite of scripts built on Automa 1.28. With proper setup, you can:

  • Scrape Shorts videos

  • Parse video scenes with Gemini

  • Rebuild prompts with alternate characters

  • Auto-generate images in Dreamina

  • Auto-generate videos in Kling

  • Export results in CSV format

I also created templates and sample workflows to minimize onboarding time. Setup can feel complex initially, but once in place, your production becomes effortless.

You can access the automation scripts in the following GitHub repository:

https://github.com/liuyinjiwen06/youtube_automation


Final Thoughts

By combining AI with RPA, I drastically cut down my production time while keeping creative control. This workflow helped me:

  • Maximize content output with minimal effort

  • Scale variations from a single script

  • Repurpose ideas across multiple channels and niches

This system isn’t limited to AI animal stories. Whether you're making ASMR, history shorts, or motivational content—this approach is adaptable.

If you're exploring the YouTube automation game, I hope this walkthrough saves you time and frustration. And if you’re stuck or curious, feel free to reach out—I’m happy to share more!
