YouTube research and media operations agent — searches videos, extracts metadata, downloads media, reads comments, fetches transcripts, and transcribes audio locally with Whisper when captions are unavailable.
YouTubeAgent is a research-and-media automation agent built on top of the CoreAgent framework. It gives Claude the ability to search YouTube, pull detailed metadata from videos and channels, read comments, fetch transcripts, download media, clip segments, and transcribe audio locally — all through a set of 8 purpose-built CLI tools orchestrated by a single CLAUDE.md instruction file.
The agent follows a tool-routing pattern: when a user makes a request, the agent reads each tool's README.md to determine the exact flags and expected output format, then invokes the corresponding shell command. Six of the eight tools are thin Python wrappers around yt-dlp, the de-facto YouTube extraction library. The two exceptions are Whisper-CLI, which uses Apple's WhisperKit (CoreML) for on-device speech-to-text, and VideoClipper, which shells out to ffmpeg for lossless segment cutting.
A key architectural decision is the transcript fallback chain: the agent first attempts to fetch YouTube's own captions via YouTubeTranscript; if none are available, it downloads the audio with YouTubeDownload and pipes it through Whisper-CLI for local transcription. A similar pattern exists for clip transcription — VideoClipper extracts the segment, then Whisper-CLI processes it. This guarantees text output for any video, regardless of whether the uploader provided captions.
Every tool writes structured output (JSON or plain text) to its own TOOL-OUTPUT-RESULTS/ directory, making results easy to reference across multi-step research workflows. The agent excels at tasks like competitive analysis, content research, audience sentiment extraction, and media archival — combining metadata lookup, transcript retrieval, and comment analysis into a single conversational flow.
YouTubeAgent is a tool-routing agent — it receives a user request, determines which CLI tool to invoke, reads the tool's README for exact flags, and executes the command. All 6 Python tools share the same pattern: a youtube_*.py script wrapped by a shell entrypoint, using yt-dlp as the YouTube backend. Whisper-CLI is the exception — it uses WhisperKit (CoreML) for on-device transcription and only runs when YouTube captions are unavailable. VideoClipper is a thin wrapper around ffmpeg for cutting segments. Each tool writes its output to its own TOOL-OUTPUT-RESULTS/ directory.
flowchart TD
USER["User Request"]
AGENT["YouTubeAgent"]
SEARCH["YouTubeSearch"]
INFO["YouTubeVideoInfo"]
CHANNEL["YouTubeChannelInfo"]
COMMENTS["YouTubeVideoComments"]
TRANSCRIPT["YouTubeTranscript"]
DOWNLOAD["YouTubeDownload"]
WHISPER["Whisper-CLI"]
CLIPPER["VideoClipper"]
USER --> AGENT
AGENT --> SEARCH
AGENT --> INFO
AGENT --> CHANNEL
AGENT --> COMMENTS
AGENT --> TRANSCRIPT
AGENT --> DOWNLOAD
TRANSCRIPT -.->|"No captions"| DOWNLOAD
DOWNLOAD -.->|"Audio file"| WHISPER
DOWNLOAD -.->|"Media file"| CLIPPER
CLIPPER -.->|"Clipped segment"| WHISPER
style USER fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style AGENT fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style SEARCH fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style INFO fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style CHANNEL fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style COMMENTS fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style TRANSCRIPT fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style DOWNLOAD fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style WHISPER fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
style CLIPPER fill:#f0f0f0,stroke:#888,color:#333,stroke-width:1.5px,rx:10,ry:10
Eight CLI tools organized by function: 6 Python tools using yt-dlp for YouTube API operations, 1 shell-based transcriber using WhisperKit CoreML, and 1 ffmpeg wrapper for media clipping. All Python tools share the same dependency stack (Python 3.10+, yt-dlp, Node.js) and follow a consistent --export / --output-dir pattern for Markdown output.
| Flag | Default | Description |
|---|---|---|
| --query | Required | Search query string |
| --results | 10 | Number of results to return |
| --export | false | Export results to Markdown |
| --output-dir | TOOL-OUTPUT-RESULTS/ | Custom export directory |
| Flag | Default | Description |
|---|---|---|
| --video | Required | YouTube URL or video ID |
| --export | false | Export metadata to Markdown |
| --output-dir | TOOL-OUTPUT-RESULTS/ | Custom export directory |
| Flag | Default | Description |
|---|---|---|
| --channel | Required | Channel handle (e.g. @OpenAI) |
| --sort | recent | Sort order: recent or popularity |
| --results | 50 | Number of videos to fetch |
| --export | false | Export to Markdown |
| Flag | Default | Description |
|---|---|---|
| --video | Required | YouTube URL or video ID |
| --comments | 20 | Number of top-level comments |
| --replies-per-comment | 3 | Replies per comment (0 to disable) |
| --sort | popularity | Sort: popularity or recent |
| --export | false | Export to Markdown |
| Flag | Default | Description |
|---|---|---|
| --video | Required | YouTube URL or video ID |
| --format | lines | Output format: lines or paragraph |
| --language | auto | Preferred source language code |
| --translate-to | none | Target translation language |
| --export | false | Export to Markdown |
| Flag | Default | Description |
|---|---|---|
| --video | Required | YouTube URL or video ID |
| --mode | video | Download mode: video or audio |
| --quality | best | Video quality: best, 2160, 1080, 720, etc. |
| --audio-format | best | Audio format: best, mp3, m4a, wav, etc. |
| --audio-quality | best | Audio bitrate (e.g. 192) |
| --list-formats | false | List all available formats |
| --export | false | Export summary to Markdown |
| Flag | Default | Description |
|---|---|---|
| -a, --audio | Required | Absolute path to input audio file |
| -l, --language | auto | Force language (en, es, etc.) |
| -o, --out-dir | out/ | Override output directory |
| --keep-artifacts | false | Keep JSON, SRT, TXT, and logs |
| Flag | Default | Description |
|---|---|---|
| -i | Required | Input file (absolute path) |
| -ss | Required | Start timestamp (HH:MM:SS) |
| -to | Required | End timestamp (HH:MM:SS) |
| -acodec | libmp3lame | Audio codec for re-encoding |
| -b:a | 192k | Audio bitrate |
| -c:v copy | — | Copy video stream (video files only) |
YouTubeVideoInfo — get metadata (title, channel, views, duration, description)YouTubeTranscript — get captions. If unavailable, offer Whisper fallbackYouTubeVideoComments — get top comments if user wants audience reactionYouTubeChannelInfo — get channel metadata and video list (recent or popular)YouTubeVideoInfo or YouTubeTranscriptYouTubeDownload with --mode audio to get the audio fileWhisper-CLI with the downloaded audio path to generate a Markdown transcriptYouTubeVideoInfo — get metadata to identify chapter timestampsYouTubeDownload with --mode audio to get the full audio fileVideoClipper — clip the relevant segment using the timestampsWhisper-CLI — transcribe only the clipped segmentYouTubeSearch — find videos by queryYouTubeVideoInfo, YouTubeTranscript, or YouTubeVideoComments for deeper analysis--batch flag or a file-based input mode where the user provides a list of URLs and the tool processes them all, aggregating results into a single export.
whisper.cpp or the OpenAI Whisper API on non-macOS systems. The README doesn't mention this platform limitation.
TOOL-OUTPUT-RESULTS/ directory. These files accumulate indefinitely. There's no age-based cleanup, no size limit, and no command to purge old results.
--clean flag on each tool or a top-level script that removes exports older than N days.