Decorative circuit board background pattern
Command-Line Video Editing with Claude Changed Everything

Command-Line Video Editing with Claude Changed Everything

JL

Jay Long

Software Engineer & Founder

Published March 31, 2026

I Didn't Expect to Be Editing Video Today

I had an awesome breakthrough in video editing today, and it came from the last place I expected: the command line.

I didn't have time for video editing. I knew video content was important, but it wasn't high on the priority list. Getting video out, though? That was extremely high priority. I've built up a solid blog with a lot of great stories and ideas, and there really needs to be a video component. Detailed breakdowns, demos, instructional content. The vast majority of articles I've published could benefit from video. Screenshots of pretty much every article would add substantial perspective.

So I picked one. I went with the blog publishing pipeline because it's one of my most-used fully automated pipelines. It runs fast, dependably, and I use it every day. It's had a huge impact on my company, my brand, my clients. I've gotten real clients that trace back to blog articles I've published. New leads, new contracts. The next obvious thing was video.

Why Has Video Been So Hard?

I've done videos in the past, and the amount of time I had to spend in video editing software just to get something barely viable was brutal. And when I say barely viable, I don't mean the ideas were weak or hollow. The soul is there. The story I have to tell, the ideas I want to communicate, I stand behind all of it. It's worth your time. But preparing it in a way that signals quality and effectiveness? Very challenging.

I tend to go down rabbit holes. I go on tangents. It's kind of how my mind works. I've had such great success with the blog publishing pipeline because my editor agent, with all of its skills and voice DNA, is really good at keeping me on track. It's actually more than an editor. There are two agents: one does the hands-on editing, and when that editor filters out parts of my transcript where I go too far off topic, it saves the information and hands it off to the personal assistant. The assistant rolls it into a stand-up for the next day. Keeps it in memory. So I get to freestyle, and enough of my tangents get preserved that it doesn't stomp all over my voice, because going down multiple threads is just how I talk when I'm excited about something.

I actually usually do a decent job of tying tangents back into the main point, explaining why my mind made the connection to something that seems completely unrelated. The blog editor is pretty good at finding that thread. But a lot of why I can make those connections is because my mind also goes on tangents that are genuinely chaotic and random. I'll open up many trains of thought at once while talking. Some hit, some get abandoned. It's just how my perspective evolves.

What If We Just Pointed Claude at Video?

I'd had such good results processing voice memo transcripts that I thought, let's see what we can do with video. At the very least, we could use Whisper to transcribe a long-form video and I could provide screenshots. I'd been doing this a lot with client pipelines already, taking screenshots and bug reports from Slack and building development plans from them. That whole pipeline was running well. So I naturally thought: let's just point it at video and see what happens. Use Whisper to transcribe, send screenshots of significant moments, put it all in a folder, and tell Claude to analyze everything.

I thought I was going to have to manually record a bunch of different shorts from the long-form video. That's not what happened.

Claude Suggested FFmpeg, and It Actually Made Sense

When I told Claude the plan, it came back with something I never expected: "I've got FFmpeg. Python has the libraries. We can process these clips on the command line."

I never once considered editing video on the command line. It seemed crazy. A complete dead end, a time vampire. We should just use some kind of video editing application. But it turns out there's actually some pretty powerful functionality available through the CLI. No AI besides Claude running the scripts. This is not generating video. The only inference being used is Claude analyzing the video content and telling me where to chop it up.

As soon as Claude explained the initial pass, it made perfect sense. Of course there's a Python library that can chop videos up. How many web apps have little simple video editors built in? All I'm doing is describing in plain text what I want done, and Claude is using its understanding of the content to perform those commands. It's kind of like a magic trick. David Copperfield isn't doing real magic. It's sleight of hand, it's a facade, but your brain thinks it's real. Because Claude has such a deep understanding of the video we're editing, it can use basic tools in a way that feels magical.

From 10 Minutes to Under 60 Seconds

Here's how it played out. Claude got the timestamps of every word in the transcript. I provided screenshots of different sections I wanted to highlight, with their timestamps. Between my narration in the raw video, the screenshot sequence, and Claude's ability to analyze what was on each screenshot, it understood every second of the video without even analyzing the actual video file itself. It went in, chopped the video into segments, and created separate media files for each one. Then it wrote me an outline of what I should talk about for each segment.

That's when I pushed it further. I was proud of getting the raw video down to 10 minutes. When I get excited about something I have a tendency to ramble, so 10 felt like an achievement. But Claude said it could identify my core story, highlight the most standout moments, filter out everything tangential, and squeeze the whole thing to under a minute. A legit short.

So we did it. Claude separated the video into segments, identified what was most important in each one, squeezed those segments down to tiny snippets showing just what you need to see on screen to tell the story, then concatenated them all together into a video under 60 seconds. It gave me the end-to-end outline, and I recorded a voiceover.

Bumpers, Silence Trimming, and Speed Ramping

I told Claude to jump over to the CyberWorld codebase, grab the company logo, and make bumpers out of it so you see the brand coming in and going out. I didn't end up using those, though, because I decided to record the voiceover on camera. Claude took the beginning and end of my camera feed as bumpers, so you get me actually talking to the camera about what happened, with the screen recording in between.

Even then there was fine-tuning. Claude identified silence at the beginning and end and cut it out. Then it noticed a gap where the video couldn't keep up with the narration, so it double-speeded that one segment to keep everything moving along. All of this with no graphical user interface. The whole thing didn't even take an hour.

This Means a Video Pipeline Is Coming

The most powerful thing about all of this is that we did it entirely on the command line, entirely by describing what I wanted in plain text. That means we're 80 to 90 percent of the way to automating this thing. I'm going to have my video publishing pipeline this week. It's going to look very similar to the audio transcript pipeline.

This is huge because I thought video was going to be 10x the level of effort compared to publishing text articles. Turns out recording a video and turning it into shorts is almost as fast and effective as recording a voice memo and publishing a blog article. We're almost there.

Share this article

Help others discover this content by sharing it on social media