Tag: Cursor

  • How to Manage Your WordPress Blog from Cursor Using MCP

    I just published a blog post to my WordPress site without ever opening wp-admin. No browser tab. No Gutenberg editor. I wrote the content in Cursor, told my AI agent to publish it, and it did — complete with categories, tags, and proper formatting.

    This isn’t a hack or a workaround. WordPress now has official MCP (Model Context Protocol) support, which means any AI agent — Cursor, Claude Code, Codex, Gemini CLI — can interact with your WordPress site programmatically. Create posts. Upload media. Manage plugins. Moderate comments. All from the same interface where you write code.

    Here’s how I set it up and why it’s worth doing.

    Why Bother?

    If you already use AI agents for coding, you know the flow: you describe what you want, the agent writes it, and you review. But the moment you need to publish something to WordPress, you break out of that flow entirely. Open a browser. Log in. Navigate to the editor. Paste content. Format it. Add categories. Hit publish.

    With MCP, the agent handles all of that. You stay in your IDE. The content goes from draft to published without a context switch. And it’s not just posts — you get full access to your WordPress admin capabilities.

    The Stack

    Three WordPress plugins and one MCP client configuration. That’s the entire setup.

• Abilities API — WordPress core framework for declaring machine-readable capabilities (source: GitHub)
    • MCP Adapter — bridges WordPress abilities to the MCP protocol so AI agents can discover and call them (source: GitHub)
    • MCP Expose Abilities — registers 61 core WordPress abilities: posts, pages, media, plugins, users, comments, menus, options (source: GitHub)
    • MCP client config — connects your AI agent (Cursor, Claude Code, etc.) to the WordPress MCP server via HTTP (lives in your ~/.cursor/mcp.json)

    The Abilities API and MCP Adapter are official WordPress packages, maintained by the WordPress core team. MCP Expose Abilities is a free, open-source plugin by Devenia that registers the actual content management abilities.

    Setup: WordPress Side

    Step 1: Install the Plugins

    Go to your WordPress admin (your-site.com/wp-admin) and install these three plugins in order:

    1. Abilities API — download from GitHub releases, upload via Plugins → Add New → Upload Plugin, then activate
    2. MCP Adapter — download from GitHub releases (v0.4.1), install the same way, activate
    3. MCP Expose Abilities — download from GitHub releases, install, activate

    The order matters. Abilities API provides the foundation, MCP Adapter bridges it to MCP, and MCP Expose Abilities registers the actual WordPress capabilities.

    Step 2: Create an Application Password

    The MCP client needs to authenticate with your WordPress site. WordPress Application Passwords are the cleanest way to do this.

    1. Go to Users → Your Profile
    2. Scroll down to the Application Passwords section
    3. Enter a name like cursor-mcp
    4. Click Add New Application Password
    5. Copy the generated password — you only see it once

    This password is scoped to API access only. It can’t be used to log into wp-admin directly, which is a nice security boundary.
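Application Passwords work as ordinary HTTP Basic auth against the WordPress REST API. Before wiring up MCP, you can sanity-check the credentials against the standard /wp-json/wp/v2/users/me endpoint. A minimal sketch, assuming placeholder credentials and site URL:

```python
import base64
import urllib.request

def basic_auth_header(username: str, app_password: str) -> str:
    """Build the HTTP Basic auth header WordPress expects for
    Application Passwords (base64 of "username:password")."""
    token = base64.b64encode(f"{username}:{app_password}".encode()).decode()
    return f"Basic {token}"

def whoami(site: str, username: str, app_password: str) -> bytes:
    """GET /wp-json/wp/v2/users/me: returns your user record if auth works."""
    req = urllib.request.Request(
        f"{site}/wp-json/wp/v2/users/me",
        headers={"Authorization": basic_auth_header(username, app_password)},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# Example (placeholder values):
# print(whoami("https://your-site.com", "your-username", "abcd efgh ijkl mnop"))
```

If this returns your user JSON, the same username and password will work in the MCP client config below.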

    Setup: Cursor Side

    For remote WordPress sites, the connection goes through @automattic/mcp-wordpress-remote, an official proxy package that handles the HTTP transport.

    Add this to your ~/.cursor/mcp.json inside the mcpServers object:

    "wordpress-blog": {
      "command": "npx",
      "args": [
        "-y",
        "@automattic/mcp-wordpress-remote@latest"
      ],
      "env": {
        "WP_API_URL": "https://your-site.com/wp-json/mcp/mcp-adapter-default-server",
        "WP_API_USERNAME": "your-username-or-email",
        "WP_API_PASSWORD": "your-application-password"
      }
    }

    Replace the URL, username, and password with your actual values. Restart Cursor to pick up the new MCP server.

    What You Get: 68 WordPress Abilities

    Once connected, you can ask your agent to discover what’s available. Behind the scenes, it calls the mcp-adapter-discover-abilities tool and gets back a full inventory. Here’s what MCP Expose Abilities registers:

• Content — list, get, create, update, delete, patch, search posts & pages; manage revisions, categories, tags
    • Media — upload (from URL), get, update, delete media items
    • Plugins — list, upload, activate, deactivate, delete plugins
    • Menus — list, create menus; add, update, delete items; assign locations
    • Users — list, get, create, update, delete users
    • Comments — list, get, create, reply, update status, delete comments
    • Options — get, update, list site options
    • System — site info, environment info, debug log, toggle debug mode, transients

    There are also 12 optional add-ons for Elementor, Rank Math, Wordfence, Cloudflare, GeneratePress, and more — install only the ones your site uses.

    Using It in Practice

    Here’s what it looks like in a real session. I asked my Cursor agent to publish a blog post I’d just drafted. It:

    1. Called content/list-categories to find the right category ID
    2. Created 5 new tags via content/create-tag
    3. Called content/create-post with the full HTML content, category, and excerpt
    4. Called content/update-post to attach the tags

    The post was live as a draft in under 10 seconds. I reviewed it in wp-admin, made one formatting tweak (which I also did via MCP), and published.

    The agent calls look like this under the hood:

    // Discover what's available
    mcp-adapter-discover-abilities {}
    
    // Create a post
    mcp-adapter-execute-ability {
      "ability_name": "content/create-post",
      "parameters": {
        "title": "My Blog Post Title",
        "content": "<!-- wp:paragraph -->\n<p>Your content here...</p>\n<!-- /wp:paragraph -->",
        "status": "draft",
        "category_ids": [6],
        "excerpt": "A short description of the post."
      }
    }
    
    // Create and assign tags
    mcp-adapter-execute-ability {
      "ability_name": "content/create-tag",
      "parameters": { "name": "AI" }
    }
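On the wire, calls like these are JSON-RPC 2.0 requests using the MCP `tools/call` method. A rough sketch of the payload the client sends, assuming the standard MCP framing (the field names come from the MCP spec; the ability name is the one registered by the plugin):

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# The create-post call from above, as a wire payload:
payload = mcp_tool_call(1, "mcp-adapter-execute-ability", {
    "ability_name": "content/create-post",
    "parameters": {"title": "My Blog Post Title", "status": "draft"},
})
```

In practice the proxy package handles this framing for you; the sketch is just to demystify what "the agent called a tool" means.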

    Tips and Gotchas

    Use Gutenberg Block Markup

    If you send raw HTML, WordPress will render it but your theme’s block styles won’t apply. For proper formatting, wrap content in Gutenberg block comments:

    <!-- wp:paragraph -->
    <p>Your paragraph text here.</p>
    <!-- /wp:paragraph -->
    
    <!-- wp:code -->
    <pre class="wp-block-code"><code>your code here</code></pre>
    <!-- /wp:code -->
    
    <!-- wp:heading -->
    <h2 class="wp-block-heading">Your Heading</h2>
    <!-- /wp:heading -->

    This is what the Gutenberg editor produces internally. If you tell your agent to use this format, your posts will look identical to ones created in the editor.
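If you generate content programmatically, a small helper keeps the block comments consistent. A sketch using the standard core block names (the helper functions themselves are hypothetical, not part of any plugin):

```python
def gutenberg_paragraphs(paragraphs: list[str]) -> str:
    """Wrap plain-text paragraphs in core/paragraph block markup."""
    blocks = [
        f"<!-- wp:paragraph -->\n<p>{text}</p>\n<!-- /wp:paragraph -->"
        for text in paragraphs
    ]
    return "\n\n".join(blocks)

def gutenberg_heading(text: str, level: int = 2) -> str:
    """Wrap a heading in core/heading block markup.
    Gutenberg only serializes the level attribute when it isn't the default h2."""
    attrs = "" if level == 2 else f' {{"level":{level}}}'
    return (
        f"<!-- wp:heading{attrs} -->\n"
        f'<h{level} class="wp-block-heading">{text}</h{level}>\n'
        f"<!-- /wp:heading -->"
    )
```

The joined output can go straight into the `content` parameter of content/create-post.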

    Draft First, Publish Later

    Always create posts with "status": "draft" so you can review before publishing. You can then either publish from wp-admin or tell the agent to update the status:

    mcp-adapter-execute-ability {
      "ability_name": "content/update-post",
      "parameters": {
        "id": 48,
        "status": "publish"
      }
    }

    The Abilities API Is Now in WordPress Core

    As of WordPress 6.9, the Abilities API has been merged into core. If you’re on 6.9+, you may not need the separate Abilities API plugin. The MCP Adapter and MCP Expose Abilities plugins are still required as separate installs.

    Beyond Blog Posts

    The real power isn’t just publishing posts faster. It’s having your entire WordPress site accessible as part of your AI workflow. Some things I plan to use this for:

    • Batch content updates — update metadata, fix formatting, or patch content across multiple posts in one session
    • Plugin management — check plugin status, activate/deactivate, or upload new plugins without leaving the IDE
    • Comment moderation — review and respond to comments as part of a daily routine, all from the terminal
    • Site diagnostics — check debug logs, site info, and environment details when troubleshooting
    • Content pipelines — research a topic, draft in Obsidian, review, then publish to WordPress — all in one agent session

    That last one is exactly what I did today. I researched a topic, drafted a blog post, saved it to my Obsidian vault, then published it to WordPress — all without leaving Cursor.

    The Full Setup Checklist

    For reference, here’s everything in one place:

    1. Install Abilities API plugin (or upgrade to WordPress 6.9+)
    2. Install MCP Adapter plugin
    3. Install MCP Expose Abilities plugin
    4. Create an Application Password in Users → Your Profile
    5. Add the MCP server config to ~/.cursor/mcp.json
    6. Restart Cursor
    7. Ask your agent: “discover my WordPress abilities”

    That’s it. Your WordPress site is now part of your AI workflow.
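If something doesn't connect, the most common culprit is a typo in the config entry. A quick sketch for checking that the server entry has all three env keys (the server name matches the example config above; the helper is hypothetical):

```python
import json

REQUIRED_ENV = {"WP_API_URL", "WP_API_USERNAME", "WP_API_PASSWORD"}

def missing_env_keys(mcp_config: dict, server: str = "wordpress-blog") -> list[str]:
    """Return the env keys the given mcpServers entry is missing."""
    entry = mcp_config.get("mcpServers", {}).get(server, {})
    return sorted(REQUIRED_ENV - set(entry.get("env", {})))

# Example: a config that only sets the URL is missing both credentials.
sample = {"mcpServers": {"wordpress-blog": {"env": {"WP_API_URL": "https://..."}}}}
# missing_env_keys(sample) reports the two credential keys
```

Point it at the parsed contents of ~/.cursor/mcp.json to audit the real file.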


    Credits and Links:

    This post was written, formatted, and published to WordPress entirely from within Cursor, using the exact setup described above.

  • How to Make AI Agents Understand Videos

    Cursor, Claude Code, Codex — they can read your codebase, write code, run shell commands, and even browse the web. But hand them a video file and they’re blind.

    This isn’t a minor gap. Screen recordings of bugs, product demo videos, YouTube tutorials you want to reference in code — video is everywhere in modern development workflows. Yet when you drop a .mp4 into a conversation, your AI agent has no idea what to do with it.

    I ran into this exact limitation and went looking for a solution. Here’s what I found, what broke, and how I fixed it.

    The Problem: AI Agents Can’t See Videos

    Most AI coding agents — including those powered by Claude — support image inputs natively. You can screenshot your UI bug and ask “what’s wrong here?” and get a useful answer.

    But video? Nothing. The Read tool in Cursor supports JPEG, PNG, GIF, and WebP. No MP4. No MOV. No video URLs. If you ask an agent to “watch this video,” it’ll politely tell you it can’t.

    The workaround people suggest is extracting frames with ffmpeg and feeding them as images. That works for visual-only content, but you lose audio, context, and temporal understanding. A series of screenshots doesn’t tell you what someone said or in what order things happened.
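For the record, that frame-extraction workaround looks roughly like this. The sketch only builds the ffmpeg command rather than running it; the paths and sampling rate are illustrative:

```python
import subprocess

def frame_extract_cmd(video_path: str, out_dir: str, fps: float = 1.0) -> list[str]:
    """Build an ffmpeg command that samples frames at `fps` per second
    into numbered JPEGs, the images you would then feed to the agent."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",          # fps=1.0 means one frame per second
        f"{out_dir}/frame_%04d.jpg",  # frame_0001.jpg, frame_0002.jpg, ...
    ]

cmd = frame_extract_cmd("/tmp/bug-repro.mp4", "/tmp/frames")
# subprocess.run(cmd, check=True)  # uncomment if ffmpeg is installed
```

Even done well, this captures only the visual track, which is exactly the limitation the skill below removes.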

    The Solution: The video-understand Skill

    The open agent skills ecosystem (via skills.sh) has a growing collection of skills that extend what agents can do. I found one that solves the video problem elegantly.

    video-understand by jrusso1020 is a multi-provider video understanding skill that gives AI agents the ability to analyze videos — both visual content and audio.

    The clever part: it doesn’t try to make the agent itself process video. Instead, it routes the video to external models that can handle it natively (like Google’s Gemini), and returns the structured analysis as text that the agent can work with.

    How It Works

    The skill includes Python scripts that:

    1. Auto-detect available providers based on which API keys you have set
    2. Upload and process video through the best available provider
    3. Return structured JSON with the analysis, transcript, and metadata

    It supports 9 providers with automatic fallback:

    1. Gemini (Google AI Studio) — full video understanding, visual + audio (free tier available)
    2. Vertex AI — same as Gemini, enterprise tier (pay-as-you-go)
    3. OpenRouter — routes to Gemini models (free tier available)
    4. FFMPEG + Whisper — frame extraction + audio transcription (free, runs locally)
    5–9. OpenAI, AssemblyAI, Deepgram, Groq, Local Whisper — audio transcription only (cost varies)

    You can also pass custom prompts — ask specific questions about the video, request timestamps, or extract particular information.

    Installation

    npx skills add jrusso1020/video-understand-skills@video-understand -g -y

    Set up at least one provider. The simplest is Gemini:

    1. Go to aistudio.google.com
    2. Click Get API Key, then Create API Key
    3. Add to your shell config:
    echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.zshrc
    source ~/.zshrc

    Install the Python SDK and CLI tools:

    pip install google-genai
    brew install ffmpeg yt-dlp

    Verify everything works:

    python3 ~/.agents/skills/video-understand/scripts/check_providers.py

    Usage

    Process a local video:

    python3 ~/.agents/skills/video-understand/scripts/process_video.py /path/to/video.mp4 \
      -p "Describe what happens in this video"

    Process a YouTube video (download first, then analyze):

    yt-dlp -f "best[ext=mp4]" -o /tmp/video.mp4 "https://youtube.com/watch?v=..."
    python3 ~/.agents/skills/video-understand/scripts/process_video.py /tmp/video.mp4 \
      -p "Summarize the key points"

    The output is clean JSON:

    {
      "source": {
        "type": "local",
        "path": "/tmp/video.mp4",
        "duration_seconds": 19.13,
        "size_mb": 0.3
      },
      "provider": "gemini",
      "model": "gemini-3-flash-preview",
      "capability": "full_video",
      "response": "The video shows a young man standing in front of..."
    }

    The Bug: Deprecated SDK Breaks Everything

    Here’s where things got interesting. I installed the skill, set my Gemini API key, and ran the test. It failed immediately:

    googleapiclient.errors.HttpError: <HttpError 400 when requesting
    https://generativelanguage.googleapis.com/upload/v1beta/files?...
    returned "API key expired. Please renew the API key.">

    My key was brand new. I had just generated it 30 seconds ago.

    The real issue was buried in a warning that appeared before the error:

    FutureWarning: All support for the `google.generativeai` package has ended.
    It will no longer be receiving updates or bug fixes.
    Please switch to the `google.genai` package as soon as possible.

    The skill was using google-generativeai — the old, deprecated Python SDK for Gemini. Google has fully sunset this package and replaced it with google-genai. The old package’s file upload API no longer works with current API keys, producing a misleading “API key expired” error even with valid keys.

    The Fix

    The core change was in the process_with_gemini() function. Here’s what the old code looked like:

    # Old — broken (deprecated SDK)
    import google.generativeai as genai
    
    genai.configure(api_key=api_key)
    genai_model = genai.GenerativeModel(model_name)
    video_file = genai.upload_file(source)
    response = genai_model.generate_content([prompt, video_file])

    And the updated version using the new SDK:

    # New — working (current SDK)
    from google import genai
    from google.genai import types
    
    client = genai.Client(api_key=api_key)
    video_file = client.files.upload(file=source)
    
    response = client.models.generate_content(
        model=model_name,
        contents=[
            types.Content(
                parts=[
                    types.Part.from_uri(
                        file_uri=video_file.uri,
                        mime_type=video_file.mime_type
                    ),
                    types.Part.from_text(text=prompt),
                ]
            )
        ],
    )

    The new google.genai SDK uses a Client-based architecture instead of the old module-level configuration. Content is constructed with typed Part objects rather than raw dicts.

    I updated all 7 files — the main script, setup checker, SKILL.md, README, requirements.txt, and the reference docs — to use the new SDK throughout.

    The Forked Repo

    I’ve submitted a PR to the original repo with the fix. Until that’s merged, you can install directly from my fork which has the fix on the main branch:

    npx skills add sarvesh-ghl/video-understand-skills@video-understand -g -y

    Forked repo: github.com/sarvesh-ghl/video-understand-skills

    Testing It

    To verify it works, I tested with the first video ever uploaded to YouTube — “Me at the zoo” by Jawed Karim:

    yt-dlp -f "worst[ext=mp4]" -o /tmp/test.mp4 "https://www.youtube.com/watch?v=jNQXAC9IVRw"
    python3 ~/.agents/skills/video-understand/scripts/process_video.py /tmp/test.mp4 \
      -p "What is happening in this video? Who is the person?"

    Gemini’s response:

    “The man in this video is Jawed Karim, one of the co-founders of YouTube. In the video, Karim is standing in front of two elephants at the San Diego Zoo. He’s talking about how cool the elephants are, specifically pointing out their ‘really, really, really long trunks.’ This video, titled ‘Me at the zoo,’ was the first video ever uploaded to YouTube.”

    Full video understanding — visual identification, audio transcription, and even historical context — all from an AI agent that couldn’t process video 20 minutes earlier.

    Why This Matters

    Video is becoming a primary medium for technical communication. Screen recordings for bug reports. Loom videos for async standups. YouTube tutorials for onboarding. Product demos for stakeholders.

    If your AI agent can’t process video, it’s missing a significant chunk of the context it needs to be genuinely useful. This skill bridges that gap — not perfectly, not natively, but practically.

    The open skills ecosystem is what makes this possible. Someone built a skill, shared it publicly, and now any agent — Cursor, Claude Code, Codex, Gemini CLI — can understand video. When the SDK broke, I fixed it and contributed back. That’s how open source is supposed to work.


    Credits:

    • video-understand skill by jrusso1020 — the original author who built the multi-provider video understanding system
    • skills.sh — the open agent skills ecosystem where these extensions are discovered and shared
    • Google Gemini — the underlying vision model that makes full video understanding possible

    Links: