Blog

  • How to Manage Your WordPress Blog from Cursor Using MCP

    I just published a blog post to my WordPress site without ever opening wp-admin. No browser tab. No Gutenberg editor. I wrote the content in Cursor, told my AI agent to publish it, and it did — complete with categories, tags, and proper formatting.

    This isn’t a hack or a workaround. WordPress now has official MCP (Model Context Protocol) support, which means any AI agent — Cursor, Claude Code, Codex, Gemini CLI — can interact with your WordPress site programmatically. Create posts. Upload media. Manage plugins. Moderate comments. All from the same interface where you write code.

    Here’s how I set it up and why it’s worth doing.

    Why Bother?

    If you already use AI agents for coding, you know the flow: you describe what you want, the agent writes it, and you review. But the moment you need to publish something to WordPress, you break out of that flow entirely. Open a browser. Log in. Navigate to the editor. Paste content. Format it. Add categories. Hit publish.

    With MCP, the agent handles all of that. You stay in your IDE. The content goes from draft to published without a context switch. And it’s not just posts — you get full access to your WordPress admin capabilities.

    The Stack

    Three WordPress plugins and one MCP client configuration. That’s the entire setup.

    Component | What It Does | Source
    --- | --- | ---
    Abilities API | WordPress core framework for declaring machine-readable capabilities | GitHub
    MCP Adapter | Bridges WordPress abilities to the MCP protocol so AI agents can discover and call them | GitHub
    MCP Expose Abilities | Registers 61 core WordPress abilities (posts, pages, media, plugins, users, comments, menus, options) | GitHub
    MCP client config | Connects your AI agent (Cursor, Claude Code, etc.) to the WordPress MCP server via HTTP | Your ~/.cursor/mcp.json

    The Abilities API and MCP Adapter are official WordPress packages, maintained by the WordPress core team. MCP Expose Abilities is a free, open-source plugin by Devenia that registers the actual content management abilities.

    Setup: WordPress Side

    Step 1: Install the Plugins

    Go to your WordPress admin (your-site.com/wp-admin) and install these three plugins in order:

    1. Abilities API — download from GitHub releases, upload via Plugins → Add New → Upload Plugin, then activate
    2. MCP Adapter — download from GitHub releases (v0.4.1), install the same way, activate
    3. MCP Expose Abilities — download from GitHub releases, install, activate

    The order matters. Abilities API provides the foundation, MCP Adapter bridges it to MCP, and MCP Expose Abilities registers the actual WordPress capabilities.

    Step 2: Create an Application Password

    The MCP client needs to authenticate with your WordPress site. WordPress Application Passwords are the cleanest way to do this.

    1. Go to Users → Your Profile
    2. Scroll down to the Application Passwords section
    3. Enter a name like cursor-mcp
    4. Click Add New Application Password
    5. Copy the generated password — you only see it once

    This password is scoped to API access only. It can’t be used to log into wp-admin directly, which is a nice security boundary.
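    Under the hood, Application Passwords are sent as standard HTTP Basic auth, and WordPress ignores the display spaces in the generated password. A minimal Python sketch of building the header (the credentials shown are placeholders, and the helper name is my own):

    ```python
    import base64

    def basic_auth_header(username: str, app_password: str) -> str:
        """Build the HTTP Basic Authorization header WordPress expects.

        WordPress displays Application Passwords with spaces (e.g. "abcd efgh ...");
        the spaces are ignored during validation, so we strip them here.
        """
        token = base64.b64encode(
            f"{username}:{app_password.replace(' ', '')}".encode()
        ).decode()
        return f"Basic {token}"

    # Hypothetical credentials for illustration only:
    header = basic_auth_header("admin", "abcd efgh ijkl mnop qrst uvwx")
    ```

    The MCP proxy below builds this header for you from the env vars; the sketch is just to show what travels over the wire.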

    Setup: Cursor Side

    For remote WordPress sites, the connection goes through @automattic/mcp-wordpress-remote, an official proxy package that handles the HTTP transport.

    Add this to your ~/.cursor/mcp.json inside the mcpServers object:

    "wordpress-blog": {
      "command": "npx",
      "args": [
        "-y",
        "@automattic/mcp-wordpress-remote@latest"
      ],
      "env": {
        "WP_API_URL": "https://your-site.com/wp-json/mcp/mcp-adapter-default-server",
        "WP_API_USERNAME": "your-username-or-email",
        "WP_API_PASSWORD": "your-application-password"
      }
    }

    Replace the URL, username, and password with your actual values. Restart Cursor to pick up the new MCP server.

    What You Get: 68 WordPress Abilities

    Once connected, you can ask your agent to discover what’s available. Behind the scenes, it calls the mcp-adapter-discover-abilities tool and gets back a full inventory. Here’s what MCP Expose Abilities registers:

    Category | Abilities
    --- | ---
    Content | List, get, create, update, delete, patch, search posts & pages; manage revisions, categories, tags
    Media | Upload (from URL), get, update, delete media items
    Plugins | List, upload, activate, deactivate, delete plugins
    Menus | List, create menus; add, update, delete items; assign locations
    Users | List, get, create, update, delete users
    Comments | List, get, create, reply, update status, delete comments
    Options | Get, update, list site options
    System | Site info, environment info, debug log, toggle debug mode, transients

    There are also 12 optional add-ons for Elementor, Rank Math, Wordfence, Cloudflare, GeneratePress, and more — install only the ones your site uses.

    Using It in Practice

    Here’s what it looks like in a real session. I asked my Cursor agent to publish a blog post I’d just drafted. It:

    1. Called content/list-categories to find the right category ID
    2. Created 5 new tags via content/create-tag
    3. Called content/create-post with the full HTML content, category, and excerpt
    4. Called content/update-post to attach the tags

    The post was live as a draft in under 10 seconds. I reviewed it in wp-admin, made one formatting tweak (which I also did via MCP), and published.

    The agent calls look like this under the hood:

    // Discover what's available
    mcp-adapter-discover-abilities {}
    
    // Create a post
    mcp-adapter-execute-ability {
      "ability_name": "content/create-post",
      "parameters": {
        "title": "My Blog Post Title",
        "content": "<!-- wp:paragraph -->\n<p>Your content here...</p>\n<!-- /wp:paragraph -->",
        "status": "draft",
        "category_ids": [6],
        "excerpt": "A short description of the post."
      }
    }
    
    // Create and assign tags
    mcp-adapter-execute-ability {
      "ability_name": "content/create-tag",
      "parameters": { "name": "AI" }
    }

    Tips and Gotchas

    Use Gutenberg Block Markup

    If you send raw HTML, WordPress will render it but your theme’s block styles won’t apply. For proper formatting, wrap content in Gutenberg block comments:

    <!-- wp:paragraph -->
    <p>Your paragraph text here.</p>
    <!-- /wp:paragraph -->
    
    <!-- wp:code -->
    <pre class="wp-block-code"><code>your code here</code></pre>
    <!-- /wp:code -->
    
    <!-- wp:heading -->
    <h2 class="wp-block-heading">Your Heading</h2>
    <!-- /wp:heading -->

    This is what the Gutenberg editor produces internally. If you tell your agent to use this format, your posts will look identical to ones created in the editor.
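    If you generate content programmatically, a tiny helper can do the wrapping for you. A sketch under my own assumptions (this helper is not part of any of the plugins above; it only handles plain paragraphs):

    ```python
    def to_gutenberg_paragraphs(text: str) -> str:
        """Wrap each blank-line-separated paragraph in Gutenberg block comments."""
        blocks = []
        for para in filter(None, (p.strip() for p in text.split("\n\n"))):
            blocks.append(f"<!-- wp:paragraph -->\n<p>{para}</p>\n<!-- /wp:paragraph -->")
        return "\n\n".join(blocks)

    print(to_gutenberg_paragraphs("First paragraph.\n\nSecond paragraph."))
    ```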

    Draft First, Publish Later

    Always create posts with "status": "draft" so you can review before publishing. You can then either publish from wp-admin or tell the agent to update the status:

    mcp-adapter-execute-ability {
      "ability_name": "content/update-post",
      "parameters": {
        "id": 48,
        "status": "publish"
      }
    }

    The Abilities API Is Now in WordPress Core

    As of WordPress 6.9, the Abilities API has been merged into core. If you’re on 6.9+, you may not need the separate Abilities API plugin. The MCP Adapter and MCP Expose Abilities plugins are still required as separate installs.

    Beyond Blog Posts

    The real power isn’t just publishing posts faster. It’s having your entire WordPress site accessible as part of your AI workflow. Some things I plan to use this for:

    • Batch content updates — update metadata, fix formatting, or patch content across multiple posts in one session
    • Plugin management — check plugin status, activate/deactivate, or upload new plugins without leaving the IDE
    • Comment moderation — review and respond to comments as part of a daily routine, all from the terminal
    • Site diagnostics — check debug logs, site info, and environment details when troubleshooting
    • Content pipelines — research a topic, draft in Obsidian, review, then publish to WordPress — all in one agent session

    That last one is exactly what I did today. I researched a topic, drafted a blog post, saved it to my Obsidian vault, then published it to WordPress — all without leaving Cursor.

    The Full Setup Checklist

    For reference, here’s everything in one place:

    1. Install Abilities API plugin (or upgrade to WordPress 6.9+)
    2. Install MCP Adapter plugin
    3. Install MCP Expose Abilities plugin
    4. Create an Application Password in Users → Your Profile
    5. Add the MCP server config to ~/.cursor/mcp.json
    6. Restart Cursor
    7. Ask your agent: “discover my WordPress abilities”

    That’s it. Your WordPress site is now part of your AI workflow.


    Credits and Links:

    This post was written, formatted, and published to WordPress entirely from within Cursor, using the exact setup described above.

  • How to Make AI Agents Understand Videos

    Cursor, Claude Code, Codex — they can read your codebase, write code, run shell commands, and even browse the web. But hand them a video file and they’re blind.

    This isn’t a minor gap. Screen recordings of bugs, product demo videos, YouTube tutorials you want to reference in code — video is everywhere in modern development workflows. Yet when you drop a .mp4 into a conversation, your AI agent has no idea what to do with it.

    I ran into this exact limitation and went looking for a solution. Here’s what I found, what broke, and how I fixed it.

    The Problem: AI Agents Can’t See Videos

    Most AI coding agents — including those powered by Claude — support image inputs natively. You can screenshot your UI bug and ask “what’s wrong here?” and get a useful answer.

    But video? Nothing. The Read tool in Cursor supports JPEG, PNG, GIF, and WebP. No MP4. No MOV. No video URLs. If you ask an agent to “watch this video,” it’ll politely tell you it can’t.

    The workaround people suggest is extracting frames with ffmpeg and feeding them as images. That works for visual-only content, but you lose audio, context, and temporal understanding. A series of screenshots doesn’t tell you what someone said or in what order things happened.

    The Solution: The video-understand Skill

    The open agent skills ecosystem (via skills.sh) has a growing collection of skills that extend what agents can do. I found one that solves the video problem elegantly.

    video-understand by jrusso1020 is a multi-provider video understanding skill that gives AI agents the ability to analyze videos — both visual content and audio.

    The clever part: it doesn’t try to make the agent itself process video. Instead, it routes the video to external models that can handle it natively (like Google’s Gemini), and returns the structured analysis as text that the agent can work with.

    How It Works

    The skill includes Python scripts that:

    1. Auto-detect available providers based on which API keys you have set
    2. Upload and process video through the best available provider
    3. Return structured JSON with the analysis, transcript, and metadata

    It supports 9 providers with automatic fallback:

    Priority | Provider | What It Does | Cost
    --- | --- | --- | ---
    1 | Gemini (Google AI Studio) | Full video understanding — visual + audio | Free tier available
    2 | Vertex AI | Same as Gemini, enterprise tier | Pay-as-you-go
    3 | OpenRouter | Routes to Gemini models | Free tier available
    4 | FFMPEG + Whisper | Frame extraction + audio transcription | Free, runs locally
    5–9 | OpenAI, AssemblyAI, Deepgram, Groq, Local Whisper | Audio transcription only | Varies

    You can also pass custom prompts — ask specific questions about the video, request timestamps, or extract particular information.
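    The priority-based fallback boils down to "use the first provider whose credential is present". A simplified Python sketch of that logic (only GEMINI_API_KEY appears in this post; the other env var names here are my assumptions, and the real skill also checks for installed binaries like ffmpeg):

    ```python
    import os

    # Priority order mirrors the table above (simplified to env-key checks only).
    PROVIDERS = [
        ("gemini", "GEMINI_API_KEY"),
        ("vertex", "GOOGLE_APPLICATION_CREDENTIALS"),  # assumed name
        ("openrouter", "OPENROUTER_API_KEY"),          # assumed name
        ("openai", "OPENAI_API_KEY"),
    ]

    def detect_provider(env=os.environ):
        """Return the highest-priority provider whose credential is set, else None."""
        for name, key in PROVIDERS:
            if env.get(key):
                return name
        return None
    ```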

    Installation

    npx skills add jrusso1020/video-understand-skills@video-understand -g -y

    Set up at least one provider. The simplest is Gemini:

    1. Go to aistudio.google.com
    2. Click Get API Key → Create API Key
    3. Add to your shell config:
    echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.zshrc
    source ~/.zshrc

    Install the Python SDK and CLI tools:

    pip install google-genai
    brew install ffmpeg yt-dlp

    Verify everything works:

    python3 ~/.agents/skills/video-understand/scripts/check_providers.py

    Usage

    Process a local video:

    python3 ~/.agents/skills/video-understand/scripts/process_video.py /path/to/video.mp4 \
      -p "Describe what happens in this video"

    Process a YouTube video (download first, then analyze):

    yt-dlp -f "best[ext=mp4]" -o /tmp/video.mp4 "https://youtube.com/watch?v=..."
    python3 ~/.agents/skills/video-understand/scripts/process_video.py /tmp/video.mp4 \
      -p "Summarize the key points"

    The output is clean JSON:

    {
      "source": {
        "type": "local",
        "path": "/tmp/video.mp4",
        "duration_seconds": 19.13,
        "size_mb": 0.3
      },
      "provider": "gemini",
      "model": "gemini-3-flash-preview",
      "capability": "full_video",
      "response": "The video shows a young man standing in front of..."
    }

    The Bug: Deprecated SDK Breaks Everything

    Here’s where things got interesting. I installed the skill, set my Gemini API key, and ran the test. It failed immediately:

    googleapiclient.errors.HttpError: <HttpError 400 when requesting
    https://generativelanguage.googleapis.com/upload/v1beta/files?...
    returned "API key expired. Please renew the API key.">

    My key was brand new. I had just generated it 30 seconds ago.

    The real issue was buried in a warning that appeared before the error:

    FutureWarning: All support for the `google.generativeai` package has ended.
    It will no longer be receiving updates or bug fixes.
    Please switch to the `google.genai` package as soon as possible.

    The skill was using google-generativeai — the old, deprecated Python SDK for Gemini. Google has fully sunset this package and replaced it with google-genai. The old package’s file upload API no longer works with current API keys, producing a misleading “API key expired” error even with valid keys.

    The Fix

    The core change was in the process_with_gemini() function. Here’s what the old code looked like:

    # Old — broken (deprecated SDK)
    import google.generativeai as genai
    
    genai.configure(api_key=api_key)
    genai_model = genai.GenerativeModel(model_name)
    video_file = genai.upload_file(source)
    response = genai_model.generate_content([prompt, video_file])

    And the updated version using the new SDK:

    # New — working (current SDK)
    from google import genai
    from google.genai import types
    
    client = genai.Client(api_key=api_key)
    video_file = client.files.upload(file=source)
    
    response = client.models.generate_content(
        model=model_name,
        contents=[
            types.Content(
                parts=[
                    types.Part.from_uri(
                        file_uri=video_file.uri,
                        mime_type=video_file.mime_type
                    ),
                    types.Part.from_text(text=prompt),
                ]
            )
        ],
    )

    The new google.genai SDK uses a Client-based architecture instead of the old module-level configuration. Content is constructed with typed Part objects rather than raw dicts.

    I updated all 7 files — the main script, setup checker, SKILL.md, README, requirements.txt, and the reference docs — to use the new SDK throughout.

    The Forked Repo

    I’ve submitted a PR to the original repo with the fix. Until that’s merged, you can install directly from my fork which has the fix on the main branch:

    npx skills add sarvesh-ghl/video-understand-skills@video-understand -g -y

    Forked repo: github.com/sarvesh-ghl/video-understand-skills

    Testing It

    To verify it works, I tested with the first video ever uploaded to YouTube — “Me at the zoo” by Jawed Karim:

    yt-dlp -f "worst[ext=mp4]" -o /tmp/test.mp4 "https://www.youtube.com/watch?v=jNQXAC9IVRw"
    python3 ~/.agents/skills/video-understand/scripts/process_video.py /tmp/test.mp4 \
      -p "What is happening in this video? Who is the person?"

    Gemini’s response:

    “The man in this video is Jawed Karim, one of the co-founders of YouTube. In the video, Karim is standing in front of two elephants at the San Diego Zoo. He’s talking about how cool the elephants are, specifically pointing out their ‘really, really, really long trunks.’ This video, titled ‘Me at the zoo,’ was the first video ever uploaded to YouTube.”

    Full video understanding — visual identification, audio transcription, and even historical context — all from an AI agent that couldn’t process video 20 minutes earlier.

    Why This Matters

    Video is becoming a primary medium for technical communication. Screen recordings for bug reports. Loom videos for async standups. YouTube tutorials for onboarding. Product demos for stakeholders.

    If your AI agent can’t process video, it’s missing a significant chunk of the context it needs to be genuinely useful. This skill bridges that gap — not perfectly, not natively, but practically.

    The open skills ecosystem is what makes this possible. Someone built a skill, shared it publicly, and now any agent — Cursor, Claude Code, Codex, Gemini CLI — can understand video. When the SDK broke, I fixed it and contributed back. That’s how open source is supposed to work.


    Credits:

    • video-understand skill by jrusso1020 — the original author who built the multi-provider video understanding system
    • skills.sh — the open agent skills ecosystem where these extensions are discovered and shared
    • Google Gemini — the underlying vision model that makes full video understanding possible

  • The CI Check That Forces Your Docs to Keep Up With Your Code

    How I Built an AI-Powered Documentation Gate Using GitHub Actions, Bun, and GPT-4.1 Mini

    Every engineering team has the same dirty secret: documentation is always out of date.

    You ship a new endpoint on Monday. The API reference still shows last month’s schema on Friday. Someone adds three environment variables but nobody touches the config docs. A new developer joins, reads the architecture guide, and builds a mental model that’s six sprints behind reality.

    We all know the solution — “just update the docs when you change the code.” But humans are terrible at remembering, and code reviewers are terrible at catching it.

    So I built a CI check that does it for them.

    The Problem: Docs Drift

    I maintain a Knowledge Base Management Dashboard — a full-stack app with a Bun/Hono backend, Next.js frontend, PostgreSQL, Qdrant vector database, and Redis. The codebase already has solid documentation: API reference, architecture guide, backend services, frontend pages, database schema, configuration. Six markdown files, all carefully maintained.

    The problem wasn’t having docs — it was keeping them current. We already had rules in our AGENTS.md file:

    • New API endpoint → update api-reference.md
    • New service → update backend.md and architecture.md
    • Schema change → update database-schema.md

    Rules that everyone agreed with and nobody consistently followed.

    The Idea: Make CI Enforce It

    What if the CI pipeline could detect when you changed code that should trigger a doc update, and block the merge until you actually update the docs?

    Not a linter. Not a reminder. A hard gate.

    Here’s the flow I wanted:

    1. Developer opens a PR
    2. CI detects which code files changed and maps them to documentation files
    3. If the relevant docs weren’t updated → fail the CI and post a review comment with exactly what to do
    4. If docs were updated → use AI to verify the updates actually cover the changes
    5. Developer fixes the docs, pushes again → CI re-checks → repeat until it passes

    Phase 1: The Suggestion Bot (and Why I Scrapped It)

    My first attempt was gentler. I built a GitHub Actions workflow that would analyze PRs and suggest documentation updates as a regular PR comment. It used GPT-4.1 Mini to read the diffs, compare them against the current docs, and generate specific suggestions like “add this endpoint to api-reference.md.”

    It worked. The suggestions were good. But nobody acted on them.

    Turns out, optional suggestions in a PR comment are just noise. Developers read them, think “I’ll do that later,” and merge. The docs stay stale.

    Lesson learned: if you want docs to stay current, you need a gate, not a suggestion.

    Phase 2: The Docs Gate

    The rewrite changed three things:

    1. CI fails instead of suggesting. The workflow exits with code 1, which means the check shows as a red X. If your branch protection requires passing checks, the PR literally cannot merge.

    2. It posts a REQUEST_CHANGES review, not a comment. GitHub review comments have a “Resolve conversation” button. They show up in the “Files changed” tab. They count as blocking reviews. You can’t ignore them the way you ignore a bot comment.

    3. It generates a Cursor prompt, not doc content. Instead of the AI writing the docs (which produced mediocre results), it generates a prompt that the developer pastes into Cursor. Cursor has full IDE context — it reads the changed files, reads the existing docs, and updates them properly. The AI in CI just detects the gap; the AI in the IDE fixes it.

    How It Works Under the Hood

    The system has four TypeScript modules running on Bun:

    File Classifier — A regex-based mapping from code paths to doc files. Routes map to the API reference. Services map to the backend docs. Schema changes map to the database docs. This is the cheapest possible detection — no AI needed, just pattern matching.

    const FILE_TO_DOC_MAPPINGS = [
      {
        pattern: /^apps\/knowledgebase\/backend\/src\/routes\/.+\.ts$/,
        docs: ["docs/api-reference.md", "docs/backend.md"],
      },
      {
        pattern: /^apps\/knowledgebase\/backend\/src\/db\/schema\.ts$/,
        docs: ["docs/database-schema.md"],
      },
      // ... more mappings
    ];
    

    LLM Verifier — When docs were updated, this module sends the code diffs + doc diffs to GPT-4.1 Mini and asks: “Do the documentation changes adequately cover the code changes?” It returns a pass/fail with specific gaps. This is a verification prompt, not a generation prompt — much cheaper and more reliable.

    Cursor Prompt Builder — Generates a developer-friendly prompt that lists exactly which files to read and which docs to update. For incomplete docs, it includes the specific gaps the AI found. The prompt references the repo’s own AGENTS.md rules so Cursor follows the project’s conventions.
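    The prompt builder is small. A sketch of its shape (re-expressed in Python for brevity; the real implementation is TypeScript, and the function and argument names here are my own):

    ```python
    def build_cursor_prompt(changed_files, required_docs, gaps=None):
        """Assemble a paste-into-Cursor prompt listing files to read and docs to update."""
        lines = ["Update the documentation for this change.", "", "Read these changed files:"]
        lines += [f"- {f}" for f in changed_files]
        lines += ["", "Update these docs:"]
        lines += [f"- {d}" for d in required_docs]
        if gaps:  # present only when the AI verifier found the docs incomplete
            lines += ["", "Specific gaps found by the verifier:"]
            lines += [f"- {g}" for g in gaps]
        lines += ["", "Follow the documentation rules in AGENTS.md."]
        return "\n".join(lines)
    ```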

    Main Orchestrator — Ties it all together with a decision tree:

    No doc-relevant code changed         → Pass
    Code changed, docs not touched       → Fail (no AI needed, free)
    Code changed, docs touched           → AI verifies quality
      AI says complete                   → Pass, dismiss previous review
      AI says incomplete                 → Fail with specific gaps
    

    The clever bit: when docs aren’t touched at all, the check fails without making any AI calls. It’s completely free. The AI only runs when docs were actually updated and need quality verification.
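    The decision tree above fits in a single function. A Python re-sketch of the orchestrator's core logic (the actual code is TypeScript; names are illustrative):

    ```python
    def docs_gate_decision(required_docs: set, touched_docs: set, verify):
        """Mirror the orchestrator's decision tree.

        required_docs: docs the file classifier says this PR's code changes need
        touched_docs:  doc files actually modified in the PR
        verify:        callable run only when docs were touched; returns (ok, gaps)
        """
        if not required_docs:
            return ("pass", "no doc-relevant code changed")
        if not required_docs & touched_docs:
            return ("fail", "required docs not touched")  # no AI call — free
        ok, gaps = verify()  # the only path that costs an LLM call
        return ("pass", "docs verified complete") if ok else ("fail", gaps)
    ```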

    The Cost Profile

    This was important to get right. Nobody wants a CI check that costs $5 per PR.

    PR scenario | Detection cost | Verification cost
    --- | --- | ---
    No doc changes needed | Free | None
    Docs not updated (most common failure) | Free | None
    Docs updated, < 2000 lines | Free | 1 LLM call (~$0.01)
    Docs updated, 2000–5000 lines | Free | N calls, chunked by doc
    Mega PR > 5000 lines | Free | Skipped (file-touched check only)

    The typical case — a developer forgets to update docs — costs literally nothing. The AI only fires when there’s actual doc content to verify.

    What the Developer Sees

    When the check fails, the PR gets a REQUEST_CHANGES review that names the missing docs and includes an expandable, ready-to-paste Cursor prompt.

    The developer expands the prompt, pastes it into Cursor, and Cursor does the rest. Push, and the CI re-runs. The loop repeats until the docs are updated.

    The GitHub Actions Workflow

    The whole thing runs in a single job:

    name: Docs Gate
    
    on:
      pull_request:
        types: [opened, synchronize, reopened]
        branches: [main]
    
    permissions:
      contents: read
      pull-requests: write
    
    jobs:
      check-docs:
        name: Check documentation
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: oven-sh/setup-bun@v2
          - run: bun install --frozen-lockfile
            working-directory: .github/scripts
          - run: bun run docs-gate.ts
            working-directory: .github/scripts
            env:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
              OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
              GITHUB_REPOSITORY: ${{ github.repository }}
              PR_NUMBER: ${{ github.event.pull_request.number }}
    

    Setup: add OPENAI_API_KEY as a repo secret. That’s it. GITHUB_TOKEN is automatic.

    What I’d Do Differently

    Start with the gate, not the suggestion bot. I wasted a full iteration building a “nice” suggestion system that nobody used. The constraint (blocking merge) is what makes it work.

    The file classifier is the most important piece. Get the regex mappings right and everything else follows. Get them wrong and developers will learn to ignore false positives.

    Let the IDE AI write docs, not the CI AI. CI has limited context — just diffs and file contents. The IDE has the full codebase, language server, and developer intent. Use CI for detection, IDE for correction.

    Try It Yourself

    The entire implementation is open source. You can adapt it to any repo by:

    1. Writing a file classifier that matches your project’s code-to-docs mapping
    2. Pointing the Cursor prompt builder to your documentation conventions file
    3. Adding the workflow and an OpenAI API key

    The code lives in .github/scripts/ — four TypeScript files, about 500 lines total, running on Bun.

    If your team has the “docs are always out of date” problem, this fixes it. Not by generating docs for you, but by making it impossible to merge without them.


    The complete source code is available on GitHub: sarvesh-ghl/docs-gate. Fork it, customize the file classifier for your project, add an OpenAI API key, and you’re done.

    _If you’re interested in the implementation details or want to discuss adapting this for your stack, feel free to reach out._

  • Engineering Impact: Inside the Tech Stack that Powers the Trillion Tree Campaign

    When we talk about climate change, we usually talk about carbon, temperature, and policy. But at Plant-for-the-Planet, we also talk about scale.

    The Trillion Tree Campaign isn’t just a slogan; it’s a logistical beast. How do you track planting data from 225+ projects globally? How do you visualize geospatial data for millions of trees without crashing a mobile browser? And how do you process donations from Apple Pay, Google Pay, and Stripe while ensuring every cent is traceable?

    I spent a significant part of my career architecting solutions for these problems. While my previous post covered the why, this post covers the how.

    Here is a technical deep dive into the development of Forest Cloud, the open-source platform driving the global reforestation movement.

    The Core Stack: Performance Meets Purpose

    The platform (web.plant-for-the-planet.org) needed to be fast, SEO-friendly (so projects get found), and capable of handling complex state management.

    • Frontend: We utilized Next.js (React) for its hybrid static/server-side rendering capabilities. This was crucial for indexing the individual pages of hundreds of planting projects.
    • Widgets: For embeddable components (like the tree counter you might see on partner sites), we used Svelte. Its compile-step approach meant we could ship tiny, highly performant bundles that didn’t drag down the host websites.
    • Backend: We leaned heavily into Serverless APIs. When traffic spikes during a climate awareness campaign, serverless functions scale instantly. We implemented aggressive caching strategies to ensure that the “Tree Counter” didn’t hammer our database on every page load.
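    The "Tree Counter" caching can be as simple as a short time-to-live cache in front of the expensive count query. An illustrative Python sketch (not the production code, which lived in the serverless layer):

    ```python
    import time

    class TTLCache:
        """Tiny time-based cache: serve a cached value until it expires."""

        def __init__(self, ttl_seconds: float):
            self.ttl = ttl_seconds
            self._store = {}

        def get_or_compute(self, key, compute, now=time.monotonic):
            entry = self._store.get(key)
            if entry and entry[1] > now():
                return entry[0]  # still fresh — no database hit
            value = compute()  # e.g. the expensive tree-count query
            self._store[key] = (value, now() + self.ttl)
            return value
    ```

    With a TTL of even a few seconds, a counter hit thousands of times per minute touches the database only a handful of times.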

    Challenge 1: Visualizing the Invisible

    One of my biggest tasks was making the data “real” for donors. A number on a screen is abstract; a satellite image of a restoration site is tangible.

    We developed interactive maps using Mapbox GL JS and ESRI. The challenge here wasn’t just rendering a map; it was rendering heavy geospatial tree planting data alongside satellite imagery layers without killing performance.

    By optimizing how we loaded vector tiles and managing state carefully, we created a seamless experience where users could zoom from a global view down to specific planting sites in the Yucatan. This accessibility directly correlated with impact: in 2021 alone, we saw a 378% increase in trees planted.

    Challenge 2: DevOps & The “n8n on Heroku” Story

    Automation is the silent hero of non-profits. We used n8n (a workflow automation tool) to glue various services together. However, deploying it cost-effectively was a hurdle.

    I created the first-ever Docker implementation of n8n specifically for Heroku deployment. This allowed us to run complex workflows—like triggering emails or syncing data between CRMs—without managing a dedicated server. It was a perfect example of how “devops creativity” can save resources that are better spent on planting trees.

    The Pipeline:

    • CI/CD: We moved to a strict version-controlled environment. I managed pipelines on Heroku, Vercel, and Gridpane, ensuring that code moved from develop to staging to production via automated GitHub Actions.
    • WordPress as Headless: We didn’t abandon WordPress; we just used it better. I wrote custom PHP plugins to expose WordPress content via REST APIs, effectively treating it as a Headless CMS. This gave the content team a familiar interface while keeping the frontend strictly React-based.

    Challenge 3: Trust & Payments

    The Transparency Standards at Plant-for-the-Planet are rigorous. We couldn’t just “take money”; we had to route it.

    I worked on integrating a multi-gateway payment system including Stripe, PayPal, Apple Pay, and Google Pay. The complexity wasn’t just in the API calls, but in the webhooks—ensuring that when a payment succeeded, the “Tree Counter” updated, the donor got a receipt, and the specific planting project was credited, all in real-time.
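    The first step in any webhook handler is verifying the payload really came from the gateway. A generic HMAC-SHA256 sketch of that check (Stripe's actual scheme also signs a timestamp and uses its own header format; this is the underlying idea, not the exact implementation):

    ```python
    import hashlib
    import hmac

    def verify_webhook_signature(payload: bytes, signature_hex: str, secret: bytes) -> bool:
        """Constant-time comparison of the expected HMAC against the received one."""
        expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature_hex)
    ```

    Only after this check should the handler credit the project, bump the counter, and send the receipt.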

    Why This Matters

    Open Source is usually associated with developer tools, but Green Open Source is a growing field. By making our repository public (GitHub Link), we invite developers to audit our code and contribute to climate justice.

    This project wasn’t just about writing code; it was about building the digital soil in which a trillion trees could take root.


    If you are interested in how we used Docker to containerize automation tools or want to discuss the geospatial challenges of mapping a trillion trees, feel free to reach out.

  • Plant-for-the-Planet — My Journey in the Global Movement for Climate Justice

    Today I want to share a part of my journey that has deeply shaped my sense of purpose, collaboration, and impact — my experience with Plant-for-the-Planet.

    When I first came across Plant-for-the-Planet, I was struck by how a global climate movement could be both ambitious in its goals and inclusive in its approach. It isn’t just another environmental organisation — it is a youth-empowerment and restoration movement that believes in practical, scalable solutions in the fight against the climate crisis. (Plant-for-the-Planet)

    What Plant-for-the-Planet Is

    Plant-for-the-Planet began in 2007 when nine-year-old Felix Finkbeiner proposed that children around the world could plant one million trees in every country to combat climate change — a simple but powerful idea rooted in action and justice. (Wikipedia)

    Over the years, this idea evolved from grassroots climate activism into a global movement focused on restoring forests, empowering youth as Climate Justice Ambassadors, and providing free tools and platforms for restoration work. (Plant-for-the-Planet)

    Today, Plant-for-the-Planet’s mission includes:

    • Empowering children and youth through climate education and leadership training. (Plant-for-the-Planet)
    • Supporting forest restoration and conservation projects in ecosystems across the world. (Plant-for-the-Planet)
    • Providing free, transparent digital tools that help people donate to, monitor, and manage restoration projects. (Plant-for-the-Planet)

    Their global vision — to plant one trillion trees — guides much of the organisation’s work and underscores how nature-based solutions are central to tackling climate change. (Wikipedia)

    Joining the Movement Through Code

    My contribution to Plant-for-the-Planet began in a way that reflects both my values and my skill set — through open-source development.

    In 2020, I started contributing to the Plant-for-the-Planet Web App — an open-source platform called Forest Cloud that powers key parts of the Trillion Tree Campaign. (GitHub)

    Forest Cloud and its associated ecosystem — including tools like TreeMapper and the Restoration Platform — are aimed at making reforestation transparent, traceable, and accessible to individuals, organisations, and communities around the world. (Plant-for-the-Planet)

    By March 2022, I had become one of the top open-source contributors to the project (ranked #2 by contributions in the planet-webapp repository between June 2020 and March 2022). This was more than writing code — it was about building infrastructure for impact. Seeing features come to life and help visualise tree donations or restoration progress was deeply motivating and taught me a lot about collaborative, purpose-driven software development.

    What I Learned Along the Way

    Working with Plant-for-the-Planet reinforced some powerful lessons:

    1. Climate action is both local and global.
    Plant-for-the-Planet connects local planting efforts and youth leadership with a global framework and tools that unlock resources for restoration everywhere. (Plant-for-the-Planet)

    2. Technology can democratise impact.
    By contributing to open-source platforms, I saw first-hand how accessible digital infrastructure can enable anyone to support restoration, track progress, and donate with confidence. (Plant-for-the-Planet)

    3. Youth leadership is real leadership.
    The organisation’s model — training Climate Justice Ambassadors through peer-to-peer learning and action — shows how young people can not only lead but educate others and shape real outcomes. (Plant-for-the-Planet)

    Looking Back, Moving Forward

    My time with Plant-for-the-Planet wasn’t just a chapter in my professional journey — it was a worldview shift. I appreciated being part of a community that tackles one of our generation’s biggest challenges while staying anchored in optimism and measurable action.

    Even now, when I think about climate solutions, I think about forests, community, and transparent tools — and how each of us, regardless of age or background, can contribute in meaningful ways. Plant-for-the-Planet gave me a space to do that, and I’m grateful for every line of code, every discussion, and every shared mission that made this journey unforgettable.

  • Enroot Innovation Foundation — How It Started, What It Means, and Why It Matters

    Enroot Innovation Foundation — How It Started, What It Means, and Why It Matters

    When I look back at the start of my professional journey, one of the most formative experiences was being part of a collective effort that was more than just an organisation — it was a community with a purpose. That is how I first encountered Enroot Innovation Foundation (Enroot Mumbai) — a grassroots-driven innovation platform rooted in Mumbai that strives to do social good through creativity, design thinking, and collaboration.

    Enroot wasn’t born out of a boardroom or a business plan drafted in isolation. It started with a group of curious, passionate people in Mumbai — a bunch of “crazy” innovators, as we sometimes joked — determined to solve real-world problems collaboratively and support each other’s growth. What tied us together wasn’t just a cause, but a belief that community-led innovation could create sustainable and human-centred solutions for society’s pressing challenges. (Enroot)

    The Beginning: Community First

    In its early days, Enroot was very much a community effort. The philosophy was simple: bring together motivated individuals from diverse backgrounds — technology, design, engineering, social development — and let collective problem solving lead the way. The mission was clear: to cultivate an attitude of problem solving among the citizens of Mumbai with collaborative support. (Enroot)

    This community-first approach meant that everyone involved was learning as we built — first from each other, then from our partners and the real communities we were trying to serve. It was a place where volunteering was not just about giving time, but about learning, experimenting, and growing together.

    From Community to Impactful Action

    As this community matured into a more structured organisation — now formally known as Enroot Innovation Foundation — the work broadened while staying true to its roots. Enroot positioned itself as an innovation-first organisation that uses design thinking and engineering innovation to create sustainable solutions for social change. (Enroot)

    The ethos was not just “do good,” but amplify the good already being done. The foundation believed that many NGOs and social initiatives struggle not because of lack of intent, but due to limited resources, technological integration, and fragmented collaboration — areas where an innovation mindset can make a big difference. (Enroot)

    Project Work and Real-World Outcomes

    One of the things I’m proudest of during my time with Enroot is how theory turned into action. A few examples of this include:

    • Myna App — A platform to empower young underprivileged women with access to health and well-being resources. (Enroot)
    • Global Parli — A rural empowerment initiative focused on 360-degree village development. (Enroot)
    • COVID-19 Maharashtra Tracker — A multilingual resource during the pandemic. (Enroot)
    • Creating Abilities — An initiative to uplift and support the specially-abled community. (Enroot)
    • Saplings for Farmers — A regional digital platform to support farmers with sapling access and agricultural knowledge. (Enroot)

    These were not just theoretical ideas — they were real projects developed with teams, stakeholders, and communities at the centre. Looking back, every team effort taught us something about empathy, iterative design, and human impact.

    What I Learned and Why It Matters

    My role in Enroot — especially as someone leading engineering contributions — wasn’t just about writing code or building apps. It was about listening to lived experiences, understanding constraints, and working with communities rather than for them. It shaped my approach to not only technology but also collaboration, empathy, and purpose.

    At a moment when technology often creates distance, Enroot reminded me that true innovation bridges gaps — between communities and opportunities, between empathy and execution.

    The Path Ahead

    Even in early 2022, it was clear that Enroot was more than an organisation — it was a learning ecosystem. The journey from an informal community in Mumbai to an active innovation foundation showed what happens when passion meets structure, and when people choose to collaborate instead of compete. (Enroot)

    As I continue my path beyond Enroot, the lessons from those early days stay with me. They influence how I think about creating value — not just in terms of outputs or products, but in the lives impacted along the way.

  • Deploying the Undeployable: How I Engineered the First Dockerized n8n for Heroku

    Deploying the Undeployable: How I Engineered the First Dockerized n8n for Heroku

    When I set out to automate Plant-for-the-Planet’s internal workflows, I had a simple requirement: I needed n8n (a powerful workflow automation tool), but I didn’t want to manage a server.

    We were already using Heroku for our main stack, so deploying n8n there seemed like the logical step. I thought it would be a quick afternoon task: write a Dockerfile, push it, and go home.

    I was wrong. What followed was a week of relentless trial and error, staring at crash loops in the logs, and hacking around Heroku’s strict security model.

    Here is the story of how I built the first stable, one-click Docker deployment for n8n on Heroku—and the specific technical hurdles I had to break to get there.

    Phase 1: The “Permissions” Hell

    The first wall I hit was immediate. I tried deploying the standard n8n Docker image, and the app crashed instantly.

    The logs were cryptic but pointing to a permissions failure:

    Bash

    su-exec: setgroups: Operation not permitted
    [WARN tini (3)] Tini is not running as PID 1
    

    The Engineering Conflict:

    The official n8n image at the time relied on su-exec to switch users at runtime. It expected to start as root, set up permissions, and then drop down to the node user.

    Heroku, however, is a locked-down PaaS. It runs containers with a random, non-root user ID for security. It strictly forbids setgroups calls. Basically, the container was trying to say “I am root,” and Heroku was saying “No, you are not.”

    The Fix:

    I had to rebuild the image from scratch. I wrote a custom Dockerfile that stripped out the su-exec logic entirely. Instead of trying to switch users at runtime, I configured the image to run natively as the node user from the start, ensuring it never requested privileges Heroku wouldn’t grant.
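For context, the pattern looks roughly like this. It is a sketch under assumptions (base image tag and paths are illustrative, not the repository's exact Dockerfile); the key line is `USER node`, which bakes the unprivileged user in at build time instead of switching at runtime:

```dockerfile
# Sketch of the approach, not the repo's exact Dockerfile.
FROM node:16-alpine

# Install n8n globally while still root (build time only).
RUN npm install -g n8n

# Give the unprivileged "node" user ownership of its home directory.
RUN mkdir -p /home/node/.n8n && chown -R node:node /home/node

# Drop privileges at build time: no su-exec, no setgroups at runtime.
# (Heroku still substitutes its own random UID, but nothing in the
# image requires root any more.)
USER node
WORKDIR /home/node

COPY --chown=node:node docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["sh", "/docker-entrypoint.sh"]
```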

    Phase 2: The Random Port Lottery

    Once I fixed the user permissions, the app started—but nobody could reach it.

    Heroku reported: Error R10 (Boot timeout) -> Web process failed to bind to $PORT.

    The Engineering Conflict:

    n8n defaults to listening on port 5678.

    Heroku doesn’t care about your defaults. It assigns a random port to your application every time the dyno restarts (e.g., 12345, 54321) and passes it via the $PORT environment variable. If your app doesn’t listen on exactly that port within 60 seconds, Heroku kills the process.

    The Fix:

    I couldn’t hardcode the port in a config file because it changes on every dyno restart (at least once a day). I had to write a dynamic entrypoint script (docker-entrypoint.sh) to bridge the gap:

    Bash

    #!/bin/sh
    if [ -z ${PORT+x} ]; then
        echo "PORT variable not defined, leaving N8N to default port."
    else
        # Inject Heroku's random port into n8n's expected variable
        export N8N_PORT=$PORT
        echo "N8N will start on '$PORT'"
    fi
    
    # Execute the command
    n8n start
    

    This script acts as the translator between Heroku’s infrastructure and n8n’s runtime.

    Phase 3: The “Deploy to Heroku” Button

    Fixing the code was only half the battle. I wanted this to be reusable for my team (and others). I didn’t want anyone else to have to manually configure Heroku Postgres or Redis just to get this running.

    I introduced an app.json manifest to the repository. This file tells Heroku exactly what the app needs before it even builds:

    JSON

    "env": {
        "DB_TYPE": {
            "description": "The type of database to use.",
            "value": "postgresdb"
        },
        "N8N_ENCRYPTION_KEY": {
            "description": "The encryption key for n8n.",
            "generator": "secret"
        }
    },
    "addons": [
        "heroku-postgresql",
        "heroku-redis"
    ]
    

    By defining the addons and environment generation in code, I turned a complex manual deployment into a literal “One-Click” install.
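With app.json in the repository, the button itself is just a link that points Heroku at the repo. This follows the standard Heroku deploy-button pattern (the exact snippet in the repo's README may differ):

```markdown
[![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/sarveshpro/n8n-heroku)
```

When clicked, Heroku reads app.json from the linked repository, provisions the Postgres and Redis add-ons, generates the encryption key, and builds the container.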

    The Result

    After dozens of commits (you can scroll through the initial commit history to see the struggle), I finally had a stable architecture.

    This solution allowed Plant-for-the-Planet to run critical automation pipelines on a zero-maintenance infrastructure. But beyond our use case, it took on a life of its own. The repository sarveshpro/n8n-heroku has since been starred and forked hundreds of times by other developers facing the exact same constraints.

    Open source isn’t always about inventing a new framework. Sometimes, it’s just about banging your head against a wall until you find the door—and then holding that door open for everyone else.


    If you need to deploy n8n on Heroku today, you don’t need to repeat my trial and error. You can just fork the solution here:

    GitHub – sarveshpro/n8n-heroku