How I built a chief of staff on OpenClaw that's better than any human I've hired

I'm a VC in the middle of a fundraise, sitting on boards, helping portfolio companies, and angel investing on the side. I've worked with great human EAs and chiefs of staff over the years, so I know what high-leverage support actually looks like. When the first AI APIs came out, I tried to build an AI version of that as a product and couldn't make it work.

When OpenClaw launched I went deep immediately and haven't stopped. I have helped a number of friends set it up and each of them have asked what I have done to configure it and super power it. @ryancarson's post (link in the comments) about how he built his OpenClaw assistant was also great to see, and the response to it convinced me to finally write up what I've been building.

What I have now is more capable than any human chief of staff I've ever worked with. It never forgets a commitment, it handles the small stuff without being asked, flags the important stuff without being told, and it gets better every week. Plus it never sleeps and it never tires. There are still some bumps, but less and less each week.

If any of this is interesting, let me know. If there's enough interest I'll package the whole system up and open source it.

What makes a great chief of staff?

Before I walk through what I've built, it's worth thinking about what a great chief of staff actually does. Not the job description, the real leverage. The best ones I've worked with filtered the noise so only the right things reached me, made sure I walked into every meeting prepared and that nothing fell through after, kept the full picture of what was in flight and flagged what was slipping, tracked relationships and knew where things stood with every important person, and created the daily and weekly rhythm that kept everything moving.

Her name is Stella. She handles all of these, and I'll walk through each one below. But the two things that make my setup genuinely different from other OpenClaw builds are the memory layer underneath it all and the continuous improvement loop that makes the system get better every week. I want to start there because they're what make everything else compound.

Memory: the foundation

Session memory is a lie. Any assistant that treats conversation history as its working context will fail you at the most frustrating moments.

I built two layers. The first is daily notes: one markdown file per day (memory/YYYY-MM-DD.md) serving as a raw log of everything that happened. Meetings attended, decisions made, tasks added and completed, context that came up in conversation. A script called pulls from my sessions throughout the day and writes these automatically.

The second is long-term memory in MEMORY.md, curated by Stella herself. Key people, active projects, lessons learned, decisions made. She periodically synthesizes this from the daily notes, and it's what she reads on startup to orient herself on what matters right now.

Every meeting processed, every email triaged, and every task tracked feeds back into this picture continuously. Without this layer you have a capable assistant with amnesia. With it you have something closer to a person who's been working alongside you for months and never forgets anything.

I've also come to really value that all of this lives in flat markdown files rather than a database. I can open any memory file, read it, edit it if something's wrong, and understand exactly what the assistant knows. I can back the whole thing up to git and restore anything instantly. There's no abstraction layer between me and the assistant's understanding of my world, which means I trust it more and fix things faster when they're off.

Here's where the layers really come together. I'm managing a fundraise involving 100+ LP contacts across multiple countries. Stella tracks the full pipeline, keeps context on each LP and contact, and knows where every relationship stands. For first meetings, I've created a rule that she researches the fund and any recent content they or their partners have published, then preps me with what she found, how it maps to our thesis, and tailored talking points as part of the pre-meeting brief. For ongoing relationships, she knows exactly where they are in the pipeline, what was discussed and committed in our last meeting, and what the key issues are. You can't automate something as critical as a fundraise, but having this kind of structure underneath it means I'm spending my time on the conversations themselves rather than managing the process around them.

Kaizen: the system improves itself

This might be my favorite part, and the thing that makes it feel genuinely different from any assistant I've worked with, human or AI.

Every Friday, a cron job runs research. Stella scans the OpenClaw community, checks for new patterns, looks at what other builders are doing, and saves findings to `memory/kaizen-research-YYYY-MM-DD.md`. On Sunday morning we review it together. She summarizes the week's research, surfaces the top ideas worth trying, and we talk through what to actually change.

But it's not just external research. She also learns from our daily interactions. If I keep correcting something, or if a feature creates more friction than value, that gets captured in memory and eventually surfaces as a suggestion to fix it. If a triage filter is too noisy, or a brief format isn't landing, she notices and proposes a change.

This is something a human chief of staff genuinely cannot do at scale. They can absolutely learn from working with you, but they can't simultaneously scan what hundreds of other builders are doing and cross-reference it against your system every week. That combination means the system gets measurably better on a cadence, not just when I remember to tinker with it.

The Kaizen loop also drives continuous refactoring of the system itself. I've been through multiple cycles where I build a feature, run it for a few weeks, see how it actually fits my workflow, and then clean it up or cut it entirely. First versions are always too complicated, too noisy, or solving the wrong part of the problem. A smaller system you trust will always beat a bigger one you route around. The Kaizen process makes that refactoring disciplined rather than ad hoc.

Meeting prep and follow-through

This is where the best human EAs spend most of their time, and it's where Stella earns her keep.

Sixty minutes before any external meeting, a brief arrives via WhatsApp. It pulls prior meeting notes on the attendees from memory, checks recent email threads, and finds any open action items. For LP meetings during the fundraise, it includes pipeline stage, what deck version they've seen, their questions from last time, and what I committed to sending. I walk into every meeting more prepared than I ever was with a full-time EA.

On the other side, Stella processes every meeting through the Granola API (could be any note taker with an API). She fetches the notes, deduplicates against what's already been processed, and extracts action items. My tasks go to Todoist with proper projects and due dates. Commitments other people made get tracked in per-person markdown files, one per team member, so I can see at a glance what anyone owes or is owed. If I asked someone to do something three weeks ago and there's no update, the system knows because it's in their file.

Everything from both sides, the prep and the follow-through, feeds back into memory. So the next time I meet with that person, the cycle starts again with a richer picture.

Task and priority management

A great chief of staff doesn't just keep a to-do list. They maintain the full picture of what's in flight and apply judgment about what actually matters today versus what can wait.

Stella keeps the comprehensive task picture in structured markdown with all the context and history. That's the source of truth. The important near-term items get synced to Todoist (again, any Task management with an API), which is where I can see my near-term tasks at a glance. I think this split is important for understanding how agents should work alongside dedicated tools. The chat interface is great for certain modes, but a focused app with a focused view is still really valuable for getting through a task list. Stella keeps both in sync, and the combination is better than either one alone.

Every evening she runs a task sweep. What's due, what's overdue, what's been sitting too long, what's coming this week. If something has rolled forward five days in a row, she flags the pattern. If there's a high-stakes meeting Tuesday and the prep isn't done, she says so. If nothing needs attention, silence.

Stakeholder and relationship context

A great chief of staff carries a mental CRM. They know who matters, where things stand with each person, what was discussed last, and what's sensitive.

Stella does this at a scale no human could. She maintains persistent, structured context on every person, company, and project I'm working with. Relationship history, last touchpoint, open commitments, what they care about, where things stand. Every meeting processed, every email triaged, and every task tracked feeds back into this picture continuously. This is really just memory applied to relationships, but it's worth calling out because it's the piece that compounds the most visibly over time.

Information filtering

Stella runs email and calendar triage regularly across both my personal and work Gmail accounts, surfacing what needs action and what's worth knowing while dropping everything else. But filtering is only the beginning. She auto-pulls expense receipts and routes them to quarterly tracking, generates travel itineraries from booking confirmations and flags gaps, and drafts follow-up emails as part of fundraising tasks. When a task says "follow up with [person] re: [topic]," she pulls the last thread, drafts the follow-up in my voice, and queues it for my review.

Operational rhythm

Stella sends a morning brief at 9am and an evening wrap at 6pm, both via WhatsApp. The morning brief covers top priorities, overdue tasks, today's calendar, and anything that needs attention before I open my laptop. The evening wrap covers what happened, what stalled, and what to prep for tomorrow. If there's nothing to say, I hear nothing. This is the piece that makes it feel like working with a chief of staff rather than using a tool. She structures my day without me asking.

Research and intelligence

A weekly curated digest of the X accounts I track and newsletters I subscribe to, organized into tiers (AI researchers, VCs, founders, operators), scored by engagement and filtered for relevance. It turns 45 minutes of scrolling into a few minutes of high value reading.

Judgement vs predictability

One design rule that makes everything above reliable: LLMs handle judgment, and scripts handle everything else. Anything deterministic like reading files, calling APIs, sending messages, or comparing timestamps lives in Python. The LLM layer handles synthesis, prioritization, drafting, and anything where the answer requires reasoning rather than computation. When you push deterministic work through an LLM, things break in unpredictable ways and you stop trusting the system. Once you get the layer separation right, it becomes something you actually depend on.

Where this is going

I didn't get the best assistant I've ever had by asking better questions. I got it by giving the system a better operating model. I couldn't go back, and I can only imagine where this will be in a year.

If you want to build something like this, or if you've built something similar, I want to hear about it. If there's enough interest I'll package it up and share.