[
  {
    "id": "agent-harness-is-the-new-ide",
    "data": {
      "title": "The agent harness is the new IDE",
      "created": "2026-05-10T00:00:00.000Z",
      "updated": "2026-05-10T00:00:00.000Z",
      "tags": [
        "agents",
        "tools",
        "development"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "The agent harness is the new IDE.\n\nThat sounds grandiose until I look at my own behavior. VS Code used to be the place where software work happened: the project tree, the terminal, search, type errors, tests, notes, diffs, habits. Now those surfaces still exist, but they have slipped behind the agent. The desktop apps for Codex and Claude Code have become the primary environment, and the editor has become an implementation detail.\n\nThe important shift is not that an agent can write code. It is that the harness now holds the working context: the repository, the instructions, the shell, the browser, the review loop, the memory of what has been tried, and the ability to move from intention to patch to validation without forcing me to keep every intermediate state in my head. The old IDE optimized the human's direct manipulation of files. The new one optimizes delegation, inspection, and correction.\n\nThis changes the shape of craft. I still care about the same things: readable code, tight feedback loops, tests that mean something, source-of-truth boundaries, reversible decisions. But I increasingly express that craft as constraints and reviews around an agentic runtime rather than as keystrokes inside an editor. The skill is moving up a layer.\n\nThere is a strange intimacy to this. VS Code was where I lived with the code. Codex and Claude Code are where I now live with the work."
  },
  {
    "id": "agent-harnesses-are-state-machines",
    "data": {
      "title": "Agent harnesses are state machines",
      "created": "2026-05-21T00:00:00.000Z",
      "updated": "2026-05-21T00:00:00.000Z",
      "tags": [
        "agents",
        "harnesses",
        "state-machines",
        "workflows"
      ],
      "claims": [],
      "inputs": [],
      "observations": [
        "observations/frontier-models-follow-complex-instructions"
      ]
    },
    "body": "I increasingly see a properly configured agent harness as a state machine.\n\nThe states do not have to be encoded as a formal graph. They can live in prompts, skills, lifecycle hooks, quality gates, and tool permissions. But the shape is the same: the agent begins in one condition, inspects the world, decides which transition is legal, performs the next action, validates the result, and either advances, retries, escalates, or stops.\n\nI noticed this most clearly while setting up a harness for an internal prototypes repository, where the agent is responsible for effectively doing all the work around the prototype, not merely writing code inside an already-prepared environment. It checks whether the environment exists. It makes sure the user has GitHub available and is logged in. It creates and configures the app correctly. It runs the relevant setup, commits the result, pushes the branch, and leaves the team with something that can actually be inspected or continued.\n\nThat sounds like convenience until you ask Codex to draw the state flow. Then the hidden machine becomes visible. The diagram was longer and more complex than I expected: a real workflow with preflight checks, branches, retries, missing-prerequisite paths, validation gates, and handoff points. What I had experienced as \"the agent just handles the setup\" was, underneath, a fairly elaborate operating procedure.\n\nThe important part is that this procedure was not trapped inside some brittle orchestration service. It was expressed in the harness. The repository carried its own working protocol: what the agent should assume, what it must verify, which tools it may use, when it should stop, and what counts as done. The harness turned a messy human setup ritual into a reusable machine.\n\nThis is why harness quality matters so much. A weak harness gives the model a vague destination and hopes capability fills in the path. 
A strong harness defines the path as a sequence of recoverable states. The frontier model still supplies judgment, language, code, and local adaptation, but it is not wandering through an empty field. It is moving through a designed workflow.\n\nAs of this writing, models like GPT-5.5 and Claude Opus 4.7 are good enough at detailed instruction following that these natural-language state machines can be surprisingly deep. Skills make this even more interesting, because they let the top-level harness stay readable while moving domain-specific procedure into nested, task-specific instruction sets. The result is not just a smarter agent. It is an agent whose operational behavior can be shaped, versioned, reviewed, and improved like infrastructure.\n\nThe team benefit is immediate. People can spin up prototypes without caring about most of the setup, and without every contributor having to rediscover the local ceremony. The harness becomes the place where the ceremony is captured once, made explicit, and handed to the agent."
  },
  {
    "id": "agent-harnesses-will-move-onto-the-shelf",
    "data": {
      "title": "Agent harnesses will move onto the shelf",
      "created": "2026-05-14T00:00:00.000Z",
      "updated": "2026-05-14T00:00:00.000Z",
      "tags": [
        "agents",
        "harnesses",
        "marketplaces"
      ],
      "claims": [],
      "inputs": [
        "anthropic-claude-managed-agents",
        "openai-workspace-agents-chatgpt"
      ],
      "observations": []
    },
    "body": "The next useful abstraction for agents is not another chat box. It is the shelf.\n\nToday, a serious agent is still something a team has to assemble: model, prompt, tools, MCP servers, runtime, permissions, secrets, memory, file system, cloud environment, evaluation loop, lifecycle rules, and a surface where the agent can be asked to work. That is a lot of scaffolding before the first useful task begins. It is also repetitive scaffolding. A \"CMS and frontend agent using Next.js and PayloadCMS on DigitalOcean\" should not need to be reinvented by every team that wants a site changed, migrated, audited, or extended.\n\nThe obvious next move is to rent the harness, not merely call the model. A team asks for a job to be done, and the user's main agent, perhaps Claude or ChatGPT sitting in Slack, acts as broker. It inspects the request, the repository, the permissions, the budget, and the desired outcome, then selects a specialized agent with the right harness already attached. The billing unit might be tokens, execution time, completed subtasks, or some blended meter; the more important shift is that the operational package becomes portable.\n\nThis is what the templatization of agent harnesses enables. Once an agent can be described as a stable bundle of instructions, skills, connectors, environment, policy, and quality gates, it can be versioned, compared, rented, improved, and swapped. The buyer does not want \"access to a model.\" They want a known kind of work performed inside a known kind of runtime.\n\nAnthropic's Managed Agents and OpenAI's Workspace Agents do not yet amount to a public bazaar of rentable specialists. But they make the shape visible. The harness is becoming a product boundary. Once that boundary hardens, agents stop looking like bespoke internal automations and start looking like labor units that can be put on a shelf."
  },
  {
    "id": "agent-surfaces-are-feedback-loop-shaped",
    "data": {
      "title": "Agent surfaces are feedback-loop shaped",
      "created": "2026-05-14T00:00:00.000Z",
      "updated": "2026-05-14T00:00:00.000Z",
      "tags": [
        "agents",
        "development",
        "workflows"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "Agent surfaces are feedback-loop shaped.\n\nTagging Claude or Codex in a communication channel like Slack makes the most sense when the loop is almost embarrassingly short: I send a request, the agent replies with a confirmation, and the work either is done or is clearly queued somewhere else. The channel is useful because the interaction is lightweight, shared, and close to the place where the request naturally appears.\n\nAs soon as the work needs more than one or two turns of correction, the center of gravity moves. A desktop app or CLI is still the better interface for iterative agent work because it gives the loop somewhere to live: context, files, diffs, logs, tests, failures, retries, and all the small course corrections that make the work real. Local development with agents enables the most sophisticated feedback loops because the agent is sitting inside the same environment as the artifact it is changing.\n\nCloud agents sit between those poles. They are easier to manage across parallel work streams, which matters when several pieces can move independently, but they are slower and more distant because each stream has to spin up inside a virtual machine before it can become useful. That trade is often worth it for parallelism, but it is not the same thing as immediacy.\n\nThe right surface is not \"chat versus IDE versus cloud.\" It is the shortest loop that can still hold the work."
  },
  {
    "id": "agents-enable-radical-simplicity",
    "data": {
      "title": "Agents enable radical simplicity",
      "created": "2026-05-10T00:00:00.000Z",
      "updated": "2026-05-10T00:00:00.000Z",
      "tags": [
        "agents",
        "simplicity",
        "prompts",
        "skills"
      ],
      "claims": [
        "agent-harness-absorbs-complexity",
        "decision-logic-as-natural-language",
        "prompt-is-spec-is-implementation",
        "radical-simplicity-fewer-parts-per-capability",
        "agent-runtime-substitutes-for-platform",
        "cost-shift-from-infra-to-writing-favorable",
        "agent-systems-collapse-surface-area"
      ],
      "inputs": [],
      "observations": []
    },
    "body": "Agents enable radical simplicity.\n\nFor an internal prototyping setup, I asked Codex to draw a flow diagram of an existing system already in use inside Wild. The diagram came back, and the thing that struck me was not the picture itself but what it revealed: the flows and decision trees inside that project are dense, branching, conditional. Real software-shaped complexity. None of it lives in a service mesh or an orchestration engine. It lives as prompts and skills inside that project's agent harness.\n\nThe harness absorbs the complexity that would otherwise need its own scaffolding. Decision logic that used to require a state machine, a config file, a small DSL, or a feature-flag service is now a few paragraphs of natural-language instruction sitting next to the code it acts on. The branching is real, but the surface area collapses. There is no extra system to deploy, monitor, version, or onboard people to. The prompt is the spec is the implementation.\n\nThis is what I mean by radical simplicity: not \"fewer features,\" but fewer moving parts per unit of capability. The agent is the runtime that lets a small set of well-written instructions stand in for what used to demand a small platform. The cost moves from infrastructure to writing well, which is a trade I will take every time."
  },
  {
    "id": "agents-make-forensic-journalism-more-ordinary",
    "data": {
      "title": "Agents lift data journalism up the stack",
      "created": "2026-05-26T00:00:00.000Z",
      "updated": "2026-05-26T00:00:00.000Z",
      "tags": [
        "agents",
        "journalism",
        "data-analysis",
        "deep-research"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "Agents lift data journalism up the stack.\n\nThe old frontier version of this craft is Christo Grozev and Bellingcat: journalism as adversarial data archaeology. In the Navalny investigation, Bellingcat moved through phone metadata, passenger manifests, leaked databases, social traces, vehicle registrations, geolocation, and pattern matching until a hidden state operation became legible. Their own [methodology write-up](https://www.bellingcat.com/resources/2020/12/14/navalny-fsb-methodology/) reads less like a conventional article than a forensic notebook: one dataset suggests a lead, another corroborates it, a suspicious travel pattern becomes a name, a name becomes a phone number, a phone number becomes a workplace, and eventually the outline of an assassination team appears. Meduza's [interview with Grozev](https://meduza.io/en/feature/2020/12/18/it-s-always-a-choice) makes the technical substrate explicit: Bellingcat used MySQL to filter call and data-usage records before manually mapping work-hour locations, then cross-checked records from multiple sources to guard against poisoned data.\n\nI do not know enough about newsroom staffing to make a confident claim about who used to do what, or how often journalists depended on software engineers, data teams, or self-taught technical fluency. That is not the point I can see clearly. The clearer point is the correlation between technical leverage and journalistic possibility. When an investigation depends on finding patterns in messy records, reconciling inconsistent files, querying metadata, reading hundreds or thousands of documents, or testing whether a hunch survives contact with the data, the available tooling changes the questions a journalist can practically ask.\n\nThis is where frontier-model agents feel structurally important. They do not make journalistic judgment obsolete, and they do not make evidence less demanding. They change the level at which a journalist can work. 
Instead of spending attention on the incidental mechanics of the toolchain, a journalist can stay closer to the requirement: What would count as evidence? Which records need to agree? Where could the data be poisoned? What alternative explanations would weaken the story? What is missing? What would make this claim unfair?\n\nThe agent becomes a technical collaborator inside that loop. It can help sketch a schema, clean a CSV, write a query, compare documents, cluster records, build a scraper, normalize dates, inspect PDFs, create a map, produce a timeline, test an anomaly, explain a statistical caveat, or write the small script needed to answer the next question. None of those acts is \"the journalism\" by itself. They are the scaffolding around the judgment. But lowering the cost of that scaffolding matters enormously, because more of the human attention can move up the stack.\n\nThis is already visible in modest form. ProPublica [described using an LLM](https://www.propublica.org/article/using-ai-responsibly-for-reporting) to examine more than 3,400 National Science Foundation grant descriptions from Senator Ted Cruz's \"woke\" grants database, with reporters reviewing and confirming every detail before publication. The AI did not publish the story. It helped reporters generate leads and see patterns in a pile of text large enough to resist casual reading. OpenAI's [deep research](https://help.openai.com/en/articles/10500283-research-faq) points in the same direction for source synthesis: multi-step web research, citations, and analyst-style reporting over large bodies of material. [ChatGPT agent](https://help.openai.com/en/articles/11752874-chatgpt-agent) goes further by combining research with action: browser use, terminal work, code execution, API access, spreadsheets, and iterative collaboration.\n\nThe interesting opportunity is not \"AI writes the article.\" That is the least interesting and most dangerous version of the story. 
The interesting opportunity is that agentic engineering gives data-driven journalism a more fluid working surface. A journalist can begin with a question in ordinary language and move, with the agent, through research, files, code, queries, spreadsheets, and verification without constantly dropping down into the machinery. The work still has to become specific. It still has to produce receipts. But the path from question to test becomes shorter.\n\nDeep research is the reading room. Agentic engineering is the room with power tools.\n\nThe discipline is to keep the agent in the role of technical collaborator. It can help operate on the journalist's data, but the claims still need provenance, reproduction, and editorial judgment.\n\nThat leaves human judgment as the bottleneck, but it is a better bottleneck. The scarce skill shifts away from operating every piece of machinery by hand and toward asking sharper questions, designing better checks, weighing evidence, and understanding the public interest. Grozev's work showed what becomes possible when journalistic instinct meets forensic data skill. Agents do not replace that combination. They make it easier for more work to happen at that altitude."
  },
  {
    "id": "ai-code-output-is-not-software-progress",
    "data": {
      "title": "The harness makes agentic code hold up",
      "created": "2026-05-23T00:00:00.000Z",
      "updated": "2026-05-23T00:00:00.000Z",
      "tags": [
        "agents",
        "harness",
        "software",
        "management",
        "startups"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "I think the hard part of agentic software engineering is not getting an agent to produce code. That part is becoming cheap. The hard part is getting an agent to produce changes that still belong to the same codebase after the fifth, tenth, or fiftieth patch.\n\nThe naive workflow is almost designed to fail at that. Write a prompt, let the agent generate a large diff, run one review, maybe skim the tests, then merge if the output looks plausible. This can feel productive because the visible artifact is large. It can also be a near guarantee of many results that are technically impressive and structurally bad.\n\nThe problem is that software work is not a factory problem where every additional unit is automatically evidence of health. A codebase is a living constraint system. Every new file, abstraction, dependency, generated helper, and half-understood integration enters that system and starts charging rent. If the process does not preserve the model of the system, agentic coding can create complexity faster than the team can understand it.\n\nThis is where I think the agent harness becomes the key factor. The harness is not just a nicer prompt wrapper. It is the operating environment that raises the floor of the work: the standing instructions, the local skills, the repository-specific documentation, the quality gates, the review habits, the shell, the browser, the issue context, the conventions, and the accumulated memory of how this codebase wants to be changed.\n\nA good harness lets agents create a lot of change while keeping the codebase cohesive. It gives the agent more than a task. It gives it a way to behave inside the project. 
It tells the agent where truth lives, which patterns are local rather than generic, which files are generated, which commands matter, which abstractions are already trusted, and which kinds of cleverness are unwelcome.\n\nWithout that environment, the agent is mostly acting from the prompt and its general model of software. That can be enough for isolated tasks. It is not enough for sustained engineering. Sustained engineering requires repeated contact with the local reality of the system. The agent has to learn, or be continuously reminded, that this repository has its own source-of-truth boundaries, naming habits, testing expectations, architectural decisions, and failure modes.\n\nI believe this is where small companies and startups are more exposed, in both directions. If they use a simplistic agent workflow, the over-complexity, drift, and hidden regressions show up quickly. The team is small, the codebase is changing fast, and there are fewer buffers between bad engineering behavior and customer-visible pain. A startup that uses agents to generate impressive piles of code nobody can reason about may not have enough time to discover the problem slowly.\n\nThe same exposure can also make them better. If agentic software engineering has to work for the company to survive, the team has a strong reason to discover the process that actually works. It has to learn when to ask for exploration, when to ask for a small patch, when to split work across agents, when to run tests, when to stop and re-plan, when to delete, and when to reject a plausible implementation because it does not fit the system.\n\nLarge enterprises often have a different feedback loop. I do not think the issue is that their engineers are worse. The issue is that inefficient agentic workflows can survive longer inside a successful organization. A team can count generated output as progress. A division can reward visible throughput. A platform group can normalize agent-produced complexity. 
Because the business is still carried by an existing product, platform, distribution advantage, or customer relationship, the cost may propagate slowly through architecture, review culture, dependency graphs, and staffing plans before it becomes impossible to ignore.\n\nThe risk, then, is not simply \"AI writes bad code.\" The risk is that an organization adopts a thin process for agentic work and then measures the output of that thin process as success. More pull requests, more generated files, more closed tickets, more prototypes: all of that can look like acceleration from a managerial distance. But if the harness is weak, the output may be weakening the codebase faster than it is improving the product.\n\nGood agentic engineering seems to require almost the opposite instinct. The leverage is not that an agent can write a large amount of code. The leverage is that a disciplined process can use agents to explore, compress, delete, test, document, and validate with less friction. The question is not \"how much did the agent produce?\" The question is \"what burden did the system avoid, remove, or make clearer?\"\n\nIn my experience, the harness is what makes that bar realistic. A strong prompt matters, but the prompt alone is not the process. Skills matter because they encode repeatable local procedures. Deep integration of local documentation matters because it gives the agent the context that a senior engineer would otherwise carry in memory. Quality gates matter because they turn taste and correctness into recurring pressure rather than occasional hope.\n\nThe companies that use agents well may not be the ones that generate the most code. 
They may be the ones that build the best environment around generation: enough instruction to constrain the work, enough documentation to ground it, enough skills to make good behavior repeatable, enough tests to catch drift, and enough review discipline to reject plausible nonsense.\n\nStartups are pushed toward that discipline by the possibility of vanishing. Enterprises, when they lack that immediate threat, may have to choose the discipline deliberately and earlier than their normal feedback loops require. In both cases, I think the real question is not whether the company is using AI to write code. It is whether the agentic process is strong enough for the code to hold up."
  },
  {
    "id": "app-stores-will-evolve-into-agent-markets",
    "data": {
      "title": "App stores will evolve into agent markets",
      "created": "2026-05-14T00:00:00.000Z",
      "updated": "2026-05-14T00:00:00.000Z",
      "tags": [
        "agents",
        "app-stores",
        "marketplaces"
      ],
      "claims": [],
      "inputs": [
        "anthropic-claude-managed-agents",
        "openai-workspace-agents-chatgpt"
      ],
      "observations": []
    },
    "body": "The app store was the distribution primitive for software you owned, installed, and opened. The agent market will be the distribution primitive for work you delegate.\n\nThat sounds like a small naming change, but it changes what is being sold. The old store sold an interface and a bundle of capabilities. The new one sells a configured worker: tools, permissions, runtime, taste, process knowledge, escalation behavior, and a billing model wrapped around execution. You do not download the \"marketing calendar app.\" You rent the \"marketing operations agent\" for a launch week and let your primary assistant broker the handoff.\n\nThe closest ancestor is not only the mobile app store. It is also the plugin marketplace, the SaaS integration catalog, the cloud marketplace, the automation template gallery, and the consulting bench. Agent markets compress those categories into a single object: a reusable work pattern with enough autonomy to be worth governing and enough specificity to be worth discovering.\n\nThis is why the next agent distribution layer will care less about screenshots and more about trust. What data can this agent see? What tools can it call? Who authored the harness? Which outcomes has it completed? What does it cost when it runs for six hours? Can my organization pin a version, audit its actions, and revoke it without breaking everything else?\n\nThere is no clean public agent app store yet from OpenAI or Anthropic. Still, Workspace Agents already point toward a team directory and templates; Managed Agents point toward packaged harnesses and environments. Put those motions together and the destination is hard to miss: app stores stop being catalogs of software surfaces and become catalogs of delegated execution."
  },
  {
    "id": "design-will-be-seed-driven",
    "data": {
      "title": "Design will be seed driven",
      "created": "2026-05-14T00:00:00.000Z",
      "updated": "2026-05-14T00:00:00.000Z",
      "tags": [
        "design",
        "ai",
        "design-systems",
        "seeds",
        "generative-design"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "Design in the near future will be seed driven.\n\nInstead of defining every component, token, interaction, motion curve, and writing guideline manually, designers using AI will create a set of seed semantics: a detailed description of the design system and brand, with key definitions precise enough to generate from. That manifest becomes the seed. From it, the system creates tokens, components, motion, behavior, language, accessibility rules, and other guidelines.\n\nThe seed then acts as the reference point for every future generation of that seed version. The output becomes exponential because each new screen, component, variant, prototype, deck, or product surface does not start from a blank prompt. It starts from a structured brand definition that already knows what \"on brand\" means.\n\nThis preserves control over the core brand aesthetic and its deeper definition. The better the seed, the more aligned the generated work becomes. In that world, the quality of design process depends less on manually policing every artifact and more on improving the seed until generation becomes reliable.\n\nSeeds also imply versioning. Outputs need generation metadata attached to them: which seed version produced this, with which constraints, model, prompt, and transformation history. Without that lineage, generated design becomes impossible to audit, compare, or reproduce.\n\nThis is a strong contrast to tools like Figma and other common design environments, which remain literal about design artifacts. They are excellent at arranging objects, but they do not treat design as a structured semantic data model first. Seed-driven design starts from the opposite assumption: the foundation is not the canvas, but the generative definition behind it."
  },
  {
    "id": "expensive-ai-code-review-is-not-insane",
    "data": {
      "title": "Expensive AI code review is not insane",
      "created": "2026-05-16T00:00:00.000Z",
      "updated": "2026-05-16T00:00:00.000Z",
      "tags": [
        "agents",
        "code-review",
        "pricing",
        "software"
      ],
      "claims": [],
      "inputs": [
        "anthropic-claude-code-review-pricing",
        "openai-codex-pricing"
      ],
      "observations": []
    },
    "body": "Anthropic estimates Claude Code Review at roughly $15 to $25 per PR. At first glance, that sounds insane.\n\nThe comparison that makes it feel insane is Codex. OpenAI's Codex Pro tier starts at $100 per month, and for many day-to-day coding workflows the usage envelope can feel close to unlimited, especially next to a review product that might consume a meaningful fraction of that subscription on a single pull request. The instinctive reaction is: how can one review cost that much?\n\nBut I think the reason is simple. A real review by a senior engineer, the kind of review that is actually worth its salt, is not a glance at the diff. It is ten to thirty minutes of attention, sometimes more: reading the change, reconstructing intent, checking edge cases, asking whether the local fix violates a distant invariant, and deciding which comments are worth spending another human's time on. The expensive part is not generating text. The expensive part is sustained judgment over a live codebase.\n\nSeen that way, $15 to $25 is not priced against tokens. It is priced against senior engineering attention. And against that benchmark, it is still cheaper than a human review, while being fully automated, non-blocking, and available in the background whenever the organization chooses to spend it.\n\nThat does not make it a casual developer feature. This is big-enterprise economics: high-trust automation attached to expensive teams, expensive codebases, and expensive mistakes. But it also means the price is less outlandish than it first appears. If the review is genuinely good, the surprising part is not that it costs real money. The surprising part is that we briefly expected serious code review to be priced like autocomplete."
  },
  {
    "id": "the-work-should-not-stop-when-i-sleep",
    "data": {
      "title": "It is starting to feel strange when the work stops as I sleep",
      "created": "2026-05-17T00:00:00.000Z",
      "updated": "2026-05-17T00:00:00.000Z",
      "tags": [
        "agents",
        "autonomy",
        "harnesses"
      ],
      "claims": [],
      "inputs": [],
      "observations": []
    },
    "body": "It is starting to feel strange when the work stops as I sleep.\n\nThere is an old line about wealth that says you are not really rich unless your money earns while you are asleep. I dislike most of the atmosphere around that sentence: the status anxiety, the financialized self-help, the faint smell of someone trying to sell you a course. But after stripping all that away, there is a useful shape underneath it. Certain systems start to feel different once they can keep doing useful things after your attention has left the room.\n\nAgents are beginning to rhyme with that feeling. If every useful action requires me to sit there, approve the next step, paste the next command, watch the next test run, and decide the next branch, then the agent still feels close to an interface: powerful, useful, often impressive, but tethered to my continuous presence. The stranger feeling begins when the agent can accept a bounded job, carry context forward, make reversible progress, validate its own work, and come back with evidence.\n\nSleep is a good test because it removes the comforting fiction of continuous supervision. While I am awake, even a fragile agent can feel autonomous because I am quietly holding the whole system together: noticing when it drifts, supplying missing context, approving tool calls, restarting failed processes, interpreting ambiguous errors. Overnight work exposes whether the harness actually contains the job. Does the agent have the repository, credentials, budget, permissions, tests, memory, stopping rules, escalation path, and review surface it needs? Or does it only look capable while a human is nearby to catch every dropped thread?\n\nThe point is not that I should be working while asleep. It is almost the opposite. The point is to separate progress from vigilance. 
A good agentic setup should let me define the shape of acceptable work, go away, and return to something inspectable: a branch, a failing test with diagnosis, a draft, a queue of proposals, a narrowed question with the relevant context already gathered. Morning should not contain magic. It should contain receipts.\n\nAttended agents are already a core part of how I work, and I do not want to diminish that mode. Sitting with an agent, steering it, correcting it, and using it as an extension of attention is genuinely useful. But there is another mode beginning to come into view. We already accept background competence from CI, cron, monitors, queues, and deployment systems. Agents seem likely to join that family, but with more judgment and therefore a much stricter need for a harness. An agent without overnight competence can still be excellent. It just belongs to a different rhythm of work.\n\nI do not think this is a definition yet. It is more of a taste forming in real time: once an agent can plausibly keep working while I sleep, it becomes harder not to expect that from the category."
  }
]