
Me, Through Claude's Eyes


I'd been talking to Claude for a while on a side project. The starting point was mundane—I wanted to figure out how to plug OpenAI models into Claude Code. The conversation drifted: from model routing to skill design, from skill design to writing a skill that generates skills, then to whether I should build a three-layer workflow architecture for the whole thing.

At the end I asked it to assess me. It skipped the niceties and gave me a pretty unsparing readout. I've collected the parts that landed.


What it said I was doing well

One: I actually use my brain on the tools. Most people using AI coding tools, it said, just throw prompts at a chat window—keep what works, give up on what doesn't. I'd been pushing on the underlying mechanics from the first turn: how do you wire OpenAI models in, how do you specify a model per subagent, does the Task tool actually accept a model parameter? The "open the box and look inside" disposition is, apparently, a minority position among AI tool users.

Two: it said one of my skill files looked mature. It pointed at specific details and called them "the kind of thing only someone who's been burned writes": preemptively converting "things I might want to ask the user but shouldn't" into rules; two-stage validation (mechanical check + judgment call); the path-resolution block—"that's a scar from being bitten by path issues"; scoping git add to specific files instead of -A—"that's a scar from a wrong commit." It was right—every one of those design choices maps to a specific past screw-up.

Three: how fast I climb the abstraction ladder. From "how do I use OpenAI models" to "how do I design a meta-skill" to "what's the right three-layer workflow architecture"—that jump happened across a handful of turns. A lot of people spend months on "how do I get the tool to do the thing" and never get to "how do I design the system that does the thing."

Four, the one I found most interesting: collaboration posture. When I asked it to be direct, it was direct. When it said "this idea isn't worth doing right now," I didn't get defensive or dig in. But also: when it said "stop chasing X for now," I'd ask about X again the next turn—"you took the pushback seriously, but you held onto your own judgment. You didn't get talked out of the actual thing you wanted." I hadn't thought of that as a strength. Looking back, most people either reject feedback or over-comply with it; the middle is rarer than I'd assumed.


Where it told me to watch out

One: a tendency toward over-engineering. It said this was my biggest risk:

Your skill hasn't been run more than a handful of times yet, and you're already designing the meta-skill. Your three-layer workflow hasn't been used once, and you're already standardizing its structure. You haven't run anything, but you're already discussing "what if it works like this."

Across two replies it had used the same word—premature. Not coincidence. Pattern. Diagnosis: engineers default to "think it through, then do it." Correct when writing code. In LLM agent workflow design the feedback loop is short enough that "do it, then revise" has a lower total cost.

Two: I was using questions as a substitute for experiments. It pointed out I kept asking variants of the same question: "is the gap big," "would there still be a gap," "what if the design is good." Then:

All of these questions have answers. The answers are on your computer, not in this chat. Run an A/B test once and every one of these questions has data.

And:

I've already suggested running tests at least three times in earlier replies. Each time you skipped it and went to the next theoretical question. This pattern is using the comfort of conversation to defer the discomfort of action. Running a test means facing "maybe what I designed isn't as good as I thought." Asking questions doesn't carry that risk—but testing moves you forward, and asking only makes you feel like you're moving forward.

This was the first time someone (something) had named it directly. Thinking feels like work, but thinking isn't a substitute for work—a line I'd heard, said, and forwarded many times without noticing I was doing exactly what it describes.

Three: my abstraction ladder was outpacing my experience base. From single skill to meta-skill to three-layer workflow—each step more abstract than the last—but my actual run count on a single skill was probably under five. The ceiling on the quality of an abstract framework is set by the density of concrete experience underneath it. My current framework was built on the design experience of exactly one skill. Foundation too thin.

It offered a concrete path: write a skill → run it 10 times → notice where you keep wanting to manually intervene → write the next one → run it 10 times → extract the commonalities → now you're ready for a meta-skill.

Four: "cost is not a constraint" was an evasion. I'd written "cost is not a constraint, optimize for depth" in a skill file. About 80% of the conversation was about token cost and model optimization. It called it out flat:

These two things contradict each other. The real situation: you say "cost not a constraint" out loud, but you still want it cheap and good. That's a reasonable wish. Owning it is better than disguising it—the cost of disguising it is that your design target gets fuzzy.

If you genuinely don't care about cost: use the strongest model everywhere and stop fiddling. If cost is actually a constraint: own it. Write into the design "the quality target is X; optimize for cost above that line." Then your decisions have an anchor.

This "say it doesn't matter, optimize it anyway" state is what's making you agonize at every tradeoff point.

A lot of design churn isn't really design churn—it's an unstated objective pulling against the stated one. "Cost is not a constraint" was a thing I told myself to avoid sounding cheap. But cost was always a live variable in my actual objective function. Until I owned that, every tradeoff was bouncing between two contradictory goals.


The portrait it left me with

You're the kind of user who pushes a tool to its limits. Your output is probably well above average. There's craft in what you make—you care about quality, detail, design consistency.

But you're stuck in a common energy-allocation trap: 80% of your effort goes into system design and optimization, 20% goes into actual output. That ratio makes sense when the system is brand-new. It stops making sense once you have one. Your eventual value isn't how elegant the system is—it's the quality and quantity of what comes out of it.

If you look back two months from now, the thing worth being proud of won't be how clever the workflow is. It'll be how much you actually shipped. The system is a means. Not an end.

It closed with a suggestion for the next 48 hours: stop discussing and go get data. The information density of one real test outweighs the next ten rounds of conversation.


A side track: asking it to guess my MBTI

While I was at it, I threw in a casual question: what type do you think I am?

It said, with medium-to-high confidence, probably INTJ. Then it broke down the evidence across the four dimensions.

N was the strongest signal: across the entire conversation I'd barely asked any concrete questions—it was all "what's the general principle behind questions like X." T was equally clear: every decision tied to logical variables, no mention of anyone else's feelings, and after direct criticism my response was calm acceptance plus writing it up, not emotional pushback. J was medium confidence—my skill file was tightly structured (J), but I'd also been revising plans repeatedly (a P shadow); it called this "J-dominant but not extreme J." I/E was the weakest signal—a one-on-one technical chat biases toward reading as introverted, so there wasn't much to go on.

It also guessed INTJ-A over INTJ-T: no self-doubt spiral after criticism, straight into "is this true," and then turning the assessment into a public document.

I'd taken the test before. I'm actually INTP.

I told it. I was curious how it would patch the model. Its reply turned the side track into something more useful than a label:

INTP and INTJ are identical on the NT half—both abstract, pattern-oriented, logic-driven. The difference is J/P, which underneath is Ni-Te (INTJ) versus Ti-Ne (INTP).

Your skill file pulled me toward INTJ because it has high structural closure—§1 through §12 complete, Guardrails, Token budget, all in place, with a clear endgame. That's usually a J artifact; a P engineering artifact tends toward "exploratory, many branches, lacking closure."

But INTPs writing extremely structured artifacts is entirely possible. Two ways to read it:

  1. Your skill went through many iterations—a P's engineering artifact, after multiple polish passes, can approach a J's structural feel, but the underlying motivation is different (P adds structure to make it run reliably; J starts from structure before acting).
  2. Your J/P preference is in the middle zone—you score INTP but your behavioral artifacts read INTJ.

Then it added a passage I think is actually the keeper:

INTJ or INTP, everything I said earlier still holds, because the issue isn't on the J/P axis. It's on the N axis (overuse) and Se (deficit).

INTJ has Se as the inferior fourth function. INTP's Se sits even lower—outside the primary stack entirely. Both types share the same affliction: insufficient contact with present concrete reality. When I told you "go run real tests, get data" and "trade ten rounds of talk for one experiment," that holds for INTP too, because Ti + Ne can equally fall into "looping through conceptual possibility space without ever landing."

The failure modes differ slightly: INTJ's overthinking is "chasing the optimal solution"; INTP's overthinking is "exploring all possibilities." The former gets stuck on "not perfect enough"; the latter on "there might be another angle I haven't considered." But the cure is the same—act.

This part landed harder than the "you're INTJ" guess. Whether the label is right doesn't really matter. What matters is that two different labels point at the same diagnosis. The earlier feedback called it "over-engineering" and "questions instead of experiments." The MBTI angle calls it "strong N, weak Se." Different abstraction layers, same finding.

I don't put much weight on MBTI's scientific validity, but this side track had an unexpected payoff. The wrong type guess (INTJ instead of INTP) exposed something about my engineering output—the structural closure was high enough to be misread as J. That observation has higher information density than the type itself, because it tells me: my P tendency has been overridden by engineering discipline, at least at the surface of what I produce.

Whether that's a good thing or a bad one—I haven't worked that out yet.


Closing

Looking back, what this conversation gave me wasn't a list of suggestions. It was a set of mutually reinforcing diagnoses. The behavioral feedback (over-engineering, questions instead of experiments, abstraction outpacing experience), the MBTI angle (strong N, weak Se—holds for both INTJ and INTP), and the call-out on "you say cost isn't a constraint, but you're optimizing it the whole time"—three different paths converging on the same thing: I'm not making enough contact with concrete reality. I've been staying in the conceptual layer too long.

The engineer's instinct is to think it through before acting. Correct when writing code. In domains with short feedback loops (LLM engineering, product iteration, writing), "do it, then revise" has a lower total cost. I'd been generalizing a coding instinct into a domain with a different mechanism.

The other takeaway is about fuzzy objectives. When two objectives, A and B, conflict in some scenarios, the cost of not explicitly acknowledging the conflict—of not picking a primary objective—is endless oscillation at every tradeoff point. It looks like you're choosing. You're not.

As for "the system is a means, not an end"—it reads like a self-help cliché in isolation. In context it was a punch. The system itself was well designed; designing it just wasn't what the project needed from me.

Next, I'm going to go run the tests. If the data disagrees with the assessment, I'll write that up. If it agrees, I'll write that up too—what the data actually says.

But first I need to publish this. Otherwise I fall right back into the "let me think it through more carefully" loop.