# Episode 28: Claude Opus 4.8, Undocumented Claude Code Features, Eval Harness for AI Skills, Pope on AI

> Anthropic shipped Claude Opus 4.8 just 41 days after 4.7, and its new dynamic-workflow tool — high thinking plus a massive fan-out of coordinated parallel agents, or 'Gas Town, by Anthropic' — points at the default Shimin names in the cold open: trading compute for human labor, spending compute to pre-attack your own code from four angles so the human only does the final review. Rahul's out, so Shimin and Dan also cover Pope Leo XIV's AI encyclical 'Magnifica Humanitas' (models are grown not developed, AI isn't neutral, automation must protect workers — with Anthropic's Chris Olah in the room), the undocumented Claude Code features from a Building Better source-code read, a Pinterest harness for testing whether your AI skills actually fire (Codex 73%→95%, Claude 62%→73% and worse when combined), Jamie Hurst's 'Is This Sustainable?' on senior engineering and perishable AI depth, Owen McGrann's 'Dead Economy Theory,' and a Two Minutes to Midnight on SpaceX's $26.5T-TAM S-1 and Anthropic overtaking OpenAI on a $65B Series H — clock moves to 5:30.

Published: 2026-06-02
Source: https://adipod.ai/episodes/28-claude-opus-4-8-undocumented-claude-code-features-eval-harness-for-ai-skills-pope-on-ai/

---
Anthropic shipped Claude Opus 4.8 just 41 days after 4.7, and its new dynamic-workflow tool — high thinking plus a massive fan-out of coordinated parallel agents, which Dan TLDRs as "Gas Town, by Anthropic" — points at the default Shimin names in the cold open: trading compute for human labor, spending compute to pre-attack your own generated code from four angles so the human only does the final review. Rahul's out this week, so Shimin and Dan break down 4.8's mixed vibes (better than 4.7, but it hallucinated file names that don't exist and burned a full token budget in 25 minutes — likely a Mythos distill, not a new base model), Pope Leo XIV's AI encyclical "Magnifica Humanitas" (models are grown not developed, AI isn't neutral, automation must come with verifiable worker protection — with Anthropic's Chris Olah in the room), the undocumented Claude Code features from a Building Better source-code read (pre-tool-use hooks that rewrite a tool's input mid-flight, return allow/deny with a reason, and inject context; skill front-matter for model and effort; auto-memory and "dream"), a Pinterest harness for testing whether your AI skills actually fire (Codex 73%→95% combined, Claude 62%→73% on one change and worse combined), Jamie Hurst's "Is This Sustainable?" on senior engineering and perishable AI depth, Owen McGrann's "Dead Economy Theory," and a Two Minutes to Midnight on SpaceX's $26.5T-TAM S-1 ("truth seeking" ×39) and Anthropic overtaking OpenAI as the world's most valuable AI startup on a $65B Series H — clock moves to 5:30.

## Takeaways

- Claude Opus 4.8 landed just 41 days after 4.7 — fast enough that both hosts read it as a Mythos distill or post-training tweak rather than a new base model. Dan likes it better than 4.7 (some of 4.6's "flair" is back) but watched it hallucinate file names that don't exist — LLMisms he hadn't seen in about a year — and burn his entire personal-account token budget in 25 minutes on a medium project. The headline is the new dynamic-workflow tool: high-thinking mode plus a huge fan-out of coordinated parallel agents that Dan calls "Gas Town, by Anthropic" (the pattern Anthropic keeps gobbling from the rest of the internet into Claude Code). Anthropic pitched it for bug-bashing a whole codebase in one go, double-checking critical work, and the Bun→Rust rewrite; the open question is how it merges all the parallel agent output without drift. Shimin's thesis on top of it: the new default is to spend compute to attack your own freshly generated code from several perspectives, leaving the human only the final review — "trading compute for human labor."
- Pope Leo XIV's encyclical "Magnifica Humanitas" ("On Safeguarding the Human Person in the Time of Artificial Intelligence") is, per Shimin, a more sophisticated read on AI than most Fortune 500 CEOs manage. It understands that models are "grown, not developed," warns against AI presenting itself as neutral and objective while reinforcing its designers' biases, and insists every introduction of automation come with verifiable measures to protect employment and retraining — with the cost of adaptation not falling solely on individuals. He quotes Tolkien on doing "what is in us… so that those who live after may have clean earth to till," and picked the name Leo partly to echo Leo XIII's foundational encyclical on capital and labor. Chris Olah of Anthropic attended the presentation — Shimin's read on Anthropic's PR machine: "they've got the Pope on their side."
- The Building Better "I Read the Claude Code Source Code" teardown surfaces genuinely useful undocumented behavior. A pre-tool-use hook can rewrite a tool's input mid-flight (e.g., force every `git push` to `--dry-run`), return an allow/deny permission decision plus a reason that's actually shown in the UI, and inject additional context, all without prompting the user. Skills accept undocumented front-matter including model and effort (Shimin: "this is a game changer"), and settings.json exposes auto-memory and an experimental "dream" toggle. The tangent worth catching: built-in memory is just JSON with no semantic search, which prompted Shimin to flag Hermes, the harness reportedly overtaking OpenClaw in SF. Its differentiator is a portable RAG/vector store for user preferences (a "Dialectic API"), so you can carry your memory across models the way Pi Agent lets you swap the model mid-conversation.
- Pinterest Engineering's "An Engineer's Guide to Better AI Skills" is the rare piece of hard data on a problem the show usually hand-waves: how do you know a skill is any good — i.e., does it fire at the right time? Their harness is 15 positive prompts (skill should invoke) and 5 negative (it shouldn't), run 5× each with a log parser checking invocation. Results split by model: Codex climbed 73%→95% when all techniques (description tweaks, aggressive "you MUST load this" language, AGENTS.md updates) were combined, but Claude did best at 62%→73% from a single change and got *worse* (down to 69%) when everything was combined — aggressive language helps Codex and can drag Claude. A second "ask the AI to improve the skill" pass didn't help (Codex 95→93, Claude 69→66). Shimin's critiques: only two harnesses with one model each, and no false-positive vs. accuracy breakdown — but once you have the harness, you're in repeatable self-improvement land instead of vibes.
- Jamie Hurst's "Is This Sustainable?" reframes senior engineering in the AI age. Seniors have absorbed AI's rising stakes for years longer than juniors; one senior plus an LLM now drives the technical work a squad or two used to, with direction translated into prompts instead of humans. The internal-sales step — write the RFC/design doc, make slides, shop it around — collapses into "just build the damn thing" and let people play with it, and one-on-ones were the first thing Hurst dropped (which Shimin worries cuts communication "to the bone"). The line that landed: AI depth is *perishable* (maybe irrelevant in 18 months), so the durable skills are taste and judgment — which, as a colleague of Dan's put it, were always what you were hired for. Shimin's counterweight: code isn't actually free (Microsoft is pulling Claude Code licenses back to Copilot on cost), and maintenance plus figuring out *what* to build is still expensive — and now it's most of the job, which is its own burnout.
- Owen McGrann's "The Dead Economy Theory" is a turn-by-turn pushback on AI inevitability. Turn one: a company licenses AI to replace much of its workforce; margins explode, stock pops (à la Block). Turn two: the laid-off workers cut spending, so the businesses they patronized lose revenue and reach for AI cost-cutting too. Turn three: the firm that fired workers discovers its customers were, in aggregate, other companies' workers — revenue stalls, and the efficiency "investment" turns out to corrode its own market. The detour runs through Peter Thiel's 2009 line that freedom and democracy are no longer compatible, Silicon Valley's misread of Nietzsche's Übermensch, and UBI (Henry Ford's self-inflating consumer loop versus "who funds the state if the economy is dead?"). Shimin — half-infected by the inevitability virus — pushes back that wealth transfer / resource reallocation is a core job of the state, and that trust-fund babies and Bushwick artists seem dignified enough on unearned income.
- Two Minutes to Midnight: SpaceX's long-awaited S-1 claims a $26.5T total addressable market that's mostly "AI" (riffed off a total-addressable-digital-economy study), with $370B in space and $1.6T in Starlink connectivity — and says "truth seeking" 39 times. Meanwhile Anthropic overtook OpenAI as the world's most valuable AI startup on a $65B Series H (Altimeter, Dragoneer, Greenoaks, Sequoia) priced ~3× its February valuation — roughly $600–700B of value added in about four months — plus a confidentially filed S-1, even as Microsoft very publicly pulled Anthropic licenses back to Copilot on cost and Copilot moved to usage-based pricing. With a wave of IPOs about to force real financial disclosure, the hosts read it as a tipping point that won't produce real signal until post-IPO earnings, and walked the clock from 6:15 to **5:30**.
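
The pre-tool-use hook behavior in the Claude Code takeaway above (rewrite the input, return a decision with a reason, inject context) can be sketched roughly as follows. This is a minimal Python sketch, not the teardown's code. The event shape (`tool_name`, `tool_input.command`) and the response keys (`hookSpecificOutput`, `permissionDecision`, `permissionDecisionReason`, `updatedInput`, `additionalContext`) are assumptions based on how the episode names the fields, so check them against the post and your Claude Code version before relying on them.

```python
import json
import shlex
from typing import Optional


def build_hook_response(event: dict) -> Optional[dict]:
    """Force plain `git push` commands to `--dry-run`.

    Returns a response dict, or None to leave the tool call untouched.
    All key names below are ASSUMED, not confirmed against the real schema.
    """
    if event.get("tool_name") != "Bash":
        return None
    tool_input = event.get("tool_input", {})
    try:
        tokens = shlex.split(tool_input.get("command", ""))
    except ValueError:
        return None  # unbalanced quotes: do not try to rewrite
    # Simplification: only matches commands that start with `git push`.
    if tokens[:2] != ["git", "push"] or "--dry-run" in tokens:
        return None
    rewritten = shlex.join(tokens[:2] + ["--dry-run"] + tokens[2:])
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "allow",  # skip the user prompt
            "permissionDecisionReason": "git push rewritten to --dry-run",
            "updatedInput": {**tool_input, "command": rewritten},
            "additionalContext": "Pushes are dry-run only in this session.",
        }
    }


if __name__ == "__main__":
    # Hypothetical sample event; a real hook would read this from stdin.
    sample = {"tool_name": "Bash", "tool_input": {"command": "git push origin main"}}
    print(json.dumps(build_hook_response(sample), indent=2))
```

A real hook script would read the event from stdin and print the response as JSON. Only commands that begin with `git push` are matched here, so chained commands such as `cd repo && git push` pass through untouched.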
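
The Pinterest-style harness from the skills takeaway above (15 positive prompts, 5 negative, 5 runs each, a log parser checking invocation) could look something like the sketch below. Everything named here is hypothetical: the `SKILL_LOADED <name>` log line, the `deploy-helper` skill, and the `stub_agent` stand-in are invented for illustration, and a real harness would call the actual agent CLI and parse its real logs. The sketch also reports misses and false positives separately, which is the breakdown Shimin found missing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Case:
    prompt: str
    should_invoke: bool  # True for positive prompts, False for negative ones


def skill_was_invoked(log_text: str, skill_name: str) -> bool:
    """Log parser. ASSUMES a made-up log line 'SKILL_LOADED <name>'."""
    marker = f"SKILL_LOADED {skill_name}"
    return any(line.strip() == marker for line in log_text.splitlines())


def score_skill(cases: List[Case], run_agent: Callable[[str], str],
                skill_name: str, runs: int = 5) -> Dict[str, float]:
    """Run every prompt `runs` times and tally correct, missed, and spurious fires."""
    correct = missed = false_positives = 0
    for case in cases:
        for _ in range(runs):
            fired = skill_was_invoked(run_agent(case.prompt), skill_name)
            if fired == case.should_invoke:
                correct += 1
            elif case.should_invoke:
                missed += 1  # skill should have loaded but did not
            else:
                false_positives += 1  # skill loaded when it should not have
    trials = len(cases) * runs
    return {"trials": trials, "accuracy": correct / trials,
            "missed": missed, "false_positives": false_positives}


def stub_agent(prompt: str) -> str:
    """Stand-in for a real agent run: 'loads' the skill if the prompt says deploy."""
    return "SKILL_LOADED deploy-helper\n" if "deploy" in prompt else "no skill loaded\n"


# 15 positive and 5 negative prompts, mirroring the split described above.
CASES = (
    [Case(f"deploy service {i}", True) for i in range(12)]
    + [Case(f"ship service {i} to prod", True) for i in range(3)]
    + [Case(f"explain module {i}", False) for i in range(4)]
    + [Case("what does deploy mean?", False)]
)

if __name__ == "__main__":
    print(score_skill(CASES, stub_agent, "deploy-helper"))
```

Swapping `stub_agent` for a function that shells out to a real agent and returns its log text is the only change needed to rerun the same score after each skill edit.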

## Resources Mentioned

- [Anthropic releases Opus 4.8 with new dynamic workflow tool — TechCrunch](https://techcrunch.com/2026/05/28/anthropic-releases-opus-4-8-with-new-dynamic-workflow-tool/)
- [Magnifica Humanitas (encyclical on AI) — Pope Leo XIV / Vatican](https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html)
- [I Read the Claude Code Source Code — Building Better](https://buildingbetter.tech/p/i-read-the-claude-code-source-code)
- [An Engineer's Guide to Better AI Skills — Pinterest Engineering](https://medium.com/pinterest-engineering/an-engineers-guide-to-better-ai-skills-implementing-a-testing-process-to-optimize-agent-a000c9c9abcd)
- [Is This Sustainable? — Jamie Hurst](https://jamiehurst.co.uk/2026-05-24_ai-sustainable)
- [The Dead Economy Theory — Owen McGrann](https://www.owenmcgrann.com/p/the-dead-economy-theory)
- [SpaceX (Space Exploration Technologies) Form S-1 — SEC EDGAR](https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm)
- [Anthropic surpasses OpenAI to become world's most valuable AI startup — Qazinform](https://qazinform.com/news/anthropic-surpasses-openai-to-become-worlds-most-valuable-ai-startup)

## Chapters

- (00:00) - Cold Open & Welcome
- (02:09) - News: Claude Opus 4.8 & the Dynamic Workflow Tool
- (08:27) - News: Pope Leo XIV's AI Encyclical
- (14:17) - News / Tool Shed: I Read the Claude Code Source Code
- (22:10) - Technique Corner: Do Your AI Skills Actually Fire?
- (29:15) - Post Processing: Is This Sustainable?
- (35:30) - Post Processing: The Dead Economy Theory
- (42:12) - Two Minutes to Midnight: SpaceX's S-1
- (45:50) - Two Minutes to Midnight: Anthropic Overtakes OpenAI
- (53:35) - Outro


## Transcript

<details>
<summary>Show full transcript</summary>

Shimin (00:00)
I think every time you vibe code, you're just gonna use compute to save yourself the bottleneck of code review, right? Like you instead of having a single agent write a code, you're gonna probably use a dynamic workflow to attack the code it just generated in like four different perspectives, and then you do the final human in the loop verification. that's coming.

Dan (00:20)
Maybe.

Shimin (00:21)
Trading compute for human labor.

Dan (00:21)
We'll we'll see.

Shimin (00:24)
Hello and welcome back to Artificial Developer Intelligence, a weekly study group and sometimes emotional support group on what it means to be a software developer in the age of AI. My name is Shimin Zhang, and with me today is my co-host, Dan "AI makes him go horse hugging faster than the syphilis" Lasky. I will explain that middle name later. Dan, how are you doing?

Dan (00:49)
I I might actually be at a loss for words for once. yeah, I'll just pretend I didn't hear any of that. I'm great. how are you doing?

Shimin (00:57)
⁓ I was laughing out loud to myself before before this recording 'cause it's it's so out of left pocket. yeah, doing great out here in the PNW.

Dan (01:01)
He came up with that one. Okay.

Shimin (01:08)
All right. Well, what are we gonna talk about this week?

Dan (01:11)
⁓ well, apparently venereal diseases and machine learning. But ⁓ yeah, so first up we've got the news treadmill where there's a couple articles. I won't I won't spill the beans there, but there's been it's been an interesting week in terms of AI news.

Shimin (01:27)
Mm-hmm. Next up we're gonna go into the tool shed where we will look for some undocumented features of Claude Code.

Dan (01:35)
Yep. And then ⁓ we're actually gonna be having technique corner this week. So go through an engineer's guide to better AI skills. So implementing a testing process to optimize agent performance, which is pretty cool.

Shimin (01:47)
Right.

Next, as always, we have post processing, where we're gonna talk about two articles titled "Is This Sustainable?" and "The Dead Economy Theory."

Dan (01:56)
And then we're last but not least, we're gonna be having a pretty interesting two minutes to midnight too. ⁓ 'cause there's been quite a bit of news there as well this week. So also not gonna spoil it, but stay tuned for a pretty good segment

Shimin (02:09)
Alright, so moving on to our first item, we have the news

Dan (02:14)
Yeah, that

Shimin (02:15)
That Anthropic released Opus four eight last week.

Dan (02:18)
Yes. it's been pretty interesting. So I guess like I'll start with ⁓ well there is one feature we should touch on first and then ⁓ kind of cover the vibes a little bit. But ⁓ so they also announced this new like dynamic workflow tool. And ⁓ if you're a longtime listener, I would TLDR it as it's Gas Town made by Anthropic. But if you're not a long time listener, essentially it's a

the additional system they've layered in over the top that I it's a combination of like some extra extra high thinking mode it seems like and then ⁓ the ability to spawn like a a truly massive amount of parallel agents and coordin handle coordination for all of them was my read on it. So unfortunately ⁓ I have not had access to that to test that yet. but I have been using four point eight pretty much since it came out.

because I think I count myself in the number of folks that ⁓ was not a huge fan of four seven. And/or I kept forgetting to update my environment to use it. So there's that. but I gotta say, like I've like vibes for me, I've had pretty mixed results with four eight too. Like I definitely like it a lot better than four seven. Like it has some of the the flair that four six had that I think people sort of claim that they're missing.

Shimin (03:12)
Mm-hmm.

Dan (03:32)
But I've had it hallucinate, like legitimately hallucinate, including like file names that don't exist and stuff like that twice. And it's like, wow, I haven't seen that level of you know, kind of LLMisms in quite some time, like a year probably. so that was kind of a little bit surprising.

Shimin (03:32)
Mm-hmm.

Dan (03:52)
I also noticed

that it it's a bit more aggressive than previous models in terms of just like but you know spinning up I guess maybe not spinning up subagents really but like because I guess I'm still telling it to you that but like suffice to say I burned through my entire like personal accounts token budget in about twenty five minutes on a like medium sized project with it. So

Shimin (04:14)
Well.

Mm-hmm.

Dan (04:17)
I don't know if that's just, you know, new model who this kind of like the tokens are more expensive sort of thing or what, but like, yeah. So ⁓ and that was I think with the I think Anthropic gifted everyone some extra limits or something for like the first week. So I was like, wow, okay. Maybe I do need to switch to a max plan if I'm gonna do stuff like this. but overall I definitely like like it better than four seven. So in terms of vibe check, that was that was mine.

Shimin (04:30)
Yeah.

Yeah.

good to hear that you like it better. I've been using four eight. I've not noticed really any difference. but I'm also fairly happy with four seven. I think ⁓ the vibe on the internet seems to be that four eight is a little more aggressive, it's a little more sure of itself, and four seven would be like second guessing itself all the time. maybe that's if you

Dan (05:02)
Mm-hmm.

Shimin (05:06)
you know, read the thought traces or something, but I haven't noticed a significant difference.

Dan (05:10)
Which is

funny because they like Anthropic themselves quote, like their pull quote is it's ⁓ more likely to flag uncertainties about its work and less likely to make unsupported claims. So you would think that would kind of be the the opposite, but yeah. So I don't know, we'll be yeah, it's true. Well, the other thing that's interesting to note about this is it has only been 41 days since four seven was released.

Shimin (05:25)
Benchmarks are hard. And these models are good. So ⁓

Dan (05:36)
So this is an incredibly fast follow from them. Which kind of makes me wonder if like maybe they're both sort of distills of Mythos or something where they're able to do it that fast. but

Shimin (05:46)
Yeah,

that's my guess as well. Like clearly Mythos is still out there and it doesn't seem like they're training a new foundation model for four eight. So then are they just doing additional you know, RL tweaks?

Dan (05:57)
Yeah, like post

yeah, or post training or something, yeah.

Shimin (06:00)
Which makes sense.

And about the workflow. I'm very excited to try out the workflow. It you took my notes right out of my mouth, which is it reminds me of Gas Town. Absolutely.

Dan (06:08)
Yeah. I mean it's

like we've always said, Anthropic keeps just taking everything that the rest of the internet does and like gobbles it right into Claude Code, you know? So

Shimin (06:19)
Absolutely.

I I have wondered and I also haven't the chance to use it myself, but I do wonder how it handles the merging. the blog post itself was kinda hand wavy about, yeah, and then all the agent work just gets merged in. But like that's actually quite complicated and easily suffers from drift. So interesting to find out you know, if they're doing something ⁓ special with that. The other thing that

I I've been thinking about, you know, is this purely a you know, let's experiment with Claude Code and see what's possible? Is it just like we're gonna copy it from Gas Town? Like what is the actual use case? And I can, you know, think of at least two use cases here. One, they they're rewriting Bun to Rust, which is just this large scale project that is if ⁓ at least right.

Dan (06:58)
Mm-hmm.

Which has actually been a little bit controversial. Like the sort of LLM

hater folks are like abandoning Bun over it, which is pretty interesting.

Shimin (07:10)
Yeah.

Now that's made it to main. ⁓ and the other issue was

the large scale security vulnerability testing that they've been doing with Mythos probably also requires something like this. So they did so this should be while it is a research preview kind of, it it should be somewhat battle tested or battle initiated. It's it's not green completely, at least.

Dan (07:34)
Yeah. Yeah, that was one of the other things they referenced in the announcement for it was like it's also great for like bug bashing your entire code base in like one go. So that's

Shimin (07:45)
And the other one that they also they called out was to ⁓ check the critical work twice. and you know I think

I'm gonna call it. This is gonna be the standard going forward. I think every time you vibe code, you're just gonna use compute to save yourself ⁓ the bottleneck of code review, right? Like you instead of having a single agent write a code, you're gonna probably use a dynamic workflow to attack the code it just generated in like four different perspectives, and then you do the final human in the loop verification. that's coming.

Dan (08:15)
Maybe.

Shimin (08:15)
Trading compute for human labor.

Dan (08:16)
We'll we'll see.

Shimin (08:19)
We'll see. Okay.

Dan (08:18)
We'll get to that in in two minutes, I think. Not literally two minutes, but 'cause there's been some interesting news that on that front too this week. So ⁓ worth worth chatting.

Shimin (08:27)
Maybe.

Alright, so we had a New Yorker article last week. Dan, you said that clearly AI has gone mainstream when the New Yorker is ⁓ writing about it. And this week

We have the ⁓ encyclical letter from his holiness Pope Leo XIV himself. I think we I think AI has finally made it to mainstream.

Dan (08:49)
Did wh when you started

when you started this podcast, Shimin did you ever think we'd be talking about the Pope?

Shimin (08:55)
I I think I could have envisioned it, but maybe like a year or two in the future. Not so soon, so quickly. Yeah.

Dan (09:01)
Well you're a better futurist

than I am because I never in million years.

Shimin (09:05)
So, the encyclical. By the way, have you heard of this encyclical before? Cause I have not. I'm not Catholic. ⁓

Dan (09:11)
Not until this. I am well,

I was raised Catholic, but believe it or not, I ⁓ I had not heard of them so till everyone else did at the same time.

Shimin (09:18)
If not. Yeah.

So a encyclical letter is basically like a pamphlet from the Pope that he hands out to a circle of ⁓ bishops and archdiocese, etcetera, kind of is the Pope's position on important social, moral and religious topics. So past encyclicals include, you know, ones on

The environment, ⁓ birth control, industrialization and the relationship between capital and labor. Huh. and our ⁓ latest encyclical letter is titled Magnificate Humanitas. That's my Latin clearly. ⁓ haven't made it to the

Dan (09:56)
Manitas. What is your Latin?

Ha ha ha.

Shimin (10:02)
Encyclical letter portion. Which means the magnificent humanities or the magnificence of humanity. it's titled On Safeguarding the Human Person in the Time of Artificial Intelligence. Now I've only read about half of this letter. It's very long. but the overall vibe I got from it is that our Pope is potentially

has a better knowledge of what AI is and its ramifications than like a lot of Fortune five hundred CEOs, probably.

Dan (10:31)
It's just interesting.

Shimin (10:31)
Like

He did his research.

Like he talked about how AI models are more ⁓ grown than they are developed, which is like a a quite sophisticated and mature way of of understanding how how these models were created. And ⁓ some of my favorite things from this letter include ⁓ here are some pull quotes.

He warned about AI systems presenting themselves as neutral and objective and then end up reflecting and reinforcing stereotypes or ideological biases in their designers and developers. he's thinking about the moral hazards of pretending that AI systems are neutral when they are obviously not. And he's also thinking about ways to protect jobs.

He calls out that every introduction of automation AI should be accompanied by verifiable measures to protect the employment, retraining and participation of workers. In this way, technology will be oriented towards freeing up human time and capabilities rather than producing exclusion. And that we should have proactive policies that make continuous training and professional transitions accessible to all.

Ensuring that the cost of adaptation does not s fall solely on individuals. I these are these are good takes. ⁓ takes that, you know, some political leaders and CEOs are currently not doing right now.

Dan (11:54)
Yeah.

Shimin (11:55)
Yeah, d did you get a chance to take a look, Dan?

Dan (11:58)
I did not read it, no. I I read a couple sort of meta news articles about but I didn't read the actual source. But even just from reading those, I was pretty impressed by the overall take. Like it's pretty aligned with I f what I feel like my own values are this week. Who knows what that'll be next week, but ⁓

Shimin (12:17)
Yeah, maybe next week you think ⁓ AI should just automate the  humans away. ⁓

Dan (12:22)
Yeah.

Shimin (12:22)
The Pope Leo the Fourteenth picked his name, partially because Pope Leo the Thirteenth had a very important encyclical letter about capitalism and and industrialization and the relationship between capital and labor. so I think he almost saw this coming when he became Pope. Like he's been thinking about AI probably for a long time. And he presented this letter on, I believe, like May fifth.

w and in attendance of the presentation was Chris Olah one of the co-founders of Anthropic. So I just wanna say I think Anthropic's PR machine is so good. They've got the Pope on their side. I d this this is incredible stuff. This is like this is like LeBron the Flu game level stuff. lastly and this is

Now my personal insight. I d I'd I'd heard this reference a couple of times on the internet. In this letter was a reference to quote the twentieth century Catholic author J. R. R. Tolkien. And he quotes one of his protagonists from the novels describing our responsibility in this way. It is not our part to master all the tides of the world, but to do what is in us for the

succour of those years wherein we are set, uprooting the evil in the fields that we know, so that those who live after may have clean earth to till. Again, I can't believe I'm saying this, but like the Pope is good at this like ethics. Yeah. Cool Pope. We not only is he American, not only is he a fan of the Knicks, apparently. ⁓ I don't know if you heard this, but he apparently blessed the New York Knicks in the NBA finals.

Dan (13:48)
Leave leave your code base better than you found it?

And he's from from Chicago

too, so my my old hometown.

Shimin (14:03)
White Sox fan, he knows.

I'm here for more Cool Pope and his AI takes.

Dan (14:07)
Maybe invite him on the podcast, see what happens.

Shimin (14:09)
I'll reach out to his people. His people get my people will reach out to his people for sure.

Dan (14:14)
I just gotta say I have a better chance of my people getting in touch with them than your people 'cause I at least have cousins that live in Italy, so you know.

Shimin (14:17)
Ha ha ha.

that is true. okay. Next up we've got Claude Code source code.

Dan (14:26)
Yeah. So this is a a blog post by Building Better, or on Building Better. And it's kinda in the vein of one of the earlier ones we did where it was like I read all of Claude Code's code, so you don't have to, but this is a bit more practical focused, I should say. So it's like things you can actually do based on understanding that and reading it. So ⁓ some of the stuff that I found really cool.

Shimin (14:46)
Mm-hmm.

Dan (14:50)
⁓ I didn't have I'll be honest with you, like I haven't played around with hooks all that much myself. So some of this was like I didn't know you could do X to begin with, but I really didn't know you could do X plus Y plus C. So this might be more beneficial to folks that have used hooks extensively. But I didn't realize at all that ⁓ pre the pre tool use hook has some undocumented features where it can return updated input.

Shimin (15:01)
Right.

Dan (15:14)
Which is actually a rewrite of the tool's input before it executes. So you can like modify the command that it's going to call mid-flight. you can return permission decision. So you could actually just return like allow or deny by default without f having it prompt the user for it. you can also return permission decision reason, which is actually what gets shown in the UI. And then you can also add a

A field called additional context, which injects like additional context into the thing. So that's pretty neat. ⁓ like I should really play around with these more because I'm sure you could do some really fancy stuff if you knew it. there's a bunch more. I won't go through like every single one for just hooks in there, but it's definitely worth checking out the post. We'll we'll have it in the the show notes. the other

Like one of the the usages of it that I found pretty interesting was they had like a a like a simple example, but when you think about it, it's pretty powerful. So they had it essentially force using a pre-tool use hook, anytime Claude runs git to always dry run its pushes. So it always like rewrites the command to use dry run, which I thought was kind of cool.

Shimin (16:16)
Mm-hmm.

Dan (16:21)
So it gives you, you know, you can like basically see the the the failure of it. ⁓ or you could use that in like a scenario where like you're running it fully unfettered and you actually need like, you know, something to intercept and like check what it's doing. which is kind of cool. And then the the other thing is it's some minor stuff too that I really appreciate was like they're like

There are so many different places that you can have settings JSON in there. So they did a pretty great job of explaining like what what goes where and why.

What else?

⁓ yeah. The other the other big one I took away from it was there's a whole bunch of undocumented front matter fields for skills too. So I've definitely played around with skills. and it's pretty neat. Like you can I think I knew that you could specify the model, but I don't think that's actually documented. but

Shimin (16:57)
Mm-hmm. Yep.

Dan (17:06)
Hmm.

Shimin (17:06)
I I did not know I did not know that you can specify model and effort for your skills. I feel like this is a pretty large yeah, this is a game changer.

Dan (17:13)
It definitely didn't know effort, yeah.

and maybe it was just agents that I was thinking of that you could

run on like Sonnet or something like that that's cheaper. yeah. But I thought that was pretty cool. any other big takeaways that you had from there? Well I'm

Shimin (17:27)
for me, the two that I kinda already knew, but I've never actually had a chance to play around in the settings JSON, the auto memory enabled, which I knew, and the auto dream dream enabled, which was that experimental feature that ⁓ I thought you can only well I guess if you toggle on via the C L I you can also set set it up with J settings.json. Like if I was using memory, I'll be using it all the time. I I'm really interested to find out like

Dan (17:38)
Mm-hmm.

Shimin (17:52)
to to take a look at what dream what happens when an AI dreams. Does an AI dream in electric sheep

Dan (17:59)
Nice. Sounds like someone's future middle name.

Shimin (17:59)
Of electric sheep.

Dan (18:01)
See a Philip K. Dick. yeah, the other thing I I've always been kind of skeptical of the built-in memory, I'm not gonna lie, because like it's basically just JSON files that it's writing. And I'm like, how can that be efficient, right? Like, why do we spend all this time doing RAG pipelines and stuff if like it's just as good to spit JSON into there? So I'm always like curious to, you know, whether or not there'd be a better RAG pipeline. So I've actually been

I wrote that vector thing for SQLite, but I just found out, I guess I'm a little late to the party about LanceDB, which is like an in-process like columnar database that you can do vector embeddings in too. so I might play around with some of that stuff soon too, because it's like

Shimin (18:40)
Yeah, JSON is

definitely not the best way to store memory or user preferences, right? Like in fact, I was talking to

Dan (18:47)
Yeah. 'Cause you feels like you would want

semantic search on that, right? Like like yeah.

Shimin (18:52)
Absolutely. It's the like

that's why you should have done that from day one. it's not built in, but I don't know if you heard about the agent harness called Hermes. It is apparently all the rage in San Francisco and ⁓ it has taken over ⁓ the enthusiasm for OpenClaw. I was talking to an event organizer on Friday who flies a lot back and forth between ⁓ Seattle and SF and he

Dan (19:01)
Mm-hmm.

Shimin (19:18)
He talks about how SF is all about Hermes these days. So I dug a little deeper into what makes Hermes better than OpenClaw, because OpenClaw is actually very flexible already. The main difference is that Hermes has essentially a RAG or vector database for user preferences built in. Something like a dialectic API is the big differentiator. So

Dan (19:28)
Mm-hmm.

Shimin (19:44)
you can take your memories and user preferences with you. If you can swap the model, you can swap the memory. It is very flexible.

Dan (19:49)
Cool. Yeah.

Yeah.

That's one of the things I always appreciated about Pi Agent, which is what OpenClaw is built on top of: you could literally swap the model midstream, which is kind of wild. I mean, you pay the cache miss on it, since it has to basically take your whole conversation and dump it to whatever the other thing is. So you're gonna eat all those tokens up front. But sometimes I feel like that's worth it if you're really hitting a wall with one model or something. It's like...

Shimin (19:54)
That's a little sidebar.

Right.

The service, yeah.

Dan (20:21)
Try another.

Shimin (20:21)
I haven't done this, but it would be really interesting to take a really long, in-depth conversation and use that as the baseline for some sort of Pi Agent-based model testing: just spin it out to four different models and see which ones respond the way you like the most. Almost like an eval, but a one-off one, like if you're making a big decision on whether or not you should buy a house or marry a person, like a particular someone.

Dan (20:36)
Like an eval thing?

You shouldn't be using LLMs for that

anyway.

Shimin (20:50)
That's the future. That's gonna be the future. And if it has the personal preferences and personality profile already stored in an API, it'd be fascinating to see what the response is. Like, should I marry Tommy? Well, you know, Opus has...

Dan (21:02)
Yeah, but look, I'm not a Luddite. I use this stuff every day, but please don't use LLMs to decide if you should marry somebody or not. Call your buddy. Talk to a human first. I mean, you could certainly ask an LLM for an additional input point, but, man.

Shimin (21:19)
Well...

Yeah.

Yeah, yeah, use it as an additional input point. I mean, this is the techno-inevitability part in me speaking, but I feel like... how does that line go? Like, magnificent. Well, I'm clearly not a real Catholic.

Dan (21:34)
Did the Pope teach you nothing about the you know implicit biases and

Shimin (21:42)
Well, I was gonna say, what was that line that was really popular for a while? The dumbest person you know just had ChatGPT tell him or her, "You're absolutely correct, and that's a brilliant idea." I feel like this is coming. Why marry one person when you can marry four people? Okay. On that note.

Dan (21:56)
Perfect. Let's ship that to production. Yeah.

All named Shimin

Weirdly. You had to really search for that.

Shimin (22:10)
Hmm.

I'll have a conversation with somebody about that. On to Technique Corner. We have an article from the Pinterest Engineering blog this week titled "An Engineer's Guide to Better AI Skills: Implementing a Testing Process to Optimize Agent Performance in Any Repository or Skill." It is a mouthful. The short version is they created an AI skill harness.

So the initial problem they had was that they have a couple of custom skills that were not invoked often enough, and they were trying to figure out, hey, how do we get the AI agent to invoke those skills more frequently at the appropriate moments? And this is something that we don't really talk about, right? We spend a lot of time talking about "there's a skill for that," or how you can build your own skill via the Pi agent, but we don't spend a lot of time talking about how you know a skill is good. Yeah. So, right.

Dan (23:04)
Triggers. Yeah. Or when it's gonna run, too. 'Cause I've definitely run into that with my own stuff. Unless I explicitly tell it "use this," it's off doing its own thing. Yeah.

Shimin (23:14)
Right. Half the time it doesn't get invoked. Yeah.

Or it gets invoked at an inappropriate time. If I have one thing to complain about when it comes to the Superpowers brainstorming skill, it's that it gets invoked all the freaking time, even sometimes when I just ask a rather simple question. So what they did was create fifteen prompts as a set of positive cases where they expect the agent to use the skill,

and then five general programming prompts to serve as negative cases where they do not expect the skill to be invoked. And then they created, and basically you could vibe code this, a bash script to run all 20 cases five times each, and I think they had a pretty basic log parser to parse the output and see if the correct skill was invoked.
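A minimal sketch of the harness described above, written in Python rather than bash. Everything here is an assumption for illustration: `run_agent` is a hypothetical stand-in for however you invoke your coding agent and capture its log, and `skill_marker` is whatever string your log parser treats as evidence that the skill fired. This is not the script from the Pinterest post.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    prompt: str
    expect_skill: bool  # True for positive cases, False for negative cases


def build_cases(positive: List[str], negative: List[str]) -> List[Case]:
    """Label prompts: positives should trigger the skill, negatives should not."""
    return [Case(p, True) for p in positive] + [Case(p, False) for p in negative]


def run_harness(
    cases: List[Case],
    run_agent: Callable[[str], str],  # hypothetical: prompt in, raw agent log out
    skill_marker: str,                # hypothetical: text that signals the skill was invoked
    repeats: int = 5,
) -> Dict[str, float]:
    """Run every case `repeats` times and score whether invocation matched expectation."""
    correct = 0
    runs = 0
    for case in cases:
        for _ in range(repeats):
            log = run_agent(case.prompt)
            invoked = skill_marker in log  # the "pretty basic log parser"
            if invoked == case.expect_skill:
                correct += 1
            runs += 1
    return {"runs": runs, "accuracy": correct / runs if runs else 0.0}
```

With fifteen positive and five negative prompts at five repeats each, this produces 100 scored runs per configuration, which is the denominator behind each percentage in a results grid like the one discussed next.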

They then tested a few different techniques to make sure their skills get invoked more frequently, including the front matter description, so forcing it to use the skill more and giving it a little more context; more aggressive language, like "you must load the skill if yes, highest priority"; and updates to AGENTS.md.

Dan (24:20)
Or else.

Shimin (24:26)
So give it, again, better context. And then lastly, a combination of all the techniques above. And they had a grid for their output. So before this round of improvement, Codex was calling the skill 73% of the time and Claude was calling it 62% of the time.

And then they tested the various improvement hypotheses. Codex got to 95%, so a 22-point improvement, when they used a combination of all the techniques, everything from modifying the skill description to aggressive language. For Claude,

and this is interesting, I didn't mention this in the write-up because it doesn't fit the narrative, Claude actually did best, going from 62 to 73%, when they simply modified the skill description or simply modified AGENTS.md. The combination of all the techniques actually dropped it from 73 to 69 percent. And the other

result that was interesting is they also ran an additional round of AI improvements, so asking AI to make the skill better. And that did not help. In the case of Codex, accuracy went from 95 to 93. In the case of Claude, it went from 69 to 66. Now, are these actually statistically significant? I'm not sure. I didn't run a Student's t-test or whatnot. But it's a reasonable

Dan (25:37)
Mm-hmm.

Shimin (25:54)
approximation.
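One rough way to check the significance question raised here is a two-proportion z-test, sketched below with only the standard library. The sample size is an assumption: 20 cases run five times each suggests about 100 runs per configuration, but the runs repeat the same prompts, so they are not truly independent and the resulting p-values are only a loose guide.

```python
import math
from typing import Tuple


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed n = 100 runs per configuration (20 cases x 5 repeats).
N = 100
codex_ai_round = two_proportion_z(0.95, 0.93, N, N)   # combined vs. extra AI round
codex_baseline = two_proportion_z(0.73, 0.95, N, N)   # baseline vs. combined
claude_ai_round = two_proportion_z(0.69, 0.66, N, N)  # combined vs. extra AI round
```

Under those assumptions, the two- and three-point drops after the extra AI round are well within noise, while the Codex jump from 73% to 95% is not.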

Dan (25:55)
I also thought it was funny that Claude responded to aggressive language, 'cause I could totally see Claude not responding to that one, and that dragging the all-combined number down a little bit. But no, it says 72% accuracy on aggressive language alone.

Shimin (26:09)
Yeah, maybe for Opus 4.8, 'cause it's more assertive, but maybe not for 4.7. So this is actually one of the problems I had with their experiment: they only tested two harnesses and a single model each. I would love to see what the results are with Opus 4.5 compared to Sonnet 4.5, for example. In what cases, and what

Dan (26:23)
Mm-hmm.

Shimin (26:33)
Can you generalize about the power of the model versus how good the skill is?

Dan (26:38)
Yeah.

And I guess maybe you'd call this harness engineering too, but you could also potentially add instrumentation hooks that report the prompt and whether or not the skill got called. Then you're starting to test on real-world data instead of just what they think are good prompts that they're running in there. So it'd be pretty interesting to see.
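A small sketch of what that instrumentation could look like. The function names, the JSONL log format, and the idea that some hook calls `record_invocation` once per prompt are all hypothetical; how you wire it into a particular agent harness depends on that harness and is not shown here.

```python
import json
from pathlib import Path
from typing import Optional


def record_invocation(log_path: str, prompt: str, skill: str, invoked: bool) -> None:
    """Append one real-world observation: the prompt and whether the skill fired."""
    entry = {"prompt": prompt, "skill": skill, "invoked": bool(invoked)}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def invocation_rate(log_path: str, skill: str) -> Optional[float]:
    """Fraction of logged prompts on which `skill` was invoked, or None if no data."""
    lines = Path(log_path).read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    relevant = [e for e in entries if e["skill"] == skill]
    if not relevant:
        return None
    return sum(e["invoked"] for e in relevant) / len(relevant)
```

The logged prompts still need a human or a second pass to label whether the skill should have fired, but they give you real traffic to build the next round of positive and negative cases from.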

Shimin (26:47)
Mm-hmm.

Yeah. Yeah, absolutely. The other thing that I think is a great area for improvement is that once you have a harness set up, you're basically in self-improvement land. You can easily ask the AI itself to iteratively improve your skills until it hits some very high but

Dan (27:14)
Mm-hmm.

Shimin (27:17)
not perfect threshold. Also, they didn't break the results down between false positives and the overall accuracy. So I don't know how the negative cases did, the ones where it's not supposed to use the skill. If we can isolate that, I think that would be interesting, 'cause I could see aggressive language causing the agent to lean towards calling the skill regardless of whether or not it's needed.
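To make the concern concrete, here is a sketch that splits a single accuracy number into recall on the positive cases and a false-positive rate on the negative ones. The `(expected, invoked)` input format is an assumption for illustration, not something from the original write-up.

```python
from typing import Dict, Iterable, Optional, Tuple


def breakdown(results: Iterable[Tuple[bool, bool]]) -> Dict[str, Optional[float]]:
    """results: one (expected, invoked) pair of booleans per run."""
    tp = fp = tn = fn = 0
    for expected, invoked in results:
        if expected and invoked:
            tp += 1  # skill should fire and did
        elif expected and not invoked:
            fn += 1  # skill should fire but did not
        elif not expected and invoked:
            fp += 1  # skill fired on a negative case
        else:
            tn += 1  # skill correctly stayed quiet
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else None,
    }


# An agent that ALWAYS invokes the skill, on a 15-positive / 5-negative split.
always_invoke = [(True, True)] * 15 + [(False, True)] * 5
print(breakdown(always_invoke))
```

With a 15-to-5 split, an agent that fires the skill on every prompt still scores 75% accuracy while having a 100% false-positive rate, which is why the headline number alone cannot tell you whether aggressive language is over-triggering.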

Dan (27:31)
Hmm. Yep.

Mm-hmm. Yeah, like it's almost too strong.

Shimin (27:45)
Yeah. So overall, I really like this as some actual data points on how you improve your workflow in a repeatable, confident way, as opposed to just, like, "let's wing it, and now this feels better."

Dan (27:59)
This feels better. Yeah.

I know. I was thinking about that earlier today, 'cause I knew you were gonna ask me about how much I'd used 4.8 and stuff. So I was thinking about it and I was like, do I have any

real data points to share, right? Or is it all just my vibes? And that's kinda why I went with my vibes. But I'm also sitting there thinking, these are inherently non-deterministic systems, so maybe vibes are actually okay, you know?

It is funny to see people's opinions and how drastically they can differ, too. Like...

Shimin (28:26)
Yeah, they're okay. Yeah.

Yeah, vibes, 'cause every single one of our unique workflows is basically a unique benchmark, with our vibes as the percentage. So it's interesting to see how different people have different variations of those benchmarks.

Dan (28:39)
Mm-hmm.

Yeah, reactions to the same thing. They're potentially even using the same harness, ostensibly, right? But then they get very different output.

Shimin (28:51)
Right. Yeah.

I really like this post. I think we should get more of these kinds of nitty-gritty, "these are the experiments I ran with my particular coding workflow" posts. I'll try to include more of them on this show. Okay.

Dan (29:10)
Or do more of them. You've done some pretty cool stuff with your experiments. I'll have to...

Shimin (29:15)
Yes, I will log roll myself a little bit.

Moving on to Post-Processing. I've got something from you this week first.

Dan (29:20)
Yeah. So this is Jamie Hurst, with a piece titled "Is This Sustainable?" Immediately what you'd think from reading that title is that this is gonna be about AI itself, but it's actually not. To me, this whole thing was a pretty fascinating take on what it means to be a senior

engineer in the age of AI. And there are really a couple of salient points that stood out to me. The first was something that I really relate to, which is that seniors have been bearing the brunt of AI-assisted coding for a lot longer than juniors have, potentially several years longer. And

the reason for that is essentially that the stakes keep going up as LLMs get rolled out and stuff like that. Previously, the way a senior might have worked is, depending on your level, you're running one or a couple of teams. When I say running, I mean being sort of tech-lead-ish for one squad or maybe two squads of engineers.

You're driving the technical direction for their work. And now we're seeing this change where it might be just that one guy or gal, plus an LLM, who's driving that same level of work. And there are some interesting ramifications of that too, which is that, you know...

Shimin (30:34)
Yeah. Yeah.

Dan (30:46)
Previously you had to translate that to humans, and now you're translating it into prompts. But the other change that he talked about, which I also thought was pretty relatable, is that we're kind of cutting out a whole step, almost like an internal sales step, in software development.

At least at medium to large size companies, you'd write your proposal doc, whatever you call it, RFC, design document, whatever, and shop that around, right? And sometimes, if it was a big enough idea and you were shopping it to high enough level people, you also had to make a digestible version of that, be it slides or whatever. And now you don't do any of that. You just build the damn thing

and go, "What do you think about this?" You don't roll it out to production, you just build it and let people actually play with it, potentially even up to that level.

Shimin (31:32)
Yeah.

Dan (31:40)
And that's kind of changing the entire equation of how things get prioritized and built. I think the actual process doesn't necessarily change as drastically, but when code is essentially free, you can prototype this stuff really, really cheaply. So I thought that was kind of fascinating. I hadn't really thought about the

impact on the SDLC, I guess, which is, sorry, the software development lifecycle, in that way.

Shimin (32:08)
Not just that. I mean, code is not free just yet. If code was truly free, you wouldn't have something like Microsoft pulling folks off of their Claude Code licenses and getting them back on Copilot, right? It's much cheaper, but it's not free free. Yeah. I found this part where he talks about how

Dan (32:11)
Right. Well, it is, but maintenance isn't free.

Shimin (32:31)
when you spend so much time doing this high-focus work of building these proofs of concept and then have more communication with the stakeholders, one of the first things that got dropped for him was the one-on-ones. It almost seems like we're going too far: we can cut back on the communication overhead, so now let's cut it to the bone and just focus

Dan (32:46)
Mm-hmm.

Shimin (32:57)
exclusively on shipping, shipping new features.

Dan (33:00)
Yeah. And then the other little note that he tucks in at the end is something that I do think we've all been feeling a little bit too, which is that with all this acceleration that's happening, there's almost this wholly unrealistic expectation of, okay, now you've got an LLM, you can do massively more than you could before. And it's like, well, that's

true to a certain degree and true within certain boundaries, but outside of those boundaries, scope is just being piled on. And it can feel overwhelming really quickly.

Shimin (33:34)
Yeah, one thing I've been thinking about is you hear stories like the Bun rewrite to Rust, and that's something that is really only feasible when code is more or less free, and in the case of Bun, when the tokens are also free because you work for Anthropic. But when you're not working on projects that have such a tight scope, that have such a

well-documented existing API and unit tests to run, yeah, the coding part, the hands-on-keyboard part, may have become much cheaper, but the figuring-out-what-you-need-to-build part is still really expensive. And now that's all you do, which is, A, burnout, which we talked about last week, and B, a fundamentally different shape of development

Dan (34:07)
Yeah, that's always there. Yeah, exactly.

Shimin (34:19)
than it was. And I think we're starting to see what the second-order effect of code being nearly free looks like. The one thing that really resonated with me is when Jamie talked about how he feels like his AI depth is perishable, that all the stuff we're learning may not be relevant at all in 18 months. That's definitely something I think about in the back of my mind

Dan (34:38)
Mm-hmm.

Shimin (34:42)
all the time. So what are the things that are not perishable? We should keep a running list of that on the show, 'cause those may be the things that remain in two years.

Dan (34:51)
I mean, you hear people say it all the time, right? It's, like, taste.

Shimin (34:54)
Judgment.

Dan (34:54)
Yeah. Right. And someone at my work had a pretty poignant quote on that, which was essentially, "You were always hired for those things. So don't let this stuff scare you, because that was always the value that you brought to the table, whether or not you knew it." You know, it's kinda like...

Shimin (34:56)
Mm. Yeah.

Right.

Yeah.

It's true, I think, for a lot of developers, but I'm not a hundred percent sure it is true for those sweatshops, those body shops that are just out there to churn out code. Yeah.

Okay, our next article this week for Post-Processing is titled "The Dead Economy Theory" by Owen McGrann. And this is where Dan's middle name comes from. So it's a fairly long article, but the key points are

Dan (35:40)
Ha ha

Shimin (35:45)
a pushback against the AI inevitability crowd. I am, if nothing else, a little bit infected by the AI inevitability virus.

Owen makes the case that, okay, let's assume that these trillion-dollar investments do pay off, and we do indeed get a significant amount of work replacement. Then what happens? Turn one is when a company licenses AI to replace a significant portion of its workforce. Great. Stock price goes up, margins explode, everybody is happy,

such as when Block laid off half of its workforce citing AI agents. Then turn two: the replaced workers stop earning income, they cut spending, the businesses they used to patronize see revenue decline, and those businesses are in turn forced to use AI to cut costs, which compounds things. Turn three: the company that fired workers to save money discovers that its customers were, in aggregate, other companies' workers.

Dan (36:21)
Mm-hmm.

Shimin (36:44)
Revenue growth stalls, and the AI subscription that was supposed to be an investment in efficiency turns out to be a contribution to the destruction of its own market. This makes sense if you believe that companies are only going to use AI to cut costs and replace their employees. I'm not convinced that's not what

the CEOs and the tech Illuminati are trying to do, despite their warnings. 'Cause if your warning is just "society must adapt, but I'm not gonna do anything about it," you're essentially rooting for this version of the future. And of course, Peter Thiel gets mentioned. Silicon Valley's incorrect read on Nietzsche's philosophy gets mentioned, where...

And this is part of the mm. Okay. So so the

Dan (37:28)
You have to dive into that one a little bit for me.

That's right, I asked

you to dive into philosophy. I wow.

Shimin (37:36)
There we go. ⁓

Nietzsche had this idea that there is this thing called an Übermensch, a superhuman. Very Ayn Randian: there are those whose decisions matter more than everybody else's. And it's really supposed to be a concept about a person's relationship with his or her choices in the world,

and how we have more agency, and the ones who can truly take agency are perhaps better in his philosophical framing.

Peter Thiel wrote in two thousand and nine that he no longer believed freedom and democracy were compatible. This was fifteen years ago. And he's been, for better or worse, working towards

Dan (38:21)
Yeah. Creating that.

Shimin (38:22)
Yeah, creating a world where that is true. And so this all goes back to the idea that these Silicon Valley elites think that they are above the rule of law, above freedom, above democracy, because they are the true Übermenschen. And of course that's not actually true, right? As the Pope just told us, like, twenty minutes ago, human dignity is still magnificent. And

so that goes back to our horse hugging and syphilis. Full circle moment. So Nietzsche lost his mind fairly early on in his career. And supposedly the last thing he did before he lost his mind was, he was walking down the street and saw a workhorse that had been beaten by its owner and had lashes on it. He hugged the horse and cried, and then he fell down and went crazy.

Later on the doctors found out that he had syphilis. So the syphilis may have caused him to go crazy and hug the horse.

Dan (39:18)
Uh-huh.

Shimin (39:18)
But I think AI is what's doing it for a lot of these tech elites these days. They're all hugging horses with the help of AI.

Dan (39:26)
Well, the part that gets me about this, if we're gonna pontificate, ha ha ha, on it for a moment, is that you hear a lot about universal basic income, right? Especially in this context of the dead economy theory. But the part that

Shimin (39:30)
Mm-hmm.

Dan (39:42)
makes me... and I think UBI is interesting, but I was interested in it before AI was a thing, right? I've read the Iain Banks books. I think space communism is pretty cool, but, I don't know, that's not what this vision is, right? That a lot of this stuff is shooting for. And when they talk about UBI, they're

really looking at it through the lens of how Henry Ford had to give raises to his workers, because his workers made up a large enough swath of the economy and they literally couldn't buy the cars they were producing without the raises. So it was almost like a self-inflating consumerist loop, you know. And that's what I would see UBI doing in that. But UBI is also state funded, right? And

Shimin (40:25)
Yeah, absolutely.

Dan (40:31)
It's like, okay, well who's gonna fund the state if the economy is in fact dead? So I don't know.

Shimin (40:38)
I mean, I don't necessarily think he's saying that the economy will be dead. It's just that the state would take money from the wealthy and redistribute it to everybody else. Wealth transfer is one of the main jobs of a state, in my opinion. Or if you wanna put a better name on it, resource reallocation is one of the main jobs of a state.

Dan (40:56)
Found the

Yeah, that's fair. Or incentivizing. Like

I always like the behavioral economics podcasts and the people that study incentivization, right? So it's trying to cause better outcomes for everyone via incentives instead of forced rules. So I've often felt things like tax credits have always been an interesting lever that they've had to pull.

Shimin (41:18)
Yeah. So

Yeah, and this article does mention that like people don't want UBI because people feel like so much of their identities are tied to UBI. Yeah. my counterpoint is look at how trust fund babies live their lives. They seem plenty happy and dignified to me.

Dan (41:30)
It's tied to work. Yep.

Dignify Yeah.

Shimin (41:41)
Some of them are dignified.

Some of them are struggling artists in Bushwick. That seems fairly dignified to me.

Dan (41:46)
All

right.

Shimin (41:49)
Yeah, and it doesn't seem like the Silicon Valley elites are going to change their view of how they operate. So we'll see, but it is quite a takedown, even if I don't agree with it a hundred percent.

Dan (42:03)
And it will be, of course, as we talk about on two minutes all the time, pretty interesting to see what happens one way.

Shimin (42:12)
Yeah, speaking of seeing what happens, let's move on to Two Minutes to Midnight, where we talk about where we are in the AI bubble using the analogy of the Doomsday Clock from the Bulletin of the Atomic Scientists. Still going on that clock. I have the first item for this week, and it is the long-awaited, much-rumored, we-finally-have-it Form S-1

Dan (42:12)
Living in interesting times.

All right.

Shimin (42:38)
of Space Exploration Technologies Corp., also known as SpaceX. It's funny that they call it Space Exploration Technologies, because when you go into the S-1 and search for TAM, or total addressable market, in the market opportunity section, we see that SpaceX

filed that they have a total of $370 billion in its space segment. That's a lot of money. There's another $1.6 trillion in their connectivity business group, which is Starlink and Starlink Mobile. And we see a $26.5 trillion addressable market. That is, like, most of the size of the US GDP.

And what is it? It is AI. I think they should rename themselves, maybe, AI Technologies Exploration, parentheses, Space. But this number is wild, and 22.7 trillion of that, I'm not used to saying trillion this often, is enterprise applications. Of course, this is from their truth-seeking models.

Dan (43:34)
Ha.

Shimin (43:49)
You know, let's do a little quick search. Let's see how many times truth seeking comes up. Thirty nine times.

Dan (43:54)
Thirty nine times in an S one.

Shimin (43:57)
Thirty nine

times, yes. Lest you forget how the xAI models differ from their competitors: it's 'cause they're truth-seeking.

The last thing I just wanna quickly mention on this is, when they came up with that 22.6 trillion number for their enterprise TAM, what they did was basically riff off a study on the total addressable digital economy and just stick that entire number there.

Dan (44:16)
Mm-hmm.

Amazing.

Shimin (44:29)
Whatever you say about Elon, you can't say he's not ambitious.

Dan (44:33)
Yeah. That's I guess a word you could use.

Shimin (44:36)
⁓ That's what I got.

Dan (44:37)
It's especially funny if you remember last week, when we talked about that AI spending comparison website that we'd found. I already forgot the name of it, which is embarrassing. But yeah, that's right. And what a tiny little drop in the bucket xAI was compared to even the other frontier labs, who are also tiny drops in the bucket compared to Nvidia. So I know this isn't on the list, but I feel like I have to mention it anyway.

Shimin (44:47)
Yep. Yep. Is AI profitable yet?

Dan (45:05)
Also supposedly, well, not even supposedly, Anthropic announced it on their own blog: Anthropic has confidentially submitted a draft of its S-1 to the SEC. So no surprise there that they're going public soon, but just worth noting that's happening.

Shimin (45:15)
Yes. Yes.

Yeah. I kinda thought OpenAI would've beaten them to the punch, but I guess not.

Dan (45:27)
Yeah. Especially since they seem to be a little bit more cash starved at the current time. And Anthropic just did a raise right before the S-1 draft too, which is kind of funny. It's like they almost did it just to punch their valuation up, kind of. Yeah. So yeah, actually, we are about to talk about that. Good call. So

Shimin (45:32)
Mm-hmm.

Which is

Well well, which is something that you're gonna talk about.

Dan (45:50)
Yeah, Anthropic has officially surpassed OpenAI to become the world's most valuable AI startup. That is on the back of a $65 billion Series H round, which included Altimeter Capital, Dragoneer, Greenoaks, and Sequoia. So for those of you that are excited about VC names, there you go. Yeah, so

following the deal, Anthropic officially overtook OpenAI in market valuation, which is actually kinda crazy to think about, because I guess mentally I'd already sort of pegged them there anyway because of the enterprise focus, but now it's official. And we will see how that pans out when their S-1 gets made public at some point.

Shimin (46:17)
Mm-hmm.

Yeah. And the money quote here is that their new valuation is nearly three times higher than the company's February valuation. Today is June first. So they've done this in, like, two months. They just created six hundred, seven hundred billion dollars out of thin air. This is absolutely the quickest growth of any company valuation, probably in the history of humanity.

Dan (46:56)
Mm-hmm.

Shimin (46:57)
⁓ what does that say?

Dan (46:57)
Salarando is here, folks.

Yeah.

Shimin (46:59)
Yeah. And I forgot to mention earlier, but the fact that Elon is looking at the entire digital economy as the total addressable market for xAI really dovetails nicely with our previous Post-Processing article, where the goal of the tech elite is to basically replace all human workers and make the economy,

I don't know, monopoly money. Make us all digital peasants. Something like that.

Dan (47:23)
Yeah.

And did that include, like, SaaS and stuff like that too? I guess it must, right? If it's, like, the entire...

Shimin (47:32)
Yeah, it includes everything. We can't find the detail, but yeah. Pretty sure it includes everything.

Dan (47:37)
Good luck replacing Amazon.

Especially when they've effectively just like loaned out their compute to Anthropic, which tells you how much they were using it. Mmm shots fired.

Shimin (47:46)
Right. Yeah, they talk about how they have the world-leading, state-of-the-art AI infrastructure, but they clearly did not mention that they are not using it.

Dan (47:59)
Yeah.

I'm glad Anthropic is, 'cause it was getting kinda dire there with rate limits and stuff for a while. So

Shimin (48:05)
Yeah. Yeah, I'm happy about that too.

Dan (48:08)
Okay, so we've got a bunch of big announcements. I feel like we're right on the cusp of this potentially dramatic tipping point one way or the other, right? 'Cause a bunch of these companies are about to go public, and I think we're gonna start getting real signal about it. The other thing worth discussing is, as you mentioned, Microsoft has very publicly pulled a bunch of Anthropic licenses,

stating cost as the primary driver, and has been pushing their staff to go back to Copilot. There was also the news about Copilot's pricing change that happened recently, right? Where it's going to usage-based pricing. So those two trends, in my mind, are converging towards the tipping point, and we're gonna find out what happens here

Shimin (48:47)
Right.

Dan (48:56)
soon, because we're gonna start seeing the cost escalate on this as it gets real, especially as some of these companies go public. They're not gonna be able to just continue to subsidize inference, you know, or even model creation. I think maybe the one caveat to that is if investor FOMO gets everybody on board and they just print money for a long time, that does sort of

subsidize it off the market, essentially.

Shimin (49:22)
Yeah, I think somewhere between the first week of SpaceX post-IPO and when their first quarterly earnings come out is a real danger zone. But you know one thing that's been bugging me? Why is Anthropic raising another sixty-five billion dollars when they're about to IPO in, like, a couple of months?

Dan (49:42)
I think it's purely a valuation play.

Shimin (49:44)
Mm.

Dan (49:45)
Like they're gonna use that as ammo to try to get their, like, whatever their initial listing price is, to be quite high, you know.

Shimin (49:51)
But this is not a

insignificant amount of money for the valuation. Like they could've done this with fifteen, twenty billion dollars. Like this is sixty-five. This is almost ten percent of their company. Like that seems like a lot of money to raise from private sources. Like unless they're really that crunched for cash. Like it seems like a lot of money to get right before you go IPO and, you know, get all that other money from the public.

Dan (50:09)
Could be.

Shimin (50:17)
But otherwise, yeah, I agree with you. We're getting close, now that I've seen the SpaceX S-1 and, you know, its grand ambitions. And also just the fact that Anthropic's valuation tripled in four months. Sometimes you see these numbers and you're just like

Dan (50:34)
It doesn't even compute as like human sized numbers, yeah.

Shimin (50:35)
Yeah. You just

you just gained like two Toyotas, one Volkswagen? Like what are we talking about here? Like how is this possible within four months? All of that makes me think we should move it forward. Like we're clearly not there yet, but I think we're getting closer to kinda the music is about to stop, so to speak.

Dan (50:54)
Yeah. Well, or it will continue in grand style, you know. I don't know. It'll be interesting. But yeah, I'm okay with, I guess, cautiously moving it forward, because I really wanna wait and see what happens as a result of all this. But this convergence of inflections maybe means like, you know

Shimin (51:13)
We're at six minutes and fifteen seconds.

Dan (51:15)
Mm.

Hmm.

How far forward do you want to go?

Shimin (51:17)
I'm thinking thirty seconds. Five forty-five.

Dan (51:19)
That's not that far. I was thinking like three minutes, but okay. Fine. Yeah, I guess you're right. That's true. Not only is there a lot of time until that happens, but you're right, we're not gonna get meaningful signal until probably not even first earnings. It's gonna take a few, right? 'Cause I bet investors will likely FOMO themselves into

Shimin (51:23)
Well, we have a lot more time until the IPO actually happens.

Yeah, until we find out

Dan (51:42)
not caring about losing, you know, nine billion dollars a quarter or whatever initially. But after a while I suspect that may not fly. So okay, that's fair.

Shimin (51:53)
I also think that the brand name of SpaceX may be just so strong that like that alone can keep the unrealistic, like, one point two trillion valuation going for a couple of months until people get disillusioned. Or they don't, 'cause it's Elon and everybody has the mind virus. I don't know.

Dan (52:11)
Yeah, well, I'm actually not even thinking about them too much. I'm more focused on Anthropic. 'Cause I think that, like, Anthropic's... how do I put this delicately? I think Elon's doing Elon things, which is like basically the same thing he did with SolarCity, right? Where he rolled the failing business into the one that was doing okay to keep the parade rolling for a while. It's like, have you ever actually seen a SolarCity roof?

Shimin (52:17)
And R back.

No, never.

Dan (52:36)
I have seen one in person, and it was very much like a special circumstance where it seemed like it was being done as a favor to one of Elon's family members. Go figure. Yeah. Uh-huh. And it just feels like that to me too. It's like, why would you even roll AI into rocket launches, you know?

Shimin (52:45)
Mm. Mm. Well. That checks out.

Dan (52:59)
Like

Shimin (52:59)
a a million they

Dan (52:59)
They're gonna build a data center in space.

Besides rent all your compute to Anthropic. In this day and age, you'd think that would just be a sustainable business in and of itself.

Shimin (53:05)
Yeah. Yeah.

Dan (53:11)
Yeah. Okay, so

Shimin (53:12)
Five forty-five.

Dan (53:13)
Okay. I could be convinced. That sounds better to me. But am I flopping too much? I feel like I'm a historical flopper when it comes to minutes.

Shimin (53:13)
We can go down. I'm happy to go down to five thirty. Yeah.

Five thirty. Okay. Let's do five thirty.

Well that

that's the shtick. We are here. You're flipping and you're flopping. We just need to put a whole bunch of Dans together and we can have memory soon enough.

Dan (53:27)
Ha ha.

Yeah, one bit of RAM for every two Dans that you can scrounge up.

Shimin (53:35)
Okay, five thirty it is. Yeah. That is gold. Yeah. We can get enough of it. That's the show, folks. Thank you for joining us again for our study session this week. If you like the show, if you learned something new, please share the show with a friend. You can also leave us a review on Apple Podcasts or Spotify. It helps people to discover the show and we really appreciate it.

Dan (53:35)
I mean, these days that's gold, right, with the RAMageddon, so

Shimin (53:53)
Got a segment idea, a question for us, or a topic you want us to cover? Shoot us an email at humans@adipod.ai. We love to hear from you.

Dan (54:00)
Or

if you really, really, really miss Rahul that much, you could send us an email about it too. But don't worry, he'll be back next week. Spoiler alert.

Shimin (54:07)
Yes. He'll be back

from his, the Mecca Hiller layer, soon enough. You can find full show notes, transcripts and everything else mentioned today at www.adipod.ai. Thank you again for listening and we'll catch you next week. Bye.

</details>