I.
The concrete floor glistened with a forest-green sheen, as if sweating from the labor of the morning. The air was dense with the quiet tension of trade. Men in wader boots filed slowly through rows of tuna, hundreds of them, laid out like stone monoliths, forming a kind of aquatic colonnade. These quiet arrangements, this obsession with order, felt characteristically Japanese.
The auction floor was divided in two. On one side, freshly caught tuna shimmered with the navy tint of the deep ocean from which they were summoned. On the other, most of the morning's inventory lay frozen in the pale stillness of liquid nitrogen. If it were not for the pink steaks that topped each carcass like some pescatarian sundae, they could have been mistaken for marble waiting to be sculpted.
A buyer approached one of the frozen fish. He struck the carcass with a long metal hook, carved out a shard of meat, and rubbed it between his thumb and fingers like a Roman general pressing dirt before battle. Then he scribbled a quick note and moved on.
“This is how they determine the quality of the tuna,” Toshi, our tour guide, whispered. “They can tell the fat content of the tuna based on how oily it feels. It takes many, many years to gain this skill. Even after 17 years, I still have not mastered it.”
Toshi, a restaurateur, narrated as we observed the ceremony. It was 5:20 a.m. on the auction floor of Tokyo's Toyosu Fish Market, where over $1.6 billion in seafood is transacted each year. I was joined on the tour by a couple from San Francisco — he a software engineer at Nvidia, she an art teacher.
"I hear there is AI now that judges fish quality by picture," he laughed, then paused. “Maybe in a few years, this will be how auction is done.” His tone shifted: not sarcastic, but uncertain.
As I watched these men do their bidding in a half-sleep reverie, it came to me how grand and varied the scope of human intelligence truly is. Carve, rub, pause, write. A full sensory loop, compressed into a few seconds.
Some information we capture through sight, some through sound, some through a touch of frozen meat that takes only a second to register. We gather information through our senses, process it, and arrive at an outcome with remarkable efficiency.
What does it take to distinguish a $5,000 fish from a $500,000 one with just a twist of frozen meat? How much intelligence did it require for that buyer to process that information? How many calories did that cognition burn?
On a marginal basis, remarkably little. But such intelligence required decades of apprenticeship, and such apprenticeship carries incredible upfront training costs: a hyper-specialized dataset refined through accurate reinforcement learning. This is a dataset of feel — small, narrow, and extremely expensive to acquire. There is no dataset on the Internet that can teach an AI model this information, or at least none that I know of.
Apprenticeship is undervalued in the grand scheme of human and artificial intelligence. It is compressed inference built from ultra-specific training. The tuna buyers have likely handled tens of thousands of tuna before ever being trusted to make a purchase. They were not trained on images or proxies, but through a multimodal process: the nuance of resistance, texture, and cold that comes only from feel, paired with the labeling and reinforcement that only an expert's sound and language can provide.
This is apprenticeship as dataset. It’s a dataset that’s small, high-quality, non-transferable, and expensive to create.
While modern AI thrives on large, labeled, low-friction data, apprenticeship data is the opposite:
Narrow domain
Long time horizon
High training cost
No standardized labels
Unquantifiable feedback
You can’t crowdsource “feels right.” You can’t scrape “finger feedback.” You have to be there, over and over again. That’s why tuna evaluation hasn’t been outsourced to software: even if you could digitize the sensory input, you can’t digitize the expertise required to interpret it.
Toshi mentioned AI that uses images to assess tuna quality. It will likely work on the margins. The moment the image is slightly distorted, the angle imperfect, or the lighting atypical, the model breaks. That’s because vision is just one slice of cognition. The tuna buyer isn’t merely seeing fat content. He’s feeling it and making a multi-sensory inference rooted in years of embodied learning.
This leads to a bigger question: Can AI learn to touch?
Technically, yes. We have haptic sensors and pressure readers and robotic skin. But the deeper point isn’t about data ingestion, it’s about judgment. The buyer doesn’t just feel softness. He knows what that softness means in the context of origin, auction history, and chef preference. This requires a massive context window that moves beyond sensing to synthesis.
AI models are improving rapidly. But the labor required to train an agent to sense like a human — not just with sensors, but with accumulated context and internalized standards — is staggering. This is low-calorie cognition with high-cost training: effortless in execution, brutal in formation.
What is this trying to teach me?
Not all intelligence is scalable. Not all expertise is downloadable. There are whole domains of human knowledge that live in scar tissue, in calloused hands, in movements too small to name.
AI will replicate many things. But not the intuition of the tuna buyer.
Reflecting on this, I’m reminded of the guiding mantra Jiro shares in Jiro Dreams of Sushi —
All I want to do is make better sushi. I do the same thing over and over, improving bit by bit. There is always yearning to achieve more. I’ll continue to climb, trying to reach the top…but no one knows where the top is.
The tuna buyer waking up at 1 a.m. to claw, feel, and purchase the day's inventory. The master sushi chef slicing sashimi, massaging rice, and squeezing it into perfect nigiri.
On a timeline of days, weeks or months, the monotonous nature of this work could drive the American mind mad. But over years and decades, the bit by bit improvements compound and internalize into truly specialized intelligence.
This is the indirect, compounding upside of performing labor. The process of each repetitive task builds upon the former, slowly molding a skillset rep by rep. No culture better represents this training of intelligence and the value it creates than the Japanese.
This model of apprenticeship appears essential to the next evolution of human and artificial intelligence: both in how AI begins to embody intelligence in multimodal contexts, and in marking the boundaries of jobs that will fundamentally remain the domain of the human master and apprentice.
II.
I am sitting in a rooftop bar in Kyoto, alone, overlooking a quiet skyline.
I am alone. But I do not think I feel lonely.
What does it actually mean to feel lonely? It is not the state of being alone. One can feel lonely surrounded by millions of people or a handful of friends.
I reflect on the feeling and try to untangle it. I cannot help but summon René Girard. Do I actually feel lonely, or is what I am feeling simply the desire to be desired? Is this feeling a reflection of myself, or is the desire to be desired something I parrot?
I do not desire to be un-alone; I am content in my solitude. But my emotions stem from an imitation of the qualities I find compelling. What I feel is the absence of that and the empty space that desiring feeds upon. I am not lonely for I desire alone time; I am simply still caught in a prison of status seeking.
The whiskey helps blossom these thoughts. I am now completely alone, savoring the most astonishing view. It is much more tranquil now than when there were three in this bar. I no longer feel like I live in the eyes of other people.
There’s a kind of liberation in solitude that mimetic culture never gives you.
III.
On my third day in Kyoto, I got sick. I pushed through and visited Saihō-ji, the moss temple, a lesser-known sanctuary blanketed in emerald greens and solemn quiet that spoke to the pride of Mother Nature.
Later, I walked into a Japanese pharmacy, overwhelmed by a wall of unfamiliar characters.
I took a picture, told ChatGPT my symptoms, and asked it to recommend what I should buy. It immediately recommended three items and shared their photos. Within the same interface, I used ChatGPT to translate my conversation with the pharmacist and to suggest additional questions — in a way, acting as my advocate. What an incredible multimodal use case, compressing the time, pain, and cost of such a difficult transaction down to cents.
Saddled with my Eastern medicine, I returned to my hotel to rest. Curious what I would find, I flipped through the various Japanese TV channels. One after another, I found broadcasts that felt more like watching YouTube than watching cable TV. Each program boiled down to people watching people watching people. Whether it was I, the viewer, watching the Japanese TV hosts watch mukbang, man-on-the-street videos, or restaurant tours, the consumption experience was layered, recursive, voyeuristic.
It reminded me of a portfolio company called Nero that helps music livestreamers manage and monetize livestreams where they listen to, comment on, and review their fans' music. This multiplayer entertainment experience sits more on the edges of American culture, popping up in particular genres such as gaming and music on Twitch, TikTok Duets, and YouTube. Yet even still, we are seeing the rise of this form factor in sports broadcasting, with The Manning Cast as the prime example.
What is clear is that we are entering the age of meta-multiplayer media, where watching is the entertainment and commentary becomes the content. As McLuhan said: the medium is the message.