AI is Here: Musings on What it Could Mean
Musings on AI: AI Interface, Aggregation Theory, Heart on my Sleeve, Philosophical Considerations, Postmodern Theology
Defining AI
AI, much like the Internet in its infancy, is a concept that struggles to find a serviceable definition. To help craft one, let’s break down industry dynamics into three layers: production, distribution and monetization.
The production layer deals with how a product, or output of production, is made.
The distribution layer deals with how a product or service is delivered and consumed by a consumer.
Lastly, the monetization layer deals with how a product or service is purchased or monetized.
Every job, industry and part of our economy can be nested within one of these three layers. Humans achieve economic progress and prosperity by making these layers more efficient. This means that we invent technology to allow us to do more with less.
This is the process of dematerialization: a continuous, exponential journey of innovation across these three layers.
Digitization has been the most recent wave of dematerialization, manifesting in both the monetization and distribution layers.
Modern credit cards and credit scores digitized the monetization layer and greatly expanded liquidity and purchasing power across the economy. They unlocked new, more efficient means of exchange, as people could purchase goods and services through a network of credit stored in bits rather than physical cash.
The Internet, in turn, digitized the distribution layer through new information technology that transferred bits of information across servers and routers at the speed of light.
Instead of distributing information over newspapers that had real marginal costs (paper, ink, delivery) and scarce distribution, information could be distributed across the world for free, instantly. The main constraint was no longer the physical space available in a newspaper or the channels on a cable network, but the production of information, content and other outputs that could be rendered in bits to fill this abundance and best meet demand.
I first made this observation in 2021 in my post Synthetic Media. As I wrote:
Synthetic media dematerializes the production layer and solves the scale & personalization paradox. Software leads to distribution at scale, AI leads to personalization at scale.
I view AI or AGI (Artificial General Intelligence) as the digitization of the production layer via computation across emerging models like LLMs (large language models). This new digital means of production transforms historically analog production environments across many industries that have been constrained by marginal costs involving labor and inputs.
This means that many categories of output will be producible digitally at scale with little-to-no marginal cost. This is both deflationary (reducing the cost of inputs, products and services) and leads to an abundance of supply.
This has led me to two questions.
What happens, then, when marginal production cost goes to zero and supply becomes abundant?
Will natural language call-and-response chat queries be the interface of AI applications?
Let’s start with the latter.
Interfacing with Infinity
What many may not recognize is that AI is not necessarily new. In fact, AI has been around for many years, silently smoothing out the distribution mechanisms of our daily entertainment tubes: social media (among other use cases).
Indeed, each day when we consume user-generated content produced by our friends, and increasingly by creators and individuals we are likely to find interesting, we train sophisticated AI models with digital gestures. Each additional like, comment, scroll, click and view acts as a lever that further refines our own hyper-personalized call-and-response query. This query acts similarly to a perfect prompt that one might submit to ChatGPT.
Instead of natural language prompting the model, a user’s interaction with the interface causes the model to crawl the entirety of content published on the network and fetch the photos or videos that best meet the refined ‘call’. The model then displays that object within the inventory slot in the Feed.
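To make this mechanic concrete, here is a minimal sketch in Python of a gesture-trained call-and-response loop. Everything in it (the gesture weights, the taste vector, the toy content pool) is a hypothetical simplification for illustration, not any platform’s actual system:

```python
# A minimal sketch, all names and weights hypothetical: each gesture
# nudges a per-user taste vector, and the Feed fills its next inventory
# slot with the published item whose features best match that vector.

GESTURE_WEIGHTS = {"like": 1.0, "comment": 1.5, "view": 0.2, "skip": -0.5}

def update_taste(taste, item_features, gesture, lr=0.1):
    """Refine the standing 'call': move taste toward (or away from) an item."""
    w = GESTURE_WEIGHTS[gesture]
    return [t + lr * w * f for t, f in zip(taste, item_features)]

def next_slot(taste, content_pool):
    """The 'response': fetch the item that best matches the refined call."""
    def score(item):
        return sum(t * f for t, f in zip(taste, item["features"]))
    return max(content_pool, key=score)

# Each gesture retrains the query, so the Feed improves with every use.
taste = [0.0, 0.0, 0.0]
pool = [
    {"id": "cooking_reel", "features": [0.9, 0.1, 0.0]},
    {"id": "gym_clip", "features": [0.0, 0.8, 0.2]},
]
taste = update_taste(taste, pool[0]["features"], "like")
print(next_slot(taste, pool)["id"])  # -> cooking_reel
```

No natural-language prompt appears anywhere in the loop; the interface gestures are the prompt.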
This, I imagine, represents much of the AI investment that Meta recently discussed in their earnings call, which highlighted positive results for their advertising business and user engagement over the past year. As I wrote in Social Media has a Commerce Problem in January:
Instagram has shifted “the focus of its e-commerce efforts to those that directly drive advertising".
The semantics of this announcement are important. Instagram, I believe, identified this local maximum and understood that their efforts to own conversion with commerce came at the cost of their ability to offer superior discovery through the Feed — and with it monetizing said discovery via ads.
While the prevailing consensus views Instagram’s scaling back of shopping features as an admission of defeat, I believe it is the opposite. This redirection is not a failure, but a triumph of focus that diverges from their main competitor (TikTok) and doubles down on their ultimate competitive advantage: creating demand via highly efficient and programmatic distribution of UGC and ads.
Taking a step back, this interface is a major improvement over how media distribution operated prior to social media Feeds. In that era, users had to actively query, in natural language, the types of media that they wanted to consume online. At first this was done directly in the address bar by typing in a URL, and later via search engines.
In the age of search engines, Google rose to dominance because of, among other reasons, their ability to accurately fetch the supply of webpages on the Internet given the call stipulated by the natural language of the search query.
Indeed, upon prompting the search engine, a user was shown a list of links with related images and short descriptions. Users relied upon — and still do — the search ranking, short text previews and brand recognition of publishers to choose which media to consume. From there, they then had to click the link and travel to the webpage that ultimately distributed the media.
This consumption experience requires a significant amount of time and energy from the user. Furthermore, each search doesn’t materially adjust the ‘call’ that the search engine uses to fetch supply specific to the user (it does so only on the supply side, by ranking webpages against keywords), so each incremental query still largely relies upon the conditions stipulated by the prompt in the natural language search. Instead of further training the model to personalize the call and response, search engines created artificial scarcity by monetizing the finite real estate of the search results via new SEO products and keyword bidding.
But as users shifted their engagement from desktop to mobile, a few things happened.
One, the cost and energy required to produce media drastically decreased. Each mobile phone also doubled as a professional camera. When paired with an intuitive touch interface, this increased the production of media content across the Internet by many orders of magnitude. Similarly, the time and energy it took to consume media content greatly decreased. Apps require a click to open and then the simple and intuitive gestures of scrolling, tapping and sometimes typing to consume — a much easier experience than Search.
Two, the total time we spent engaging with media content — both consuming and producing — increased substantially. Instead of sitting at a desktop, a user could be anywhere on their phone generating and consuming content.
Both the increase in daily time spent with digital media and the interface innovation of smartphones and Feeds meant that users informed distributors (social media networks) what they liked to consume in an exponentially more efficient and absolute manner.
This enabled each call-and-response query — what to show in the next inventory slot of the Feed — to have exceptional instructions that only got better with each incremental use.
Additionally, new ad products better monetized the production and distribution of this supply, as these ads leveraged the same AI models to make better distribution decisions. Instead of bidding on keywords for a finite amount of search real estate, ads could be distributed across a plethora of Feeds and generate demand by improving the call-and-response model enabled by this intuitive new interface.
Consequently, I believe a gesture-based Feed interface, not the current natural language chat interface, will be the ubiquitous interface of AI applications in the future.
As we have already begun to see, capital and resource advantages have not created any defensible edge amongst the different AI Producers attempting to develop exclusive models. Midjourney is a self-funded team of 11 that runs entirely in Discord. Stable Diffusion is a completely open-source text-to-image model that can run on a modest consumer GPU with 8 GB of VRAM, and has even been ported to the iPhone.
Furthermore, SemiAnalysis recently published a leaked internal Google memo titled “We Have No Moat, And Neither Does OpenAI”. As the document revealed:
While our models still hold a slight edge in terms of quality, the gap is closing astonishingly quickly. Open-source models are faster, more customizable, more private, and pound-for-pound more capable. They are doing things with $100 and 13B params that we struggle with at $10M and 540B. And they are doing so in weeks, not months.
The pace of innovation occurring within the digitization of the production layer is astonishing, and consequently, new digital manufacturing processes are becoming obsolete almost as soon as they come to market. No one has proven true mastery when it comes to how best to ‘produce’ new AI manufacturing environments.
As the Google Memo confirms, leaner manufacturing processes are scaling better than more intensive ones, as many open source projects are “saving time by training on small, highly curated datasets.”
Since the beginning of March, when Meta’s LLaMA was leaked to the public and the open source community got its hands on its first truly capable foundation model, things that Google deemed major open problems have quickly been resolved, such as:
LLMs on a Phone: People are running foundation models on a Pixel 6 at 5 tokens / sec.
Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening.
Responsible Release: This one isn’t “solved” so much as “obviated”. There are entire websites full of art models with no restrictions whatsoever, and text is not far behind.
The memo concludes “Directly Competing With Open Source Is a Losing Proposition” and makes an interesting observation:
Who would pay for a Google product with usage restrictions if there is a free, high quality alternative without them?
This observation recognizes that instead of taking on the Sisyphean task of competing with the wave of new leaner, cheaper, faster models that will be spurred on by the open source community, the real prize is to become the distribution interface that delights users, but more importantly, subtly trains and personalizes its own call-and-response distribution mechanism across a network of third-party AI models.
As Ben Thompson explains in his seminal thesis on Aggregation Theory:
This has fundamentally changed the plane of competition: no longer do distributors compete based upon exclusive supplier relationships, with consumers/users an afterthought. Instead, suppliers can be commoditized leaving consumers/users as a first order priority. By extension, this means that the most important factor determining success is the user experience: the best distributors/aggregators/market-makers win by providing the best experience, which earns them the most consumers/users, which attracts the most suppliers, which enhances the user experience in a virtuous cycle.
I believe that the true moat in AI will be an Aggregator’s intuitive, gesture-based interface that leverages user inputs to triangulate the line of best fit within a near-infinite scatter plot of potential outputs of production for a given objective or Job to Be Done.
In this graph, the X axis will be Engineering: the requested object, form or feature of the desired output. The Y axis will be Design: the style in which that object, form or feature takes shape.
For many industries, the means of production will essentially follow the call-and-response of “Engineer X in the Design of Y”.
Depending on the context of the desired objective, there will be a line of best fit that optimally coordinates the two variables and denotes product-market fit. Solving this function, i.e., being able to determine the optimal pairing (X, Y), will become one of the most important Jobs to Be Done that differentiates a distributor, and the interface that best approximates this will win.
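As a toy illustration of “solving this function”, the sketch below averages a user’s past reactions per (X, Y) pairing and picks the candidate pairing it scores highest. A real distributor would use a far richer model; every name and data point here is invented:

```python
# A toy sketch, not a real regression: score each (Engineering, Design)
# pairing by the user's average past reaction, then choose the candidate
# pairing that best approximates product-market fit so far.

from collections import defaultdict

# observations: (X = requested object, Y = style, reaction in [0, 1])
history = [
    ("portrait", "watercolor", 0.9),
    ("portrait", "cyberpunk", 0.2),
    ("landscape", "watercolor", 0.6),
]

def fit_scores(history):
    """Crude stand-in for a line of best fit over the (X, Y) scatter plot."""
    totals, counts = defaultdict(float), defaultdict(int)
    for x, y, reaction in history:
        totals[(x, y)] += reaction
        counts[(x, y)] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}

def best_pairing(candidates, scores, default=0.5):
    """Solve the function: choose the optimal (X, Y) for this user."""
    return max(candidates, key=lambda pair: scores.get(pair, default))

scores = fit_scores(history)
print(best_pairing([("portrait", "watercolor"), ("portrait", "cyberpunk")], scores))
# -> ('portrait', 'watercolor')
```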
In this case, what I believe we will see is the integration of the distribution and production layers across many Jobs to Be Done. The distribution layer will directly inform what to produce. The role of the distributor will not be to fetch a pre-existing output or product that best matches demand, but to directly produce the output based on the user’s instructions.
I believe this virtuous feedback loop will work as follows (a minimal code sketch follows the list):
A user informs a Distributor of the desired “X” and “Y” through interface gestures (likes, shares, variations, etc.)
This then updates the Distributor’s model and what to “prompt” the commoditized network of AI models/producers
The Distributor calls the network of AI producers to “Engineer X in the Style of Y”
The network of producers responds based on its regression analysis and adjusted lines of best fit for the specific user and Job to Be Done
The Distributor then fetches the best output from the different AI Models given its own proprietary distribution model
A user informs the Distributor of the desired “X” and “Y” through interface gestures with that output (likes, shares, variations, etc.) and the virtuous loop continues
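A minimal sketch of that loop, with every producer, weight and gesture value hypothetical, might look like this:

```python
# A minimal sketch of the loop above: the Distributor prompts a
# commoditized network of producer models to "Engineer X in the Design
# of Y", serves the output its own proprietary model ranks highest, and
# folds each user gesture back into that model.

def producer_a(x, y):
    return f"[model_a] {x} in the style of {y}"

def producer_b(x, y):
    return f"[model_b] {x} in the style of {y}"

PRODUCERS = {"model_a": producer_a, "model_b": producer_b}

class Distributor:
    def __init__(self):
        # Proprietary distribution model: a learned weight per supplier.
        self.weights = {name: 1.0 for name in PRODUCERS}

    def serve(self, x, y):
        """Call the producer network and fetch the best-ranked output."""
        outputs = {name: fn(x, y) for name, fn in PRODUCERS.items()}
        best = max(outputs, key=lambda name: self.weights[name])
        return best, outputs[best]

    def feedback(self, supplier, gesture):
        """Gestures retrain which supplier wins the next call."""
        self.weights[supplier] += {"like": 0.2, "skip": -0.2}[gesture]

distributor = Distributor()
supplier, output = distributor.serve("a lo-fi beat", "a 90s R&B ballad")
distributor.feedback(supplier, "like")  # the virtuous loop continues
```

Note that the suppliers are interchangeable here; the only state that accumulates, and therefore the only moat, lives inside the Distributor.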
Indeed, owning demand, rather than supply, will be of utmost importance. While the distributor could fetch outputs produced by its own LLMs, I believe it is more likely that an Aggregator will fetch outputs from a commoditized network of supplier LLMs that integrate onto its platform in order to access its direct relationship with users. This process will not only solve for the optimal (X, Y) coordinate, but also for which supplier’s AI model optimally solves for (X, Y) given the desired outcome of the ‘call’.
When We No Longer Sing
The digitization of the production layer promises so much efficiency.
If we can write a story by simply prompting an AI model with a premise and a stylistic example, or write code by prompting an AI model with a feature set and design parameters, we can do so much more with so much less time and energy. This will unlock human productivity, reduce prices and flood the market with an abundance of supply to choose from.
But like most things, there are costs and trade-offs. I wrote in Stories as Currency about how the digitization of the distribution layer in media destabilized the exchange of ideas and values between diverse groups of individuals and concepts.
Distributing stories via traditional linear rails meant that geography was the main predictor of which stories an individual would interact with. Grandparents and teenagers alike were shown the same movies at the theater, the same news in the paper, the same books in the library, and the same shows on cable television. This caused natural collision between individuals (and generations) with varying interests and social circles, and in the process led to the serendipity of unlike-minded individuals finding common ground.
[…] But in the world of algorithmic distribution, the stories you now consume are preordained. You are recommended stories based on what people just like you have watched, or see content that your friends and favorite creators have published. This creates an echo chamber where likeminded people see the same weak stories, which heightens the divide between different social and cultural groups and prevents any form of serendipitous collision, while also stymying cross-generational and cross-ideological interaction.
[…] In this new world, there no longer exists a standard unit of cultural account. The stores of information, the stories that define us, are no longer fungible — a movie no longer holds the same relative value as it once did, as the mediums that distribute younger generations’ stories (Tik Tok, YouTube, Roblox) are foreign to older generations.
I think the rise in loneliness, the decline in traditional sources of community, the decrease in religious participation, the polarized political climate, the mental health crisis and more can reasonably be argued to have correlations with this phenomenon.
Which begs the question: what are the second and third order effects that we will have to reckon with under AI?
For one, I worry that we will lose the valuable compounding benefits that result from doing the grunt work and substantiating an idea — whether over multiple drafts or lines of code — over time. There is real and important intrinsic value in those acts of production; they ultimately benefit the producer by powering the process of “becoming”.
As I wrote about Frank Ocean’s path to stardom in Bored Ape Yacht Club is, in fact, Boring:
Frank, then known as Chris, started out like all IP creators by producing common property outputs that were commercially worthless to everyone else besides himself. They were worthless because they didn’t create entertainment utility for fans, but were valuable to Chris because he enjoyed the process of improving.
Through the act of producing these common property outputs, Frank accumulated a skillset and a proof of work that compounded. In Frank’s own words to himself, he reveals as much:
“You’re actually gonna write and record hundreds of songs. they won’t all be good and most ppl won’t think you’re talented at first, but you’re going to master your gifts.”
Indeed, only the IP creator has the insight to get to mastery as each song and production run builds atop the other, and only the IP creator has the incentive to go through the reflexive process of becoming “a lot stronger and wiser.”
A digitized production layer greatly lowers the cost — in terms of time, skillset, and capital — needed to produce a product that can be distributed to consumers.
But what happens in a world where the vast majority of producers in a market don’t know how to produce a given output without the help of AI?
Consider an artist who knows how to effectively use autotune to make themselves sound like a good singer. AI models and software can deterministically compute and program bits that instruct speakers to create sound waves rendering the perfect voice and pitch, or any voice and pitch for that matter.
While that output unlocks distribution and can find pockets of demand, it doesn’t necessarily mean that the producer actually knows how to sing well.
Will Frank and future artists go through the lonely and risky process of writing and recording hundreds of songs that (1) people don’t like but (2) are essential for developing the skillset that eventually makes them successful and lets them “master their gifts”, if they have a near-zero-cost alternative in the form of producing quality music instantly by engineering X in the style of Y?
Probably not.
The valuable skillset that will define artists in this paradigm will not necessarily be investing in training their voice to sing better and becoming, as Frank Ocean says “a lot stronger and wiser”, but investing in the digital skillset of knowing how to interface with a model to engineer a sound in a certain style that maximizes product-market fit.
But knowing how to sing is such an innately human behavior, as is finding voices and songs beautiful and moving. A baby coos and calms to its mother’s lullaby. We get goosebumps from Ella Fitzgerald and are brought to tears by Stevie Wonder and rave to Queen and frenzy to Taylor Swift. It is also one of those human activities packed with tremendous sociological and anthropological implications: singing is a form of worship, a form of allegiance, a form of warfare, a form of romance and courting, a form of status and prestige. And, almost instinctively, it is also something that causes profound vulnerability. Singing in front of people makes us feel naked, like we are exposing our very souls and essence for judgement.
But what happens when, as more of our life is experienced, consumed, distributed and now produced digitally, we can compute away that vulnerability? If we no longer practice training our voices but can instead wash that vulnerability away with computational rendering that gets to the desired output instantly, how will we let ourselves be vulnerable?
Will people still find singing beautiful? Is a ‘good voice’ an objective fact, an absolute truth of nature, or a reflection of a society’s rhythm at a given moment of time?
I personally believe that there is such a thing as absolute beauty, and singing is one manifestation of it. A beautiful voice gives you chills. It is a biological reaction that is completely out of our control.
Watch this video and pay attention to how it makes you feel.
Chills, right?
But what happens when investing in the process of becoming skilled and differentiated through hours of work is too high an opportunity cost? Why learn to code when you can learn to prompt an AI to code? Why learn to become super talented at singing when you can render your voice however you want with an AI program? Why learn how to write and develop characters and stories when you can prompt an AI program and iterate until you find the best version?
Putting the moral and philosophical considerations aside, I think, in a simple example, we may end up with a production layer that bifurcates along the lines of original producers — those whose styles are being mimicked and have reached product-market fit with an audience — and distributed producers — those who leverage existing IP to instruct the engineering or design of new products.
We will have many talented artists, engineers and others who, instead of investing in their own IP, will leverage existing IP and AGI tooling to produce outputs in the style of producers that already have sizeable distribution.
Take the recent example of Ghostwriter, who made a Drake song featuring The Weeknd, using AI to deepfake his voice to sound like the two artists.
Although Universal Music Group has fought to take down the song from streaming platforms, a new pirated version pops up each day. This was made possible thanks to AGI, and for producers like Ghostwriter, it makes sense to leverage Drake and The Weeknd’s IP to (1) co-opt their massive fan base for distribution and (2) use factors of production — their voices, flows and style of making music — that have already proven product-market fit.
More importantly, the song is awesome! I love it. And as a Drake fan I am glad he made it. It’s probably one of my favorite Drake songs released in the past couple of years.
And I’m not alone. After it was self-released on April 4, 2023 on various streaming platforms like Apple Music, Spotify and YouTube, Google search trends for “Drake” reached a 90-day high. Interestingly, this preceded search trends for “Drake AI” or “heart on my sleeve”, suggesting that when people first heard the song they were unaware that it was made by AI.
As of this writing, a Google search for “heart on my sleeve drake” returns 19,900,000 results, and the song has garnered millions of views across a variety of platforms. Yet a Google search for “heart on my sleeve ghostwriter” returns about 1,490,000 results.
Clearly, this song benefitted Drake more than it did the actual artist (Ghostwriter). For Drake and The Weeknd, this new model proves extremely valuable. It has the potential to unlock their main growth and monetization obstacle: their production capacity, which is constrained by their time and energy. Down the road, this can theoretically be monetized much like a franchise model, allowing a network of distributed producers to leverage their IP and create new content that re-engages existing fans and acquires new ones. This also drives listens and engagement back to their original songs and IP, which is free customer acquisition and monetization on their behalf.
As I wrote in What Makes IP Valuable?
These producers leverage the IP’s existing demand to bypass the cold-start problem. Once the IP has reached a critical mass of fandom, the IP owner can decentralize production to other producers who create distributed products that only exist because of this fandom.
It’s Michael Jordan’s basketball career (his intrinsic product) that originally built his fanbase, and this fanbase is what attracted other producers to create products that Michael Jordan could not given his skillset: Space Jam, Jordan Brand, The Last Dance, etc.
Michael Jordan, Harry Potter, Marvel or any other usher of IP no longer has to produce intrinsic products (playing basketball in the NBA) to provide utility for fans, but can become a platform that enables others to provide them with income (and new revenue streams) to leverage their IP, gain access to their fans and build new distributed products that grants the distributed producers with indirect ownership exposure to the IP (thereby decentralizing ownership).
These distributed producers have different skillsets that allow them to produce new form factors of supply that (1) create new forms of engagement and entertainment utility and (2) unlock new channels of distribution, which:
Acquires new fans;
Increases the overall utility of existing fans as they have new forms of supply to consume; and
Re-engages dormant fans.
But what happens in a world where Ghostwriter no longer has to write the lyrics and make the beat for the song, but can simply prompt a model to “create a Drake song featuring The Weeknd titled ‘Heart on my Sleeve’” and then shuffle through variations, informing what he likes and doesn’t, until he finds the perfect version?
We will have to see.
Closing Thoughts: Rage Against the Machine
In the days of AGI, digital beings will strive to color outside the statistical lines of probabilistic models.
The rage against the machine will be measured in degrees of nuance and contradictions. The chicest thing one can be is impossible to define via computation alone.
Status and economic value will be conferred upon Engineering and Design elements that cannot be measured accurately through probabilistic models. You know what will be cooler and more prestigious than developing a style as unique and popular as Wes Anderson’s?
Being so entropic that no AGI model can compute its own rendition of your style.
The Enlightenment was humanity’s reaction to a deterministic worldview that relied upon the code of religion to compute causality and everything from morality and science to social classes and political power. Why was someone born a serf? It was God’s will. Why did leaves fall in autumn? Because God made it so. What is fire? A gift from God.
Any rejection of these “facts” was heresy. But with the Enlightenment, humanity employed a new model to explain the forces of the world. Reason and logic liberated humanity from the mechanical determinism of the Dark and Middle Ages and enabled the ‘sovereign individual’.
When Nietzsche declared “God is Dead”, he of course did not mean it literally. Instead, he suggested that what God represented — an orchestration of society where some Deus Ex Machina controlled the natural law of humanity — had ended.
Indeed, we discovered that we could explain things on our own through scientific examination. Leaves fall in autumn because the tilt of the Earth’s axis with respect to its orbital plane creates seasons; fire is not a gift from God but a visible effect of the process of combustion.
These models that used reason, logic and the pattern-matching of knowledge to probabilistically arrive at conclusions became our new source of humanity. The more information we gathered, the more we proved that God, the Machine, was dead.
But with AGI, Deus Ex Machina returns. The forces that allowed us to escape the clutches of the Machine previously — reason, logic and knowledge — will be conquered by a new Machine. How will we respond when we can no longer prove our sovereign individuality?
In many technocratic circles, we’ve begun to see the rise of secularized religion under the guise of ‘spirituality’ via rituals like astrology, Burning Man, psychedelics, shaman retreats, sound baths, meditation and more. I believe that this is a reaction from a class of individuals who have most closely observed computers’ accelerating ability to reason, and who, because of it, have implicitly lost faith in the Enlightenment ideology that has granted Man unique sovereignty and meaning.
What happens when this becomes more mainstream?
While religion no longer provides many civilizations with literal explanations, it still underpins much of our morality, ethics and worldview. Indeed, all religions tend to have the same shape, and clearly this shape of metaphors and stories has value. Why else would they have survived the natural selection of ideas and lasted thousands of years if not for having clearly served some purpose to humanity?
I think a massive unknown unknown in the coming decades will be Postmodern Theology. If computers can employ a probabilistic worldview, use the tools of the Enlightenment — reason and logic — in the pursuit of knowledge, and effectively achieve the Enlightenment definition of what it means to be human, we will most likely have to find a new definition. That new definition will require elements that cannot be measured through probabilistic models, but will instead rely on metaphor, tradition and ritual.