The (De)Centralized Future of AI
Published: March 21, 2025
NVIDIA's GPU Dynasty
We all know NVIDIA is currently the king of the GPU race, with datacenters gobbling up its chips predominantly for AI use. The recent release of DeepSeek-R1 out of China made waves for many reasons, but I would like to focus on a less-discussed aspect of its popularity: decentralization. The public has been led to believe that the fundamental prerequisite (and current showstopper) for more capable, more powerful AI models is more compute - that we need bigger chips, larger datacenters, better power delivery and cooling, and that nothing else can compare.
I want to note that the organizations and individuals largely responsible for this conclusion all have a heavy financial incentive to remain gatekeepers of this technology. They have been the ones responsible for its creation up to this point, and have thus been able to freely dictate the direction the technology has moved. The drift toward single, monolithic models hundreds of gigabytes - or even terabytes - in size is by no means a coincidence. Early on there was much discussion within the AI community over how to keep the tech from being "misused", and this seems to be the solution we're presented with: make the models require a datacenter for inference, so that any use can be monitored, mediated, and most importantly, profited from.
Oddly enough, significant technological advancements have historically been difficult to profit from, particularly in the realm of software, and there are many examples of this. An overwhelming majority of the most significant and influential software in any given category is either free or entirely open-source: Blender, Unreal Engine, Stable Diffusion, Linux, Audacity, and the list goes on.
AI Still Can't Generate Revenue
The question no one seems to have answered yet is one of profitability versus utility - how much money can you actually make off AI? It has proven its utility in multiple areas, like image and text generation, but for the most part we have yet to find applications that people are willing to pay for at a scale that drives real profit. ChatGPT is a cool party trick for about five minutes until you start asking "what's the point?". Language models and chatbots have predominantly found use in software development, but for non-technical individuals, the most they can usually do is answer a few questions.
While certainly not "AI", the Linux operating system is another good example of software that is difficult to monetize due to its nature. Linux runs a significant majority of the world's servers and embedded devices - yet it is free and open-source. Fundamentally, there is no way for the Linux Foundation to profit from its use other than donations and sponsorships, as the source code must remain available for everyone to build. That sacrifice for the benefit of the business sector is what earned Linux its place in enterprise computing.
For a more recent example we can look to Stable Diffusion - the free and open-source text-to-image platform. Prior to its release, products like Midjourney and NightCafe were taking off on the market, and they would remain better at image generation for several months. During this time, the public trained new Stable Diffusion checkpoints and LoRAs, and experimented with the tech in ways that were not possible with the paid products of the time. After about six months, the image quality of new Stable Diffusion XL models was finally on par with (or even surpassing) proprietary systems, and users of those services began to migrate en masse to local generation. Today Stable Diffusion arguably surpasses the proprietary products in number of users, image quality, prompt adherence, and ability to be fine-tuned - not to mention that everything runs locally, and your data is never shared with anyone.
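To make "everything runs locally" concrete, here is a minimal sketch of running SDXL on a single consumer GPU with the Hugging Face diffusers library. The model ID points at the public SDXL base weights; the prompt, settings, and file name are purely illustrative.

```python
# Minimal local text-to-image sketch using the diffusers library.
# Nothing here talks to a remote service once the weights are downloaded.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # public SDXL base checkpoint
    torch_dtype=torch.float16,                   # half precision to fit consumer VRAM
)
pipe.to("cuda")

# The prompt and the resulting image never leave the local machine.
image = pipe(
    "a watercolor painting of a cat reading poetry",
    num_inference_steps=30,
).images[0]
image.save("cat_poem.png")
```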
What Should Have Been The Obvious Approach
As a software developer who regularly experiments with neural networks, I can tell you that the current landscape of AI is, no pun intended, entirely artificial. There is no fundamental reason that models like GPT-4o or Grok need to be singular, massive, monolithic models requiring hundreds of gigabytes of VRAM just to write me a freaking poem about cats. In fact, from the standpoint of data theory, it's hard to devise a less optimized way of accomplishing that goal.
Just think about what a current language model actually does, to understand why this is inefficient and unsustainable. I ask it to write a poem about cats. The runtime loads the entire model's weights into VRAM at once. Even if the model is trained on petabytes of text, the distilled knowledge directly relevant to the request - cats, poems, and English linguistic structure (admittedly a large corpus) - might plausibly account for 2 or 3 GB of a 10 GB language model, and it's difficult to entertain it being much greater than that.
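For a sense of scale, here is the back-of-the-envelope arithmetic behind loading a dense model's weights into VRAM. The parameter counts and precisions below are illustrative, not measurements of any particular product.

```python
# Rough VRAM needed just for the weights of a dense model (ignores KV cache and activations).
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

for params, label in [(7e9, "7B"), (70e9, "70B"), (671e9, "671B (R1-scale)")]:
    fp16 = weight_memory_gb(params, 2.0)   # 16-bit weights
    int4 = weight_memory_gb(params, 0.5)   # 4-bit quantized weights
    print(f"{label:>16}: ~{fp16:,.0f} GB at FP16, ~{int4:,.1f} GB at 4-bit")
```

Even a modest 7B-parameter model wants roughly 14 GB at FP16, which is why quantization and smaller, purpose-built models matter so much on consumer hardware.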
Using GPT-4 as an example, it's safe to say it holds a LOT of knowledge that isn't particularly relevant to cats or to writing poems - agriculture, physics, mathematics, biology, chemistry, and programming, to name a few. One could argue this is actually the majority of the knowledge it has. If all that excess knowledge were stripped away and only the immediately relevant information loaded, how much VRAM would be needed? By my estimate, a fraction. A robust language model that fully understands the rules of the English language could plausibly be distilled into around 4 to 5 GB, if not far less. Pair that with smaller, modular fine-tuning adapters - each carrying the weights for one additional concept (like cats) and loaded only as needed - and you get a system far more accessible to the public, since inference could be done on consumer GPUs.
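Here is a minimal sketch of that kind of modular setup, using the transformers and peft libraries. The base model below is a real public checkpoint, but the adapter repositories are hypothetical placeholders - the point is only that a small base model plus topic-specific LoRA adapters, loaded on demand, fits comfortably on a consumer GPU.

```python
# Sketch: one small general-purpose base model, plus lightweight topic adapters loaded on demand.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "mistralai/Mistral-7B-Instruct-v0.2"  # real public checkpoint, ~14 GB at FP16
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)

# Hypothetical LoRA adapters, each tens to hundreds of MB, one per domain.
ADAPTERS = {
    "cats": "your-org/cat-poetry-lora",       # placeholder repo name
    "chemistry": "your-org/chemistry-lora",   # placeholder repo name
}

def generate(prompt: str, topic: str) -> str:
    # Attach only the adapter the request actually needs.
    model = PeftModel.from_pretrained(base_model, ADAPTERS[topic])
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate("Write a short poem about cats.", topic="cats"))
```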
This is, for all intents and purposes, the most obvious way to do this, and it's utterly baffling to me that it's not what was tried first. One need only look to the success of projects like Folding@home or SETI@home to see that crowd-sourced compute can rival the self-limiting exercise of building a datacenter. The popularity of such a system could explode in a matter of weeks. So why hasn't this been done? Why are no major companies researching this and publishing their findings? Why is there almost no focus on memory optimization of these models at all?
Again, I believe this comes down to incentive structure, which in this case is heavily financial. If this tech becomes truly publicly accessible, it simultaneously becomes nearly impossible to mediate access to it for profit - and selling access is thus far the only major way AI is moving money. The AI itself isn't really creating profits for anyone en masse; selling access to it is.
Along Comes DeepSeek
DeepSeek's Mixture-of-Experts approach is the most prominent attempt the generative-AI space has seen to chip away at this problem, but it leaves plenty of room for improvement. R1 still keeps the entire model resident in memory; routing means only a fraction of the weights are activated for any given token, not that the rest are ever unloaded. Still, R1 demonstrates that the concept of discrete, separately trainable components at least holds weight. It showed us, just for a moment, how paper-thin the excessive-compute fortress NVIDIA and OpenAI have built truly is.
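To make the distinction concrete, here is a toy Mixture-of-Experts layer in PyTorch. This is not DeepSeek's implementation - just a sketch showing that routing reduces per-token compute, while every expert's weights remain allocated in memory for the lifetime of the model.

```python
# Toy Mixture-of-Experts layer: top-k routing over a set of feed-forward "experts".
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim: int = 512, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # decides which experts each token uses
        # Every expert is allocated up front, whether or not a given token ever touches it.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)
        weights, indices = scores.topk(self.top_k, dim=-1)  # only top_k experts run per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoE()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512]) - 2 of 8 experts ran per token,
                            # but all 8 experts' weights stayed resident the whole time
```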
See, the critical difference between China and the West here comes down to hardware access. With TSMC being one of the world's major semiconductor fabs and export controls in place, China's access to top-end datacenter GPUs is restricted. It is therefore much more critical for China to squeeze every available bit of horsepower out of what it can get its hands on in order to keep up - and in China, that is largely consumer-grade hardware. This alone should give pause to companies like OpenAI, who have only begun to realize what they are truly up against.
I believe China's DeepSeek-R1 was a hint - a glimpse into a reality these large corporations and organizations do not want to accept. Technology, of any kind, does not take off until you can get it into the hands of the people. Once you do, it permeates society through the natural curiosity of the individual. The part they don't like is that once you've done that, it's out of your hands - you can no longer guard the gate, taking a toll from all who pass.
The thing about DeepSeek that should concern you isn't what it can do - it's that someone in China already seems to understand these things while almost no one in the West does, and they're prioritizing them. It reasons well, though not exceptionally so. The longer context window is moderately useful at best. Think-traces are another neat-but-useless feature. ...But it actually runs on hardware you can buy.
Shouldn't this stuff be free anyway?
Given that any model trained on data scraped from the internet is more than likely trained on copyrighted material, is there a moral and ethical obligation for all AI models trained on such data to be open-source? Because generation is all-encompassing and the models cannot maintain accurate citations of the original authors and creators whose work was used as training data, I personally think there is an overwhelmingly strong argument here. The public was responsible for creating the data that made these models possible in the first place; without a training corpus they would not exist. The data the models are trained on belongs to the public, and by extension, so should the models.
No one forced Microsoft/OpenAI, Google, or any of these other companies to make the kinds of investments they have been making in generative AI - likewise, we never gave permission for our data to be used as a training corpus. In effect, building models on publicly posted, copyrighted works puts them, in my mind, in much the same position as a fan-site: tolerated largely on the premise that the derivative work is given back to the public for free. One can argue there is a moral and ethical obligation to do the same with these models.
For the moment, it looks as though it will remain incumbent upon the public to develop these technologies properly and train them to a competitive level. That is, of course, until China steps in.