
From 10 Terabytes to Zero Parameter: The LLM 2.0 Revolution

In this article, I discuss LLM 1.0 (OpenAI, Perplexity, Gemini, Mistral, Claude, Llama, and the like), the story behind LLM 2.0, why it is becoming the new standard architecture, and how it delivers better value at a much lower cost, especially for enterprise customers.

1. A bit of history: LLM 1.0

LLMs have their origins in tasks such as search, translation, auto-correct, next-token prediction, keyword associations and suggestions, as well as guessing missing tokens and text auto-filling. Auto-cataloging, auto-tagging, auto-indexing, text structuring, text clustering, and taxonomy generation also have a long history but are not usually perceived as LLM technology, except indirectly through knowledge graphs and context windows.

Image retrieval and processing, along with video and sound engineering, are now part of the mix, leveraging metadata and computer vision; this is referred to as multimodal. Tasks such as solving mathematical problems, filling in forms, or making predictions are being integrated via agents that frequently rely on external API calls. For instance, you can call the Wolfram API for math: it has been around for over 20 years and automatically solves advanced problems with detailed step-by-step explanations.

However, the core engine of LLMs is still transformers and deep neural networks, trained to predict the next token, a task barely related to what modern LLMs are used for these days. After years spent increasing the size of these models, culminating in multi-trillion-parameter architectures, there is a growing realization that "smaller is better". The trend is toward removing garbage via distillation, using smaller, specialized LLMs to deliver better results, and relying on better input sources.

Numerous articles now discuss how the current technology is hitting a wall, with clients complaining about lack of ROI due to costly training, heavy GPU usage, security concerns, lack of interpretability (black-box systems), and hallucinations, a liability for enterprise customers. A key issue is charging clients based on token usage: this favors multi-billion-token databases built on atomic tokens over smaller token lists with long, contextual multi-tokens, as the former generates more revenue for the vendors, at the expense of ROI and quality for the client.
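To make the token-economics point concrete, here is a minimal, self-contained sketch comparing the token count produced by atomic word tokens with a vocabulary of longer contextual multi-tokens. The sample text and phrase list are made up for illustration; this is not any vendor's actual tokenizer.

```python
# Illustrative only: compares billing-relevant token counts for atomic tokens
# versus contextual multi-tokens. The multi-token list below is hypothetical.

text = "variance of the range for gaussian distributions and related order statistics"

# Atomic tokenization: every word is billed as one token.
atomic_tokens = text.split()

# Contextual multi-tokens: frequent phrases are stored as single entries,
# so the same text maps to far fewer billable units.
multi_token_vocab = [
    "variance of the range",
    "gaussian distributions",
    "order statistics",
]

def multi_tokenize(text, vocab):
    """Greedy left-to-right match against the multi-token vocabulary,
    falling back to single words when no phrase matches."""
    words, tokens, i = text.split(), [], 0
    while i < len(words):
        for phrase in sorted(vocab, key=lambda p: -len(p.split())):
            n = len(phrase.split())
            if " ".join(words[i:i + n]) == phrase:
                tokens.append(phrase)
                i += n
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(len(atomic_tokens))                            # 11 atomic tokens
print(len(multi_tokenize(text, multi_token_vocab)))  # 6 tokens with phrases
```

The same text yields roughly half as many billable units once frequent phrases are treated as single, contextual tokens.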

2. The LLM 2.0 revolution

It has been brewing for a long time. Now it is becoming mainstream and replacing LLM 1.0, thanks to its ability to deliver better ROI to enterprise customers at a much lower cost. Much of the past resistance towards its adoption lay in one question: how can you possibly do better with no training, no GPU, and zero parameters? It is as if everyone believed, out of long tradition, that multi-billion-parameter models are mandatory.

However, this machinery is used to train models on tasks irrelevant to the purpose, relying on self-reinforcing evaluation metrics that fail to capture desirable qualities such as depth, conciseness, or exhaustiveness. Not that standard LLMs are bad: I use OpenAI and Perplexity a lot for code generation, for writing my investor deck, and even to answer advanced number theory questions. But their strength comes from all the sub-systems they rely upon, not from the central deep neural network. Remove or simplify that part, and you get a product that is far easier to maintain and upgrade, costs far less to develop, and, if done right, delivers more accurate results without hallucinations, without prompt engineering, and without the need to double-check the answers: many OpenAI errors are quite subtle and easy to overlook.

Good LLM 1.0 still saves a lot of time but requires significant vigilance. There is plenty of room for improvement, but more parameters and black-box DNNs have shown their limitations.

I started to work on LLM 2.0 more than two years ago. It is described in detail in my recent articles:

See also my two books on the topic:

It is open source, with a large GitHub repository here. See also a web API featuring the corpus of a Fortune 100 company where it was first tested, here. Note that the UI is far more than a prompt box: it lets you fine-tune intuitive front-end parameters in real time.

In the upcoming version (Nvidia), you will get a relevancy score attached to each entity in the results, helping you judge the quality of the answer. Embeddings will help you dig deeper by suggesting related prompts. You will also be able to choose agents, sub-LLMs, or top categories, specify negative keywords, return recent results only, and more.
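As a purely hypothetical sketch of what such a query could look like, the snippet below builds the request parameters described above and reads a per-entity relevancy score from a mocked response. The field names, values, and response layout are invented for illustration; they are not the actual xLLM API.

```python
# Hypothetical sketch of the request parameters and per-entity relevancy
# scores described above. Field names and the sample response are invented
# for illustration; they are not the actual xLLM API.

query = {
    "prompt": "variance of the range for Gaussian distributions",
    "sub_llm": "statistics",         # route to a specialized sub-LLM
    "top_categories": 3,             # restrict to top matching categories
    "negative_keywords": ["finance"],
    "recent_only": True,             # return recent results only
    "suggest_related_prompts": True, # embeddings-based prompt suggestions
}

# Mocked response illustrating a relevancy score attached to each entity.
response = {
    "entities": [
        {"name": "range (order statistics)", "relevancy_score": 0.92},
        {"name": "normal distribution",      "relevancy_score": 0.87},
    ],
    "related_prompts": ["expected range of n Gaussian samples"],
}

# A simple quality gate: only keep entities above a relevancy threshold.
threshold = 0.85
for entity in response["entities"]:
    if entity["relevancy_score"] >= threshold:
        print(f'{entity["name"]}: {entity["relevancy_score"]:.2f}')
```

The point of per-entity scores is that the answer can be filtered or trusted selectively, rather than accepted or rejected as a single opaque paragraph.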

3. An interesting analogy

Prior to LLMs, I worked for some time on tabular data synthetization, using GANs (generative adversarial networks). While GANs work well for computer vision problems, their performance is hit-or-miss when synthesizing tabular data. They require considerable and complex fine-tuning that depends on the real data, along with significant standardization, regularization, feature engineering, pre- and post-processing, and multiple transforms and inverse transforms, to perform decently on any dataset, especially those with multiple tables, time stamps, multi-dimensional categorical data, or small sample sizes. In the end, what made it work was not the GAN, but all the workarounds built on top of it.

GANs are unable to sample outside the observation range, a problem I solved in this article. The evaluation metrics used by vendors are poor: unable to capture high-dimensional patterns, they generate false positives and false negatives, a problem I solved in this article. See also my Python library, here, and web API, here. In addition, vendors were producing non-replicable results: running a GAN twice on the same training set produced different outputs. I fixed this by designing replicable GANs, and of course everything I developed outside of GANs is replicable as well.
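Replicability of this kind ultimately comes down to pinning every source of randomness. Below is a generic, minimal sketch of seeded, reproducible synthetic-data generation; it illustrates standard seeding practice only and is not the replicable-GAN or NoGAN implementation described here.

```python
# Generic illustration of replicable synthetic-data generation: two runs with
# the same seed produce identical output. Standard seeding practice, not the
# author's replicable-GAN or NoGAN code.
import numpy as np

def synthesize(real_data, n_samples, seed):
    """Toy synthesizer: resample rows of the real table and add small noise.
    Fixing `seed` makes the whole run reproducible."""
    rng = np.random.default_rng(seed)          # single, seeded RNG
    idx = rng.integers(0, len(real_data), size=n_samples)
    noise = rng.normal(0.0, 0.01, size=(n_samples, real_data.shape[1]))
    return real_data[idx] + noise

real = np.random.default_rng(0).normal(size=(1000, 3))   # stand-in "real" table

run_a = synthesize(real, 500, seed=42)
run_b = synthesize(real, 500, seed=42)
print(np.allclose(run_a, run_b))   # True: identical synthetic data both runs
```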

In the end, I invented NoGAN, a technology that works much faster and much better than synthesizers relying on deep neural networks. It is also discussed in my book published by Elsevier, available here. The story is the same as with LLM 2.0: moving away from DNNs to a far more efficient architecture with no GPU, no parameters, and no training, fast and easy to customize, with explainable AI.

Interestingly, the first version of NoGAN relied on hidden decision trees, a hybrid technique sharing similarities with XGBoost, which I created for scoring unstructured text data as far back as 2008. It has its own patents and resulted in my first VC-funded startup, focused on click fraud detection and, later on, keyword monetization, based on the same nested hash database structure that I use today in LLM 2.0. The precursor to this was my work at Visa around 2002, detecting credit card fraud in real time.
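The nested hash idea is easy to illustrate: a hash table whose values are themselves hash tables, for instance mapping a multi-token to its related tokens and their counts. The tiny corpus and field layout below are made up for illustration; this is not the production xLLM backend.

```python
# Minimal illustration of a nested hash (dictionary of dictionaries) used as
# a key-value backend: each multi-token maps to related tokens and counts.
# The tiny corpus below is invented for illustration only.
from collections import defaultdict

corpus = [
    ("gaussian distribution", ["variance", "range", "order statistics"]),
    ("gaussian distribution", ["variance", "expected value"]),
    ("order statistics", ["range", "gaussian distribution"]),
]

# Outer key: multi-token; inner hash: related token -> co-occurrence count.
nested_hash = defaultdict(lambda: defaultdict(int))
for key_token, related in corpus:
    for token in related:
        nested_hash[key_token][token] += 1

# O(1) lookups at both levels, with no training and no parameters involved.
print(dict(nested_hash["gaussian distribution"]))
# {'variance': 2, 'range': 1, 'order statistics': 1, 'expected value': 1}
```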

4. How LLM 2.0 came to life

Besides the historical context discussed in section 3, LLM 2.0 (the xLLM system) really started about two years ago. It was motivated by my experience analyzing billions of search queries to create a better taxonomy for digital catalogs while working at InfoSpace, my experience writing professional crawlers to parse millions of websites, and my inability to find the references I was looking for when writing research papers. Neither the Google and Stack Exchange search boxes, nor GPT, were able to retrieve the documents I was looking for. I knew they were somewhere on Stack Exchange but could not find them anymore. The query that literally triggered my quest for better tools and jump-started LLM 2.0 was this: what is the variance of the range for Gaussian distributions? I posted it here in November 2023, and here.

Year 2023

From there, I crawled the entire Wolfram corpus (15k webpages, 5,000+ categories) and designed a tool that does much better than Google, specialized search tools, and GPT at retrieving what I was looking for. All the other tools were aimed mostly at the layman, returning material useless to professional researchers like me. I compare the first version of xLLM with OpenAI here. The code is on GitHub, here.

Year 2024

I developed different versions of xLLM: for clustering and predictive analytics (here), for taxonomy generation (here), for DNA sequence synthetization (here), which is the only version where token prediction matters, and finally the first version of Enterprise xLLM for a Fortune 100 company.

It became clear over time that all professional corpora are well structured, and that exploiting the structure recovered during the crawl would be a tremendous advantage when designing a better architecture. Along the way, I continued to improve models based on deep neural networks, for instance with an adaptive loss function converging to the evaluation metric (here).
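One way to read "an adaptive loss function converging to the evaluation metric" is a blended objective whose weight shifts from a smooth surrogate toward the target metric as training progresses. The sketch below follows that assumption, with MSE as the surrogate and MAE standing in for the evaluation metric; it is not the exact scheme from the cited article.

```python
# Sketch of an adaptive loss that starts as a smooth surrogate (MSE) and
# gradually converges to the evaluation metric (here, MAE) as training goes on.
# The linear blending schedule is an assumption, not the cited article's scheme.
import numpy as np

def adaptive_loss(y_true, y_pred, epoch, total_epochs):
    w = min(epoch / total_epochs, 1.0)       # 0 -> pure surrogate, 1 -> metric
    mse = np.mean((y_true - y_pred) ** 2)    # smooth surrogate loss
    mae = np.mean(np.abs(y_true - y_pred))   # evaluation metric
    return (1 - w) * mse + w * mae

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.8, 3.3])
for epoch in (0, 50, 100):
    print(epoch, round(adaptive_loss(y_true, y_pred, epoch, 100), 4))
```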

Year 2025

Everyone talks about small LLMs as the new panacea. The model does not need to be small; rather, it should be broken down into specialized sub-LLMs governed by an LLM router, for increased performance. At the moment, I am working on multi-index, deep contextual, and hierarchical chunking, using Nvidia financial reports (PDFs) as a case study, with PDF retrieval capabilities not found anywhere else, agents assigned post-crawling, multimodal support, and a unique scoring engine that I call the new "PageRank" for LLMs. See section 2 in this article for details. The most recent documentation is posted here.
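As a rough, generic illustration of hierarchical chunking, the sketch below splits a document into section-level parent chunks and paragraph-level child chunks linked by a parent index, so retrieval can return a child together with its context. The sample text is invented, and the code is not the multi-index xLLM engine described above.

```python
# Generic sketch of hierarchical chunking: section-level parent chunks and
# paragraph-level child chunks linked by a parent index. Illustrative only;
# the sample document is invented and this is not the xLLM engine.

document = """Revenue
Data center revenue grew strongly.
Gaming revenue was stable.

Outlook
Next quarter guidance was raised."""

parents, children = [], []
for section in document.split("\n\n"):            # sections = parent chunks
    lines = section.strip().split("\n")
    title, body = lines[0], lines[1:]
    parent_id = len(parents)
    parents.append({"id": parent_id, "title": title})
    for paragraph in body:                        # paragraphs = child chunks
        children.append({"parent_id": parent_id, "text": paragraph})

# Retrieval can score child chunks, then pull in the parent title for context.
for child in children:
    if "guidance" in child["text"]:
        print(parents[child["parent_id"]]["title"], "->", child["text"])
# Outlook -> Next quarter guidance was raised.
```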

I also manage the largest deep tech LLM/GenAI network on LinkedIn, with 190k followers and 200k subscribers to my newsletter, attracting advertising clients such as Nvidia and SingleStore.

5. How you can try LLM 2.0

If you want to learn more about whether and how we can help automate your business processes, using AI designed from the ground up to deliver ROI at scale and created by tech and enterprise visionaries for enterprise users and their customers, feel free to contact us.

With team members in India, Brazil, and Seattle, we serve clients around the world. For investor or press inquiries, contact Danilo and/or Vincent.

Vincent Granville

Vincent Granville is a pioneering GenAI scientist and co-founder at BondingAI.io, the LLM 2.0 platform for hallucination-free, secure, in-house, lightning-fast Enterprise AI at scale with zero weights and no GPU. He is also an author (Elsevier, Wiley), publisher, and successful entrepreneur with a multi-million-dollar exit. Vincent's past corporate experience includes Visa, Wells Fargo, eBay, NBC, Microsoft, and CNET. He completed a post-doc in computational statistics at the University of Cambridge.
