How to be agile with AI: a case for few-shot learning models in the modern enterprise
But are we ready for them? Maybe not.
A closer look at the data reveals that there is an elephant in the room, overshadowed by all the ChatGPT buzz: the cost of running such monolithic natural language processing (NLP) models.
ChatGPT is trained on a vast corpus of knowledge. It is hugely data-hungry, resource-intensive, and costly to operate.
Microsoft's injection of billions of dollars into this “new and shiny technology” reveals much about the potential. But along with the benefits, there are costs to enterprises wanting to leverage zero-shot models.
Maximilian Michel, Chief Product Officer at Automation Hero, is here to take us on a deeper dive than most current articles on AI and NLP even attempt. We will investigate the pros and cons of classical many-shot versus zero-shot learning to find the "sweet spot" for truly agile, enterprise-ready performance.
Addressing the ChatGPT elephant in the room…
OpenAI is estimated to spend anywhere from three million to thirty million dollars per month just to keep its models running on Azure. Of course, we can't know the exact operating costs. But one thing is certain: there is a reason Microsoft has invested billions in the company, and it's not just about making Bing a more competitive search engine.
The hard truth is OpenAI needs thousands of Azure servers to scale up its operations.
First, a little background: What’s a “shot”?
Training and interacting with an AI model is like hiring a new employee. When helping an employee learn a specific task, the first training step usually involves "leading by example," or demonstrating the job you want the employee to perform.
Of course, you can supplement an employee's training with additional instructions that reinforce the examples shown. But, if the employee already brings some general knowledge to the table, you could skip much of the demonstrative approach and only provide the instructions you want the employee to follow. Either way, the new hire will eventually learn to perform the task without supervision.
The approach you choose (demonstrative vs. instructional) will depend on the task's complexity and the level of expertise the new hire already brings to the job.
For example, suppose your new hire is a chemical engineer with a B.S. in chemistry plus ten years of relevant experience. This new hire brings a great deal of general on-the-job expertise plus a deep level of specialized knowledge. In this case, retraining this employee to be a competent chemist would waste time and resources. A zero-shot learning AI model is exactly like this experienced hire: it already arrives with broad general knowledge, so training it from the ground up would be redundant and more than you need for this business scenario.
Instead, the more agile approach is to train your chemist on the quality assurance procedures that ensure the safety of the products your company manufactures. Re-teaching your chemical engineer to be a chemist would be wasteful when the optimal approach is to narrow the scope of training and only show the chemist examples of your company's institutional knowledge.
Approaches to AI and human interaction
The same logic applies when training and interacting with an AI model.
If you choose to demonstrate the task, you will need to collect some sample data before you can show the AI how to perform it. But once the model has been trained on enough samples, it is easy for it to understand what it needs to do.
You can demonstrate a task by providing a set of training examples of the work you want the machine to perform. Or, if the AI is already pre-trained with a large amount of general knowledge relevant to your task, you can provide instructions instead by "prompt engineering" the model.
Similar to training a human, either approach (demonstrative or instructional) is a valid way of interacting with AI models. However, the method you choose to build and train the model significantly impacts its performance and accuracy.
For example, when teaching a person to fish, written instructions alone can be harder to train with than simply demonstrating by example. One approach is like casting a wide net to catch several small fish at once, while the other is akin to hunting a larger fish with a harpoon.
The question is: “Which approach is more effective for your particular goal?”
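To make the two interaction styles concrete, here is a minimal sketch in Python. The prompts and the call_llm helper are hypothetical placeholders rather than any particular vendor's API; the only point is the shape of an instructional (zero-shot) request versus a demonstrative (few-shot) one.

```python
# Minimal sketch: "instructional" vs. "demonstrative" interaction with a language model.
# call_llm is a hypothetical placeholder for whichever completion endpoint you use.

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; wire this up to your model of choice."""
    raise NotImplementedError

# Instructional (zero-shot): describe the task, provide no examples.
zero_shot_prompt = (
    "Classify the sentiment of the following customer email as positive, "
    "negative, or neutral.\n\n"
    "Email: 'The replacement part arrived two weeks late.'\n"
    "Sentiment:"
)

# Demonstrative (few-shot): show a handful of worked examples first.
few_shot_prompt = (
    "Email: 'Thanks, the onboarding call was incredibly helpful.'\n"
    "Sentiment: positive\n\n"
    "Email: 'I still have not received my invoice.'\n"
    "Sentiment: negative\n\n"
    "Email: 'The replacement part arrived two weeks late.'\n"
    "Sentiment:"
)

# Either prompt goes to the same model; only the interaction style differs.
# print(call_llm(zero_shot_prompt))
# print(call_llm(few_shot_prompt))
```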
Classical AI training versus zero-shot learning
Training an AI model requires many samples (or "shots") of data that show the AI how to perform a particular task. To produce an accurate model, teams of data scientists traditionally must collect and label the data manually and then tune the model through trial and error.
That traditional approach will exclude many organizations from the next stage of the industrial revolution. But this "many-shot" learning approach will get you the best performance, since it is a custom solution built from the ground up for a particular enterprise problem.
Traditionally, enterprises have had no choice but to rely on data scientists who collect data, label it, and train AI models with many shots of data. But this is slow and tedious and causes a bottleneck in most production environments.
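To picture what that bottleneck looks like in code, here is a deliberately tiny sketch of the classical many-shot workflow using scikit-learn. The six hand-labeled emails and the model choice are illustrative assumptions only; a real project would involve thousands of labeled samples and many rounds of tuning.

```python
# Minimal sketch of the classical many-shot workflow: collect labeled examples,
# train a model from scratch, and use it. In practice the dataset would hold
# thousands of manually labeled samples, not six.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: data scientists collect and label the training data by hand.
texts = [
    "Please cancel my subscription effective immediately.",
    "How do I reset my password?",
    "I was charged twice for the same order.",
    "Can you walk me through the setup process?",
    "I want a refund for last month's invoice.",
    "Where can I download the mobile app?",
]
labels = ["billing", "support", "billing", "support", "billing", "support"]

# Step 2: build and fit the model on many "shots" of labeled data.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Step 3: the trained model only knows the labels it was shown.
print(model.predict(["My card was charged but the order never arrived."]))
```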
Therefore, zero-shot learning offers a productivity shift for enterprises wanting to quickly prototype and deploy working AI models to production.
Zero-shot learning is still evolving, but it's the only approach that lets users produce impressively accurate results without any upfront training data. This enables users to solve specific problems faster while addressing the bottleneck that has blocked most enterprises from deploying their own AI models and tools.
Zero-shot learning is proving especially useful in scenarios with limited amounts of labeled data — or when the labels are unknown in advance and need to be discovered on the fly.
However, interacting with a zero-shot model can be challenging, especially when the user does not know how much the AI already knows about the task or how it will react to a given instruction. Furthermore, zero-shot models, such as a large language model (LLM) pre-trained on extensive language-related data and logic, are resource-intensive and expensive.
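For contrast, here is what zero-shot interaction can look like with an off-the-shelf model. The sketch assumes the Hugging Face transformers library and an NLI-based checkpoint such as facebook/bart-large-mnli, which is one common choice for zero-shot classification rather than a recommendation. Note that the candidate labels are supplied at inference time, so they can change from request to request.

```python
# Minimal zero-shot sketch: no task-specific training data at all.
# The candidate labels are passed in at inference time, so they can be
# "discovered on the fly" and swapped per request.
from transformers import pipeline

# Assumes the Hugging Face transformers library; the checkpoint below is one
# commonly used NLI-based model for zero-shot classification.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "My card was charged but the order never arrived.",
    candidate_labels=["billing", "support", "shipping"],
)
print(result["labels"][0], result["scores"][0])  # top label and its confidence
```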
Which approach is more agile?
Neither.
Many-shot learning requires significant time and resources to produce a working AI model, which most organizations don't have — particularly when ROI is part of the viability analysis.
For example, organizations must consider the upfront cost of the time it takes to collect and label datasets. Additionally, the hardware needed to train and run many-shot models requires significant investment in compute power and servers. Upfront resources and execution costs are therefore substantial ROI considerations organizations must address with this approach.
On the other end of the spectrum, zero-shot lowers the barrier to entry and provides the most efficient path to prototyped solutions. Moreover, since zero-shot learning models come pre-trained, they address the data bottleneck by enabling the fastest deployment to production.
This is exciting. However, the “sweet spot” for truly agile performance in enterprise production environments is in the middle of this spectrum. Few-shot learning is a sophisticated approach to AI that provides accurate results from only a few examples of training data.
Like zero-shot models, few-shot models are a newer class of AI designed to learn from very small data sets. And in all our experience at Automation Hero, we have found that few-shot learning is where enterprises achieve truly agile performance.
If you can't precisely demonstrate what you need from an LLM, for example, then you will have to prompt-engineer a zero-shot model, and the results will look very different depending on the instructions you give.
On the other hand, the approach that will provide the best performance for a specific business problem lets you share a few samples of data demonstrating your end goal, without having to locate volumes of data to train your AI entirely from the ground up.
Few-shot learning offers the best approach to achieving this level of flexibility with above-human accuracy.
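One simple way to illustrate the idea (not a description of any vendor's implementation) is to reuse a pre-trained encoder's general knowledge and adapt it to a new task with just a handful of labeled examples. The sketch below assumes the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint, and it uses a basic nearest-prototype rule; real few-shot systems are considerably more sophisticated.

```python
# Minimal few-shot sketch: a pre-trained model supplies the general knowledge,
# and a handful of labeled examples ("shots") adapts it to the task at hand.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # pre-trained general-purpose encoder

# Only a few labeled examples per class are provided.
few_shot_examples = {
    "invoice":  ["Invoice #4821, total due EUR 1,200", "Payment is due within 30 days"],
    "contract": ["This agreement is entered into by and between", "Term and termination clause"],
}

# Build one prototype (mean embedding) per class from the few examples.
prototypes = {
    label: np.mean(encoder.encode(examples), axis=0)
    for label, examples in few_shot_examples.items()
}

def classify(text: str) -> str:
    """Assign the label whose prototype is closest by cosine similarity."""
    vec = encoder.encode(text)
    scores = {
        label: float(np.dot(vec, proto) / (np.linalg.norm(vec) * np.linalg.norm(proto)))
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get)

print(classify("Total amount payable: USD 980, see attached invoice"))
```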
Intelligent document processing (IDP) is a great use case for few-shot learning. IDP relies on your specific documents as the input, not on an unlimited universe of irrelevant data. Older IDP offerings once made similar claims about "only needing 50 samples to customize the model," but these were mostly marketing hype. Like zero-shot, few-shot learning is a genuinely new approach that fine-tunes a pre-trained, many-shot model rather than re-wrapping old claims in a new package.
“Few-shot” means our customers don't have to spend time training document models. They can get business value from our platform faster using just a handful of sample documents. All you need is a few specific documents your organization handles, and you can start processing them immediately.
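As a rough sketch of what "a handful of sample documents" can mean in practice, the snippet below assembles a few annotated examples into an extraction prompt for a new document. The annotated samples, field names, and call_llm helper are hypothetical illustrations, not a description of Automation Hero's platform.

```python
# Rough sketch: few-shot document extraction. A few annotated sample documents
# demonstrate which fields to pull out of a new, unseen document.
import json

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for your model endpoint of choice."""
    raise NotImplementedError

annotated_samples = [
    ("Invoice 2023-117 from Acme GmbH, total EUR 4,300, due 2023-04-01",
     {"invoice_number": "2023-117", "total": "EUR 4,300", "due_date": "2023-04-01"}),
    ("Invoice A-88 from Contoso Ltd, total USD 950, due 2023-05-15",
     {"invoice_number": "A-88", "total": "USD 950", "due_date": "2023-05-15"}),
]

def build_prompt(new_document: str) -> str:
    shots = "\n\n".join(
        f"Document: {text}\nFields: {json.dumps(fields)}"
        for text, fields in annotated_samples
    )
    return f"{shots}\n\nDocument: {new_document}\nFields:"

# extracted = json.loads(call_llm(build_prompt(
#     "Invoice 7 from Initech, total USD 120, due 2023-06-30")))
```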
We have learned that few-shot learning in IDP is the best approach to achieving the highest accuracy while saving time and money. After helping organizations worldwide analyze millions of unstructured documents in multiple languages, we have learned a lot about unstructured data and how to extract critical data points with above-human accuracy.
For modern businesses looking for the best return on their investment, there is no better approach to IDP than few-shot learning for agility, performance, and accuracy. It’s the “sweet spot” and the best way to get the most value from AI.
Concluding thoughts
Until ChatGPT, the field of AI had never seen a well-performing zero-shot model with so much general language understanding. Now that such models exist, we can explore the whole spectrum of AI approaches, which is super exciting. Zero-shot is one of those approaches, but it will only fit some business tasks well.
To be agile, I always recommend business leaders ask their teams these two questions:
- "How do I provide as little data as possible to get a higher performance model for my tasks?"
- "What kind of production environment do I need so I can connect my model to my existing systems and enable my teams to switch between different types and versions of models as our workflow evolves?
While it's exciting to see access to deployable AI solutions opening up on such a large scale, there are still limits blocking truly agile performance. You could start with a zero-shot or a few-shot approach and quickly develop a working solution. Start by casting a wide net: deploy your zero-shot model and run it for six months.
Once you know what works well and what doesn't, build the harpoon for your specific business application. Utilize the power of a many-shot learning model, scale it up, and never look back!
About the author
As Chief Product Officer at Automation Hero, Max is focused on scaling the product organization by continuing to deliver world-class innovations that address customer needs and delight customers, while aligning with the company's strategic goals. He also leads research and development in AI, machine learning, and data science.
Previously, Maximilian Michel was a data scientist at Datameer where he built smart AI tools for big data analytics and helped the company leverage the power of machine learning to optimize sales operations. He is a Bauhaus alumnus and holds a master's degree in computer science and media. He spent four years publishing research on natural language processing and information retrieval.