Editor’s note: NVIDIA has a free online course called “AI for All: From Basics to Gen AI Practice”. The team at RCR Wireless News has enrolled and, as we complete the units, is posting write-ups of the sessions along with a bit of additional context from our ongoing coverage of AI infrastructure. Think of this as us trying to do our jobs better and maybe, along the way, helping you with your own professional development—that’s the hope at least.
Generative AI (gen AI) is all about using existing data to create, or generate, new data across modalities—text, natural language, images and videos. This is done by building foundational models capable of synthesizing patterns and learning from datasets, whether large and general or narrow and domain-specific.
Foundational models, typically large language models (LLMs), serve as the basis for gen AI systems by providing a framework for understanding complex language structures, semantics and contextual nuances. These neural networks are trained on massive, unlabeled datasets using unsupervised learning. The idea is to sidestep the costly, time-consuming process of labeling data prior to training a model.
The transformer architecture
LLMs use the transformer architecture, a specialized neural network architecture, to grasp patterns and relations in textual data. The development of the transformer architecture was a breakthrough in the arc of gen AI; it was initially described in a 2017 paper, “Attention Is All You Need,” written by a team of Google AI researchers.
Prior to transformers, models relied on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). The transformer architecture is instead built on attention mechanisms, specifically self-attention, which lets the model weigh the importance of different words in a sentence regardless of where they fall in the sequence. That means the model can capture long-range dependencies and relationships in the data, and it enables more efficient parallel processing during training and inference. In a nutshell, the transformer architecture pushed AI from sequential processing to parallel processing.
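For readers who want to see the mechanism itself, here’s a minimal sketch of scaled dot-product self-attention in NumPy. The dimensions, weights and variable names are our own illustration, not anything taken from the course or the original paper, and a real transformer stacks many such layers with multiple attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model) token embeddings.
    # Wq, Wk, Wv: learned projection matrices, (d_model, d_k) each.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv         # project into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every token scores every other token
    weights = softmax(scores, axis=-1)       # attention weights sum to 1 per token
    return weights @ V                       # weighted mix of value vectors

# Toy example: a "sentence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # -> (4, 8)
```

Because every token attends to every other token in one shot, the whole computation reduces to a handful of matrix multiplications, which is exactly the kind of work that parallelizes well.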
Tokens and GPU-accelerated parallel processing
In the context of an LLM, tokens are the smallest units of meaning in a language: words, characters, subwords or other symbols representing linguistic elements. An input prompt is broken down into tokens that are fed into the model. The model then predicts the next token in the sequence, feeds that prediction back in and repeats the process until it reaches a predetermined stopping point, such as an end-of-sequence token or a maximum output length.
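To make that loop concrete, here’s a toy sketch of tokenization and next-token generation. The whitespace tokenizer, tiny vocabulary and random “model” are stand-ins invented purely for illustration; a real LLM uses a subword tokenizer and predicts a probability distribution over tens of thousands of tokens.

```python
import random

# Stand-in vocabulary; a production model has tens of thousands of tokens.
VOCAB = ["The", "network", "is", "fast", "reliable", "and", ".", "<eos>"]

def tokenize(prompt: str) -> list[str]:
    # Stand-in for a subword tokenizer: just split on whitespace.
    return prompt.split()

def predict_next(tokens: list[str]) -> str:
    # Stand-in for the model's forward pass: pick any token from the vocabulary.
    return random.choice(VOCAB)

def generate(prompt: str, max_new_tokens: int = 10) -> str:
    tokens = tokenize(prompt)
    for _ in range(max_new_tokens):   # predetermined stopping point
        nxt = predict_next(tokens)
        if nxt == "<eos>":            # the model signals it's finished
            break
        tokens.append(nxt)            # feed the prediction back in
    return " ".join(tokens)

print(generate("The network is"))
```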
Doing all of that requires computationally intensive mathematics that GPUs are uniquely suited for; compared to CPUs, the chips offer massively parallel processing, fast matrix operations and high memory bandwidth. Early cuts of ChatGPT, for instance, were trained on 10,000 NVIDIA GPUs for several weeks.
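As a rough illustration, assuming PyTorch is installed and a CUDA-capable GPU is available, the same matrix multiplication can be timed on both processors; the matrix sizes are arbitrary and the actual gap depends entirely on the hardware.

```python
import time
import torch

# Two large random matrices; an LLM's forward pass is built from many such multiplies.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
_ = a @ b                          # CPU matrix multiply
print(f"CPU: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()       # make sure the copies have finished
    start = time.time()
    _ = a_gpu @ b_gpu              # GPU matrix multiply across thousands of cores
    torch.cuda.synchronize()       # wait for the kernel before stopping the clock
    print(f"GPU: {time.time() - start:.3f} s")
```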
Gen AI vs. traditional AI for enterprises, and how to get started
Traditional AI is all about understanding historical data and making accurate predictions, using techniques like classification, pattern recognition, text-to-speech and optical character recognition. Gen AI excels at creating new data based on patterns and trends learned from training data; it’s marked by unsupervised learning, chatbots, summarization tools and copilots. If you think of it as a Venn diagram, sentiment analysis and language translation sit in the overlap between traditional and gen AI.
That’s not to say that the two are mutually exclusive. In fact, they’re complementary, and traditional AI remains well-suited to a range of tasks, delivering the desired outcomes with less cost, complexity and compute overhead. For enterprises looking to leverage gen AI, the considerations are around adapting a general LLM to business-specific use cases and applications. That adaptation process ranges from moderate customization, via fine-tuning a pre-trained model, to building a custom model on your proprietary data. In either case, customizing an LLM comes with additional costs related to both computational resources and the necessary workforce expertise.
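To ground the fine-tuning path, here’s a minimal sketch assuming the open-source Hugging Face transformers and datasets libraries. The base model, data file, output directory and hyperparameters are all placeholders for illustration; a production run would use carefully curated, domain-specific data and far more compute.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "distilgpt2"                    # small base model as a stand-in
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token    # GPT-2-family models have no pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Placeholder for your own curated domain data, one example per line.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # Causal-LM collator copies the inputs as labels for next-token prediction.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("finetuned-model")
```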
As far as getting started with gen AI, here’s a relatively generalized process:
- Identify the business opportunity—target use cases that have meaningful business impact and can be customized with unique data.
- Build out AI resources—identify existing compute resources and internal human resources while also identifying necessary partners.
- Analyze data for training/customization—acquire, refine and safeguard data to either build data-intensive foundational models or customize existing ones.
- Invest in accelerated infrastructure—assess infrastructure, architecture and operating model while gaming out costs, including energy costs.
- Develop a plan for responsible AI—leverage tools and best practices to ensure responsible AI principles are adopted across the company.
The workflow to get to a production-ready application is:
- Data acquisition—collect and prepare data to train or fine-tune the LLM, and ensure the data is diverse and representative of the target domain.
- Data curation—clean, filter and organize the acquired data.
- Pre-training—expose the model to a vast corpus of text data to let it learn language patterns, relationships and representations.
- Customization—adapt the generic model to specific requirements of a task or domain to improve accuracy, efficiency and effectiveness.
- Model evaluation—measure how well the model has learned from training data and how accurately it can make predictions on new data.
- Inference—run the model on new input data to produce outputs (a brief sketch follows this list).
- Keep guardrails in mind—crucial for risk mitigation and for ensuring the ethical, responsible and safe use of the AI model and attendant applications.
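Continuing the hypothetical from the fine-tuning sketch above (the model directory name is our placeholder), inference can be as simple as loading the customized model and handing it a prompt:

```python
from transformers import pipeline

# "finetuned-model" is the placeholder output directory from the sketch above.
generator = pipeline("text-generation", model="finetuned-model")
result = generator("Private 5G networks are", max_new_tokens=40)
print(result[0]["generated_text"])
```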
For more of the AI 101 series, check out the following: