This blog post aims to offer a top-level overview of (generative) AI. Starting from its inception, touching on how it operates, and most importantly, how to get the most out of it. It was inspired by the excellent video “Generative AI in a Nutshell” by Henrik Kniberg. For those who appreciate an excellent overview of artificial intelligence in video format, I highly recommend this one. If reading is more your style, or you wish to deepen your understanding, this article is for you. Ideally, engaging with both will provide the most comprehensive insight.
Generative AI represents a major shift in the landscape of technology. In the past, the field of Information Technology (IT) was dominated by systems designed to execute precise instructions. Computers acted more like sophisticated calculators, methodically processing data without the capacity to create content.
However, the concept of machine learning and AI is not new. These technologies have been utilized in various forms, such as predictive models seen in YouTube’s recommendation algorithm or Amazon’s shopping suggestions. But true generative AI, which produces new content or ideas, has become widely recognized and accessible primarily through recent advancements.
A notable example of early generative AI is in speech recognition technologies, which I have worked with since 1998. Back then, witnessing a Pentium II processor laboriously process spoken words, taking seconds to associate the word “door” with “open,” highlighted the limitations of early AI systems. This experience underscores the significant strides made in generative AI, marking its evolution from basic task execution to creating and generating new, usable content in real-time, a leap forward in making AI practical for widespread use.
Terminology
“AI” stands for Artificial Intelligence, which includes areas like machine learning. Generative AI is a specific type of AI that creates new content. A central piece in this puzzle is the”‘LLM” or Large Language Model, with ChatGPT being a prime example. “GPT” means Generative Pre-trained Transformer, highlighting its ability to generate text after being pre-trained on a large corpus of data.
“Transformer”, a novel architecture introduced in the seminal 2017 paper “Attention is All You Need“. This architecture revolutionized the field by focusing on “attention mechanisms”, which allow the model to weigh the importance of different words when processing language, making it particularly powerful for understanding and generating text. This is what makes the LLMs we have today so good in “understanding” and creating text.
Function
AI models are neural networks, modeled loosely after the human brain, which learn from vast quantities of data. Text, audio, images, or video, is converted into a numerical format, often referred to as feature vectors. These vectors, for example [-2, 3, 1], are then fed into the neural network.
The network consists of interconnected nodes (neurons) organized in layers. Each connection has a weight, which is adjusted during the learning process. The data is processed through the network, transformed by the weights and biases at each node.
Finally, the network outputs a new set of numerical values, here shown as [0, 0, -4], which represent the network’s prediction or understanding of the input. These output vectors can then be translated back into a human-understandable format, such as text, audio, image, or video, depending on the task the network is designed to perform.
These models, especially those designed for natural language processing like GPT (Generative Pre-trained Transformer), are often trained through a method similar to a “guess the next word” game. The model is presented with a piece of text and must predict what word comes next. This illustrates the speech recognition roots of AI. The process is repeated across vast datasets, with the model’s predictions being compared to the actual continuation of the text.
Backpropagation is a key mechanism in training neural networks. It provides feedback on the accuracy of predictions. If the model incorrectly predicts a word, backpropagation adjusts the weights within the network, effectively telling it “This was right” or “This was wrong.” This iterative process continues until the model achieves satisfactory results. This will produce the “Parameters” of the neural net, the weights and the bias that makes up the vast amount of floating point values neural nets consist of.
For instance, if a language model is given the beginning of a sentence, “The sunset over the mountains”, and it needs to predict the next word “is”, but instead predicts “was”, backpropagation provides a correction. This helps the model make a better prediction next time. Through millions of such iterations, the model learns to make increasingly accurate predictions, significantly enhancing its text generation capability. Human feedback also plays a crucial role, fine-tuning the models to align more closely with desired outcomes and to increase their accuracy.
Models
Models like GPT-4, which is a text-to-text model, take written input and generate written output. This output isn’t limited to natural language; it can include structured data like code, JSON, HTML, and more. These models are a great for programmers, saving significant time and also serving as a learning tool from the code they generate.
Text-to-image models are another example, where you describe what you want visually, and the model produces the image, sometimes even allowing you to specify a particular style. Similarly, image-to-image models are capable of transforming or combining images in creative ways.
There are also image-to-text models that can describe the contents of a visual input. Speech-to-text models are prevalent as well, providing transcriptions for audio which is extremely useful for creating documentation like meeting notes. Text-to-audio models can even generate music or sounds from a written prompt.
The evolution doesn’t stop there; there are text-to-video models that can generate videos from textual descriptions. These models are inching us closer to a future where we might see infinitely continuing movie series that auto-generate episodes tailored to our preferences.
A trend in AI development is multimodal products, which combine various models to handle text, images, and audio seamlessly in one platform. An example is the ChatGPT mobile app. It represents a convergence of AI capabilities, enabling interactions with different content types without needing separate tools.
Language models started as mere word predictors but are evolving into tools with emergent capabilities, surprising even their developers. They can roleplay, write poetry, draft high-quality code, and even provide insights on strategy, legal, or medical advice. They demonstrate creative and intellectual tasks previously thought to be exclusively human.
These models, when exposed to enough text and images, begin to recognize patterns and understand high-level concepts, drawing parallels to a child’s learning process. For instance, when presented with a simple scenario involving objects and actions, these models can infer outcomes showing a basic grasp of physics and cause-effect relationships.
The implication might be that this could be a crossing point in history. While human intelligence has remained relatively stable over millennia, AI is advancing at an exponential rate, taking over tasks that were once the domain of humans. This growth may continue or level off eventually, but it seems to me that we are at least stepping into a new era of IT.
Doom and Bloom
Different mindsets shape how individuals and companies approach AI. On one end, there’s denial – some believe AI can’t impact their jobs or that exploring AI technology isn’t worth their time. This mindset can be risky; a common saying warns that even if AI doesn’t replace your job, someone leveraging AI might.
On the opposite end is a sense of panic, where people fear, AI will inevitably lead to job loss or corporate bankruptcy. Both denial and panic are counterproductive in my opinion.
A more balanced approach views AI as an enhancer of productivity for individuals and organizations. With this mindset, AI becomes like a mentor, streamlining the journey from idea to result and reducing time spent on mundane tasks. This outlook is not just positive but strategic, preparing people and companies to thrive in the age of AI.
To challenge your personal viewpoint and consider alternative perspectives, I prescribe the following treatment:
If you believe AI is overrated just listen to the Lex Fridman Podcast episodes #419 & #367 with Sam Altman.
If you’re overly optimistic and somewhat inattentive to potential risks consider anything Elon Musk has to say about AI, such as in Lex Fridman Podcast #400.
And if you believe AI will take over the world and eradicate humanity anyway soon, listen to Yann LeCun’s insights on, you guessed it, Lex Fridman #416.
After absorbing these discussions, you’ll gain insights what some of the greatest minds of our time, who are actual on the forefront of AI, think about where this Cambrian Explosion will lead us.
I might add:
None of them is predicting a slow down or even stagnation in the field of AI and what it can do. So, if you’re hesitant to bet on the slim chance that AI’s rapid progress is just a flash in the pan, it’s wise to educate yourself and learn how to use AI and where to fit in.
The human role
Although some jobs may become obsolete, human expertise remains crucial. Humans must direct the AI, frame the questions, provide context, and critically evaluate the outcomes.
AI models are not infallible; they can be impressively accurate or bafflingly off-base. Because they are guessing. Most of the time! It’s up to human experts to navigate these inconsistencies, ensuring legal compliance, data security, and appropriate use of AI outputs. Envision AI as a quirky, genius colleague – capable of brilliant insights but also prone to errors. Learning when to trust this colleague and when to rely on your own expertise is THE KEY.
AI can augment human ability, such as assisting doctors in diagnosing uncommon diseases, aiding lawyers in legal research, or supporting teachers with grading and course content. The synergy of human intelligence with AI is where the potential lies, creating an ensemble that is more effective than either alone.
To discover how AI can assist you in your job, simply pose the question “I am a Business Analyst and my biggest challenge right now is…X. Can you help me? And take it from there.
The Products
Users typically interact with a product – an app or website – that connects to the AI model. These products provide user interfaces and adding capabilities not inherent to the model. For instance, ChatGPT has a messaging history feature, while GPT-4 itself does not retain messages.
Developers can use these AI models through APIs to create innovative features for their products. For example, an e-learning platform could implement a chatbot that answers course-related queries, or a recruitment agency might develop a tool to automatically assess candidate resumes.
This even can be done in PowerShell with the ‘Invoke-RestMethod’
For .NET/C# I recommend you take a look at these blog posts. However, whether you’re a user or a developer, effective interaction with AI relies heavily on prompt engineering – creating prompts that get the most useful responses from the model. It’s an iterative process, refining prompts based on the responses.
Prompt engineering
Prompt engineering is a necessary skill when dealing with AI. The quality of the input dramatically influences the output. To reiterate, it’s an iterative process of crafting queries that guide the AI to understand and generate the needed information. See what I did there?
When constructing a prompt, provide context; the AI needs to understand the specific scenario to deliver accurate and relevant responses.
Imagine you’re planning a wedding. A vague prompt like ‘Help me with a wedding’ won’t yield useful results because the AI lacks context. A more detailed prompt would include when, where, how many people are invited, if it is on the beach or in a church, etc. etc. You might start with a broad question and ask the AI to ask you question if it needs more information, instead of making things up.
Prompt engineering is a dialogue with the AI. It’s not unlike teaching someone new on the job. Initial instructions are followed by detailed guidance. Like this AI becomes more like a sophisticated tool that can assist with increasingly complex tasks, enhancing your productivity and creativity.
Agents
The Future of generative AI might be in autonomous Agents. If you believe Andrej Kapathy, he thinks finally the technology is here. AI powered software entities capable of operating independently rather than awaiting our instructions. Imagine assigning a high-level mission to an AI, equipped with tools like internet access, financial resources, and the capability to communicate or even order groceries. These agents navigate and act in the world as needed, guided by the objectives you’ve set.
Prompt engineering takes on even greater significance here. The autonomy granted to these agents means the clarity and foresight in your mission statement are crucial to ensure they bring about beneficial outcomes rather than unintended havoc.
Key Takeaways
Potential
This technology is a game-changer. Understanding and leveraging it can transform it from a potential threat to a significant opportunity.
Imagination
The real limit to what AI can achieve lies not in the technology itself but in our capacity to envision its applications. What you can achieve with AI is bounded only by your creativity.
Prompt Engineering
Mastering the art of creating prompts is essential. This skill is not just about communicating with AI; it’s about refining your ability to articulate thoughts and intentions that even a guessing machine has little room for error.
Experimentation
Experimentation isn’t just beneficial; it’s necessary. By weaving AI into our daily routines, we’ll discover its true potential organically.
If you need assistance and/or consulting with AI and how to integrate this tool into your business feel free to reach out for expert guidance and support.