Generative AI and LLMs (Large Language Models) are the new buzzwords of the tech world. In this write-up we will briefly discuss LLMs and the terms associated with them.
LLMs, or large language models, fall under the generative artificial intelligence domain, which is itself a branch of machine learning. By now many readers will have used one or more LLMs for personal or professional purposes. A few of the trending LLMs are:
GPT-4 by OpenAI (reportedly ~1.76 Tn parameters)
PaLM by Google (540 Bn parameters)
LLaMA family by Meta AI (up to 65 Bn parameters)
Claude by Anthropic (reportedly 52 Bn parameters)
BLOOM by the Hugging Face-led BigScience project (176 Bn parameters)
So the question is: how do they mimic the human ability to generate text, break down tasks, and solve complex problems? They achieve this by learning statistical patterns from the large datasets on which they are trained.
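A toy example makes this tangible. The sketch below is not an LLM; it just shows the core principle in miniature: count which word tends to follow which in a corpus, then predict the statistically most likely continuation. Real LLMs learn far richer patterns with neural networks, but they too are predicting the next token from patterns observed in training data.

```python
from collections import Counter, defaultdict

# Toy illustration: learn word-following statistics from a tiny corpus.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # Return the statistically most frequent follower of `word`.
    followers = bigrams[word]
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # -> "cat" ("cat" follows "the" most often)
```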
LLMs are based on the transformer neural network architecture. This architecture enables the model to comprehend contextual information and long-range dependencies in the input. In the original design, stacks of encoder and decoder layers work together to understand the input prompt.
One of the key concepts of the transformer architecture is self-attention. Self-attention allows the model to weigh how relevant each token of the input is to every other token, so the most important relationships stand out. Both the encoders and the decoders of the transformer use self-attention to process the input data.
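To make that concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation inside self-attention. The weight matrices are random stand-ins; in a real model they are learned during training.

```python
import numpy as np

def self_attention(X):
    # X: (seq_len, d) token embeddings. Wq/Wk/Wv are random
    # placeholders here; real models learn them during training.
    d = X.shape[-1]
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(d)                   # pairwise token relevance
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V                              # context-aware token vectors

out = self_attention(np.random.randn(4, 8))  # 4 tokens, 8-dim embeddings
print(out.shape)                             # (4, 8)
```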
The models are trained on datasets containing billions, even trillions, of words, over weeks or months, using enormous computational resources. The models produced by such strenuous efforts are called foundation models.
These foundation models have billions of parameters, which are loosely akin to the human brain's memory or cognitive capacity: the larger the model, the more sophisticated the problems it can solve. Generative AI models also have capabilities beyond language processing. Some models are developed to process multiple types of data, such as images, video, and speech, simultaneously. Gemini Pro by Google is one such multimodal LLM.
Unlike the traditional way of interacting with machines, where we follow strict syntactic patterns to call libraries and APIs, LLMs can take instructions written in plain human language.
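For example, using the open-source transformers library, the "instruction" is just an English sentence. A minimal sketch follows; gpt2 is chosen only because it is small and freely available, and an instruction-tuned model would follow the prompt far more faithfully.

```python
from transformers import pipeline

# The interface is plain English, not a formal syntax.
generator = pipeline("text-generation", model="gpt2")
prompt = "Explain in one sentence why the sky is blue."
print(generator(prompt, max_new_tokens=40)[0]["generated_text"])
```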
A few keywords that need to be known in the LLM universe:
Prompt: the input given to an LLM.
Context window: the maximum span of tokens a model can accept as input.
Completion: the output generated by the LLM.
Inference: the process of using an LLM to generate responses.
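These terms map directly onto code. Here is a short sketch using the transformers library (GPT-2 again, purely for illustration):

```python
from transformers import AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "Large language models are"              # prompt: the input
n_tokens = len(tokenizer(prompt)["input_ids"])
print(n_tokens, "tokens out of a",
      tokenizer.model_max_length,                 # context window (1024 for GPT-2)
      "token context window")

generator = pipeline("text-generation", model="gpt2")
result = generator(prompt, max_new_tokens=20)     # inference: running the model
print(result[0]["generated_text"])                # completion: the output
```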
Many have underestimated the power of LLMs as a developer tool. Because LLMs are pretrained on vast amounts of data, they learn statistical patterns that let them generate responses or predictions without any task-specific programming. As a result, LLMs can be used to build applications in weeks that might otherwise take many months using traditional machine learning.
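As a concrete illustration, a text-classification task that would traditionally require collecting labeled data and training a model can be attempted zero-shot with a single pretrained model. This is a sketch; the model name is a commonly used public checkpoint, and accuracy will vary by task.

```python
from transformers import pipeline

# Zero-shot classification: no task-specific training data, no explicit rules.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
result = classifier(
    "The new update made the app crash constantly.",
    candidate_labels=["bug report", "feature request", "praise"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "bug report"
```

The pretrained model does the heavy lifting here: no feature engineering, no labeled dataset, and no training loop.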