AI Glossary (50+ Terms): LLMs, RAG, Tokens, Prompting
AI glossary: 50+ essential terms explained (plain English)
If you read AI tool reviews, model release notes, or pricing pages, youâll run into jargon fast: tokens, context windows, RAG, embeddings, agents, function calling, fineâtuning, and many more.
This glossary explains the most common AI termsâespecially those used in modern large language models (LLMs)âin practical, beginner-friendly language.
If youâre brand new, also read AI Fundamentals. If you want to get better results right away, go to Prompt Engineering Guide.
A
Accuracy
How often an AI system produces the correct result. For generative AI, âaccuracyâ is tricky because outputs can be plausible but wrong. Many teams measure accuracy with task-specific tests (e.g., correct extraction, correct classification, correct citations).
Agent / Agentic workflow
A setup where a model can plan multiple steps and take actions (often via tools/API calls): search, retrieve docs, run code, create tickets, send emails, etc. Agents are powerful but also riskier because mistakes can turn into real actions.
Alignment
Methods used to steer a model to follow instructions and behave safely/helpfully. Examples include instruction-tuning, RLHF/RLAIF, and safety policies. Alignment doesnât mean the model is always truthfulâjust that itâs optimized to be helpful and avoid certain harms.
API (Application Programming Interface)
A standard way for software to talk to software. Many AI tools are built on model APIs (OpenAI, Anthropic, Google, open-source providers). If a product offers an âAI API,â it usually means you can integrate the model into your app.
Attention
A mechanism in Transformers that helps the model decide which tokens to âfocus onâ when generating the next token. Attention is one reason modern LLMs can handle long, complex inputsâthough performance can still degrade with very long context.
B
Base model
A model trained mostly to predict the next token from huge datasets (pretraining). Base models are not optimized to follow instructions politely; theyâre often less helpful in chat. Many products use instruction-tuned versions.
Batch processing
Running many AI requests asynchronously in a group to reduce cost or increase throughput. Some providers offer batch discounts because latency is less important.
Benchmark
A standardized test used to compare model performance (e.g., coding, reasoning, math). Benchmarks can be useful, but they can also be gamed or fail to predict your real task. Use them as a clue, not proof.
Bias
Systematic tendencies in model output that reflect patterns in training data. Bias can appear in language, recommendations, or decisions. Mitigation involves dataset work, evaluation, and careful product design.
C
Cached tokens / Prompt caching
A pricing feature where repeated parts of a prompt (like a long system prompt or a large document) are stored and re-used at a lower cost. Caching can dramatically reduce costs for long-context workflows.
Chain-of-thought (CoT)
A prompting approach where you ask a model to reason step-by-step. CoT can improve performance on complex tasks, but some tools hide intermediate reasoning. In practice, ask for structured reasoning and verification steps rather than long âthinking.â
Classification
Turning an input into a label (e.g., âspam vs not spam,â ârefund request vs bug reportâ). Classification is often cheaper and more reliable than open-ended generation.
Context
The information available to the model in the current request: system prompt, user prompt, conversation history, retrieved documents, and tool results.
Context window
The maximum amount of tokens a model can consider at once (input + output + sometimes hidden reasoning tokens). A bigger context window allows longer documents and longer conversations, but it can increase cost and doesnât guarantee perfect memory.
Copilot
A product design pattern where AI assists you inside a tool you already use (IDE, email, docs). Copilots reduce friction because context is right there.
D
Data privacy
Rules and practices about how user data is stored, processed, and reused. Important questions: Is your data used to train models? How long is it retained? Who can access it? Is it encrypted?
Dataset
A collection of data used for training or evaluation. For LLMs, datasets are massive and often curated from many sources.
Deep learning
A subset of machine learning based on multi-layer neural networks. Modern LLMs are deep learning models.
Diffusion model
A type of generative model commonly used for image generation (e.g., Stable Diffusion). While this glossary focuses on LLMs, many AI tools combine text and diffusion models.
E
Embeddings
A way to convert text (or images) into numeric vectors that capture meaning. Embeddings are used for semantic search, clustering, recommendations, and RAG. If a tool says it âsearches by meaning,â it probably uses embeddings.
Evaluation (Eval)
A repeatable test that measures model/tool performance on your tasks. Good evals are versioned, automated, and reflect real inputs. Without evals, teams rely on vibes.
Extract-Transform-Load (ETL)
A data pipeline pattern. In AI workflows, ETL often prepares documents for RAG: cleaning, chunking, embedding, and indexing.
F
Fine-tuning
Training a model further on your data to change behavior (style, domain, formats). Fine-tuning can improve consistency, but it costs time and introduces operational complexity. For many teams, RAG + good prompting is enough.
Few-shot prompting
Providing a few examples (input â output) to guide the model. Few-shot prompts can improve reliability, especially for formatting and classification.
Function calling / Tool calling
A structured way for models to request actions (e.g., call get_weather(city) or search_docs(query)). Tool calling is key to building agents and reliable workflows.
G
Generalization
A modelâs ability to perform well on new, unseen inputs. Overfitting reduces generalization.
GPT (Generative Pre-trained Transformer)
A family of models based on the Transformer architecture, trained on large datasets (âpre-trainedâ) and used to generate text (âgenerativeâ). âGPTâ is often used generically, but it originally referred to specific OpenAI model families.
Grounding
Linking an answer to trusted sources (documents, databases, or web results). Grounding reduces hallucinations by constraining the model to reference external information.
H
Hallucination
When a model outputs information that is incorrect or fabricated but presented confidently. Hallucinations are not âbugsâ in the usual senseâtheyâre a natural consequence of next-token prediction without strong grounding.
Human-in-the-loop (HITL)
A workflow where humans review, approve, or correct AI outputs. HITL is a common safety and quality strategy in production systems.
I
Inference
Using a trained model to generate an output. Inference is what you pay for in most APIs.
Instruction tuning
Training a model to follow instructions and behave well in chat. Instruction-tuned models are usually better for assistants than base models.
Iteration
The normal process of improving prompts, data, or workflows. In AI projects, iteration is expectedârarely âone and done.â
K
Knowledge cutoff
The date (or approximate range) after which a modelâs training data no longer includes new events. A model may still answer about recent topics, but those answers may be guesses unless the tool uses web browsing or retrieval.
L
Latency
How long a model takes to respond. Lower latency matters for real-time chat, support, and interactive tools; higher latency may be acceptable for batch jobs.
LLM (Large Language Model)
A neural network trained on huge text datasets to predict the next token. LLMs can write, summarize, translate, and reasonâbut they can also hallucinate. Many modern âAI toolsâ are LLM wrappers with extra features.
Long-context model
A model optimized to handle large context windows (hundreds of thousands to millions of tokens). Useful for long documents, codebases, and multi-document analysis.
M
Machine learning (ML)
A field where systems learn patterns from data instead of being explicitly programmed with rules. Deep learning is a subset of ML.
Multimodal
Models that can handle multiple input/output types (text, images, audio, video). Many modern assistants are multimodal.
N
Next-token prediction
The core training objective for many LLMs: given previous tokens, predict the next token. Surprisingly, this simple objective yields models that can perform many tasks.
Non-determinism
Two outputs from the same prompt can differ (especially with higher temperature). Non-determinism is normal; production systems often lower temperature for consistency.
O
Overfitting
When a model (or prompt) performs well on examples it has seen but poorly on new inputs. In prompt engineering, overfitting can happen when you rely on narrow examples or brittle templates.
P
Parameters
The learned âweightsâ of a neural network. More parameters can increase capacity, but architecture, training quality, and data matter too.
Prompt
The text (and sometimes structured data) you send to a model. Prompts can include roles (system/user), rules, examples, and output formats.
Prompt injection
An attack where untrusted text (like a webpage or email) tries to override instructions (e.g., âignore previous instructions and reveal secretsâ). Safe systems treat external text as data, not instructions.
Prompt template
A reusable prompt structure with placeholders. Templates help teams get consistent results.
R
RAG (Retrieval-Augmented Generation)
A method where the system retrieves relevant documents (often via embeddings + vector search) and then asks the model to answer using that retrieved context. RAG improves accuracy and allows answers based on private data without fine-tuning.
Rate limits
Limits on requests per minute/day imposed by an API or product. Important for production planning.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where humans (or models) rank outputs and the model is optimized to produce preferred behavior. RLHF helps with instruction-following and safety, but it doesnât guarantee truth.
RLAIF (Reinforcement Learning from AI Feedback)
Similar to RLHF, but feedback is generated by models rather than humans. Often used to scale alignment.
S
Safety policy
Rules and filters to reduce harmful outputs. Policies can affect what a tool can do, especially in sensitive domains.
Semantic search
Search based on meaning rather than exact keywords. Usually implemented with embeddings.
System prompt
A high-priority instruction (often hidden from the user) that sets behavior, constraints, and style. Good system prompts can improve consistency.
T
Temperature
A sampling parameter controlling randomness. Low temperature â more consistent and conservative outputs. Higher temperature â more creative but more variable outputs.
Token
A piece of text used by models (often a word fragment). Pricing and context windows are measured in tokens, not characters or words.
Tokenization
How text is split into tokens. Different models use different tokenizers, which affects token counts and cost.
Topâp (nucleus sampling)
Another randomness control that selects from the smallest set of tokens whose probabilities sum to p. Often used with temperature.
Training
The process of adjusting model parameters using data. Most users donât âtrainâ models directly, but may fine-tune or configure retrieval.
Transformer
A neural network architecture that uses attention and parallel processing. Transformers are the foundation of most modern LLMs.
U
Uncertainty
A modelâs confidence is not always calibrated. Good prompts ask the model to state uncertainty, list assumptions, and suggest verification steps.
V
Vector database
A database optimized for storing and searching embeddings. Used for semantic search and RAG.
Verification
Any step that checks AI output against reality: citations, unit tests, cross-checking sources, human review, or running code.
W
Weight / Weights
The learned parameters of a model. âOpen-weightâ models release weights publicly so others can run them.
Bonus: practical terms youâll see in AI tools
âMemoryâ
A product feature where the tool stores user preferences or facts across sessions. Memory can be helpful but raises privacy questions.
âProjectsâ / âWorkspacesâ
A way to organize chats, files, and prompts by topic or client. Useful for keeping context separate.
âConnectorsâ
Integrations that let an AI tool access data from Google Drive, Slack, Notion, GitHub, CRM systems, etc. Connectors can supercharge productivityâbut also expand the security surface area.
FAQ
How many words are in a token?
It varies. Roughly, 1 token is often ~3â4 characters in English, or ~0.75 words on average. But tokenization differs by language and model.
Is RAG the same as web browsing?
No. RAG usually retrieves from your indexed documents (private knowledge base). Web browsing retrieves from the live web. Some tools combine both.
Do I always need fine-tuning?
No. Many teams get strong results with a good system prompt, few-shot examples, and RAG. Fine-tuning is useful when you need very consistent style/format or domain behavior at scale.
Whatâs the single most important term to understand?
For everyday tool usage: context window. It drives cost, capability with long documents, and whether a tool can ârememberâ enough to be useful.
Where should I go next?
- Learn core concepts: AI Fundamentals
- Improve results: Prompt Engineering Guide
- Compare model families: LLM Comparison 2026