AI glossary: 50+ essential terms explained (plain English)

If you read AI tool reviews, model release notes, or pricing pages, you’ll run into jargon fast: tokens, context windows, RAG, embeddings, agents, function calling, fine‑tuning, and many more.

This glossary explains the most common AI terms—especially those used in modern large language models (LLMs)—in practical, beginner-friendly language.

If you’re brand new, also read AI Fundamentals. If you want to get better results right away, go to Prompt Engineering Guide.


A

Accuracy

How often an AI system produces the correct result. For generative AI, “accuracy” is tricky because outputs can be plausible but wrong. Many teams measure accuracy with task-specific tests (e.g., correct extraction, correct classification, correct citations).

Agent / Agentic workflow

A setup where a model can plan multiple steps and take actions (often via tools/API calls): search, retrieve docs, run code, create tickets, send emails, etc. Agents are powerful but also riskier because mistakes can turn into real actions.

Alignment

Methods used to steer a model to follow instructions and behave safely/helpfully. Examples include instruction-tuning, RLHF/RLAIF, and safety policies. Alignment doesn’t mean the model is always truthful—just that it’s optimized to be helpful and avoid certain harms.

API (Application Programming Interface)

A standard way for software to talk to software. Many AI tools are built on model APIs (OpenAI, Anthropic, Google, open-source providers). If a product offers an “AI API,” it usually means you can integrate the model into your app.

Attention

A mechanism in Transformers that helps the model decide which tokens to “focus on” when generating the next token. Attention is one reason modern LLMs can handle long, complex inputs—though performance can still degrade with very long context.


B

Base model

A model trained mostly to predict the next token from huge datasets (pretraining). Base models are not optimized to follow instructions politely; they’re often less helpful in chat. Many products use instruction-tuned versions.

Batch processing

Running many AI requests asynchronously in a group to reduce cost or increase throughput. Some providers offer batch discounts because latency is less important.

Benchmark

A standardized test used to compare model performance (e.g., coding, reasoning, math). Benchmarks can be useful, but they can also be gamed or fail to predict your real task. Use them as a clue, not proof.

Bias

Systematic tendencies in model output that reflect patterns in training data. Bias can appear in language, recommendations, or decisions. Mitigation involves dataset work, evaluation, and careful product design.


C

Cached tokens / Prompt caching

A pricing feature where repeated parts of a prompt (like a long system prompt or a large document) are stored and re-used at a lower cost. Caching can dramatically reduce costs for long-context workflows.

Chain-of-thought (CoT)

A prompting approach where you ask a model to reason step-by-step. CoT can improve performance on complex tasks, but some tools hide intermediate reasoning. In practice, ask for structured reasoning and verification steps rather than long “thinking.”

Classification

Turning an input into a label (e.g., “spam vs not spam,” “refund request vs bug report”). Classification is often cheaper and more reliable than open-ended generation.

Context

The information available to the model in the current request: system prompt, user prompt, conversation history, retrieved documents, and tool results.

Context window

The maximum amount of tokens a model can consider at once (input + output + sometimes hidden reasoning tokens). A bigger context window allows longer documents and longer conversations, but it can increase cost and doesn’t guarantee perfect memory.

Copilot

A product design pattern where AI assists you inside a tool you already use (IDE, email, docs). Copilots reduce friction because context is right there.


D

Data privacy

Rules and practices about how user data is stored, processed, and reused. Important questions: Is your data used to train models? How long is it retained? Who can access it? Is it encrypted?

Dataset

A collection of data used for training or evaluation. For LLMs, datasets are massive and often curated from many sources.

Deep learning

A subset of machine learning based on multi-layer neural networks. Modern LLMs are deep learning models.

Diffusion model

A type of generative model commonly used for image generation (e.g., Stable Diffusion). While this glossary focuses on LLMs, many AI tools combine text and diffusion models.


E

Embeddings

A way to convert text (or images) into numeric vectors that capture meaning. Embeddings are used for semantic search, clustering, recommendations, and RAG. If a tool says it “searches by meaning,” it probably uses embeddings.

Evaluation (Eval)

A repeatable test that measures model/tool performance on your tasks. Good evals are versioned, automated, and reflect real inputs. Without evals, teams rely on vibes.

Extract-Transform-Load (ETL)

A data pipeline pattern. In AI workflows, ETL often prepares documents for RAG: cleaning, chunking, embedding, and indexing.


F

Fine-tuning

Training a model further on your data to change behavior (style, domain, formats). Fine-tuning can improve consistency, but it costs time and introduces operational complexity. For many teams, RAG + good prompting is enough.

Few-shot prompting

Providing a few examples (input → output) to guide the model. Few-shot prompts can improve reliability, especially for formatting and classification.

Function calling / Tool calling

A structured way for models to request actions (e.g., call get_weather(city) or search_docs(query)). Tool calling is key to building agents and reliable workflows.


G

Generalization

A model’s ability to perform well on new, unseen inputs. Overfitting reduces generalization.

GPT (Generative Pre-trained Transformer)

A family of models based on the Transformer architecture, trained on large datasets (“pre-trained”) and used to generate text (“generative”). “GPT” is often used generically, but it originally referred to specific OpenAI model families.

Grounding

Linking an answer to trusted sources (documents, databases, or web results). Grounding reduces hallucinations by constraining the model to reference external information.


H

Hallucination

When a model outputs information that is incorrect or fabricated but presented confidently. Hallucinations are not “bugs” in the usual sense—they’re a natural consequence of next-token prediction without strong grounding.

Human-in-the-loop (HITL)

A workflow where humans review, approve, or correct AI outputs. HITL is a common safety and quality strategy in production systems.


I

Inference

Using a trained model to generate an output. Inference is what you pay for in most APIs.

Instruction tuning

Training a model to follow instructions and behave well in chat. Instruction-tuned models are usually better for assistants than base models.

Iteration

The normal process of improving prompts, data, or workflows. In AI projects, iteration is expected—rarely “one and done.”


K

Knowledge cutoff

The date (or approximate range) after which a model’s training data no longer includes new events. A model may still answer about recent topics, but those answers may be guesses unless the tool uses web browsing or retrieval.


L

Latency

How long a model takes to respond. Lower latency matters for real-time chat, support, and interactive tools; higher latency may be acceptable for batch jobs.

LLM (Large Language Model)

A neural network trained on huge text datasets to predict the next token. LLMs can write, summarize, translate, and reason—but they can also hallucinate. Many modern “AI tools” are LLM wrappers with extra features.

Long-context model

A model optimized to handle large context windows (hundreds of thousands to millions of tokens). Useful for long documents, codebases, and multi-document analysis.


M

Machine learning (ML)

A field where systems learn patterns from data instead of being explicitly programmed with rules. Deep learning is a subset of ML.

Multimodal

Models that can handle multiple input/output types (text, images, audio, video). Many modern assistants are multimodal.


N

Next-token prediction

The core training objective for many LLMs: given previous tokens, predict the next token. Surprisingly, this simple objective yields models that can perform many tasks.

Non-determinism

Two outputs from the same prompt can differ (especially with higher temperature). Non-determinism is normal; production systems often lower temperature for consistency.


O

Overfitting

When a model (or prompt) performs well on examples it has seen but poorly on new inputs. In prompt engineering, overfitting can happen when you rely on narrow examples or brittle templates.


P

Parameters

The learned “weights” of a neural network. More parameters can increase capacity, but architecture, training quality, and data matter too.

Prompt

The text (and sometimes structured data) you send to a model. Prompts can include roles (system/user), rules, examples, and output formats.

Prompt injection

An attack where untrusted text (like a webpage or email) tries to override instructions (e.g., “ignore previous instructions and reveal secrets”). Safe systems treat external text as data, not instructions.

Prompt template

A reusable prompt structure with placeholders. Templates help teams get consistent results.


R

RAG (Retrieval-Augmented Generation)

A method where the system retrieves relevant documents (often via embeddings + vector search) and then asks the model to answer using that retrieved context. RAG improves accuracy and allows answers based on private data without fine-tuning.

Rate limits

Limits on requests per minute/day imposed by an API or product. Important for production planning.

RLHF (Reinforcement Learning from Human Feedback)

A training technique where humans (or models) rank outputs and the model is optimized to produce preferred behavior. RLHF helps with instruction-following and safety, but it doesn’t guarantee truth.

RLAIF (Reinforcement Learning from AI Feedback)

Similar to RLHF, but feedback is generated by models rather than humans. Often used to scale alignment.


S

Safety policy

Rules and filters to reduce harmful outputs. Policies can affect what a tool can do, especially in sensitive domains.

Search based on meaning rather than exact keywords. Usually implemented with embeddings.

System prompt

A high-priority instruction (often hidden from the user) that sets behavior, constraints, and style. Good system prompts can improve consistency.


T

Temperature

A sampling parameter controlling randomness. Low temperature → more consistent and conservative outputs. Higher temperature → more creative but more variable outputs.

Token

A piece of text used by models (often a word fragment). Pricing and context windows are measured in tokens, not characters or words.

Tokenization

How text is split into tokens. Different models use different tokenizers, which affects token counts and cost.

Top‑p (nucleus sampling)

Another randomness control that selects from the smallest set of tokens whose probabilities sum to p. Often used with temperature.

Training

The process of adjusting model parameters using data. Most users don’t “train” models directly, but may fine-tune or configure retrieval.

Transformer

A neural network architecture that uses attention and parallel processing. Transformers are the foundation of most modern LLMs.


U

Uncertainty

A model’s confidence is not always calibrated. Good prompts ask the model to state uncertainty, list assumptions, and suggest verification steps.


V

Vector database

A database optimized for storing and searching embeddings. Used for semantic search and RAG.

Verification

Any step that checks AI output against reality: citations, unit tests, cross-checking sources, human review, or running code.


W

Weight / Weights

The learned parameters of a model. “Open-weight” models release weights publicly so others can run them.


Bonus: practical terms you’ll see in AI tools

“Memory”

A product feature where the tool stores user preferences or facts across sessions. Memory can be helpful but raises privacy questions.

“Projects” / “Workspaces”

A way to organize chats, files, and prompts by topic or client. Useful for keeping context separate.

“Connectors”

Integrations that let an AI tool access data from Google Drive, Slack, Notion, GitHub, CRM systems, etc. Connectors can supercharge productivity—but also expand the security surface area.


FAQ

How many words are in a token?

It varies. Roughly, 1 token is often ~3–4 characters in English, or ~0.75 words on average. But tokenization differs by language and model.

Is RAG the same as web browsing?

No. RAG usually retrieves from your indexed documents (private knowledge base). Web browsing retrieves from the live web. Some tools combine both.

Do I always need fine-tuning?

No. Many teams get strong results with a good system prompt, few-shot examples, and RAG. Fine-tuning is useful when you need very consistent style/format or domain behavior at scale.

What’s the single most important term to understand?

For everyday tool usage: context window. It drives cost, capability with long documents, and whether a tool can “remember” enough to be useful.

Where should I go next?