A glossary of language-oriented AI (work in progress)
This glossary forms part of GenAI@ITMK's Technical Curriculum on Artificial Intelligence in Translation, Interpreting and Specialised Communication.

Author: Ralph Krüger
A
Activation function
Alignment
Array
Artificial general intelligence (AGI)
Artificial intelligence (AI)
Artificial narrow intelligence (ANI)
Attention
Attention mask
Attention score
Attention weight
Auto-regressive decoder
B
Backpropagation
Backward pass
BART (Bidirectional and Auto-Regressive Transformer)
Beam search
BERT (Bidirectional Encoder Representations from Transformers)
BertViz
Bias value
Bidirectional encoder
Byte-pair encoding (BPE)
C
Causal language modelling
Chain of thought (CoT)
Chain rule
Character-based tokenisation
Classification
Concatenation
Contextualised embedding
Context window
Continuous Bag of Words (CBOW)
Convolutional neural network (CNN)
Cosine similarity
Cross-entropy loss
D
Decoder
Decoder-only language model
Decontextualised embedding
Deep learning (DL)
Deep neural network
Derivative
Dot product
E
Embedding
Embedding matrix
Encoder
Encoder-decoder language model
Encoder-only language model
Euclidean distance
F
Feedforward neural network
Few-shot
Fine-tuning
Forward pass
G
General-purpose artificial intelligence (GPAI)
GPT (Generative Pre-trained Transformer)
Gradient
Gradient descent
Greedy decoding
H
Hidden layer
Hugging Face
Hyperparameter
I
In-context learning
Inference
Input embedding
Input layer
J
K
Key matrix
Key vector
Knowledge distillation
Knowledge-enhanced language model
Knowledge graph
L
Layer normalisation
Linear layer
LLM-as-a-judge
Logits vector
Loss
Loss function
Low-rank adaptation (LoRA)
M
Machine learning (ML)
Masked language modelling
Masked token prediction
Massively multi-task learning
Matrix
Matrix product
Modality
Modality encoder
Modality interface
Model compression
Multi-head attention
Multimodal language model
N
Natural language generation (NLG)
Natural language processing (NLP)
Natural language understanding (NLU)
Negative log-likelihood
Network error
Neural network
Neuron
Next token prediction
Non-linearity
NumPy
O
One-hot vector
Optimisation
Output layer
P
Parameter-efficient fine-tuning (PEFT)
Positional encoding
Preference tuning
Pre-training
Prompt engineering
Prompting
Pruning
PyTorch
Q
Quantisation
Quantised low-rank adaptation (QLoRA)
Query matrix
Query vector
R
ReAct (Reason and Act)
Rectified Linear Unit (ReLU)
Recurrent neural network (RNN)
Regression
Reinforcement learning (RL)
Representation learning
Residual connection
Retrieval-augmented generation (RAG)
S
Scalar
Scaled dot-product attention
Scaling law
Self-attention
Self-supervised learning
Sentence embedding
Shallow neural network
Skip-Gram
Softmax
Subword
Subword-based tokenisation
Supervised learning
T
Tensor
TensorFlow
Test-time scaling
Text embedding
Tiktokenizer
Token
Tokenisation
Train-time scaling
Transformer
Transposition
U
Unigram
Unimodal language model
Unsupervised learning
V
Value matrix
Value vector
Vector
W
Weight
Weighted sum
Weights matrix
Word-based tokenisation
Word embedding
WordPiece
X
Y
Z
Zero-shot