A glossary of language-oriented AI (work in progress)

This glossary forms part of GenAI@ITMK's Technical Curriculum on Artificial Intelligence in Translation, Interpreting and Specialised Communication.

Author: Ralph Krüger

A

Activation function

Alignment

Array

Artificial general intelligence (AGI)

Artificial intelligence (AI)

Artificial narrow intelligence (ANI)

Attention

Attention mask

Attention score

Attention weight

Auto-regressive decoder

B

Backpropagation

Backward pass

BART (Bidirectional and Auto-Regressive Transformer)

BERT (Bidirectional Encoder Representations from Transformers)

BertViz

Bias value

Bidirectional encoder

Byte-pair encoding (BPE)

C

Causal language modelling

Chain of thought (CoT)

Chain rule

Character-based tokenisation

Classification

Concatenation

Contextualised embedding

Context window

Continuous Bag of Words (CBOW)

Convolutional neural network (CNN)

Cosine similarity

Cross-entropy loss

D

Decoder

Decoder-only language model

Decontextualised embedding

Deep learning (DL)

Deep neural network

Derivative

Dot product

E

Embedding

Embedding matrix

Encoder

Encoder-decoder language model

Encoder-only language model

Euclidean distance

F

Feedforward neural network

Few-shot

Fine-tuning

Forward pass

G

General-purpose artificial intelligence (GPAI)

GPT (Generative Pre-trained Transformer)

Gradient

Gradient descent

Greedy decoding

H

Hidden layer

Hugging Face

Hyperparameter

I

In-context learning

Inference

Input embedding

Input layer

J

K

Key matrix

Key vector

Knowledge distillation

Knowledge-enhanced language model

Knowledge graph

L

Layer normalisation

Linear layer

LLM-as-a-judge

Logits vector

Loss

Loss function

Low-rank adaptation (LoRA)

M

Machine learning (ML)

Masked language modelling

Masked token prediction

Massively multi-task learning

Matrix

Matrix product

Modality

Modality encoder

Modality interface

Model compression

Multi-head attention

Multimodal language model

N

Natural language generation (NLG)

Natural language processing (NLP)

Natural language understanding (NLU)

Negative log-likelihood

Network error

Neural network

Neuron

Next token prediction

Non-linearity

NumPy

O

One-hot vector

Optimisation

Output layer

P

Parameter-efficient fine-tuning (PEFT)

Positional encoding

Preference tuning

Pre-training

Prompt engineering

Prompting

Pruning

PyTorch

Q

Quantisation

Quantised low-rank adaptation (QLoRA)

Query matrix

Query vector

R

ReAct (Reason and Act)

Rectified Linear Unit (ReLU)

Recurrent neural network (RNN)

Regression

Reinforcement learning

Representation learning (RL)

Residual connection

Retrieval-augmented generation (RAG)

S

Scalar

Scaled dot-product attention

Scaling law

Self-attention

Self-supervised learning

Sentence embedding

Shallow neural network

Skip-Gram

Softmax

Subword

Subword-based tokenisation

Supervised learning

T

Tensor

TensorFlow

Test-time scaling

Text embedding

Tiktokenizer

Token

Tokenisation

Train-time scaling

Transformer

Transposition

U

Unigram

Unimodal language model

Unsupervised learning

V

Value matrix

Value vector

Vector

W

Weight

Weighted sum

Weights matrix

Word-based tokenisation

Word embedding

WordPiece

X

Y

Z

Zero-shot
