San’s AI Notes

Working notes, derivations, and diagrams

By San Hashimhama
AI Researcher • san.hashimhama@outlook.com • GitHub

This guide covers fundamental AI algorithms with mathematical rigor and practical insights. Each algorithm is explained with clear derivations, real-world applications, and implementation details based on my research experience at Cyrion Labs and SourceMind Labs.

Transformers

Self-attention mechanisms, multi-head attention, and the architecture behind modern NLP.

O(n²·d) attention | Parallelizable
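
As a quick reference, here is a minimal single-head scaled dot-product attention sketch (NumPy; the weight names Wq, Wk, Wv are illustrative, not from these notes). The (n, n) score matrix is exactly where the O(n²·d) cost comes from, and every query row can be computed in parallel.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product self-attention.
        X: (n, d_model); Wq, Wk, Wv: (d_model, d_k). Returns (n, d_k)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n): the O(n^2 * d) term
        return softmax(scores, axis=-1) @ V       # attention-weighted sum of values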

FlashAttention

IO‑aware tiled attention with streaming (online) softmax; the speedup comes from never reading or writing the full n×n score matrix to GPU memory.

No n² workspace | Fused, tiled kernels
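
A rough sketch of the streaming-softmax idea for a single query row, assuming NumPy and an illustrative block size. The real FlashAttention is a fused, tiled GPU kernel, but the running-max / running-denominator trick that removes the n² workspace is the same.

    import numpy as np

    def flash_attention_row(q, K, V, block=64):
        """Online-softmax attention for one query vector q (shape (d,)),
        streaming over key/value blocks so the (n, n) score matrix is never stored.
        A sketch of the idea, not the fused, IO-aware GPU kernel."""
        d = q.shape[-1]
        m = -np.inf                          # running max of scores
        denom = 0.0                          # running softmax denominator
        acc = np.zeros(V.shape[-1])          # running weighted sum of values
        for start in range(0, K.shape[0], block):
            Kb, Vb = K[start:start + block], V[start:start + block]
            s = Kb @ q / np.sqrt(d)          # scores for this tile only
            m_new = max(m, s.max())
            scale = np.exp(m - m_new)        # rescale previously accumulated state
            p = np.exp(s - m_new)
            denom = denom * scale + p.sum()
            acc = acc * scale + p @ Vb
            m = m_new
        return acc / denom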

LoRA Fine‑Tuning

Low‑rank adaptation for parameter‑efficient training and deployment.

2·d·r params vs d² | Mergeable adapters
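
A minimal sketch of the low-rank update with illustrative shapes: the frozen weight keeps its d² parameters untouched, the adapter adds only 2·d·r trainable parameters, and B·A can be merged back into W for zero-overhead inference.

    import numpy as np

    d, r, alpha = 1024, 8, 16
    rng = np.random.default_rng(0)

    W = rng.standard_normal((d, d))           # frozen pretrained weight: d*d params
    A = rng.standard_normal((r, d)) * 0.01    # trainable down-projection: r*d params
    B = np.zeros((d, r))                      # trainable up-projection: d*r params (zero init, so no change at start)

    def lora_forward(x):
        # Only A and B (2*d*r params) receive gradients; W stays frozen.
        return W @ x + (alpha / r) * (B @ (A @ x))

    # After training, merge the adapter so inference uses a single matmul:
    W_merged = W + (alpha / r) * (B @ A)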

QLoRA Internals

NF4 quantization, double‑quant, dequant math, paged optimizers.

4‑bit base + FP16 adapters | Fused dequant‑GEMM
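
An illustrative blockwise 4-bit quantize/dequantize sketch. The evenly spaced codebook below is a stand-in for the actual NF4 codebook (whose 16 levels come from Gaussian quantiles), and double quantization of the per-block scales is only noted in a comment rather than implemented.

    import numpy as np

    # Stand-in codebook: NF4 uses 16 levels derived from the quantiles of a
    # standard normal, which matches the distribution of pretrained weights.
    CODEBOOK = np.linspace(-1.0, 1.0, 16)

    def quantize_blockwise(w, block=64):
        w = w.reshape(-1, block)                          # assumes w.size % block == 0
        absmax = np.abs(w).max(axis=1, keepdims=True)     # one scale per block
        # (QLoRA also quantizes these absmax scales: "double quantization")
        idx = np.abs(w[..., None] / absmax[..., None] - CODEBOOK).argmin(axis=-1)
        return idx.astype(np.uint8), absmax               # 4-bit codes + scales

    def dequantize_blockwise(idx, absmax, shape):
        # In the fused kernel, dequantization happens on the fly inside the matmul;
        # the LoRA adapters themselves stay in FP16/BF16.
        return (CODEBOOK[idx] * absmax).reshape(shape)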

PPO

Clipped surrogate objective for stable policy updates.

Minibatch epochs | KL early stop
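
The clipped surrogate in a few lines of NumPy, assuming log-probabilities and advantages have already been computed; the simple KL estimate is the kind of statistic used to stop the minibatch epochs early.

    import numpy as np

    def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
        """Clipped surrogate objective (returned as a loss to minimize)."""
        ratio = np.exp(logp_new - logp_old)               # pi_new / pi_old
        unclipped = ratio * adv
        clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv
        return -np.mean(np.minimum(unclipped, clipped))   # pessimistic bound

    def approx_kl(logp_new, logp_old):
        # Crude KL(pi_old || pi_new) estimate; stop the epoch loop if it grows too large.
        return np.mean(logp_old - logp_new)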

TRPO

KL‑constrained policy improvement with natural gradient.

Conjugate gradient | Line search
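
A sketch of the conjugate-gradient core, assuming a caller-supplied Hessian-vector product for the KL (Fisher) matrix; the returned direction is then rescaled to satisfy the KL constraint and accepted via a backtracking line search.

    import numpy as np

    def conjugate_gradient(hvp, g, iters=10, tol=1e-10):
        """Approximately solve H x = g using only products hvp(v) ~= H v,
        giving the natural-gradient direction x ~= H^{-1} g without forming H."""
        x = np.zeros_like(g)
        r = g.copy()               # residual
        p = g.copy()               # search direction
        rs = r @ r
        for _ in range(iters):
            Hp = hvp(p)
            step = rs / (p @ Hp)
            x += step * p
            r -= step * Hp
            rs_new = r @ r
            if rs_new < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Full update: scale x by sqrt(2 * delta / (x @ hvp(x))) so that KL <= delta,
    # then backtrack (line search) until the surrogate objective actually improves.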

SAC

Maximum‑entropy RL with twin critics and temperature tuning.

Replay + twin Q | Stochastic policy
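
A sketch of the soft Bellman target and the temperature loss, assuming target-critic values and next-action log-probabilities have already been computed from a replay batch; target_entropy is typically set to minus the action dimension.

    import numpy as np

    def sac_targets(r, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
        """Soft Bellman targets: twin-Q minimum plus an entropy bonus.
        q1_next, q2_next: target critics at (s', a' ~ pi); logp_next: log pi(a'|s')."""
        soft_q = np.minimum(q1_next, q2_next) - alpha * logp_next
        return r + gamma * (1.0 - done) * soft_q

    def alpha_loss(log_alpha, logp, target_entropy):
        # Automatic temperature tuning: drive the policy's entropy toward target_entropy.
        return -np.mean(np.exp(log_alpha) * (logp + target_entropy))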

DPO

Direct preference optimization without a reward model.

Pairwise logistic loss | β scaling
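
The pairwise logistic loss in one function, assuming sequence-level log-probabilities for chosen (w) and rejected (l) completions under both the policy and a frozen reference model; β scales the implicit reward margin.

    import numpy as np

    def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
        # Implicit reward of each completion is beta * (logp - ref_logp);
        # the loss is -log sigmoid of the chosen-minus-rejected margin.
        margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
        return np.mean(np.logaddexp(0.0, -beta * margin))   # = -log sigmoid(beta * margin)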

GRPO

Group Relative Policy Optimization: PPO‑style clipped updates with group‑normalized advantages in place of a learned critic, plus KL regularization.

Group‑relative advantages | KL regularization
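
A sketch of group-relative advantages plus the clipped, KL-regularized update, assuming G sampled completions per prompt scored by a reward function; shapes and the beta coefficient are illustrative.

    import numpy as np

    def group_relative_advantages(rewards):
        """rewards: (num_prompts, G) scores for G completions sampled per prompt.
        Normalizing within each group replaces the learned value baseline."""
        mean = rewards.mean(axis=1, keepdims=True)
        std = rewards.std(axis=1, keepdims=True) + 1e-8
        return (rewards - mean) / std

    def grpo_loss(logp_new, logp_old, adv, kl_to_ref, clip_eps=0.2, beta=0.01):
        # PPO-style clipped ratio, minus a KL penalty toward the reference model.
        ratio = np.exp(logp_new - logp_old)
        clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_term = np.minimum(ratio * adv, clipped * adv)
        return -np.mean(policy_term - beta * kl_to_ref)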

My Research Projects