"Prior" to Deep Learning ML Techniques --> Should we learn about them?
*** Not including NNs (neural networks), which are present in deep learning models ***
Most of these prior ML techniques are still worth knowing for two reasons: they give insight into hybrid methods (e.g., modern speech models that mix HMM-style constraints with Transformers), and they remain practical for small, well-defined edge-computing tasks.
Deep learning models are data- and compute-hungry.
Tiny and offline systems (IoT, drones, industrial sensors, embedded robots) often:
- can’t fit or power a CNN/Transformer/VLM,
- can’t send data to the cloud, and
- must make decisions locally and instantly.
Hence, classical algorithms still dominate many “TinyML” or “Edge AI” deployments, especially for:
- basic detection, classification, or segmentation,
- low-resolution or structured sensor input,
- or interpretable, rule-based systems.
| Technique | Edge Viability | When / Why Useful | When Not Useful |
|---|---|---|---|
| KNN | ⚙️ Good for small feature spaces | Simple, local classification when #samples < 1000. Good for on-device anomaly detection. | Scales poorly; high memory per instance. |
| SVM | ✅ Excellent for edge | Compact models, low-latency inference. Works well with hand-crafted features (e.g., HOG, MFCC); see the sketch after this table. Used in tiny sensors and microcontrollers. | Requires precomputed features; not good for high-dimensional raw data. |
| GMM | ✅ Excellent for compact probabilistic inference | Used for on-board sound or motion recognition. Simple math (few Gaussians). | Limited in expressiveness; can’t generalize to complex visual data. |
| HMM | ✅ Excellent for time series | Still used in embedded speech recognition (wake-word detection, simple ASR). | Transformers vastly outperform it in rich language tasks. |
| Haar Cascades | ✅ Very efficient on microcontrollers | Real-time detection of faces, hands, or objects on CPU without GPU (e.g., OpenCV face detector on Raspberry Pi, ESP32-CAM). | Fails on complex scenes or occlusion; outdated for general vision. |
| Bag of Words / HOG / SIFT | ✅ Excellent as feature extractors | Lightweight; can feed into small classifiers. Used in embedded CV for texture/shape. | Supplanted by CNN features in higher-complexity domains. |
| AdaBoost / Decision Trees | ✅ Excellent for embedded decision logic | Extremely small memory footprint. Used in microcontrollers for gesture recognition or small tabular datasets. | Weak for high-dimensional raw sensory input (like images or audio spectrograms). |
| Color / Template Matching | ✅ Trivial compute | Great for static pattern tracking, motion sensors, or color triggers (e.g., smart toys, simple robots). | Not adaptable or generalizable. |
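
To ground the SVM row above, here is a minimal sketch of the hand-crafted-features-plus-compact-classifier pattern: HOG descriptors feeding a linear SVM. It assumes scikit-image and scikit-learn are available; the image size, HOG parameters, and placeholder data are illustrative, not a tuned edge pipeline.

```python
# Minimal sketch: HOG features + linear SVM (placeholder data, illustrative parameters).
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(images):
    # images: (N, 64, 64) grayscale arrays; 9 orientations over 8x8 cells is a common default
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images
    ])

# Placeholder data: replace with real labeled crops (e.g., pedestrian vs. background)
rng = np.random.default_rng(0)
train_imgs = rng.random((200, 64, 64))
train_labels = rng.integers(0, 2, size=200)

clf = LinearSVC(C=1.0)  # a linear kernel keeps the model tiny and fast
clf.fit(extract_hog(train_imgs), train_labels)

test_imgs = rng.random((10, 64, 64))
print(clf.predict(extract_hog(test_imgs)))  # 0/1 prediction per crop
```

Once features are computed, a linear SVM of this kind is just a weight vector and a bias, which is why it fits comfortably on microcontroller-class hardware.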
Examples

| Environment | Best Approach | Why |
|---|---|---|
| Tiny microcontroller (e.g., Arduino, ESP32) | SVM / HMM / Haar / AdaBoost | Lightweight, fast, no GPU needed |
| Small SoC (e.g., Raspberry Pi, Jetson Nano) | CNN (compressed), SVM fallback | Can run small CNNs; falls back to classical methods when offline |
| Industrial IoT (sensor fusion, low data rate) | GMM / HMM | Compact probabilistic modeling of signal patterns |
| Drone / robot navigation (low latency) | Haar + HOG + SVM | Instant detection; no cloud dependency (see the Haar sketch after this table) |
| Smart home / wearable / offline devices | HMM / SVM | Common in voice-trigger systems, fall detection, heart-rate anomalies |
| High-complexity perception (text + image) | VLM / DL | Needs big compute & internet; not viable on pure edge |
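
As a concrete example of the Haar entries in both tables, here is a minimal OpenCV sketch of the classic cascade face detector. The input file name is an assumption; the cascade XML ships with opencv-python under cv2.data.haarcascades.

```python
# Minimal sketch: Haar-cascade face detection with OpenCV (input image is a placeholder).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

frame = cv2.imread("frame.jpg")                   # or a frame from cv2.VideoCapture(0)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# scaleFactor/minNeighbors trade detection rate against false positives and CPU time
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", frame)
```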
OVERALL VALUE
1. Nearest Neighbor (KNN)
Worth a brief review (conceptually).
- ✅ Why:
  - It’s a conceptual foundation for understanding metric learning and embedding spaces, ideas that recur in few-shot learning, retrieval-augmented models, CLIP, and contrastive learning.
  - Helps connect “classical distance-based” thinking to modern embedding similarity (cosine, Euclidean); see the sketch below.
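
A minimal sketch of that distance-based thinking: k-nearest neighbors over vectors that stand in for learned embeddings. The vectors here are random placeholders; in a modern system they would come from an encoder such as CLIP, but the nearest-neighbor step is the same.

```python
# Minimal sketch: KNN classification over embedding-like vectors (placeholder data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 32))   # 500 stored items, 32-dim feature vectors
labels = rng.integers(0, 3, size=500)     # 3 classes

# Cosine distance mirrors how embedding similarity is usually measured
knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
knn.fit(embeddings, labels)

query = rng.normal(size=(1, 32))
print(knn.predict(query))                              # majority label of the 5 nearest neighbors
print(knn.kneighbors(query, return_distance=True))     # distances and indices of those neighbors
```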
2. Gaussian Mixture Models (GMMs)
Marginally useful; worth a skim.
- ✅ Why:
  - Teaches unsupervised probabilistic clustering (a precursor to VAEs, mixture-of-experts, etc.).
  - Conceptually linked to latent-variable models and the soft assignments used in modern probabilistic DL; see the sketch below.
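
A minimal sketch of the soft-assignment idea with scikit-learn's GaussianMixture. The 2-D data and component count are illustrative assumptions; predict_proba is the "responsibility" that latent-variable models generalize.

```python
# Minimal sketch: GMM soft clustering and anomaly scoring (synthetic 2-D data).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters, e.g. "idle" vs. "active" motion-sensor readings
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[3, 3], scale=0.7, size=(200, 2)),
])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(data)

sample = np.array([[2.5, 2.8]])
print(gmm.predict(sample))          # hard cluster assignment
print(gmm.predict_proba(sample))    # soft assignment: responsibility of each Gaussian
print(gmm.score_samples(sample))    # log-likelihood, usable as an anomaly score
```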
3. Support Vector Machines (SVMs)
Worth understanding historically and geometrically.
- ✅ Why:
  - Introduces margins, kernels, and decision boundaries, which are mathematically elegant and still inform ideas like maximum-margin contrastive loss and feature separation in embedding spaces; see the sketch below.
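
A minimal sketch of the margin/kernel idea on a toy nonlinear dataset (make_moons is just a convenient stand-in). The RBF kernel implicitly maps points into a space where a maximum-margin boundary can separate data that no straight line could.

```python
# Minimal sketch: RBF-kernel SVM on a nonlinearly separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

# Only the support vectors (points on or inside the margin) define the boundary
print("support vectors:", clf.support_vectors_.shape[0], "of", len(X), "points")
print("train accuracy:", clf.score(X, y))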
4. Haar Classifiers
Skip, unless you're curious about computer vision history or OpenCV pipelines.
- ⚙️ Why:
  - Outdated; CNNs have completely replaced them for image detection.
5. Hidden Markov Models (HMMs)
Worth reviewing only conceptually, if you’re interested in sequence modeling.
- ✅ Why:
  - A predecessor to RNNs, Transformers, and diffusion-like generative models for temporal data.
  - Teaches transition/emission probabilities, which form the basis of “probabilistic sequence reasoning”; see the sketch below.
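
A minimal sketch of transition/emission probabilities in action: the forward algorithm for a toy 2-state HMM scoring a short observation sequence. All probabilities are made-up illustrative values (think "quiet"/"speech" states emitting "low"/"high" energy symbols).

```python
# Minimal sketch: forward algorithm for a toy 2-state HMM (illustrative probabilities).
import numpy as np

start = np.array([0.6, 0.4])                 # P(initial state)
trans = np.array([[0.7, 0.3],                # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],                 # P(observation | state): rows = states,
                 [0.2, 0.8]])                # columns = observation symbols {low, high}

obs = [0, 1, 1, 0]                           # observed symbol indices over time

# Forward pass: alpha[s] = P(observations so far, current state = s)
alpha = start * emit[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ trans) * emit[:, o]

print("sequence likelihood:", alpha.sum())   # P(observation sequence | model)
```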
6. Bag of Words / Boosting / Color Models
Skip, unless you want to see how early CV worked.
