Goal of the Day
Learn how Linux and shell basics power real-world MLOps pipelines—from organizing ML projects to automating training and deployment.
MLOps engineers live in the terminal. Models don’t fail because of math—they fail because of bad structure, broken scripts, or missing environment variables.
1️⃣ Folder Structures for ML Projects
A clean project structure is non-negotiable in MLOps. It enables:
- Reproducibility
- Collaboration
- CI/CD automation
- Easier debugging & monitoring
📁 Standard ML Project Structure
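A typical layout (mirroring the directories created in the hands-on exercise at the end of this lesson) looks like this:

```text
ml-project/
├── data/
│   ├── raw/
│   └── processed/
├── src/
│   └── train.py
├── scripts/
│   └── run_pipeline.sh
├── configs/
├── models/
├── logs/
├── README.md
└── requirements.txt
```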
Why This Matters in MLOps
- data/ → versioned and tracked
- src/ → production code (not notebooks)
- scripts/ → automation entry points
- configs/ → environment-agnostic settings
- models/ → saved artifacts for deployment
This structure maps directly to CI/CD pipelines and ML platforms.
2️⃣ Bash Commands Used in MLOps Pipelines
MLOps pipelines rely heavily on shell commands—especially in Dockerfiles, CI tools, cron jobs, and cloud VMs.
🔧 Core Linux Commands You Must Know
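The exact set varies by team, but a starter set you will reach for constantly looks something like this (the paths here are illustrative):

```bash
ls -lh data/processed/        # list files with human-readable sizes
mkdir -p models/v1            # create nested directories in one shot
cp model.joblib models/v1/    # copy artifacts
mv run.log logs/archive/      # move or rename files
cat configs/prod.yaml         # print a file to the terminal
grep -i "error" logs/*.log    # search logs for errors
tail -n 50 logs/*.log         # show the last lines of each log
chmod +x scripts/*.sh         # make scripts executable
ps aux | grep python          # find running training jobs
df -h                         # check free disk space
```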
Example: Run a Training Script (src/train.py)
```python
import os
import logging
from datetime import datetime

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

# -----------------------------
# Configuration (via env vars)
# -----------------------------
MODEL_NAME = os.getenv("MODEL_NAME", "demo_model")
MODEL_VERSION = os.getenv("MODEL_VERSION", "v1")
DATA_PATH = os.getenv("DATA_PATH", "data/processed/train.csv")
MODEL_DIR = os.getenv("MODEL_DIR", "models")
LOG_DIR = os.getenv("LOG_DIR", "logs")

os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(LOG_DIR, exist_ok=True)

# -----------------------------
# Logging setup
# -----------------------------
log_file = f"{LOG_DIR}/train_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler(),
    ],
)

logging.info("Starting training pipeline")
logging.info(f"Model: {MODEL_NAME}, Version: {MODEL_VERSION}")
logging.info(f"Loading data from {DATA_PATH}")

# -----------------------------
# Load data
# -----------------------------
try:
    data = pd.read_csv(DATA_PATH)
except FileNotFoundError:
    logging.error("Training data not found. Exiting.")
    raise

X = data.drop("label", axis=1)
y = data["label"]

# -----------------------------
# Train / test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -----------------------------
# Model training
# -----------------------------
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
logging.info("Model training completed")

# -----------------------------
# Evaluation
# -----------------------------
preds = model.predict(X_test)
accuracy = accuracy_score(y_test, preds)
logging.info(f"Validation accuracy: {accuracy:.4f}")

# -----------------------------
# Save model artifact
# -----------------------------
model_path = f"{MODEL_DIR}/{MODEL_NAME}_{MODEL_VERSION}.joblib"
joblib.dump(model, model_path)
logging.info(f"Model saved to {model_path}")
logging.info("Training pipeline finished successfully")
```
🧪 Example: Run Full Pipeline

```bash
bash scripts/run_pipeline.sh
```
✅ Sample run_pipeline.sh

```bash
#!/bin/bash
# -----------------------------
# Safe bash settings
# -----------------------------
set -euo pipefail

# -----------------------------
# Project paths
# -----------------------------
PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
SRC_DIR="$PROJECT_ROOT/src"
DATA_DIR="$PROJECT_ROOT/data/processed"
LOG_DIR="$PROJECT_ROOT/logs"

mkdir -p "$LOG_DIR"

# -----------------------------
# Environment variables
# -----------------------------
export ENV="dev"
export MODEL_NAME="demo_model"
export MODEL_VERSION="v1"
export DATA_PATH="$DATA_DIR/train.csv"
export MODEL_DIR="$PROJECT_ROOT/models"
export LOG_DIR="$LOG_DIR"

# -----------------------------
# Logging
# -----------------------------
PIPELINE_LOG="$LOG_DIR/pipeline_$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "$PIPELINE_LOG") 2>&1

echo "🚀 Starting MLOps pipeline"
echo "Environment: $ENV"
echo "Model: $MODEL_NAME ($MODEL_VERSION)"
echo "Project root: $PROJECT_ROOT"

# -----------------------------
# Data validation (basic)
# -----------------------------
if [ ! -f "$DATA_PATH" ]; then
  echo "❌ Training data not found at $DATA_PATH"
  exit 1
fi
echo "✅ Training data found"

# -----------------------------
# Training step
# -----------------------------
echo "🏋️ Running model training..."
python "$SRC_DIR/train.py"
echo "✅ Training completed"

# -----------------------------
# Pipeline finished
# -----------------------------
echo "🎉 Pipeline finished successfully"
echo "Logs saved to $PIPELINE_LOG"
```
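Two lines in this script do a lot of heavy lifting. set -euo pipefail makes the script exit on the first failed command, treat any unset variable as an error, and fail a pipeline if any command in it fails. And exec > >(tee -a "$PIPELINE_LOG") 2>&1 sends everything the pipeline prints (stdout and stderr) to both the terminal and the log file. Both are standard practice in production shell scripts.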
🧪 Example: Check Logs

The training script writes a fresh timestamped log file on each run, so point tail at the file you want to follow:

```bash
tail -f logs/train_20260118_093215.log
```

📄 Sample: logs/train_20260118_093215.log
```text
2026-01-18 09:32:15,104 - INFO - Starting training pipeline
2026-01-18 09:32:15,105 - INFO - Model: demo_model, Version: v1
2026-01-18 09:32:15,105 - INFO - Loading data from data/processed/train.csv
2026-01-18 09:32:15,321 - INFO - Training dataset loaded successfully
2026-01-18 09:32:15,322 - INFO - Total records: 12,000
2026-01-18 09:32:15,323 - INFO - Features: 15 | Label column: label
2026-01-18 09:32:15,330 - INFO - Splitting data (80% train / 20% validation)
2026-01-18 09:32:15,342 - INFO - Train samples: 9,600
2026-01-18 09:32:15,342 - INFO - Validation samples: 2,400
2026-01-18 09:32:15,351 - INFO - Initializing LogisticRegression model
2026-01-18 09:32:15,359 - INFO - Training model...
2026-01-18 09:32:16,912 - INFO - Model training completed successfully
2026-01-18 09:32:16,918 - INFO - Running validation
2026-01-18 09:32:16,934 - INFO - Validation accuracy: 0.8725
2026-01-18 09:32:16,940 - INFO - Saving model artifact
2026-01-18 09:32:16,946 - INFO - Model saved to models/demo_model_v1.joblib
2026-01-18 09:32:16,947 - INFO - Training pipeline finished successfully
```
How an MLOps Engineer Reads This Log
| Log Section | Why It Matters |
|---|---|
| Pipeline start | Confirms job execution |
| Model name & version | Traceability & rollback |
| Dataset size | Detects data drift |
| Train/val split | Reproducibility |
| Training duration | Performance & cost |
| Accuracy metric | Model health |
| Artifact path | Deployment readiness |
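In practice you rarely read a log top to bottom; you filter for the lines you care about. For example, with the log naming scheme used above:

```bash
# pull the headline metric out of every training run
grep "Validation accuracy" logs/train_*.log

# follow the most recent run as it writes
tail -f "$(ls -t logs/train_*.log | head -n 1)"
```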
3️⃣ Environment Variables (Critical for MLOps)
Environment variables let you separate code from configuration.
They are used for:

- Secrets (API keys)
- Model paths
- Environment flags (dev / prod)
- Cloud credentials
🔑 Set Environment Variables

```bash
export MODEL_NAME="fraud_detector"
export ENV="production"
```

Check:

```bash
echo $MODEL_NAME
```
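Just as os.getenv(...) in train.py falls back to a default, shell scripts commonly do the same with parameter expansion. A small illustrative sketch:

```bash
# use the exported value if present, otherwise fall back to a default
MODEL_NAME="${MODEL_NAME:-demo_model}"
echo "Training $MODEL_NAME"
```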
Why MLOps Uses Env Vars
- Prevents hardcoding secrets
- Makes Docker & CI/CD portable
- Enables safe multi-environment deployments
Example in Docker / CI
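A minimal sketch of how the same variables might be injected in a Dockerfile (names and values here are illustrative):

```dockerfile
FROM python:3.11-slim

# bake non-secret defaults into the image
ENV MODEL_NAME="fraud_detector" \
    ENV="production"

WORKDIR /app
COPY . .
RUN pip install -r requirements.txt

CMD ["python", "src/train.py"]
```

In CI, the same values typically come from the workflow definition or its secret store rather than the image. GitHub Actions, shown here as one example:

```yaml
# .github/workflows/train.yml (illustrative)
name: train
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    env:
      MODEL_NAME: fraud_detector
      ENV: production
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: python src/train.py
```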
4️⃣ Practice Exercise (Hands-On)
📝 Task: Create an ML Project Directory
Run the following commands:
```bash
mkdir -p ml-project/{data/{raw,processed},src,models,configs,scripts,logs}
cd ml-project
touch README.md requirements.txt
```
Verify:
```bash
tree
```
(If tree isn’t installed, use ls -R)
Bonus Challenge (Optional)
- Create train.py inside src/
- Write a bash script run_pipeline.sh that runs training
- Add an environment variable for model version
What You Learned Today
✔ Linux project structuring for ML
✔ Essential bash commands for pipelines
✔ Environment variables for secure deployments
✔ Hands-on ML project setup
Other related links:
Junior MLOps Engineer - Day 2 Training: ML Lifecycle Deep Dive
https://www.wisemoneyai.com/2026/01/junior-mlops-engineer-day-2-training-ml.html
30-Day Full Course (1 Hour per Day) - Day 1 included
https://www.wisemoneyai.com/2026/01/junior-mlops-engineer-30-day-full.html