Saturday, January 17, 2026

Junior MLOps Engineer — Day 3: Linux & Shell for MLOps

 


Goal of the Day

Learn how Linux and shell basics power real-world MLOps pipelines—from organizing ML projects to automating training and deployment.

MLOps engineers live in the terminal. Models don’t fail because of math—they fail because of bad structure, broken scripts, or missing environment variables.


1️⃣ Folder Structures for ML Projects

A clean project structure is non-negotiable in MLOps. It enables:

  • Reproducibility

  • Collaboration

  • CI/CD automation

  • Easier debugging & monitoring

📂 Standard ML Project Structure
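One widely used layout (the directory names here are a common convention, not a standard; teams vary):

```
ml-project/
├── README.md
├── requirements.txt
├── data/
│   ├── raw/            # immutable source data
│   └── processed/      # cleaned, training-ready data
├── src/                # production code (e.g. train.py)
├── scripts/            # automation entry points (e.g. run_pipeline.sh)
├── configs/            # settings per environment
├── models/             # saved model artifacts
└── logs/               # run logs
```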


Why This Matters in MLOps

  • data/ → versioned and tracked

  • src/ → production code (not notebooks)

  • scripts/ → automation entry points

  • configs/ → environment-agnostic settings

  • models/ → saved artifacts for deployment

This structure maps directly to CI/CD pipelines and ML platforms.


2️⃣ Bash Commands Used in MLOps Pipelines

MLOps pipelines rely heavily on shell commands—especially in Dockerfiles, CI tools, cron jobs, and cloud VMs.
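As a concrete example of the cron case, a single crontab entry can schedule a nightly retraining run (the paths below are hypothetical; `crontab -e` opens the editor to add such entries):

```
# m h dom mon dow  command
0 2 * * * /home/mluser/ml-project/scripts/run_pipeline.sh >> /home/mluser/ml-project/logs/cron.log 2>&1
```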


📌 Core Linux Commands You Must Know
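A quick tour of the everyday commands, run against a scratch directory so each one succeeds as-is:

```shell
# Work in a scratch directory so every command below runs cleanly
mkdir -p /tmp/mlops-demo/logs
cd /tmp/mlops-demo

pwd                                    # print the current directory
echo "INFO training started" > logs/train.log   # write a file
ls -lh logs                            # list files with sizes
cat logs/train.log                     # print a file
head -n 1 logs/train.log               # first N lines
tail -n 1 logs/train.log               # last N lines (add -f to follow live)
grep "INFO" logs/train.log             # search inside files
cp logs/train.log logs/backup.log      # copy
mv logs/backup.log logs/old.log        # move / rename
rm logs/old.log                        # delete
chmod 644 logs/train.log               # set permissions
```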


Example: Run a Training Script

python src/train.py

📄 src/train.py

import os
import logging
from datetime import datetime

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import joblib

# -----------------------------
# Configuration (via env vars)
# -----------------------------
MODEL_NAME = os.getenv("MODEL_NAME", "demo_model")
MODEL_VERSION = os.getenv("MODEL_VERSION", "v1")
DATA_PATH = os.getenv("DATA_PATH", "data/processed/train.csv")
MODEL_DIR = os.getenv("MODEL_DIR", "models")
LOG_DIR = os.getenv("LOG_DIR", "logs")

os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(LOG_DIR, exist_ok=True)

# -----------------------------
# Logging setup
# -----------------------------
log_file = f"{LOG_DIR}/train_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler(log_file),
        logging.StreamHandler()
    ]
)

logging.info("Starting training pipeline")
logging.info(f"Model: {MODEL_NAME}, Version: {MODEL_VERSION}")
logging.info(f"Loading data from {DATA_PATH}")

# -----------------------------
# Load data
# -----------------------------
try:
    data = pd.read_csv(DATA_PATH)
except FileNotFoundError:
    logging.error("Training data not found. Exiting.")
    raise

X = data.drop("label", axis=1)
y = data["label"]

# -----------------------------
# Train / Test split
# -----------------------------
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -----------------------------
# Model training
# -----------------------------
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

logging.info("Model training completed")

# -----------------------------
# Evaluation
# -----------------------------
preds = model.predict(X_test)
accuracy = accuracy_score(y_test, preds)

logging.info(f"Validation accuracy: {accuracy:.4f}")

# -----------------------------
# Save model artifact
# -----------------------------
model_path = f"{MODEL_DIR}/{MODEL_NAME}_{MODEL_VERSION}.joblib"
joblib.dump(model, model_path)

logging.info(f"Model saved to {model_path}")
logging.info("Training pipeline finished successfully")


🧪 Example: Run Full Pipeline

bash scripts/run_pipeline.sh


✅ Sample run_pipeline.sh

#!/bin/bash

# -----------------------------
# Safe bash settings
# -----------------------------
set -euo pipefail

# -----------------------------
# Project paths
# -----------------------------
PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
SRC_DIR="$PROJECT_ROOT/src"
DATA_DIR="$PROJECT_ROOT/data/processed"
LOG_DIR="$PROJECT_ROOT/logs"

mkdir -p "$LOG_DIR"

# -----------------------------
# Environment variables
# -----------------------------
export ENV="dev"
export MODEL_NAME="demo_model"
export MODEL_VERSION="v1"
export DATA_PATH="$DATA_DIR/train.csv"
export MODEL_DIR="$PROJECT_ROOT/models"
export LOG_DIR="$LOG_DIR"

# -----------------------------
# Logging
# -----------------------------
PIPELINE_LOG="$LOG_DIR/pipeline_$(date +%Y%m%d_%H%M%S).log"
exec > >(tee -a "$PIPELINE_LOG") 2>&1

echo "🚀 Starting MLOps pipeline"
echo "Environment: $ENV"
echo "Model: $MODEL_NAME ($MODEL_VERSION)"
echo "Project root: $PROJECT_ROOT"

# -----------------------------
# Data validation (basic)
# -----------------------------
if [ ! -f "$DATA_PATH" ]; then
  echo "❌ Training data not found at $DATA_PATH"
  exit 1
fi

echo "✅ Training data found"

# -----------------------------
# Training step
# -----------------------------
echo "๐Ÿ‹️ Running model training..."
python "$SRC_DIR/train.py"

echo "✅ Training completed"

# -----------------------------
# Pipeline finished
# -----------------------------
echo "🎉 Pipeline finished successfully"
echo "Logs saved to $PIPELINE_LOG"

🧪 Example: Check Logs

The training script writes timestamped log files, so follow the latest one:

tail -f logs/train_*.log

 

📄 logs/train_20260118_093215.log (Sample)

2026-01-18 09:32:15,104 - INFO - Starting training pipeline
2026-01-18 09:32:15,105 - INFO - Model: demo_model, Version: v1
2026-01-18 09:32:15,105 - INFO - Loading data from data/processed/train.csv
2026-01-18 09:32:15,321 - INFO - Training dataset loaded successfully
2026-01-18 09:32:15,322 - INFO - Total records: 12,000
2026-01-18 09:32:15,323 - INFO - Features: 15 | Label column: label
2026-01-18 09:32:15,330 - INFO - Splitting data (80% train / 20% validation)
2026-01-18 09:32:15,342 - INFO - Train samples: 9,600
2026-01-18 09:32:15,342 - INFO - Validation samples: 2,400
2026-01-18 09:32:15,351 - INFO - Initializing LogisticRegression model
2026-01-18 09:32:15,359 - INFO - Training model...
2026-01-18 09:32:16,912 - INFO - Model training completed successfully
2026-01-18 09:32:16,918 - INFO - Running validation
2026-01-18 09:32:16,934 - INFO - Validation accuracy: 0.8725
2026-01-18 09:32:16,940 - INFO - Saving model artifact
2026-01-18 09:32:16,946 - INFO - Model saved to models/demo_model_v1.joblib
2026-01-18 09:32:16,947 - INFO - Training pipeline finished successfully


How an MLOps Engineer Reads This Log

Log Section          | Why It Matters
---------------------|------------------------
Pipeline start       | Confirms job execution
Model name & version | Traceability & rollback
Dataset size         | Detects data drift
Train/val split      | Reproducibility
Training duration    | Performance & cost
Accuracy metric      | Model health
Artifact path        | Deployment readiness
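In practice an engineer rarely reads these logs top to bottom; grep pulls out the lines that matter. A minimal sketch (a small sample log is created inline so the commands run anywhere):

```shell
# Create a tiny sample log so the commands below run anywhere
mkdir -p /tmp/demo-logs
cat > /tmp/demo-logs/train.log <<'EOF'
2026-01-18 09:32:16,934 - INFO - Validation accuracy: 0.8725
2026-01-18 09:32:16,947 - INFO - Training pipeline finished successfully
EOF

grep "Validation accuracy" /tmp/demo-logs/train.log               # pull the metric line
grep -q "finished successfully" /tmp/demo-logs/train.log && echo "pipeline OK"
grep -c "ERROR" /tmp/demo-logs/train.log || true                  # count errors (0 here)
```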

 

3️⃣ Environment Variables (Critical for MLOps)

Environment variables let you separate code from configuration.

They are used for:

  • Secrets (API keys)

  • Model paths

  • Environment flags (dev / prod)

  • Cloud credentials

🔑 Set Environment Variables

export MODEL_NAME="fraud_detector"
export ENV="production"

Check:

echo $MODEL_NAME


Why MLOps Uses Env Vars

  • Prevents hardcoding secrets

  • Makes Docker & CI/CD portable

  • Enables safe multi-environment deployments
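A useful shell detail: a variable can also be set for one command only, without export, so it never leaks into the rest of your session. A quick sketch:

```shell
# Set a variable for a single command only; it does not persist in the shell
MODEL_VERSION="v2" sh -c 'echo "training $MODEL_VERSION"'   # prints: training v2
echo "after: ${MODEL_VERSION:-unset}"                       # prints: after: unset
```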


Example in Docker / CI

export AWS_ACCESS_KEY_ID=****
export AWS_SECRET_ACCESS_KEY=****

 

4️⃣ Practice Exercise (Hands-On)

📌 Task: Create an ML Project Directory

Run the following commands:

mkdir -p ml-project/{data/{raw,processed},src,models,configs,scripts,logs}
cd ml-project
touch README.md requirements.txt

Verify:

tree

(If tree isn’t installed, use ls -R)


Bonus Challenge (Optional)

  • Create train.py inside src/

  • Write a bash script run_pipeline.sh that runs training

  • Add an environment variable for model version


What You Learned Today

✔ Linux project structuring for ML
✔ Essential bash commands for pipelines
✔ Environment variables for secure deployments
✔ Hands-on ML project setup

 

Other related links:

Junior MLOps Engineer - Day 2 Training: ML Lifecycle Deep Dive 
https://www.wisemoneyai.com/2026/01/junior-mlops-engineer-day-2-training-ml.html

30-Day Full Course (1 Hour per Day) - Day 1 included
https://www.wisemoneyai.com/2026/01/junior-mlops-engineer-30-day-full.html
