Machine Learning Midterm Cheat Sheet


📦 1. Importing Libraries

Import all essential libraries for data analysis, visualization, and ML models.

Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# sklearn modules for dataset loading, splitting, and modeling
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import r2_score

📊 2. Load and Inspect Dataset

Load the dataset and inspect its structure, column types, and summary statistics.

Python
# Example: load your own CSV file
data = pd.read_csv("your_dataset.csv")

# View first few rows
print(data.head())

# View column names and types
print(data.info())

# Get summary statistics for numeric columns
print(data.describe())
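
Real CSVs often contain missing values, which most sklearn models reject. A minimal sketch of checking for them before modeling (the `demo` frame below is a made-up stand-in for your own data):

```python
import numpy as np
import pandas as pd

# Small illustrative frame (stand-in for your CSV)
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", "z"]})

# Count missing values per column
print(demo.isna().sum())  # a: 1, b: 0
```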

🔍 3. Select Features and Target

Define the input features (X) and target variable (y).

Python
# Example: assume 'target' column is what we want to predict
X = data.drop(columns=['target'])
y = data['target']

print("Shape of X:", X.shape)
print("Shape of y:", y.shape)

📈 4. Visualizing Data

Use Seaborn and Matplotlib to explore variable relationships.

Python
# Scatter plot between one feature and target
feature = 'your_feature_name'  # replace with actual column name
plt.figure(figsize=(8, 6))
plt.scatter(X[feature], y)
plt.xlabel(feature)
plt.ylabel("Target Value")
plt.title(f"{feature} vs Target")
plt.show()

# Correlation heatmap (numeric_only avoids errors on non-numeric columns)
plt.figure(figsize=(8, 6))
sb.heatmap(data.corr(numeric_only=True), annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Feature Correlation Heatmap")
plt.show()

🤖 5. Train-Test Split

Split dataset into training and test sets.

Python
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
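
As a sanity check: `test_size=0.2` reserves 20% of the rows for testing, and fixing `random_state` makes the split reproducible. A quick sketch on toy data (the `X_demo`/`y_demo` arrays are invented for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2): 80% train, 20% test
```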

📉 6. Multiple Linear Regression

Fit a Linear Regression model using all available features.

Python
est = LinearRegression()
est.fit(X_train, y_train)

y_pred = est.predict(X_test)

print("Multiple Linear Regression")
print("R² Score:", r2_score(y_test, y_pred))
print("Weights (Coefficients):", est.coef_)
print("Bias (Intercept):", est.intercept_)
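
The printed weights and bias fully determine the model: a prediction is just `X @ coef_ + intercept_`. A small sketch verifying this on synthetic, noise-free data (names like `X_demo` are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2*x0 + 3*x1 + 1
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y_demo = 2 * X_demo[:, 0] + 3 * X_demo[:, 1] + 1

model = LinearRegression().fit(X_demo, y_demo)

# Reproduce predict() by hand from the learned weights and bias
manual = X_demo @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X_demo)))  # True
```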

🧠 7. Lasso & Ridge Regression (Regularization)

Lasso (L1 penalty) can shrink some coefficients exactly to zero, performing feature selection; Ridge (L2 penalty) shrinks large coefficients toward zero, reducing overfitting.

Python
# Lasso
est_lasso = Lasso(alpha=0.1) 
est_lasso.fit(X_train, y_train)
print("Lasso R²:", r2_score(y_test, est_lasso.predict(X_test)))

# Ridge
est_ridge = Ridge(alpha=1.0)
est_ridge.fit(X_train, y_train)
print("Ridge R²:", r2_score(y_test, est_ridge.predict(X_test)))
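
To see the difference concretely: on synthetic data where only one feature matters, Lasso typically zeroes out the irrelevant coefficients, while Ridge only shrinks them. A sketch (the data and alpha values are illustrative, not tuned):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
# Synthetic data: 5 features, but only the first one drives y
X_demo = rng.normal(size=(100, 5))
y_demo = 3 * X_demo[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X_demo, y_demo)
ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)

# Lasso drives irrelevant coefficients exactly to zero;
# Ridge keeps them small but nonzero.
print("Lasso coefs:", np.round(lasso.coef_, 3))
print("Ridge coefs:", np.round(ridge.coef_, 3))
```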