Perfect. Here is Lab 2: Linear Regression in Python, written in the same one-action-at-a-time format your intern can follow in a Jupyter Notebook.
This lab assumes they completed Lab 1 and already have:
X_train_scaled
X_test_scaled
y_train
y_test
Goal: Learn how to train, evaluate, and interpret a Linear Regression model using the cleaned, encoded, and scaled data from Lab 1.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

model = LinearRegression()
model
You should see:
LinearRegression()
This means your model object was created successfully.
model.fit(X_train_scaled, y_train)
Intern should understand: The model learns one coefficient per feature (plus an intercept), choosing the line that fits the training data with the smallest possible squared error.
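Optional aside for a curious intern: ordinary least squares has a closed-form solution, so the "best possible line" sklearn finds can be reproduced directly with NumPy. This is only a sketch; it assumes X_train_scaled and y_train from Lab 1 are still in memory and that the features are not perfectly collinear.

# Rebuild the OLS solution by hand and compare it to sklearn's fit
X_aug = np.column_stack([np.ones(len(X_train_scaled)), X_train_scaled])  # add an intercept column of ones
beta, *_ = np.linalg.lstsq(X_aug, y_train, rcond=None)                   # least-squares solution
print(beta[0], model.intercept_)   # the two intercepts should match
print(beta[1:])                    # should match model.coef_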
y_train_pred = model.predict(X_train_scaled)
y_train_pred[:5]

y_test_pred = model.predict(X_test_scaled)
y_test_pred[:5]
Expected: predicted scores close to the actual ones, but not exact; this is normal.
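One quick, optional way to sanity-check these predictions is to line the first few up next to the actual test scores (assumes pandas is imported as pd, as above):

# Side-by-side view of actual vs predicted for the first five test students
comparison = pd.DataFrame({
    "Actual": np.asarray(y_test)[:5],
    "Predicted": y_test_pred[:5]
})
comparison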
We will use 3 evaluation metrics:
MAE (Mean Absolute Error): the average size of the prediction errors, in score points.
MSE (Mean Squared Error): like MAE, but large errors are penalized more heavily.
R² (R-squared): the share of the variation in FinalScore that the model explains (1.0 is perfect).
train_mae = mean_absolute_error(y_train, y_train_pred)
train_mse = mean_squared_error(y_train, y_train_pred)
train_r2 = r2_score(y_train, y_train_pred)
train_mae, train_mse, train_r2
You should see three numbers: the training MAE, MSE, and R² (the exact values depend on your Lab 1 data).
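If the intern wants to see exactly what these three numbers measure, here is a minimal sketch that recomputes them by hand with NumPy; it should reproduce the sklearn values.

errors = np.asarray(y_train) - y_train_pred
mae_by_hand = np.mean(np.abs(errors))                            # MAE: average absolute error
mse_by_hand = np.mean(errors ** 2)                               # MSE: average squared error
ss_res = np.sum(errors ** 2)                                     # residual sum of squares
ss_tot = np.sum((np.asarray(y_train) - np.mean(y_train)) ** 2)   # total sum of squares
r2_by_hand = 1 - ss_res / ss_tot                                 # R² = 1 - SS_res / SS_tot
mae_by_hand, mse_by_hand, r2_by_hand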
test_mae = mean_absolute_error(y_test, y_test_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
test_r2 = r2_score(y_test, y_test_pred)
test_mae, test_mse, test_r2
Ideally, the test metrics should be close to the training metrics: MAE and MSE only slightly higher, R² only slightly lower.
If test performance is much worse, the model is overfitting.
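A small, optional table makes the train-vs-test comparison easier to eyeball (assumes the six metric variables above are defined):

# Train and test metrics side by side
metrics = pd.DataFrame(
    {"Train": [train_mae, train_mse, train_r2],
     "Test": [test_mae, test_mse, test_r2]},
    index=["MAE", "MSE", "R²"]
)
metrics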
Linear Regression gives us two things to interpret: an intercept and one coefficient per feature.
model.intercept_
This is the predicted score when every feature is zero, in scaled units (which is not the same as a raw value of zero).
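Optional check to make the intercept concrete: predicting a single row of all-zero scaled feature values returns exactly the intercept. This sketch assumes X_train_scaled is a NumPy array from Lab 1.

zero_row = np.zeros((1, X_train_scaled.shape[1]))   # one "student" with every scaled feature equal to 0
model.predict(zero_row)[0], model.intercept_        # the two values should be identical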
model.coef_
This gives an array of coefficient values, one per feature.
But this is useless without knowing which coefficient belongs to which feature.
coef_table = pd.DataFrame({
    "Feature": X_train.columns,   # column names from the unscaled Lab 1 DataFrame
    "Coefficient": model.coef_
}).sort_values(by="Coefficient", ascending=False)
coef_table
This shows each feature next to its coefficient, sorted from most positive to most negative.
Intern should observe: which features have large positive coefficients (they push the predicted score up) and which are negative (they pull it down).
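An optional horizontal bar chart of coef_table makes the relative sizes and signs easier to see (assumes matplotlib is imported as plt, as above):

coef_table.plot(kind="barh", x="Feature", y="Coefficient", legend=False)
plt.xlabel("Coefficient (change in predicted score per scaled unit)")
plt.title("Linear Regression Coefficients")
plt.tight_layout()
plt.show()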
plt.scatter(y_test, y_test_pred)
plt.xlabel("Actual Final Scores")
plt.ylabel("Predicted Final Scores")
plt.title("Actual vs Predicted β Linear Regression")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color="red") # perfect prediction line
plt.show()
Intern should understand: points close to the red line are accurate predictions; points far above or below it are students the model misjudges.
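As an optional follow-up to the plot, the intern can quantify how close the points sit to the red line; the 5-point threshold below is arbitrary and only for illustration.

within_5 = np.abs(np.asarray(y_test) - y_test_pred) <= 5   # True where the error is at most 5 score points
print(f"{within_5.mean() * 100:.1f}% of test predictions are within 5 points of the actual score")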
Let's simulate a new student:
We must format the data with exactly the same columns, in the same order, as the training data.
new_student = pd.DataFrame({
    "studyhours": [4],
    "prevscore": [70],
    "gradelevel_encoded": [11],
    "gender_M": [1],
    "gender_Unknown": [0],
    "city_Mumbai": [1],
    "city_Delhi": [0]
})
new_student

new_student_scaled = scaler.transform(new_student)
model.predict(new_student_scaled)
You will get a predicted FinalScore (e.g., 75.4).
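Optionally, these three steps can be wrapped in a small helper so new students can be scored without retyping the dictionary. This is only a sketch: predict_final_score is a made-up name, and it assumes scaler, model, and new_student (for the column order) are still in memory; the example values below are hypothetical.

def predict_final_score(feature_values):
    """Predict FinalScore for one student from a dict of raw feature values."""
    row = pd.DataFrame([feature_values])   # one-row DataFrame
    row = row[new_student.columns]         # enforce the same column order as above
    row_scaled = scaler.transform(row)     # reuse the scaler fitted in Lab 1
    return model.predict(row_scaled)[0]

predict_final_score({
    "studyhours": 6, "prevscore": 82, "gradelevel_encoded": 12,
    "gender_M": 0, "gender_Unknown": 0, "city_Mumbai": 0, "city_Delhi": 1
})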
The intern now knows how to:
Train a Linear Regression model on scaled data
Evaluate it with MAE, MSE, and R² on both the training and test sets
Interpret the intercept and the feature coefficients
Predict the FinalScore for a brand-new student