Perfect. Here is Lab 2: Linear Regression in Python, written in the same one-action-at-a-time format as Lab 1, so your intern can follow it in Jupyter Notebook.

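This lab assumes they completed Lab 1 and already have the objects used below: X_train, X_train_scaled, X_test_scaled, y_train, y_test, and the fitted scaler.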


LAB 2 – Build Your First Machine Learning Model: Linear Regression

Goal: Learn how to train, evaluate, and interpret a Linear Regression model using cleaned, encoded, and scaled data from Lab 1.


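🔹 Part 0 – Setup

✅ Action 0.1 – Import required ML libraries

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import pandas as pd               # needed for the coefficient table (Part 5) and the new sample (Part 7)
import matplotlib.pyplot as plt   # needed for the scatter plot (Part 6)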

🔹 Part 1 – Initialize the Model

✅ Action 1.1 – Create a Linear Regression model object

model = LinearRegression()
model

You should see:

LinearRegression()

This means your model object was created successfully.


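🔹 Part 2 – Train the Model (Fit)

✅ Action 2.1 – Train using X_train_scaled and y_train

model.fit(X_train_scaled, y_train)

Intern should understand: The model learns one coefficient per feature plus an intercept, chosen to minimize the sum of squared differences between the predicted and actual training scores. With a single feature this is the best-fit line; with several features it is the same idea in more dimensions.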
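
To make "fitting" less of a black box, here is a minimal sketch of the same least-squares idea using only NumPy. The data is synthetic and every name in it (X_demo, y_demo, true_coef and so on) is made up for illustration; it is not the Lab 1 dataset, and scikit-learn's internal solver may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the lab data (assumption): 200 rows, 2 features.
X_demo = rng.normal(size=(200, 2))
true_coef = np.array([3.0, -1.5])
true_intercept = 70.0
y_demo = true_intercept + X_demo @ true_coef + rng.normal(scale=0.1, size=200)

# Add a column of ones so the intercept is learned like any other weight.
X_with_ones = np.column_stack([np.ones(len(X_demo)), X_demo])

# Least-squares solution: the weights that minimize the sum of squared errors.
weights, *_ = np.linalg.lstsq(X_with_ones, y_demo, rcond=None)
demo_intercept, demo_coef = weights[0], weights[1:]

print("intercept:", round(float(demo_intercept), 2))
print("coefficients:", np.round(demo_coef, 2))
```

The recovered values should land close to the ones used to generate the data, which is what "learning the best fit" means here.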


🔹 Part 3 – Make Predictions

✅ Action 3.1 – Predict on training data

y_train_pred = model.predict(X_train_scaled)
y_train_pred[:5]

✅ Action 3.2 – Predict on test data

y_test_pred = model.predict(X_test_scaled)
y_test_pred[:5]

Expected: Predicted scores that are close to the actual scores but not identical (this is normal).


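🔹 Part 4 – Evaluate the Model

We will use 3 evaluation metrics: MAE (mean absolute error: the average size of the errors, in score points), MSE (mean squared error: the average of the squared errors, which penalizes large misses more heavily), and R² (the share of the variance in the actual scores that the model explains; 1.0 is a perfect fit).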
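
The three metrics imported in Part 0 (mean_absolute_error, mean_squared_error, r2_score) can also be computed by hand, which helps in reading the numbers later. The sketch below uses four made-up scores, not lab data, and the variable names are illustrative only.

```python
import numpy as np

# Four made-up actual scores and predictions (illustration only).
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.0, 5.0, 8.0, 10.0])

errors = y_true_demo - y_pred_demo            # [ 1.  0. -1. -1.]

mae = np.mean(np.abs(errors))                 # (1 + 0 + 1 + 1) / 4 = 0.75
mse = np.mean(errors ** 2)                    # (1 + 0 + 1 + 1) / 4 = 0.75

ss_res = np.sum(errors ** 2)                  # unexplained variation = 3
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)  # total variation = 20
r2 = 1 - ss_res / ss_tot                      # 1 - 3/20 = 0.85

print(mae, mse, r2)
```

MAE is in the same units as the score, MSE is in squared units, and R² has no units.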


✅ Action 4.1 – Training set evaluation

train_mae = mean_absolute_error(y_train, y_train_pred)
train_mse = mean_squared_error(y_train, y_train_pred)
train_r2  = r2_score(y_train, y_train_pred)

train_mae, train_mse, train_r2

You should see a tuple of three numbers, in the order MAE, MSE, R².


✅ Action 4.2 – Test set evaluation

test_mae = mean_absolute_error(y_test, y_test_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
test_r2  = r2_score(y_test, y_test_pred)

test_mae, test_mse, test_r2

Ideally, the test metrics are close to the training metrics.

If test performance is much worse (higher MAE and MSE, lower R²), the model is overfitting.
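
To see what a large train/test gap looks like, here is a deliberately extreme sketch on synthetic data: 10 training rows and 10 weights to learn, so the fit can memorize the training set. Everything here (make_demo_data, the noise features, the sizes) is an assumption chosen for illustration and has nothing to do with the Lab 1 data.

```python
import numpy as np

rng = np.random.default_rng(42)

def r2_by_hand(y_true, y_pred):
    """R² = 1 - (unexplained variation / total variation)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def make_demo_data(n_rows):
    # One informative feature plus 8 pure-noise features (all made up).
    features = rng.normal(size=(n_rows, 9))
    target = 70 + 5 * features[:, 0] + rng.normal(scale=3.0, size=n_rows)
    # Leading column of ones plays the role of the intercept.
    return np.column_stack([np.ones(n_rows), features]), target

X_small, y_small = make_demo_data(10)     # 10 rows, 10 weights: easy to memorize
X_fresh, y_fresh = make_demo_data(200)    # unseen data from the same process

weights, *_ = np.linalg.lstsq(X_small, y_small, rcond=None)

demo_train_r2 = r2_by_hand(y_small, X_small @ weights)
demo_test_r2 = r2_by_hand(y_fresh, X_fresh @ weights)

print("train R2:", round(float(demo_train_r2), 3))
print("test R2: ", round(float(demo_test_r2), 3))
```

A near-perfect training R² next to a much lower test R² is the pattern to watch for in Actions 4.1 and 4.2.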


🔹 Part 5 – Interpret the Model (Coefficients)

Linear Regression gives us an intercept and one coefficient per feature.


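✅ Action 5.1 – View the model intercept

model.intercept_

This is the predicted score when every scaled feature equals zero. If Lab 1 standardized the features (mean 0), zero corresponds to each feature's training-set average, so the intercept is the prediction for a student who is average on every feature.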
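
A useful consequence, assuming the features were standardized to mean 0 (the usual StandardScaler formula; whether Lab 1 did exactly this is an assumption), is that the least-squares intercept equals the average training target. The sketch below checks this on made-up data with NumPy only; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up raw features: hours-like and score-like columns (illustration only).
raw = rng.normal(loc=[5.0, 60.0], scale=[2.0, 10.0], size=(100, 2))
scores = 20 + 4 * raw[:, 0] + 0.5 * raw[:, 1] + rng.normal(scale=2.0, size=100)

# Standardize: subtract the column mean, divide by the column standard deviation.
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Least-squares fit with an explicit intercept column.
design = np.column_stack([np.ones(len(scaled)), scaled])
weights, *_ = np.linalg.lstsq(design, scores, rcond=None)
intercept_demo = weights[0]

print("intercept:      ", round(float(intercept_demo), 3))
print("mean of targets:", round(float(scores.mean()), 3))
```

If model.intercept_ in the lab is close to y_train.mean(), that is a quick sanity check that the scaled features are centered.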


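✅ Action 5.2 – View the coefficients

model.coef_

This returns an array with one coefficient per feature, in the same order as the feature columns used for training.

On its own the array is hard to interpret, because it does not show which coefficient belongs to which feature.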


✅ Action 5.3 – Create a feature–coefficient table

coef_table = pd.DataFrame({
    "Feature": X_train.columns,
    "Coefficient": model.coef_
}).sort_values(by="Coefficient", ascending=False)

coef_table

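This shows the features with the largest positive coefficients at the top of the table and the most negative ones at the bottom.

Intern should observe: A positive coefficient means the predicted score rises as the feature increases; a negative coefficient means it falls. Because the features were scaled, the sizes of the coefficients can be compared with each other.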
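
Sorting by the raw coefficient puts strong negative effects at the bottom of the table. If the goal is to rank features by strength of effect regardless of direction, one option is to sort by absolute value. The feature names and numbers below are hypothetical placeholders, not results from the lab.

```python
import pandas as pd

# Hypothetical coefficients, for illustration only.
demo_table = pd.DataFrame({
    "Feature": ["feature_a", "feature_b", "feature_c"],
    "Coefficient": [2.0, -6.5, 0.3],
})

# Rank by size of effect, ignoring the sign.
demo_table["AbsCoefficient"] = demo_table["Coefficient"].abs()
demo_table = (
    demo_table.sort_values(by="AbsCoefficient", ascending=False)
              .reset_index(drop=True)
)

print(demo_table)
```

Here feature_b comes first even though its coefficient is negative, because it has the largest magnitude.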


🔹 Part 6 – Visualize Predictions

✅ Action 6.1 – Scatter plot: Actual vs Predicted (Test data)

plt.scatter(y_test, y_test_pred)
plt.xlabel("Actual Final Scores")
plt.ylabel("Predicted Final Scores")
plt.title("Actual vs Predicted – Linear Regression")
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color="red")  # perfect prediction line
plt.show()

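Intern should understand: The red line marks perfect predictions (predicted = actual). The closer the points lie to it, the better the model. Points above the line are over-predictions and points below it are under-predictions.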
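
The same reading of the plot can be done numerically. This sketch uses four made-up score pairs (not lab data) and counts how many predictions fall above or below the perfect-prediction line; the names are illustrative.

```python
import numpy as np

# Made-up actual and predicted scores (illustration only).
actual_demo = np.array([60.0, 72.0, 85.0, 90.0])
predicted_demo = np.array([64.0, 70.0, 85.0, 81.0])

# Positive gap = point above the red line (over-prediction).
gaps = predicted_demo - actual_demo             # [ 4. -2.  0. -9.]

n_over = int(np.sum(gaps > 0))                  # 1
n_under = int(np.sum(gaps < 0))                 # 2
worst_index = int(np.argmax(np.abs(gaps)))      # 3 (the student scored 90, predicted 81)

print(n_over, n_under, worst_index)
```

In the lab, the same lines work with y_test and y_test_pred in place of the made-up arrays.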


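🔹 Part 7 – Use the Model for New Predictions (Inference)

Let's simulate a new student: 4 study hours, a previous score of 70, grade level 11, male, from Mumbai.

We must format the data with exactly the same columns as X_train, in the same order. If your column names differ from the ones below, use the names shown by X_train.columns.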

✅ Action 7.1 – Create a new sample input

new_student = pd.DataFrame({
    "studyhours": [4],
    "prevscore": [70],
    "gradelevel_encoded": [11],
    "gender_M": [1],
    "gender_Unknown": [0],
    "city_Mumbai": [1],
    "city_Delhi": [0]
})
new_student
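
A common source of errors at this step is a new sample whose columns are missing or in a different order from the training data. One way to guard against that, sketched here with hypothetical column names rather than the lab's real ones, is to reindex the sample against the training column list.

```python
import pandas as pd

# Hypothetical training column order (in the lab this would be X_train.columns).
training_columns = ["hours", "previous", "city_A", "city_B"]

# New sample written in a different order and missing one one-hot column.
demo_student = pd.DataFrame({"city_A": [1], "previous": [70], "hours": [4]})

# Reorder to match training; absent columns are created and filled with 0.
aligned = demo_student.reindex(columns=training_columns, fill_value=0)

print(aligned)
```

Filling with 0 is only reasonable for absent one-hot columns; a missing numeric feature such as study hours should be supplied explicitly rather than defaulted.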

✅ Action 7.2 – Apply the SAME SCALER used earlier

new_student_scaled = scaler.transform(new_student)

✅ Action 7.3 – Predict using the model

model.predict(new_student_scaled)

You will get a one-element array containing the predicted FinalScore (e.g., 75.4).


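🎉 Lab 2 Completed!

The intern now knows how to train a Linear Regression model, make predictions on training and test data, evaluate them with MAE, MSE, and R², interpret the intercept and coefficients, plot actual vs predicted scores, and predict the score of a new student.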