Objective
Give interns hands-on experience with classic machine learning workflows: data preprocessing, feature engineering, model training, evaluation, and producing simple deployment-ready artifacts.
Features
- Supervised learning projects (classification & regression)
- Model evaluation and comparison
- Feature engineering and preprocessing best practices
- Notebook-based reproducible work and clear reporting
Tools
Beginner Level Tasks
- Set up Python environment and open a Colab/Jupyter notebook.
- Load a CSV dataset and run basic EDA: head, info, describe, and missing-value counts.
- Visualize relationships with four plot types (scatter plot, box plot, histogram, and pairplot or bar chart).
- Train a simple baseline model (e.g., Logistic Regression for classification, Linear Regression for regression) and report metrics.
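The beginner steps above can be sketched in a few lines. This is a minimal example, assuming scikit-learn and pandas are installed; it uses sklearn's bundled iris data as a stand-in for your own CSV, so swap in `pd.read_csv(...)` for the real task.

```python
# Minimal EDA + baseline sketch (assumes scikit-learn and pandas).
# load_iris is a stand-in for loading your own CSV with pd.read_csv(...).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris(as_frame=True)
df = iris.frame  # DataFrame with 4 feature columns + target

# Basic EDA: shape, summary statistics, missing values
print(df.shape)
print(df.describe())
print(df.isna().sum())

# Simple baseline classifier with a held-out test split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"baseline accuracy: {acc:.3f}")
```

For regression tasks, the same pattern applies with `LinearRegression` and an error metric such as MAE in place of accuracy.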
Note: Out of the 4 main tasks below, you are required to complete any 3 tasks.
Tasks (4)
Task 1: Iris Species Classification
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/iris-classification-dataset
Goal
Build a classification model to predict iris species using classic features (sepal/petal length & width).
Requirements
- Perform EDA and visualize class separability
- Train & compare algorithms (k-NN, Logistic Regression, Decision Tree)
- Report metrics: accuracy, confusion matrix, precision/recall
- Save best model (pickle/joblib) and include example inference code
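Saving the best model and showing example inference can look like the sketch below. This assumes joblib and scikit-learn; the filename `iris_model.joblib` and the k-NN choice are illustrative placeholders, not the required answer.

```python
# Hedged sketch: persist a trained model with joblib and reload it for inference.
# "iris_model.joblib" is a placeholder filename; k-NN stands in for "best model".
import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

joblib.dump(model, "iris_model.joblib")   # save the trained model to disk

# Example inference: reload the artifact and predict one flower
loaded = joblib.load("iris_model.joblib")
sample = [[5.1, 3.5, 1.4, 0.2]]           # sepal/petal length & width (cm)
print(loaded.predict(sample))             # predicted class index
```

Include a snippet like this in the README so reviewers can run inference without re-training.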
Deliverables
- Notebook with code & plots
- Saved model file and README with inference example
Task 2: House Price Prediction
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/house-price-prediction
Goal
Build a regression model to predict house prices. Focus on feature engineering, handling missing values and model selection.
Requirements
- Perform EDA and feature transformations (log, encoding, scaling)
- Compare models: Linear Regression, Random Forest, Gradient Boosting
- Report RMSE/MAE and residual analysis
- Provide a small notebook cell showing how to use the model for prediction
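The model-comparison requirement can be sketched as below, assuming scikit-learn; synthetic `make_regression` data stands in for the Kaggle house-price features, so replace it with your preprocessed dataset.

```python
# Sketch of comparing regressors by RMSE/MAE (assumes scikit-learn).
# make_regression is a placeholder for the real house-price features.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
results = {}
for name, m in models.items():
    pred = m.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5  # root of MSE
    mae = mean_absolute_error(y_test, pred)
    results[name] = (rmse, mae)
    print(f"{name}: RMSE={rmse:.2f}, MAE={mae:.2f}")
```

For the residual analysis, plot `y_test - pred` against the predictions for the best model and check for structure.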
Deliverables
- Notebook with preprocessing, models and evaluation
- Saved model and instructions to run predictions
Task 3: Titanic Survival Prediction
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/titanic-survival-datasets
Goal
Predict survival on the Titanic using passenger attributes. Emphasize feature creation (title, family size), missing-value strategies, and model explainability.
Requirements
- Feature engineering: extract titles, family size, cabin presence
- Handle missing data (age, cabin) and encode categorical variables
- Train classification models and use SHAP or feature importance for explanation
- Provide final metrics and an inference example
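The feature-engineering steps above can be sketched in pandas. The tiny inline DataFrame below is a stand-in for the Kaggle Titanic CSV, and median imputation is just one simple missing-age strategy.

```python
# Pandas sketch of the Titanic feature-engineering requirements.
# The inline DataFrame is a stand-in for pd.read_csv("train.csv").
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Cabin": [None, "C85", None],
    "Age": [22.0, 38.0, None],
})

# Extract the honorific title between the comma and the period in the name
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.")
# Family size = siblings/spouses + parents/children + the passenger
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
# Binary flag for whether a cabin was recorded at all
df["HasCabin"] = df["Cabin"].notna().astype(int)
# One simple strategy: impute missing ages with the median
df["Age"] = df["Age"].fillna(df["Age"].median())

print(df[["Title", "FamilySize", "HasCabin", "Age"]])
```

After encoding `Title` and the other categoricals, these engineered columns feed directly into the classifiers, and SHAP or `feature_importances_` can rank them.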
Deliverables
- Notebook with preprocessing, models, and explanations
- Saved model and README with inference example
Task 4: End-to-End ML Pipeline
Goal
Combine datasets or choose a small real-world problem to deliver an end-to-end ML pipeline with clear business insights.
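One common way to structure an end-to-end pipeline is scikit-learn's `Pipeline` + `ColumnTransformer`, sketched below under assumptions: the column names (`age`, `income`, `city`, `bought`) and the toy DataFrame are hypothetical placeholders for whatever real-world problem you choose.

```python
# Illustrative end-to-end pipeline sketch (assumes scikit-learn and pandas).
# Column names and the toy data are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

numeric = ["age", "income"]
categorical = ["city"]

# Impute + scale numeric columns; impute + one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
pipe = Pipeline([("prep", preprocess), ("model", LogisticRegression())])

# Tiny toy dataset just to show the pipeline runs end to end
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40_000, 52_000, 61_000, None],
    "city": ["A", "B", "A", None],
    "bought": [0, 1, 1, 0],
})
pipe.fit(df[numeric + categorical], df["bought"])
preds = pipe.predict(df[numeric + categorical])
print(preds)
```

Because preprocessing lives inside the pipeline, a single saved artifact handles raw inputs at inference time, which keeps the deployment story simple.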
Deliverables
- Notebook + README
- PPT/PDF summary of approach and action items
How to Submit Your Tasks
For each task:
- Create a separate document (DOC/PDF) including the notebook link, model file, screenshots, and a 1-page executive summary.
Upload artifacts:
- Push notebooks and code to GitHub; upload large files to Google Drive and make sure sharing permissions allow access.
Submit links:
- Go to the Task Submission page and paste your links, clearly mentioning the task numbers.
Tip: Keep notebooks well-structured with markdown sections and a README listing package versions and exact commands to reproduce results.