Objective
Give interns hands-on experience with classic machine learning workflows: data preprocessing, feature engineering, model training, evaluation, and producing simple deployment-ready artifacts.
Features
- Supervised learning projects (classification & regression)
- Model evaluation and comparison
- Feature engineering and preprocessing best practices
- Notebook-based reproducible work and clear reporting
Tools
Beginner Level Tasks
- Set up Python environment and open a Colab/Jupyter notebook.
- Load a CSV dataset and run basic EDA: head, info, describe, and missing-value counts.
- Visualize relationships with four plot types (scatter plot, box plot, histogram, and pairplot or bar chart).
- Train a simple baseline model (e.g., Logistic Regression for classification, Linear Regression for regression) and report metrics.
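The beginner steps above can be sketched in a few lines. This is a minimal example, assuming scikit-learn and pandas are installed; it uses sklearn's bundled iris data as a stand-in for your own CSV, so swap in `pd.read_csv(...)` for the real task.

```python
# Minimal EDA + baseline sketch (assumes scikit-learn and pandas).
# load_iris is a stand-in for loading your own CSV with pd.read_csv(...).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris(as_frame=True)
df = iris.frame  # DataFrame with 4 feature columns + target

# Basic EDA: shape, summary statistics, missing values
print(df.shape)
print(df.describe())
print(df.isna().sum())

# Simple baseline classifier with a held-out test split
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"baseline accuracy: {acc:.3f}")
```

For regression tasks, the same pattern applies with `LinearRegression` and an error metric such as MAE in place of accuracy.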
Note: Out of the 4 main tasks below, you are required to complete any 3 tasks.
Tasks (4)
Task 1: Iris Species Classification
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/iris-classification-dataset
Goal
Build a classification model to predict iris species using classic features (sepal/petal length & width).
Requirements
- Perform EDA and visualize class separability
- Train & compare algorithms (k-NN, Logistic Regression, Decision Tree)
- Report metrics: accuracy, confusion matrix, precision/recall
- Save best model (pickle/joblib) and include example inference code
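Saving the best model and showing example inference can look like the sketch below. This assumes joblib and scikit-learn; the filename `iris_model.joblib` and the k-NN choice are illustrative placeholders, not the required answer.

```python
# Hedged sketch: persist a trained model with joblib and reload it for inference.
# "iris_model.joblib" is a placeholder filename; k-NN stands in for "best model".
import joblib
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=5).fit(X, y)

joblib.dump(model, "iris_model.joblib")   # save the trained model to disk

# Example inference: reload the artifact and predict one flower
loaded = joblib.load("iris_model.joblib")
sample = [[5.1, 3.5, 1.4, 0.2]]           # sepal/petal length & width (cm)
print(loaded.predict(sample))             # predicted class index
```

Include a snippet like this in the README so reviewers can run inference without re-training.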
Deliverables
- Notebook with code & plots
- Saved model file and README with inference example
Task 2: House Price Prediction
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/house-price-prediction
Goal
Build a regression model to predict house prices. Focus on feature engineering, handling missing values and model selection.
Requirements
- Perform EDA and feature transformations (log, encoding, scaling)
- Compare models: Linear Regression, Random Forest, Gradient Boosting
- Report RMSE/MAE and residual analysis
- Provide a small notebook cell showing how to use the model for prediction
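The model-comparison requirement can be sketched as below, assuming scikit-learn; synthetic `make_regression` data stands in for the Kaggle house-price features, so replace it with your preprocessed dataset.

```python
# Sketch of comparing regressors by RMSE/MAE (assumes scikit-learn).
# make_regression is a placeholder for the real house-price features.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
results = {}
for name, m in models.items():
    pred = m.fit(X_train, y_train).predict(X_test)
    rmse = mean_squared_error(y_test, pred) ** 0.5  # root of MSE
    mae = mean_absolute_error(y_test, pred)
    results[name] = (rmse, mae)
    print(f"{name}: RMSE={rmse:.2f}, MAE={mae:.2f}")
```

For the residual analysis, plot `y_test - pred` against the predictions for the best model and check for structure.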
Deliverables
- Notebook with preprocessing, models and evaluation
- Saved model and instructions to run predictions
Task 3: Titanic Survival Prediction
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/titanic-survival-datasets
Goal
Predict survival on the Titanic using passenger attributes. Emphasize feature creation (title, family size), missing-value strategies, and model explainability.
Requirements
- Feature engineering: extract titles, family size, cabin presence
- Handle missing data (age, cabin) and encode categorical variables
- Train classification models and use SHAP or feature importance for explanation
- Provide final metrics and an inference example
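The feature-engineering steps above can be sketched in pandas. The tiny inline DataFrame below is a stand-in for the Kaggle Titanic CSV, and median imputation is just one simple missing-age strategy.

```python
# Pandas sketch of the Titanic feature-engineering requirements.
# The inline DataFrame is a stand-in for pd.read_csv("train.csv").
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen", "Cumings, Mrs. John", "Heikkinen, Miss. Laina"],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Cabin": [None, "C85", None],
    "Age": [22.0, 38.0, None],
})

# Extract the honorific title between the comma and the period in the name
df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.")
# Family size = siblings/spouses + parents/children + the passenger
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
# Binary flag for whether a cabin was recorded at all
df["HasCabin"] = df["Cabin"].notna().astype(int)
# One simple strategy: impute missing ages with the median
df["Age"] = df["Age"].fillna(df["Age"].median())

print(df[["Title", "FamilySize", "HasCabin", "Age"]])
```

After encoding `Title` and the other categoricals, these engineered columns feed directly into the classifiers, and SHAP or `feature_importances_` can rank them.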
Deliverables
- Notebook with preprocessing, models, and explanations
- Saved model and README with inference example
Task 4: End-to-End ML Pipeline
Goal
Combine datasets or choose a small real-world problem to deliver an end-to-end ML pipeline with clear business insights.
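One common way to structure an end-to-end pipeline is scikit-learn's `Pipeline` + `ColumnTransformer`, sketched below under assumptions: the column names (`age`, `income`, `city`, `bought`) and the toy DataFrame are hypothetical placeholders for whatever real-world problem you choose.

```python
# Illustrative end-to-end pipeline sketch (assumes scikit-learn and pandas).
# Column names and the toy data are hypothetical placeholders.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

numeric = ["age", "income"]
categorical = ["city"]

# Impute + scale numeric columns; impute + one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
pipe = Pipeline([("prep", preprocess), ("model", LogisticRegression())])

# Tiny toy dataset just to show the pipeline runs end to end
df = pd.DataFrame({
    "age": [25, 32, None, 41],
    "income": [40_000, 52_000, 61_000, None],
    "city": ["A", "B", "A", None],
    "bought": [0, 1, 1, 0],
})
pipe.fit(df[numeric + categorical], df["bought"])
preds = pipe.predict(df[numeric + categorical])
print(preds)
```

Because preprocessing lives inside the pipeline, a single saved artifact handles raw inputs at inference time, which keeps the deployment story simple.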
Deliverables
- Notebook + README
- PPT/PDF summary of approach and action items
How to Submit Your Tasks
For each task:
- Create a separate document (DOC/PDF) including the notebook link, model file, screenshots, and a 1-page executive summary.
Upload artifacts:
- Push notebooks and code to GitHub; upload large files to Google Drive and make sure sharing permissions allow access.
Submit links:
- Go to the Task Submission page and paste your links, clearly mentioning the task numbers.
Tip: Keep notebooks well-structured with markdown sections and a README listing package versions and exact commands to reproduce results.