Machine Learning — Internship Tasks

This document lists the objective, tools, and tasks for building machine learning fundamentals and practical skills.

Objective

Give interns hands-on experience with classic machine learning workflows: data preprocessing, feature engineering, model training, evaluation and simple deployment-ready artifacts.


Tools

  • Python (scikit-learn)
  • Pandas / NumPy
  • Jupyter / Colab
  • Matplotlib / Seaborn
  • Joblib / Pickle (for saving models)
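As a quick orientation to the last item, here is a minimal sketch of saving a fitted model with Joblib and reloading it for inference. The dataset and model choice here are placeholders, not part of any task.

```python
# Minimal sketch: train a tiny model, save it with joblib, reload, and predict.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

joblib.dump(model, "model.joblib")      # persist the fitted model to disk
restored = joblib.load("model.joblib")  # load it back for inference
print(restored.predict(X[:1]))          # predict on one sample
```

Joblib is generally preferred over raw Pickle for scikit-learn models because it handles large NumPy arrays more efficiently.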

Beginner Level Tasks


Note: Of the 4 main tasks below, you are required to complete any 3.

Tasks (4)

Task 1: Iris Classification

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/iris-classification-dataset

Goal

Build a classification model to predict iris species using classic features (sepal/petal length & width).

Requirements
  • Perform EDA and visualize class separability
  • Train & compare algorithms (k-NN, Logistic Regression, Decision Tree)
  • Report metrics: accuracy, confusion matrix, precision/recall
  • Save best model (pickle/joblib) and include example inference code
Deliverables
  1. Notebook with code & plots
  2. Saved model file and README with inference example
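The model-comparison step above can be sketched as follows. For self-containment this loads iris from scikit-learn rather than the Kaggle CSV; for the task itself you would load the CSV with pandas.

```python
# Sketch for Task 1: train and compare k-NN, Logistic Regression, and a
# Decision Tree, then report accuracy and a confusion matrix for the best one.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy={results[name]:.3f}")

best_name = max(results, key=results.get)
print(f"Best model: {best_name}")
print(confusion_matrix(y_test, models[best_name].predict(X_test)))
```

A full submission would add precision/recall (e.g. via `classification_report`) and save the best model with joblib.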

Task 2: House Price Prediction

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/house-price-prediction

Goal

Build a regression model to predict house prices. Focus on feature engineering, handling missing values and model selection.

Requirements
  • Perform EDA and feature transformations (log, encoding, scaling)
  • Compare models: Linear Regression, Random Forest, Gradient Boosting
  • Report RMSE/MAE and residual analysis
  • Provide a small notebook cell showing how to use the model for prediction
Deliverables
  1. Notebook with preprocessing, models and evaluation
  2. Saved model and instructions to run predictions
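The regression comparison can be sketched as below. Synthetic data stands in for the Kaggle house-price CSV, which you would instead load with `pandas.read_csv` and preprocess (log transforms, encoding, scaling) first.

```python
# Sketch for Task 2: compare three regressors and report RMSE/MAE on held-out
# data. make_regression is a placeholder for the real, preprocessed dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    mae = mean_absolute_error(y_test, preds)
    scores[name] = (rmse, mae)
    print(f"{name}: RMSE={rmse:.2f}  MAE={mae:.2f}")
```

For the residual analysis requirement, plot `y_test - preds` against predictions (e.g. with Matplotlib) to check for systematic error.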

Task 3: Titanic Survival Prediction

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/titanic-survival-datasets

Goal

Predict survival on the Titanic using passenger attributes. Emphasize feature creation (title, family size), missing value strategies and model explainability.

Requirements
  • Feature engineering: extract titles, family size, cabin presence
  • Handle missing data (age, cabin) and encode categorical variables
  • Train classification models and use SHAP or feature importance for explanation
  • Provide final metrics and an inference example
Deliverables
  1. Notebook with preprocessing, models, and explanations
  2. Saved model and README with inference example
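The feature-engineering step above can be sketched as follows. The tiny DataFrame mimics the relevant columns of the Titanic CSV; for the task you would load the real file with `pandas.read_csv`.

```python
# Sketch for Task 3's feature engineering: extract a passenger's title,
# family size, and cabin presence from the raw columns.
import pandas as pd

df = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley"],
    "SibSp": [1, 1],   # siblings / spouses aboard
    "Parch": [0, 0],   # parents / children aboard
    "Cabin": [None, "C85"],
})

# Title: the word between the comma and the period, e.g. 'Mr', 'Mrs', 'Miss'.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.")
# Family size: relatives aboard plus the passenger themself.
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
# Cabin presence: a binary flag, since the Cabin column is mostly missing.
df["HasCabin"] = df["Cabin"].notna().astype(int)

print(df[["Title", "FamilySize", "HasCabin"]])
```

Grouping rare titles (e.g. 'Rev', 'Col') into an 'Other' bucket before encoding usually helps model stability.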

Task 4: End-to-End ML Project

Goal

Combine datasets or choose a small real-world problem to deliver an end-to-end ML pipeline with clear business insights.

Deliverables
  1. Notebook + README
  2. PPT/PDF summary of approach and action items
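For the end-to-end pipeline, one reasonable shape is to bundle preprocessing and the model into a single scikit-learn `Pipeline`, so the whole workflow can be cross-validated and saved as one artifact. The column names and toy data below are placeholders for whatever dataset you choose.

```python
# Sketch of an end-to-end pipeline: imputation + scaling for numeric columns,
# one-hot encoding for categoricals, and a classifier, all in one object.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.DataFrame({
    "age": [25, 32, None, 41, 38, 29],
    "income": [30000, 54000, 42000, None, 61000, 35000],
    "city": ["A", "B", "A", "C", "B", "A"],
    "label": [0, 1, 0, 1, 1, 0],
})
X, y = df.drop(columns="label"), df["label"]

numeric = Pipeline([("impute", SimpleImputer()), ("scale", StandardScaler())])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipeline = Pipeline([
    ("prep", preprocess),
    ("model", RandomForestClassifier(random_state=0)),
])

scores = cross_val_score(pipeline, X, y, cv=3)
print(f"CV accuracy: {scores.mean():.3f}")
```

A single pipeline object also makes deployment simpler: one `joblib.dump(pipeline, ...)` captures preprocessing and model together, so inference code only needs raw input rows.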

How to Submit Your Tasks

  1. For each task:
    • Create a separate document (DOC/PDF) including notebook link, model file, screenshots, and a 1‑page executive summary.
  2. Upload artifacts:
    • Push notebooks & code to GitHub; upload large files to Google Drive and ensure sharing permissions.
  3. Submit links:
    • Go to the Task Submission page and paste your links clearly mentioning task numbers.

Tip: Keep notebooks well-structured with markdown sections and a README listing package versions and exact commands to reproduce results.