Data Science Internship Tasks

Objective

Provide interns with end-to-end data science experience: from data cleaning and exploratory analysis to model building, evaluation and deployment. Projects are directly applicable to real business problems.

Features

Exploratory Data Analysis (EDA) and feature engineering
Supervised ML modeling and evaluation
Deep learning basics (optional) and model deployment
Reproducible notebooks and clear reporting

Technologies & Tools

Python 3.x
Pandas / NumPy
scikit-learn
Matplotlib / Seaborn
Jupyter / Colab

Beginner Level Tasks

Download the assigned Kaggle dataset and open it in Jupyter or Google Colab.
Perform basic EDA: missing values, data types, summary statistics.
Create at least 4 visualizations describing key dataset aspects.
Write a short README describing the dataset and your initial observations.
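
The basic EDA step above can be sketched roughly as follows. This is a minimal illustration, not a required template: the toy DataFrame and the helper name `basic_eda` are assumptions made for the example, and in practice you would load the assigned CSV with `pd.read_csv` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the assigned Kaggle CSV.
# With real data you would start from pd.read_csv(...) instead.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rating": [4.1, np.nan, 3.8, 4.5],
    "votes": [120, 45, np.nan, 300],
})


def basic_eda(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: data type, missing count, missing share."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "pct_missing": frame.isna().mean().round(3),
    })


overview = basic_eda(df)
print(overview)
print(df.describe())  # summary statistics for the numeric columns
```

The overview table is a convenient thing to paste into the README alongside your initial observations.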

Note: You are required to complete any 3 of the 4 main tasks below.

Tasks (4)

1 Zomato Dataset Analysis

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/zomato

Goal

Analyze restaurant and review data to extract insights on ratings, cuisines, location preferences and factors affecting ratings.

Requirements

Data cleaning (handle text fields, missing values, currency conversions if needed)
Explore relationships: cuisine vs rating, location hotspots, price vs rating
Build visualizations: heatmaps, wordclouds for reviews or popular cuisines
Provide 5 recommendations for an Alfido Tech-style platform (e.g., partnerships, content ideas)
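
One possible shape for the cleaning and relationship steps above is sketched below. The column names (`rate`, `approx_cost`, `cuisines`) and the value formats such as "4.1/5" or "1,200" are assumptions chosen for illustration; check the actual columns in the downloaded file and adapt accordingly.

```python
import pandas as pd

# Hypothetical sample rows; column names and formats are assumptions.
raw = pd.DataFrame({
    "rate": ["4.1/5", "3.8/5", "NEW", "-", "4.5/5"],
    "approx_cost": ["800", "1,200", "300", None, "1,500"],
    "cuisines": ["North Indian, Chinese", "Chinese", "Cafe", "Cafe", "North Indian"],
})


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Convert text ratings and costs to numbers; unparseable values become NaN."""
    out = frame.copy()
    # Keep the part before "/", then coerce placeholders like "NEW" or "-" to NaN.
    out["rating"] = pd.to_numeric(out["rate"].str.split("/").str[0], errors="coerce")
    # Remove thousands separators before converting the cost to a number.
    out["cost"] = pd.to_numeric(
        out["approx_cost"].str.replace(",", "", regex=False), errors="coerce"
    )
    return out.drop(columns=["rate", "approx_cost"])


cleaned = clean(raw)

# Cuisine vs rating: one row per (restaurant, cuisine), then average the ratings.
cuisine_rating = (
    cleaned.assign(cuisine=cleaned["cuisines"].str.split(","))
    .explode("cuisine")
    .assign(cuisine=lambda d: d["cuisine"].str.strip())
    .groupby("cuisine")["rating"]
    .mean()
    .sort_values(ascending=False)
)

# Price vs rating: Pearson correlation over rows where both values are present.
price_rating_corr = cleaned["cost"].corr(cleaned["rating"])

print(cuisine_rating)
print(price_rating_corr)
```

Coercing to NaN rather than dropping rows early keeps the missing-value counts visible for the report.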

Deliverables

Notebook (Jupyter/Colab) with cleaned data & visualizations
PDF report with key findings and recommendations

2 Loan Approval Prediction

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/loan-approval-prediction-case-study

Goal

Build a supervised model to predict loan approval using borrower features. Focus on preprocessing, handling imbalance and evaluation.

Requirements

Data preprocessing: missing values, encoding categorical variables, scaling
Handle class imbalance (SMOTE, undersampling or class weights)
Compare models (logistic regression, tree-based models) and report precision, recall, F1 and ROC-AUC
Provide business-oriented interpretation of model outputs
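
The comparison described above might look something like the following sketch. It uses synthetic data from scikit-learn's `make_classification` as a stand-in for the loan dataset (an assumption for the example), and it handles imbalance with class weights only. SMOTE is provided by the separate imbalanced-learn package and is not shown here. Real loan data would also need categorical encoding before this step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the loan data (roughly 15% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=42,
)
# Stratify so that train and test keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" reweights classes inversely to their frequency.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)               # hard labels at the default 0.5 cut-off
    proba = model.predict_proba(X_test)[:, 1]  # scores used for ROC-AUC
    results[name] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }

for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.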

Deliverables

Notebook with modeling pipeline and metrics
Short report discussing model trade-offs and suggested threshold for deployment
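
As one way to arrive at a suggested threshold, the sketch below picks the cut-off with the highest recall among those that meet a minimum precision. The helper name `pick_threshold`, the 0.8 precision floor and the toy scores are all assumptions for illustration; the right trade-off depends on the business cost of false approvals versus false rejections.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def pick_threshold(y_true, scores, min_precision=0.8):
    """Return the threshold with the highest recall whose precision meets the floor.

    Returns None when no threshold reaches min_precision.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # precision and recall carry one extra trailing point with no threshold; drop it.
    precision, recall = precision[:-1], recall[:-1]
    ok = precision >= min_precision
    if not ok.any():
        return None
    # Mask out thresholds below the precision floor, then maximise recall.
    best = int(np.argmax(np.where(ok, recall, -1.0)))
    return float(thresholds[best])


# Toy labels and predicted probabilities (hypothetical values).
y_true = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.35, 0.6, 0.7, 0.9])

chosen = pick_threshold(y_true, scores, min_precision=0.8)
print(chosen)
```

In a report, it helps to state the chosen threshold together with the precision and recall it yields on held-out data, so that reviewers can judge the trade-off.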

3 Instagram Data Analysis

Dataset

https://www.kaggle.com/datasets/bhanupratapbiswas/instgram

Goal

Analyze Instagram posts/engagement to identify best posting times, content types with high engagement and follower growth signals.

Requirements

Parse dates/times and compute engagement metrics (likes/comments per follower)
Analyze posting schedule, hashtags, and content types
Recommend an optimal content calendar and 5 strategies to increase engagement for Alfido Tech
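
The date parsing and engagement calculations above could be approached along these lines. The column names (`posted_at`, `likes`, `comments`, `followers`, `content_type`) and the sample values are assumptions made for the example, so map them to whatever the dataset actually provides.

```python
import pandas as pd

# Hypothetical sample posts; column names and values are assumptions.
posts = pd.DataFrame({
    "posted_at": ["2024-01-01 09:15", "2024-01-01 18:40",
                  "2024-01-02 09:05", "2024-01-03 18:10"],
    "likes": [120, 300, 80, 260],
    "comments": [10, 40, 5, 30],
    "followers": [1000, 1000, 1000, 1000],
    "content_type": ["image", "reel", "image", "reel"],
})

# Parse timestamps and derive the scheduling features.
posts["posted_at"] = pd.to_datetime(posts["posted_at"])
posts["hour"] = posts["posted_at"].dt.hour
posts["weekday"] = posts["posted_at"].dt.day_name()

# Engagement per follower: (likes + comments) / followers.
posts["engagement_rate"] = (posts["likes"] + posts["comments"]) / posts["followers"]

# Average engagement by posting hour and by content type, best first.
by_hour = posts.groupby("hour")["engagement_rate"].mean().sort_values(ascending=False)
by_type = posts.groupby("content_type")["engagement_rate"].mean().sort_values(ascending=False)

print(by_hour)
print(by_type)
```

With only a few posts per hour or weekday, these averages are noisy, so it is worth reporting the post counts next to them before recommending a calendar.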

Deliverables

Notebook with analysis and visuals
One-page strategy document with recommended posting plan

4 Capstone: Business Use Case (Optional)

Goal

Choose a dataset or combine datasets to propose an actionable analytics solution for Alfido Tech (e.g., internship analytics, service demand forecasting).

Deliverables

Notebook + README
PPT or PDF summarizing findings and recommended actions

How to Submit Your Tasks

For each task:
- Create a separate document (DOC or PDF) that includes notebook links, screenshots, charts and a one-page executive summary.
Upload artifacts:
- Push code & notebooks to GitHub and share repository links. Upload large files/models to Google Drive if needed.
Submit links:
- Go to the Task Submission page and paste your links, clearly mentioning task numbers.

Tip: Keep notebooks tidy: use sections, comments, and a small README so reviewers can reproduce your results quickly.
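
As a small aid to reproducibility, a first notebook cell along the following lines fixes the random seeds and records library versions. This is only a sketch: the seed value and the helper name `set_seed` are arbitrary choices, and libraries such as scikit-learn still need `random_state` passed explicitly.

```python
import random
import sys

import numpy as np

SEED = 42  # arbitrary value; what matters is that it is fixed and documented


def set_seed(seed: int = SEED) -> None:
    """Seed Python's and NumPy's global random number generators."""
    random.seed(seed)
    np.random.seed(seed)


set_seed()
first = np.random.rand(3)
set_seed()
second = np.random.rand(3)  # same seed, so the same three numbers

# Record the environment so reviewers can match it.
print("Python", sys.version.split()[0], "| NumPy", np.__version__)
```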
