Objective
Give interns end-to-end data science experience, from data cleaning and exploratory analysis through model building, evaluation and deployment. Each project is framed around a real business problem.
Features
- Exploratory Data Analysis (EDA) and feature engineering
- Supervised ML modeling and evaluation
- Deep learning basics (optional) and model deployment
- Reproducible notebooks and clear reporting
Technologies & Tools
Beginner Level Tasks
- Download the assigned Kaggle dataset and open it in Jupyter or Google Colab.
- Perform basic EDA: missing values, data types, summary statistics.
- Create at least 4 visualizations describing key dataset aspects.
- Write a short README describing the dataset and your initial observations.
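The basic EDA step above can be sketched roughly as follows. This is a minimal illustration, not a required template: the inline CSV, its column names, and the helper `basic_eda` are all made up for the example, and you would load your downloaded Kaggle file with `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded Kaggle CSV (assumption, not the real data).
# In practice: df = pd.read_csv("<path to your downloaded file>")
SAMPLE_CSV = """name,city,rating,votes
Cafe A,Delhi,4.1,120
Cafe B,Mumbai,,45
Cafe C,Delhi,3.6,
Cafe D,,4.4,300
"""


def basic_eda(df: pd.DataFrame) -> dict:
    """Collect the three basic EDA views: missing values, data types, summary statistics."""
    return {
        "missing": df.isna().sum(),              # count of missing values per column
        "dtypes": df.dtypes,                     # inferred data type per column
        "summary": df.describe(include="all"),   # stats for numeric and text columns
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
report = basic_eda(df)
print(report["missing"])
print(report["dtypes"])
print(report["summary"])
```

Pasting the three printed tables into your README, with a sentence or two on each, is usually enough for the initial observations.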
Note: You are required to complete any 3 of the 4 main tasks below.
Tasks (4)
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/zomato
Goal
Analyze restaurant and review data to extract insights on ratings, cuisines, location preferences and factors affecting ratings.
Requirements
- Data cleaning (handle text fields, missing values, currency conversions if needed)
- Explore relationships: cuisine vs rating, location hotspots, price vs rating
- Build visualizations: heatmaps, wordclouds for reviews or popular cuisines
- Provide 5 recommendations for an Alfido Tech-style platform (e.g., partnerships, content ideas)
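As one possible starting point for the cuisine vs rating requirement, the sketch below cleans a text rating field and splits multi-cuisine entries before averaging. The column names `cuisines` and `rating`, the toy rows, and the placeholder value "NEW" are assumptions for illustration; check the actual column names and value formats in the Zomato dataset before reusing it.

```python
import pandas as pd

# Toy rows with hypothetical column names; the real dataset's schema may differ.
df = pd.DataFrame({
    "cuisines": ["North Indian, Chinese", "Chinese", "Italian, Chinese", None],
    "rating": ["4.0", "3.0", "NEW", "4.5"],
})


def mean_rating_by_cuisine(frame: pd.DataFrame) -> pd.Series:
    """Average rating per individual cuisine, after cleaning the text fields."""
    out = frame.copy()
    # Non-numeric ratings (placeholder strings) become NaN and are dropped below.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    out = out.dropna(subset=["cuisines", "rating"])
    # Split comma-separated cuisines so each (restaurant, cuisine) pair gets its own row.
    out["cuisines"] = out["cuisines"].str.split(",")
    out = out.explode("cuisines")
    out["cuisines"] = out["cuisines"].str.strip()
    return out.groupby("cuisines")["rating"].mean().sort_values(ascending=False)


by_cuisine = mean_rating_by_cuisine(df)
print(by_cuisine)
```

Note that a restaurant listing several cuisines contributes its rating to each of them; whether that weighting suits your analysis is a judgment call worth stating in the report.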
Deliverables
- Notebook (Jupyter/Colab) with cleaned data & visualizations
- PDF report with key findings and recommendations
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/loan-approval-prediction-case-study
Goal
Build a supervised model to predict loan approval using borrower features. Focus on preprocessing, handling imbalance and evaluation.
Requirements
- Data preprocessing: missing values, encoding categorical variables, scaling
- Handle class imbalance (SMOTE, undersampling or class weights)
- Compare models (logistic regression, tree-based models) and report precision, recall, F1 and ROC-AUC
- Provide business-oriented interpretation of model outputs
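The requirements above could be wired together along these lines. This is a sketch under stated assumptions: the data is synthetic, the columns `income` and `employment` are hypothetical, and class weights are used as just one of the listed imbalance options (SMOTE or undersampling would need a different setup). Only logistic regression is shown; a tree-based model can be swapped into the `clf` step for comparison.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic, imbalanced stand-in data (assumption); column names are hypothetical.
rng = np.random.default_rng(0)
n = 400
income = rng.normal(50, 15, n)
employment = rng.choice(["salaried", "self_employed"], n)
y = (income + rng.normal(0, 10, n) > 65).astype(int)  # approvals are the minority class
X = pd.DataFrame({"income": income, "employment": employment})
X.loc[rng.choice(n, 20, replace=False), "income"] = np.nan  # inject missing values

# Preprocessing: impute + scale numeric columns, impute + one-hot encode categoricals.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["income"]),
    ("cat", categorical, ["employment"]),
])

# class_weight="balanced" reweights classes inversely to their frequency.
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Stratified split keeps the class ratio similar in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

metrics = {
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(metrics)
```

Keeping preprocessing inside the pipeline means imputation and scaling are fitted on the training split only, which avoids leaking test information into the model.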
Deliverables
- Notebook with modeling pipeline and metrics
- Short report discussing model trade-offs and suggested threshold for deployment
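For the suggested deployment threshold, one simple approach is to sweep a few candidate cut-offs over the predicted probabilities and compare precision, recall and F1 at each. The sketch below does this in plain Python; the labels, scores, candidate thresholds, and the choice of F1 as the selection criterion are all illustrative assumptions. A lender may well prefer a criterion based on the cost of a wrong approval versus a wrong rejection.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
    fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
    fn = sum(1 for t, s in zip(y_true, scores) if s < threshold and t == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def best_f1_threshold(y_true, scores, candidates):
    """Return (threshold, f1) for the candidate with the highest F1 score."""
    best = (None, -1.0)
    for th in candidates:
        p, r = precision_recall_at(y_true, scores, th)
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        if f1 > best[1]:
            best = (th, f1)
    return best


# Toy labels and predicted probabilities, made up for illustration.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

threshold, f1 = best_f1_threshold(y_true, scores, [0.2, 0.35, 0.5, 0.75])
print(threshold, round(f1, 3))
```

In the report, a small table of precision and recall per threshold usually communicates the trade-off better than a single chosen number.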
Dataset
https://www.kaggle.com/datasets/bhanupratapbiswas/instgram
Goal
Analyze Instagram posts and engagement data to identify the best posting times, the content types that drive high engagement, and signals of follower growth.
Requirements
- Parse dates/times and compute engagement metrics (e.g., likes and comments per follower)
- Analyze posting schedule, hashtags, and content types
- Recommend an optimal content calendar and 5 strategies to increase engagement for Alfido Tech
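The date parsing and engagement metric steps above might look something like this. The column names (`posted_at`, `likes`, `comments`, `followers`), the toy rows, and the definition of engagement rate as (likes + comments) / followers are assumptions for the example; the real dataset may use different fields, and other definitions of engagement are equally defensible.

```python
import pandas as pd

# Toy posts with hypothetical column names; adapt to the real dataset's schema.
posts = pd.DataFrame({
    "posted_at": ["2023-01-02 09:15", "2023-01-02 18:40",
                  "2023-01-03 09:05", "2023-01-04 18:10"],
    "likes": [100, 300, 140, 260],
    "comments": [10, 20, 10, 40],
    "followers": [1000, 1000, 1000, 1000],
})


def engagement_by_hour(df: pd.DataFrame) -> pd.Series:
    """Mean engagement rate per posting hour, highest first."""
    out = df.copy()
    out["posted_at"] = pd.to_datetime(out["posted_at"])  # parse text timestamps
    # Assumed metric: interactions per follower at posting time.
    out["engagement_rate"] = (out["likes"] + out["comments"]) / out["followers"]
    out["hour"] = out["posted_at"].dt.hour
    return out.groupby("hour")["engagement_rate"].mean().sort_values(ascending=False)


hourly = engagement_by_hour(posts)
best_hour = hourly.index[0]  # hour of day with the highest mean engagement rate
print(hourly)
print("Best hour:", best_hour)
```

The same grouping works for day of week (`.dt.day_name()`) or content type, which gives the raw material for a content calendar.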
Deliverables
- Notebook with analysis and visuals
- One-page strategy document with recommended posting plan
Goal
Choose a dataset or combine datasets to propose an actionable analytics solution for Alfido Tech (e.g., internship analytics, service demand forecasting).
Deliverables
- Notebook + README
- PPT or PDF summarizing findings and recommended actions
How to Submit Your Tasks
- For each task: Create a separate document (DOC, PDF) including notebook links, screenshots, charts and an executive summary (1 page).
- Upload artifacts: Push code & notebooks to GitHub and share repository links. Upload large files/models to Google Drive if needed.
- Submit links: Go to the Task Submission page and paste your links, clearly mentioning task numbers.
Tip: Keep notebooks tidy: use sections, comments, and a small README so reviewers can reproduce your results quickly.