Project Overview
Developed a privacy-preserving movie recommendation system using federated learning on the MovieLens dataset and compared its performance against a centralized baseline.
Key Contributions
- Implemented Neural Collaborative Filtering (NCF) model using PyTorch
- Applied federated learning using PySyft to preserve user privacy
- Conducted comparative analysis between federated and centralized approaches
Technical Highlights
Data Processing:
- Preprocessed MovieLens dataset, encoding user/movie IDs and normalizing ratings
- Implemented data splitting for training and testing
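The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the min-max scaling, the 80/20 random split, the helper names (`encode_ids`, `preprocess`), and the sample rows are all assumptions, since the summary does not say how ratings were normalized or how the split was made.

```python
import random

def encode_ids(values):
    """Map arbitrary IDs to contiguous 0-based indices in first-seen order."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping

def preprocess(rows, test_fraction=0.2, seed=0):
    """rows: list of (user_id, movie_id, rating) tuples.

    Assumption: ratings are min-max scaled to [0, 1] and the split is a
    seeded random shuffle; the original project may have done either differently.
    """
    user_index = encode_ids(r[0] for r in rows)
    movie_index = encode_ids(r[1] for r in rows)
    lo = min(r[2] for r in rows)
    hi = max(r[2] for r in rows)
    span = (hi - lo) or 1.0  # avoid division by zero if all ratings are equal
    encoded = [(user_index[u], movie_index[m], (r - lo) / span) for u, m, r in rows]
    rng = random.Random(seed)
    rng.shuffle(encoded)
    n_test = int(len(encoded) * test_fraction)
    return encoded[n_test:], encoded[:n_test], user_index, movie_index

# Hypothetical sample rows in (user_id, movie_id, rating) form.
sample = [(196, 242, 3.0), (186, 302, 3.0), (22, 377, 1.0), (196, 51, 5.0), (244, 51, 2.0)]
train, test, users, movies = preprocess(sample)
```

Contiguous indices matter because embedding tables are addressed by row number, so raw MovieLens IDs cannot be used directly.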
Model Architecture:
- Developed NCF model combining Matrix Factorization and Multi-Layer Perceptron
- Implemented separate embedding layers for enhanced feature learning
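One plausible reading of the architecture above is the NeuMF-style layout, in which the Matrix Factorization branch and the MLP branch each get their own user and movie embeddings. The sketch below assumes that reading; the class name, embedding sizes, hidden widths, and sigmoid output (which presumes ratings scaled to [0, 1]) are illustrative choices, not details taken from the project.

```python
import torch
from torch import nn

class NCF(nn.Module):
    """NeuMF-style sketch: a GMF branch and an MLP branch with separate embeddings."""

    def __init__(self, n_users, n_movies, mf_dim=8, mlp_dim=16, hidden=(32, 16)):
        super().__init__()
        # Separate embedding tables per branch (assumed reading of the summary).
        self.mf_user = nn.Embedding(n_users, mf_dim)
        self.mf_movie = nn.Embedding(n_movies, mf_dim)
        self.mlp_user = nn.Embedding(n_users, mlp_dim)
        self.mlp_movie = nn.Embedding(n_movies, mlp_dim)
        layers, in_dim = [], 2 * mlp_dim
        for out_dim in hidden:
            layers += [nn.Linear(in_dim, out_dim), nn.ReLU()]
            in_dim = out_dim
        self.mlp = nn.Sequential(*layers)
        # Final layer sees the concatenated outputs of both branches.
        self.out = nn.Linear(mf_dim + in_dim, 1)

    def forward(self, users, movies):
        # Matrix Factorization branch: element-wise product of embeddings.
        mf = self.mf_user(users) * self.mf_movie(movies)
        # MLP branch: concatenated embeddings passed through dense layers.
        mlp_in = torch.cat([self.mlp_user(users), self.mlp_movie(movies)], dim=-1)
        mlp = self.mlp(mlp_in)
        logits = self.out(torch.cat([mf, mlp], dim=-1))
        # Sigmoid assumes targets were normalized to [0, 1].
        return torch.sigmoid(logits).squeeze(-1)

model = NCF(n_users=4, n_movies=4)
users = torch.tensor([0, 1, 2])
movies = torch.tensor([3, 0, 1])
preds = model(users, movies)  # one predicted (normalized) rating per pair
```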
Federated Learning Implementation:
- Used PySyft to simulate a distributed training environment
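The aggregation step at the core of a federated setup can be illustrated without PySyft, whose API differs considerably between versions. The sketch below is a plain-Python simulation of federated averaging (FedAvg) under stated assumptions: a two-parameter linear model stands in for NCF, each client takes one local gradient step per round, and the function names and toy data are hypothetical. It is not the project's PySyft code.

```python
def fed_avg(client_params, client_sizes):
    """Sample-weighted average of per-client parameter vectors."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

def local_step(params, data, lr=0.1):
    """One gradient step of least squares for y = w*x + b on one client's data."""
    w, b = params
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]

# Two simulated clients; raw data never leaves a client, only parameters do.
# Toy data generated from y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
global_params = [0.0, 0.0]
for _ in range(300):  # communication rounds
    updates = [local_step(global_params, data) for data in clients]
    global_params = fed_avg(updates, [len(d) for d in clients])
```

Running more local steps per round would reduce communication at the cost of clients drifting apart between aggregations, which is the trade-off between communication frequency and computation that any federated setup has to tune.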
Key Findings
- Federated NCF (RMSE: 1.048) came within about 9% of centralized NCF (RMSE: 0.9619), an absolute gap of roughly 0.086
- Larger batch sizes improved federated learning performance
- Complex models (NCF) outperformed simpler ones (Matrix Factorization) in federated settings
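For reference, RMSE and the gap between the two reported scores can be computed as follows. The `rmse` helper is a generic definition, not the project's evaluation code; the two score values are the ones reported above.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))

# Scores reported in the findings above.
federated_rmse = 1.048
centralized_rmse = 0.9619

gap = federated_rmse - centralized_rmse   # absolute difference, about 0.086
relative_gap = gap / centralized_rmse     # about 0.09, i.e. roughly 9% higher error
```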
Challenges Overcome
- Balanced communication frequency and computation in federated setup
- Managed memory constraints in PySyft simulations
- Debugged distributed learning environment