About Me
I’m a data-driven professional with a passion for transforming complexity into insight through analytical thinking, strategic problem-solving, and scalable, technology-driven solutions. My approach is guided by curiosity and a drive for continuous learning, allowing me to adapt quickly, explore emerging tools, and stay aligned with the evolving world of data and technology.
With a strong foundation in data analytics, machine learning, and scalable system architecture, I approach problems from multiple perspectives — blending structured analysis with creative reasoning to deliver solutions that are not only technically sound but optimized for real-world impact. I enjoy breaking down complexity, identifying what truly matters, and building solutions that are both forward-thinking and grounded in practical application.
My experience spans the full data lifecycle: from data collection and preprocessing to modeling, deployment, and visualization. I’ve worked on projects involving predictive modeling, natural language processing, time-series forecasting, and real-time dashboarding — applying tools such as Python, R, SQL, Apache Spark, Databricks, TensorFlow, Keras, Tableau, and Flask. I also bring hands-on experience with relational and NoSQL databases, including MySQL, PostgreSQL, and MongoDB, and have deployed solutions on cloud platforms like AWS and GCP.
Whether I’m leading a project or contributing within a collaborative team, I bring clarity, critical thinking, and a strong sense of ownership to everything I do. I’m currently open to opportunities where I can contribute as a Data Analyst, Data Engineer, or in any role that allows me to apply my skills to solve meaningful, real-world problems through data and innovation.
Technical Skills
Programming
Big Data
Tools: Apache Airflow, Google BigQuery
NLP Techniques
Cloud Platforms
Machine Learning
Data Analysis
Professional Experience
Associate Data Analyst
InfiStat Analytics, Pune
Consumer Segmentation & Product Recommendation System: Developed customer segmentation models using K-Means clustering and market basket analysis (Apriori algorithm) for a retail chain. Engineered RFM (Recency, Frequency, Monetary) features from transactional data, resulting in 18% improvement in average basket value through targeted campaigns.
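For readers unfamiliar with RFM, the feature engineering step mentioned above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the project's actual code: the `TRANSACTIONS` sample, the `rfm_features` helper, and the `as_of` reference date are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("C1", date(2024, 1, 5), 40.0),
    ("C1", date(2024, 3, 1), 60.0),
    ("C2", date(2024, 2, 10), 25.0),
]


def rfm_features(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    recency_days: days between the customer's latest purchase and as_of
    frequency:    number of purchases
    monetary:     total amount spent
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)

    for customer_id, purchase_date, amount in transactions:
        previous = last_purchase.get(customer_id)
        # Track the most recent purchase date per customer.
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount

    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# as_of is an assumed snapshot date for the analysis.
features = rfm_features(TRANSACTIONS, as_of=date(2024, 3, 31))
```

In a workflow like the one described, per-customer tuples of this kind would typically be scaled and then passed to a clustering step such as K-Means.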
Data Analyst Intern
MedScope Analytics, Bengaluru
Patient Readmission Risk Prediction: Built machine learning models (Logistic Regression, Random Forest) predicting 30-day hospital readmissions with 85% accuracy. Processed EHR data, engineered clinical features, and created Tableau dashboards that helped reduce readmission rates by 12% through early intervention strategies.
Data Analyst (Freelance)
Urban Insights India
Crime Analysis Initiative: Conducted geospatial analysis of crime patterns using Python and SQL, identifying high-risk zones that helped optimize police patrol routes. Created interactive dashboards that improved law enforcement resource allocation by 22% in pilot districts.
Database Developer
College Collaboration Project
Library Database Management System: Designed and implemented a normalized relational database (3NF) with advanced SQL queries, stored procedures, and triggers. The system automated 90% of manual processes for a public library, reducing operational errors by 65%.
Featured Projects
Credit Score Prediction
Developed a credit scoring system using ML/DL techniques with 83% accuracy. Processed data with KNN imputation and chi-squared feature selection. Compared 8 algorithms (including XGBoost, Random Forest, and Neural Networks), with Random Forest achieving the best performance. Identified key predictive features such as annual income and payment history. Deployed as a Flask web application with model interpretability features.
NYC Taxi Fare Prediction
Built scalable fare prediction system processing millions of rides using Apache Spark/MLlib on Databricks. Integrated NoSQL for real-time data access and LLMs for advanced feature extraction. Identified key cost drivers like trip distance and time of day. System enables dynamic pricing strategies and anomaly detection with distributed computing architecture for high-volume predictions.
Toxic Comment Detection
NLP system detecting harmful content with bias mitigation. Processed text with BoW and LSTM networks, achieving high accuracy while reducing false positives on identity terms. Special preprocessing for fairness with evaluation using MSE. Revealed patterns in language toxicity and improved classification of sensitive terms. Future work includes ensemble modeling and transfer learning integration.
Solar Energy Siting Optimization
Identified optimal solar installation sites in NY using soil analysis. Conducted EDA and regression on NYSERDA data to assess texture, drainage, and productivity. Classified land into low-impact zones, revealing ideal sites with minimal agricultural disruption. Provides policymakers with data-driven recommendations for sustainable energy development aligned with land conservation.
Education
M.S. in Data Analytics Engineering
George Mason University
Specialized program combining data analytics, machine learning engineering, and big data systems. Focus areas include:
- Data Analytics: Business intelligence, exploratory data analysis, and visualization (Tableau, PowerBI)
- Machine Learning: End-to-end pipeline development from data cleaning to model deployment
- Big Data: Distributed computing with Spark, Hadoop, and cloud platforms (AWS, GCP)
- Advanced Analytics: Time series forecasting, NLP, and optimization techniques
B.Tech in Mechanical Engineering
Anil Neerukonda Institute of Technology & Sciences
Gained strong foundations in computational problem-solving through:
- System dynamics modeling and simulation
- Statistical analysis of mechanical systems
- CAD/CAM programming and automation
- Experimental data collection and analysis
Get In Touch
I'm currently open to new opportunities and interesting projects. Feel free to reach out if you'd like to collaborate or just say hello!