About Me

I’m a data professional who turns complex data into clear insight through analytical thinking, strategic problem-solving, and scalable, technology-driven solutions. Curiosity and a commitment to continuous learning help me adapt quickly, explore emerging tools, and keep pace with changes in data and technology.

With a strong foundation in data analytics, machine learning, and scalable system architecture, I approach problems from multiple perspectives, blending structured analysis with creative reasoning to deliver solutions that are technically sound and optimized for real-world impact. I enjoy breaking down complexity, identifying what truly matters, and building solutions that are both forward-thinking and practical.

My experience spans the full data lifecycle, from data collection and preprocessing to modeling, deployment, and visualization. I’ve worked on projects involving predictive modeling, natural language processing, time-series forecasting, and real-time dashboarding, using tools such as Python, R, SQL, Apache Spark, Databricks, TensorFlow, Keras, Tableau, and Flask. I also bring hands-on experience with relational and NoSQL databases, including MySQL, PostgreSQL, and MongoDB, and have deployed solutions on cloud platforms such as AWS and GCP.

Whether I’m leading a project or contributing within a collaborative team, I bring clarity, critical thinking, and a strong sense of ownership to everything I do. I’m currently open to opportunities where I can contribute as a Data Analyst, Data Engineer, or in any role that allows me to apply my skills to solve meaningful, real-world problems through data and innovation.

Technical Skills

Programming

Python
SQL
R

Big Data

Apache Spark
PySpark
Hadoop
Spark SQL
HDFS
Databricks

Tools

Apache Airflow
Google BigQuery

NLP Techniques

BoW
Word2Vec
NER
Text Classification
Sentiment Analysis
LLM Prompting

Cloud Platforms

AWS Certified
EC2
S3
BigQuery

Machine Learning

Scikit-learn
TensorFlow
XGBoost

Data Analysis

Pandas
NumPy
Tableau
Power BI
Feature Engineering
RapidMiner
Weka

Professional Experience

Apr 2024 - Present

Associate Data Analyst

InfiStat Analytics, Pune

Consumer Segmentation & Product Recommendation System: Developed customer segmentation models using K-Means clustering and market basket analysis (Apriori algorithm) for a retail chain. Engineered RFM (Recency, Frequency, Monetary) features from transactional data, resulting in an 18% improvement in average basket value through targeted campaigns.
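
For illustration, RFM features of the kind mentioned above can be derived roughly as sketched below. This is a minimal standard-library example, not the project's code: the `rfm_features` function, the tuple layout, and the sample transactions are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    `transactions` is an iterable of (customer_id, purchase_date, amount)
    tuples; this layout is an assumption made for the example.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        frequency[customer_id] += 1          # number of purchases
        monetary[customer_id] += amount      # total spend
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
    return {
        cid: (
            (as_of - last_purchase[cid]).days,  # days since last purchase
            frequency[cid],
            round(monetary[cid], 2),
        )
        for cid in last_purchase
    }


# Hypothetical sample data, not taken from the project.
transactions = [
    ("C1", date(2024, 5, 1), 40.0),
    ("C1", date(2024, 5, 20), 25.5),
    ("C2", date(2024, 4, 10), 120.0),
]
features = rfm_features(transactions, as_of=date(2024, 6, 1))
print(features)
```

The three values per customer would typically be scaled before being passed to a clustering algorithm such as K-Means, since the monetary column otherwise dominates the distance calculation.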

Python, Tableau, K-Means, Apriori

Jan 2024 - Mar 2024

Data Analyst Intern

MedScope Analytics, Bengaluru

Patient Readmission Risk Prediction: Built machine learning models (Logistic Regression, Random Forest) predicting 30-day hospital readmissions with 85% accuracy. Processed EHR data, engineered clinical features, and created Tableau dashboards that helped reduce readmission rates by 12% through early intervention strategies.

Machine Learning, Healthcare Analytics, Feature Engineering, Tableau

Jan 2024 - Mar 2024

Data Analyst (Freelance)

Urban Insights India

Crime Analysis Initiative: Conducted geospatial analysis of crime patterns using Python and SQL, identifying high-risk zones that helped optimize police patrol routes. Created interactive dashboards that improved law enforcement resource allocation by 22% in pilot districts.

Geospatial Analysis, Data Visualization, Python, SQL

May 2024

Database Developer

College Collaboration Project

Library Database Management System: Designed and implemented a normalized relational database (3NF) with advanced SQL queries, stored procedures, and triggers. The system automated 90% of manual processes for a public library, reducing operational errors by 65%.

Database Design, SQL, Normalization, ER Modeling

Featured Projects

Credit Score Prediction

Developed a robust credit scoring system using ML/DL techniques with 83% accuracy. Processed data with KNN imputation and chi-squared feature selection. Compared 8 algorithms (including XGBoost, Random Forest, and Neural Networks), with Random Forest achieving the best performance. Identified key predictive features such as annual income and payment history. Deployed as a Flask web application with model interpretability features.
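
As a rough illustration of chi-squared feature selection, the sketch below computes the textbook Pearson chi-squared statistic between each categorical feature and the class label, then ranks features by score. It is a standard-library example under stated assumptions: the `chi_squared` helper, the feature names, and the toy data are hypothetical, and the project may well have used a library implementation instead.

```python
from collections import Counter


def chi_squared(feature_values, labels):
    """Pearson chi-squared statistic for two categorical sequences."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data, not taken from the project.
labels = ["good", "good", "bad", "bad"]
candidate_features = {
    "pays_on_time": ["yes", "yes", "no", "no"],   # aligned with the label
    "region": ["east", "west", "east", "west"],   # unrelated to the label
}

scores = {name: chi_squared(values, labels)
          for name, values in candidate_features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A higher score means the feature's distribution differs more across classes, so the top-ranked features are kept and the rest dropped.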

Machine Learning, Flask, XGBoost, Feature Engineering

NYC Taxi Fare Prediction

Built a scalable fare prediction system processing millions of rides using Apache Spark/MLlib on Databricks. Integrated a NoSQL store for real-time data access and LLMs for advanced feature extraction. Identified key cost drivers such as trip distance and time of day. The system supports dynamic pricing strategies and anomaly detection, using a distributed computing architecture for high-volume predictions.

Apache Spark, Big Data, Databricks, NoSQL

Toxic Comment Detection

Built an NLP system that detects harmful content with bias mitigation. Processed text with BoW and LSTM networks, achieving high accuracy while reducing false positives on identity terms. Applied fairness-oriented preprocessing and evaluated the model using MSE. Revealed patterns in language toxicity and improved classification of sensitive terms. Future work includes ensemble modeling and transfer learning integration.
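
For readers unfamiliar with the bag-of-words (BoW) representation mentioned above, here is a minimal standard-library sketch. The tokenizer, helper names, and sample comments are hypothetical and chosen only for illustration; they are not the project's preprocessing pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def bow_vector(text, vocabulary):
    """Count how often each vocabulary token occurs in the text."""
    counts = Counter(tokenize(text))
    # The dict preserves insertion order, which matches the column indices.
    return [counts.get(tok, 0) for tok in vocabulary]


# Hypothetical sample comments.
documents = ["You are great", "you are awful awful"]
vocabulary = build_vocabulary(documents)
vectors = [bow_vector(doc, vocabulary) for doc in documents]
print(vocabulary)
print(vectors)
```

BoW discards word order, which is one reason it is often paired with a sequence model such as an LSTM when context matters.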

NLP, LSTM, Bias Mitigation, Text Analysis

Solar Energy Siting Optimization

Identified optimal solar installation sites in NY using soil analysis. Conducted EDA and regression on NYSERDA data to assess texture, drainage, and productivity. Classified land into low-impact zones, revealing ideal sites with minimal agricultural disruption. Provides policymakers with data-driven recommendations for sustainable energy development aligned with land conservation.

Data Analysis, Geospatial, Sustainability, Regression

Education

2024 - Present

M.S. in Data Analytics Engineering

George Mason University

Specialized program combining data analytics, machine learning engineering, and big data systems. Focus areas include:

  • Data Analytics: Business intelligence, exploratory data analysis, and visualization (Tableau, Power BI)
  • Machine Learning: End-to-end pipeline development from data cleaning to model deployment
  • Big Data: Distributed computing with Spark, Hadoop, and cloud platforms (AWS, GCP)
  • Advanced Analytics: Time series forecasting, NLP, and optimization techniques
Developing production-grade analytical solutions using Python, SQL, and modern data stacks.

Data Warehousing, Predictive Analytics, ETL Pipelines, Cloud Computing

2019 - 2023

B.Tech in Mechanical Engineering

Anil Neerukonda Institute of Technology & Sciences

Gained strong foundations in computational problem-solving through:

  • System dynamics modeling and simulation
  • Statistical analysis of mechanical systems
  • CAD/CAM programming and automation
  • Experimental data collection and analysis

MATLAB/Python, Computational Fluid Dynamics, Finite Element Analysis

Get In Touch

I'm currently open to new opportunities and interesting projects. Feel free to reach out if you'd like to collaborate or just say hello!

Email

rkarumaj@gmu.edu

Phone

+1 (732) 971-1339

Schedule Meeting

Available Mon-Fri, 9am-5pm EST