Roadmap to Data Science
Apr 28, 2023
Data science tasks in order of increasing complexity
- Fundamental knowledge
- Basic mathematics (linear algebra, calculus, probability, and statistics)
- Programming languages: Python
- Database management tools: SQL
- Interactive visualization tools: Power BI and Tableau
- MS Excel: data manipulation and visualization
- Data manipulation Python libraries: NumPy, Pandas
- Data visualization Python libraries: Matplotlib, Seaborn
- Other libraries: Plotly, ggplot2 (R), Dash
- Some exploratory data analysis project steps
- Exploratory data analysis techniques
- Data cleaning
- Handling missing data
- Data scaling and normalization
- Outlier detection and treatment
- Machine learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Model evaluation and validation
- Deep learning
- Neural networks
- Convolutional neural networks (CNNs)
- Recurrent neural networks (RNNs)
- Long short-term memory (LSTM) and gated recurrent units (GRUs)
- Generative adversarial networks (GANs)
- Some advanced topics
- Natural Language Processing (NLP): text processing, word embeddings (Word2Vec, GloVe), recurrent neural networks for NLP
- Time Series Analysis (Time-series Decomposition, Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA, Exponential Smoothing Methods, Prophet)
- Bayesian Statistics and Probabilistic Programming (Bayesian Inference, Markov Chain Monte Carlo (MCMC))
- Big Data Technologies
- Hadoop
- MapReduce
- HDFS
- Spark (RDDs, DataFrames, MLlib)
- NoSQL databases (MongoDB, Cassandra, HBase, Couchbase)
- Stream Processing frameworks (Apache Kafka, Apache Flink, Apache Storm)
(Credit: Waleed Mousa)
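
To make the exploratory data analysis steps in the roadmap more concrete, here is a minimal sketch using Pandas and NumPy. The toy DataFrame and its column names (`age`, `income`) are invented for illustration. Median imputation, the 1.5 × IQR rule, and dropping flagged rows are just one common set of choices, not the only way to do each step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing age and one obviously extreme income.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38, 29],
    "income": [42_000, 55_000, 48_000, 61_000, 1_000_000, 51_000],
})

# Handling missing data: fill the gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Outlier detection (IQR rule): flag values more than 1.5 * IQR
# below the first quartile or above the third quartile.
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

# Outlier treatment: in this sketch we simply drop the flagged rows.
clean = df.loc[~is_outlier].copy()

# Scaling and normalization: min-max scaling to [0, 1] and z-score standardization.
income = clean["income"]
clean["income_minmax"] = (income - income.min()) / (income.max() - income.min())
clean["income_zscore"] = (income - income.mean()) / income.std()

print(clean)
```

Whether to drop, cap, or keep outliers depends on the data and the question being asked, so treat the drop step above as a placeholder for that decision.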