Roadmap to Data Science

Dr. Arun Kumar Pandey (Ph.D.)
Apr 28, 2023


Data science topics, in increasing order of complexity

  1. Fundamental knowledge
    - Basic mathematics (linear algebra, calculus, probability, and statistics)
    - Programming language: Python
    - Database querying: SQL
    - Interactive visualization tools: Power BI and Tableau
    - Microsoft Excel
  2. Data manipulation and visualization
    - Python libraries for data manipulation: NumPy, Pandas
    - Python libraries for data visualization: Matplotlib, Seaborn
    - Other libraries: Plotly, ggplot2 (R), Dash
  3. Exploratory data analysis (EDA) project steps
    - Exploratory data analysis techniques
    - Data cleaning
    - Handling missing data
    - Data scaling and normalization
    - Outlier detection and treatment
  4. Machine learning
    - Supervised learning
    - Unsupervised learning
    - Reinforcement learning
    - Model evaluation and validation
  5. Deep learning
    - Neural networks
    - Convolutional neural networks (CNNs)
    - Recurrent neural networks (RNNs)
    - Long short-term memory (LSTM) and gated recurrent units (GRUs)
    - Generative adversarial networks (GANs)
  6. Advanced topics
    - Natural language processing (NLP): text processing, word embeddings (Word2Vec, GloVe), recurrent neural networks for NLP
    - Time series analysis: time-series decomposition, autoregressive integrated moving average (ARIMA), seasonal ARIMA, exponential smoothing methods, Prophet
    - Bayesian statistics and probabilistic programming: Bayesian inference, Markov chain Monte Carlo (MCMC)
  7. Big Data Technologies
    - Hadoop
    - MapReduce
    - HDFS
    - Spark (RDDs, DataFrames, MLlib)
    - NoSQL databases (MongoDB, Cassandra, HBase, Couchbase)
    - Stream processing frameworks (Apache Kafka, Apache Flink, Apache Storm)

(Credit: Waleed Mousa)
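To make step 3 of the roadmap concrete, here is a minimal sketch of its cleaning steps applied to one numeric column. It covers handling missing data, outlier detection and treatment, and scaling/normalization. It rests on several assumptions that the roadmap does not prescribe:

- The toy data is made up.
- Missing values are filled with the median.
- Outliers are flagged with Tukey's 1.5 × IQR rule and then clipped to the fences.
- Scaling is shown as both min-max scaling and z-score standardization.

Only the Python standard library is used so the example runs anywhere. With the step 2 libraries you would typically reach for Pandas methods such as `Series.fillna`, `Series.quantile`, and `Series.clip` instead.

```python
from statistics import mean, median, quantiles, stdev

# Toy column (made-up numbers): one missing entry (None) and one extreme value.
raw = [12.0, 15.0, None, 14.0, 13.0, 95.0, 16.0]


def impute_missing(values):
    """Handling missing data (assumed strategy): replace None with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences from Tukey's rule: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Outlier detection: indices of values lying outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def cap_outliers(values, k=1.5):
    """Outlier treatment (one option among several): clip values to the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Scaling: map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardization: subtract the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


cleaned = impute_missing(raw)    # the None becomes the median, 14.5
flagged = iqr_outliers(cleaned)  # [5], i.e. the 95.0
capped = cap_outliers(cleaned)   # 95.0 is clipped to the upper fence
scaled = min_max_scale(capped)
standardized = z_score(capped)
print(cleaned, flagged, capped, scaled, standardized, sep="\n")
```

The fences are computed after imputation, so the filled-in value takes part in the quartiles. Clipping keeps the row, whereas dropping it would shrink the dataset; which treatment is appropriate depends on whether the extreme value is an error or a genuine observation.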
