Introduction

Deep learning is a subset of machine learning that focuses on artificial neural networks with many layers, enabling models to learn hierarchical representations of data. These networks are composed of layers of interconnected nodes, called neurons. Deep learning algorithms excel at processing complex, high-dimensional data such as images, speech, and text, and in recent years they have achieved state-of-the-art performance in domains including computer vision, natural language processing, and speech recognition.

Key Differences Between Machine Learning and Deep Learning

If deep learning is a subset of machine learning, how do the two differ? Deep learning distinguishes itself from classical machine learning by the type of data it works with and the methods by which it learns.

Classical machine learning algorithms typically rely on structured, labeled data to make predictions: specific features are defined from the input data and organized into tables. This does not mean they never use unstructured data; it means that unstructured data generally goes through some pre-processing to organize it into a structured format first.

Deep learning eliminates some of the data pre-processing that is typically involved with machine learning. These algorithms can ingest and process unstructured data, like text and images, and they automate feature extraction, removing some of the dependency on human experts.

Example: suppose we had a set of photos of different pets and wanted to categorize them as “cat”, “dog”, “hamster”, et cetera. A deep learning algorithm can determine on its own which features (e.g. ears) are most important for distinguishing one animal from another; in classical machine learning, this hierarchy of features is established manually by a human expert. Then, through gradient descent and backpropagation, the deep learning model adjusts its weights to reduce its error, allowing it to make predictions about a new photo of an animal with increasing accuracy.
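
The weight-adjustment step can be made concrete with a minimal sketch: a single sigmoid neuron trained by gradient descent on one labeled example. The feature values, label, and learning rate below are invented placeholders purely for illustration.

```python
# Minimal sketch of gradient descent on a single sigmoid neuron.
# Features, label, and learning rate are invented placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.7, 0.1])   # hypothetical input features for one photo
y = 1.0                    # target label: 1 = "dog", 0 = "cat"
w = np.array([0.2, -0.3])  # current weights
b = 0.0                    # bias
lr = 0.5                   # learning rate

for step in range(100):
    y_hat = sigmoid(w @ x + b)  # forward pass: weighted sum, then activation
    error = y_hat - y           # gradient of the cross-entropy loss w.r.t. the pre-activation
    w -= lr * error * x         # backpropagated update: nudge the weights downhill
    b -= lr * error             # and the bias

print(f"prediction after training: {sigmoid(w @ x + b):.3f}")  # approaches 1.0
```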

Deep Learning Model Classification

Deep learning models can be classified into several categories based on their architecture and the type of data they are designed to process. Here are some common types of deep learning models along with examples:

  1. Convolutional Neural Networks (CNNs): CNNs are a class of deep learning models primarily used for image and video processing (computer vision) tasks. They excel at learning spatial hierarchies and capturing local patterns. CNNs consist of convolutional layers that apply filters to input images, enabling feature extraction; these filters detect patterns such as edges, textures, and shapes. Following the convolutional layers, pooling layers are often used to downsample the data, reducing dimensionality while preserving important features. Finally, fully connected layers are employed for classification, combining the extracted features to make predictions. CNNs have achieved breakthrough performance in computer vision tasks including image classification, object detection, and image segmentation. Example: the VGG (Visual Geometry Group) network, which achieved high performance on image classification tasks; another is the popular ResNet (Residual Network), known for its deep architecture and efficient training.
  2. Recurrent Neural Networks (RNNs): RNNs are a class of deep learning models designed for processing sequential data, such as time series or natural language. Their architecture incorporates recurrent connections that form loops, allowing information to persist from one time step to the next, so the network maintains an internal memory of past inputs and can capture temporal dependencies. This makes RNNs well suited to tasks like speech recognition, language modeling, and machine translation. Example: the Long Short-Term Memory (LSTM) network, which addresses the vanishing gradient problem of plain RNNs and is widely used for speech recognition, language translation, and text generation.
  3. Generative Adversarial Networks (GANs): GANs consist of two neural networks, a generator and a discriminator, competing against each other. The generator learns to create realistic data instances, such as images, while the discriminator tries to distinguish generated data from real data. GANs are popular for generating synthetic data, image synthesis, image-to-image translation, and even creating deepfake videos. Example: the DCGAN (Deep Convolutional Generative Adversarial Network), which extends the GAN architecture with convolutional layers and is used for generating high-quality images and for video synthesis.
  4. Autoencoders: Autoencoders are neural networks trained to reconstruct their input, typically used for dimensionality reduction and feature learning. They consist of an encoder network that compresses the input into a lower-dimensional representation (encoding) and a decoder network that reconstructs the input from that encoding. Example: the Variational Autoencoder (VAE), which adds a probabilistic interpretation to the latent space learned by the encoder; VAEs are used for generating new data samples and performing data imputation.
  5. Transformer Models: Transformer models are designed for processing sequential data, particularly suited for natural language processing tasks. They rely on self-attention mechanisms to capture long-range dependencies in input sequences without recurrent connections. Example: The Transformer architecture, introduced in the paper “Attention is All You Need”, is widely used in state-of-the-art NLP models such as BERT (Bidirectional Encoder Representations from Transformers) for tasks like text classification, question answering, and language translation.
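
As a concrete illustration of the self-attention idea, here is a minimal sketch of a single Transformer encoder block written with TensorFlow/Keras (the toolkit is an assumed choice, and every size below is an arbitrary placeholder rather than a value from the paper).

```python
# Minimal Transformer encoder block sketch in Keras: self-attention followed by a
# position-wise feed-forward network, each wrapped in a residual connection + layer norm.
# All sizes below are arbitrary placeholders.
import tensorflow as tf
from tensorflow.keras import layers

def transformer_encoder_block(seq_len=32, embed_dim=64, num_heads=4, ff_dim=128):
    inputs = layers.Input(shape=(seq_len, embed_dim))
    # Self-attention: every position can attend to every other position.
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(inputs, inputs)
    x = layers.LayerNormalization()(layers.Add()([inputs, attn]))
    # Position-wise feed-forward network.
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dense(embed_dim)(ff)
    outputs = layers.LayerNormalization()(layers.Add()([x, ff]))
    return tf.keras.Model(inputs, outputs)

block = transformer_encoder_block()
block.summary()
```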

Neural Networks

Neural networks are a computational model inspired by the structure and function of biological neural networks in the human brain. They consist of interconnected nodes, called neurons, organized in layers. Each neuron receives input signals, performs a computation, and produces an output signal. Neural networks are capable of learning complex patterns and relationships in data, making them powerful tools for various machine-learning tasks.

Here’s the core idea:

  • Neurons: The basic units of computation. In an artificial neural network, these are usually simple mathematical functions. They receive inputs, process them, and produce an output.
  • Connections: Just as neurons in the brain are connected by synapses, artificial neurons are linked to one another. Each connection has a weight that represents its importance.
  • Learning: The power of neural networks lies in their ability to adjust the weights of these connections based on training data. This allows them to learn complex patterns and relationships.

Types of Neural Network Diagrams

  1. Simple Feedforward Network:
  • Structure: Neurons arranged in layers. Data flows in one direction, from input to output.
  • Layers:
    • Input Layer: Takes the raw data.
    • Hidden Layers: One or more layers where computation happens.
    • Output Layer: Generates the final prediction or classification.
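
A minimal sketch of this input/hidden/output structure in Keras might look as follows; the 20-feature input and 3-class softmax output are placeholder assumptions.

```python
# Minimal feedforward (fully connected) network in Keras.
# Input width (20 features) and output width (3 classes) are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(20,)),              # input layer: raw feature vector
    layers.Dense(64, activation="relu"),    # hidden layer 1
    layers.Dense(32, activation="relu"),    # hidden layer 2
    layers.Dense(3, activation="softmax"),  # output layer: class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```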

2. Convolutional Neural Network (CNN):

  • Specialization: Highly effective for image and video analysis.
  • Convolutional Layers: Extract features from images using filters. Think of these filters as sliding windows searching for edges, shapes, textures, etc.
  • Pooling Layers: Reduce data size and make the network more robust to slight variations in the input.
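
Put together, the convolution/pooling pattern can be sketched in Keras roughly as below, assuming 28x28 grayscale images and 10 output classes (both placeholder choices).

```python
# Minimal CNN sketch: stacked convolution + pooling, then a dense classifier head.
# The 28x28x1 input and 10 classes are placeholder assumptions (MNIST-sized).
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, kernel_size=3, activation="relu"),  # filters slide over the image, extracting local features
    layers.MaxPooling2D(pool_size=2),                     # downsample, keeping the strongest responses
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),               # fully connected classification head
])
model.summary()
```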

3. Recurrent Neural Network (RNN):

  • Memory: Designed to handle sequential data (text, time series). These networks have internal memory (hidden states) which allows them to ‘remember’ past information.
  • Common Architectures: LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks are more advanced types of RNNs designed to address potential shortcomings in the plain RNN.
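
For instance, a tiny LSTM text classifier in Keras could be sketched as follows; the vocabulary size, sequence length, and binary output are assumptions made only for the example.

```python
# Minimal LSTM sketch for sequence classification.
# Vocabulary size (10,000), sequence length (100), and the binary output are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(100,), dtype="int32"),          # a sequence of token ids
    layers.Embedding(input_dim=10_000, output_dim=64),  # map token ids to dense vectors
    layers.LSTM(64),                                    # hidden state carries context across time steps
    layers.Dense(1, activation="sigmoid"),              # e.g. positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```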

4. Autoencoders: Used for dimensionality reduction and data compression. They consist of an encoder and decoder; the encoder learns to compress data into a compact representation, while the decoder tries to reconstruct the original input from this compressed form.
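
A minimal encoder/decoder pair in Keras, assuming flattened 784-pixel inputs and a 32-dimensional bottleneck (placeholder sizes), could look like this:

```python
# Minimal autoencoder sketch: compress to a small code, then reconstruct the input.
# The 784-pixel input and 32-dimensional bottleneck are placeholder sizes.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(784,))
encoded = layers.Dense(32, activation="relu")(inputs)       # encoder: compress to 32 dimensions
decoded = layers.Dense(784, activation="sigmoid")(encoded)  # decoder: reconstruct the original input

autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")           # trained to reproduce its own input
autoencoder.summary()
```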

5. Generative Adversarial Networks (GANs): Two neural networks playing a game. One network (generator) produces synthetic data, while the other (discriminator) tries to distinguish real data from the generated data. This leads to models capable of creating incredibly realistic fake images, audio, and even text.
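
Structurally, the two players can be sketched in Keras as below (definitions only, no training loop; the noise size, image size, and layer widths are illustrative assumptions).

```python
# Structural GAN sketch: a generator that maps noise to fake 28x28 images and a
# discriminator that scores real vs. fake. Sizes are illustrative placeholders;
# the adversarial training loop is omitted.
import tensorflow as tf
from tensorflow.keras import layers

generator = tf.keras.Sequential([
    layers.Input(shape=(100,)),                   # random noise vector
    layers.Dense(7 * 7 * 64, activation="relu"),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid"),  # 28x28 "fake" image
])

discriminator = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),        # probability that the input is real
])
```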

Key Points:

  • Neural networks are incredibly flexible — the right architecture depends on the problem you’re solving.
  • Diagrams make a complex topic visual and easier to grasp.
  • Deep learning utilizes multiple layers of neurons, which is where the ‘deep’ comes from.

Why Deep Learning Is So Effective

  • Learning Complex Patterns: Deep learning systems can discover intricate patterns in huge datasets too complex for humans to spot.
  • Generalization: They generalize well to new, unseen data because these complex patterns help them make intelligent inferences.
  • Feature Engineering: Traditionally, extracting meaningful features from data was labor-intensive. Deep learning automates much of this, allowing the model to discover helpful representations.

Understanding the Neural Network Jargon

Here is the important jargon in the world of neural networks:

  • Neuron (or Perceptron): The basic computational unit in a neural network. It receives inputs, performs a weighted sum of those inputs, potentially applies an activation function, and produces an output.
  • A neuron works in two steps: it calculates the weighted sum of its inputs and then applies an activation function to that sum. The activation function can be linear or nonlinear. Each input to a neuron also has an associated weight; these weights are the parameters the network learns during the training phase. (A small numeric sketch follows this list.)
  • Input Layer: Where you feed the raw data (pixels of an image, words in a sentence) into the network.
  • Hidden Layers: Layers between the input and output. This is where the majority of the computation and “learning” occurs. The more hidden layers, the ‘deeper’ the neural network.
  • Output Layer: Delivers the final prediction or classification from the network. The activation function used in this layer differs by problem. For a binary classification problem, we want the output to be either 0 or 1, so a sigmoid activation is used. For a multiclass classification problem, a softmax (a generalization of the sigmoid to multiple classes) is used. For a regression problem, where the output is not a predefined category, a simple linear unit can be used.
  • Weights: Each connection between neurons has a weight, signifying its relative importance to the neural network’s final prediction. Training a neural network involves fine-tuning these weights.
  • Bias: Acts as a sort of intercept term, adding an extra value to a neuron’s calculation. This helps give the network extra flexibility when fitting the data.
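
The small numeric sketch mentioned above ties these terms together: inputs, weights, a bias, the weighted sum, and an activation (sigmoid for binary outputs, softmax for multiclass). All values are arbitrary.

```python
# Numeric sketch of the two-step neuron computation: weighted sum + bias, then activation.
# All values below are arbitrary illustrations.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.2, 3.0])  # inputs
w = np.array([0.4, 0.1, -0.6])  # weights (learned during training)
b = 0.2                         # bias

z = w @ x + b                   # step 1: weighted sum plus bias
a = sigmoid(z)                  # step 2: activation (binary output)
print(f"weighted sum = {z:.3f}, sigmoid output = {a:.3f}")

# For a multiclass output layer, softmax turns raw scores into probabilities:
print(softmax(np.array([2.0, 1.0, 0.1])))
```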

