Foundations of Machine Learning: Inside Python’s NumPy Library

by Selwyn Davidraj     Posted on September 28, 2025

In this blog, we will explore NumPy, which is one of the most essential Python libraries for data science and numerical computation. Whether you’re just beginning your data science journey or refreshing your knowledge, NumPy will help you handle large datasets, perform mathematical operations, and power advanced machine learning workflows.


Introduction to NumPy

1. Getting Started with Python for Data Science

If you’re new to Python, NumPy is one of the first libraries you should learn. It’s included with distributions like Anaconda and easily available in Google Colab or Jupyter Notebooks.


2. What is NumPy?

NumPy stands for Numerical Python. It is designed to handle:

  • Large multi-dimensional arrays
  • Matrices
  • A rich set of mathematical functions

NumPy provides speed, efficiency, and the building blocks for other libraries such as pandas, scikit-learn, TensorFlow, and PyTorch.


3. Importing NumPy

To use NumPy, first import it. By convention, we use the alias np.

import numpy as np

This saves typing and is widely adopted in the Python community.


4. NumPy Arrays vs Python Lists

Feature      | Python List                             | NumPy Array
Data Types   | Can store mixed types (int, str, etc.)  | All elements must be of the same type
Performance  | Slower, less memory efficient           | Faster, optimized for numerical ops
Use Case     | General-purpose storage                 | Numerical computation, data science

Example

import numpy as np

# Python list
brands = ['Mercedes', 'BMW', 'Audi', 'Ferrari', 'Tesla']
brands_array = np.array(brands)

# Numeric data
numbers = [5, 4, 6, 7, 3]
numbers_array = np.array(numbers)

print(type(brands_array))
print(type(numbers_array))

Output:

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
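
The "same type" rule from the table is easy to see by inspecting the dtype attribute. The snippet below is a minimal sketch; the variable names are made up for illustration and are not part of the original example.

```python
import numpy as np

# A list can hold mixed types; an array converts everything to one dtype.
ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])     # the ints are upcast to floats
mixed_text = np.array([1, "two", 3.0])    # everything becomes a string

print(ints.dtype)            # an integer dtype (exact width depends on the platform)
print(mixed_numbers.dtype)   # float64
print(mixed_text.dtype)      # a Unicode string dtype
```

This silent conversion is why it is worth checking dtype whenever an array is built from data you did not create yourself.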

5. Working with Matrices

NumPy makes it easy to create and manipulate two-dimensional arrays (matrices), which are fundamental for linear algebra and machine learning.

import numpy as np

matrix = np.array([[1, 2, 1],
                   [4, 5, 9],
                   [1, 8, 9]])

print(matrix)

Output:

[[1 2 1]
 [4 5 9]
 [1 8 9]]

📌 Each sub-list represents a row, and all rows must have the same length so that NumPy can build a regular two-dimensional array.
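
A quick way to confirm that a nested list became a proper matrix is to look at its dimensions. This is a small sketch reusing the matrix from above; ndim, shape, and size are standard ndarray attributes.

```python
import numpy as np

matrix = np.array([[1, 2, 1],
                   [4, 5, 9],
                   [1, 8, 9]])

print(matrix.ndim)    # 2 (rows and columns)
print(matrix.shape)   # (3, 3)
print(matrix.size)    # 9 elements in total
```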


Key Array Creation Functions in NumPy

When working with data science, simulations, or machine learning, creating arrays efficiently is crucial. NumPy provides multiple functions for initializing arrays with custom values, ranges, and shapes. This section covers some of the most commonly used array creation functions.


1. np.arange()

Purpose: Creates an array with evenly spaced values within a given range.

Syntax:

np.arange(start, stop, step)

Parameters:

  • start: (Optional) Starting value of the sequence (default = 0).
  • stop: End value (exclusive, not included in the output).
  • step: (Optional) Spacing between values.

Examples:

import numpy as np

np.arange(0, 10)        # Output: [0 1 2 3 4 5 6 7 8 9]
np.arange(0, 20, 5)     # Output: [ 0  5 10 15]

📌 Tip: Stop value is not included, just like Python’s range() function.
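
A few more variations may help, shown here as an illustrative sketch (the variable names are not from the original): a single argument is treated as the stop value, the step can be negative, and non-integer steps are accepted.

```python
import numpy as np

first_five = np.arange(5)           # start defaults to 0, step to 1
countdown = np.arange(10, 0, -2)    # a negative step counts downwards
halves = np.arange(0, 2, 0.5)       # non-integer steps are allowed

print(first_five)   # [0 1 2 3 4]
print(countdown)    # [10  8  6  4  2]
print(halves)       # [0.  0.5 1.  1.5]
```

With steps that cannot be represented exactly in floating point, the number of elements can be hard to predict, which is one reason np.linspace is often preferred for fractional spacing.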


2. np.linspace()

Purpose: Generates an array of evenly spaced numbers over a specified interval, including both start and stop values.

Syntax:

np.linspace(start, stop, num)

Parameters:

  • start: Starting value.
  • stop: Ending value (included in the array).
  • num: (Optional) Number of samples to generate (default = 50).

Examples:

np.linspace(0, 5)           # Output: 50 points between 0 and 5
np.linspace(10, 20, 10)     # Output: 10 points from 10 to 20 (inclusive)

📌 Tip: Use when both start and stop values must be precisely included.
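
To see the spacing that np.linspace chooses, the optional retstep and endpoint parameters are useful. The following is a small sketch with illustrative variable names.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)   # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

# endpoint=False leaves out the stop value, similar to np.arange
open_points = np.linspace(0, 1, 5, endpoint=False)
print(open_points)   # [0.  0.2 0.4 0.6 0.8]
```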


3. np.zeros() and np.ones()

Purpose: Create arrays (of any shape) filled with zeros or ones.

Syntax:

np.zeros(shape)
np.ones(shape)

Examples:

np.zeros([3, 5])    # 3x5 matrix of zeros
np.ones([3, 5])     # 3x5 matrix of ones

Use Cases:

  • Initializing data structures
  • Creating masks
  • Placeholder arrays for computations
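
The use cases above could look something like this in practice. It is a sketch only: the dtype argument and np.full are standard NumPy, but the variable names and values are invented for illustration.

```python
import numpy as np

counts = np.zeros((2, 3), dtype=int)   # integer zeros instead of the default float64
mask = np.zeros(5, dtype=bool)         # an all-False mask
mask[[1, 3]] = True                    # switch on the positions to keep
sevens = np.full((2, 2), 7.0)          # np.full covers constants other than 0 and 1

data = np.array([10, 20, 30, 40, 50])
print(data[mask])     # [20 40]
print(counts.dtype)   # an integer dtype
print(sevens)
```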

4. np.eye()

Purpose: Creates an identity matrix (a square matrix with 1s on the diagonal and 0s elsewhere).

Syntax:

np.eye(N)

Example:

np.eye(5)     # 5x5 identity matrix

📌 Importance: Fundamental in linear algebra, useful for matrix operations and solving systems of equations.
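
As a brief illustration of why the identity matrix matters, the sketch below checks that multiplying by it leaves a matrix unchanged and then solves a small linear system. The matrix A and vector b are arbitrary example values, and np.linalg.solve is used here as one common way to solve such systems.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
I = np.eye(2)

# Multiplying by the identity matrix returns the original matrix
print(np.array_equal(A @ I, A))       # True

# Solve A x = b, then verify the solution by substituting it back
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))          # True

# A matrix times its inverse gives (numerically) the identity
print(np.allclose(A @ np.linalg.inv(A), I))   # True
```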


Efficient Array Manipulation and Mathematical Operations with NumPy

In this section, we explore essential concepts around creating, reshaping, and performing advanced mathematical operations on NumPy arrays.


1. Creating and Reshaping Arrays

NumPy makes it easy to create arrays and reshape them into different dimensions.

import numpy as np

# Create an array of numbers from 0 to 9
arr = np.arange(10)

# Reshape into a 2x5 matrix
reshaped_arr = arr.reshape((2, 5))

print("Original Array:", arr)
print("Reshaped Array:\n", reshaped_arr)

📌 Important: When reshaping, the total number of elements must remain the same as the original array.
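
The element-count rule can be checked directly. In this sketch, -1 asks NumPy to infer one dimension, and an impossible shape raises a ValueError; the reshape_failed flag is only there to make the outcome visible.

```python
import numpy as np

arr = np.arange(12)

auto = arr.reshape(3, -1)    # -1 lets NumPy infer the missing dimension
print(auto.shape)            # (3, 4)

reshape_failed = False
try:
    arr.reshape(5, 3)        # 15 slots cannot hold 12 elements
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)
```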


2. Understanding Methods vs Functions

  • The reshape method is an example of object-oriented programming in Python.
  • It is called directly on NumPy array objects.
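  • Using .reshape() does not change the shape of the original array. It returns a new array object, which is usually a view that shares the same underlying data, so writing to the reshaped array can still change the original values.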
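
The method and function forms can be compared side by side. The sketch below also uses np.shares_memory to show that, for a simple array like this one, the reshaped result shares its data with the original even though the original keeps its shape.

```python
import numpy as np

arr = np.arange(6)
via_method = arr.reshape(2, 3)            # method called on the array object
via_function = np.reshape(arr, (2, 3))    # equivalent function call

print(arr.shape)                                  # (6,)
print(np.array_equal(via_method, via_function))   # True
print(np.shares_memory(arr, via_method))          # True

via_method[0, 0] = 100    # writing through the reshaped array...
print(arr[0])             # 100 (...is visible in the original)
```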

3. Trigonometric Functions

NumPy provides trigonometric functions (sin, cos, tan) that work element-wise on arrays.

import numpy as np

result = np.sin(np.array([0, np.pi/2, np.pi]))
print(result)

Output:

[0.0000000e+00 1.0000000e+00 1.2246468e-16]

📌 These functions use radians, not degrees.
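
If your angles are in degrees, one option is to convert them first with np.deg2rad. A minimal sketch with example angles:

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)     # convert degrees to radians
sines = np.sin(radians)

print(np.round(sines, 3))         # approximately 0, 0.5, 1 and 0
```

The tiny non-zero value for sin(π) seen earlier is ordinary floating-point rounding, which is why the result is rounded before printing.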


4. Exponential and Logarithmic Computations

NumPy allows efficient calculations of exponentials and logarithms.

exp_val = np.exp(2)        # e^2
log_val = np.log(2)        # natural log
log10_val = np.log10(100)  # base 10 log

print("e^2 =", exp_val)
print("ln(2) =", log_val)
print("log10(100) =", log10_val)

Output:

e^2 = 7.38905609893065
ln(2) = 0.6931471805599453
log10(100) = 2.0
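
The same functions work element-wise on whole arrays. The sketch below uses illustrative values and also shows np.log2, plus a check that np.log undoes np.exp up to rounding.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])

print(np.log2(x))                           # [0. 1. 2. 3.]
print(np.exp(x))                            # e raised to each element
print(np.allclose(np.log(np.exp(x)), x))    # True
```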

Mastering Arithmetic Operations and Matrix Math with NumPy

This section explores how NumPy simplifies arithmetic operations and matrix math, making numerical analysis efficient and intuitive for data science tasks.


1. Why Use NumPy for Arithmetic?

  • Python lists → Adding two lists concatenates them:
    [1, 2] + [3, 4]  # Output: [1, 2, 3, 4]
    
  • NumPy arrays → Support true arithmetic (element-wise operations):
    import numpy as np
    
    a = np.array([1, 2])
    b = np.array([3, 4])
    print(a + b)   # Output: [4 6]
    
  • Efficiency: NumPy is optimized for high-speed computations on large datasets.

2. Element-wise Array Operations

NumPy supports element-wise operations like addition, subtraction, multiplication, division, square roots, and more.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a + b)   # [5 7 9]
print(a * b)   # [ 4 10 18]

📌 Handy for applying mathematical functions across arrays with minimal code.
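
The remaining operations mentioned above follow the same pattern. This sketch uses small example arrays; multiplying by a plain number applies it to every element.

```python
import numpy as np

a = np.array([1, 4, 9])
b = np.array([2, 2, 3])

print(a - b)        # [-1  2  6]
print(a / b)        # [0.5 2.  3. ]
print(a ** 2)       # [ 1 16 81]
print(np.sqrt(a))   # [1. 2. 3.]
print(a * 10)       # [10 40 90]
```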


3. Matrix Operations and Multiplication

  • Matrix addition/subtraction → Performed element-wise if shapes match.
  • Element-wise vs. Matrix Multiplication:
    • * → Element-wise multiplication
    • @ or np.matmul() → True matrix multiplication (linear algebra rules)

import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

print(m1 * m2)   # Element-wise multiplication
print(m1 @ m2)   # Matrix multiplication
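
To make the difference concrete, here are the two products worked out for the same matrices, followed by the shape rule for @: the inner dimensions must match, and the result takes the outer ones. The left and right arrays are placeholders invented for this sketch.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

elementwise = m1 * m2   # each entry multiplied with its counterpart
matmul = m1 @ m2        # rows of m1 combined with columns of m2

print(elementwise)   # rows [ 5 12] and [21 32]
print(matmul)        # rows [19 22] and [43 50]

# Shape rule: (2, 3) @ (3, 4) -> (2, 4)
left = np.ones((2, 3))
right = np.ones((3, 4))
print((left @ right).shape)   # (2, 4)
```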

4. Transposing Matrices

Transpose flips a matrix over its diagonal, turning rows into columns.

m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.T)

Output:

[[1 4]
 [2 5]
 [3 6]]

📌 In NumPy, use .T or np.transpose().
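
A short sketch of how transposing changes shapes, and how it lets a non-square matrix be multiplied with itself. The name gram is just an illustrative label for the product.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

print(m.shape, m.T.shape)                      # (2, 3) (3, 2)
print(np.array_equal(m.T, np.transpose(m)))    # True

# (2, 3) @ (3, 2) is valid, whereas m @ m would fail on shapes
gram = m @ m.T
print(gram)   # rows [14 32] and [32 77]
```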


Exploring Matrix Operations and Random Number Generation with NumPy

In this section, we’ll look at working with matrices and generating random numbers using the NumPy library in Python.


1. Matrix Operations with NumPy

NumPy simplifies complex matrix calculations. Two essential functions are:

np.min and np.max

Find the minimum and maximum values in a NumPy array quickly and efficiently. On large arrays they are typically much faster than Python's built-in min and max, because the work happens in compiled code.

import numpy as np

matrix = np.array([[2, 5, 9],
                   [4, 1, 7]])

print("Minimum value:", np.min(matrix))  # Output: 1
print("Maximum value:", np.max(matrix))  # Output: 9

Explanation:

  • np.min(matrix) finds the smallest value in the matrix (1).
  • np.max(matrix) finds the largest value (9).
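
Both functions also accept an axis argument for per-column or per-row results, and np.argmin reports where the minimum sits. The sketch below reuses the matrix from above; np.unravel_index converts the flat position into a row and column.

```python
import numpy as np

matrix = np.array([[2, 5, 9],
                   [4, 1, 7]])

col_mins = np.min(matrix, axis=0)   # minimum of each column
row_maxs = np.max(matrix, axis=1)   # maximum of each row
print(col_mins)   # [2 1 7]
print(row_maxs)   # [9 7]

flat_pos = np.argmin(matrix)        # position in the flattened array
row, col = np.unravel_index(flat_pos, matrix.shape)
print(int(flat_pos), int(row), int(col))   # 4 1 1
```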

2. Generating Random Numbers with NumPy

Simulations and testing often require random data. NumPy offers several ways to generate such numbers:

a. Uniform Distribution

  • Function: np.random.rand
  • Purpose: Generates random numbers in the half-open interval [0, 1) with equal probability.

uniform_randoms = np.random.rand(5)
print(uniform_randoms)

Sample Output:

[0.6944 0.5224 0.626  0.818  0.654 ]

b. Normal (Gaussian) Distribution

  • Function: np.random.randn
  • Purpose: Generates numbers from a standard normal distribution (mean = 0, variance = 1).

normal_randoms = np.random.randn(5)
print(normal_randoms)

Sample Output:

[-1.23  0.42  0.53 -0.96  1.78]

c. Random Integers

  • Function: np.random.randint(low, high, size)
  • Purpose: Generates random integers within a specified range.
  • Note: The high value is exclusive (not included).

random_ints = np.random.randint(1, 5, 10)
print(random_ints)

Sample Output:

[2 4 2 1 1 4 4 2 2 2]

📌 Notice that 5 does not appear, since the upper bound is exclusive.
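
The functions above give different numbers on every run. For reproducible results, one option is NumPy's Generator interface, created with np.random.default_rng and a seed. This is a sketch of the equivalent calls; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random(5)               # floats in [0, 1), like np.random.rand
normal = rng.standard_normal(5)       # standard normal, like np.random.randn
ints = rng.integers(1, 5, size=10)    # integers 1..4, upper bound exclusive

# A fresh generator with the same seed repeats the same sequence
repeat = np.random.default_rng(seed=42).random(5)
print(np.array_equal(uniform, repeat))   # True
```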


Mastering NumPy Array Access and Manipulation

This section focuses on accessing, slicing, and modifying NumPy arrays and matrices.


1. Accessing Entries in a NumPy Array

You can access individual elements using square brackets, just like lists.

import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[2])  # Output: 30

Explanation:

  • Indexing starts at 0, so arr[2] gives you the third element.

2. Accessing Multiple Entries (Slicing)

Use slicing to retrieve a range of elements from an array.

print(arr[1:4])  # Output: [20 30 40]

Explanation:

  • Slicing uses the format [start:stop], including the start index and excluding the stop index.

3. Accessing Entries with np.arange

np.arange helps programmatically select index positions.

indices = np.arange(0, 5, 2)  # [0, 2, 4]
print(arr[indices])           # Output: [10 30 50]

Explanation:

  • np.arange(start, stop, step) creates a sequence, which you can use to index arrays.

4. Logical Indexing

Select elements based on conditions.

print(arr[arr > 25])  # Output: [30 40 50]

Explanation:

  • arr > 25 creates a boolean array, which filters arr accordingly.
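
Conditions can be combined with & (and) and | (or), as long as each condition is wrapped in parentheses. A short sketch on the same array, with np.where added to show the matching positions:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

middle = arr[(arr > 15) & (arr < 45)]    # both conditions must hold
edges = arr[(arr < 20) | (arr > 40)]     # either condition may hold
positions = np.where(arr > 25)[0]        # indices where the condition is True

print(middle)      # [20 30 40]
print(edges)       # [10 50]
print(positions)   # [2 3 4]
```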

5. Accessing Values in a Matrix

For two-dimensional arrays (matrices), specify both row and column indices.

mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat[1, 2])  # Output: 6

Explanation:

  • mat[1, 2] accesses the element in the second row, third column.
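
The same bracket syntax extends to whole rows, columns, and blocks. A minimal sketch using the matrix from above:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

print(mat[0])         # [1 2 3]  (first row)
print(mat[:, 1])      # [2 5]    (second column)
print(mat[-1, -1])    # 6        (last row, last column)
print(mat[:, 1:])     # rows [2 3] and [5 6] (all rows, columns 1 onward)
```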

6. Modifying Array Values

You can update elements via direct or logical indexing.

arr[0] = 99
print(arr)  # Output: [99 20 30 40 50]

arr[arr < 30] = 0
print(arr)  # Output: [99  0 30 40 50]

Explanation:

  • Assign new values to elements or subsets of your array easily.

Matrix Modification and Safe Data Manipulation using NumPy

Manipulating data efficiently is an essential skill in Python for data science. In this section, we discuss key techniques for modifying matrices, including tips for safe data handling using NumPy.


1. Modifying Entries in a Matrix

You can directly change the values of elements in a NumPy matrix (2D array) by specifying their row and column indices.

import numpy as np

# Create a random 3x3 matrix of integers from 0 to 9 (the upper bound 10 is exclusive)
matrix = np.random.randint(0, 10, (3, 3))
print("Original Matrix:\n", matrix)

# Set all values in the first row to zero
matrix[0, :] = 0
print("After Modifying First Row:\n", matrix)

Explanation:

  • matrix[0, :] = 0 sets the entire first row to zero.
  • Slicing (:) selects all columns in that row.

2. Strategies for Modifying Submatrices

You can modify blocks (sub-matrices) within a matrix the same way.

# Set a 2x2 block in the bottom-right corner to 5
matrix[1:3, 1:3] = 5
print("After Modifying Submatrix:\n", matrix)

Explanation:

  • matrix[1:3, 1:3] = 5 sets the 2x2 bottom-right part of the matrix to value 5.

3. The Pitfall of Views vs Copies

In NumPy, submatrices (extracted using slicing) are views on the original data, not independent copies.
Modifying a submatrix may alter the original matrix—sometimes unintentionally.

sub = matrix[0:2, 0:2]
sub[:] = 9
print("Modified Submatrix:\n", sub)
print("Matrix After Submatrix Modification:\n", matrix)

Explanation:

  • Changing sub also changes those entries in matrix, because sub is just a view into the same data.

4. Safely Copying Data: The .copy() Method

To avoid unwanted side-effects, create a true copy of the data before modifying:

safe_sub = matrix[0:2, 0:2].copy()
safe_sub[:] = 3  # Only changes safe_sub, not matrix
print("Safe Submatrix Modification:\n", safe_sub)
print("Original Matrix Remains Unchanged:\n", matrix)

Explanation:

  • .copy() generates an independent copy, so changes to safe_sub do not affect the original matrix.
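
If you are unsure whether two arrays share data, np.shares_memory can tell you. The sketch below uses a fixed matrix (instead of a random one) so the results are predictable, and also shows that indexing with a list of integers returns a copy rather than a view.

```python
import numpy as np

base = np.arange(9).reshape(3, 3)

view = base[0:2, 0:2]             # slice: a view into base
copy = base[0:2, 0:2].copy()      # explicit, independent copy
fancy = base[[0, 1]]              # integer-list indexing also copies

print(np.shares_memory(base, view))    # True
print(np.shares_memory(base, copy))    # False
print(np.shares_memory(base, fancy))   # False

view[:] = -1          # changes base as well
print(base[0, 0])     # -1
print(copy[0, 0])     # 0 (the copy kept the old value)
```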

How to Save and Load NumPy Arrays

Saving and reusing your data is a fundamental skill in data science. This section will guide you through best practices for saving and loading NumPy arrays, both locally and in the cloud, using code examples you can implement right away.


1. Saving NumPy Arrays to Google Drive in Google Colab

You can use Colab’s built-in functions to connect to your Google Drive for easy file saving:

from google.colab import drive
drive.mount('/content/drive')

import numpy as np

array = np.arange(10)
np.save('/content/drive/My Drive/my_array.npy', array)

Explanation:

  • drive.mount('/content/drive') gives Colab access to your Google Drive.
  • np.save saves your array directly to a location in Drive.

2. Saving NumPy Arrays Locally (Jupyter Notebook/Anaconda)

No cloud? Save directly to your hard drive:

import numpy as np

array = np.arange(10)
np.save('my_local_array.npy', array)

Explanation:

  • The file my_local_array.npy will appear in your working directory.

3. Saving Multiple Arrays in One File

Use np.savez to bundle multiple arrays efficiently:

a = np.arange(5)
b = np.linspace(0, 1, 5)

np.savez('arrays_bundle.npz', first=a, second=b)

Explanation:

  • np.savez saves several arrays into a single .npz file with named keyword arguments.

4. Loading Saved Arrays

Bring your data back with np.load, whether it’s .npy (single array) or .npz (multiple arrays):

# Loading a single .npy array
loaded_array = np.load('my_local_array.npy')
print(loaded_array)

# Loading arrays from a .npz file
bundle = np.load('arrays_bundle.npz')
print(bundle['first'])
print(bundle['second'])

Explanation:

  • For .npz files, access each array as a dictionary item using its name.
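
A full save-and-load round trip might look like the following sketch. It writes to a temporary directory (an assumption made here so that no files are left behind), lists the stored names via the files attribute, and checks that the arrays come back unchanged.

```python
import os
import tempfile

import numpy as np

a = np.arange(5)
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays_bundle.npz")
    np.savez(path, first=a, second=b)

    # Using np.load as a context manager closes the file when done
    with np.load(path) as bundle:
        names = sorted(bundle.files)
        first_back = bundle["first"]
        second_back = bundle["second"]

print(names)                              # ['first', 'second']
print(np.array_equal(a, first_back))      # True
print(np.array_equal(b, second_back))     # True
```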

5. Saving and Loading as Text Files

For maximum compatibility (e.g., sharing with Excel users), use np.savetxt:

arr = np.random.rand(5, 2)
np.savetxt('my_data.txt', arr, delimiter=',')

loaded_txt = np.loadtxt('my_data.txt', delimiter=',')
print(loaded_txt)

Explanation:

  • Text files are readable by other programs.
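  • ⚠️ Note: By default, np.savetxt writes numbers in scientific notation and np.loadtxt reads them back as floats, so integer arrays do not round-trip as integers unless you set the format and dtype yourself.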
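
When the data is integer-valued, the fmt, header, and dtype parameters give more control over the text file. The sketch below is one possible setup: the file name, the column names x and y, and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

table = np.array([[1, 2], [3, 4], [5, 6]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_ints.csv")

    # fmt="%d" writes plain integers; comments="" keeps the header free of "# "
    np.savetxt(path, table, delimiter=",", fmt="%d", header="x,y", comments="")

    with open(path) as fh:
        text = fh.read()

    default_load = np.loadtxt(path, delimiter=",", skiprows=1)
    int_load = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(text)                  # x,y then 1,2 / 3,4 / 5,6 on separate lines
print(default_load.dtype)    # float64
print(int_load.dtype)        # an integer dtype
```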

Conclusion

NumPy is the backbone of Python’s scientific stack: it gives you fast, memory-efficient ndarrays, rich array creation utilities (arange, linspace, zeros/ones, eye), convenient reshaping and slicing, and a powerful suite of vectorized math (trig, exp/log, element-wise ops, and matrix multiplication). Mastering these building blocks unlocks everything that sits on top—pandas for tabular data, scikit-learn for ML, and TensorFlow/PyTorch for deep learning.

What you should now be comfortable with

  • Importing and using NumPy idiomatically (import numpy as np).
  • Choosing NumPy arrays over Python lists for numerical work (speed, broadcasting, vectorization).
  • Creating arrays the right way for your task (np.arange for ranges, np.linspace for fixed endpoints, np.zeros/ones for initialization, np.eye for linear algebra).
  • Reshaping without changing element count and using views vs. copies safely (.copy() when needed).
  • Writing vectorized code (avoid Python loops) for arithmetic, stats, and linear algebra (*, @, .T, np.min/max, np.exp, np.log, etc.).
  • Persisting work with save/load patterns (np.save, np.savez, np.load, np.savetxt, np.loadtxt) locally or in Colab/Drive.

Common pitfalls to avoid

  • Forgetting that many slices are views, not copies—unexpected mutations can leak back into the original array. Use .copy() when you’ll modify a slice.
  • Mismatch between shape and operation (e.g., matrix multiply on incompatible dimensions). Check arr.shape and prefer @/np.matmul for linear algebra.
  • Mixing degrees with trig functions (NumPy uses radians). Convert with np.deg2rad/np.rad2deg when needed.
  • Assuming np.arange includes the stop value—it’s exclusive. Use np.linspace when endpoints matter.

Next steps

  • Practice vectorization by rewriting looped code with array operations and broadcasting.
  • Pair NumPy with pandas for real-world datasets; use NumPy arrays under the hood for feature matrices.
  • Explore random sampling and reproducibility with np.random.default_rng.
  • Learn more linear algebra utilities (np.linalg.svd, np.linalg.solve, np.linalg.eig) to prepare for ML.

With these essentials, you’re ready to move from Python basics to production-grade data pipelines and ML workflows, writing code that’s both readable and fast.