Foundations of Machine Learning: Inside Python’s NumPy Library
by Selwyn Davidraj Posted on September 28, 2025
In this blog, we will explore NumPy, which is one of the most essential Python libraries for data science and numerical computation. Whether you’re just beginning your data science journey or refreshing your knowledge, NumPy will help you handle large datasets, perform mathematical operations, and power advanced machine learning workflows.
Table of contents
- Introduction to NumPy
- Key Array Creation Functions in NumPy
- Efficient Array Manipulation and Mathematical Operations with NumPy
- Mastering Arithmetic Operations and Matrix Math with NumPy
- Exploring Matrix Operations and Random Number Generation with NumPy
- Mastering NumPy Array Access and Manipulation
- Matrix Modification and Safe Data Manipulation using NumPy
- How to Save and Load NumPy Arrays
- Conclusion
Introduction to NumPy
1. Getting Started with Python for Data Science
If you’re new to Python, NumPy is one of the first libraries you should learn. It’s included with distributions like Anaconda and easily available in Google Colab or Jupyter Notebooks.
2. What is NumPy?
NumPy stands for Numerical Python. It is designed to handle:
- Large multi-dimensional arrays
- Matrices
- A rich set of mathematical functions
NumPy provides speed, efficiency, and the building blocks for other libraries such as pandas, scikit-learn, TensorFlow, and PyTorch.
3. Importing NumPy
To use NumPy, first import it. By convention, we use the alias np.
import numpy as np
This saves typing and is widely adopted in the Python community.
4. NumPy Arrays vs Python Lists
| Feature | Python List | NumPy Array |
|---|---|---|
| Data Types | Can store mixed types (int, str, etc.) | All elements must be of the same type |
| Performance | Slower, less memory efficient | Faster, optimized for numerical ops |
| Use Case | General-purpose storage | Numerical computation, data science |
Example
import numpy as np
# Python list
brands = ['Mercedes', 'BMW', 'Audi', 'Ferrari', 'Tesla']
brands_array = np.array(brands)
# Numeric data
numbers = [5, 4, 6, 7, 3]
numbers_array = np.array(numbers)
print(type(brands_array))
print(type(numbers_array))
Output:
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
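The "same type" rule from the table can be checked through the dtype attribute. The following is a small sketch; the mixed_numbers array is an illustrative addition, not part of the example above.

```python
import numpy as np

# Mixed int/float input is upcast to one common float dtype.
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)        # float64

# Strings are stored with a fixed-width Unicode dtype (kind 'U').
brands_array = np.array(['Mercedes', 'BMW', 'Audi', 'Ferrari', 'Tesla'])
print(brands_array.dtype.kind)    # U

# Integer input stays integer; the exact width depends on the platform.
numbers_array = np.array([5, 4, 6, 7, 3])
print(numbers_array.dtype.kind)   # i
```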
5. Working with Matrices
NumPy makes it easy to create and manipulate two-dimensional arrays (matrices), which are fundamental for linear algebra and machine learning.
import numpy as np
matrix = np.array([[1, 2, 1],
[4, 5, 9],
[1, 8, 9]])
print(matrix)
Output:
[[1 2 1]
[4 5 9]
[1 8 9]]
📌 Each sub-list represents a row, and all rows should have the same length.
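To confirm that a matrix has the layout you expect, you can inspect its shape, ndim, and size attributes. A minimal sketch using the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 1],
                   [4, 5, 9],
                   [1, 8, 9]])

print(matrix.shape)  # (3, 3) -> (rows, columns)
print(matrix.ndim)   # 2 -> number of dimensions
print(matrix.size)   # 9 -> total number of elements
```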
Key Array Creation Functions in NumPy
When working with data science, simulations, or machine learning, creating arrays efficiently is crucial. NumPy provides multiple functions for initializing arrays with custom values, ranges, and shapes. This section covers some of the most commonly used array creation functions.
1. np.arange()
Purpose: Creates an array with evenly spaced values within a given range.
Syntax:
np.arange(start, stop, step)
Parameters:
- start: Starting value of the sequence.
- stop: End value (exclusive, not included in the output).
- step: (Optional) Spacing between values.
Examples:
import numpy as np
np.arange(0, 10) # Output: [0 1 2 3 4 5 6 7 8 9]
np.arange(0, 20, 5) # Output: [0 5 10 15]
📌 Tip: Stop value is not included, just like Python’s range() function.
2. np.linspace()
Purpose: Generates an array of evenly spaced numbers over a specified interval, including both start and stop values.
Syntax:
np.linspace(start, stop, num)
Parameters:
- start: Starting value.
- stop: Ending value (included in the array).
- num: (Optional) Number of samples to generate (default = 50).
Examples:
np.linspace(0, 5) # Output: 50 points between 0 and 5
np.linspace(10, 20, 10) # Output: 10 points from 10 to 20 (inclusive)
📌 Tip: Use when both start and stop values must be precisely included.
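A short sketch of how np.linspace handles its endpoints. The retstep and endpoint keyword arguments are part of NumPy's API but are not covered above, so treat this as an optional extra.

```python
import numpy as np

# Five samples from 0 to 1, both ends included.
points = np.linspace(0, 1, 5)
print(points)        # [0.   0.25 0.5  0.75 1.  ]

# retstep=True also returns the spacing between samples.
samples, step = np.linspace(10, 20, 11, retstep=True)
print(step)          # 1.0

# endpoint=False leaves the stop value out, similar to np.arange.
open_ended = np.linspace(0, 1, 5, endpoint=False)
print(open_ended)    # [0.  0.2 0.4 0.6 0.8]
```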
3. np.zeros() and np.ones()
Purpose: Create arrays (of any shape) filled with zeros or ones.
Syntax:
np.zeros(shape)
np.ones(shape)
Examples:
np.zeros([3, 5]) # 3x5 matrix of zeros
np.ones([3, 5]) # 3x5 matrix of ones
✅ Use Cases:
- Initializing data structures
- Creating masks
- Placeholder arrays for computations
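As a rough illustration of the mask use case, the sketch below builds a boolean mask from np.zeros and uses it to pick values. The values array and the flagged positions are made up for the example.

```python
import numpy as np

zeros = np.zeros((3, 5))
print(zeros.dtype)            # float64 (the default dtype)

# Pass dtype to get integers or booleans instead of floats.
int_ones = np.ones((2, 2), dtype=int)
print(int_ones.dtype.kind)    # i

# A mask that starts all-False, then flags positions 1 and 3.
mask = np.zeros(5, dtype=bool)
mask[[1, 3]] = True
values = np.array([10, 20, 30, 40, 50])
print(values[mask])           # [20 40]
```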
4. np.eye()
Purpose: Creates an identity matrix (a square matrix with 1s on the diagonal and 0s elsewhere).
Syntax:
np.eye(N)
Example:
np.eye(5) # 5x5 identity matrix
📌 Importance: Fundamental in linear algebra, useful for matrix operations and solving systems of equations.
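To see why the identity matrix matters, here is a minimal sketch with a hypothetical 2x2 matrix A. Multiplying by the identity leaves A unchanged, and solving A x = I with np.linalg.solve yields the inverse of A.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
I = np.eye(2)

# The identity matrix acts like the number 1 for matrix multiplication.
print(np.array_equal(A @ I, A))     # True

# Solving A x = I gives the inverse of A.
A_inv = np.linalg.solve(A, I)
print(np.allclose(A @ A_inv, I))    # True
```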
Efficient Array Manipulation and Mathematical Operations with NumPy
In this section, we explore essential concepts around creating, reshaping, and performing advanced mathematical operations on NumPy arrays.
1. Creating and Reshaping Arrays
NumPy makes it easy to create arrays and reshape them into different dimensions.
import numpy as np
# Create an array of numbers from 0 to 9
arr = np.arange(10)
# Reshape into a 2x5 matrix
reshaped_arr = arr.reshape((2, 5))
print("Original Array:", arr)
print("Reshaped Array:\n", reshaped_arr)
📌 Important: When reshaping, the total number of elements must remain the same as the original array.
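The element-count rule can be sketched as follows. Passing -1 asks NumPy to infer one dimension, and a shape that does not fit raises a ValueError.

```python
import numpy as np

arr = np.arange(10)

# -1 tells NumPy to work out that dimension: 10 elements / 5 rows = 2 columns.
inferred = arr.reshape(5, -1)
print(inferred.shape)    # (5, 2)

# A 3x4 shape needs 12 elements, so reshaping 10 elements fails.
try:
    arr.reshape(3, 4)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```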
2. Understanding Methods vs Functions
- The reshape method is an example of object-oriented programming in Python.
- It is called directly on NumPy array objects.
- Using .reshape() does not alter the shape of the original array. It returns a new array object, which is usually a view that shares the original’s data.
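One subtlety worth checking yourself: the reshaped result usually shares memory with the original, so writing through it changes both. A minimal sketch, using np.shares_memory to inspect this:

```python
import numpy as np

arr = np.arange(10)
reshaped = arr.reshape(2, 5)

print(arr.shape)                         # (10,) -> original shape is untouched
print(np.shares_memory(arr, reshaped))   # True -> same underlying data

# Writing through the reshaped view is visible in the original.
reshaped[0, 0] = 99
print(arr[0])                            # 99

# Add .copy() when you need an independent array.
independent = arr.reshape(2, 5).copy()
independent[0, 0] = -1
print(arr[0])                            # 99
```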
3. Trigonometric Functions
NumPy provides trigonometric functions (sin, cos, tan) that work element-wise on arrays.
import numpy as np
result = np.sin(np.array([0, np.pi/2, np.pi]))
print(result)
Output:
[0.0000000e+00 1.0000000e+00 1.2246468e-16]
📌 These functions use radians, not degrees.
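If your angles are in degrees, convert them first. A small sketch with np.deg2rad; the sample angles are arbitrary.

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)

sines = np.sin(radians)
# Rounding hides tiny floating-point residue such as 1.2e-16.
print(np.round(sines, 3))   # [0.  0.5 1.  0. ]
```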
4. Exponential and Logarithmic Computations
NumPy allows efficient calculations of exponentials and logarithms.
exp_val = np.exp(2) # e^2
log_val = np.log(2) # natural log
log10_val = np.log10(100) # base 10 log
print("e^2 =", exp_val)
print("ln(2) =", log_val)
print("log10(100) =", log10_val)
Output:
e^2 = 7.38905609893065
ln(2) = 0.6931471805599453
log10(100) = 2.0
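The same functions also work element-wise on whole arrays, which is where the efficiency comes from. A brief sketch with made-up inputs:

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])
logs = np.log10(x)
print(logs)                       # [0. 1. 2. 3.]

# np.log undoes np.exp (up to floating-point rounding).
original = np.array([0.5, 1.0, 2.0])
round_trip = np.log(np.exp(original))
print(np.allclose(round_trip, original))   # True

# Base-2 logarithm is available as np.log2.
print(np.log2(8))                 # 3.0
```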
Mastering Arithmetic Operations and Matrix Math with NumPy
This section explores how NumPy simplifies arithmetic operations and matrix math, making numerical analysis efficient and intuitive for data science tasks.
1. Why Use NumPy for Arithmetic?
- Python lists → Adding two lists concatenates them:
[1, 2] + [3, 4] # Output: [1, 2, 3, 4]
- NumPy arrays → Support true arithmetic (element-wise operations):
import numpy as np
a = np.array([1, 2])
b = np.array([3, 4])
print(a + b) # Output: [4 6]
- Efficiency: NumPy is optimized for high-speed computations on large datasets.
2. Element-wise Array Operations
NumPy supports element-wise operations like addition, subtraction, multiplication, division, square roots, and more.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b) # [5 7 9]
print(a * b) # [4 10 18]
📌 Handy for applying mathematical functions across arrays with minimal code.
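The remaining operations mentioned above (subtraction, division, square roots) follow the same pattern. This sketch also compares a vectorized expression with an equivalent Python loop; the loop is only there for contrast.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(b - a)                          # [3 3 3]
print(a / b)                          # [0.25 0.4  0.5 ]
print(np.sqrt(np.array([1, 4, 9])))   # [1. 2. 3.]

# A scalar is applied to every element.
print(a * 10)                         # [10 20 30]

# The loop below computes the same sums as the one-line a + b.
looped = [int(x + y) for x, y in zip(a, b)]
vectorized = a + b
print(looped == vectorized.tolist())  # True
```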
3. Matrix Operations and Multiplication
- Matrix addition/subtraction → Performed element-wise if shapes match.
- Element-wise vs. Matrix Multiplication:
  - * → Element-wise multiplication
  - @ or np.matmul() → True matrix multiplication (linear algebra rules)
m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
print(m1 * m2) # Element-wise multiplication
print(m1 @ m2) # Matrix multiplication
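For reference, here is the same example with the results written out, plus a sketch of the shape rule for @. The left and right arrays are hypothetical and exist only to show the shapes.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

elementwise = m1 * m2
print(elementwise)
# [[ 5 12]
#  [21 32]]

product = m1 @ m2
print(product)
# [[19 22]
#  [43 50]]

# Shape rule for @: (n, k) @ (k, m) -> (n, m); the inner sizes must match.
left = np.ones((2, 3))
right = np.ones((3, 4))
print((left @ right).shape)   # (2, 4)
```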
4. Transposing Matrices
Transpose flips a matrix over its diagonal, turning rows into columns.
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.T)
Output:
[[1 4]
[2 5]
[3 6]]
📌 In NumPy, use .T or np.transpose().
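A quick sketch of two properties of the transpose that are easy to verify: .T and np.transpose() agree, and transposing a product reverses the order of the factors. Matrices A and B are made up for the illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
B = np.array([[1, 0], [0, 1], [2, 2]])      # shape (3, 2)

print(A.T.shape)                             # (3, 2)
print(np.array_equal(A.T, np.transpose(A)))  # True

# (A @ B) transposed equals B.T @ A.T.
print(np.array_equal((A @ B).T, B.T @ A.T))  # True

# .T returns a view, so no data is copied.
print(np.shares_memory(A, A.T))              # True
```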
Exploring Matrix Operations and Random Number Generation with NumPy
In this section, we’ll look at working with matrices and generating random numbers using the NumPy library in Python.
1. Matrix Operations with NumPy
NumPy simplifies complex matrix calculations. Two essential functions discussed are:
np.min and np.max
Find the minimum and maximum values in a NumPy array quickly and efficiently—far faster than standard Python functions, especially for large datasets.
import numpy as np
matrix = np.array([[2, 5, 9],
[4, 1, 7]])
print("Minimum value:", np.min(matrix)) # Output: 1
print("Maximum value:", np.max(matrix)) # Output: 9
Explanation:
- np.min(matrix) finds the smallest value in the matrix (1).
- np.max(matrix) finds the largest value (9).
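Both functions also accept an axis argument, which the example above does not use. The following sketch shows per-column and per-row results, and how np.argmin locates the minimum.

```python
import numpy as np

matrix = np.array([[2, 5, 9],
                   [4, 1, 7]])

col_min = matrix.min(axis=0)   # smallest value in each column
row_max = matrix.max(axis=1)   # largest value in each row
print(col_min)                 # [2 1 7]
print(row_max)                 # [9 7]

# argmin gives a position in the flattened array;
# unravel_index maps it back to (row, column).
flat_pos = int(np.argmin(matrix))
row, col = (int(i) for i in np.unravel_index(flat_pos, matrix.shape))
print(flat_pos, (row, col))    # 4 (1, 1)
```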
2. Generating Random Numbers with NumPy
Simulations and testing often require random data. NumPy offers several ways to generate such numbers:
a. Uniform Distribution
- Function: np.random.rand
- Purpose: Generates uniformly distributed random numbers in the interval [0, 1), so 0 can appear but 1 cannot.
uniform_randoms = np.random.rand(5)
print(uniform_randoms)
Sample Output:
[0.6944 0.5224 0.626  0.818  0.654 ]
b. Normal (Gaussian) Distribution
- Function: np.random.randn
- Purpose: Generates numbers from a standard normal distribution (mean = 0, variance = 1).
normal_randoms = np.random.randn(5)
print(normal_randoms)
Sample Output:
[-1.23  0.42  0.53 -0.96  1.78]
c. Random Integers
- Function: np.random.randint(low, high, size)
- Purpose: Generates random integers within a specified range.
- Note: The high value is exclusive (not included).
random_ints = np.random.randint(1, 5, 10)
print(random_ints)
Sample Output:
[2 4 2 1 1 4 4 2 2 2]
📌 Notice that 5 does not appear, since the upper bound is exclusive.
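For reproducible results, NumPy also offers the newer Generator interface via np.random.default_rng. The sketch below draws the same three kinds of numbers with it; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random(5)               # uniform floats in [0, 1)
normal = rng.standard_normal(5)       # standard normal (mean 0, variance 1)
ints = rng.integers(1, 5, size=10)    # integers 1..4, upper bound excluded

# A second generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random(5)))   # True
```

Seeding is what makes an experiment repeatable: rerunning the script yields the same "random" data.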
Mastering NumPy Array Access and Manipulation
This section focuses on accessing, slicing, and modifying NumPy arrays and matrices.
1. Accessing Entries in a NumPy Array
You can access individual elements using square brackets, just like lists.
import numpy as np
arr = np.array([10, 20, 30, 40, 50])
print(arr[2]) # Output: 30
Explanation:
- Indexing starts at 0, so arr[2] gives you the third element.
2. Accessing Multiple Entries (Slicing)
Use slicing to retrieve a range of elements from an array.
print(arr[1:4]) # Output: [20 30 40]
Explanation:
- Slicing uses the format [start:stop], including the start index and excluding the stop index.
3. Accessing Entries with np.arange
np.arange helps programmatically select index positions.
indices = np.arange(0, 5, 2) # [0, 2, 4]
print(arr[indices]) # Output: [10 30 50]
Explanation:
- np.arange(start, stop, step) creates a sequence, which you can use to index arrays.
4. Logical Indexing
Select elements based on conditions.
print(arr[arr > 25]) # Output: [30 40 50]
Explanation:
- arr > 25 creates a boolean array, which filters arr accordingly.
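Conditions can be combined, which the single-condition example does not show. A sketch follows; note that NumPy uses & and | (with parentheses around each condition) rather than Python's and / or.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# AND: both conditions must hold.
in_range = arr[(arr > 15) & (arr < 45)]
print(in_range)      # [20 30 40]

# OR: either condition may hold.
extremes = arr[(arr < 20) | (arr > 40)]
print(extremes)      # [10 50]

# np.where returns the positions where the condition is True.
positions = np.where(arr > 25)[0]
print(positions)     # [2 3 4]

# Summing a boolean array counts the True entries.
count = int((arr > 25).sum())
print(count)         # 3
```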
5. Accessing Values in a Matrix
For two-dimensional arrays (matrices), specify both row and column indices.
mat = np.array([[1, 2, 3], [4, 5, 6]])
print(mat[1, 2]) # Output: 6
Explanation:
- mat[1, 2] accesses the element in the second row, third column.
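The same bracket syntax also selects whole rows, columns, and blocks. A short sketch on the same matrix:

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

print(mat[0])        # [1 2 3] -> first row
print(mat[:, 1])     # [2 5]   -> second column
print(mat[-1, -1])   # 6       -> last row, last column

# Slices on both axes return a sub-matrix.
block = mat[:, 1:]
print(block)
# [[2 3]
#  [5 6]]
```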
6. Modifying Array Values
You can update elements via direct or logical indexing.
arr[0] = 99
print(arr) # Output: [99 20 30 40 50]
arr[arr < 30] = 0
print(arr) # Output: [99 0 30 40 50]
Explanation:
- Assign new values to elements or subsets of your array easily.
Matrix Modification and Safe Data Manipulation using NumPy
Manipulating data efficiently is an essential skill in Python for data science. In this section, we discuss key techniques for modifying matrices, including tips for safe data handling using NumPy.
1. Modifying Entries in a Matrix
You can directly change the values of elements in a NumPy matrix (2D array) by specifying their row and column indices.
import numpy as np
# Create a random 3x3 matrix of integers from 0 to 9 (the upper bound 10 is excluded)
matrix = np.random.randint(0, 10, (3, 3))
print("Original Matrix:\n", matrix)
# Set all values in the first row to zero
matrix[0, :] = 0
print("After Modifying First Row:\n", matrix)
Explanation:
- matrix[0, :] = 0 sets the entire first row to zero.
- Slicing (:) selects all columns in that row.
2. Strategies for Modifying Submatrices
You can modify blocks (sub-matrices) within a matrix the same way.
# Set a 2x2 block in the bottom-right corner to 5
matrix[1:3, 1:3] = 5
print("After Modifying Submatrix:\n", matrix)
Explanation:
- matrix[1:3, 1:3] = 5 sets the 2x2 bottom-right part of the matrix to value 5.
3. The Pitfall of Views vs Copies
In NumPy, submatrices (extracted using slicing) are views on the original data, not independent copies.
Modifying a submatrix may alter the original matrix—sometimes unintentionally.
sub = matrix[0:2, 0:2]
sub[:] = 9
print("Modified Submatrix:\n", sub)
print("Matrix After Submatrix Modification:\n", matrix)
Explanation:
- Changing sub also changes those entries in matrix, because sub is just a view into the same data.
4. Safely Copying Data: The .copy() Method
To avoid unwanted side-effects, create a true copy of the data before modifying:
safe_sub = matrix[0:2, 0:2].copy()
safe_sub[:] = 3 # Only changes safe_sub, not matrix
print("Safe Submatrix Modification:\n", safe_sub)
print("Original Matrix Remains Unchanged:\n", matrix)
Explanation:
- .copy() generates an independent copy, so changes to safe_sub do not affect the original matrix.
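When you are unsure whether you hold a view or a copy, np.shares_memory can tell you. The sketch below uses a deterministic matrix (np.arange reshaped to 3x3) instead of random values so the results are predictable. It also shows that indexing with a list of row numbers returns a copy, unlike slicing.

```python
import numpy as np

matrix = np.arange(9).reshape(3, 3)

view = matrix[0:2, 0:2]                 # slice -> view
independent = matrix[0:2, 0:2].copy()   # explicit copy
picked_rows = matrix[[0, 1]]            # list of indices -> copy

print(np.shares_memory(matrix, view))          # True
print(np.shares_memory(matrix, independent))   # False
print(np.shares_memory(matrix, picked_rows))   # False

# Writing to the view changes the original; the copy keeps its old value.
view[0, 0] = 100
print(matrix[0, 0])        # 100
print(independent[0, 0])   # 0
```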
How to Save and Load NumPy Arrays
Saving and reusing your data is a fundamental skill in data science. This section will guide you through best practices for saving and loading NumPy arrays—both locally and on the cloud—using code examples you can implement right away.
1. Saving NumPy Arrays to Google Drive in Google Colab
You can use Colab’s built-in functions to connect to your Google Drive for easy file saving:
from google.colab import drive
drive.mount('/content/drive')
import numpy as np
array = np.arange(10)
np.save('/content/drive/My Drive/my_array.npy', array)
Explanation:
- drive.mount('/content/drive') gives Colab access to your Google Drive.
- np.save saves your array directly to a location in Drive.
2. Saving NumPy Arrays Locally (Jupyter Notebook/Anaconda)
No cloud? Save directly to your hard drive:
import numpy as np
array = np.arange(10)
np.save('my_local_array.npy', array)
Explanation:
- The file my_local_array.npy will appear in your working directory.
3. Saving Multiple Arrays in One File
Use np.savez to bundle multiple arrays efficiently:
a = np.arange(5)
b = np.linspace(0, 1, 5)
np.savez('arrays_bundle.npz', first=a, second=b)
Explanation:
- np.savez saves several arrays into a single .npz file with named keyword arguments.
4. Loading Saved Arrays
Bring your data back with np.load, whether it’s .npy (single array) or .npz (multiple arrays):
# Loading a single .npy array
loaded_array = np.load('my_local_array.npy')
print(loaded_array)
# Loading arrays from a .npz file
bundle = np.load('arrays_bundle.npz')
print(bundle['first'])
print(bundle['second'])
Explanation:
- For .npz files, access each array as a dictionary item using its name.
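A self-contained round trip might look like the sketch below. Writing to a temporary directory is a choice made here so the example cleans up after itself; in practice you would use your own path. The files attribute lists the names stored in the bundle, and the with block closes the file when done.

```python
import tempfile
from pathlib import Path

import numpy as np

a = np.arange(5)
b = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "arrays_bundle.npz"
    np.savez(path, first=a, second=b)

    # The with block closes the .npz file once the arrays are read.
    with np.load(path) as bundle:
        names = sorted(bundle.files)
        first = bundle["first"]
        second = bundle["second"]

print(names)                       # ['first', 'second']
print(np.array_equal(first, a))    # True
print(np.allclose(second, b))      # True
```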
5. Saving and Loading as Text Files
For maximum compatibility (e.g., sharing with Excel users), use np.savetxt:
arr = np.random.rand(5, 2)
np.savetxt('my_data.txt', arr, delimiter=',')
loaded_txt = np.loadtxt('my_data.txt', delimiter=',')
print(loaded_txt)
Explanation:
- Text files are readable by other programs.
- ⚠️ Note: Numbers will generally be converted to floats.
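The float conversion can be seen, and worked around, in a small sketch. The fmt, header, comments, skiprows, and dtype arguments are standard np.savetxt / np.loadtxt options; the column names x,y and the temporary file location are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np

data = np.array([[1, 2], [3, 4]])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_data.csv"

    # fmt="%d" writes integers; comments="" keeps the header free of a leading '#'.
    np.savetxt(path, data, delimiter=",", fmt="%d", header="x,y", comments="")
    text = path.read_text()

    # By default loadtxt returns floats; pass dtype to get integers back.
    as_float = np.loadtxt(path, delimiter=",", skiprows=1)
    as_int = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(text.splitlines())   # ['x,y', '1,2', '3,4']
print(as_float.dtype)      # float64
print(as_int.dtype.kind)   # i
```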
Conclusion
NumPy is the backbone of Python’s scientific stack: it gives you fast, memory-efficient ndarrays, rich array creation utilities (arange, linspace, zeros/ones, eye), convenient reshaping and slicing, and a powerful suite of vectorized math (trig, exp/log, element-wise ops, and matrix multiplication). Mastering these building blocks unlocks everything that sits on top—pandas for tabular data, scikit-learn for ML, and TensorFlow/PyTorch for deep learning.
What you should now be comfortable with
- Importing and using NumPy idiomatically (import numpy as np).
- Choosing NumPy arrays over Python lists for numerical work (speed, broadcasting, vectorization).
- Creating arrays the right way for your task (np.arange for ranges, np.linspace for fixed endpoints, np.zeros/ones for initialization, np.eye for linear algebra).
- Reshaping without changing element count and using views vs. copies safely (.copy() when needed).
- Writing vectorized code (avoid Python loops) for arithmetic, stats, and linear algebra (*, @, .T, np.min/max, np.exp, np.log, etc.).
- Persisting work with save/load patterns (np.save, np.savez, np.load, np.savetxt, np.loadtxt) locally or in Colab/Drive.
Common pitfalls to avoid
- Forgetting that many slices are views, not copies: unexpected mutations can leak back into the original array. Use .copy() when you’ll modify a slice.
- Mismatch between shape and operation (e.g., matrix multiply on incompatible dimensions). Check arr.shape and prefer @/np.matmul for linear algebra.
- Mixing degrees with trig functions (NumPy uses radians). Convert with np.deg2rad/np.rad2deg when needed.
- Assuming np.arange includes the stop value. It’s exclusive; use np.linspace when endpoints matter.
Next steps
- Practice vectorization by rewriting looped code with array operations and broadcasting.
- Pair NumPy with pandas for real-world datasets; use NumPy arrays under the hood for feature matrices.
- Explore random sampling and reproducibility with np.random.default_rng.
- Learn more linear algebra utilities (np.linalg.svd, np.linalg.solve, np.linalg.eig) to prepare for ML.
With these essentials, you’re ready to move from Python basics to production-grade data pipelines and ML workflows, writing code that’s both readable and fast.