Building a Brain in 10 Minutes with Nvidia
Today, we’re going to build a very simple brain. Not a sci-fi robot brain, but a basic neural network that can learn to recognize images of clothing—specifically using the Fashion MNIST dataset. Our goal? To make a machine look at a fuzzy grayscale image and say, "That's a sneaker."
We’ll be working in TensorFlow and Keras, two popular libraries for deep learning in Python. This walkthrough is inspired by NVIDIA's Deep Learning Institute.
Developed by Frank Rosenblatt in 1958, the perceptron represents one of the earliest milestones in neural network history. It was a simplistic, single-layer neural model designed to mimic how the brain learns through trial and error. Though limited in its capabilities (it famously failed to solve problems like XOR), this foundational research laid the groundwork for today’s deep learning systems. The documentary serves as a reminder of how early ambitions in AI set the stage for the powerful models we build today.
1. Checking the Hardware
First things first: is our computer equipped for deep learning? We ask TensorFlow to list physical GPU devices. These are like the turbochargers of neural network training. A CPU can do it, but it’s like racing a toddler against a horse.
import tensorflow as tf

tf.config.list_physical_devices('GPU')
If we get a response showing a GPU, we’re in luck. Training will go faster.
2. Loading the Dataset
We’re about to dive into a challenge that would have seemed nearly impossible a few decades ago: training a computer to recognize images using computer vision. Specifically, our task involves identifying different types of clothing in the Fashion MNIST dataset.
Inspired by how humans learn through trial and error, we’ll simulate this process by using digital flashcards. The artificial brain—our neural network—will make guesses about the items of clothing shown, and we’ll correct it by providing the actual labels. Over time, it learns from its mistakes and successes alike.
Much like students preparing for a test, we’ll hold back a portion of the data to quiz the model. This ensures that our neural network isn’t just memorizing what it saw during training but actually understanding the task. Memorization might be fine for trivia, but it won’t cut it for applying reasoning, like classifying new images or doing math.
The data we use to teach the model is known as the training dataset, while the reserved data for testing is referred to as the validation dataset. Since Fashion MNIST is a well-known dataset, TensorFlow includes a built-in loader for it (the files are downloaded the first time you call it). So, let’s load it and see what it looks like.
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (valid_images, valid_labels) = fashion_mnist.load_data()
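Fashion MNIST arrives already split, but the hold-back idea described above applies to any dataset. Here is a minimal pure-Python sketch of it. The function name, the 20% fraction, and the fake flashcards are all my own assumptions for illustration, not part of TensorFlow.

```python
import random

# Hold back part of a dataset for validation.
# Fashion MNIST already comes split, so this is only a sketch of the idea;
# the function name and the 20% fraction are arbitrary choices for illustration.

def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle a copy of the examples, then set aside the last portion for quizzing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * validation_fraction)
    return shuffled[:cut], shuffled[cut:]

# Fake (question, answer) flashcards standing in for (image, label) pairs.
flashcards = [(f"image_{i}", i % 10) for i in range(50)]
train_cards, valid_cards = train_validation_split(flashcards)

print(len(train_cards), len(valid_cards))  # 40 10
```

The important property is that no flashcard lands in both piles, so the quiz only contains examples the model has never studied.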
3. Visualizing a Sample
Before we ask a neural network to distinguish between a sneaker and a shirt, it helps to ensure that we, as humans, can actually interpret the data ourselves.
The dataset provides us with train_images and train_labels. Think of train_images as the questions we pose to our model (what it sees) and train_labels as the corresponding answers. In the data science world, these answers are typically referred to as labels.
To better understand the input, we can visualize a sample image using Matplotlib, a popular plotting library in Python.
import matplotlib.pyplot as plt

data_idx = 69
plt.figure()
plt.imshow(train_images[data_idx], cmap='gray')
plt.colorbar()
plt.grid(False)
plt.show()
This shows us one example image. In this case, item #69. Looking at it, you might say, "That looks like a phallic object."
And you're probably right, but it's actually a pair of trousers:
train_labels[data_idx] # Output: 1
Label 1 corresponds to "trouser." Good job, human. Here we have a dandy table to help us view the different categories.
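For reference, the ten categories can be kept in a plain Python list so that an integer label can be turned into a name. The class names below are the standard Fashion MNIST ones. The names class_names and label_to_name are my own choices for this sketch, since the dataset itself only provides integers.

```python
# Standard Fashion MNIST class names, indexed by integer label (0-9).
# `class_names` is an illustrative name; the dataset itself only ships integers.
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_to_name(label):
    """Translate an integer label into its clothing category."""
    return class_names[label]

print(label_to_name(1))  # Trouser
print(label_to_name(7))  # Sneaker
```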
4. Defining the Model
Neurons are the essential units that make up a neural network. In the biological sense, neurons fire electrical signals when they receive the right kind of stimulus; this is what lets you differentiate between red and yellow, for example. Artificial neurons do something similar, except they work with numbers: when you input certain values, they produce a numerical result.
You can think of creating a neuron in three main steps:
- First, we define its structure (this is its architecture).
- Next, we train it with data.
- Finally, we assess how well it performs.
Building the Structure
Biological neurons use a system similar to Morse Code to send information. They receive signals through their dendrites, and if the input meets the right conditions, they send a pulse down the axon to the neuron’s terminals.
It’s thought that both the timing and sequence of these pulses help convey information. Most artificial neural networks don’t yet replicate this timing complexity. Instead, they model the behavior with mathematical equations.
How the Math Works
Computers handle information as sequences of 0s and 1s, while humans and animals operate on more continuous inputs. Early artificial neurons tried to imitate biological ones using a linear regression function: y = mx + b. Here, x is the input (similar to signals entering dendrites), and y is the output (like the impulse exiting the terminals). As the system sees more data and predictions, it adjusts the values of m and b to improve accuracy.
Neurons often need to process many inputs at once. In our example, we’ll treat every pixel in an image, each having a value between 0 (black) and 255 (white), as a separate input. Each of these inputs gets its own weight (similar to our m), and these weights are denoted with w. So pixel one has w0, pixel two has w1, and so forth. This turns our equation into something like: y = w0x0 + w1x1 + w2x2 + ... + b.
Since each image is 28 pixels wide and 28 pixels tall, we end up with 784 weights in total. Each pixel contributes to the final result, and we can look directly at the raw values from the image we visualized earlier. Each of those pixel numbers will be multiplied by its corresponding weight.
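To make the weighted sum concrete, here is a small pure-Python sketch of a single neuron. The 2×2 "image" and the weight values are invented for illustration. A real Fashion MNIST image would supply 784 pixel values, and a trained model would have learned its weights rather than having them typed in.

```python
# A single artificial neuron: y = w0*x0 + w1*x1 + ... + b.
# The tiny 2x2 "image" and the weight values are made up for illustration;
# a real Fashion MNIST image would contribute 784 pixel values.

def neuron_output(pixels, weights, bias):
    """Multiply each pixel by its weight, add everything up, then add the bias."""
    if len(pixels) != len(weights):
        raise ValueError("need exactly one weight per pixel")
    return sum(w * x for w, x in zip(weights, pixels)) + bias

image = [[0, 255],
         [128, 64]]

# Flatten the grid into one list, row by row.
pixels = [value for row in image for value in row]
weights = [0.5, -0.01, 0.02, 0.0]
bias = 1.0

print(pixels)  # [0, 255, 128, 64]
print(neuron_output(pixels, weights, bias))
```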
We’ve been using a simple equation, y = mx + b, which gives us a single number as an output. But here’s the thing: we’re trying to recognize types of clothing, not just output a number. So how do we turn a plain number into a decision like “This is a T-shirt” or “That’s a sneaker”?
A straightforward solution is to use multiple outputs—specifically, one neuron for each clothing type. Since there are ten categories in the Fashion MNIST dataset, we’ll use ten neurons. Each one will specialize in detecting a specific class (like Trousers, labeled as class 1). The neuron that produces the largest number (i.e., is most “confident”) determines which class the model chooses for the input image.
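Picking the winning neuron is just a search for the largest score, often called an argmax. A minimal sketch, using made-up scores rather than real model output:

```python
# Ten output neurons, one score each; the highest score decides the class.
# The scores below are invented for illustration, not real model output.
scores = [-3.1, 8.4, -0.7, 1.2, 0.3, -5.0, -1.8, -6.2, -2.4, -4.9]

def pick_class(scores):
    """Return the index of the largest score (an argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])

print(pick_class(scores))  # 1
```

Here neuron 1 wins, so the model's answer would be class 1 (Trousers).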
We can easily create this kind of setup using Keras, a high-level API that's now part of TensorFlow. We’ll use its Sequential API, which lets us organize our model as a series of stacked layers. Think of layers as the processing steps that take input data and transform it into predictions.
The model we’re about to create has two layers:
- A Flatten layer, which takes our 28×28 image and reshapes it into a single list of 784 values.
- A Dense layer, which contains 10 neurons. Each one is connected to every input pixel, and each connection is assigned a weight. These neurons will each compute a score that tells us how likely the input image is to belong to their respective clothing category.
We’ll also specify an input_shape of (28, 28), which matches the pixel dimensions of each image.
number_of_classes = 10

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(number_of_classes)
])
Verifying the model
To ensure our model has the structure we think it does, we can call its summary method. Here's what the output looks like:
Model: "sequential"
Layer (type) | Output Shape | Param # |
flatten (Flatten) | (None, 784) | 0 |
dense (Dense) | (None, 10) | 7,850 |
Total params: 7,850 (30.66 KB)
Trainable params: 7,850 (30.66 KB)
Non-trainable params: 0 (0.00 B)
Let’s unpack where that number, 7,850, comes from. Each 28×28 image has 784 pixels. For every one of the 10 output neurons (representing the 10 classes), we need a unique weight for each of those 784 inputs. That’s 784 * 10 = 7,840 weights. The additional 10 parameters are biases, one for each output neuron, making the full count 7,840 + 10 = 7,850. Those biases act as the b in the equation y = mx + b, allowing each neuron to adjust its activation baseline independently.
In academic literature, model architectures are sometimes visualized as node-link diagrams. Though such diagrams work for small networks, they become unwieldy with modern models. The diagram referenced here represents only a thin slice of our model: 28 input nodes on top (representing one row of the image) and 10 output neurons at the bottom. In reality, we’d have 784 inputs, one for each pixel.
Each dot represents either an input or a neuron, and each connecting line is a weight, one of those 7,840 learnable parameters.
Lastly, there’s another way to visually verify our model: you can plot it. Just be warned: once networks get complex, those diagrams become more abstract art than useful tools.
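The 7,850 figure is easy to double-check by hand. The sketch below redoes the arithmetic in plain Python. The function name is mine, and the size estimate assumes each parameter is stored as a 4-byte float, which is consistent with the 30.66 KB reported in the summary.

```python
# Count the parameters of a fully connected (Dense) layer by hand.
# The function name is illustrative, not a Keras API.

def dense_param_count(num_inputs, num_neurons):
    """One weight per input per neuron, plus one bias per neuron."""
    weights = num_inputs * num_neurons
    biases = num_neurons
    return weights + biases

num_pixels = 28 * 28  # 784 inputs after flattening
total = dense_param_count(num_pixels, 10)

print(total)  # 7850

# Assumption: 4 bytes per parameter (32-bit floats).
print(round(total * 4 / 1024, 2))  # 30.66
```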
5. Initiate Training
Before training, we compile the model with a loss function and an optimizer.
We’ve built a model, but how do we help it improve? Think of it like giving a student a test and a score sheet. We need a way to evaluate how far off our model's answers are from the correct ones. That’s what the loss function does: it calculates the difference between the model’s prediction and the actual label, so it knows how to adjust its internal settings.
For classification problems like ours, we’ll use a specific type of loss function called SparseCategoricalCrossentropy. Here’s what each part of that name tells us:
- Sparse – Instead of using one-hot encoded labels (where each class is represented by a vector with a single 1 and the rest 0), this function works with integer labels (e.g. 0 to 9). So our categories are indexed numerically.
- Categorical – It’s built for classification tasks, where we predict a category rather than a continuous value.
- Cross-entropy – This part penalizes confident but incorrect predictions most harshly. If our model is totally sure it’s right and turns out to be wrong, it gets a very bad score: as its confidence in the wrong answer approaches 100%, the loss climbs toward infinity!
This loss function is a good match for our model because it evaluates all ten output neurons at once. If multiple neurons signal strongly that their class is the correct one, this function steps in and says, "Sorry folks—only one of you can be right."
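To see how that penalty behaves, here is a pure-Python sketch of the calculation for a single example. It is meant to illustrate the idea, not to reproduce Keras's internals. The function name and the three-class logits are assumptions made for brevity.

```python
import math

# Sparse categorical cross-entropy for one example, computed from raw scores.
# "Sparse" here means the label is a plain integer index, not a one-hot vector.
# A pure-Python sketch of the idea, not the Keras implementation.

def sparse_cross_entropy_from_logits(logits, label):
    """Return -log(softmax(logits)[label]) using the log-sum-exp trick."""
    biggest = max(logits)  # subtracting the max avoids overflow in exp()
    log_sum_exp = biggest + math.log(sum(math.exp(z - biggest) for z in logits))
    return log_sum_exp - logits[label]

# Three classes for brevity; the true class is index 0 in every case.
confident_right = sparse_cross_entropy_from_logits([10.0, 0.0, 0.0], 0)
unsure = sparse_cross_entropy_from_logits([0.0, 0.0, 0.0], 0)
confident_wrong = sparse_cross_entropy_from_logits([0.0, 10.0, 0.0], 0)

print(confident_right)  # close to 0
print(unsure)           # log(3), about 1.10
print(confident_wrong)  # about 10: confident mistakes cost the most
```

Because all the scores are compared against each other before the loss is computed, raising one neuron's score necessarily lowers the probability given to the others.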
To go beyond just loss, we can include other metrics, like accuracy, to give us a better picture of performance. Sometimes loss may be low (suggesting good predictions on average), but the actual accuracy might be lagging. Metrics help highlight that distinction.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
- adam: A popular optimizer that adjusts learning rates adaptively.
- SparseCategoricalCrossentropy: Used when our labels are integers (0-9), not one-hot vectors.
- from_logits=True: This tells the function that the model’s output is in raw score form (logits) and hasn’t yet been transformed into probabilities. The loss function will handle that conversion internally.
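If the difference between integer labels and one-hot vectors feels abstract, this short sketch writes the same answer both ways. The helper to_one_hot is hypothetical and written only for this example.

```python
# The same answer written two ways: an integer label and a one-hot vector.
# `to_one_hot` is a hypothetical helper written for this sketch.

def to_one_hot(label, num_classes=10):
    """Build a list of zeros with a single 1 at position `label`."""
    return [1 if i == label else 0 for i in range(num_classes)]

sparse_label = 1  # the integer form that Fashion MNIST provides
one_hot_label = to_one_hot(sparse_label)

print(one_hot_label)           # [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(one_hot_label.index(1))  # 1, recovering the integer label
```

Had our labels come as one-hot vectors, the matching Keras loss would be CategoricalCrossentropy rather than the sparse variant.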
6. Training the Model
Here comes the exciting part—where our model gets to train and test itself. The fit method in TensorFlow is what makes this happen. It allows the model to learn from the training data and also test itself using the validation data.
In machine learning lingo, an epoch represents one complete pass over the entire training dataset. It’s like a student going through all their flashcards once. And just like humans often need to review the same material several times before they fully grasp it, our model also benefits from seeing the training examples multiple times.
At the end of each epoch, the model pauses its studying session and takes a quick quiz using the validation dataset. This helps us measure how much it’s actually learning versus just memorizing. Time to see it grind and grow smarter!
history = model.fit(
    train_images, train_labels,
    epochs=5,
    validation_data=(valid_images, valid_labels)
)
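Conceptually, the training loop looks something like the toy version below: several epochs over shuffled flashcards, with a validation quiz after each one. This is a rough pure-Python sketch, not what Keras actually runs. The model is a single neuron y = m*x + b, the data follow y = 2x + 1, and the update rule is plain gradient descent rather than adam. Every name and number here is an assumption made for illustration.

```python
import random

# A toy version of what a training loop does conceptually: several epochs over
# shuffled training "flashcards", with a validation quiz after each epoch.
# The model is a single neuron y = m*x + b and the data follow y = 2x + 1.

def mean_squared_error(m, b, data):
    """Average squared gap between the neuron's guesses and the answers."""
    return sum((m * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, valid_data, epochs=5, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    m, b = 0.0, 0.0  # start from an uninformed guess
    history = {"loss": [], "val_loss": []}
    for _ in range(epochs):
        cards = list(train_data)
        rng.shuffle(cards)  # new flashcard order every epoch
        for x, y in cards:
            error = (m * x + b) - y         # how far off was the guess?
            m -= learning_rate * error * x  # nudge the weight
            b -= learning_rate * error      # nudge the bias
        # End of epoch: score the training set, then take the validation quiz.
        history["loss"].append(mean_squared_error(m, b, train_data))
        history["val_loss"].append(mean_squared_error(m, b, valid_data))
    return m, b, history

train_data = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
valid_data = [(0.25, 1.5), (-0.75, -0.5), (0.55, 2.1)]

m, b, history = train(train_data, valid_data, epochs=20)
print(round(m, 2), round(b, 2))  # approaches 2 and 1
print(history["val_loss"][0] > history["val_loss"][-1])  # the quiz score improves
```

The validation points never influence m or b; they are only used to measure progress, which is the whole point of holding them back.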
So, how did our model perform? Maybe a solid B-minus? Let’s give it a break—it was only working with 10 neurons, while the human brain has billions firing at once.
We can expect the model’s accuracy to land somewhere around 80%, though that number isn’t fixed. The result can vary slightly due to how the training images (our digital flashcards) were shuffled and the random initialization of the model’s weights at the start of training. A bit of randomness is baked into every run, so no two training sessions are perfectly identical.
7. Making Predictions
Now that our model has finished training, it's time to test it in the real world. We do this using the predict method, which allows us to see the model's predictions on any image, whether it’s from the training dataset or brand new.
Keep in mind that Keras always expects inputs in batches—even if it’s just one image. So, to make a prediction on a single image, we still need to pass it as a batch with one item.
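Here is what "a batch with one item" looks like in terms of array shapes. This sketch uses NumPy with blank placeholder images instead of the real dataset, so the array contents are an assumption. Only the shapes matter.

```python
import numpy as np

# Blank placeholder images standing in for a dataset of five 28x28 pictures.
images = np.zeros((5, 28, 28), dtype=np.uint8)

single = images[3]                         # plain indexing drops the batch axis
batch = images[3:4]                        # slicing keeps it: a batch of one
expanded = np.expand_dims(single, axis=0)  # or add the axis back explicitly

print(single.shape)    # (28, 28)
print(batch.shape)     # (1, 28, 28)
print(expanded.shape)  # (1, 28, 28)
```

Slicing with a range such as 3:4 rather than indexing with 3 is the simplest way to keep that leading batch axis.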
Let’s take a look at what the model predicts for one of the images in our training data. The raw output from the model consists of numerical values called logits. These numbers show how confident each of the 10 output neurons is in labeling the image as a particular clothing item.
We visualize both the input image and a bar chart that maps the logits. A higher number means the model is more confident that the class is the correct one. Negative values, on the other hand, show the model is quite sure that particular class is incorrect.
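Logits can be hard to read at a glance, so it sometimes helps to convert them into probabilities with softmax. The sketch below does that in plain Python. The logits are invented for illustration rather than taken from a trained model, and in TensorFlow the same conversion is available as tf.nn.softmax.

```python
import math

# Turn raw logits into probabilities with softmax so they are easier to read.
# The logits below are invented for illustration, not taken from a trained model.

def softmax(logits):
    """Exponentiate (shifted by the max for stability) and normalize to sum to 1."""
    biggest = max(logits)
    exps = [math.exp(z - biggest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-4.0, -6.5, -2.1, -3.3, -5.0, 1.2, -2.8, 6.9, 0.4, 2.5]
probabilities = softmax(logits)

best = max(range(len(probabilities)), key=lambda i: probabilities[i])
print(best)                          # 7
print(round(sum(probabilities), 6))  # 1.0
```

Softmax never changes which class wins; it only rescales the scores so they are positive and add up to 1.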
You can change the data_idx value to view predictions for different examples. How accurate do you think it is? When it makes mistakes, are they understandable ones?
plt.imshow(train_images[data_idx], cmap='gray')
plt.show()

predictions = model.predict(train_images[data_idx:data_idx+1])
plt.bar(range(10), predictions.flatten())
plt.xticks(range(10))
plt.show()

print("correct answer:", train_labels[data_idx])
This gives us a bar chart showing how confident the model is in each of the 10 categories. The tallest bar should ideally match the actual class. In this example the image was a sneaker, so ideally the bar for class 7 is the tallest, meaning the model thinks this image is most likely a sneaker.
Boom. It was a sneaker. Nailed it.
Conclusion
And that, dear listener, is how you build a brain in 10 minutes. It’s not conscious, it doesn’t dream, but it does recognize shoes.
We used:
- TensorFlow and Keras for model building
- Fashion MNIST for data
- Matplotlib for visualization
More importantly, we saw the end-to-end flow: load data, visualize, build a model, train it, and check its predictions.
In practice, this is just the start. Better models would use deeper layers, dropout for regularization, and data normalization. But if you understood this, you understand the core of machine learning.
The machine didn’t change the world yet, but maybe yours just got a little smarter.