Skepticism on that point has been popular for a long time. Skepticism is good. But your brain has a lot in common with a giant matrix. Your senses are like a giant vector (the data in your favorite song), and your thoughts and actions another giant vector (a synth recording of you covering that song, capturing your input to the instrument), with your brain a matrix that converts one to the other. How far does the similarity go? Unknown... a giant matrix can approximate any process as closely as you want, if it's big enough and has enough time/examples to learn.
If all that - a giant matrix as a brain - seems too simple to be possible, keep in mind that this kind of matrix represents an interacting network. There's a math proof, the universal approximation theorem, that a network like this can approximate any process as closely as you like if it has enough neurons, meaning any natural or computational process as far as we understand the word "process" (strictly, any continuous mapping from inputs to outputs over a bounded range). It's very closely related to the way you can break any sound down into a spectrum of frequencies. Some versions of the proof actually lean on that same idea.
The "deep" in "deep learning" just means stacking more layers, which for our purposes amounts to using a bigger matrix. Often that means using fancier hardware to run the learning faster, but not necessarily. This is very similar to cameras and screens with higher and higher resolutions. A newer phone should have a faster chip to keep up with a higher pixel count in camera and screen, but it doesn't technically need one. It would just run slower otherwise. Images didn't get more complicated, only bigger.
But for that ability to sculpt a matrix into any process to really work, the matrix needs to be broken up into individual vectors, and those are run against the input - the vector representing senses - one at a time, with each result - a work-in-progress vector - put on a curve a bit like a grading curve. This curved result is then sent to interact with the next vector that was broken off the matrix. Rinse and repeat!
Eventually that work-in-progress vector is done, at which point it represents the thoughts/actions that are the output. Think of each number in the vector as the strength of each dimension of possible response, the probability of hitting each note on a piano, or how much to move each muscle, etc. So to put the last paragraph in different words, a "deep learning" matrix, aka neural network, is no more than a bunch of multiplications in the form of dot products between pairs of vectors, with a little filter/curve after each one.
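Here's a minimal sketch of that "dot products with a little bend after each one" idea in plain Python. Everything in it is made up for illustration: the names `dot`, `bend` and `layer`, the tiny hand-picked weights, and the choice of bend (flattening negatives to zero, usually called ReLU), which is just one common option and not something the description above commits to.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bend(x):
    """The kink: positives pass through, negatives are flattened to zero."""
    return max(0.0, x)


def layer(matrix, vector):
    """Run each row broken off the matrix against the vector, then bend each result."""
    return [bend(dot(row, vector)) for row in matrix]


# Hypothetical numbers, picked by hand so the arithmetic is easy to follow.
senses = [1.0, 2.0]
first = [[1.0, -1.0],
         [0.5, 0.5],
         [-1.0, 1.0]]
second = [[1.0, 1.0, 1.0],
          [0.0, 2.0, -1.0]]

work_in_progress = layer(first, senses)   # [0.0, 1.5, 1.0]
output = layer(second, work_in_progress)  # [2.5, 2.0]
print(work_in_progress, output)
```

The first number in `work_in_progress` is where the bend did something: the raw dot product was -1.0 and the kink flattened it to 0.0 before the next step saw it.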
Incidentally, each one of those vectors broken off the matrix can be visualized as a line or edge. You can imagine that you could draw any picture, even a 3D one, even a 5005D one, with enough lines or edges. You can make it as clean and accurate as you want by adding more lines. We know that intuitively, because that's how sketching works. Deep learning is not unlike sketching very fast. Similarly, you can draw a very smooth circle, as smooth as you want, with enough little square pixels. See it? Now we can do that with concepts.
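The "more lines, cleaner sketch" claim is easy to check on a one-dimensional curve. The sketch below, with the hypothetical helpers `sketch` and `worst_gap` and an arbitrarily chosen sine wave, traces a curve with n straight segments and measures the largest gap between curve and sketch. It only illustrates the intuition; it is not how a network is actually trained.

```python
import math


def sketch(f, n, lo=0.0, hi=1.0):
    """Return a function that traces f with n straight segments between lo and hi."""
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]

    def approx(x):
        # Find the segment that contains x, then slide along that straight line.
        i = min(int((x - lo) / (hi - lo) * n), n - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return approx


def worst_gap(f, approx, samples=1000, lo=0.0, hi=1.0):
    """Largest distance between curve and sketch over evenly spaced sample points."""
    points = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    return max(abs(f(x) - approx(x)) for x in points)


def curve(x):
    """One full wave of a sine, standing in for 'any picture'."""
    return math.sin(2 * math.pi * x)


gaps = [worst_gap(curve, sketch(curve, n)) for n in (4, 16, 64)]
print(gaps)  # each entry should be smaller than the one before
```

Every place where two segments meet is a kink, so adding lines and adding bends are two views of the same thing here.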
But those are details. Students who think matrix math is boring will typically hear about AI from me, haha. And they do tend to find it interesting.
The curve, or conditioning, after each step is what makes this different from just multiplying a giant vector by a giant matrix to get another giant vector. That would be too simple, and it's kind of the lie I told at the start. Instead, information flows step by step through the layers of the matrix, much like energy filtering up through the layers of an ecosystem, towards apex predators and decomposers. And there's that curve/filter between each level. I suppose it's a bit like a goat eating grass, which is converted into goat; something changes in the middle. It isn't grass to grass, it's grass to goat, so there's a left turn in there somewhere.
That bend is critical but not complicated at all. One piece of why it's critical is simple: without a bend, a chain of matrix steps multiplies out to one single matrix, so the extra steps would add nothing. The full story is harder, and I don't fully understand it. The filter doesn't even have to be a curve; it can just mean putting a kink in each line - just a bend in each vector, like a knee or elbow. It almost doesn't matter what the bend is, just that it's there. That's surprisingly essential to the universality of neural networks, so apparently it adds a lot for very little. I don't have a good analogy for why that's true, except that the world isn't actually made up of a bunch of straight lines. It's more like a bunch of curves and surfaces and volumes and energy and particles and static and other noise and signals between interconnected systems, and this step, putting kinks in the lines, allows the processing to break out into a much larger possibility space.
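To see what the bend buys, here is a small sketch comparing two matrix steps with and without a kink in between. The helper names (`apply`, `bend`, `combine`) and the weights are invented for the example, and the kink is again the flatten-negatives kind (ReLU), which is an assumption on my part.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def apply(matrix, vector):
    """Plain matrix-times-vector, no bend."""
    return [dot(row, vector) for row in matrix]


def bend(vector):
    """Put a kink in every entry: negatives become zero."""
    return [max(0.0, x) for x in vector]


def combine(second, first):
    """Multiply two matrices into the single matrix that does both steps at once."""
    columns = list(zip(*first))
    return [[dot(row, col) for col in columns] for row in second]


# Hypothetical hand-picked weights.
first = [[1.0, -1.0],
         [-1.0, 1.0]]
second = [[1.0, 1.0]]
x = [3.0, 1.0]

no_bend = apply(second, apply(first, x))          # [0.0]
merged = apply(combine(second, first), x)         # [0.0], same as one matrix
with_bend = apply(second, bend(apply(first, x)))  # [2.0], the distance |3 - 1|
print(no_bend, merged, with_bend)
```

Without the kink, the two steps are exactly one merged matrix, and here that matrix is all zeros. With the kink, the same weights compute the distance between the two inputs, which no single matrix can do.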
Theoretically, the old possibility space (without bends) was the stuff that you could accomplish with the "transformations" you learned in geometry - stretches, rotations, reflections, glides. The new space is all possibility space - or any "before/after" that can be measured and processed as a measurement. Artificially aging your neighbor's cat, painting today's sunset from weather data... If there's any logical connection between input and output, between before and after if there's time involved - even if that connection is just the laws of physics - or even if it's just a random association to memorize, like didn't you know volcanoes and lemons are connected because I said so - that connection can be represented by a big enough matrix.
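A classic tiny case of a before/after that straight-line transformations can't capture is "one or the other, but not both" (XOR). A purely linear map has to send (1, 1) to the sum of what it sends (1, 0) and (0, 1) to, so it can't answer 1, 1 and then 0. The sketch below uses two kinked neurons with weights I picked by hand for illustration; nothing here is learned, and the trailing 1.0 on the input is just a trick to let a row add a constant offset.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def bend(x):
    """The kink: negatives become zero, positives pass through."""
    return max(0.0, x)


def xor_net(a, b):
    """Two kinked neurons feeding one output row, with hand-picked weights."""
    inputs = [a, b, 1.0]               # the trailing 1.0 carries a constant offset
    hidden_rows = [[1.0, 1.0, 0.0],    # computes a + b
                   [1.0, 1.0, -1.0]]   # computes a + b - 1
    hidden = [bend(dot(row, inputs)) for row in hidden_rows]
    output_row = [1.0, -2.0]           # first neuron minus twice the second
    return dot(output_row, hidden)


cases = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
table = [xor_net(a, b) for a, b in cases]
print(table)
```

The second neuron stays silent until both inputs are on, and then it cancels the first one. That "stays silent until" behavior is exactly what the kink provides.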
So instead of pixels, it's lines, and instead of lines, it's bends. Think of bends as moments of change. Maybe this is a little like adding 3D glasses and color to a greyscale picture without altering the resolution. But... the effect of the curving/filtering/bending I've been talking about would be far more shocking than the image upgrade if you could directly experience the difference, given that we get the potential of learning and mimicking every known process. Maybe we do directly experience that difference as a key component of being alive. It's more like adding motion to that image, and an understanding of where the motion comes from and where it's going. Or to rephrase, the greyscale picture with our "kinks" update is now more like a mind than a photo - which, after all, is a simpler kind of matrix, one that is not a network.
The other simplification I made is that the big matrix is actually broken down into multiple matrices first, before those are broken down into individual vectors, each of which is roughly equivalent to a single neuron. What I described was a single-file chain of neurons, but there can be many neurons next to each other. Each layer of neurons in a neural network is its own matrix. Each neuron is its own vector. But I'd say that aspect of the layers is the least important detail here, other than realizing you can see each row of a matrix as a brain cell, which is neat. And you can very roughly imagine each brain cell as knowing how to draw one line-with-bend through concept space and give its vote on that basis.
We have 6 layers of neurons in the cerebral cortex, for reference, so at a gross simplification that would be 6 big matrices in a chain, with the rows of each matrix representing individual neurons.
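As a rough picture of "6 big matrices in a chain, rows as neurons," here is a sketch with random, untrained weights. The layer widths, the seed and the helper names are arbitrary choices of mine, and a real cortex is of course not wired as a simple one-way chain like this.

```python
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def layer(matrix, vector):
    """Each row is one neuron: dot it with the incoming vector, then apply the kink."""
    return [max(0.0, dot(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """rows = neurons in this layer, cols = numbers arriving from the layer before."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
widths = [8, 5, 5, 5, 5, 5, 3]  # input size, then neurons per layer (arbitrary)
chain = [random_matrix(widths[i + 1], widths[i], rng) for i in range(6)]

signal = [rng.uniform(0.0, 1.0) for _ in range(widths[0])]  # stand-in for senses
for matrix in chain:
    signal = layer(matrix, signal)

print(len(chain), [len(m) for m in chain], len(signal))
```

The only bookkeeping rule is that each matrix needs as many columns as the previous one has rows, so every neuron gets one number from each neuron in the layer before it.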