An overview lecture for EE/Embedded Systems students. During your future
career as an engineer it is useful to know about neural architectures. Neural
architectures are used in many domains that relate to computation.
Interesting things happen in different domains (Machine Learning,
Neurobiology, Computer Vision, Physics, Computer Architecture,
Neuromorphic). This makes Neural Networks more relevant for the
Embedded Systems community.
First we introduce the neural network model; if we don’t know what it is, we
don’t know why it is used.
These neural network models are inspired by the biological neurons in the
brain.
More information about Neurobiology follows later on. For now it is enough to
know that neurons have connections, each with an efficiency (strength), that
are used to send over signals. When enough signals arrive at the postsynaptic
neuron, it fires a new spike.
Quick recap of the perceptron model: it can separate input data into classes
and learn the separation between classes.
Problem: it cannot solve problems that require non-linear separation, for
example an XOR function. In practice many problems require non-linear
separation.
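As a small illustration of this limitation, the sketch below trains a single
perceptron with the classic perceptron learning rule, once on AND (linearly
separable) and once on XOR. The function names, the learning rate and the
number of epochs are assumptions chosen for this example, not taken from the
lecture.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted input sum exceeds zero."""
    s = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else 0


def train_perceptron(samples, epochs=50, lr=1.0):
    """Perceptron learning rule on a list of (inputs, target) pairs."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Move the separating line only when the sample is misclassified.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def accuracy(samples, weights, bias):
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

The AND problem is learned completely, while no single line can put all four
XOR points on the correct side, so the XOR accuracy stays below 100% no matter
how long the training runs.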
If we want the pattern on the input to give a high output value, the pixels
that are one in that pattern should be multiplied with positive weights. If we
want a different pattern to give a low output, we should set all weights
connected to the other pixels to a negative value. In this situation a pattern
should match the input. We could use the bias input with a big negative value
to force the output to zero if a pattern other than ‘one’ is presented on the
input.
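The weight choice described above can be tried out on a tiny image. The sketch
below is only an illustration: the 3x5 pixel patterns, the weight values of +1
and -1, and the bias value are assumptions made for this example.

```python
# Hypothetical 3x5 binary images, stored row by row.
ONE = [0, 1, 0,
       1, 1, 0,
       0, 1, 0,
       0, 1, 0,
       1, 1, 1]
ZERO = [1, 1, 1,
        1, 0, 1,
        1, 0, 1,
        1, 0, 1,
        1, 1, 1]

# Positive weight where the 'one' pattern has a set pixel, negative elsewhere.
weights = [1.0 if p == 1 else -1.0 for p in ONE]
# Large negative bias: just below the sum reached by a perfect match.
bias = -(sum(ONE) - 0.5)


def neuron(pixels):
    """Weighted sum plus bias, followed by a hard threshold at zero."""
    s = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if s > 0 else 0


print("one  ->", neuron(ONE))
print("zero ->", neuron(ZERO))
```

Only the matching pattern overcomes the negative bias; every missing pixel or
extra pixel lowers the sum and pushes the output to zero.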
From single perceptrons it is possible to build more powerful classifiers that
can solve problems that are non-linear. That is something which is interesting
for the Machine Learning community. The ability to teach a behavior to a
machine is very useful, and the techniques are closely related to
optimization theory.
Single perceptrons can be connected to form a Multi Layer Perceptron (MLP),
also called Artificial Neural Network (ANN). Because of the different
representations that can be built in the hidden (middle) layer and the
non-linear activation function, this network can solve problems that are not
linearly separable.
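To make the role of the hidden layer concrete, here is a minimal sketch of a
two-layer network that computes XOR. The weights are set by hand rather than
learned, and a hard threshold is used as the non-linear activation; both are
assumptions made to keep the example short.

```python
def step(s):
    """Hard-threshold activation: a simple non-linear function."""
    return 1 if s > 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two perceptrons build intermediate representations.
    h_or = step(x1 + x2 - 0.5)    # fires when at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only when both inputs are 1
    # Output layer: "OR but not AND", which is exactly XOR.
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))
```

In the hidden representation (h_or, h_and) the four input points become
linearly separable, which is why the single output perceptron can finish the
job.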
Training is done by stochastic gradient descent. This involves updating the
weights in the negative direction of the error gradient. This process is
repeated for a big set of input patterns until the error converges to a low
value. The gradient computation and weight updates can be implemented
efficiently by the error back-propagation algorithm.
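The update rule described above can be written out as follows. This is a
sketch under assumptions that the notes do not state explicitly: a squared
error E, sigmoid activations, learning rate \eta, inputs x_i, hidden outputs
h_j, network outputs y_k and targets t_k.

```latex
% Error for one input pattern and the gradient descent weight update
E = \tfrac{1}{2} \sum_k (y_k - t_k)^2 ,
\qquad
w \leftarrow w - \eta \, \frac{\partial E}{\partial w}

% Back-propagation: error terms for the output and the hidden layer
\delta_k = (y_k - t_k) \, y_k (1 - y_k) ,
\qquad
\delta_j = h_j (1 - h_j) \sum_k w_{kj} \, \delta_k

% Gradients used in the update (w_{kj}: hidden j to output k, w_{ji}: input i to hidden j)
\frac{\partial E}{\partial w_{kj}} = \delta_k \, h_j ,
\qquad
\frac{\partial E}{\partial w_{ji}} = \delta_j \, x_i
```

The hidden-layer terms reuse the output-layer terms, which is what makes the
back-propagation of the error efficient.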
The idea of a learning perceptron introduced a hype. The famous XOR proof
that it could only solve linearly separable classification problems removed
much interest.
The MLP solution created a hype again, but overtraining and generalization
were still a problem. Training required complex parameter tuning, and Support
Vector Machines turned out to have better properties for generalization
because they maximize the separation between classes.
A system that can learn from examples can also solve many problems an
application designer encounters. Therefore many applications are driven by
neural network based machine learning.
Read this reference for a good description of the CNN approach to face
detection: Garcia C., Delakis M., “Convolutional Face Finder: A Neural
Architecture for Fast and Robust Face Detection”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, 26(11), November 2004,
pp. 1408-1423.
Focus on data instead of algorithm complexity
Pre-process data to generate more examples
Use a test set to verify generalization
Classify features with a hierarchy of trained simple detectors. At each stage
simple features are combined into more complex features. If you want to
know all details of this type of neural network, read this reference (it is a
big paper but contains most of the details): Y. LeCun, L. Bottou, Y. Bengio
and P. Haffner: Gradient-Based Learning Applied to Document Recognition,
Proceedings of the IEEE, 86(11):2278-2324, November 1998.
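One stage of such a feature hierarchy can be sketched as a small convolution
followed by subsampling. The code below is a simplified illustration: the
kernel values are hand-picked instead of trained, the subsampling is a plain
2x2 average, and the function names are assumptions for this example.

```python
def convolve_valid(image, kernel):
    """Slide the kernel over the image without padding.

    As is common in CNNs, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def subsample_2x2(fmap):
    """Average non-overlapping 2x2 blocks to shrink the feature map."""
    return [[(fmap[r][c] + fmap[r][c + 1]
              + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# 5x5 image with a vertical dark-to-bright edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Simple detector that responds to vertical dark-to-bright edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve_valid(image, kernel)   # 4x4 response map
reduced = subsample_2x2(feature_map)          # 2x2 map fed to the next stage
print(feature_map[0])
print(reduced)
```

The next stage would apply its own detectors to the reduced maps, so that
edges are combined into corners, corners into parts, and so on.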
For more information regarding the speed sign detection and recognition
read our paper:
M.Peemen, B.Mesman and H.Corporaal, Speed Sign Detection and
Recognition by Convolutional Neural Networks, In: Proceedings of the 8th
International Automotive Congress, pp. 162-170 (2011).
Four example application domains in which ANNs perform very well.
Read the paper on applications that can be solved with Neural networks:
BenchNN: T.Chen, Y.Chen, M.Duranton, Q. Guo, A. Hashmi, M.Lipasti,
A.Nere, S.Qiu, M. Sebag, O.Temam. On the Broad Potential Application
Scope of Hardware Neural Network Accelerators, IEEE International
Symposium on Workload Characterization (IISWC), November 2012
Due to recent changes in the field of chip fabrication, this branch of
technology is forced to find solutions that can cope with new constraints.
Neural nets can provide a few solutions to these new constraints.
Two interesting constraints that motivate the industry to come up with
"What do you do when chips get too hot to take advantage of all of those
transistors that Moore's Law provides? You turn them off, and end up with a
lot of dark silicon — transistors that lie unused because of power limitations.
As detailed in MIT Technology Review, Researchers at UC San Diego are
fighting dark silicon with a new kind of processor for mobile phones that
employs a hundred or so specialized cores. They achieve 11x improvement
in energy efficiency by doing so."
Hardware Artificial Neural Networks could be used as an efficient
multi-purpose accelerator.
Functionality can be reprogrammed by updating the connections.
For various application fields these give state of the art results, as shown in
previous slides.
The fundamental operations contain a lot of parallelism.
How would we develop such an accelerator?
We have this mathematical description, and a graphical network. Let’s look at
the code that describes this network.
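A minimal sketch of what such code can look like is given here. It is an
illustration under assumptions (fully connected layers, sigmoid activation,
hypothetical weight values and function names), written with explicit loops so
that the multiply-accumulate structure stays visible.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(inputs, weights, biases):
    """One fully connected layer: a MACC loop per neuron, then a sigmoid."""
    outputs = []
    for n in range(len(weights)):        # loop over the neurons of the layer
        acc = biases[n]
        for i in range(len(inputs)):     # multiply-accumulate over the inputs
            acc += weights[n][i] * inputs[i]
        outputs.append(sigmoid(acc))
    return outputs


def network_forward(inputs, layers):
    """Process the network layer by layer; each layer feeds the next one."""
    for weights, biases in layers:
        inputs = layer_forward(inputs, weights, biases)
    return inputs


# Hypothetical 2-2-1 network: (weights, biases) per layer.
layers = [([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
          ([[1.0, -1.0]], [0.0])]
output = network_forward([1.0, 0.0], layers)
print(output)
```

The two nested loops are independent across neurons, which is where the
parallelism for a hardware implementation comes from.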
From a network towards hardware with memories and computing elements.
How could you load bias values into this system?
In the old days this was attempted with analog circuits, because digital
multipliers consume a lot of logic. Still, such a system needs sample & hold
circuitry to process a network layer by layer.
Use a lot of MACC processing elements, a sigmoid approximation, and two
memories as the basic elements of a digital neuro processor.
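The basic elements named above can be modeled in a few lines. This sketch
assumes a lookup table as the sigmoid approximation; the table size, the input
range and all names are choices made for this example, and real designs may
use other approximations such as piecewise linear segments.

```python
import math

LUT_SIZE = 256                 # number of table entries (assumed)
X_MIN, X_MAX = -8.0, 8.0       # input range covered by the table (assumed)
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)

# Table filled once; in hardware this would be a small ROM.
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(X_MIN + k * STEP)))
               for k in range(LUT_SIZE)]


def sigmoid_lut(s):
    """Nearest-entry lookup; inputs outside the table range are clamped."""
    k = int(round((s - X_MIN) / STEP))
    k = max(0, min(LUT_SIZE - 1, k))
    return SIGMOID_LUT[k]


def neuron(inputs, weights, bias):
    """One processing element: inputs and weights come from two memories."""
    acc = bias                       # accumulator register
    for x, w in zip(inputs, weights):
        acc += x * w                 # one MACC operation per input
    return sigmoid_lut(acc)


print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))
```

With these settings the table output stays within about 0.01 of the exact
sigmoid, which is usually traded off against table size.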
Commercial implementations of SIMD neuro processors exist! SIMD with an
orthogonal instruction set is quite flexible, and compilers exist to program
these chips in languages such as C. But it is not the most efficient approach.
With multiple input patterns it is possible to combine the multiply-accumulate
operations into matrix-matrix products.
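The sketch below illustrates this regrouping for one layer: the weights form
one matrix, the input patterns are placed as columns of a second matrix, and a
single matrix-matrix product yields the weighted sums for all patterns at
once. The values and names are hypothetical.

```python
def matvec(w, x):
    """Weighted sums of one layer for a single input pattern."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def matmul(a, b):
    """Plain triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# One row per neuron (2 neurons, 3 inputs).
W = [[1, 0, -1],
     [2, 1, 0]]
# Two input patterns with 3 values each.
patterns = [[1, 2, 3], [4, 5, 6]]

# Place each pattern in its own column: X has 3 rows and 2 columns.
X = [list(col) for col in zip(*patterns)]
S = matmul(W, X)                                  # all patterns at once
per_pattern = [matvec(W, p) for p in patterns]    # one pattern at a time
print(S)
print(per_pattern)
```

Column p of S holds the same sums as the separate product for pattern p, but
every weight is now reused for all patterns, which reduces memory traffic.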
These products could be implemented in a systolic array, so it is possible to
stream in your data with much less control. This approach is more efficient
but less flexible. If your operations are limited to these specialized
functions and the designers overlooked some functionality, that is not easy
to solve as a programmer. Development of compilers for these architectures is
much more
The systolic array used in this accelerator is discussed in another paper:
M.Sankaradas, V.Jakkula, S.Cadambi, S.Chakradhar, I.Durdanovic,
E.Cosatto, H.P.Graf, A Massively Parallel Coprocessor for Convolutional
Neural Networks, In Proc. 20th IEEE International Conference on
Application-specific Systems, Architectures and Processors (ASAP), 2009,
Boston, MA
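To give a feeling for how data streams through a systolic array, the sketch
below simulates a generic array cycle by cycle while it computes a matrix
product. It is not the design of the cited coprocessor; the dataflow (values
of A move right, values of B move down, each processing element keeps its own
sum) and all names are assumptions chosen for illustration.

```python
def systolic_matmul(a, b):
    """Simulate an n x m grid of MACC elements computing C = A * B."""
    n, depth, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]      # one accumulator per element
    a_reg = [[0] * m for _ in range(n)]    # A value held in each element
    b_reg = [[0] * m for _ in range(n)]    # B value held in each element

    for t in range(depth + n + m - 2):     # cycles until the array drains
        # A values shift one element to the right; row i is fed i cycles late.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = a[i][k] if 0 <= k < depth else 0
        # B values shift one element down; column j is fed j cycles late.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = b[k][j] if 0 <= k < depth else 0
        # All elements perform one multiply-accumulate in the same cycle.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The only control needed is the skewed feeding at the edges; inside the array
every element does the same thing each cycle, which explains both the
efficiency and the limited flexibility.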
Recap of the intermediate images that need temporary storage.
The parallel coprocessor connects the systolic arrays in a reconfigurable way
to input pixels or output arrays. This minimizes the amount of stored
intermediate image results.
5x faster and 10x better energy efficiency
The weak spot of a neural accelerator is the memory decoder. The neural
network can tolerate a few errors before the output breaks (see next slide),
but if the memory decoder is broken the device does not work anymore. A
solution to reduce this probability is unfolding the network. This
distributes the memory over the chip, close to the neural processors. This
solution can still use time-multiplexing, but then you need a memory again.
That memory can be made robust by increasing the transistor size of the
memory decoder. With the unfolded network fewer context switches are required
to simulate a bigger network.
Read this paper to see all experiments and design ideas: Olivier Temam: A
Defect-Tolerant Accelerator for Emerging High-Performance Applications,
ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2012
The technology improvements also create new possibilities for the field of
Neurobiology. Every year this domain can simulate bigger neural circuits.
Why simulate the brain? Simulation is possible in software, but this scales
very badly, so only small neural circuits are feasible. Without the
communication overhead the brain would require over 30 petaflops.
The Blue Brain project simulates small brain structures on the molecular
level on a supercomputer.
Spinnaker builds a more energy efficient supercomputer out of many ARM
cores. Compared to Blue Brain, Spinnaker uses a more abstract Integrate &
Fire neuron model.
Take a look at the Spinnaker project:
18 ARM9 cores on a chip with a dedicated NoC and packet router to go off-chip.
Neurons that share a lot of interconnections are grouped on a chip with the
local 128MB SDRAM. This minimizes the packet traffic over the off-chip
Spinnaker is still a multiprocessor network of general purpose cores. This is
flexible but also less efficient compared to dedicated circuits.
Biological neurons communicate with spikes. Instead of computing only with
the spike rates, the arrival times of spikes can also trigger actions.
A model of a leaky Integrate and Fire neuron. This neuron only requires ~14
transistors. Most area is now consumed by the synapses. Storing the weight
in a capacitance consumes much area. Read about real implementations in:
Antoine Joubert, Bilel Belhadj, Olivier Temam, Rodolphe Heliot: Hardware
Spiking Neurons Design: Analog or Digital?, IEEE International Joint
Conference on Neural Networks (IJCNN), June 2012.
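The behavior of a leaky Integrate and Fire neuron can be sketched in software
with a simple discrete-time model. The parameter values (threshold, leak
factor, reset value) and the function name are assumptions for this example
and are not taken from the cited paper.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron.

    Every time step the membrane potential leaks (is scaled by `leak`),
    the input is integrated, and a spike is emitted when the threshold
    is reached, after which the potential is reset.
    """
    v = v_reset
    spikes = []
    for current in input_current:
        v = leak * v + current       # leak, then integrate the input
        if v >= threshold:
            spikes.append(1)         # fire
            v = v_reset
        else:
            spikes.append(0)
    return spikes


strong = simulate_lif([0.4] * 6)     # enough input: periodic spikes
weak = simulate_lif([0.05] * 20)     # too little input: leak wins
print(strong)
print(weak)
```

With the strong input the neuron fires every third step, so the input level is
encoded in the spike timing. With the weak input the leak prevents the
potential from ever reaching the threshold.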
Wafer scale integration of Integrate and Fire neuron models. See:
New technology innovations open new possibilities for neural hardware.
The memristor developed by HP (2008) looks very promising as a basic
element for the implementation of synapses. Recently Intel has published an
interesting paper about this technology with a crossbar synapse array. Read
the paper for more information.
Growing organic chips can be very cheap, but it is difficult to read out the
signals from the living neurons. The neurons on these chips are used for
experiments instead of a commercial product. This project was one of the
first; many others have followed by now.
This was a broad overview of the field of neuro computing. It shows many
promising concepts of neural architectures. For many domains this is only a
short summary of the topic. For example, Machine Learning has complete
courses to understand the concepts. The chance is quite high that you will
encounter neural networks in your EE/ES career. This is mainly due to the
nice properties of neural networks (learning, flexible, fault tolerant, and