ARM Cortex-M4 based
LPC4000 Family
October/November 2010
Presenter’s Name
Agenda
Introduction
Introducing the Cortex-M4 LPC4300
NXP’s advanced peripherals
Demo
Tools / Getting started
Introduction
NXP Semiconductors
Eindhoven (NL)
Nijmegen (NL)
Leuven (B)
Hamburg (GER)
Caen (F)
Bellevue (US)
Gratkorn (Austria)
Beijing
San Jose (US)
San Diego (US)
Shanghai/Suzhou
Hong Kong
Tempe (US)
Singapore
Bangalore (India)
• Headquarters: Eindhoven, The Netherlands
• Employee base: about 28,000 employees working in more than 25 countries
• Net sales: $3.8 billion in 2009
• Patents: ~14,000 issued and pending
• R&D: Over $600 million investment per year
• Innovation track record dating back 50+ years
NXP is a leader in ARM Flash MCUs
Clear strategy: 100% focus on ARM
Top performance through leading
technology & architecture
Design flexibility through pin- and software-compatible solutions
– Scalable memory sizes
– Widest range of peripherals
Unlimited choice through complete families for multiple cores
[Figure: cores 8051, ARM7, ARM9, Cortex-M0, Cortex-M3, Cortex-M4]
NXP microcontrollers = One continuum
• Five MCU cores lined up to serve a full range of application requirements
Cortex-M4: Unique new hybrid technology, combining MCU with powerful DSP extensions
Cortex-M3: High performance microcontroller for max. bandwidth and connectivity
Cortex-M0: Low power microcontroller ready to replace traditional 8/16b architectures
ARM9: LPC3000, low-cost application processors up to 270 MHz
ARM7: LPC2000, the industry leading microcontroller family
Legend: AP = Application Processor, DSC = Digital Signal Control, MCU = Microcontroller
NXP changing the industry MCU landscape
• Breaking through traditional boundaries of 8b, 16b, 32b and DSP
[Figure: cost versus performance, covering 8-bit, 16-bit (Cortex-M0), 32-bit (Cortex-M3), and DSP (Cortex-M4)]
Very low-end 8-bit not in scope for NXP ARM
High-end DSP or MPU are not in scope for NXP ARM
Binary and tool compatible
Powerful Cortex-M instruction set
ARM Cortex™-M0 based parts
Cortex-M0
Low Power
Superior Code Density
32-bit performance
Low Cost
Pin compatible options from M0 to M3
ARM Cortex™-M3 based parts
Cortex-M3
General purpose, 32-bit microprocessor
High performance
Very low power consumption
Pin compatible options from M0 to M3
ARM Cortex™-M4 based parts
Cortex-M4
Adds DSP extensions and a Floating Point Unit
Rapidly growing family of Cortex-M microcontrollers
• Check pin- and software compatible options: www.nxp.com/microcontrollers
Cortex-M4 (+150MHz)
– LPC4300: MCU with powerful DSP extensions
Cortex-M3 (up to 150MHz)
– LPC1800: Memory options up to 1MB flash, 200k SRAM
– LPC1700: High-performance with USB, Ethernet, LCD, and more
– LPC1300: USB solution, incl. on-chip USB drivers
Cortex-M0 (up to 50MHz)
– LPC1200: Memory options up to 128k flash
– LPC1100: Best-in-class dynamic power consumption
– LPC11A00: Mixed-signal, incl. DAC, Temp Sensor, Comparators
– LPC11U00: USB solution, incl. on-chip USB drivers
– LPC11C00: CAN solution, incl. on-chip CAN drivers
Cortex-M4 Introduction
Feb 22, ARM Launches Class-Leading Cortex-M4 Processor For High
Performance Digital Signal Control
– combination of high-efficiency signal processing and MCU technology
Feb 22, NXP Licenses ARM Cortex-M4 Processor for 32-bit Microcontroller
Signal Processing Applications
– complements NXP’s Cortex-M3 and Cortex-M0 processor-based devices and enables us to provide an end-to-end solution to the MCU community
April 12, NXP Demonstrates New Class of DSC Based on ARM Cortex-M4
– first working silicon of newest NXP microcontrollers based on the ARM Cortex-M4
– the DSP extensions of the Cortex-M4 offer significant advantages, for example,
offering 5 to 10 times improvement in complex DSP algorithms
Now: Close cooperation with lead customers and DSP algorithm developers
NXP Cortex-M4 Roadmap
[Roadmap figure. Labels: LPC1700 (Cortex-M3), LPC1800 (Cortex-M3), LPC4100, LPC4300 (150MHz), LPC43Dxx, LPC4xxx; 2M Flash, Segment LCD, Enhanced Analog; M3: 120MHz]
Introducing the LPC4300 Family
Cortex-M4 based Digital Signal Controller
Cortex-M0 Subsystem
Up to 1 MB Flash
– Dual-Bank Flash provides safe in-application programming (IAP)
Large SRAM: up to 200 KB
SPI Flash Interface with four lanes and up to 80 Mbps/lane
State Configurable Timer Subsystem
SGPIO
Two High-speed USB 2.0 interfaces and an on-chip High-speed PHY
Additional Features
– 10/100 Ethernet MAC
– LCD panel controller (up to 1024H × 768V)
– Two 10-bit ADCs and 10-bit DAC at 400ksps
– Eight-channel General-Purpose DMA (GPDMA) controller
– Motor Control PWM
– Quadrature Encoder Interface
– 4x UARTs, 2x I2C, I2S, CAN 2.0B, 2x SSP/SPI
– Smart card interface
– Up to 80 general purpose I/O pins
Pin compatibility with Cortex-M3 parts
Extends ARM Continuum (Cortex-M0, Cortex-M3, Cortex-M4)
LPC4300
Code Compatible
LPC4300 Cortex-M4
LPC1800 Cortex-M3
LPC4300
Cortex-M4
Microcontroller Core – CPU
LPC4000 Family Cortex-M4 Features:
– NVIC & WIC
• Supports peripheral interrupts
– MPU (Memory Protection Unit)
• Supports up to 8 regions
– FPU (Floating Point Unit)
• IEEE 754 compliant
– Full Debug Options:
• JTAG/SWD
• ETM
• Flash Patch
– 150MHz Execution
• Flash or SRAM
NXP’s low-leakage 90nm process technology allows operation at up to 150MHz
Microcontroller Core – Power Control
Flexible clock generation unit:
– Allows clock to each peripheral to be configured independently.
• Can use different sources and create different frequencies.
• Unused peripherals can be turned off by disabling the clock.
Low power modes:
– Sleep:
• CPU execution is suspended but peripherals continue
– Deep-Sleep
• Main oscillator and all internal clocks except the IRC are stopped. Flash memory is in standby, ready for immediate use
– Power-down
• Same as Deep-Sleep mode, except that Flash and IRC are shut down; state is preserved
– Deep power-down
• All clocks including IRC are stopped. Internal voltage is turned off
• Complete system state is lost, special registers in the RTC domain are preserved
• Wake up via reset, external pin, or RTC Alarm
LPC4300
Cortex-M0
Subsystem
Cortex-M0 Subsystem - Overview
Cortex-M4 + Cortex-M0 = LPC4300 Solution!
Cortex-M4: Processing Application (Control Algorithm, Audio/Image Processing)
Cortex-M0: Real Time Control (Peripheral Control, Protocol Emulation)
Cortex-M0 subsystem unburdens the main Cortex-M4 core!
Separates Processing and Real Time Control – in one chip
Cortex-M0 Subsystem - Overview
Highly flexible Cortex-M0 subsystem features
– Connected to the internal bus matrix giving access to all peripherals.
– NVIC for dedicated interrupt support.
– Separate clock and power control
– Shared memory allows easy inter-processor communication
[Figure: Cortex-M4 and Cortex-M0 connected to Shared Memory and, through the AHB Matrix, to Peripherals]
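One way the shared-memory communication between the two cores could be organized is a single-producer, single-consumer mailbox. The sketch below is only an illustration: the `mailbox_t` layout, the `mbox_post` and `mbox_fetch` names, and the slot count are hypothetical and do not come from NXP software.

```c
#include <stdbool.h>
#include <stdint.h>

#define MBOX_SLOTS 8u  /* hypothetical depth; must be a power of two */

/* Hypothetical mailbox, assumed to live in SRAM that both cores can access. */
typedef struct {
    volatile uint32_t head;              /* written only by the producer */
    volatile uint32_t tail;              /* written only by the consumer */
    volatile uint32_t slot[MBOX_SLOTS];  /* message payloads */
} mailbox_t;

/* Producer side (for example the Cortex-M0). Returns false when full. */
static bool mbox_post(mailbox_t *mb, uint32_t msg)
{
    uint32_t head = mb->head;
    if (head - mb->tail == MBOX_SLOTS) {
        return false;                    /* no free slot */
    }
    mb->slot[head & (MBOX_SLOTS - 1u)] = msg;
    /* On real hardware a data memory barrier may be needed here;
       it is omitted in this host-side sketch. */
    mb->head = head + 1u;                /* publish after the payload is stored */
    return true;
}

/* Consumer side (for example the Cortex-M4). Returns false when empty. */
static bool mbox_fetch(mailbox_t *mb, uint32_t *msg)
{
    uint32_t tail = mb->tail;
    if (tail == mb->head) {
        return false;                    /* nothing pending */
    }
    *msg = mb->slot[tail & (MBOX_SLOTS - 1u)];
    mb->tail = tail + 1u;                /* release the slot */
    return true;
}
```

Because each index is written by only one side, neither core needs a lock; the consumer could poll `mbox_fetch` or call it from an interrupt raised by the other core.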
Cortex-M0 Subsystem – Audio Processing
Cortex-M4: Full power devoted to Audio processing
Cortex-M0: Handles the hardware control – I2S & USB
[Figure: LPC4300 with Cortex-M4 and Cortex-M0; I2S and USB interfaces]
Cortex-M0 Subsystem – Motor Control
Cortex-M4: Single shunt Field Oriented Control (FOC)
Cortex-M0: Receives control commands via CAN interface
[Figure: LPC4300 with Cortex-M4, Cortex-M0, SCT, and CAN command input]
Cortex-M0 Subsystem - Development
Cortex-M4 and Cortex-M0 share a debug interface allowing a single
JTAG/SWD unit to debug both cores
LPC4300
Memory
Memory - Dual Bank Flash
Two 512K byte banks of flash memory.
Can be used as a single 1M byte memory area.
Enhanced memory controller and 256-bit wide interface allows operation at up to 150MHz.
[Figure: Flash A and Flash B shown in Contiguous Mode and in Dual Mode]
Memory - SRAM
Up to 200KB static RAM available.
Block/bus architecture allows simultaneous CPU and DMA accesses to different SRAM areas.
Data Only: 32kB and 16kB SRAM blocks with separate bus interfaces (16KB + 16KB + 32KB).
Code & Data: Optimized for DSP use. 96kB and 40kB SRAM blocks accessible by high speed system bus can be used for code and data storage (40KB + 96KB).
LPC4300
SPI Flash
Interface
SPIFI - Overview
SPI Flash Interface (SPIFI)
Patented feature that maps low-cost serial flash memories into the internal memory system.
[Figure: Serial Flash Memory connected through SPIFI to the Internal Memory and Cortex-M4 of the LPC4300]
SPIFI - Supported Devices
Compatible with both standard and Quad SPI flash memory devices
from a majority of suppliers:
Atmel, Gigadevice, Macronix, Numonyx (now Micron), SST (now Microchip), Winbond
A couple of years ago, PCs started using Quad-SPI Flash for loading the BIOS. The high PC volumes forced prices down to low levels.
Advantages: High speeds, small packages/few pins, low cost
Disadvantages: Not supported by standard MCUs -- UNTIL NOW!
NXP’s patent-pending SPI Flash Interface (SPIFI) makes the LPC4300 series the first and only MCU family to take full advantage of Quad SPI Flash
SPIFI - Quad SPI Flash Interface
SPI Flash Interface uses either 4 or 6 lines
– Standard SPI flash uses CLK, CS, MISO and MOSI
– Quad SPI flash uses CLK, CS, IO0, IO1, IO2 and IO3
[Figure: serial flash pinout: /CS, DO (IO1), WP (IO2), GND, VCC, /HOLD (IO3), CLK, DI (IO0)]
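As a rough worked example of what four data lanes buy, the sketch below computes the peak raw transfer rate and the time to stream a buffer. The 80 Mbps/lane figure is the one quoted for the SPIFI earlier in this deck; the function names are hypothetical, and command and address overhead are ignored.

```c
#include <stdint.h>

/* Peak raw transfer rate in Mbit/s for a given number of data lanes. */
static uint32_t spifi_peak_mbps(uint32_t lanes, uint32_t mbps_per_lane)
{
    return lanes * mbps_per_lane;
}

/* Time in microseconds to stream `bytes` at the peak rate.
   bits / (Mbit/s) gives microseconds; protocol overhead is ignored. */
static uint32_t spifi_stream_us(uint32_t bytes, uint32_t peak_mbps)
{
    return (uint32_t)(((uint64_t)bytes * 8u) / peak_mbps);
}
```

With four lanes at 80 Mbps each the peak is 320 Mbit/s (40 MB/s), against 80 Mbit/s for a single data line. Under these assumptions a hypothetical 320 × 240 image at 16 bits per pixel (153,600 bytes) streams in about 3.8 ms.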
SPIFI – Image Storage – Problem
• Devices with complex user interfaces require storage for images that will be displayed on an LCD.
• Images can be stored in external SPI flash but usually have to be copied into internal SRAM and then sent to LCD controller.
• Problem with this approach is that it uses large amounts of internal SRAM.
[Figure: Flash Memory connected to a Microcontroller containing CPU Core, SRAM, and LCD Controller]
SPIFI – Image Storage – Solution
• Image stored within external serial flash memory
• High speed quad SPI interface allows images to be transferred directly to LCD controller using DMA
• Advantages of a SPIFI based solution:
  • Does not use precious internal SRAM – available for other uses.
[Figure: Serial Flash Memory connected through SPIFI to the LCD Controller inside the LPC4300]
SPIFI - DSP Algorithm: Problem
• DSP applications are often loaded from flash into internal SRAM for high performance execution.
• These algorithms are stored within internal or external parallel memory
• Problems with this approach:
  • Have to add space consuming external flash memory to board
  OR
  • Have to sacrifice precious internal flash memory for algorithm storage
[Figure: LPC4312 with CPU Core, SRAM, and Flash Memory]
SPIFI - DSP Algorithm: Solution
• Store DSP algorithm in low cost external serial flash memory.
• Loaded into dedicated internal SRAM block for high speed execution.
• Advantages of a SPIFI based solution:
  • Low cost external SPI flash memory consumes minimal board space.
  • No waste of precious internal flash memory for code that is always executed from SRAM
[Figure: LPC4310 (CPU Core, SRAM) + Serial Flash Memory]
LPC4300
State
Configurable
Timer
Subsystem
SCT - Overview
State Configurable Timer (SCT) is a timer/capture unit coupled with a
highly flexible event driven state machine block.
Allows a wide variety of timing, counting, output modulation, and input
capture operations.
Key Features:
– 8 inputs
– 16 outputs
– 16 match/capture registers
– 16 events
– 32 states
SCT - Operation
[Figure: Standard Timer + State/Event Logic]
SCT - Example Application
Simple traffic light:
– Car lane: red, yellow, and green signals
– Pedestrian lane: red and green signals
– Button to request car traffic stop
SCT - Example Application
Four different allowed combinations (states) of the two display entities
One external input (the button)
Five outputs
#   Car lane lights   Pedestrian lane lights
1   Green             Red
2   Yellow            Red
3   Red               Red
4   Red               Green
SCT allows this application to be implemented in hardware!
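For comparison, the same four-state table can be modeled in plain C. This is a software sketch of the state machine, not SCT register programming: the transition order (button, then timer expiries) is an assumption, since the slide gives only the allowed states, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per lamp: five outputs, as listed on the slide. */
enum {
    CAR_RED    = 1u << 0,
    CAR_YELLOW = 1u << 1,
    CAR_GREEN  = 1u << 2,
    PED_RED    = 1u << 3,
    PED_GREEN  = 1u << 4
};

/* The four allowed combinations from the table. */
typedef enum { ST_CAR_GREEN, ST_CAR_YELLOW, ST_ALL_RED, ST_PED_GREEN } tl_state_t;

/* Lamp outputs for each state. */
static uint8_t tl_outputs(tl_state_t s)
{
    switch (s) {
    case ST_CAR_GREEN:  return CAR_GREEN  | PED_RED;
    case ST_CAR_YELLOW: return CAR_YELLOW | PED_RED;
    case ST_ALL_RED:    return CAR_RED    | PED_RED;
    case ST_PED_GREEN:  return CAR_RED    | PED_GREEN;
    }
    return CAR_RED | PED_RED;            /* fail safe: everything red */
}

/* Assumed transitions: the button starts the cycle, a timer match advances it. */
static tl_state_t tl_next(tl_state_t s, bool button, bool timer_match)
{
    switch (s) {
    case ST_CAR_GREEN:  return button      ? ST_CAR_YELLOW : s;
    case ST_CAR_YELLOW: return timer_match ? ST_ALL_RED    : s;
    case ST_ALL_RED:    return timer_match ? ST_PED_GREEN  : s;
    case ST_PED_GREEN:  return timer_match ? ST_CAR_GREEN  : s;
    }
    return ST_CAR_GREEN;
}
```

In software, `tl_next` has to be called on every button or timer event; the point of the SCT is that the equivalent event-to-state logic runs in the timer block without CPU involvement.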
SCT – Easy to use
1. Design the state machine
2. Set the registers/timer
LPC_SCT->CTRL |= (1UL << 7);
LPC_SCT->TIM = 0x4534;
LPC_SCT->ENB &= 0x8001;
3. Let the SCT do the work!
Library of examples will be available!
LPC4300
Serial
GPIO
SGPIO - Overview
Serial GPIO (SGPIO) = GPIO + Timer/Shift Register:
– Used to create or capture multiple real time serial data streams.
– No more having to write code loops to manipulate GPIO in real time.
– Say goodbye to CPU intensive bit banging!
Key Features:
– Up to 8 inputs/outputs each with their own timer/shift register unit.
– Counter to control the rate at which data is clocked in/out.
– Counter to control the number of bits clocked out/in.
– Output has three states: high, low, or high impedance.
[Figure: example waveforms: Clock, PWM1, PWM2, PWM3]
SGPIO - Operation
Each SGPIO unit features:
– Two 32-bit shift registers
– Counter to control bit rate
– Counter to control number
of bits clocked out/in
– Register controls the state (enable/disable) of the output for each bit that is clocked out.
SGPIO = Proprietary Serial Interface
SGPIO can be used to emulate proprietary serial interfaces
• Problem: Lots of peripherals on the market use non-standard serial interfaces (LCD drivers, audio codecs, etc.).
• Standard Microcontroller Solution (no SGPIO):
• Application designer has to write CPU intensive loops to create required bit
streams – painful bit banging!
• CPU is 100% occupied while waveform(s) are generated.
• LPC4300 based Solution:
• Configure SGPIO to generate desired waveform(s) with just a few register
writes.
• Interrupt generated when data is clocked out – CPU is not blocked.
SGPIO = Standard Serial Interface
To create a 7.1 channel I2S output 5 SGPIO units are required:
– 4x I2S Data for 7.1 channels
– 1x I2S WS
Data is shifted out at 2 × M × fs, where M = data word length and fs = sampling rate
For 32-bit data and fs = 96 kHz the shift clock = 2 × 32 × 96 kHz = 6.144 MHz
The I2S data shift register should be loaded with the 32b audio samples. The
CPU has to read SRAM and load the slices at a rate of 8x96k words/sec.
The WS shift register should be loaded to create a 96kHz WS waveform.
The I2S CLK does not need a dedicated shift register, it can be created from a
shift counter output.
CPU load: if an instruction takes 2 clk, an SRAM access 2 clk, and an SGPIO access 2 clk, then the CPU load is 6 × 8 × 96 kHz = 4.608 MHz, which is about 3% load at a 150 MHz clk rate
Create I2S OR I2C OR SPI OR…
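The arithmetic on this slide can be checked with a few lines of C. The helper names are hypothetical; the numbers (32-bit words, 96 kHz, 8 channels, 6 clocks per word, 150 MHz CPU) are the ones given above.

```c
#include <stdint.h>

/* Shift clock for a stereo I2S line: two words of `word_bits` per sample period. */
static uint32_t i2s_shift_clock_hz(uint32_t word_bits, uint32_t fs_hz)
{
    return 2u * word_bits * fs_hz;
}

/* CPU cycles per second spent moving samples from SRAM into the SGPIO slices. */
static uint32_t feed_cycles_per_s(uint32_t clk_per_word, uint32_t channels,
                                  uint32_t fs_hz)
{
    return clk_per_word * channels * fs_hz;
}

/* CPU load in tenths of a percent (per mille), rounded down. */
static uint32_t load_permille(uint32_t busy_cycles_per_s, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)busy_cycles_per_s * 1000u) / cpu_hz);
}
```

For 32-bit words at 96 kHz the shift clock comes out at 6,144,000 Hz. Feeding 8 channels at 6 clocks per word costs 4,608,000 cycles per second, which is 30 per mille (about 3%) of a 150 MHz core.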
LPC4300
USB 2.0
Ethernet
Interfaces – USB & Ethernet
Two USB 2.0 Interfaces:
– USB 2.0 Host/Device/OTG interfaces.
– One with on-chip high-speed PHY.
– One with on-chip full-speed PHY and ULPI interface for external high
speed PHY
Ethernet MAC with RMII and MII interfaces to external transceiver:
– Supports 10/100 Mbit/s
– TCP/IP hardware checksum
– DMA support allows high throughput at low CPU load
– IEEE 1588 advanced time stamp support.
Audio Application
Audio Design Example
7-band Graphic Equalizer
– Cortex-M3 LPC1768 running at 120MHz
– Cortex-M4 running at 120MHz
Real-time Demo
• 7 band parametric EQ
• 32-bit precision
• Stereo processing
• 48 kHz sample rate
Designed using DSP Concepts' Audio Weaver development environment
– a graphical drag-and-drop design environment and a set of optimized audio processing libraries.
2nd order IIR Filter – AKA “Biquad”
• Commonly used for control and audio filtering
• Implemented using a difference equation.
• Direct Form 1 structure is the most numerically robust - shown below
• Has 5 coefficients and 4 state variables
• Coefficients determine the response of the filter (lowpass, highpass, etc.) and may be computed in a number of different ways
▫ Simple design equations running on the MCU
▫ External tools such as MATLAB
$$y[n] = b_0\,x[n] + b_1\,x[n-1] + b_2\,x[n-2] - a_1\,y[n-1] - a_2\,y[n-2]$$
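A minimal floating-point sketch of the Direct Form 1 biquad is given here to make the five coefficients and four state variables concrete. The `biquad_t` type and `biquad_df1` name are hypothetical, and the signs follow the convention in which the feedback terms are subtracted.

```c
#include <stddef.h>

/* Direct Form 1 biquad: 5 coefficients and 4 state variables. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* filter coefficients */
    float x1, x2, y1, y2;       /* x[n-1], x[n-2], y[n-1], y[n-2] */
} biquad_t;

/* Filters n samples from x into y and updates the state in f. */
static void biquad_df1(biquad_t *f, const float *x, float *y, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float xn = x[i];
        float yn = f->b0 * xn + f->b1 * f->x1 + f->b2 * f->x2
                 - f->a1 * f->y1 - f->a2 * f->y2;
        f->x2 = f->x1;          /* shift the input history */
        f->x1 = xn;
        f->y2 = f->y1;          /* shift the output history */
        f->y1 = yn;
        y[i] = yn;
    }
}
```

With b0 = 1, a1 = -0.5 and all other coefficients zero, a unit impulse produces 1, 0.5, 0.25, 0.125, which is a quick sanity check of the feedback sign.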
Cortex-M Biquad implementation
Statement                 Cortex-M3 cycles   Cortex-M4 cycles
xN = *x++;                2                  2
yN = xN * b0;             3-7                1
yN += xNm1 * b1;          3-7                1
yN += xNm2 * b2;          3-7                1
yN -= yNm1 * a1;          3-7                1
yN -= yNm2 * a2;          3-7                1
*y++ = yN;                2                  2
xNm2 = xNm1;              1                  1
xNm1 = xN;                1                  1
yNm2 = yNm1;              1                  1
yNm1 = yN;                1                  1
Decrement loop counter    1                  1
Branch                    2                  2
Total                     27-47 cycles       16 cycles
Cortex-M4: 40-65% higher performance!
• Only looking at the inner loop, making these assumptions
▫ Function operates on a block of samples
▫ Coefficients b0, b1, b2, a1, and a2 are in registers
▫ Previous states, x[n-1], x[n-2], y[n-1], and y[n-2] are in registers
Optimize by unrolling the loop by 3
x0 = *x++;               (2 cycles)
y0 = x0 * b0;            (1 cycle)
y0 += x1 * b1;           (1 cycle)
y0 += x2 * b2;           (1 cycle)
y0 -= y1 * a1;           (1 cycle)
y0 -= y2 * a2;           (1 cycle)
*y++ = y0;               (2 cycles)
x2 = *x++;               (2 cycles)
y2 = x2 * b0;            (1 cycle)
y2 += x0 * b1;           (1 cycle)
y2 += x1 * b2;           (1 cycle)
y2 -= y0 * a1;           (1 cycle)
y2 -= y1 * a2;           (1 cycle)
*y++ = y2;               (2 cycles)
x1 = *x++;               (2 cycles)
y1 = x1 * b0;            (1 cycle)
y1 += x2 * b1;           (1 cycle)
y1 += x0 * b2;           (1 cycle)
y1 -= y2 * a1;           (1 cycle)
y1 -= y0 * a2;           (1 cycle)
*y++ = y1;               (2 cycles)
Decrement loop counter   (1 cycle)
Branch                   (2 cycles)
Reduces loop overhead
Eliminates the need to shift down state variables
30 cycles on Cortex-M4 for 3 output samples
→ 10 cycles per sample
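The rotation of variable roles in the unrolled loop is easy to get wrong, so a compilable floating-point sketch may help. The `biquad3_t` type and function name are hypothetical, and the block length is assumed to be a multiple of 3; any remainder is ignored.

```c
#include <stddef.h>

typedef struct {
    float b0, b1, b2, a1, a2;   /* coefficients */
    float x1, x2, y1, y2;       /* x[n-1], x[n-2], y[n-1], y[n-2] */
} biquad3_t;

/* Direct Form 1 biquad unrolled by 3; n is assumed to be a multiple of 3. */
static void biquad_df1_unroll3(biquad3_t *f, const float *x, float *y, size_t n)
{
    const float b0 = f->b0, b1 = f->b1, b2 = f->b2, a1 = f->a1, a2 = f->a2;
    float x0 = 0.0f, x1 = f->x1, x2 = f->x2;
    float y0 = 0.0f, y1 = f->y1, y2 = f->y2;

    for (size_t i = 0; i + 3 <= n; i += 3) {
        /* Sample 1: newest input in x0, history in x1 and x2. */
        x0 = *x++;
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        *y++ = y0;
        /* Sample 2: overwrite the oldest slot (x2); history is x0 and x1. */
        x2 = *x++;
        y2 = b0 * x2 + b1 * x0 + b2 * x1 - a1 * y0 - a2 * y1;
        *y++ = y2;
        /* Sample 3: overwrite x1; history is x2 and x0. */
        x1 = *x++;
        y1 = b0 * x1 + b1 * x2 + b2 * x0 - a1 * y2 - a2 * y0;
        *y++ = y1;
    }
    /* After a full pass x1/y1 hold the newest values and x2/y2 the previous ones. */
    f->x1 = x1; f->x2 = x2; f->y1 = y1; f->y2 = y2;
}
```

No state is copied inside the loop: each new sample overwrites the oldest slot, and after three samples the names line up again with their original meaning.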
Sampling / Nyquist
Nyquist / Shannon Criterion
– sampling frequency (Fs) must be at least twice the signal bandwidth, or information about the signal will be lost.
Type                  Signal bandwidth     Typical Fs
Voice frequencies     300 Hz to 3400 Hz    8 kHz
Audible frequencies   20 to 20,000 Hz*     44.1 kHz, 48 kHz
* < 20 Hz often felt rather than heard; > 20,000 Hz sometimes sensed by younger people
Real Time Processing
DSC bandwidth limited by sampling rate
– Available clock cycles = processor speed / sampling rate
Telephony, Fs = 8kHz: 125 µsec => 120 MHz / 8 kHz => 15,000 clock cycles
Audio, Fs = 48kHz: 20.8 µsec => 120 MHz / 48 kHz => 2,500 clock cycles
Filtering examples. FFTs use block processing. (Processing time = sample interval x N)
Results
Performance
• Cortex-M3 needed 1291 cycles (51.6% processor loading)
• Cortex-M4 needed only 299 cycles (12% processor loading).
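The loading percentages follow from the per-sample cycle budget (processor speed divided by sampling rate). A small sketch of that arithmetic, with hypothetical helper names and the 120 MHz clock and 48 kHz sample rate used in the demo:

```c
#include <stdint.h>

/* Clock cycles available per sample period. */
static uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t fs_hz)
{
    return cpu_hz / fs_hz;
}

/* Processor loading in tenths of a percent (per mille), rounded to nearest. */
static uint32_t loading_permille(uint32_t used_cycles, uint32_t budget_cycles)
{
    return (used_cycles * 1000u + budget_cycles / 2u) / budget_cycles;
}
```

At 120 MHz and 48 kHz the budget is 2,500 cycles per sample, so 1291 cycles is 516 per mille (51.6%) and 299 cycles is 120 per mille (12%), matching the figures above.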
Motor Control Application
NXP’s Cortex-M4 Motor Control Example
NXP’s first Cortex-M4 based DSC
Running Single shunt Field Oriented Control (FOC)
Uses new State Configurable Timer Subsystem
– Makes 6 independent PWM signals with dual edge control
– Triggers ADC conversion at an exactly determined moment
System overview
LPCXpresso Motor Control board
NXP Cortex-M4 Eval board
Tools/Getting Started
Cortex Microcontroller Software Interface Standard (CMSIS)
CMSIS defines for a Cortex-Mx Microcontroller System:
– A common way to access peripheral registers and a common way
to define exception vectors
– The register names of the Core Peripherals and the names of the
Core Exception Vectors
– A device independent interface for RTOS Kernels including a
debug channel
– Interfaces for middleware components (TCP/IP Stack, Flash File
System)
By using CMSIS compliant software components, the user can easily re-use template code. CMSIS is intended to enable the combination of software components from multiple middleware vendors.
DSP Libraries for Cortex™-M3
C Library of Optimized DSP Algorithms
– FFT
• Supports both 32 and 16 bit data lengths
• Block sizes of 64, 256 and 1024
– FIR and IIR filters
• 16-bit single stage Biquad
• 32-bit single stage Biquad
– PID controller
– Resonator function
– Random number generator
– Dot Product
– Cross product of vectors
M3 and M4 pin compatible boards
MCU Tool Solutions
NXP’s Low cost Development Tool Chain
Rapid Prototyping Online Tool
Traditional Feature Rich Tools (third party)
Circuit Cellar/Elektor “NXP mbed Design Challenge”
Succeed and you could walk away with a share
of $10,000 in cash prizes!
Launched Sept 21, 2010
mbed microcontroller
– Based on NXP LPC1768
– Made for prototyping
– Comes in a 40-pin 0.1" pitch DIP form-factor
so it's ideal for experimenting on
breadboard, stripboard and PCBs
Combined with mbed "Cloud" compiler at
http://mbed.org
Already more than 10,000 boards shipped!
For complete rules, or to request your
complimentary contest kit, please visit:
www.circuitcellar.com/nxpmbeddesignchallenge.