
Profile and Analysis of Memory Hierarchies for
High Efficiency Video Coding - HEVC
Ana Mativi, Eduarda Monteiro and Sergio Bampi
Introduction
HEVC encoder:
● Requires 40%-70% higher computational effort and more than twice as many memory accesses as H.264 [1]
● Accesses to main memory have a great impact on energy consumption
● Strongly relies on the cache hierarchy to enhance overall performance
● Results were generated for the HEVC encoder on 54 different cache configurations
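The poster does not list the 54 configurations. As a minimal sketch, assuming a hypothetical grid of three L1 sizes, three L1 associativities, three LL sizes and two LL associativities (3 × 3 × 3 × 2 = 54) with a 64-byte line, the Callgrind runs for such a sweep could be generated as below. `--tool=callgrind`, `--cache-sim=yes`, `--D1`, `--LL` and `--callgrind-out-file` are Callgrind options. The encoder command `./TAppEncoderStatic -c encoder.cfg` and all grid values are assumptions, and only the L1 data cache is varied (the L1 instruction cache keeps Callgrind's default).

```python
from itertools import product

KB = 1024
MB = 1024 * KB
LINE_SIZE = 64  # assumed cache line size in bytes

# Hypothetical parameter grid: 3 x 3 x 3 x 2 = 54 configurations.
L1_SIZES = [8 * KB, 16 * KB, 32 * KB]
L1_ASSOCS = [2, 4, 8]
LL_SIZES = [2 * MB, 4 * MB, 8 * MB]
LL_ASSOCS = [2, 4]

# Placeholder encoder invocation; binary and config file names are assumptions.
ENCODER_CMD = ["./TAppEncoderStatic", "-c", "encoder.cfg"]


def callgrind_command(l1_size, l1_assoc, ll_size, ll_assoc):
    """Build one Callgrind command line with cache simulation enabled."""
    tag = f"L1_{l1_size // KB}K-{l1_assoc}_LL_{ll_size // MB}MB-{ll_assoc}"
    return [
        "valgrind",
        "--tool=callgrind",
        "--cache-sim=yes",
        f"--D1={l1_size},{l1_assoc},{LINE_SIZE}",  # size, associativity, line size
        f"--LL={ll_size},{ll_assoc},{LINE_SIZE}",
        f"--callgrind-out-file=callgrind.{tag}.out",
        *ENCODER_CMD,
    ]


def all_commands():
    """One command per point of the hypothetical configuration grid."""
    return [callgrind_command(*cfg)
            for cfg in product(L1_SIZES, L1_ASSOCS, LL_SIZES, LL_ASSOCS)]


if __name__ == "__main__":
    commands = all_commands()
    print(len(commands))  # 54
    print(" ".join(commands[0]))
```

Each command list could then be passed to `subprocess.run(cmd, check=True)`; the tag in the output file name keeps the profile of each configuration separate for later parsing.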
Results
[Figure: Latency x Cache Configurations; y-axis: Access Time (ns)]
Methodology
● A Python script runs the tools, then parses and refines the results
● The Callgrind tool [2] provides a summary of HEVC's memory behavior (on HM 16.2 [3])
● The Cacti tool provides the cost of a read/write in a given cache configuration
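One way to turn Cacti's per-access costs into an energy estimate is to weight the reads and writes that reach each memory level by that level's read and write energy. The sketch below assumes this simple dynamic-energy model (leakage ignored), uses hypothetical numbers, and is not necessarily the exact model the authors implemented; the names `LevelCounts`, `LevelCost` and `energy_by_module` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LevelCounts:
    """Reads and writes that reach one memory level (e.g. from Callgrind)."""
    reads: int
    writes: int


@dataclass
class LevelCost:
    """Per-access energy of one level (e.g. from Cacti), in nJ."""
    read_nj: float
    write_nj: float


def level_energy(counts: LevelCounts, cost: LevelCost) -> float:
    """Dynamic energy of one level: reads x E_read + writes x E_write."""
    return counts.reads * cost.read_nj + counts.writes * cost.write_nj


def total_energy(counts: dict, costs: dict) -> float:
    """Sum over every level in `counts`; `costs` needs an entry for each."""
    return sum(level_energy(counts[level], costs[level]) for level in counts)


def energy_by_module(module_counts: dict, costs: dict) -> dict:
    """Energy per encoder module, given {module: {level: LevelCounts}}."""
    return {module: total_energy(levels, costs)
            for module, levels in module_counts.items()}


if __name__ == "__main__":
    # Hypothetical per-access costs and access counts, for illustration only.
    costs = {"L1": LevelCost(read_nj=0.25, write_nj=0.5),
             "LL": LevelCost(read_nj=2.0, write_nj=4.0)}
    counts = {"L1": LevelCounts(reads=900, writes=100),
              "LL": LevelCounts(reads=40, writes=10)}
    print(total_energy(counts, costs))  # 395.0
```

Keeping read and write costs separate matters here because Cacti reports them separately and, per the poster's results, the encoder's traffic is dominated by reads.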
[Figure: Methodology flow. Inputs: HEVC and memory configurations (L1 cache, …, LL cache); CallGrind outputs: #Accesses, #Read/Write, #Hits/Misses; energy consumption (read, write) and access time]
● The best cache (L1 8K-4, LL 8MB-2) was used to generate detailed HEVC results (8 frames of a class D video, QP 32)
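Callgrind's annotated output attributes reads and writes to individual functions, so a per-module breakdown needs a function-to-module mapping. The sketch below assumes the per-function counts have already been extracted as `(function, reads, writes)` tuples (parsing is not shown); the substring rules in the hypothetical `MODULE_RULES` table and the function names in the example are illustrative only and do not reproduce the authors' classification.

```python
from collections import defaultdict

# Hypothetical rules: a function belongs to the first module whose substring
# appears in its name; anything unmatched falls into "Misc".
MODULE_RULES = [
    ("TEncSbac", "Entropy"),
    ("TEncBinCABAC", "Entropy"),
    ("TComLoopFilter", "Filter"),
    ("TEncSearch", "Inter/Intra"),
]


def classify(function_name: str) -> str:
    """Map a function name to an encoder module using MODULE_RULES."""
    for key, module in MODULE_RULES:
        if key in function_name:
            return module
    return "Misc"


def aggregate(rows):
    """Sum (reads, writes) per module from (function, reads, writes) rows."""
    totals = defaultdict(lambda: [0, 0])
    for name, reads, writes in rows:
        module_totals = totals[classify(name)]
        module_totals[0] += reads
        module_totals[1] += writes
    return {module: tuple(rw) for module, rw in totals.items()}


def access_share(totals):
    """Percentage of all accesses (reads + writes) made by each module."""
    grand_total = sum(r + w for r, w in totals.values())
    return {module: 100.0 * (r + w) / grand_total
            for module, (r, w) in totals.items()}


if __name__ == "__main__":
    rows = [("TEncSbac::codeCoeffNxN", 70, 30),
            ("TComLoopFilter::deblockCU", 40, 10),
            ("main", 5, 5)]
    totals = aggregate(rows)
    print(totals)                # {'Entropy': (70, 30), 'Filter': (40, 10), 'Misc': (5, 5)}
    print(access_share(totals))  # {'Entropy': 62.5, 'Filter': 31.25, 'Misc': 6.25}
```

An ordered rule list keeps the mapping explicit and easy to refine when a function could plausibly belong to more than one module.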
[Figure: Read/Write by Encoder's Module]
[Figure: Methodology flow (continued): CallGrind Annotate, Profiling, Refinement, Energetic model by encoder's modules; Cacti: energy per access]
Conclusions and future work
Latency estimation is modeled to narrow down the set of cache configurations:
$$\text{Latency} = (L1_{hits} \times L1_{lat}) + (LL_{hits} \times LL_{lat}) + (LL_{misses} \times RAM_{lat})$$
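The latency model can be evaluated directly for each configuration and the configurations ranked by it. This is a minimal sketch assuming that each configuration's L1 and LL access times come from Cacti and that a single RAM latency applies; the configurations and all numbers are hypothetical. As in the formula, an access that misses L1 is charged only the latency of the level that finally serves it.

```python
from dataclasses import dataclass


@dataclass
class CacheRun:
    """Callgrind counts and Cacti access times for one cache configuration."""
    name: str
    l1_hits: int
    ll_hits: int
    ll_misses: int
    l1_lat_ns: float
    ll_lat_ns: float


def estimated_latency(run: CacheRun, ram_lat_ns: float) -> float:
    """Latency = L1hits*L1lat + LLhits*LLlat + LLmisses*RAMlat (in ns)."""
    return (run.l1_hits * run.l1_lat_ns
            + run.ll_hits * run.ll_lat_ns
            + run.ll_misses * ram_lat_ns)


def best_config(runs, ram_lat_ns: float) -> CacheRun:
    """Configuration with the lowest estimated latency."""
    return min(runs, key=lambda run: estimated_latency(run, ram_lat_ns))


if __name__ == "__main__":
    ram_lat_ns = 50.0  # hypothetical main-memory latency
    runs = [CacheRun("config A", 950, 49, 1, l1_lat_ns=0.5, ll_lat_ns=5.0),
            CacheRun("config B", 960, 30, 10, l1_lat_ns=1.0, ll_lat_ns=3.0)]
    for run in runs:
        print(run.name, estimated_latency(run, ram_lat_ns))
    print("best:", best_config(runs, ram_lat_ns).name)
```

With these numbers, config A scores 770.0 ns and config B 1550.0 ns: B's higher hit counts do not compensate for its slower L1 and its extra trips to RAM.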
[Figure: Cache data by function; y-axis: #Accesses; series: Read, Write]
● The best cache configuration showed positive results, with reduced latency for this video application
● L1 hits reach up to 95%
● LL global misses are below 0.0012%
● All HEVC encoder modules perform more than 70% reads
● The proposed methodology provides new ways to analyse the encoder's features and could be applied to any other application
● As a next step, the coding parameters will be changed to analyse their impact on the memory hierarchy
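The hit and miss figures above can be computed from the simulator's counters. A global miss rate divides the LL misses by all memory accesses issued by the program, whereas a local miss rate divides them only by the accesses that reach the LL. The sketch below assumes the poster's "global" follows this usual definition, uses hypothetical counts, and also checks each module's read share against the 70% threshold.

```python
def l1_hit_rate(l1_hits: int, total_accesses: int) -> float:
    """Fraction of all accesses served by L1."""
    return l1_hits / total_accesses


def ll_local_miss_rate(ll_misses: int, ll_accesses: int) -> float:
    """LL misses relative to the accesses that reached the LL (L1 misses)."""
    return ll_misses / ll_accesses


def ll_global_miss_rate(ll_misses: int, total_accesses: int) -> float:
    """LL misses relative to all memory accesses issued by the program."""
    return ll_misses / total_accesses


def read_share(reads: int, writes: int) -> float:
    """Fraction of a module's accesses that are reads."""
    return reads / (reads + writes)


def all_read_dominated(module_rw: dict, threshold: float = 0.70) -> bool:
    """True if every module's reads exceed `threshold` of its accesses."""
    return all(read_share(r, w) > threshold for r, w in module_rw.values())


if __name__ == "__main__":
    # Hypothetical counters, for illustration only.
    total, l1_hits, ll_misses = 1_000_000, 950_000, 10
    # Simplification: each L1 miss is looked up once in the LL.
    ll_accesses = total - l1_hits
    print(f"L1 hit rate: {l1_hit_rate(l1_hits, total):.2%}")
    print(f"LL local miss rate: {ll_local_miss_rate(ll_misses, ll_accesses):.4%}")
    print(f"LL global miss rate: {ll_global_miss_rate(ll_misses, total):.4%}")
    print(all_read_dominated({"Inter": (800, 200), "Entropy": (750, 250)}))
```

The distinction matters when reading very small values such as the 0.0012% above: the same LL misses look far larger as a local rate, because the LL only sees the few accesses that miss in L1.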
[Figure: Accesses by Encoder's Module (%); modules: Entropy, InvQuant, Quant, InvTransf, Misc, Transf, Filter, Inter/Intra, Inter, Intra]
[Figure: Hits/Misses at L1 and LL (%); categories: L1Hits (95%), L1Misses, LLHits, LLMisses]
References
[1] Muhammad Shafique and Jörg Henkel. Low Power Design of the Next-Generation High Efficiency Video Coding. ASP-DAC, pages 274–281, 2014.
[2] Nicholas Nethercote and Julian Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. PLDI, pages 89–100, 2007.
[3] HM16.2, High Efficiency Video Coding Test Model (HM) Encoder, Strasbourg, 2014.
Instituto de Informática
Universidade Federal do Rio Grande do Sul
Caixa Postal 15064 | 91501-970 Porto Alegre - RS - Brasil
Contact: ana.mativi@inf.ufrgs.br | inf.ufrgs.br/~acmsouza
5th IEEE CASS Rio Grande do Sul Workshop – October 22-23, 2015 – Porto Alegre, Brazil