AN2094: ITU-T G.729 Implementation on StarCore SC140
Freescale Semiconductor / Motorola, page 28

Details of Selected Functions
After function-level reoptimization, the ACELP_Codebook() function ran 1.77 times faster than the initial version (21770 versus 38713 cycles in Table 11).
4.2.4 Assembly Implementation
The assembly version is similar to the reoptimized C version but uses registers more efficiently. The final version of the ACELP_Codebook() function is 2.8 times faster than the initial version, and its code is about 9% smaller (4372 versus 4804 bytes, a size ratio of 1.1).
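As a check on the figures above, the speedup and size ratios can be recomputed from the cycle counts and code sizes in Table 11. The following is a minimal C sketch; the array and function names are illustrative only and do not come from the application note.

```c
#include <assert.h>
#include <stdio.h>

/* Values copied from Table 11 (ACELP_Codebook() Performance Summary). */
#define NUM_VERSIONS 5

static const char *const version_name[NUM_VERSIONS] = {
    "Initial version",
    "Initial function-level C optimizations",
    "Algorithmic changes",
    "C optimizations after algorithm changes",
    "Assembly implementation",
};
static const long cycle_count[NUM_VERSIONS] = {38713, 32102, 26994, 21770, 13842};
static const long size_bytes[NUM_VERSIONS]  = {4804, 4828, 6456, 6600, 4372};

/* before/after: a value above 1.0 means "after" is faster (or smaller). */
double improvement(long before, long after)
{
    assert(before > 0 && after > 0);
    return (double)before / (double)after;
}

/* Print each version's speedup and size ratio relative to the initial version. */
void print_table11_ratios(void)
{
    for (int i = 1; i < NUM_VERSIONS; i++)
        printf("%-40s speedup %.2fx  size ratio %.2fx\n", version_name[i],
               improvement(cycle_count[0], cycle_count[i]),
               improvement(size_bytes[0], size_bytes[i]));
}
```

For the assembly implementation this gives a speedup of about 2.80 and a size ratio of about 1.10. A size ratio below 1.0, as after the algorithmic changes, means the code grew.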
4.2.5 Summary
Table 11 lists the ACELP_Codebook() cycle count and code size after each optimization step.
4.3 Optimizations in Lag_max()
The Lag_max() function is called by the Pitch_ol() function to compute the open-loop pitch estimate. This computation is done in the following steps:
1. Compute the autocorrelation of the input signal for all possible time lags between the minimum and maximum lag.
2. Determine which lag corresponds to the maximum autocorrelation value.
3. Compute the normalized correlation for the selected lag.
4.3.1 Function-Level C Optimizations
The following optimizations were applied to the original reference code:
• Replace the L_shl() function with the << operator.
• Replace the mult() function with L_mult() followed by a 16-bit right shift.
• Align the input and local vectors to allow parallel data moves.
• Use multisample to compute correlations.
• Replace unmodified variables with constants.
Analysis of the original function revealed that some parameters are constant: the maximum-lag and minimum-lag parameters of Lag_max() always take specific values, and the number of correlations computed is always a multiple of four. This suggests using the multisample technique to take advantage of the four-ALU architecture of the SC140. In addition, the scal_sig[-PIT_MAX] and signal[-PIT_MAX] pointers, which are used to compute correlations in the Pitch_ol() function, are aligned on 8-byte boundaries to generate more efficient code.
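The three Lag_max() steps from Section 4.3, combined with the multisample idea from Section 4.3.1, can be sketched in portable C. This is a floating-point illustration under stated assumptions, not the ITU-T fixed-point reference code or the SC140 implementation. The function name lag_max_sketch, its parameters, and the Newton square-root helper are hypothetical. As the text states, the sketch assumes that the number of lags is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: square root by Newton iteration, so the sketch does
 * not need to link against the math library. */
static double sqrt_newton(double v)
{
    if (v <= 0.0)
        return 0.0;
    double r = (v > 1.0) ? v : 1.0;  /* start at or above the root */
    for (int i = 0; i < 100; i++)
        r = 0.5 * (r + v / r);
    return r;
}

/* Hypothetical floating-point sketch of Lag_max(); not bit-exact with the
 * fixed-point reference. x[n - lag] is read for lags up to lag_max, so x must
 * be preceded by at least lag_max samples of history. */
int lag_max_sketch(const double *x, int len, int lag_min, int lag_max,
                   double *norm_cor)
{
    assert(x != NULL && norm_cor != NULL && len > 0 && lag_min > 0);
    assert(lag_max >= lag_min && (lag_max - lag_min + 1) % 4 == 0);

    double best = 0.0;
    int best_lag = -1;

    /* Steps 1 and 2 (multisample): four lags per pass over the data, so each
     * x[n] is read once and feeds four independent accumulators, mirroring
     * the four ALUs of the SC140. */
    for (int lag = lag_max; lag >= lag_min; lag -= 4) {
        double r[4] = {0.0, 0.0, 0.0, 0.0};  /* lags lag, lag-1, lag-2, lag-3 */
        for (int n = 0; n < len; n++) {
            double s = x[n];
            r[0] += s * x[n - lag];
            r[1] += s * x[n - lag + 1];
            r[2] += s * x[n - lag + 2];
            r[3] += s * x[n - lag + 3];
        }
        /* Lags are visited in decreasing order; ">=" lets the smallest lag win ties. */
        for (int k = 0; k < 4; k++) {
            if (best_lag < 0 || r[k] >= best) {
                best = r[k];
                best_lag = lag - k;
            }
        }
    }

    /* Step 3: normalize the maximum by the energy of the delayed segment. */
    double energy = 0.0;
    for (int n = 0; n < len; n++)
        energy += x[n - best_lag] * x[n - best_lag];
    *norm_cor = (energy > 0.0) ? best / sqrt_newton(energy) : 0.0;
    return best_lag;
}
```

The sketch visits lags in decreasing order and compares with >=, so ties resolve to the smallest lag. For a perfectly periodic input, this avoids selecting a multiple of the true period.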
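The first two replacements in the list can be checked with simplified C models of the basic operators. The model_* functions below are hypothetical re-implementations of the documented basic-operator behavior (a Q15 multiply, a doubling 32-bit multiply, and a saturating left shift), not the reference source. The sketch assumes that >> on a negative int32_t is an arithmetic shift, as it is on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Simplified model of L_mult(): (a * b) << 1, saturating the one overflow
 * case, -32768 * -32768. */
Word32 model_L_mult(Word16 a, Word16 b)
{
    Word32 p = (Word32)a * b;
    return (p == 0x40000000) ? INT32_MAX : p * 2;
}

/* Simplified model of mult(): Q15 product (a * b) >> 15, saturated to 16 bits. */
Word16 model_mult(Word16 a, Word16 b)
{
    Word32 p = ((Word32)a * b) >> 15;  /* assumes arithmetic right shift */
    return (Word16)((p > INT16_MAX) ? INT16_MAX : p);
}

/* Simplified model of L_shl() for n >= 0: left shift with saturation.
 * (The reference operator also accepts negative n; not modeled here.) */
Word32 model_L_shl(Word32 x, int n)
{
    assert(n >= 0);
    for (; n > 0; n--) {
        if (x > 0x3fffffff) return INT32_MAX;
        if (x < -0x40000000) return INT32_MIN;
        x *= 2;
    }
    return x;
}

/* Replacement 1: L_mult() followed by a 16-bit right shift. */
Word16 mult_replacement(Word16 a, Word16 b)
{
    return (Word16)(model_L_mult(a, b) >> 16);
}

/* Replacement 2: plain <<. Valid only when the result cannot overflow;
 * restricted to x >= 0 because left-shifting a negative signed value is
 * undefined in ISO C. */
Word32 L_shl_replacement(Word32 x, int n)
{
    assert(x >= 0 && n >= 0 && n < 31 && x <= (INT32_MAX >> n));
    return x << n;
}
```

Under these models, the mult() replacement is exact for every input pair, including the saturating -32768 × -32768 case, where both forms return 32767. The L_shl() replacement holds only when the shifted value cannot overflow, because L_shl() saturates and << does not. It is therefore safe only where the value range rules out overflow.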
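Because scal_sig[-PIT_MAX] is the first element of the buffer that holds the pitch history, its alignment can be expressed in standard C11 by aligning that buffer. PIT_MAX = 143 and L_FRAME = 80 are the G.729 values. The buffer name and layout are assumptions modeled on Pitch_ol(), and the original code may use a compiler-specific mechanism instead of C11 alignas.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define PIT_MAX 143  /* maximum pitch lag in G.729 (samples) */
#define L_FRAME 80   /* G.729 frame length (samples) */

typedef int16_t Word16;

/* Hypothetical layout modeled on Pitch_ol(): PIT_MAX samples of history
 * followed by the current frame. alignas(8) puts element 0, which is
 * scal_sig[-PIT_MAX], on an 8-byte boundary. */
static alignas(8) Word16 scaled_signal[PIT_MAX + L_FRAME];

/* Pointer to the current frame; scal_sig[-PIT_MAX] == scaled_signal[0]. */
Word16 *scal_sig_ptr(void)
{
    return &scaled_signal[PIT_MAX];
}

/* True if p is on an 8-byte boundary. */
int is_aligned8(const void *p)
{
    return ((uintptr_t)p % 8u) == 0;
}
```

Only the start of the history is aligned. scal_sig itself sits 143 × 2 = 286 bytes into the buffer, which is not a multiple of 8.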
Table 11. ACELP_Codebook() Performance Summary

Version                                   Cycle Count   Size (Bytes)
Initial version                                 38713           4804
Initial function-level C optimizations          32102           4828
Algorithmic changes (1)                         26994           6456
C optimizations after algorithm changes         21770           6600
Assembly implementation                         13842           4372

Note: (1) Applied to the initial version.

ITU-T G.729 Implementation on the StarCore™ SC140/SC1400 Cores, Rev. 1