Multiple simple random sampling without replacement
Goal
Generate K >> 1 simple random length-M samples without replacement from a population of size N (1 ≤ M ≤ N).
Solution
For exact definitions and more details of the problem, see [ ].
Use the following implementation of a partial Fisher-Yates Shuffle algorithm [KnuthV2] and Intel MKL random number generators (RNG) to generate each sample:
Partial Fisher-Yates Shuffle algorithm
A2.1: (Initialization step) let PERMUT_BUF contain natural numbers 1, 2, ..., N
A2.2: for i from 1 to M do:
A2.3: generate random integer X uniform on {i,...,N}
A2.4: interchange PERMUT_BUF[i] and PERMUT_BUF[X]
A2.5: (Copy step) for i from 1 to M do: RESULTS_ARRAY[i]=PERMUT_BUF[i]
End.
The program that implements the algorithm conducts 11 969 664 experiments. Each experiment, which generates a sequence of M unique random natural numbers from 1 to N, is actually a partial length-M random shuffle of the whole population of N elements. Because the main loop of the algorithm works as a real lottery, each experiment is called "lottery M of N" in the program.
The program uses M=6 and N=49, stores result samples (sequences of length M) in a single array RESULTS_ARRAY, and uses all available parallel threads.
Source code: see the lottery6of49 folder in the samples archive available at http://software.intel.com/enus/mkl_cookbook_samples.
Parallelization
#pragma omp parallel
{
    /* Declared inside the parallel region so that each thread has its own copy */
    int thr = omp_get_thread_num(); /* the thread index */
    VSLStreamStatePtr stream;
    /* RNG stream initialization in this thread */
    vslNewStream( &stream, VSL_BRNG_MT2203+thr, seed );
    ... /* Generation of experiment samples (in thread number thr) */
    vslDeleteStream( &stream );
}
The code uses the OpenMP* #pragma omp parallel directive to run on all available processor cores. The array of experiment results RESULTS_ARRAY is divided into THREADS_NUM portions, where THREADS_NUM is the number of available CPU threads, and each thread (parallel region) processes its own portion of the array.
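The index arithmetic behind this partitioning can be sketched as follows. This is a minimal illustration, not the cookbook source: the helper names one_thread_portion_size and result_index are hypothetical, and the sketch assumes that ONE_THR_PORTION_SIZE equals (EXPERIM_NUM/THREADS_NUM)*M and that EXPERIM_NUM is divisible by THREADS_NUM.

```c
#include <assert.h>
#include <stddef.h>

/* Number of array elements written by one thread:
   (experiments per thread) * (sample length M).
   Assumption: experim_num is divisible by threads_num. */
static size_t one_thread_portion_size(size_t experim_num, size_t threads_num, size_t m)
{
    assert(threads_num > 0 && experim_num % threads_num == 0);
    return (experim_num / threads_num) * m;
}

/* Position in the results array of element i of sample number sample_num
   produced by thread thr (same expression as in the sample-generation loop). */
static size_t result_index(size_t thr, size_t portion_size,
                           size_t sample_num, size_t m, size_t i)
{
    return thr * portion_size + sample_num * m + i;
}
```

For example, with 12 experiments, 4 threads, and M=6, each thread owns 18 consecutive elements, and the last element written by thread 3 has index 71, the last position of the 72-element array. Because the portions do not overlap, threads never write to the same element.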
Intel® Math Kernel Library Cookbook
Intel MKL basic random number generators with the VSL_BRNG_MT2203 parameter easily support a parallel independent stream in each thread.
Generation of experiment samples
/* A2.1: Initialization step */
/* Let PERMUT_BUF contain natural numbers 1, 2, ..., N */
for( i=0; i<N; i++ ) PERMUT_BUF[i]=i+1; /* using the set {1,...,N} */
for( sample_num=0; sample_num<EXPERIM_NUM/THREADS_NUM; sample_num++ ){
    /* Generate next lottery sample (steps A2.2, A2.3, and A2.4): */
    Fisher_Yates_shuffle(...);
    /* A2.5: Copy step */
    for(i=0; i<M; i++)
        RESULTS_ARRAY[thr*ONE_THR_PORTION_SIZE + sample_num*M + i] = PERMUT_BUF[i];
}
This code implements the partial Fisher-Yates Shuffle algorithm in each thread.
When many experiments are simulated, the initialization step (A2.1) is needed only once because at the beginning of each experiment the order of the natural numbers 1...N in the PERMUT_BUF array does not matter (as in a real lottery).
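The reason a single initialization suffices is that the shuffle only swaps elements, so the buffer remains a permutation of 1...N after every experiment. The sketch below illustrates this under stated assumptions: it replaces the Intel MKL generator with the C library rand() function (which has a small modulo bias when reduced this way), and the names uniform_int, partial_shuffle, is_permutation, and draw_samples are hypothetical, not taken from the cookbook source.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in RNG (assumption: not Intel MKL): integer on {lo,...,hi}.
   rand() % range carries a small modulo bias, acceptable for a sketch. */
static int uniform_int(int lo, int hi)
{
    return lo + rand() % (hi - lo + 1);
}

/* Steps A2.2-A2.4 with zero-based indices. The loop performs only swaps,
   so buf stays a permutation of its previous contents. */
static void partial_shuffle(int *buf, int n, int m)
{
    assert(1 <= m && m <= n);
    for (int i = 0; i < m; i++) {
        int j = uniform_int(i, n - 1);
        int tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}

/* Returns 1 if buf holds every value 1..n exactly once, 0 otherwise. */
static int is_permutation(const int *buf, int n)
{
    int ok = 1;
    char *seen = calloc((size_t)n + 1, 1);
    if (seen == NULL) return 0;
    for (int i = 0; i < n; i++) {
        int v = buf[i];
        if (v < 1 || v > n || seen[v]) { ok = 0; break; }
        seen[v] = 1;
    }
    free(seen);
    return ok;
}

/* Draws k samples of length m into results (k*m ints). buf (n ints) is
   initialized once, before the first experiment, and never reset.
   Returns 1 if buf was still a permutation of 1..n after every experiment. */
static int draw_samples(int *results, int *buf, int k, int n, int m)
{
    for (int i = 0; i < n; i++) buf[i] = i + 1;   /* A2.1, executed once */
    for (int s = 0; s < k; s++) {
        partial_shuffle(buf, n, m);               /* A2.2-A2.4 */
        for (int i = 0; i < m; i++)               /* A2.5 */
            results[s * m + i] = buf[i];
        if (!is_permutation(buf, n)) return 0;
    }
    return 1;
}
```

Calling draw_samples(results, buf, 100, 49, 6) fills results with 100 samples of 6 distinct numbers from 1 to 49 each; the permutation check is there only to make the invariant visible and would be dropped in performance-sensitive code.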
Fisher_Yates_shuffle function
Fisher_Yates_shuffle (...)
{
    for(i=0; i<M; i++) {
        /* A2.3: generate random natural number X from {i,...,N-1} (stored in j) */
        j = Next_Uniform_Int(...);
        /* A2.4: interchange PERMUT_BUF[i] and PERMUT_BUF[X] */
        tmp = PERMUT_BUF[i];
        PERMUT_BUF[i] = PERMUT_BUF[j];
        PERMUT_BUF[j] = tmp;
    }
}
Each iteration of the loop A2.2 works as a real lottery step: it extracts a random item X from the bin with remaining items PERMUT_BUF[i], ..., PERMUT_BUF[N] and puts the item X at the end of the results row PERMUT_BUF[1],...,PERMUT_BUF[i]. The algorithm is partial because it does not generate the full permutation of length N, but only a part of length M.
NOTE
Unlike the pseudocode that describes the algorithm, the program uses zero-based arrays.
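To trace individual lottery steps by hand, it can help to supply the random draws explicitly instead of generating them. The following sketch does that with zero-based indices; the function name partial_shuffle_with_draws and its draws parameter are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Partial Fisher-Yates shuffle in which the caller supplies the draws.
   draws[i] plays the role of X in step A2.3 and must lie in {i,...,n-1}. */
static void partial_shuffle_with_draws(int *buf, int n, int m, const int *draws)
{
    for (int i = 0; i < m; i++) {
        int j = draws[i];
        assert(i <= j && j < n);
        /* A2.4: move the drawn item to position i of the results row */
        int tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}
```

With buf = {1, 2, 3, 4, 5}, m = 2, and draws = {3, 3}, the first step swaps positions 0 and 3, giving {4, 2, 3, 1, 5}, and the second step swaps positions 1 and 3, giving {4, 1, 3, 2, 5}. The sample is the first two elements, 4 and 1, and the remaining three elements form the bin for any further draws.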
Discussion
In step A2.3, the program calls the Next_Uniform_Int function to generate the next random integer X, uniform on {i, ..., N-1} (see the source code for details). To exploit the full power of vectorized RNGs from Intel MKL while minimizing vectorization overheads, the generator must generate a sufficiently large vector D_UNIFORM01_BUF of size RNGBUFSIZE that fits the L1 cache. Each thread uses its own buffer D_UNIFORM01_BUF and the index D_UNIFORM01_IDX, which points just past the last used random number in that buffer. In the first call to the Next_Uniform_Int function (or when all random numbers from the buffer have been used), the full buffer of random numbers is regenerated by calling the vdRngUniform function with the length RNGBUFSIZE and the index D_UNIFORM01_IDX set to zero (earlier in the program):
vdRngUniform( ... RNGBUFSIZE, D_UNIFORM01_BUF ... );
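The refill-on-exhaustion pattern described here can be sketched in plain C. This is not the cookbook's Next_Uniform_Int implementation: the UniformBuffer type and the functions fill_uniform01, uniform_buffer_init, and next_uniform01 are hypothetical, rand() stands in for vdRngUniform, the value 1024 for RNGBUFSIZE is an assumption, and the refills counter exists only to make the behavior observable.

```c
#include <assert.h>
#include <stdlib.h>

#define RNGBUFSIZE 1024   /* assumed value: 1024 doubles occupy 8 KB */

typedef struct {
    double buf[RNGBUFSIZE];  /* plays the role of D_UNIFORM01_BUF */
    int idx;                 /* plays the role of D_UNIFORM01_IDX */
    long refills;            /* number of bulk generations so far */
} UniformBuffer;

/* Stand-in for the vectorized generator: fills dst with values in [0,1). */
static void fill_uniform01(double *dst, int count)
{
    for (int i = 0; i < count; i++)
        dst[i] = rand() / ((double)RAND_MAX + 1.0);
}

/* Marks the buffer as exhausted so that the first request triggers a fill. */
static void uniform_buffer_init(UniformBuffer *b)
{
    b->idx = RNGBUFSIZE;
    b->refills = 0;
}

/* Returns the next buffered value, regenerating the whole buffer when
   every number in it has been used. */
static double next_uniform01(UniformBuffer *b)
{
    assert(b->idx >= 0 && b->idx <= RNGBUFSIZE);
    if (b->idx == RNGBUFSIZE) {
        fill_uniform01(b->buf, RNGBUFSIZE);
        b->idx = 0;
        b->refills++;
    }
    return b->buf[b->idx++];
}
```

One UniformBuffer per thread would keep the threads independent, in line with the per-thread buffer described above. The bulk fill amortizes the cost of one generator call over RNGBUFSIZE requests.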
Because Intel MKL only provides generators of random values with the same distribution, but step A2.3 requires random integers on different intervals, the buffer is filled with double-precision random numbers uniformly distributed on [0;1) and then, in the Integer scaling step, these double-precision values are converted to fit the needed integer intervals:
number 0 distributed on {0,...,N-1} = 0 + {0,...,N-1}
number 1 distributed on {1,...,N-1} = 1 + {0,...,N-2}
...
number M-1 distributed on {M-1,...,N-1} = M-1 + {0,...,N-M}
(then repeat previous M steps)
number M distributed on: see (0)
number M+1 distributed on: see (1)
...
number 2*M-1 distributed on: see (M-1)
(then again repeat previous M steps)
...
and so on
Integer scaling
/* Integer scaling step */
for(i=0;i<RNGBUFSIZE/M;i++)
    for(k=0;k<M;k++)
        I_RNG_BUF[i*M+k] =
            k + (unsigned int)(D_UNIFORM01_BUF[i*M+k] * (double)(N-k));
Here RNGBUFSIZE is a multiple of M.
See [ ] for performance notes related to this code.
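The scaling formula can be checked on single values. The sketch below isolates the expression from the integer scaling step in a helper whose name, scale_to_interval, is hypothetical: multiplying u from [0;1) by N-k and truncating gives an integer in {0,...,N-k-1}, and adding k shifts it to {k,...,N-1}.

```c
#include <assert.h>

/* Maps u in [0,1) to an integer in {k,...,n-1} using the same
   expression as the integer scaling step: k + floor(u * (n - k)). */
static unsigned int scale_to_interval(double u, unsigned int k, unsigned int n)
{
    assert(u >= 0.0 && u < 1.0 && k < n);
    return k + (unsigned int)(u * (double)(n - k));
}
```

For the lottery parameters N=49 and k=0, u=0.5 gives 0.5*49 = 24.5, truncated to index 24. For k=5 the interval {5,...,48} has 44 values, so u=0.0 maps to 5 and u=0.999999 maps to 5 + 43 = 48.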
Routines Used
Task                                                                                      Routine
Creates and initializes an RNG stream.                                                    vslNewStream
Generates double-precision numbers uniformly distributed over the interval [0;1).         vdRngUniform
Deletes an RNG stream.                                                                    vslDeleteStream
Allocates memory buffers aligned on 64-byte boundaries for the results and population.    mkl_malloc
Frees memory allocated by mkl_malloc.                                                     mkl_free
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804