ReservoirSampler creates a streaming algorithm that can be used to obtain a random sample from a population that is too large to fit in memory. The sample can be made reproducible by calling `set.seed(...)` before initialising the streamer.

Implementation is based on doi:10.1145/198429.198435.
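
The idea behind reservoir sampling can be illustrated with a short base-R sketch. It uses the classic "Algorithm R", chosen for brevity; it is not the package's implementation and may differ from the method in the cited paper. The name `make_reservoir` and its `update`/`value` members are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of the package): a closure-based reservoir
# sampler using the classic "Algorithm R".
make_reservoir <- function(k) {
  stopifnot(k >= 1)
  reservoir <- NULL  # current sample, holds at most k values
  n_seen <- 0        # number of values seen so far

  update <- function(x) {
    for (xi in x) {
      n_seen <<- n_seen + 1
      if (n_seen <= k) {
        # Fill phase: the first k values are kept unconditionally.
        reservoir <<- c(reservoir, xi)
      } else {
        # Replacement phase: the new value is kept with probability
        # k / n_seen and overwrites a uniformly chosen slot.
        j <- sample.int(n_seen, 1)
        if (j <= k) reservoir[j] <<- xi
      }
    }
    invisible(NULL)
  }

  list(update = update, value = function() reservoir)
}

# Same seed and same stream give the same sample, whether the values
# arrive in chunks or all at once.
set.seed(42)
s1 <- make_reservoir(k = 10)
s1$update(1:60)
s1$update(61:100)

set.seed(42)
s2 <- make_reservoir(k = 10)
s2$update(1:100)

length(s1$value())
identical(s1$value(), s2$value())
```

Only the `k` retained values and a counter are held in memory, so the full population never needs to be stored. Seeding before construction fixes the sequence of random draws, which is what makes the sample reproducible.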

Format

An R6Class generator object

Methods


Method new()

Creates a new ReservoirSampler streamer object.

Usage

ReservoirSampler$new(k)

Arguments

k

the desired sample size

Returns

The new ReservoirSampler (invisibly)


Method update()

Update the ReservoirSampler streamer object.

Usage

ReservoirSampler$update(x)

Arguments

x

values to be added to the stream

Returns

The updated ReservoirSampler (invisibly)


Method clone()

The objects of this class are cloneable with this method.

Usage

ReservoirSampler$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

sampler <- ReservoirSampler$new(k = 10)
for (i in 1:100) {
    sampler$update(i)
}
length(sampler$value)  # random sample from 1:100 of size 10
#> [1] 10