Inspired by @drob’s example:
https://twitter.com/drob/status/1008409373423611904
and the subsequent WTF in my brain, and seeing @gymbrane do a python version:
https://twitter.com/gymbrane/status/1008939111925809153
I was inspired to try the same thing with Rcpp.



If you’re asking “why would you bother?” you don’t belong here, goodnight.

What we’re doing here is comparing the number of coin flips it takes to encounter different patterns. Counterintuitively, the number of flips it takes to encounter a HEAD then TAIL is around 4, while for a HEAD then HEAD it’s about 6.



Define the C++ function to run the simulations.

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double howmanyflips( LogicalVector heads_pattern, int reps ) {
    
    int flips;
    bool current, done;
    double results_sum = 0;
    
    // repeat for the number of required repetitions
    for( int i = 0; i < reps; i++ ) {
        
        // start counting flips
        flips = 0;
        
        // keep going until we find the right pattern
        done = false;
        while( done == false ) {
            
            // flip the coin
            flips ++;
            current = R::rbinom(1,0.5);
            
            // if we got the first value we're looking for, try for the second
            while( current == heads_pattern[0] & done == false ) {
                
                // flip the coin
                flips++;
                current = R::rbinom(1,0.5);
                
                // if we got the second value we're looking for, we're done
                if( current == heads_pattern[1] ) {
                    done = true;
                };
            };
        };
        
        // add the current result to the sum
        results_sum += flips;
    };
    
    // calculate the average of all results
    return results_sum / reps;
    
};



Bring in @drob’s tidyverse version as well for comparison

library( tidyverse )
tidyverse_version <- function( reps ) {
    crossing(trial = 1:reps,
             flip = 1:100) %>%
        mutate( heads = rbinom(n(), 1, 0.5)) %>%
        group_by(trial) %>%
        mutate(next_flip = lead(heads),
               hh = heads & next_flip,
               ht = heads & !next_flip) %>%
        summarize(first_hh = which(hh)[1] + 1,
                  first_ht = which(ht)[1] + 1) %>%
        summarize(first_hh = mean(first_hh),
                  first_ht = mean(first_ht))
}



Compare the results.

replicates <- 1E5

c( hh = howmanyflips( heads_pattern = c(T,T), reps = replicates ),
   ht = howmanyflips( heads_pattern = c(T,F), reps = replicates ) )
     hh      ht 
6.01273 3.99848 
tidyverse_version( reps = replicates )



So the results are similar. Like I said, WTF is this statistical black magic?

Anyway, as an aside, the C++ function should run way faster, and benchmarking is fun, so:

library( microbenchmark )
microbenchmark(
    
    cpp = c( howmanyflips( heads_pattern = c(T,T), reps = replicates ),
             howmanyflips( heads_pattern = c(T,F), reps = replicates ) ),
    
    tidyverse = tidyverse_version( reps = replicates ),
    
    times = 10
)
Unit: milliseconds
      expr        min         lq       mean     median         uq        max neval
       cpp   21.62147   21.68506   21.76179   21.78339   21.81704   21.87219    10
 tidyverse 7693.60846 7719.36256 7888.32718 7847.86873 7981.62736 8210.69254    10



So timing wise, the C++ version is about 350x faster than the tidyverse version. This is partly because of the general better overhead of running a simulation in C++ vs R. It’s also because the C++ version avoids most coin flips altogether by only simulating flips up until the required pattern is found, whereas the tidyverse version flips 100 coins for each simulation regardless of the outcome of the flips.



Discuss

Unfortunately, because of the above point, it’s also not much use running set.seed here to compare results. The different number of flips will change the sequence of the random number generator, and the results will be different regardless.

Worth noting is that @gymbrane’s python version stops when the required pattern is found AND is able to search for patterns longer than 2 results in sequence. Neither my nor @drobs methods can do that.







LS0tCnRpdGxlOiAiQ29pbiBmbGlwIHNpbXVsYXRpb24iCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCjxicj48YnI+Ckluc3BpcmVkIGJ5IEBkcm9iJ3MgZXhhbXBsZTo8YnI+CiAgICBodHRwczovL3R3aXR0ZXIuY29tL2Ryb2Ivc3RhdHVzLzEwMDg0MDkzNzM0MjM2MTE5MDQgPGJyPgphbmQgdGhlIHN1YnNlcXVlbnQgV1RGIGluIG15IGJyYWluLCBhbmQgc2VlaW5nIEBneW1icmFuZSBkbyBhIGBweXRob25gIHZlcnNpb246PGJyPgogICAgaHR0cHM6Ly90d2l0dGVyLmNvbS9neW1icmFuZS9zdGF0dXMvMTAwODkzOTExMTkyNTgwOTE1MyA8YnI+Ckkgd2FzIGluc3BpcmVkIHRvIHRyeSB0aGUgc2FtZSB0aGluZyB3aXRoIGBSY3BwYC4KCjxicj48YnI+CklmIHlvdSdyZSBhc2tpbmcgIndoeSB3b3VsZCB5b3UgYm90aGVyPyIgeW91IGRvbid0IGJlbG9uZyBoZXJlLCBnb29kbmlnaHQuCgpXaGF0IHdlJ3JlIGRvaW5nIGhlcmUgaXMgY29tcGFyaW5nIHRoZSBudW1iZXIgb2YgY29pbiBmbGlwcyBpdCB0YWtlcyB0byBlbmNvdW50ZXIgZGlmZmVyZW50IHBhdHRlcm5zLiBDb3VudGVyaW50dWl0aXZlbHksIHRoZSBudW1iZXIgb2YgZmxpcHMgaXQgdGFrZXMgdG8gZW5jb3VudGVyIGEgSEVBRCB0aGVuIFRBSUwgaXMgYXJvdW5kIDQsIHdoaWxlIGZvciBhIEhFQUQgdGhlbiBIRUFEIGl0J3MgYWJvdXQgNi4KCjxicj48YnI+CgoKRGVmaW5lIHRoZSBgQysrYCBmdW5jdGlvbiB0byBydW4gdGhlIHNpbXVsYXRpb25zLgpgYGB7UmNwcH0KI2luY2x1ZGUgPFJjcHAuaD4KdXNpbmcgbmFtZXNwYWNlIFJjcHA7CgovLyBbW1JjcHA6OmV4cG9ydF1dCmRvdWJsZSBob3dtYW55ZmxpcHMoIExvZ2ljYWxWZWN0b3IgaGVhZHNfcGF0dGVybiwgaW50IHJlcHMgKSB7CiAgICAKICAgIGludCBmbGlwczsKICAgIGJvb2wgY3VycmVudCwgZG9uZTsKICAgIGRvdWJsZSByZXN1bHRzX3N1bSA9IDA7CiAgICAKICAgIC8vIHJlcGVhdCBmb3IgdGhlIG51bWJlciBvZiByZXF1aXJlZCByZXBldGl0aW9ucwogICAgZm9yKCBpbnQgaSA9IDA7IGkgPCByZXBzOyBpKysgKSB7CiAgICAgICAgCiAgICAgICAgLy8gc3RhcnQgY291bnRpbmcgZmxpcHMKICAgICAgICBmbGlwcyA9IDA7CiAgICAgICAgCiAgICAgICAgLy8ga2VlcCBnb2luZyB1bnRpbCB3ZSBmaW5kIHRoZSByaWdodCBwYXR0ZXJuCiAgICAgICAgZG9uZSA9IGZhbHNlOwogICAgICAgIHdoaWxlKCBkb25lID09IGZhbHNlICkgewogICAgICAgICAgICAKICAgICAgICAgICAgLy8gZmxpcCB0aGUgY29pbgogICAgICAgICAgICBmbGlwcyArKzsKICAgICAgICAgICAgY3VycmVudCA9IFI6OnJiaW5vbSgxLDAuNSk7CiAgICAgICAgICAgIAogICAgICAgICAgICAvLyBpZiB3ZSBnb3QgdGhlIGZpcnN0IHZhbHVlIHdlJ3JlIGxvb2tpbmcgZm9yLCB0cnkgZm9yIHRoZSBzZWNvbmQKICAgICAgICAgICAgd2hpbGUoIGN1cnJlbnQgPT0gaGVhZHNfcGF0dGVyblswXSAmIGRvbmUgPT0gZmFsc2UgKSB7CiAgICAgICAgICAgICAgICAKICAgICAgICAgICAgICAgIC8vIGZsaXAgdGhlIGNvaW4KICAgICAgICAgICAgICAgIGZsaXBzKys7CiAgICAgICAgICAgICAgICBjdXJyZW50ID0gUjo6cmJpbm9tKDEsMC41KTsKICAgICAgICAgICAgICAgIAogICAgICAgICAgICAgICAgLy8gaWYgd2UgZ290IHRoZSBzZWNvbmQgdmFsdWUgd2UncmUgbG9va2luZyBmb3IsIHdlJ3JlIGRvbmUKICAgICAgICAgICAgICAgIGlmKCBjdXJyZW50ID09IGhlYWRzX3BhdHRlcm5bMV0gKSB7CiAgICAgICAgICAgICAgICAgICAgZG9uZSA9IHRydWU7CiAgICAgICAgICAgICAgICB9OwogICAgICAgICAgICB9OwogICAgICAgIH07CiAgICAgICAgCiAgICAgICAgLy8gYWRkIHRoZSBjdXJyZW50IHJlc3VsdCB0byB0aGUgc3VtCiAgICAgICAgcmVzdWx0c19zdW0gKz0gZmxpcHM7CiAgICB9OwogICAgCiAgICAvLyBjYWxjdWxhdGUgdGhlIGF2ZXJhZ2Ugb2YgYWxsIHJlc3VsdHMKICAgIHJldHVybiByZXN1bHRzX3N1bSAvIHJlcHM7CiAgICAKfTsKYGBgCgo8YnI+PGJyPgpCcmluZyBpbiBAZHJvYidzIGB0aWR5dmVyc2VgIHZlcnNpb24gYXMgd2VsbCBmb3IgY29tcGFyaXNvbgoKYGBge3IgbWVzc2FnZT1GQUxTRX0KbGlicmFyeSggdGlkeXZlcnNlICkKdGlkeXZlcnNlX3ZlcnNpb24gPC0gZnVuY3Rpb24oIHJlcHMgKSB7CiAgICBjcm9zc2luZyh0cmlhbCA9IDE6cmVwcywKICAgICAgICAgICAgIGZsaXAgPSAxOjEwMCkgJT4lCiAgICAgICAgbXV0YXRlKCBoZWFkcyA9IHJiaW5vbShuKCksIDEsIDAuNSkpICU+JQogICAgICAgIGdyb3VwX2J5KHRyaWFsKSAlPiUKICAgICAgICBtdXRhdGUobmV4dF9mbGlwID0gbGVhZChoZWFkcyksCiAgICAgICAgICAgICAgIGhoID0gaGVhZHMgJiBuZXh0X2ZsaXAsCiAgICAgICAgICAgICAgIGh0ID0gaGVhZHMgJiAhbmV4dF9mbGlwKSAlPiUKICAgICAgICBzdW1tYXJpemUoZmlyc3RfaGggPSB3aGljaChoaClbMV0gKyAxLAogICAgICAgICAgICAgICAgICBmaXJzdF9odCA9IHdoaWNoKGh0KVsxXSArIDEpICU+JQogICAgICAgIHN1bW1hcml6ZShmaXJzdF9oaCA9IG1lYW4oZmlyc3RfaGgpLAogICAgICAgICAgICAgICAgICBmaXJzdF9odCA9IG1lYW4oZmlyc3RfaHQpKQp9CmBgYAoKPGJyPjxicj4KCiMjIyMgQ29tcGFyZSB0aGUgcmVzdWx0cy4gIyMjIwoKYGBge3J9CnJlcGxpY2F0ZXMgPC0gMUU1CgpjKCBoaCA9IGhvd21hbnlmbGlwcyggaGVhZHNfcGF0dGVybiA9IGMoVCxUKSwgcmVwcyA9IHJlcGxpY2F0ZXMgKSwKICAgaHQgPSBob3dtYW55ZmxpcHMoIGhlYWRzX3BhdHRlcm4gPSBjKFQsRiksIHJlcHMgPSByZXBsaWNhdGVzICkgKQpgYGAKCmBgYHtyfQp0aWR5dmVyc2VfdmVyc2lvbiggcmVwcyA9IHJlcGxpY2F0ZXMgKQpgYGAKCjxicj48YnI+ClNvIHRoZSByZXN1bHRzIGFyZSBzaW1pbGFyLiBMaWtlIEkgc2FpZCwgV1RGIGlzIHRoaXMgc3RhdGlzdGljYWwgYmxhY2sgbWFnaWM/CgpBbnl3YXksIGFzIGFuIGFzaWRlLCB0aGUgYEMrK2AgZnVuY3Rpb24gc2hvdWxkIHJ1biB3YXkgZmFzdGVyLCBhbmQgYmVuY2htYXJraW5nIGlzIGZ1biwgc286CmBgYHtyfQpsaWJyYXJ5KCBtaWNyb2JlbmNobWFyayApCm1pY3JvYmVuY2htYXJrKAogICAgCiAgICBjcHAgPSBjKCBob3dtYW55ZmxpcHMoIGhlYWRzX3BhdHRlcm4gPSBjKFQsVCksIHJlcHMgPSByZXBsaWNhdGVzICksCiAgICAgICAgICAgICBob3dtYW55ZmxpcHMoIGhlYWRzX3BhdHRlcm4gPSBjKFQsRiksIHJlcHMgPSByZXBsaWNhdGVzICkgKSwKICAgIAogICAgdGlkeXZlcnNlID0gdGlkeXZlcnNlX3ZlcnNpb24oIHJlcHMgPSByZXBsaWNhdGVzICksCiAgICAKICAgIHRpbWVzID0gMTAKKQpgYGAKCjxicj48YnI+CgpTbyB0aW1pbmcgd2lzZSwgdGhlIGBDKytgIHZlcnNpb24gaXMgYWJvdXQgMzUweCBmYXN0ZXIgdGhhbiB0aGUgYHRpZHl2ZXJzZWAgdmVyc2lvbi4gVGhpcyBpcyBwYXJ0bHkgYmVjYXVzZSBvZiB0aGUgZ2VuZXJhbCBiZXR0ZXIgb3ZlcmhlYWQgb2YgcnVubmluZyBhIHNpbXVsYXRpb24gaW4gYEMrK2AgdnMgUi4gSXQncyBhbHNvIGJlY2F1c2UgdGhlIGBDKytgIHZlcnNpb24gYXZvaWRzIG1vc3QgY29pbiBmbGlwcyBhbHRvZ2V0aGVyIGJ5IG9ubHkgc2ltdWxhdGluZyBmbGlwcyB1cCB1bnRpbCB0aGUgcmVxdWlyZWQgcGF0dGVybiBpcyBmb3VuZCwgd2hlcmVhcyB0aGUgYHRpZHl2ZXJzZWAgdmVyc2lvbiBmbGlwcyAxMDAgY29pbnMgZm9yIGVhY2ggc2ltdWxhdGlvbiByZWdhcmRsZXNzIG9mIHRoZSBvdXRjb21lIG9mIHRoZSBmbGlwcy4KCjxicj48YnI+CgojIyMgRGlzY3VzcyAjIyMKClVuZm9ydHVuYXRlbHksIGJlY2F1c2Ugb2YgdGhlIGFib3ZlIHBvaW50LCBpdCdzIGFsc28gbm90IG11Y2ggdXNlIHJ1bm5pbmcgYHNldC5zZWVkYCBoZXJlIHRvIGNvbXBhcmUgcmVzdWx0cy4gVGhlIGRpZmZlcmVudCBudW1iZXIgb2YgZmxpcHMgd2lsbCBjaGFuZ2UgdGhlIHNlcXVlbmNlIG9mIHRoZSByYW5kb20gbnVtYmVyIGdlbmVyYXRvciwgYW5kIHRoZSByZXN1bHRzIHdpbGwgYmUgZGlmZmVyZW50IHJlZ2FyZGxlc3MuCgpXb3J0aCBub3RpbmcgaXMgdGhhdCBAZ3ltYnJhbmUncyBweXRob24gdmVyc2lvbiBzdG9wcyB3aGVuIHRoZSByZXF1aXJlZCBwYXR0ZXJuIGlzIGZvdW5kIEFORCBpcyBhYmxlIHRvIHNlYXJjaCBmb3IgcGF0dGVybnMgbG9uZ2VyIHRoYW4gMiByZXN1bHRzIGluIHNlcXVlbmNlLiBOZWl0aGVyIG15IG5vciBAZHJvYnMgbWV0aG9kcyBjYW4gZG8gdGhhdC4KCjxicj48YnI+PGJyPjxicj48YnI+PGJyPgoK