Thursday, May 29, 2014

Source code for the data cost simulator

Over Golden Week, on several long flights with AC power, I got bored and wrote a script for determining which data plan is most likely to cost the least over time. Now I've released the source code as a MATLAB function (which also works in GNU Octave, with some caveats). It uses the data consumption history you supply to estimate the probability of future usage, then calculates the corresponding total cost for different plans and overage charges.

You can find it here on GitHub.

Purpose

It is most useful when there are multiple people or devices sharing a single data plan. For example, consider a family of four or five people who on average use a combined 17 GB of data each month. From NTT Docomo, 15 GB of data costs ¥12,500 per month, 20 GB costs ¥16,000, and each additional 1 GB costs an extra ¥1,000. If the total usage were exactly 17 GB each month, then the 15 GB plan would cost less:

¥12,500 + ¥2,000 = ¥14,500/month for 17 GB versus ¥16,000 for the 20 GB plan. That ¥1,500 monthly difference would save ¥36,000 over a 2-year contract.
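
The arithmetic above can be checked with a short sketch. This is Python rather than MATLAB, it is not taken from the function's source, and the name `monthly_cost` is made up for illustration. It assumes extra data is bought in whole 1 GB blocks, so any partial gigabyte over the quota is rounded up.

```python
import math

def monthly_cost(usage_gb, quota_gb, base_yen, overage_yen_per_gb=1000):
    """One month's bill: base price plus whole extra gigabytes purchased."""
    # Assumption: overage is sold in 1 GB blocks, so round the excess up.
    extra_gb = max(0, math.ceil(usage_gb - quota_gb))
    return base_yen + extra_gb * overage_yen_per_gb

cost_15 = monthly_cost(17, 15, 12500)   # 15 GB plan plus two extra 1 GB blocks
cost_20 = monthly_cost(17, 20, 16000)   # 20 GB plan, no overage
savings_24_months = (cost_20 - cost_15) * 24

print(cost_15, cost_20, savings_24_months)   # 14500 16000 36000
```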

However, if total usage is highly variable from month to month, the decision on which is best becomes more complicated. The purpose of this function is to give you a better idea of how variable usage can affect the total cost.

Meaning of the results

In many cases, as I demonstrated at the bottom of this post, there may be no significant difference between the options. In the linked case, which is based on my actual usage, I'm only looking at a difference of about ¥80 each month between the most expensive and least expensive options.

Even though there was no clear "winner", this is still useful information. I don't have to worry that I've made some drastic mistake that's going to cost a whole lot of extra money, and I am free to decide based on other, less tangible considerations. I could decide that I don't want to worry about having to frequently add additional GBs to my account and just go with the larger quota. Or, I could decide (and this is what I actually did) that the option including a completely contract-free, prepaid b-mobile FAIR SIM was best, even though there is a high chance that it would cost slightly more over two years. (And by slightly more I mean slightly.)

What it does

This is a MATLAB function that uses Monte Carlo methods to determine the most probable outcome and the uncertainty around that outcome. By default, it assumes all usage data are normally distributed (a "bell curve"), but it also allows you to specify different distributions that skew the probability towards either high or low usage. You can also specify that usage is "bimodal", meaning that there's a high probability of using very little data, and a high probability of using a lot of data, but very little chance of something in the middle. (This case would occur when someone often spends the better part of a billing cycle outside of Japan.)
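
To make the difference between a normal and a "bimodal" usage pattern concrete, here is a minimal sketch in Python (not the MATLAB function's own code). The helper names, the 0.5 GB and 6 GB modes, and the 50/50 split are all hypothetical; the bimodal case is modeled here as a mixture of two bell curves, which is only one possible way to do it.

```python
import random

def draw_normal(rng, mean_gb, sd_gb):
    """One month's usage from a bell curve, clipped at zero."""
    return max(0.0, rng.gauss(mean_gb, sd_gb))

def draw_bimodal(rng, low_mean, high_mean, sd_gb, p_low=0.5):
    """Mixture of two bell curves: months mostly abroad vs. months at home."""
    mean = low_mean if rng.random() < p_low else high_mean
    return max(0.0, rng.gauss(mean, sd_gb))

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical traveller: about 0.5 GB in away months, about 6 GB at home.
samples = [draw_bimodal(rng, 0.5, 6.0, 0.4) for _ in range(10000)]
middle = sum(1 for s in samples if 2.5 < s < 4.0) / len(samples)
print(f"share of months between 2.5 and 4 GB: {middle:.4f}")
```

Almost no simulated months land between the two modes, which is exactly the "very little chance of something in the middle" behavior.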

The function then fits a probability density function to your data using the specified distribution and randomly draws monthly total data consumption values for the period of time specified (default is 24 months, the typical mobile contract length). This is repeated one thousand times to converge on the most probable outcome, with an idea of the degree to which extreme cases are likely to occur. It next calculates the cost based on the supplied quota, overage, and additional SIM card pricing. There is a built-in assumption (which can be changed) that if the quota would be exceeded with only four days remaining in the billing cycle, no additional data are purchased and everyone suffers through at 128 kbps.
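
The overall loop can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the MATLAB function itself: it fits only a normal distribution to the combined monthly totals, uses just the two Docomo prices quoted earlier in this post, rounds overage up to whole gigabytes, and leaves out additional SIM pricing and the "four days remaining" rule. The usage history and all names are hypothetical.

```python
import math
import random
import statistics

PLANS = {15: 12500, 20: 16000}   # quota in GB -> yen per month
OVERAGE_YEN_PER_GB = 1000

def contract_cost(monthly_usage, quota_gb, base_yen):
    """Total cost over the contract for one simulated usage history."""
    total = 0
    for used in monthly_usage:
        extra = max(0, math.ceil(used - quota_gb))   # whole extra GB bought
        total += base_yen + extra * OVERAGE_YEN_PER_GB
    return total

def simulate(past_usage_gb, months=24, runs=1000, seed=0):
    """Fit a normal distribution to past totals, then draw `runs` contracts."""
    rng = random.Random(seed)
    mean = statistics.mean(past_usage_gb)
    sd = statistics.stdev(past_usage_gb)
    totals = {quota: [] for quota in PLANS}
    for _ in range(runs):
        usage = [max(0.0, rng.gauss(mean, sd)) for _ in range(months)]
        # Price the same simulated history under every plan.
        for quota, base in PLANS.items():
            totals[quota].append(contract_cost(usage, quota, base))
    return totals

# Hypothetical combined monthly totals (GB) for the whole family.
past = [12.1, 19.5, 16.8, 22.3, 14.0, 18.7, 25.2, 10.9, 17.6, 16.9]
totals = simulate(past)
for quota, costs in totals.items():
    print(f"{quota} GB plan: mean {statistics.mean(costs):,.0f} yen, "
          f"sd {statistics.stdev(costs):,.0f} yen over 24 months")
```

Pricing every plan against the same simulated history keeps the comparison paired, so differences between plans come from the pricing rather than from random draws.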

The defaults are for NTT Docomo and depend on the number of devices/people being considered: 2 devices assumes a phone and data-only tablet; 3 or more assumes all phones. Required, optional and default inputs, as well as GNU Octave compatibility, are described on the wiki page.

Example

This is "example 1" included with the function. It considers a family of four people sharing a single data plan. The first person (mother) uses a moderate amount of data but tends toward lower amounts with the occasional high-usage month. The next person (son who downloads who knows what over mobile) uses massive amounts of data, often in the double digits, and the usage is fairly regular each month (normal). The third person (daughter) is the opposite of the first, typically using a lot of data but on occasion using very little. The final person (father) has months where he hardly uses any data (overseas business), and months where he uses a lot, with little in between.

Past data usage for each member of the family of four.

Combining everyone's usage yields a mean of 17.4 GB per month, with a reasonable chance of usage as low as 11.3 GB or as high as 23.4 GB (the 1-standard-deviation range). At 17.4 GB, the 15 GB Docomo plan would cost ¥15,500 (¥12,500 plus three additional 1 GB purchases), and the 20 GB plan would cost ¥500 more. (The median usage is actually just under 17 GB; at that level the 15 GB plan would cost ¥14,500, a ¥1,500 monthly savings.)

You can probably guess that the 10 GB plan won't do, and you would think about either the 15 GB or the 20 GB plan. The 15 GB plan at first seems like it could save a lot of money over time based on the average usage, but how many months will that extreme value of 23 GB occur? Let's run the simulation to get an idea.

Cumulative cost with 1-standard deviation range shown for each data plan (10, 15, 20, and 30 GB).

And we see that the 15 GB plan may not be the best after all. The interplay between all the different usage patterns causes high consumption often enough over the next two years to make the 20 GB plan clearly less expensive (though not by a huge amount). In this particular run, the 20 GB plan was least expensive in 99.6% of the simulations. Based on this, and keeping in mind that this is only as good as the assumption that past usage will reflect future usage, I would be virtually certain that this family would spend the least amount of money on the larger 20 GB plan.
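
A figure like "least expensive in 99.6% of the simulations" is just a count of which plan won each run. The sketch below shows one way to compute such a share in Python; it is not the function's own code, `share_cheapest` is a hypothetical name, and the four toy totals are invented purely to show the mechanics (ties go to the plan listed first).

```python
from collections import Counter

def share_cheapest(totals_by_plan):
    """Map plan name -> fraction of runs in which that plan cost the least.

    totals_by_plan maps each plan name to a list of contract totals,
    one entry per simulation run, in the same run order for every plan.
    """
    plans = list(totals_by_plan)
    runs = len(totals_by_plan[plans[0]])
    wins = Counter()
    for i in range(runs):
        best = min(plans, key=lambda p: totals_by_plan[p][i])
        wins[best] += 1
    return {p: wins[p] / runs for p in plans}

# Invented totals (yen over 24 months) for four runs, for illustration only.
toy = {
    "15 GB": [390000, 402000, 398000, 405000],
    "20 GB": [392000, 396000, 395000, 399000],
}
print(share_cheapest(toy))   # {'15 GB': 0.25, '20 GB': 0.75}
```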
