Friday, August 17, 2018

The Internet of 200 Kilogram Things: Challenges of Managing a Fleet of Slot Machines

In a previous post we talked about Finland's Linux powered slot machines. It was mentioned that there are about 20 000 of these machines in total. It turns out that managing and maintaining all those machines is a not as easy as it may first appear.

In the modern time of The Cloud, 20 thousand machines might not seem like much. Basic cloud management software such as Kubernetes scales to hundreds of thousands, even millions of machines without even breaking a sweat. Having "only" 20 thousand machines may seem like a small and simple thing that can be managed by one intern in their spare time. In reality things get difficult as there are many unique challenges to managing slot machines as opposed to regular servers.

The data center

Large scale computer fleets are housed in data centers. Slot machines are not. They are scattered across Finland in supermarkets and gas stations. This means that any management solution based on central control is useless. Another way of looking at this is that the data center housing the machines is around 337 thousand square kilometers in size. It is left as an exercise to the reader to calculate the average distance between two nearest machines assuming they are distributed evenly over the surface area.

Every machine is needed

The mantra of current data center design is that every machine must be expendable. That is, any computer may break down at any time, but the end user does not notice this because all operations are hidden behind a reliable layer. Workloads can be transferred from one machine to another either in the same rack, or possibly even to the other side of the world without anyone noticing.

Slot machines have the exact opposite requirements. Every machine must keep working all the time. If any machine breaks down, money is lost. Transferring the work load from a broken machine in the countryside to Frankfurt or Washington is not feasible, because it would require also moving the players to the new location. This is not very profitable, as atoms are much more expensive and slow to transfer between continents than electrons.

The reliability requirements are further increased by the distributed locations of the machines. It is not uncommon that in the sparsely populated areas the closest maintenance person may be more than 400 km away.

The Internet connection

Data centers nowadays have 10 Gb Ethernet connections or something even faster. In contrast it is the responsibility of the machine operator to provide a net connection to a slot machine. This means that the connections vary quite a lot. At the lowest end are locations that get poor quality 3G reception some of the time.

Remote management is also an issue. Some machines are housed in corporate networks behind ten different firewalls all administered by different IT provider organisations, some of which may be outsourced. Others are slightly less well protected but flakier. Being able to directly access any machine is the norm in data centers. Devices housed in random networks do not have this luxury.

The money problem

Slot machines deal with physical money. That makes them a prime target for criminals. The devices also have no physical security: you must be able to physically touch them to be able to play them. This is a challenging and unusual combination from a security point of view. Most companies would not leave their production servers outside for people to fiddle around with, but for these devices it is a mandatory requirement.

The beer attack

Many machines are located in bars. That means that they need to withstand the forces of angry intoxicated players. And, as we all know, drunk people are surprisingly inventive. A few years ago some people noticed that the machines have ventilation holes. They then noticed that pouring a pint of beer in those holes would cause a short circuit inside the machine causing all the coins to be spit out.

This issue was fixed fairly quickly, because you really don't want to be in a situation where drunk people would have financial motivation to pour liquids on high voltage equipment in crowded rooms. This is not a problem one has to face in most data centers.

Update challenges

There are roughly two different ways of updating an operating system install: image based updates and package based updates. Neither of these works particularly well in slot machine usage. Games are big, so downloading full images is not feasible, especially for machines that have poor network connections. Package based updates have the major downside that they are not atomic. In desktop and server usage this is not really an issue because you can apply updates at a known good time. For remote devices this does not work because they can be powered off at any time without any warning. If this happens during an upgrade you have a broken machine requiring a physical visit from a maintenance person. As mentioned above this is slow and expensive.

No comments:

Post a Comment