Thursday, July 26, 2012

Not the Cloud, the Hive

The cloud is evil.
It is a model of the internet that is fast approaching, and it is completely counter to the idea of the internet. In a cloud-based approach, data is stored by third parties. Companies like Google and Facebook, with vast server space and bandwidth, store your important data so you can access it anywhere and distribute it to others. But there is a serious problem with that model. Companies are trying to make money, and as the internet ad bubble comes steadily closer to bursting (Facebook IPO, anyone?), companies turn to business models that ought to make you uncomfortable. They hold onto your data not just for you. Data is now a commodity, and companies use it to target advertisements or sell it to marketers and other interested parties. Furthermore, ISPs and other companies are often willing and able to give out user data to snooping governments.
The internet runs on a protocol stack called TCP/IP. Simply put, the network is organized in a roughly tree-like structure, where data goes higher up the tree until it reaches a common node with its destination, and then trickles back down again. Consider for a moment just how many computers this page went through before it reached you. If you want to know exactly, open up a terminal window and type the following (on macOS or Linux, the command is traceroute rather than tracert)
tracert blog.rabidaudio.com 
The data squiggles through Google's web of servers, through a couple of ISPs and then back down through your ISP before arriving at your machine. Furthermore, your computer had to talk to at least 6 (and probably closer to 15) different servers just to find the server that had the page in the first place (and that is if you are in the US!). Consider further that the tracert command only lists the routers that answer at the IP level; it doesn't count any of the switches, transparent proxies, etc. (which are all essentially servers themselves) that had to pass each request along the line. Each step gives corporations and governments more opportunity to collect and potentially misuse your data.
To realize the solution, we have to look at a little internet sociology. At first, the internet was a very small group of ultra-nerds sharing files and ideas on newsgroups and BBSes. As the internet's popularity rose, we saw a number of different communication systems become popular: email, forums, chat rooms. Recently, the boom has been in social media platforms. Notice that these are all about communication between real people: data transferred between users.
If Alice and Bob are in the same room and Alice wants to send a file to Bob, it goes from Alice's machine to the wireless router, which then has to find Bob before giving him the file. A generally faster method is for Alice and Bob to connect ad-hoc. Alice sends Bob the file directly, with no stops in between. Add in another user, and all three can share a file freely. The one caveat is that each connection requires an individual wireless card. However, if each user has at least two connections, a stable mesh network can be built. Alice can pass a file through Bob to Chelsea. More important, however, is that if both Bob and Alice have a piece of data, it is easy for Chelsea to get a copy. This allows information to propagate by popularity, as it does in the real world.
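To make the "Alice passes a file through Bob to Chelsea" idea concrete, here is a minimal sketch of how a node might find a route across a mesh. It assumes every node already knows the full map of who is connected to whom, which a real mesh protocol would have to discover on its own; the function name find_route and the mesh dictionary are made up for illustration.

```python
from collections import deque


def find_route(links, source, target):
    """Return the path with the fewest hops from source to target, or None.

    links maps each node name to the set of nodes it is directly connected to.
    """
    previous = {source: None}  # remembers how each node was first reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk backwards from the target to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(links.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no chain of connections reaches the target


# Hypothetical three-person mesh: Alice and Chelsea can each only reach Bob.
mesh = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Chelsea"},
    "Chelsea": {"Bob"},
}

print(find_route(mesh, "Alice", "Chelsea"))  # ['Alice', 'Bob', 'Chelsea']
```

This is a plain breadth-first search, so it finds the route with the fewest hops but knows nothing about signal strength or congestion.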
Imagine a school with a mesh network running alongside traditional TCP/IP. Alice takes notes for a class, and can easily distribute them to her classmates. Bob read an article from The Huffington Post (which he got via TCP/IP), and now Chelsea doesn't need to connect to The Huffington Post (or any of the traditional internet at all) to read the article; she can get it from Bob.
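The Chelsea-gets-it-from-Bob scenario boils down to "ask nearby peers first, fall back to the regular internet only if nobody has it." Here is a rough sketch of that lookup. Everything in it is hypothetical: the fetch function, the peer_caches dictionary, the example.com address, and the fake_internet stand-in for a normal TCP/IP download.

```python
def fetch(url, peer_caches, fetch_from_internet):
    """Return (content, source), preferring copies held by nearby peers."""
    for peer, cache in peer_caches.items():
        if url in cache:
            return cache[url], peer
    # Nobody on the mesh has it, so go out over the traditional internet.
    return fetch_from_internet(url), "internet"


def fake_internet(url):
    # Stand-in for a real download; returns placeholder text.
    return "content of " + url


# Bob already read the article, so his device holds a copy.
peer_caches = {"Bob": {"example.com/article": "article text"}}

content, source = fetch("example.com/article", peer_caches, fake_internet)
print(source)  # Bob
```

A real system would also need a way to tell whether Bob's copy is still current, which this sketch ignores.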
Principles of sharing (which allow P2P to work) can lead to very fast data transfer that does not need to leave a local area. Every user can dedicate some space on their device to storing data from other users on the network. When they access data from another user, they have just increased the number of available copies of that data for other users. For files that might be less popular, partial copies can be distributed across users' devices. Speed and network stability could be increased by adding nodes capable of connecting to several devices, avoiding network bottlenecks. They could be static like traditional wireless routers, or even be mobile (imagine a solar-powered UAV flying over campus, automatically maintaining network stability by creating new connections). Such systems are best suited for smaller, localized networks, but as wireless technology improves in range and speed, it is conceivable that such networks could replace the internet at large. If every car, train, cellphone, etc. were a node, mesh networks could stretch cross-country. No need to pay for internet service.
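As a rough illustration of distributing partial copies, the sketch below splits a file into chunks and hands each chunk to more than one device, so the file can still be rebuilt when someone leaves the network. The round-robin placement, the chunk size, and all of the function names are my own assumptions for the example, not part of any existing protocol.

```python
def split_into_chunks(data, chunk_size):
    """Cut data (bytes) into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def distribute(chunks, devices, copies=2):
    """Give each chunk to `copies` different devices, round-robin.

    Returns {device: {chunk_index: chunk}}.
    """
    if copies > len(devices):
        raise ValueError("need at least as many devices as copies")
    storage = {device: {} for device in devices}
    for index, chunk in enumerate(chunks):
        for k in range(copies):
            device = devices[(index + k) % len(devices)]
            storage[device][index] = chunk
    return storage


def reassemble(storage, chunk_count, available):
    """Rebuild the file from the devices currently reachable, or return None."""
    pieces = []
    for index in range(chunk_count):
        holder = next((d for d in available if index in storage[d]), None)
        if holder is None:
            return None  # some chunk has no reachable copy
        pieces.append(storage[holder][index])
    return b"".join(pieces)


notes = b"Alice's class notes, shared over the mesh."
chunks = split_into_chunks(notes, 8)
storage = distribute(chunks, ["Alice", "Bob", "Chelsea"], copies=2)

# Bob has left the room, but every chunk still lives on Alice or Chelsea.
print(reassemble(storage, len(chunks), ["Alice", "Chelsea"]) == notes)  # True
```

With two copies per chunk spread over three devices, any single device can drop out. Lose two and the file is incomplete, which is the trade-off between storage used and reliability.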
There are a few major problems that need to be overcome. First, the more hops necessary, the greater the latency, which would need to be reduced if large networks are to exist. Second, security is an issue (as it always is). Storing other users' data on your device without knowing what it contains is potentially very dangerous. Several things can be done about this. For example, distributed data (such as incomplete files) makes viruses in the form of binary files nearly impossible. A way to quarantine network storage from the host device would also improve security. Serious mathematics (across game theory, information theory, graph theory, and more) will be necessary to develop adaptive networks that can create new nodes and connections quickly, avoid congestion, and optimize searching and locating other users. Protocol stacks that include all of this need to be written. Some are in progress, although they still have issues. Hardware that is easy and cheap to deploy while still being capable of multiple high-speed connections will need to be developed.
The benefits of such a network are well worth the work, in my opinion. It's decentralized, meaning reduced chance of surveillance by corporations and governments, as well as removing reliance on ISPs, which removes issues of network infrastructure, monopolies/duopolies, network neutrality, and more. The right implementations have the potential to be significantly faster than traditional TCP/IP (for the same reason that P2P is often the fastest way to distribute popular data). Finally, it is firmly rooted in the ideas of sharing and community, which is what the internet is all about.
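To get a feel for the hop problem, here is a back-of-the-envelope sketch. It assumes nodes laid out in a square grid where each one only talks to its immediate neighbours, and a flat 5 ms delay per hop. Both the layout and the 5 ms figure are invented for illustration, not measurements.

```python
def worst_case_hops(side):
    """Hops between opposite corners of a side x side grid of nodes."""
    return 2 * (side - 1)


def estimated_latency_ms(hops, per_hop_ms=5):
    """Total delay if every hop costs the same assumed per_hop_ms."""
    return hops * per_hop_ms


for side in (3, 10, 100):
    hops = worst_case_hops(side)
    print(side * side, "nodes:", hops, "hops,", estimated_latency_ms(hops), "ms")
```

Under these assumptions a 100-node grid is 18 hops corner to corner (90 ms), while a 10,000-node grid is 198 hops (990 ms), which is why longer-range links or better-connected nodes matter as the network grows.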
If anyone, particularly in the Georgia Tech area, would like to help me develop a deployable mesh wireless network, let me know.