Thursday, August 29, 2013

wki.pe - Official Announcement







I finally have reached a point on Wki.pe that I feel comfortable announcing it to the world.


For those of you who don't know, for over a year (I think it has been closer to two), I've been tossing around the idea of creating a URL shortener for Wikipedia. The idea is that there are several social media platforms (Twitter, Facebook, Google+, SMS, etc.) where you cannot create hyperlinks with custom text. Instead, the platform recognizes URLs and formats them as links for you. Often (e.g. Twitter), you are limited by the number of characters you can have, so URL shorteners are popular. However, often these services give you a nonsense URL (e.g.bit.ly/1865N9v). If you want to link to Wikipedia, and you want it to be clear that that is where it is going, you can use Wki.pe. That way, you can use the URL inside your sentence, for example:
TIL that wki.pe/Aldous_Huxley 's brother a.wki.pe/julian was a pretty cool guy.
The link to the Wikipedia article on Aldous Huxley is simply the name of the article on Wikipedia. I didn't have to generate this on the site (although I still could), I just knew the name of the article. Also notice the second link has an "a." at the beginning. This is a link I generated on the site. I told Wki.pe I wanted a URL that said "julian" to point to the article on Julian Huxley. For the time being, I'm calling this aliasing (but I'm looking for a more clear name).

The service allows for multiple articles for the same alias (that's where the "a." part comes from). If you try to go to an article that doesn't exist, it will take you to the search results page. It also handles languages relatively well. By default, it will detect your language and send you to the right version of Wikipedia. You can also force a particular language by generating it on the site. For now only a handful of languages are supported, but soon I will add all language versions of Wikipedia (and there are hundreds).

The base functionality has been working for a while (this was my first foray into databases, well before I knew SQL). Over the last few weeks, I've really been working on the UI and learning some fancy Javascript tricks in the process. The site uses jQuery (and jQuery-UI), and uses aAjax calls to PHP pages. You can see the source code on GitHub if you are curious. There is a bug with IE<=9 I still need to fix and two more features I plan to add (and I'm sure there are some Unicode bugs to still be worked out), but I think it is ready to be used.

Please, use it! Give me the satisfaction that I made something useful. :)

Tuesday, August 13, 2013

Discrete potentiometer

My car music player is finally coming along to the point where it looks like I might actually finish it. I will do a post about the software soon, and probably a more general post on the hardware later, but right now I want to talk about a knob I will be using on the device.

Your car stereo probably has a volume knob that spins all the way around. You can keep twisting it in either direction as much as you want, and the software will determine when you've topped- or bottomed-out. You probably didn't think too much about why this was the case, but there are some nice features that can be implemented thanks to it.

For one, you can have multiple volume controls. Often there are volume buttons on your steering wheel. A knob with set start and end positions means it is tied to a particular state, but changing the software state somewhere else doesn't change the physical state of the knob. Some car stereos (I know mine does, and I bet most do now) will set the volume to 0 when the car starts, and ramp it up gradually. That way, if a friend borrowed your car and cranked the volume up really high, you won't get a nasty surprise when you get in.

While most knobs are potentiometers with a continuous resistance from 0 to X, knobs like this have discrete states (they 'click' in small increments). To make one of these, I bought a cheap rotary switch. This is a SP12T (single pole, 12-throw) switch, so one input can be connected to any one of 12 outputs. Some rotary switches will let you twist all the way around forever in either direction, but this one did not, so I had to pry the case open and carefully cut out a little piece of plastic that was blocking the gap between positions 1 and 12. This particular switch had a small ball bearing and a spring, both of which I almost lost and I struggled for a few minutes to put everything back together (fair warning).

Now that the switch can spin freely, we need it to be able to turn a setting up or down. To know if the knob is spinning clockwise or counterclockwise (or not at all), we can simply compare the current state of the switch to the previous one. There are a few ways to read the switch state. Connecting every output to its own GPIO pin on your microprocessor would take up 12 pins which is a very inefficient use of your pins. One option would be to connect each pin to the first 12 inputs of a 16-to-4 encoder, which would encode your signal into a 4 bit number, taking up 4 GPIO pins. However, there is still a better way to do this.


By soldering resistors of equal magnitude between each of the positions, we've created what is essentially a discrete potentiometer. In this case, I used 100 ohm resistors, so this is a 11*100 = 1.1K ohm potentiometer (the first state is 0 for 0 ohms, so the last state is 11 for 1100 ohms), but one which can only take values 0, 100, 200, 300... Now we can connect it to an analog pin, and save our digital pins for other things.

There is still one issue here, however. The analogRead() function is going to give integer values from 0 to 1023. That means our values are going to be 0, 85, 171, 256, 341, 427, 512, 597, 683, 768, 853, 939. But these aren't exact. These resistors have a margin of error around 2% (for most resistors, it is around 5%, but I bought more accurate ones for this situation). The supply voltage can fluctuate as well. That means we need to allow for ranges, so maybe 0-42 means position 0, 85-128 means position 1, and so on.

A easy way to do this would be:
   round(12*analogRead(0)/1024);
However, this is a very inefficient command for a microprocessor. First it multiplies two integers, then it divides two integers, creating a float (microprocessors like the Ardunio are very slow at doing floating point math, particularly division). Then it converts that back to an integer. This one line translates to 4 logical steps, which translate into a whole lot of steps for the processor.

The way I implemented this was not nearly as slick, but it runs about 10 times faster. If we look at our predicted analogRead() values in binary:
0000000000
0001010101
0010101011
0100000000
0101010101
0110101011
1000000000
1001010101
1010101011
1100000000
1101010101
1110101011
Notice the first four bits (highlighted in red) are unique for each value. That means that we can get a good idea which state we are in just looking at the 4 most significant bits. If you think about it in decimal: let's say you were going to get numbers between 0 and 1000, and you wanted to quickly tell if that number fell in the first 250 number bin (0 to 249), the second (250 to 499), and so on. You don't really care what the number in the one's place is in this situation. Just by looking at the hundred's and ten's places, you can tell which bin it's in (26X is in the 2nd bin, and 8XX is in the 4th). So back to the binary above, if we trash all the lower-significance bits (which might vary slightly), we can still get the right answer.

There is one little bit of weirdness here. Four bits means 16 possible values, 0000 to 1111. But we only have 12 positions we care about. Look at the binary version of the predicted value of position 3 (0100000000 or 256). If analogRead() returned something a little high, like 275, we would be okay, since the first four bits are still 0100. But what if it returned something a little low, like 245 (binary 0011110101)? This has a smaller value for its four most significant bits. However, notice also that we skipped from 0010 for position 2 to 0100 for position 3 (right over 0011). This is because 1024 is not divisible by 12. In hindsight, getting a 8- or 16- position rotary switch would have been a better plan. If we give both 0011 and 0100 to position 3, we've solved this problem. We have to do this as well for positions 7 and 9 (thanks to the fact that gcd(12,1024)=4). The shortest margin of error that would return a wrong value is then 21 (for example, if the value for position 2 was high by 21-- 192, it would be read as position 3). That is a hair over a 2% margin of error. 

Below is the code I used for this. It is not nearly as concise or elegant as the single line from before, but it is faster by an order of magnitude, as bitwise shifts are computationally cheap. If you have a better way to do this, I would love to hear your suggestion.

   int c = analogRead(0)>>6;
   switch(c){
     case 0:
       return 0;
       break;
     case 1:
       return 1;
       break;
     case 2:
     case 3:
       return 2;
       break;
     case 4:
       return 3;
       break;
     case 5:
     case 6:
       return 4;
       break;
     case 7:
       return 5;
       break;
     case 8:
     case 9:
       return 6;
       break;
     case 10:
       return 7;
       break;
     case 11:
     case 12:
       return 8;
       break;
     case 13:
       return 9;
       break;
     case 14:
       return 10;
       break;
     case 15:
       return 11;
       break;
   }

Sunday, May 12, 2013

last.fm API: genre influences

I've been talking about writing something using the last.fm API, and I finally have. Nothing to exciting here, just testing the water. This little script (using pylast) gets the top tags from your top artists, and scores them, giving you a percentage for each tag. It is pretty shitty code; this was a one-off thing, and the way it (I) can't make up it's (my) mind whether to use lists, dictionaries, or tuples is pretty embarrassing. Be sure to get an API key from last.fm first.



#!/usr/bin/python
import pylast, math, operator, sys

 ###############################################
# This is a simple script to play with pylast   #
# and the last.fm API. It goes through your     #
# library and gets the top tags for your top    #
# artists, and weights them based on playcounts,#
# giving you a percentage for different tags.   #
# Be sure to install pylast!                    #
#      http://code.google.com/p/pylast/         #
#################################################
#   Charles Knight - charles@rabidaudio.com     #
 ###############################################

usage = "python genre_influences.py username artist_limit tag_limit [log_plays]"

if len(sys.argv) < 4:
 print usage
 sys.exit()

username = sys.argv[1]
artist_limit=int(sys.argv[2]) #How many artist's tags to use (e.g. top 50)
tag_limit=int(sys.argv[3])  #How many tags for each artist
if len(sys.argv) > 4:
 log_plays=int(sys.argv[4])
else:
 log_plays = 0  #1 to take the natural logarithm for playcounts. This smooths out your results by
    # giving more weight to artists with fewer listens (0 weighs by normal playcount)

# You have to have your own unique two values for API_KEY and API_SECRET
# Obtain yours from http://www.last.fm/api/account for Last.fm
API_KEY = "YOURAPIKEY"
API_SECRET = "YOURAPISECRET"

#This is an exclude list for tags. Here are some of the shitty ones I got. reducing tag_limit might help
bad_tags = ["female vocalists", "singer-songwriter", "rock opera", "80s", "90s", "political", "epic", "canadian", "megaman", "female vocalist", "eminem", "60s", "female fronted metal", "bass", "christian", "british"] 

all_tags = {}


# In order to perform a write operation you need to authenticate yourself
network = pylast.get_lastfm_network(api_key = API_KEY, api_secret = API_SECRET)

mylibrary = pylast.Library(user = username, network = network)


artists = mylibrary.get_artists(limit=artist_limit)

for a in artists:
 artist = a.item
 playcount = a.playcount
 if log_plays:
  playcount = math.log(playcount)
 name = artist.get_name()
 top_tags = artist.get_top_tags(limit=tag_limit)
 weights = []
 tags = []
 for t in top_tags:
  tt = t.item.get_name().lower()
  if tt not in bad_tags:
   tags.append(tt)
   weight = float(t.weight)
   weights.append(weight)
 sw = sum(weights)
 for i in range(len(weights)):
  weights[i]=weights[i] / sw
  tag = tags[i]
  if tag in all_tags:
   all_tags[tag] += weights[i]*playcount
  else:
   all_tags[tag] = weights[i]*playcount

#This black magic came from StackOverflow. Sorts a dictionary by value into tuples, no idea how. Requres 'operator'
#http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value#613218
scores=sorted(all_tags.iteritems(), key=operator.itemgetter(1))

scores.reverse()
ss = sum(s[1] for s in scores)
for t,s in scores:
 print '%-22s ==> %5s' % (t, str(round(s/ss,3)*100)+"%")

Tuesday, April 30, 2013

DSP + Arduino

Amanda Ghassaei, who has this amazing project I just found out about, also has a great Instructable for doing DSP with Arduino. A few hours and one blown op-amp later, I was able to get my Arduino to pickup my bass playing. It samples around 38.5 kHz, but I only had it send serial data back every 10 ms (a measly 100 Hz). Still, I was able to generate this graph with the data. You can see the clipping where I cranked it up to test the peak indicator LED.

The atmega328 has 32Kb flash memory. If I used half of that for a delay line, 16,000 samples / 38500 samples/sec = .42 second delay. I could get a longer delay by using external flash (e.g. SD card), but I worry that the read/write time would be low. It will be interesting to apply some of the concepts from class to actual signals. What does a 50-point averager do to my tone? (answer: probably reverb)

There's not a lot I can do with this now, since I can't hear the results of any processing yet. I just ordered a DAC0808 8-bit digital-analog converter. (The new DUE has two built-in DAC outputs, as well as PWM, although it still isn't an ideal platform for DSP). Eventually, I would like to get an FFT algorithm running and make a tuner. That will probably have to wait until after finals, though.

EDIT: Just found out about wki.pe/Resistor_ladder s.

Monday, April 1, 2013

Location-enabled mesh networking protocol

Think of a peer-to-peer network. Not in an application-level sense, like torrents or Bitcoin, but on a physical/data-link level. Rather than connecting to a router, your device connects directly to a handful of other nearby devices, creating a web of communication. Mesh network topologies are not a new concept. Problems such as high latency and potentially different hardware have limited usage to a few novel solutions. But these protocols lack a powerful tool that is quickly coming ubiquitous as we approach the Internet of Things, and that is location data.

Devices that require network connectivity are increasingly mobile. It is expected that networked mobile devices will include compass/gyroscope/accelerometer/GPS hardware, which in combination can give the location and heading information (position, velocity, and acceleration) of those devices. In an ad-hoc mesh network, the number of connections each node can make is limited, and creating new connections is expensive. Using this information, devices could make smart decisions about what connections to make. If a node is moving quickly away from you, the value of making a connection to it is low, as it will be out of range in a short amount of time.

A mesh network should be a preferred topology for IoT. It is cheap to implement, free of authority, and mirrors social interaction. Take, for example, a system of smart cars. To avoid collisions, the smart cars must be able to communicate with those nearby. Using our current system, each device would require a network connection such as LTE. The communications would pass up to a nearby router/server and back down to the nearby vehicles. This method has three flaws. First, it is inefficient. Direct connections to nearby vehicles would increase the rate of data transfer, and even connections that require a low number of hops would about equal the transfer rate of a router system. It is also expensive. Having high-speed, reliable dedicated routers capable of covering the entire road system of a country is quite a demand, and installing and supporting this expensive hardware would be even more burdensome. The slow and uneven deployment of broadband in America shows the extreme cost and difficulty of this. Finally, they are insecure, in that they offer a single point of failure. If an attacker were to disable the router, communication between the vehicles would be lost, with potentially catastrophic results. A distributed network lacks authority  making it resilient against attacks. The network is able to quickly adapt to the loss of a single node. 

A distributed network like this is not without its security problems. For example, a person sending false positioning data could reduce or possibly even disable the network by skewing the calculations of optimal connections. This is analogous to (but distinct from) the problem of, in the car example, someone standing on the sidewalk, claiming to be a car about to fly across the street, potentially causing a collision between the real cars as they react to this false threat. This is part of the security trade-off of authority-less networks, and is ultimately a problem to be solved in the field of network security. This does not diminish the value of a location-enabled mesh networking protocol.

Edit:
Turns out much of this already exists. wki.pe/Intelligent_Vehicular_AdHoc_Network specifically mentions wki.pe/Position-based_routing . Other cool stuff I found:
http://wirelessafrica.meraka.org.za/wiki/index.php/DIY_Mesh_Guide
wki.pe/Netsukuku
wki.pe/Babel_(protocol)

Friday, February 15, 2013

Circular buffer algorithm rates on Arduino

For my chorded keyboard, I need a mode switch function. These modes increment and decrement in a loop. For example, if there are four modes (0, 1, 2, and 3), and I want to go to the next mode, 3 should loop back around to 0. I'm sure there is a name for these systems, but I'm not sure what it is. Apparently, they are called wki.pe/Circular_buffers.

This is an easy problem, and there are many ways to solve it (some more elegant than others). As I thought about all the ways to do this, I wondered which was the fastest. So I worked up 5 algorithms of varying elegance and methods and ran them against each other.

F1 increments, and then moves 4 to 0.
byte f1(byte a){
  a++;
  if(a>=4){
    a=0;
  }
  return a;
}

F2 uses modulo:
byte f2(byte a){
  a++;
  return a % 4;
}

F3 is exactly the same as F2. I was curious wither it would be faster or slower to do it in one line. It is certainly more elegant.
byte f3(byte a){
  return ++a % 4;
}

F4 is a case-switch:
byte f4(byte a){
  switch(a){
    case 0:
      return 1;
      break;
    case 1:
      return 2;
      break;
    case 2:
      return 3;
      break;
    case 3:
      return 0;
      break;
  }
}

F5 is if-esleif-else:
byte f5(byte a){
  if(a==0){
    return 1;
  }else if(a==1){
    return 2;
  }else if(a==2){
    return 3;
  }else{
    return 0;
  }
}

I suspected F4 and F5 to be very slow, as they are a lot of lines. I figured modulo is probably a longer function, so my hunch was F1 would win in terms of speed, modulo's elegance aside. The speed of the algorithms would depend on the number going in: F1(0) should be faster than F1(3). Each algorithm is run for each of {0,1,2,3} 100 times, and the microseconds to do that are returned. I repeated this for all 5 algorithms 10 times. Here is the test code:

byte a;
byte b;
byte c;
unsigned long starts;
unsigned long ends;
unsigned long delta;

void setup() {
  Serial.begin(9600);
  delay(2000);
}

void loop() {
  //f1
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f1(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);

  //f2
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f2(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f3
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f3(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f4
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f4(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
  
  //f5
  for(a=0; a<=3; a++){
    starts = micros();
    for(b=0; b<=100; b++){
      c=f5(b);
    }
    ends = micros();
    delta = ends - starts;
    Serial.print(delta);
    Serial.print("\t");
  }
  Serial.println(");
  delay(2000);
}

The full data is available here. The average time in microseconds for each algorithm:

F1 F2 F3 F4 F5
0.662 0.539 0.539 1.101 0.979

It is worth noting that the modulus function is faster than any of the functions with conditionals, including the first. It is also interesting that there is really no difference between incrementing on a separate line vs. on the same line (this is probably because they compile to the same thing). Here is the chart of the results. Each group is for the inputs {0, 1, 2, 3}. There was no real change across inputs between each algorithm, which I thought was surprising. 

This is a case where the most elegant solution is also the fastest.