Sunday, May 12, 2013

last.fm API: genre influences

I've been talking about writing something using the last.fm API, and I finally have. Nothing too exciting here, just testing the water. This little script (using pylast) gets the top tags for your top artists and scores them, giving you a percentage for each tag. It is pretty shitty code; this was a one-off thing, and the way it (I) can't make up its (my) mind whether to use lists, dictionaries, or tuples is pretty embarrassing. Be sure to get an API key from last.fm first.
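
To make the scoring concrete before the real thing, here is a minimal offline sketch of the same idea. The function name `score_tags`, the shape of its input, and the sample artists are all made up for illustration; none of it comes from pylast, and no API key is needed to run it.

```python
import math

def score_tags(artists, bad_tags=(), log_plays=False):
    """artists: list of (playcount, [(tag, weight), ...]) tuples (hypothetical shape).
    Returns {tag: share of the total score}, with shares summing to 1."""
    totals = {}
    for playcount, tags in artists:
        if log_plays:
            playcount = math.log(playcount)
        # drop excluded tags, lowercase the rest
        kept = [(t.lower(), float(w)) for t, w in tags if t.lower() not in bad_tags]
        weight_sum = sum(w for _, w in kept)
        if weight_sum == 0:
            continue  # nothing usable for this artist
        # normalize this artist's tag weights to sum to 1, then scale by plays
        for tag, w in kept:
            totals[tag] = totals.get(tag, 0.0) + (w / weight_sum) * playcount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tag: score / grand_total for tag, score in totals.items()}

# Made-up data: two artists with their playcounts and tag weights
sample = [
    (300, [("rock", 100), ("80s", 50), ("indie", 50)]),
    (100, [("indie", 100), ("folk", 100)]),
]
shares = score_tags(sample, bad_tags=["80s"])
for tag, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print('%-22s ==> %5.1f%%' % (tag, share * 100))
```

With this sample, rock comes out at 50%, indie at 37.5%, and folk at 12.5%: the first artist's 300 plays are split 2:1 between rock and indie once "80s" is excluded, and the second artist's 100 plays are split evenly between indie and folk.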



#!/usr/bin/python
import pylast, math, operator, sys

 ###############################################
# This is a simple script to play with pylast   #
# and the last.fm API. It goes through your     #
# library and gets the top tags for your top    #
# artists, and weights them based on playcounts,#
# giving you a percentage for different tags.   #
# Be sure to install pylast!                    #
#      http://code.google.com/p/pylast/         #
#################################################
#   Charles Knight - charles@rabidaudio.com     #
 ###############################################

usage = "python genre_influences.py username artist_limit tag_limit [log_plays]"

if len(sys.argv) < 4:
 print(usage)
 sys.exit()

username = sys.argv[1]
artist_limit = int(sys.argv[2])  # How many artists' tags to use (e.g. top 50)
tag_limit = int(sys.argv[3])  # How many tags for each artist
if len(sys.argv) > 4:
 log_plays = int(sys.argv[4])
else:
 # 1 takes the natural logarithm of playcounts. This smooths out your results
 # by giving relatively more weight to artists with fewer listens.
 # 0 (the default) weighs by raw playcount.
 log_plays = 0

# You have to have your own unique two values for API_KEY and API_SECRET
# Obtain yours from http://www.last.fm/api/account for Last.fm
API_KEY = "YOURAPIKEY"
API_SECRET = "YOURAPISECRET"

#This is an exclude list for tags. Here are some of the shitty ones I got. Reducing tag_limit might help.
bad_tags = ["female vocalists", "singer-songwriter", "rock opera", "80s", "90s", "political", "epic", "canadian", "megaman", "female vocalist", "eminem", "60s", "female fronted metal", "bass", "christian", "british"] 

all_tags = {}


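# Connect to last.fm. This script only reads data, so the API key and secret are all it needs.
# (LastFMNetwork is the class that the older get_lastfm_network() helper returned.)
network = pylast.LastFMNetwork(api_key = API_KEY, api_secret = API_SECRET)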

mylibrary = pylast.Library(user = username, network = network)


artists = mylibrary.get_artists(limit=artist_limit)

for a in artists:
 artist = a.item
 playcount = int(a.playcount)
 if log_plays:
  # note: an artist with a single play scores log(1) = 0
  playcount = math.log(playcount)
 top_tags = artist.get_top_tags(limit=tag_limit)
 weights = []
 tags = []
 for t in top_tags:
  tt = t.item.get_name().lower()
  if tt not in bad_tags:
   tags.append(tt)
   weights.append(float(t.weight))
 sw = sum(weights)
 if sw == 0:
  continue  # no usable tags for this artist; avoids dividing by zero
 # normalize this artist's tag weights to sum to 1, then scale by playcount
 for tag, weight in zip(tags, weights):
  all_tags[tag] = all_tags.get(tag, 0) + (weight / sw) * playcount

#This came from StackOverflow. .items() gives (tag, score) pairs, and operator.itemgetter(1) tells sorted() to order them by the score. Requires 'operator'
#http://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value#613218
scores=sorted(all_tags.items(), key=operator.itemgetter(1))

scores.reverse()
ss = sum(s[1] for s in scores)
for t,s in scores:
 print('%-22s ==> %5.1f%%' % (t, 100.0*s/ss))
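
For anyone else puzzled by the sort-a-dictionary-by-value trick, here it is on its own with made-up scores (the numbers are just an illustration, not real output from my library). As far as I can tell, passing `reverse=True` to `sorted()` would also save the separate `reverse()` call.

```python
import operator

all_tags = {"rock": 200.0, "indie": 150.0, "folk": 50.0}  # made-up scores

# .items() yields (tag, score) pairs; itemgetter(1) pulls the score out of
# each pair, so sorted() orders by score. reverse=True puts the biggest first.
scores = sorted(all_tags.items(), key=operator.itemgetter(1), reverse=True)
print(scores)  # [('rock', 200.0), ('indie', 150.0), ('folk', 50.0)]
```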