Mechanical Nation: new album query
One of the features I want to have on the Mechanical Nation site is a list of newly released albums.
I recently found Muspy, which will email you (or provide an RSS feed of) album releases for the artists you have asked it to track. This is a nice service, and to start I will use it (I just have to enter a crapload of artists in the industrial genre for it to track).
However, what I would eventually like to do is write something that searches by genre rather than by artist. That way new artists (ones I may not know about) will be listed, which allows for discovery. Searching by genre is a little more complex, though.
I will start with looking at the last.fm plus plugin for Musicbrainz Picard.
That plugin searches the Last.fm database for the queried artist and retrieves tag/genre info, which is then written into the MP3’s tags. I will first need to figure out how to query the Last.fm database and extract all artists that match a given genre. That artist list can then be plugged into the Muspy.com site to return album release dates.
UPDATE: 29 Nov 2010
Last.fm categorizes music based on tags (aka genres). I assume this is to make it easier to link and play music by similar artists. The tag database is quite vast (since Last.fm users can add their own tags to music), so care will need to be taken to only use artists whose main tags match the searched tag. I will need to develop some sort of cutoff (to define what ‘main tag’ means) so that, for example, ‘Lady Gaga’ doesn’t get used as a band just because someone tagged it as EBM and EBM shows up as a minor tag.
Last.fm has an API for accessing its database. The first thing is to write something that queries that database for certain predefined tags.
I will start with a short list of industrial genres/tags to feed into the Last.fm API: Industrial, EBM, synthpop, Aggrotech, Ambient industrial, Dark ambient, Dark electro, Electro-Industrial, Power Noise (add more).
At some point (in a future release) I could list a selection of genres, and people could choose which ones to track.
Then take the list of returned artists and query the MusicBrainz database (I think it is more current than Discogs) for each artist’s albums. Display albums that have a release date from now to some point (one week?) in the future.
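One way to define ‘main tag’ is a relative cutoff: a tag only counts if its weight is close to the artist’s strongest tag. The Python sketch below assumes each artist’s tags come with a numeric weight (Last.fm’s artist.getTopTags reports a count per tag). The 50% threshold and the function names are hypothetical, not anything Last.fm defines.

```python
# Hypothetical cutoff: a tag is a "main tag" only if its weight is at least
# this fraction of the weight of the artist's strongest tag.
MAIN_TAG_CUTOFF = 0.5


def main_tags(tag_weights, cutoff=MAIN_TAG_CUTOFF):
    """Return the (lower-cased) tags whose weight is within `cutoff` of the top tag."""
    if not tag_weights:
        return set()
    top = max(tag_weights.values())
    return {tag.lower() for tag, weight in tag_weights.items() if weight >= cutoff * top}


def has_main_tag(tag_weights, wanted):
    """True if `wanted` is one of the artist's main tags (case-insensitive)."""
    return wanted.lower() in main_tags(tag_weights)


# Made-up weights: a stray "ebm" tag with a tiny weight is ignored.
pop_star = {"pop": 100, "dance": 60, "ebm": 2}
print(has_main_tag(pop_star, "EBM"))
```

A relative cutoff avoids picking an absolute number that would behave differently for heavily tagged and lightly tagged artists.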
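The sketch below builds one request per predefined tag. It assumes the Last.fm REST endpoint `http://ws.audioscrobbler.com/2.0/` with `method`, `tag` and `api_key` query parameters. `YOUR_API_KEY` is a placeholder, and no network call is made, so the URLs can be checked before wiring them into a fetcher.

```python
from urllib.parse import urlencode

API_ROOT = "http://ws.audioscrobbler.com/2.0/"
API_KEY = "YOUR_API_KEY"  # placeholder, not a real key

# The starting tag list from above.
TAGS = ["Industrial", "EBM", "synthpop", "Aggrotech", "Ambient industrial",
        "Dark ambient", "Dark electro", "Electro-Industrial", "Power Noise"]


def top_artists_url(tag, api_key=API_KEY):
    """Build a tag.getTopArtists request URL for a single tag (no network call)."""
    params = {"method": "tag.gettopartists", "tag": tag, "api_key": api_key}
    return API_ROOT + "?" + urlencode(params)


urls = [top_artists_url(t) for t in TAGS]
for url in urls:
    print(url)
```

`urlencode` takes care of escaping multi-word tags such as “Ambient industrial”.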
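Whatever database ends up supplying the dates, the display step is a simple date-window filter. In this sketch each release is a hypothetical `(artist, album, release_date)` tuple, and the one-week window is just the default suggested above.

```python
from datetime import date, timedelta


def releases_in_window(releases, today, days=7):
    """Keep releases whose date falls between today and today + `days` (inclusive)."""
    end = today + timedelta(days=days)
    return [r for r in releases if today <= r[2] <= end]


# Made-up data for illustration.
sample = [
    ("Band A", "Album X", date(2010, 11, 30)),
    ("Band B", "Album Y", date(2010, 12, 10)),
    ("Band C", "Album Z", date(2010, 11, 1)),
]
print(releases_in_window(sample, date(2010, 11, 29)))
```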
UPDATE: 01 Dec 2010
Got the Last.fm API to work in PHP with the help of the ‘phpLastFmAPI‘ framework.
There is a problem, though. Last.fm’s API method for getting artists from tag info is tag.getTopArtists, and it only returns 50 items. That’s kinda limiting, since I first need to gather all artists with the predefined tags and then query those artists (on MusicBrainz or Discogs) for album release dates.
Hmmm….
Later the same evening…
The method tag.getWeeklyArtistChart will return as many items as I want (using the parameter ‘limit’). This may be a way to get the artist info I want. With that method I can also narrow the search to a date range, which may be useful: maybe make the range 6 months ago until today. The idea is that if an artist released an album in the last 6 months, it will show up somewhere in the charts. The date range may actually be beneficial, so I don’t search EVERY artist that has ever released an album in a certain genre, just the artists that have been on the charts in the last xx months.
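A sketch of the six-month query. It assumes tag.getWeeklyArtistChart takes `tag`, `from`, `to` (Unix timestamps) and `limit` parameters. Last.fm’s documentation points to tag.getWeeklyChartList for valid from/to values, so an arbitrary six-month range may be rejected, and the method may have been retired since. The 182-day approximation and the default limit of 500 are arbitrary choices.

```python
import time
from urllib.parse import urlencode

API_ROOT = "http://ws.audioscrobbler.com/2.0/"
SIX_MONTHS = 182 * 24 * 60 * 60  # roughly six months, in seconds


def weekly_chart_url(tag, api_key, limit=500, now=None):
    """Build a tag.getWeeklyArtistChart URL covering roughly the last six months."""
    to_ts = int(now if now is not None else time.time())
    from_ts = to_ts - SIX_MONTHS
    params = {
        "method": "tag.getweeklyartistchart",
        "tag": tag,
        "from": from_ts,
        "to": to_ts,
        "limit": limit,  # the parameter that lifts the 50-item ceiling
        "api_key": api_key,
    }
    return API_ROOT + "?" + urlencode(params)


print(weekly_chart_url("EBM", "YOUR_API_KEY"))
```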
I don’t know if this is a good idea, but will test it.
Did some tests: I manually took some of the returned artists and searched for them in the MusicBrainz database, and most have had an album released in 2010. So far, it’s promising.
Next I have to figure out how to use the MusicBrainz API to find out whether an artist has an album released in the current week (or in some other time frame).
UPDATE: 10 Dec 2010
After being disappointed that MusicBrainz did not have the latest Alien Vampires album in its database (even after its release date), I gave Discogs (the album database Edwin recommended from the beginning) a try, and sure enough, it had that album.
So the next thing I did was to get an API key from Discogs. I found a sample piece of php code that queries the Discogs database.
Searching by genre or style is weird. Read this.
This raises the question (in my mind) of whether I should be scraping multiple databases to make sure no release dates are missed.
With that in mind, this is a list of other music related APIs out there.
UPDATE: 17 Dec 2010
Discogs releases monthly archives of their album and artist databases, in XML format. The December 2010 Album releases XML file is 4.2 gigs uncompressed. That’s substantial.
Find it HERE.
All I want to extract are the names of artists whose releases carry a certain genre and style tag. That list will become a MySQL database of artists.
To parse such a large file, I need to work on it piece by piece. Hopefully this link will help with a method to read the file incrementally, as opposed to loading everything into memory to process.
On parsing large files, this link is also good. Another link on using SAX.
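The piece-by-piece idea can be sketched with Python’s `xml.etree.ElementTree.iterparse`, which is event-driven like SAX but hands back each finished element. Clearing the root after each `<release>` keeps memory use flat. The element layout (`artists/artist/name`, `genres/genre`, `styles/style`) is an assumption about the Discogs dump format and should be checked against a real file.

```python
import io
import xml.etree.ElementTree as ET


def iter_releases(source):
    """Yield (artists, genres, styles) for each <release>, one at a time.

    Assumed Discogs layout:
    <releases><release><artists><artist><name>, <genres><genre>, <styles><style>
    """
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first "start" event is the <releases> root
    for event, elem in context:
        if event != "end" or elem.tag != "release":
            continue
        artists = [n.text for n in elem.findall("artists/artist/name") if n.text]
        genres = [g.text for g in elem.findall("genres/genre") if g.text]
        styles = [s.text for s in elem.findall("styles/style") if s.text]
        yield artists, genres, styles
        root.clear()  # drop finished releases so memory stays flat


# Tiny made-up sample; a real run would pass the dump's file path instead.
SAMPLE = b"""<releases>
<release id="1"><artists><artist><name>Band A</name></artist></artists>
<genres><genre>Electronic</genre></genres>
<styles><style>EBM</style><style>Industrial</style></styles></release>
<release id="2"><artists><artist><name>Band B</name></artist></artists>
<genres><genre>Pop</genre></genres><styles><style>Europop</style></styles></release>
</releases>"""

for artists, genres, styles in iter_releases(io.BytesIO(SAMPLE)):
    print(artists, genres, styles)
```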
UPDATE: 28 Jan 2011
Progress is being made. Have written a php script that will parse the discogs XML file. Currently it only browses and displays all the styles under the genre “Electronic”.
UPDATE: 29 Jan 2011
Modified the script some more, and now outputting artist names based on 5 or 6 style types. See styles HERE.
UPDATE: 11 March 2011
So I’ve sorted through the Discogs database.
I iterated through all 2.6 million album releases, and if a release contained at least one of 5 styles (industrial, EBM, synthpop, electro and rhythmic noise) I extracted the artist name.
So now I have just shy of 50,000 artists who released an album that had at least one of those styles tagged to it. The problem is that if (for the sake of argument) ABBA had an album with a style tag of Electro, it would get put in my database, even though we know that band isn’t what I’m looking for.
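The “at least one of 5 styles” rule is a set intersection. This sketch consumes `(artists, genres, styles)` tuples like the ones a streaming parser would yield. The style spellings (“Synth-pop”, “Rhythmic Noise”) are assumed to follow Discogs’ names, and matching is case-insensitive.

```python
# Assumed Discogs spellings, lower-cased for case-insensitive matching.
WANTED_STYLES = {"industrial", "ebm", "synth-pop", "electro", "rhythmic noise"}


def matching_artists(releases, wanted=WANTED_STYLES):
    """Collect artist names from releases that carry at least one wanted style.

    `releases` is an iterable of (artists, genres, styles) tuples.
    """
    found = set()
    for artists, _genres, styles in releases:
        if wanted & {s.lower() for s in styles}:
            found.update(artists)
    return found


sample = [
    (["Band A"], ["Electronic"], ["EBM", "Industrial"]),
    (["Band B"], ["Pop"], ["Europop"]),
    (["Band C"], ["Electronic"], ["House", "Electro"]),
]
print(sorted(matching_artists(sample)))
```

“Band C” getting through on a single Electro tag is exactly the ABBA problem described next.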
Some releases have only one style tag, others have 2. I’ve seen up to 4 style tags per release.
So 50,000 artists is a shit load of artists to track, a lot of which are not what I’m looking for, and I can’t think of any slick method (short of manual entry) to ensure that only bands we consider industrial are included. I don’t know what criteria to use to sort this out.
This is where I’m stuck right now.
Some algorithms:
Algorithm 01
Depending on how many tags are present for each release, make sure that at least 1/2 the style tags are in my list of tags.
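Algorithm 01 as a predicate on one release’s style list. The wanted set is the same assumed list of five styles. “At least half” is checked with integer arithmetic (2 × hits ≥ total) to avoid rounding issues.

```python
WANTED_STYLES = {"industrial", "ebm", "synth-pop", "electro", "rhythmic noise"}


def passes_algorithm_01(styles, wanted=WANTED_STYLES):
    """True if at least half of a release's style tags are in the wanted list."""
    if not styles:
        return False
    hits = sum(1 for s in styles if s.lower() in wanted)
    return 2 * hits >= len(styles)


print(passes_algorithm_01(["EBM", "Industrial", "Electro"]))  # 3 of 3 wanted
print(passes_algorithm_01(["Trance", "Dubstep", "Electro"]))  # 1 of 3 wanted
```

Note that a two-tag release such as “House, Electro” still passes (1 of 2 is exactly half), so this rule alone may let some unwanted releases through.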
Algorithm 02
Look at the other releases by the same band and make sure that most (if not all) of them have at least one style that is in my list of tags. I think this algorithm would involve copying the entire database into a local SQL database to be able to do this kind of data manipulation.
How this would work: Take one album release > Extract the artist > Find all other albums by that artist > look only at their last few albums > if most of the style tags are in my list, then artist is good.
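The steps above can be sketched for one artist, assuming their releases have already been grouped as hypothetical `(year, styles)` pairs (for example, pulled from the local SQL copy). “Last few” is taken as the 5 most recent releases, and “most” as more than half of them having at least one wanted style. Both numbers are placeholders to tune.

```python
WANTED_STYLES = {"industrial", "ebm", "synth-pop", "electro", "rhythmic noise"}


def passes_algorithm_02(artist_releases, wanted=WANTED_STYLES, last_n=5, majority=0.5):
    """artist_releases: list of (year, styles) pairs for a single artist.

    Look at the `last_n` most recent releases; the artist is good if more than
    `majority` of them carry at least one wanted style.
    """
    recent = sorted(artist_releases, key=lambda r: r[0], reverse=True)[:last_n]
    if not recent:
        return False
    good = sum(1 for _year, styles in recent if wanted & {s.lower() for s in styles})
    return good / len(recent) > majority


pop_band = [(1974, ["Europop"]), (1976, ["Disco"]), (1979, ["Electro"])]
industrial_band = [(2008, ["Industrial", "EBM"]), (2009, ["Industrial"]), (2010, ["Ambient"])]
print(passes_algorithm_02(pop_band), passes_algorithm_02(industrial_band))
```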
Algorithm 03
Look at album releases since an arbitrary year (say 2005?) that also satisfy Algorithm 01.
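Algorithm 03 combines a year filter with the Algorithm 01 check. The 2005 cutoff is the arbitrary year suggested above. A missing year (some releases may not have one) is treated as failing. The style list is the same assumed set.

```python
WANTED_STYLES = {"industrial", "ebm", "synth-pop", "electro", "rhythmic noise"}


def at_least_half_wanted(styles, wanted=WANTED_STYLES):
    """Algorithm 01: at least half of the style tags are in the wanted list."""
    if not styles:
        return False
    hits = sum(1 for s in styles if s.lower() in wanted)
    return 2 * hits >= len(styles)


def passes_algorithm_03(year, styles, since=2005):
    """Release is recent enough AND passes Algorithm 01."""
    return year is not None and year >= since and at_least_half_wanted(styles)


print(passes_algorithm_03(2009, ["EBM", "Industrial"]))
print(passes_algorithm_03(1999, ["EBM"]))
```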
UPDATE: 12 March 2011
Looked only at releases from 2008 on: matching at least one style tag resulted in 7500 artists. A good way down from 50,000, but still too many.
Correction. That 7500 is for releases in 2008 only.
UPDATE: 18 March 2011
Decided that what I want has nothing to do with gathering a database of artists. What I want is to know which new albums are released in the genre/sub-genre. Discogs can provide that info (see this link), though I’ll need to write a scraper program to fetch the data; it seems very doable.
If I could find some sort of RSS feed from Discogs that would be even better.
Read the list of releases, enter them in a DB locally, and each day compare against the local DB entries to determine which are new items and display those.
For each release I will want: artist name, release name, sub-genres, date of release and record label.
UPDATE: 25 March 2011
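The daily compare-against-local-DB step can be sketched with SQLite. Each scraped release is inserted with `INSERT OR IGNORE`, and the releases whose insert actually added a row are the new ones. The `Release` fields mirror the list above. Using (artist, title, label) as the uniqueness key is an assumption, and the scraping itself is left out.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    artist: str
    title: str
    styles: str        # e.g. "EBM, Industrial"
    release_date: str  # ISO date string, e.g. "2011-03-30"
    label: str


def open_db(path=":memory:"):
    """Open (or create) the local releases table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS releases (
               artist TEXT, title TEXT, styles TEXT, release_date TEXT, label TEXT,
               PRIMARY KEY (artist, title, label))"""
    )
    return conn


def store_and_get_new(conn, scraped):
    """Insert today's scraped releases and return only those not seen before."""
    new = []
    for r in scraped:
        cur = conn.execute(
            "INSERT OR IGNORE INTO releases VALUES (?, ?, ?, ?, ?)",
            (r.artist, r.title, r.styles, r.release_date, r.label),
        )
        if cur.rowcount == 1:  # 0 means the key was already in the table
            new.append(r)
    conn.commit()
    return new
```

Running `store_and_get_new` on the same scrape twice returns the releases the first time and an empty list the second time.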
Using this scraper tutorial. Will see how successful it is.
UPDATE: 30 March 2011
Very successful. And easy to use.
When using the genres “EBM, Electro, Synth-pop, Industrial”, too many unwanted results came up. For example, when matching against Electro only, I would get albums with “EBM, Industrial, Electro” as styles (styles for one album), but I would also get albums with styles of “House, Electro” or “Trance, Dubstep, Electro”, which are not what I’m looking for. For the styles Electro and Synth-pop (which has the same problem as Electro), I decided to filter so that one of the other styles (EBM, Industrial) also had to be listed in the album’s styles for it to be considered.
Then I realized that if that’s the case, I don’t need to search for Electro and Synth-pop at all: any album I want that is tagged Electro or Synth-pop will also be tagged Industrial or EBM, so a search on those two styles already finds it.
So the list of searched genres will only be EBM and Industrial. Considering putting Rhythmic Noise in that list…
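The final rule reduces to a single check: keep a release only if it carries a core style. Electro and Synth-pop releases then survive only when they are also tagged EBM or Industrial. This is a sketch with case-insensitive matching. Rhythmic Noise is left out of the core set and shown only as an optional extra.

```python
CORE_STYLES = {"ebm", "industrial"}  # maybe add "rhythmic noise" later


def is_wanted(styles, core=CORE_STYLES):
    """Keep a release only if at least one of its styles is a core style."""
    return bool(core & {s.lower() for s in styles})


print(is_wanted(["EBM", "Industrial", "Electro"]))   # core style present
print(is_wanted(["House", "Electro"]))               # Electro alone is not enough
print(is_wanted(["Rhythmic Noise"], CORE_STYLES | {"rhythmic noise"}))
```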