TIL (Theotherday I Learned) That Word “Group” Doesn’t Mean What You Think It Means.
Here’s my schema so you can follow what I was trying to do.
My database gathers data I collect using the Spotify Web API on the popularity of artists, albums, and tracks over time. For example, I have a weekly cron task that grabs the popularity of a bunch of artists.
This query’s ultimate purpose is so I can show the latest popularity scores for all albums by a given artist. As pictured below, it shows the artistName, albumName, year (the album was released), and latest popularity score. It needs the albumID for two of the tables it selects from and the artistID to select from two tables. I only want the latest popularity score so it selects from the popAlbums table multiple times.
After several days of trying to get this complex query to work (I kept getting a “mysqli_fetch_array() expects parameter … to be mysqli_result, boolean given in” error), I broke it up into a couple views and incorporated them into this query:
It worked fabulously. I was delighted. But I needed to have a PHP variable in a couple spots to generate it dynamically so I couldn’t use views. It still took me a while longer to get the sub-queries to function properly. This is what I came up with. It uses the actual artistID instead of a variable for testing purposes.
It worked fine in MySQL Workbench (which I’ve fallen in love with along with Sequel Pro for Mac OS) but I’ve seen stuff be fine in my editor and fail with the Terminal, browser, server, etc. so I uploaded that and when it actually worked in the browser from my server—instantly with no error—my first reaction was shock and skepticism.
Maybe it was pessimism or maybe it was intuition but my first thought was, “That can’t be right.”
The browser showed only one result per album but … still … my gut didn’t accept it.
I ran a query for all (Alice Cooper) results in the popArtists table to see if these were, indeed, the latest results and not … whatever … so I looked for album popularity scores that were different the last time I pulled them from Spotify.
What I saw was that the score was, indeed, incorrect in Figure 3 and Figure 4. But the date was correct (see Figure 3).
“This isn’t going to end the way you think” Luke said but I didn’t listen.
My results were showing the latest date but not the latest popularity score.
Now I was really confused. But fascinated. I had that gleeful feeling I’ve heard scientists get when they say, “THAT’s not supposed to happen!”
Some googling told me Group doesn’t work exactly how you think and, thankfully, one of the solutions was presented and explained in a way that worked for me.
GROUP BY groups the specified column but, apparently, it doesn’t necessarily keep all the other columns with it … you know, like in Excel … or, at least, it doesn’t … I don’t know. It just doesn’t work like most people think it should. I am far from the first to encounter this phenomenon.
I’m still not sure I could explain how the solution works to someone else but it involved yet another sub-sub-sub-query that first groups the results then grabs the row with the … but you’d think that’s what … nevermind … just trust me the solution works.
Here’s what finally worked:
And the result
Note several of the scores changed once the query is written* and works correctly.
*It worked just fine … MySQL isn’t broken …
Lastly, I love both MySQL Workbench and Sequel Pro because they allow me to work remotely via ssh getting instant test results (not having to write a file, upload it, test it in the browser, rinse, repeat like I was doing) while not worrying about people attacking, say, an installation of PHPMyAdmin (no matter how securely I’ve locked it down). These apps made this and so much other work go SO much faster so it was so much more enjoyable.