Data Science Isn’t Always Sexy and Glamorous

Sometimes data science is a bunch of debugging and fact-checking.

Bit o’ trivia about me: I got into this because I wanted to start using open data and APIs instead of constantly fact-checking frequently inaccurate data from my co-workers. Now, I find I have to fact-check my data. Not sexy. Not sexy at all.

I’ve wanted to write this post for a long time but, until now, I felt it would seem like mere whining … or conspiracy ranting (you’ll see).

Gathering complete, comprehensive election results seems an impossible task. It’s almost (get ready) as if somebody … They … don’t want us to have them (imagine my voice in any shrill tone you like).

It should be easier. Much easier.

I promise I won’t complain about every little thing because I’ve complained about some of these things before — such as

  • The FEC provides Excel files for some elections but only PDFs for others
  • The FEC API doesn’t provide access to election results at all
  • The aforementioned results aren’t available on their website for months
  • Politico has results in real time and even updates them for a couple days after the election but the page is then like an operating system update or rendering video … it stays at 99% complete forever. Even now, as of December 10, 2018, Alaska shows only 99.5% precincts reporting.

Last night, I opened VS Code to continue adding features to Election Insights (the web app formerly known as prezPlayPro) when I noticed a few things were somehow broken since my triumphant post of November 23 — not only did some results change but some maps were downright broken.

Long Story Short: I now test everything using a private or Incognito window so I can be at least a wee bit more sure I’m looking at the latest code. Results that I expected to change after a fix made 2-3 weeks ago finally showed up. So, nothing was broken or causing incorrect results but I only know the “new” results are accurate because I did some digging in several piles of data to confirm … digging which should have been easier.

I had two potential problems I needed to investigate. Two questions needed answering:

  • Did Evan McMullin really beat Darrell Castle in a buttload of states?
  • Why do I have two candidates (Darrell Castle and Emidio Soltysik) affiliated with the U.S. Taxpayers party in Michigan?

In my previous, Final 2016 Presidential Election Maps, post, I realized I hadn’t included Evan McMullin in my arrays of “right-leaning” candidates. Much to my surprise, adding him to those arrays didn’t change any results (or so I thought). Last night, when I saw that including him may have drastically changed the results, I realized one of two things was true — either my code was broken or my database contained mistakes.
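Here’s a rough Python sketch of that kind of sanity check, not the app’s actual code (which isn’t shown in this post). The bloc membership, the helper name `bloc_total`, and the vote counts are all invented for illustration.

```python
# Hypothetical set of candidates treated as "right-leaning" (illustrative only).
RIGHT_LEANING = {"Donald Trump", "Darrell Castle", "Evan McMullin"}


def bloc_total(state_results, bloc):
    """Sum the votes of every candidate in `bloc` for one state."""
    return sum(votes for name, votes in state_results.items() if name in bloc)


# Made-up numbers, not real results.
sample_state = {
    "Donald Trump": 1000,
    "Darrell Castle": 40,
    "Evan McMullin": 120,
    "Jill Stein": 30,
}

with_mcmullin = bloc_total(sample_state, RIGHT_LEANING)
without_mcmullin = bloc_total(sample_state, RIGHT_LEANING - {"Evan McMullin"})

# If these two totals match for every state, the candidate was never in the data.
print(with_mcmullin, without_mcmullin)
```

Running the total both ways for each state is a cheap way to tell "the code ignores him" apart from "the database doesn’t have him."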

I chose to look into Texas because when I moved my hand, that’s where my cursor landed, showing me McMullin. My results are taken from the PDF from the FEC but, FORTUNATELY, I didn’t go directly to that PDF to confirm results. I also wanted to check party affiliations which I got from Ballotpedia (whom I’ve whined about previously for other issues even before the inaccuracy I just found). Otherwise, I wouldn’t have found some of the groovy things I did.

So, first, I went to the Ballotpedia page for Michigan’s 2016 presidential election results. Much to my relief, the mistake was theirs.

ballotpediaMichigan2016.png
I took all my party affiliations from their Results tables which, at least in this case, differ from the list above.

When I first started this project, I tried using Python’s Beautiful Soup to grab info like that in the above screenshot from Politico because they conveniently listed every state on a single page. Unfortunately, the code is filled with inconsistencies and invisible crap neither I nor Beautiful Soup could beat into submission. Also, if memory serves, candidates’ names were spelled differently on different state ballots. <-- That’s infuriating fact #4,987 on the list.

So I just did some major cutting and pasting to fifty pages I saved from Ballotpedia which sucked in its own way because you can’t right-click on their US map to open them in separate tabs; you have to click each one and, after saving the state page, click the Back button to get back to the map.

Before I noticed the Ballotpedia candidate list contained different parties than the results table, I followed the link to their data source (Michigan’s Secretary of State or, as Ballotpedia calls it, “Department of State”) but when I clicked it, got a 404. Several of the source links at Ballotpedia have the same result but I don’t know whether I should be frustrated with Ballotpedia for having broken links or, as I’d thought previously, frustrated with those states for not keeping their results pages up. My FEC results PDF lists parties for each candidate (but not, much to my chagrin, by state). There I found Soltysik listed as Natural Law Party (which is still kinda conservative, if my recollection is correct) and Socialist Party USA (like the Beach Boys song).

Not yet noticing the mistake in the screenshot above, I set my party affiliation problem aside for the moment and went to Ballotpedia’s Texas page so I could confirm my results (from the FEC) for Castle/McMullin.

ballotpediaTexas2016.png

Ballotpedia doesn’t even list McMullin as a candidate in Texas but does list 51,261 write-in votes. Ever the optimist, I clicked the link for Texas Secretary of State.

TexasSoS2016.png
And, as it turns out, McMullin wallops Castle in Texas.

Black gold! Texas tea! Comprehensive election results, that is!

Note that most of those 51k+ write-in votes are for a single candidate. I think that’s rather significant. If I were the type to post election results, I might consider including that bit of information. Of course, Ballotpedia is probably in the pocket of the Commission On Presidential Debates (who fit nicely in the pocket of Big Insurance who are run by the Illuminati).

Now I was curious whether Politico limited their results like Ballotpedia. I had to go there anyway to see what party affiliation they had for Soltysik, so … after finding Soltysik was accurately listed as NLP in Michigan, I saw Politico’s Texas results were wanting as much as Ballotpedia’s.

TexasPolitico2016.png
These are Politico’s “detailed” results.

Now I was grateful I couldn’t get Beautiful Soup working to my satisfaction with Politico. I’d have missed out on a bunch of candidates!

I still have much digging to do because far too many of my candidates have “null” for party affiliation — not to mention I now know I must fact-check whatever I find. Getting data directly from each state would be best, of course, but since Ballotpedia’s links don’t go anywhere, that won’t be as easy as I’d like.
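That fact-checking could be partly automated. A minimal Python sketch, assuming each source has already been boiled down to a dictionary keyed by (state, candidate): the function name `audit_parties`, the source names, and “Candidate X” are hypothetical, and the Soltysik labels are the ones described in this post.

```python
def audit_parties(sources):
    """Compare party labels for each (state, candidate) across sources.

    `sources` maps a source name to {(state, candidate): party or None}.
    Returns (missing, conflicts): keys no source has a party for, and
    keys where the sources disagree.
    """
    keys = set()
    for table in sources.values():
        keys.update(table)

    missing, conflicts = [], {}
    for key in sorted(keys):
        labels = {name: table.get(key) for name, table in sources.items()}
        known = {party for party in labels.values() if party}
        if not known:
            missing.append(key)       # "null" everywhere: needs digging
        elif len(known) > 1:
            conflicts[key] = labels   # sources disagree: needs fact-checking
    return missing, conflicts


sources = {
    "ballotpedia_results_table": {
        ("MI", "Emidio Soltysik"): "U.S. Taxpayers",
        ("TX", "Candidate X"): None,  # placeholder row
    },
    "politico": {
        ("MI", "Emidio Soltysik"): "Natural Law",
    },
}

missing, conflicts = audit_parties(sources)
print(missing)
print(conflicts)
```

It won’t say which source is right, only where to start digging.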

Django Unleashed

A few days ago, a recruiter contacted me via LinkedIn about a position requiring some Python and Django experience.

While I do take the advice of countless articles and podcasts, applying to positions that Imposter Syndrome would otherwise preclude, I don’t blindly apply to everything — I want to have at least minimal experience with the languages and such so that I could at least confidently learn my way through the job on the job if given such an opportunity. I’m taking the advice to heart and considering interviews a training ground–not cause for flop sweat or embarrassment. I’m always upfront about my knowledge and experience during phone screenings, cover letters and emails.

On the other hand, if a recruiter contacts me, I have no moral qualms about saying I’d be glad to pursue whatever they’re offering — they approached me, I’m not being an imposter. Having said that, if the recruiter contacts me about Java (which is SO irritating considering it’s not in my resume because I’ve never touched it or looked at it) or has a list of other crap I’ve never worked with or heard of, I won’t respond. If I’ve so much as completed a tutorial, however, I’ll explain to them my level of familiarity and willingness to learn and, if they reply back …

To the point …

She wasn’t put off by my less than a year with Python and asked, in a follow-up LinkedIn message, about my Django experience. I replied, “Can I answer that question tomorrow? My answer will be different then.”

I immediately started Lynda’s Mastering Django Web Development course. After a few minutes, I wrote her again stating that, while as a software instructor I frequently learned things overnight so I could teach them the following day, Django didn’t appear to be one of those things I’d be comfortable with overnight.

Since then, however, I’ve had a delightful time with Lynda’s Up and Running with Python and Django course. That may change how I perceive Mastering, but I may also come back and say that the instructor for Up and Running is simply a better teacher (an impression I’m kinda leaning toward).

One of the reasons I like the UaR teacher (besides his teaching skillz) is his voice — he sounds like Richard on Silicon Valley.

2016 Books of the Year

As with last year, these aren’t necessarily new books released in 2016. These are the books I discovered that most contributed to my learning and advancement in 2016. New books face a disadvantage in their hopes to be so honored — I prefer free books or library books. Having said that, free online books are at a disadvantage because I don’t own a tablet (and, if I did, it would be WiFi only limiting where I could read these online books) and my screen-space is taken up with the apps I use to do my coding and testing (books belong on my literal, not my pixel, desktop). I do have a Kindle eReader, but free eBooks are at a disadvantage because they almost always suck.

MY BIG, FAT, RAGE QUESTION: What ever happened to “Print-Friendly”? It would be so great if articles and books I found could be printed in an easily-read manner that didn’t waste paper. Is there a snippet of code that consciously, purposely prints a final page with just a couple lines on it? That’s not so bad when it’s the page’s footer — I can opt not to print (or throw away) that one — but, when it’s the last half of the last paragraph, that really sucks. Even if I don’t need that last paragraph, the OCD in me rises up and I must still print and keep it.

See Also: 2015 Books of the Year

There aren’t many. The number of sites I’d recommend for learning grew much more this year than my list of books. For the most part, these are in no particular order.

Learning PHP, MySQL, JavaScript, CSS & HTML5 by Robin Nixon

Best thing about this is Nixon doesn’t isolate writing functions, or forms, or databases … you learn like you’d want to — in context, in practical, real-world ways. Nixon is also a very good writer. O’Reilly means you won’t get frustrated by poor editing, poor proofing, or crappy code and/or directions that don’t work. Picked up an old edition at the library just based on the PHP/MySQL content and kept it as long as I could. I still need to go back and get it again. It’s frequently checked out and often has holds. New it’s almost $40. Used it’s under $10.

The Modern Web: Multi-Device Web Development with HTML5, CSS3, and JavaScript by Peter Gasston

If, like me, you want to know what stuff is and does and why — as opposed to what code to write for X to work so you can finish the project — Modern Web is delightful. Terms, technologies, and tools I’d scanned or glossed over and those I’d never heard of are explained in understandable (1 point) and interesting (2 points!) ways.

From my March 30 post: “You should probably already be in love with No Starch Press. I kept passing by The Modern Web by Peter Gasston but it might just be my new favorite book from them. I was hooked instantly once I gave it a chance and although I’m only on page 21, I’ve been educated and enlightened. Not just informative but entertaining and stimulating. Yay for my library having a copy. I would never consider picking up a book on CSS but, based on this experience, I’m going to check out his Book of CSS3.”

My entire post from April 15: “I stopped reading The Modern Web by Peter Gasston for a bit because I thought, ‘Okay, the rest won’t be this good … it’s slowing down’ but picked it back up on a whim and then, wham, it started being so great again. I can’t believe how awesome this book is. I wish I’d found it months ago.”

Learn HTML5 and JavaScript for iOS by Scott Preston

I thought this would be creating mobile apps using HTML5 and JavaScript. I hoped so. Isn’t learning all this JavaScript (including libraries and frameworks), PHP, MySQL, and Python enough? I have to learn Java and Swift, too? Once I realized this book covered “merely” making mobile-friendly (read: screen-size) web apps, I was disappointed. But, like Modern Web (though not to that extent), I was pleasantly surprised. Preston explained things in enlightening ways that actually made me angry at the nameless clowns who wrote the crap I’d read on those topics before. Preston inspired many of those “Why didn’t they just say that?!” moments about which I’ve ranted many times before. And, there are enough things I didn’t know (and things I didn’t know about things I did) that I’m working through the book’s samples and exercises. Preston is a good author … who should punch his proofreader and editor in the face (this specifically applies to the 2012 edition I got from the library).

Plug-In PHP: 100 Power Solutions by Robin Nixon

Currently ripping through this. I thought the author’s name sounded familiar but didn’t make the connection until this morning. If I’d noticed the word “Plug-In” in the title, I probably wouldn’t have checked out this book. The good news, as I see it, is I would actually have called the book 100 Awesome Task-Based PHP Snippets Explained In-Depth and Extremely Well. I love that Nixon is a true hacker-geek who proudly admits to using PHP to download and store all of Wikipedia locally in case he wants to look something up offline. I’m quite fond of authors who take the time to say, “If X doesn’t work, it’s either because you screwed up Y or there’s a Z-factor, in which case you should …”

Learn Python the Hard Way by Zed A. Shaw

Hard, PDF download, or read free online. The introduction is called, “The Hard Way Is Easier” and I agree. Shaw is also very strict about adhering to the hard way. He frequently asks, “Something doesn’t work?” followed by “Did you use something I told you not to or do it in a way I didn’t tell you?” and then “I told you so. Now go back and do it how I told you.” I love it. Obviously, this isn’t one of those “Learn language X using … some thing you’ll probably never really use and, even if you did, this method won’t help you learn language X in and of itself” resources. He’s also got Hard Way books on Ruby, C, SQL, JavaScript, and … ooh … just noticed Regex which I desperately want and need.

Honorable Mention (Holdovers from Last Year)

  • JavaScript and jQuery: Interactive Front-End Web Development by Jon Duckett

Beautiful Socialism

About 13 months ago, I made some maps of the 2012 election results using only the socialist candidates. If nothing else, I was curious how many votes they’d get if they all voted together instead of having seven or more different candidates. Those maps were made using Illustrator, Photoshop, and data from the delightfully detailed, completely comprehensive FEC 2012 Election Results (PDF) I tediously transcribed into Excel. Fun, but yuck.
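The “what if they all voted together” arithmetic is simple once the data is out of the PDF. A minimal Python sketch, assuming rows of (state, party, votes); the states, party names, numbers, and the helper name `combined_share` are placeholders, not FEC data.

```python
from collections import defaultdict


def combined_share(rows, bloc_parties):
    """Return {state: (bloc_votes, bloc_share_of_state_total)}.

    `rows` is an iterable of (state, party, votes) tuples.
    """
    bloc = defaultdict(int)
    total = defaultdict(int)
    for state, party, votes in rows:
        total[state] += votes
        if party in bloc_parties:
            bloc[state] += votes
    # Skip states with no votes at all to avoid dividing by zero.
    return {s: (bloc[s], bloc[s] / total[s]) for s in total if total[s]}


# Placeholder rows for two imaginary states.
rows = [
    ("AA", "Party One", 600),
    ("AA", "Socialist A", 30),
    ("AA", "Socialist B", 10),
    ("BB", "Party One", 900),
    ("BB", "Socialist A", 100),
]

result = combined_share(rows, {"Socialist A", "Socialist B"})
print(result)
```

The per-state share is what a map would actually color by.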

For 2016, I’m using Python and D3 and will probably throw in some MySQL, PHP, and jQuery along with, of course, plain-old JavaScript for some other fun.

I’ve just finished preparing the latest data from the 2016 election. The post and project are called beautifulSocialism because I use BeautifulSoup. Get it? See what I did there?

First, I just saved the 2016 Presidential Election Results page from Politico. After my first few tries (ever) using Beautiful Soup, I reduced 6,202 lines of code (and they were really long lines) to 103 equally dense lines of code that I could almost use. I can’t express how proud I am of how elegant it is, IMHO.

07
Lines 30-33 were a last-minute addition after I noticed that tag contained the hidden treasure of the full party name.

However, there were some whitespace issues I just could not solve and neither Google searches nor StackOverflow provided solutions that worked for me. Also, BeautifulSoup’s encoding shoved some additional unwanted characters into my “final” product.
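For what it’s worth, here’s one common cleanup approach in plain Python, as a sketch only. It assumes the trouble was the usual suspects (non-breaking spaces, zero-width spaces, stray tabs and newlines), which this post doesn’t actually confirm, and `clean_cell` is a made-up helper name. It works on strings already pulled out of the page, for example the result of Beautiful Soup’s get_text().

```python
import unicodedata


def clean_cell(text):
    """Normalize Unicode, then collapse every run of whitespace to one space."""
    # NFKC folds compatibility characters; U+00A0 (non-breaking space)
    # becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    # The zero-width space is not whitespace to str.split(), so drop it by hand.
    text = text.replace("\u200b", "")
    # split() with no arguments splits on any whitespace and discards empties.
    return " ".join(text.split())


messy = "  Gary\u00a0 Johnson\n\t(L)\u200b "
print(repr(clean_cell(messy)))
```

If the stray characters come from the page being decoded with the wrong encoding, this won’t help; that has to be fixed where the file is read.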

table3_before_editing

I spent a lot of time trying many things but solved neither problem. I made it even worse a couple times, though!

Eventually, I surrendered and used Dreamweaver for relatively few rounds of Find & Replace. First, I used Dreamweaver’s awesome Apply Source Formatting command which made the code pretty but the number of lines ballooned to 2595.

Sadly, correcting the candidates’ names took far more rounds than I expected because they were screwed up in so many different ways. I wanted full names and, since there were many candidates even I was unfamiliar with, I went to my go-to source of presidential candidate information for the last ten years but Politics1.com’s lack of state-specific ballot information (in their defense, that’s not the site’s purpose) posed two problems:

  • They give the candidate’s home state but I didn’t know if Smith from whatever state, for example, would be the Smith running in some other state.
  • They give the party the candidate most identifies with but Politico’s results used whatever was on the ballot–often “unaffiliated,” “independent,” or “other.”

So, much to my chagrin, I used Ballot-o-pedia or whatever it’s called. It’s the slowest damn site on the Internet. I hate to admit that it was a huge help and I’m still not providing a link to it. It made me want to throw stuff several times.

After a bunch of manual editing I didn’t expect, I now have this:

3a

Now for some fun DOM manipulation using jQuery to dynamically add some sexy CSS to the table. Yes, it might be quicker to just do it manually (much like my eventual editing in Dreamweaver) but I prefer learning, even if it takes longer and I make mistakes. Besides, someday, hopefully, I’ll be working with much larger data sets and this knowledge would, of course, pay off.

That’s unrelated to the maps, of course, but there are so many cool things I can do to practice with this data! I’ll also make something that will find and list all the different parties candidates use in different states (just for fun but also) to consolidate them and use in the maps.
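A minimal Python sketch of that party-listing idea, assuming the cleaned table can be read as rows of (state, candidate, party). The helper name `parties_by_candidate` and every value in the sample rows are placeholders.

```python
from collections import defaultdict


def parties_by_candidate(rows):
    """Map each candidate to {party label: sorted list of states using it}."""
    seen = defaultdict(lambda: defaultdict(set))
    for state, candidate, party in rows:
        # Keep missing labels visible instead of silently dropping them.
        seen[candidate][party or "(none listed)"].add(state)
    return {
        candidate: {party: sorted(states) for party, states in parties.items()}
        for candidate, parties in seen.items()
    }


# Placeholder rows, not real ballot data.
rows = [
    ("AA", "Candidate X", "Constitution"),
    ("BB", "Candidate X", "U.S. Taxpayers"),
    ("CC", "Candidate X", "Constitution"),
    ("AA", "Candidate Y", None),
]

listing = parties_by_candidate(rows)
for candidate, parties in listing.items():
    print(candidate, parties)
```

Any candidate with more than one label is a candidate for consolidation before the maps get drawn.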

Eventually, much of this work will also apply to my politicsPlay project.

BeautifulSoup is Not BS

Just completed my first BeautifulSoup tutorial and oh my gosh is there ever a reason everyone raves about it. I can’t believe I stopped to write this post before trying it on my own projects!

I love that the tutorial used a table listing members of congress for practice.

I Am the King of Rock and Roll

Behold …

badassery

I, king of the world, the greatest Python programmer in history, have succeeded in the following. I wrote a script that does this wicked coolness using the Spotify API:

  1. Creates and opens an HTML file
  2. Inserts opening HTML, opening and closing HEAD tags, and opening BODY tag
  3. Takes an artist ID
  4. Gets that artist’s albums
    • For each album
      1. Puts the album title in H2 tags
      2. Inserts an opening OL tag
      3. Uses the album ID to get that album’s tracks
        • For each track
          • Puts the track name in LI tags
      4. Inserts a closing OL tag
  5. Inserts closing BODY and HTML tags
  6. Closes the HTML file
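The HTML-writing half of those steps can be sketched as below. This is not the original script (which only appears as a screenshot): the Spotify calls are left out entirely, and the function names and the shape of the `albums` list are assumptions standing in for whatever those calls returned.

```python
from html import escape


def render_discography(albums):
    """Build the page: one H2 per album, then an OL of its track names."""
    parts = ["<html>", "<head></head>", "<body>"]
    for album in albums:
        parts.append(f"<h2>{escape(album['name'])}</h2>")
        parts.append("<ol>")
        for track in album["tracks"]:
            parts.append(f"<li>{escape(track)}</li>")
        parts.append("</ol>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


def write_page(path, albums):
    """Create the HTML file, write the page, and close the file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(render_discography(albums))


# Placeholder data in place of real API responses.
albums = [{"name": "First Album", "tracks": ["Track One", "Track & Two"]}]
page = render_discography(albums)
print(page)
```

Calling write_page("discography.html", albums) would produce the file. Escaping the names matters because titles can contain characters like & or <.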

It looks like this …

8list.png

If you would like an autographed picture of me or some other token of my awesomeness, please leave a polite comment.

I am also now available for hire as your new Sr. Developer or Data Scientist.

Update: After getting the popularity for each album (below), I seem to have stalled.

7pop.png

I wanted to combine the two: have a list of albums showing their popularity and the songs showing their popularity. But, for some reason, I can’t get my Python code to get the track objects containing the track popularity. I can do it if that function is its own thing but as soon as I make it part of a loop it won’t work anymore.
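One possible explanation, which nothing in this post confirms: in Spotify’s Web API, the tracks that come back from an album’s track listing are simplified track objects with no popularity field. Popularity lives on the full track objects, and the endpoint for fetching several tracks at once takes at most 50 IDs per request. A minimal Python sketch of the batching, where `fetch_tracks` is a hypothetical function (passed in by the caller) that takes a list of IDs and returns full track dictionaries:

```python
def chunked(items, size=50):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def track_popularity(track_ids, fetch_tracks):
    """Return {track_id: popularity}, asking for full tracks in batches.

    `fetch_tracks` is hypothetical: it stands in for whatever code
    actually calls the API and returns a list of full track dicts.
    """
    popularity = {}
    for batch in chunked(track_ids):
        for track in fetch_tracks(batch):
            popularity[track["id"]] = track["popularity"]
    return popularity


# Fake fetcher so the sketch runs without network access or credentials.
calls = []

def fake_fetch(ids):
    calls.append(len(ids))
    return [{"id": track_id, "popularity": 50} for track_id in ids]


ids = [f"track{n}" for n in range(120)]
result = track_popularity(ids, fake_fetch)
print(calls)  # [50, 50, 20]
```

Passing the fetcher in as an argument keeps the looping logic testable on its own, which helps when something works alone but breaks inside a loop.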

Another update: Eventually I decided to start over with JavaScript and jQuery.

Darnit, Darnit, Darnit

Still working on my algorithm for use with the Spotify API. I know APIs and/or algorithms are really easy for some people but this is taking me a bit. It isn’t necessarily these topics in and of themselves that are difficult for me; it’s that I am a concept-to-completion, designer + developer, front-end and back-end person so I’m always thinking and working on multiple/more aspects of a project than, perhaps, most people would.

Also, I keep hoping and trying to complete these projects and meet my goals without having to learn yet another language, framework, or library … or two or three.

If my venture into learning React hadn’t crashed & burned (crappy book with many, many mistakes so I would often struggle with something, blaming myself, forever until I found out the documentation itself was incorrect) this Spotify project might have been finished quite quickly.

My … guess … at the moment is … if I could somehow use JavaScript with Python, that would be ideal … but I don’t know if that’s possible or the best solution so … I’m still figuring out and learning what I can and can’t do … that’s a big thing … I don’t know what I don’t know in some cases.

I’m thinking, as I’ve mentioned before, now is the perfect time to learn at least one of these templating thingies (library? framework? both? where does one end and the other begin?).

Or, maybe the solution is using these other Python-related tools … database thingies and whatnot.

I must confess I’m scared of venturing into WebPy or PyWeb or whatever it’s called.