Scraping can be a complete pain when the site you’re dealing with is poorly structured. While trying to scrape composers and their respective musical pieces from a site, we ran into a problem: a composer’s pieces were not nested within that composer. Enter the zip method, the coolest method I had never heard of until today.
Here’s an example of the HTML structure we were dealing with:
Hooray! We have now paired each piece with the correct composer. All we had to do then was iterate through our arrays to create hashes, using array[0] as the key and array[1..-1] as the value. We needed [1..-1] because a composer sometimes has multiple pieces.
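To make the pairing and hash-building steps concrete, here is a minimal Ruby sketch. The composer and piece names are made-up placeholders, not the data we scraped, and the variable names (composers, pieces, paired, works_by_composer) are hypothetical. I’m also assuming the scraped pieces arrive as one array per composer and that each pair is flattened so it reads as [composer, piece, piece, ...]; our real code may have shaped things differently.

```ruby
# Hypothetical sample data standing in for the two scraped lists.
composers = ["Composer A", "Composer B"]
pieces    = [["Piece 1", "Piece 2"], ["Piece 3"]]

# zip pairs elements by index, giving [composer, [pieces]] pairs.
# Flattening each pair (an assumption) yields [composer, piece, piece, ...].
paired = composers.zip(pieces).map(&:flatten)

# The first element becomes the key; everything after it becomes the value.
works_by_composer = paired.each_with_object({}) do |array, hash|
  hash[array[0]] = array[1..-1]
end

p works_by_composer
```

Because the value is always array[1..-1], a composer with a single piece still ends up with a one-element array, so every value in the hash has the same shape.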
From there, it was easy to save the key as the composer and the value as that composer’s pieces. Thank you to my team members for their help with this today. Shout-out to Rebecca Greenblatt and Will Lowry!