When researching enormous amounts of navigational data, creating a hierarchical graph of the data can be very insightful. One of our long-term customers has a very content-rich website which has kept growing over the years. Back in the day we built a custom solution, which is basically a set of tools with which the website is generated.
Navigation and content are both maintained in a large Excel sheet. Both the sheet and the files are uploaded, and the server converts this pool of information into a website, including all navigation.
Although we are currently on the fourth revision of the site, the workflow is still the same as it was in the early 2000s. The latest revision of the site has always had some navigational issues. Depending on the section and depth, the menus would become too long or even end up in the wrong place.
Over time the templates which drive this have become filled with exceptions and hardcoded conditions. When a new section was recently added to the site, the issues of “funny” or “plain wrong” navigation reared their ugly heads again.
Instead of digging into the templating code I decided to try something else: gain insight into WHEN and WHY this happens.
We have been adding the Neo4j graph database to our new toolkit, and I have learned to appreciate the visual representation of data: it shows you how things are related, and it creates an insight which is missing when you are browsing through lists of lists and folders of folders.
Instead of converting everything into a graph database, I decided to try some of the graphing tools I had learned about at meetups on graphs.
First in line was Gephi, which might be very suitable for the job but sadly does not run on Mac OS X at the moment due to several issues. Instead of trying to mold my Java installation, I had a quick look via a Windows VM. It is most certainly a package I will try again when 0.9 is up to snuff and runs on Yosemite.
An important part of the Excel sheet in which all assets are documented is the “location” of the asset in the site; there are more than 1800 lines which look like:
Nieuws > Politiek? Het interesseert u geen ene moer
Nieuws > Publicatie 'De veranderende relatie tussen burger en overheid'
Feiten en cijfers > Amsterdam > Kerncijfers
Feiten en cijfers > Amsterdam > Bevolking
Feiten en cijfers > Amsterdam > Openbare orde en veiligheid
Feiten en cijfers > Amsterdam > Werk en inkomen
Publicaties > Amsterdam in cijfers
Publicaties > Amsterdam in cijfers > 2014
Publicaties > Amsterdam in cijfers > 2013
Publicaties > Amsterdam in cijfers > 2010-2012
Thema's > Verkiezingen > Tweede Kamer
Thema's > Verkiezingen > Tweede Kamer > 2002
Thema's > Verkiezingen > Tweede Kamer > 2003
Thema's > Verkiezingen > Tweede Kamer > 2006
Thema's > Verkiezingen > Tweede Kamer > 2010
Thema's > Verkiezingen > Tweede Kamer > 2012
Thema's > Verkiezingen > Provinciale Staten
This can easily be converted to a format known as “the DOT language”, which is used by several tools including Graphviz. With some quick Python code applied:
for i in open("items.txt"):
    i = i.strip()
    e =
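For illustration, here is a minimal sketch of what such a conversion could look like. It assumes every location line uses " > " as the separator between levels. The names quote and locations_to_dot are made up for this example, and using the full path as the node ID is my own choice here, not necessarily what the original script did.

```python
def quote(text):
    """Quote a string for DOT, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def locations_to_dot(lines, graph_name="site"):
    """Turn 'A > B > C' location lines into DOT source for a directed graph.

    Assumption: levels are separated by ' > '. The full path is used as the
    node ID and the last level as the label, so equal names in different
    branches (for example year folders) stay separate nodes.
    """
    nodes = {}          # full path -> label (dicts keep insertion order)
    edges = []          # (parent path, child path) in first-seen order
    seen_edges = set()  # used to skip duplicate edges

    for line in lines:
        levels = [part.strip() for part in line.strip().split(" > ")]
        levels = [level for level in levels if level]  # drops blank lines
        parent = None
        for depth in range(1, len(levels) + 1):
            path = " > ".join(levels[:depth])
            nodes.setdefault(path, levels[depth - 1])
            if parent is not None and (parent, path) not in seen_edges:
                seen_edges.add((parent, path))
                edges.append((parent, path))
            parent = path

    out = ["digraph %s {" % quote(graph_name), "  rankdir=LR;"]
    for path, label in nodes.items():
        out.append("  %s [label=%s];" % (quote(path), quote(label)))
    for parent, child in edges:
        out.append("  %s -> %s;" % (quote(parent), quote(child)))
    out.append("}")
    return "\n".join(out)


# A few lines in the same shape as the Excel "location" column.
sample = [
    "Publicaties > Amsterdam in cijfers",
    "Publicaties > Amsterdam in cijfers > 2014",
    "Thema's > Verkiezingen > Tweede Kamer > 2010",
]
print(locations_to_dot(sample))
```

To run it on the real data, pass the open file instead of the sample list, for example locations_to_dot(open("items.txt", encoding="utf-8")). If the output is saved as site.dot (a file name picked for this example), Graphviz can render it with: dot -Tsvg site.dot -o site.svg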