Do You Love Maps, History and Data?
Yes, I love them! I also love trying to do some sort of analysis on them. I hope some of you share that love and will appreciate this site dedicated to them. Here's a brief explanation of how that love gave rise to this site.
I was initially inspired by reading a pair of excellent books:
'Everybody Lies' got me excited about performing my own data analysis, particularly from large open-source data sources, like Wikipedia.
'Gunpowder' gave me an idea for a project, based on this quote:
... castles encouraged medieval nobles and strongmen to assert their independence, resulting in the centuries of localized war that plagued feudal Europe.
In other words, the fractured map of Europe during the Middle Ages might be explained by the proliferation of strong defensive castles. Based on that statement, I thought it would be cool to create an animated graph comparing castle counts to counts of kingdoms, duchies, etc. I hoped to find a positive correlation, and also to produce a map illustrating that lots of tiny states were the natural result of strong castles. The data source would be Wikipedia.
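The "positive correlation" I was hoping for would most likely be measured with a Pearson correlation coefficient. Here is a minimal sketch, assuming you already have two equal-length series, for example castles standing per century and independent states per century. The function name `pearson_r` is hypothetical, and of course a high value would show association, not that castles caused the fragmentation.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series.

    Hypothetical helper: xs might be castles standing per century and ys
    the number of independent states in the same centuries.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel, so they are omitted
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```

A result near +1 would mean centuries with more castles also had more states; near 0 would mean no linear relationship.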
Step one would be to write some code to parse the roughly 10,000 castle articles on Wikipedia and pull out their construction and destruction dates, plus their locations. Step two would entail cross-referencing that with a list of historic states. Step three would be building a nice animated graph.
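To give a feel for step one, here is a minimal sketch that pulls a field out of an infobox and a location out of a `{{coord}}` template, assuming the article's raw wikitext has already been downloaded. The `| built = ...` field name and the `{{coord|...}}` template are common Wikipedia conventions but are not guaranteed on every page, and the helper names `infobox_field` and `parse_coord` are hypothetical, not the code behind this site.

```python
import re
from typing import Optional

# {{coord|...}} is the usual Wikipedia coordinate template; matching it
# case-insensitively also catches {{Coord|...}}.
COORD_RE = re.compile(r"\{\{\s*coord\s*\|([^}]*)\}\}", re.IGNORECASE)


def infobox_field(wikitext: str, name: str) -> Optional[str]:
    """Return the raw value of an infobox line such as '| built = 1270s'."""
    # [ \t]* rather than \s* so an empty value can't swallow the next line
    pattern = rf"^[ \t]*\|[ \t]*{re.escape(name)}[ \t]*=[ \t]*(.+?)[ \t]*$"
    m = re.search(pattern, wikitext, re.MULTILINE | re.IGNORECASE)
    return m.group(1) if m else None


def _to_decimal(parts: list[str]) -> float:
    # [deg], [deg, min] or [deg, min, sec] -> decimal degrees
    return sum(float(p) / 60 ** i for i, p in enumerate(parts))


def parse_coord(wikitext: str) -> Optional[tuple[float, float]]:
    """Extract (lat, lon) from the first {{coord}} template, if any."""
    m = COORD_RE.search(wikitext)
    if not m:
        return None
    # Keep positional numbers/hemispheres; drop display=..., type:... etc.
    args = [a.strip() for a in m.group(1).split("|")]
    args = [a for a in args if a and "=" not in a and ":" not in a]
    hemis = [i for i, a in enumerate(args) if a.upper() in {"N", "S", "E", "W"}]
    try:
        if len(hemis) == 2:  # e.g. {{coord|49|57|55|N|16|58|14|E}}
            i_ns, i_ew = hemis
            lat = _to_decimal(args[:i_ns])
            lon = _to_decimal(args[i_ns + 1 : i_ew])
            if args[i_ns].upper() == "S":
                lat = -lat
            if args[i_ew].upper() == "W":
                lon = -lon
            return lat, lon
        if len(args) >= 2:  # e.g. {{coord|49.9653|16.9706}}
            return float(args[0]), float(args[1])
    except ValueError:
        pass  # malformed template: treat as missing
    return None
```

Coordinates come out of this kind of parsing fairly cleanly; a free-text `built` value like "13th century" is much harder to turn into a usable date.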
Scraping data off of Wikipedia initially looked straightforward: the first pages I looked at presented data in tabular form, with a simple entry listing the construction date. But as I plowed through more castle pages, it proved to be quite the challenge; the pages were too inconsistent to yield good data. Then I discovered that there are far more castles out there than are listed on Wikipedia, though most of the non-Wiki castles are now just ruins. Bummer. It didn't look possible to capture all the castles, and getting reliable dates even for the ones on Wikipedia would probably require a human to read every article, rather than relying on my programmatic scanning.
But capturing the geographic coordinates from Wikipedia proved to be quite a bit easier, and I liked the map I was able to build using that data. I also wrote some code to search for keywords classifying castle types and features, such as 'Z-Plan castles', 'plague', 'secret doors', etc. While running that code, I realized it provided a great way to discover interesting stories that castle lovers might appreciate. I could search for words like 'witches' or 'treachery', then examine the results for good stories. So I used my code to assemble a nice list of interesting castle stories.
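A keyword search like the one described here can be sketched in a few lines. This assumes the article texts are already in a dictionary keyed by title; the `KEYWORDS` list and the `find_keywords` function are hypothetical examples, not the site's actual search terms or code.

```python
import re
from collections import defaultdict

# Hypothetical search terms; the site's real keyword list isn't given.
KEYWORDS = ["Z-plan", "plague", "secret door", "witch", "treachery"]


def find_keywords(articles: dict[str, str], keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the sorted titles of the articles that mention it.

    Matching is case-insensitive and only anchored at the start of a word,
    so 'witch' also matches 'witches' and 'secret door' matches 'secret doors'.
    """
    patterns = {kw: re.compile(r"\b" + re.escape(kw), re.IGNORECASE) for kw in keywords}
    hits: dict[str, list[str]] = defaultdict(list)
    for title, text in articles.items():
        for kw, pattern in patterns.items():
            if pattern.search(text):
                hits[kw].append(title)
    return {kw: sorted(titles) for kw, titles in hits.items()}
```

Each hit list is really a reading list: a human still has to open the articles and decide which ones tell a good story.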
Pretty quickly I decided that I had material for an interesting web site, so I thought about what other kinds of things people might like to see. Maybe they'd like something to help them pick the best castles to visit on short vacations, whether they're after jousting, falconry exhibits, museums, or something else. That led to the 'Visit' page. And discovering that some castles had virtual tours, while others could be rented, led to a couple of additional pages. That brings us to the current state.
It was more work than I'd planned to bring the web site to this point. There are a ton of things that I don't like or that need improvement, but I can't justify that work if the site isn't interesting. I need some indication that the site will interest enough people. If it gets enough traffic, I have a lot of ideas for improving it.
Explanation of Certain Design Choices
Walled Cities vs. Castles
It looks like Wikipedia has a consistently confusing issue: a lot of the 'castles' listed on its 'Castle-list' pages refer to walled towns or cities, not castles. Sometimes the town has a castle with the same name, such as this randomly chosen castle: Šumperk
Trying to read this page using computer code is pretty tricky: you'll notice that the first paragraph states “Šumperk” is a town... Then, later on, it says “The house of Páni z Lipé built Šumperk Castle”, with a link to a page for that castle that doesn't actually exist. In this case, it is pretty clear that the article pertains to the town, not the castle, so my code decides not to include the castle in the database, even though it appears in the list of castles for the Czech Republic.
However, other times the articles are more ambiguous: sometimes there is a castle with the same name as the town, but the article starts out with a phrase like “castle name is a town/village/burg”.
The upshot is that if the article has a comment like "castle name x is a city/town/burg, etc.", then I don't include it in my database. It is really hard to write computer code that reads the text and decides whether the article pertains to a castle or a town, but I'm open to feedback about this.
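A rule like this might be sketched as a pattern check on the article's opening sentence. This is an assumption about how such a filter could work, not the site's code: `looks_like_settlement` and its word lists are hypothetical, and the sentence splitter is naive (abbreviations like "St." will fool it). The negative lookahead stops "is a castle in the town of X" from being mistaken for a town article.

```python
import re

# Hypothetical rule: reject when the opening sentence says the subject
# "is/was a ... town/city/village/burg/...", allowing up to three words in
# between, but not a castle-type noun (so "is a castle in the town of X" passes).
SETTLEMENT_RE = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+"
    r"(?:(?!(?:castle|fortress|fort|palace|stronghold|ch[aâ]teau)\b)\w+\s+){0,3}?"
    r"(?:town|city|village|burg|municipality|borough)\b",
    re.IGNORECASE,
)


def first_sentence(text: str) -> str:
    # Naive split on ., ! or ? followed by whitespace
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def looks_like_settlement(text: str) -> bool:
    """True if the article's first sentence describes a settlement."""
    return bool(SETTLEMENT_RE.search(first_sentence(text)))
```

Only the first sentence is checked, which mirrors the "article starts out with" rule; a mention of the castle later in the article doesn't rescue it.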
Fortifications vs. Castles vs. Palaces vs. Fancy Manors
It seems like Wikipedia plays it pretty loose when deciding what goes into its lists of castles; a lot of the buildings listed aren't really castles, at least not to my way of thinking. Eventually I decided to keep everything in the lists unless it was disqualified for being a town/city, as described above. But, again, I'm open to feedback.
Also, Wikipedia has a whole separate set of lists pertaining to fortifications, which I discovered after having expended a lot of work on just the lists of castles. I've thought about including that data on this web site; again, I'm open to feedback.
Accuracy of Data
If you examine enough of the castles I've captured, you'll notice the castle status ('ruined', 'intact', 'rebuilt', etc.) is often wrong. Sorry about that; the reason is that it is pretty hard for a computer to interpret some of the language on Wikipedia. My code looks for phrases like “the castle was restored in 2020” or “the castle is in ruins”.
Sometimes it finds something like that which is misleading, such as the captions for Kolossi Castle, where one of the photos is captioned “Kolossi Castle ruins”, yet other photos appear to show an intact castle.
Obviously, my computer code can't look at the other pictures on the page and deduce that the castle status is probably not “ruins”; it takes a human to make that determination. So the castle is listed here as being in ruins. Other times, multiple statuses may be listed, and my code attempts to use the last one it finds, even though this is problematic.
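To show why the "last one it finds" rule misfires, here is a minimal sketch of phrase-based status detection. The status labels come from the text above, but the phrase patterns and the `guess_status` function are hypothetical illustrations, not the site's actual rules.

```python
import re
from typing import Optional

# Hypothetical phrase patterns for each status label.
STATUS_PATTERNS = {
    "ruined": re.compile(r"\bruin(?:s|ed)\b", re.IGNORECASE),
    "rebuilt": re.compile(r"\b(?:rebuilt|restored|reconstructed)\b", re.IGNORECASE),
    "intact": re.compile(r"\b(?:intact|well[- ]preserved)\b", re.IGNORECASE),
}


def guess_status(text: str) -> Optional[str]:
    """Return the status whose phrase appears LAST in the text, or None.

    'Last one wins' is simple but fragile: a photo caption such as
    'X Castle ruins' near the end of an article overrides everything
    said before it.
    """
    best_pos, best_status = -1, None
    for status, pattern in STATUS_PATTERNS.items():
        for m in pattern.finditer(text):
            if m.start() > best_pos:
                best_pos, best_status = m.start(), status
    return best_status
```

For example, an article saying the keep "is intact" followed by a caption reading "Kolossi Castle ruins" comes out as `"ruined"`, which is exactly the kind of mistake described above.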
The same issues apply to determining the castle's built dates: sometimes the article has a table with a caption like “built date” that makes it really obvious when the castle was constructed, but most of the time my code has to examine the article text, seeking phrases like “Good King X erected/built/ordered the construction of the castle in year xxxx”. Again, this works most of the time, but castles were often destroyed and rebuilt multiple times, making this a tricky proposition. The result is that the accuracy of my dates is shaky; I hope you can understand the difficulty of capturing them.
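A minimal sketch of that two-stage approach might look like the following: use the table (infobox) value when it contains a year, otherwise scan the prose for a construction verb followed by a year. The verb list, the 60-character window, the "earliest year wins" choice and the function names are all assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern: a construction verb, then (within the same sentence
# and at most 60 characters) "in", "around" or "c." followed by a 3-4 digit year.
BUILT_RE = re.compile(
    r"\b(?:built|erected|constructed|construction|founded)\b[^.]{0,60}?"
    r"\b(?:in|around|c\.)\s+(\d{3,4})\b",
    re.IGNORECASE,
)


def built_years(text: str) -> list[int]:
    """All candidate construction years mentioned in running text."""
    # The word boundary before "built" means "rebuilt" is not matched,
    # so rebuild dates are skipped.
    return [int(y) for y in BUILT_RE.findall(text)]


def guess_built_year(infobox_built: Optional[str], text: str) -> Optional[int]:
    """Prefer a year in the infobox 'built' value, else the earliest prose year."""
    if infobox_built:
        m = re.search(r"\b(\d{3,4})(?!\d)", infobox_built)
        if m:
            return int(m.group(1))
    years = built_years(text)
    # Earliest year is only a heuristic: castles were often rebuilt
    return min(years) if years else None
```

For instance, "Good King Otto erected the castle in 1180. It was rebuilt in 1460." yields 1180, while an infobox value like "13th century" contains no year and falls through to the prose.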
Short Descriptions: some of the Wikipedia articles are quite lengthy. My database allocates just 8,000 characters per castle for the description; that is enough for many castles, but occasionally it results in a truncated description. I recognize this can be a bit confusing, especially when the castle in question is listed on my “Most Interesting” page and the interesting parts aren't displayed! Sorry, I have to limit the size of the description, but again, I'm open to feedback.
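One way to make truncation less confusing is to cut at a natural boundary and mark the cut visibly. Here is a minimal sketch, assuming the 8,000 limit counts characters; the function name `truncate_description` and the " [...]" marker are hypothetical choices.

```python
def truncate_description(text: str, limit: int = 8000) -> str:
    """Fit text into `limit` characters, preferring to cut at a paragraph
    break, then a sentence end, then a space, and marking the cut."""
    if len(text) <= limit:
        return text
    marker = " [...]"
    budget = limit - len(marker)
    head = text[:budget]
    for sep in ("\n\n", ". ", " "):
        idx = head.rfind(sep)
        # Only accept a boundary that keeps at least half of the budget
        if idx >= budget // 2:
            keep = idx + 1 if sep == ". " else idx  # keep the full stop
            return head[:keep].rstrip() + marker
    return head.rstrip() + marker
```

If the database limit is actually in bytes rather than characters, names like Šumperk take more than one byte each in UTF-8, so the length check would need `len(text.encode("utf-8"))` instead.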
Ideas for Future Work
I would like to improve the site and have a lot of ideas for doing so. I'd also appreciate feedback from users about what features I might add. However, if the site is boring to most people and doesn't receive enough traffic, then I'm not going to expend that effort. So log in and leave me a note; let me know whether you like the site or whether it is just ho-hum, another castle web site.
Some Half-Baked Ideas to Improve the Site
- Would like to allow users to edit/delete the comments they leave on the site.
- Would like to allow users to change their password/email address and optionally add a picture and some sort of user profile.
- Thinking about adding a new page along the lines of “new features/articles” created since the last time you visited the site, along with a summary of responses to comments you may have entered on the site since your last visit.
- Find ways to make the site appealing enough that y'all keep coming back for the social aspect: comments and camaraderie. Possibly add a feature like “castles in the news” where users could post links to current stories about castles.
- Would like to grant recognition to users who contribute to the site by uploading content, completing quests, or contributing to the discussion. I was thinking of creating ranks/badges to assign to users based on how much they contribute; they could be something along the lines of 'Peasant', 'Squire', 'Knight', 'Knight-Commander', based on activities completed here. Some candidate activities might include:
  - Entering comments, voting on content: one merit
  - Uploading original content, such as trip reports or pictures: five merits
  - Completing a quest: ten merits
  - Comments with cuss-words: two demerits
  - Insulting a fellow member: three demerits
  - Uploading advertising links: banned
  - Uploading NSFW pictures: ten demerits
  - Attempting hacker exploits: banned
- Data clean-up: see my discussion above, plus problems like photo captions that are erroneously captured as paragraphs, and missing headings.
- Speculative. Attempt to somehow capture the battle history for each castle. It would be cool to list/count the battles for each castle and perform some sort of analysis of what terrain features and castle features are correlated with high/low success rates in battle. This would probably entail a lot of research and enhancements to the castle pages on Wikipedia.
- Capture data from other open-source entities, if any such exist. Since that is a lot of work, it would depend on this site generating enough traffic to justify the effort. I know everyone loves free content, but I do have a life outside of castles.
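The merit idea in the list above could be bookkept with something as small as this. It is a sketch only: the point values come from the candidate activity list, but the activity keys, the rank thresholds, and the `Member` class are hypothetical, since no thresholds have been decided.

```python
from dataclasses import dataclass

# Point values from the candidate activity list; "banned" is a flag, not points.
MERITS = {
    "comment": 1,
    "vote": 1,
    "original_content": 5,
    "quest": 10,
    "cuss_words": -2,
    "insult": -3,
    "nsfw_picture": -10,
}
BANNABLE = {"advertising_links", "hacker_exploit"}

# Hypothetical thresholds: minimum merits needed for each rank, ascending.
RANKS = [(0, "Peasant"), (25, "Squire"), (100, "Knight"), (250, "Knight-Commander")]


@dataclass
class Member:
    name: str
    merits: int = 0
    banned: bool = False

    def record(self, activity: str) -> None:
        """Apply one activity's merits/demerits, or ban the member."""
        if activity in BANNABLE:
            self.banned = True
        elif activity in MERITS:
            self.merits += MERITS[activity]
        else:
            raise ValueError(f"unknown activity: {activity}")

    @property
    def rank(self) -> str:
        if self.banned:
            return "Banned"
        title = RANKS[0][1]  # negative totals still count as Peasant
        for threshold, name in RANKS:
            if self.merits >= threshold:
                title = name
        return title
```

Three completed quests would make someone a Squire (30 merits), and a later insult would drop them to 27 without costing the rank.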