Predicting society membership with a whole lot of data
For many of us, societies are a key part of the university experience, giving us a chance to meet new friends, explore new hobbies, and become more integrated into the University as a whole. However, despite freshers’ fairs and ‘Give it a Go’ sessions, a lot of societies struggle to find new members. This is especially difficult for smaller societies that cover a niche interest and depend heavily on an engaged membership base to fund new opportunities.
So how can societies maximise membership? Social media companies currently use algorithms that map relationships between users and use this data to recommend new friends and followers. Could we make a similar algorithm that recommends new societies to students? Leeds student George Sykes is working on exactly that.
George collected the data for his model by scraping it from the LUU website. This process uses the Scrapy framework to find all the links on a webpage. The scraper then follows those links to new pages, follows the links on all of those pages, and so on until it reaches a specified depth limit. Whilst doing this, it collects the relevant information from each page; in this case, the name of the society, a unique society ID, the names of the people on the committee, and each person’s committee position. Once you’ve collected all your data, you can import it into a graph database, using further code to create connections between groups.
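To give a feel for how a depth-limited crawl works, here is a minimal sketch using only Python's standard library. It is not George's code and it is not Scrapy: the crawl function, the fetch_page helper and the tiny in-memory PAGES "website" are all made up for illustration, and the sketch only grabs each page's heading rather than committee details.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag and the text of any <h1> heading."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.title += data.strip()


def crawl(fetch, start, depth_limit):
    """Follow links level by level from `start`, stopping at `depth_limit`.

    `fetch` is any function that takes a URL and returns the page's HTML.
    Returns a dict mapping each visited URL to its heading text.
    """
    seen = {start}
    frontier = [start]
    titles = {}
    for _ in range(depth_limit + 1):
        next_frontier = []
        for url in frontier:
            parser = LinkParser()
            parser.feed(fetch(url))
            titles[url] = parser.title
            for link in parser.links:
                if link not in seen:  # never queue the same page twice
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return titles


# Hypothetical stand-in for a real website, so the sketch runs offline.
PAGES = {
    "/": '<h1>Societies</h1><a href="/chess">Chess</a><a href="/film">Film</a>',
    "/chess": '<h1>Chess Society</h1><a href="/chess/committee">Committee</a>',
    "/film": '<h1>Film Society</h1><a href="/">Home</a>',
    "/chess/committee": "<h1>Chess Committee</h1>",
}


def fetch_page(url):
    return PAGES[url]


print(crawl(fetch_page, "/", depth_limit=1))
```

With a depth limit of 1 the crawl visits the home page and the two society pages but stops before the committee page; raising the limit to 2 picks that up as well.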
In total, the graph contains 354 nodes (societies) and 170 relationships (shared committee members). If you open the link, click the little play button in the bottom right to expand the nodes, then click a red dot to see a society’s name. Much of the graph isn’t currently connected, so you can’t always find a path between two societies. This is because a lot of small societies don’t share committee members. Adding data on shared ordinary members would certainly create a much more complex and informative model.
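The idea of linking societies through shared committee members, and of some societies being unreachable from others, can be sketched in a few lines of Python. The society names, people and helper functions below are invented for illustration; the real graph lives in a graph database rather than in Python dictionaries.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(committees):
    """Link two societies whenever at least one person sits on both committees.

    `committees` maps a society name to the set of people on its committee.
    Returns a dict mapping each society to the set of societies it is linked to.
    """
    societies_by_person = defaultdict(set)
    for society, members in committees.items():
        for person in members:
            societies_by_person[person].add(society)

    graph = {society: set() for society in committees}
    for societies in societies_by_person.values():
        for a, b in combinations(sorted(societies), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def path_exists(graph, start, goal):
    """Breadth-first search: can we get from `start` to `goal` along links?"""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return False


# Hypothetical committees, not real LUU data.
committees = {
    "Chess": {"Asha", "Ben"},
    "Film": {"Ben", "Cara"},
    "Hiking": {"Cara"},
    "Knitting": {"Dev"},
}

graph = build_graph(committees)
print(path_exists(graph, "Chess", "Hiking"))    # linked via Film
print(path_exists(graph, "Chess", "Knitting"))  # Knitting shares nobody
```

Here Knitting plays the part of a small society with no shared committee members: it ends up as an isolated node, just like the unconnected parts of the real graph.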
The Adamic Adar algorithm scores how close two societies are based on the neighbours they share in the graph, with a higher number indicating that their committees are more closely linked. For example, the Abuse Awareness Society and the Debating Society receive a score of 0.455, whereas comparing the former with Cheerleading gives a score of 0.
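For the curious, the standard Adamic Adar score for two nodes adds up 1 / log(number of connections) over every neighbour the two nodes share, so a link through a quiet, poorly connected society counts for more than a link through one that is connected to everybody. The sketch below assumes the natural logarithm and uses a made-up graph; the node names are placeholders, and it is only a guess that the scores above were produced in exactly this way.

```python
import math


def adamic_adar(graph, x, y):
    """Sum 1 / ln(degree) over the neighbours that x and y have in common.

    `graph` maps each node to the set of nodes it is linked to.
    A shared neighbour always has at least two links (to x and to y),
    so the logarithm is never zero.
    """
    shared = graph[x] & graph[y]
    return sum(1.0 / math.log(len(graph[z])) for z in shared)


# Hypothetical graph: "Hub" is linked to nine societies, including A and B.
others = {f"Society {i}" for i in range(1, 8)}
graph = {"Hub": {"A", "B"} | others, "A": {"Hub"}, "B": {"Hub"}, "C": set()}
for name in others:
    graph[name] = {"Hub"}

print(round(adamic_adar(graph, "A", "B"), 3))  # one shared neighbour with 9 links
print(adamic_adar(graph, "A", "C"))            # no shared neighbours
```

In this toy graph, A and B share a single neighbour with nine links, which works out to 1 / ln 9, or roughly 0.455, while A and C share nothing and score 0.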
At the moment, this model is still in its infancy, but it is an exciting new tool to boost student engagement around campus. If we generated a graph using membership data, we could use even more complex models to find societies that have a similar student dynamic. These could then be used to recommend societies to students based on their interests and values. The same data could also identify societies with shared membership and help pair them up for collaborations that they may never have thought of before, connecting more students and making sure everyone gets the most out of their university experience.
You can find George’s complete graph here, and if you want to try your hand at scraping your own set of data, make sure to take a look at all the code you’re going to need.