The Developer of the Future

The Developer of the Future – Data

December 14, 2021 Deanna Davenport

Welcome to the second instalment of the ‘Developer of the Future’ blog series. Today we’re looking into the topic of Data and how it will help shape the future of development.

Data

The more we engage with technology, the more data is produced. This presents a challenge not only for storing the data but also for processing it and extracting value from it.

Storage

Traditionally, data has been stored in relational databases. These mimic our spreadsheet tables and were a natural progression as society moved to storing data electronically. Over time, however, as the amount of data we want to store has grown and changed shape, these traditional tables have started to struggle.

Relational databases are quite rigid. Once you’ve defined your schema, it’s difficult to change it. That means if your data gains new attributes (or loses old ones), updating the relational database to match can be difficult. There’s a reason for all of this, of course: the strict schema allows you to perform advanced SQL queries because the structure is known. For many use cases, though, these queries aren’t actually needed. It’s not uncommon to have a relational database where the only search performed is a simple lookup by identifier; in such cases the restriction of the schema is just a hindrance.
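To make the point about rigid schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table, its columns and the sample values are made up for illustration and are not taken from any real project. It shows that a new attribute is rejected until the schema is altered, even though the only query we ever run is a lookup by identifier.

```python
import sqlite3

# In-memory database with a hypothetical, fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (1, "Ada"))

# A record with a new attribute is rejected until the schema itself changes.
try:
    conn.execute(
        "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
        (2, "Grace", "grace@example.com"),
    )
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# The schema has to be migrated before the new attribute can be stored.
conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
conn.execute(
    "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
    (2, "Grace", "grace@example.com"),
)

# The only query this application needs: a simple lookup by identifier.
row = conn.execute(
    "SELECT name, email FROM customers WHERE id = ?", (2,)
).fetchone()
```

On a small table the migration is trivial, but on a large production database a schema change like this usually needs far more planning.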

Scalability is another issue facing relational databases. When you want to increase the capacity of a relational database, you typically have to scale it vertically. This means powering up the database server by upgrading components like the CPU. It’s a costly process, limited by the availability of upgrades. Relational databases tend to be pushed into this type of scaling because their tables (relations) are linked to one another, making them difficult to separate and split across multiple servers.

We are now starting to see a rise in the use of NoSQL databases. These come in different varieties, each with its own pros and cons, but the overall theme of these databases is one of flexibility. They allow the attributes of the data to change, often storing data as ‘key-value’ pairs. The ‘key’ represents an identifier, whilst the ‘value’ is our data object, which can now have whatever fields it needs. The removal of relations also means that NoSQL databases can scale horizontally, as the database can be distributed across multiple servers. This horizontal scaling can be cheaper than vertical scaling, as the servers themselves don’t need to be as powerful.
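As a rough illustration of the key-value idea and of horizontal scaling, the sketch below spreads records across several in-memory "servers" (plain dictionaries) by hashing the key. The class name `ShardedKeyValueStore` and the sample records are hypothetical. Real NoSQL databases use more sophisticated schemes, such as consistent hashing and replication, so treat this as a toy model only.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys across several shards."""

    def __init__(self, shard_count):
        # Each dictionary stands in for a separate, modest server.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)


store = ShardedKeyValueStore(shard_count=3)

# Values are free-form objects: each record can carry whatever fields it needs.
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Grace", "email": "grace@example.com", "tags": ["admin"]})

grace = store.get("user:2")
```

Because no record refers to another, a record can live on any shard, and in principle capacity grows by adding more shards. Note that with this simple modulo scheme, changing the shard count would remap most keys, which is one reason real systems use consistent hashing instead.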

But what about relationships between data?

So far we’ve talked about how NoSQL databases have stepped away from having links between data, but what if those relationships are just as important as the data itself? In this case another type of NoSQL database, called a graph database, is starting to take over. Graph databases store data as nodes and represent relationships as links between those nodes. Through graph databases you can carry out complex traversals and truly explore a large dataset by branching out from a starting point to investigate neighbouring data. This, coupled with the fact that the data can be flexible, makes it a popular choice for a wide variety of projects. At Rowe we’ve worked with several flavours of graph database for different applications, including gazetteers and semantic content management systems.
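The idea of branching out from a starting point can be sketched with a plain breadth-first traversal. The place names, relationship labels and the `neighbours_within` function below are invented for illustration, loosely in the spirit of a gazetteer. A real graph database would express this in its own query language rather than hand-written Python.

```python
from collections import deque

# Hypothetical nodes with flexible properties.
nodes = {
    "london": {"type": "city"},
    "england": {"type": "country"},
    "uk": {"type": "state"},
    "thames": {"type": "river"},
}

# Directed, labelled relationships: source -> [(label, target), ...]
edges = {
    "london": [("located_in", "england"), ("on_river", "thames")],
    "england": [("part_of", "uk")],
    "thames": [("flows_through", "england")],
    "uk": [],
}


def neighbours_within(start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_hops:
            continue  # do not branch out any further from this node
        for _label, target in edges.get(current, []):
            if target not in distances:
                distances[target] = distances[current] + 1
                queue.append(target)
    return distances


one_hop = neighbours_within("london", 1)
two_hops = neighbours_within("london", 2)
```

Here `one_hop` contains London plus its direct neighbours (England and the Thames), and `two_hops` additionally reaches the UK via England.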

Big Data

We’ve talked about how demand for data is growing and how the choice of database affects how you can scale to meet this demand, but how do we process these ever-growing volumes of data?

Big data refers to datasets that are too large or complex to be processed with traditional software. These datasets can hold a wealth of information if you are able to analyse them and ask the right questions, so the demand for applications that can store and utilise them has soared. Big data is also being used to build training datasets for Machine Learning, another topic we’re going to delve into later!
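One common way of coping with data that will not fit in memory is to stream it through in batches and keep only a running summary. The sketch below shows that idea in a single process. The event records, the `kind` field and the function names are made up for illustration, and real big data platforms go further by distributing this kind of work across many machines.

```python
from collections import Counter
from itertools import islice


def batches(records, batch_size):
    """Yield lists of up to batch_size records without loading everything."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_by_key(records, key, batch_size=1000):
    """Keep a running tally of one field while streaming through the records."""
    totals = Counter()
    for batch in batches(records, batch_size):
        totals.update(record[key] for record in batch)
    return totals


def synthetic_events(n):
    """Generate hypothetical events lazily, standing in for a huge data source."""
    for i in range(n):
        yield {"id": i, "kind": "click" if i % 3 else "purchase"}


totals = count_by_key(synthetic_events(10_000), "kind", batch_size=500)
```

Only one batch and the small `totals` counter are held in memory at any time, so the same code would cope with a much larger stream of events.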

Here at Rowe we’ve developed systems that deal with vast amounts of data. Some of these analyse data and transform it into more readable charts and graphs, whilst others extract data from even larger sets and package this up for end customers.

Visibility

Something interesting we’ve noticed is the move towards sharing data, particularly among public sector organisations. At Rowe we have been involved in the design and development of various APIs for different government organisations, and these are often being made public. This shift towards public APIs encourages sharing across different government organisations and is a trend we think will continue.

Keep a look out for the next blog in the series on Open-Source software.