One part of my IoT solution design that I wanted to dive into a bit more is the data side of things, because, after all, the reason I’m making this project is to capture data.
The first question you might want to ask yourself when making an IoT project is where you are going to store the data. That’s where I started, but now that I’ve completed the first release I’ve realised that was the wrong question to start with. Instead, I should have asked: what am I going to do with the data?
Using Your Data
Before you choose a storage type and data structure it’s really important to start thinking about just what you will use your data for. Azure has a lot of options to choose from, such as Time Series Insights, Machine Learning services, Stream Analytics, Power BI or even the humble Excel spreadsheet! You can also build your own dashboards, maybe even some animated charts with React 😉.
All of this will influence the decisions that you make around storage and structure.
For my solution I have two ways I want to use the data at the moment: first, to generate Power BI reports that allow me to look at trends over time (generation, consumption, etc.); second, to create some custom real-time charts.
Structuring Your Data
Now that we’ve got an idea of what we want to do with our data, it’s time to think about how we will structure it, as that will have an influence on the type of storage that we use.
From my inverter I’m getting three data sets:
- The labels for the sensors being monitored
- The values from each monitored sensor
- The power generated in 5-minute increments (I think… I’m not 100% sure that’s what it is, but that’s my best guess)
The data in the 1st and 3rd sets is interesting, but the really valuable information is the data from the 2nd API. There it’s broken down into a few valuable groups:
- The watts, amps and volts per panel group
- A summary of the watts, amps and volts that went into the inverter
- A summary of the watts, amps, volts and hertz that went to the grid (I think… It’s called “out” in the API so I’m guessing that’s out of the inverter to the grid)
- Total generation summary in day, week, month, year and all time
There’s a heap of other data points I get back that I don’t understand too (this is an undocumented API, after all 🤣).
With this in mind, I started to think about the kinds of “questions” I would formulate for the data, such as “what is the power generated by each panel set for the last 30 days?”, “what’s the total in and out?” or “how much power do we use as a household?”. This helped me think about how best to structure the data.
I decided that I wanted to store the raw message untouched since I don’t use all fields yet (but may in the future), and I want to do this for each API that I call.
Next, I want to break the main one down into a couple of groups: Panel Feed and Summary. This is where I use the multiple functions and consumer groups that I described in the solution design.
Finally, we want to structure our data for the kinds of queries we want to do against it. I made the decision that I would optimise for read in a non-relational manner, meaning I’ll duplicate data across the different structures instead of doing joins. But I do still want to have a loose relationship between each piece of data, so for that I’m generating a correlation ID that is attached to the message so each record can be related if I want.
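To make that concrete, here’s a minimal sketch of the fan-out idea: one incoming message becomes several read-optimised records that all share a freshly generated correlation ID. The record types and field names here are my assumptions for illustration, not the real schema.

```fsharp
open System

// Illustrative record types: duplicated, read-optimised shapes that
// carry the same CorrelationId so they can be related back together.
type RawRecord = { CorrelationId: string; Body: string }
type SummaryRecord = { CorrelationId: string; TotalWatts: float }

// Fan a single inverter message out into multiple denormalised records,
// stamping one correlation ID across all of them.
let splitMessage (body: string) (totalWatts: float) =
    let correlationId = Guid.NewGuid().ToString()
    { CorrelationId = correlationId; Body = body },
    { CorrelationId = correlationId; TotalWatts = totalWatts }
```

Because the ID is generated once per message, any record can later be joined back to the raw payload in application code, without needing relational joins in storage.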
Choosing Our Storage Type
So let’s take stock: we want to store half a dozen different data structures in a non-relational manner with some basic query support. Oh, and I want it to be cheap (hey, it’s my credit card each month!). With all this in mind I landed on Azure Table Storage.
Since I’m using F# I have Record Types that represent the different structures:
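The actual types aren’t shown here, but a minimal sketch of what one such record might look like follows; the field names are illustrative assumptions, not the real schema.

```fsharp
open System

// Hypothetical sketch of a per-panel-reading record.
type PanelRecord =
    { CorrelationId: string            // ties this record back to the full message
      MessageTimestamp: DateTimeOffset // when the inverter reading was taken
      PanelGroup: string               // which panel group the reading is for
      Watts: float
      Amps: float
      Volts: float }
```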
You’ll see here that I have a CorrelationId property, which allows me to trace a panel record (of which I have 2 per message) back to the full data set it was sent up with. I also have a timestamp for the message that allows me to group records over time.
For each of my data structures I use a different table rather than a combined table. This is mainly so I can look at an individual type and not find data gaps when the structure of each record is different.
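A sketch of the table-per-type routing might look like the following; the union cases and table names are my assumptions, not the real ones.

```fsharp
// Illustrative union of the stored record kinds.
type StoredRecord =
    | Panel of panelGroup: string
    | Summary of totalWatts: float
    | Raw of body: string

// Route each record kind to its own table so every row in a given
// table shares one schema and there are no sparse/missing columns.
let tableNameFor record =
    match record with
    | Panel _ -> "PanelFeed"
    | Summary _ -> "InverterSummary"
    | Raw _ -> "RawMessages"
```

The payoff is that a query against, say, the panel table never has to skip over summary-shaped rows with different columns.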
It’s Not Perfect
It’s worth noting though that this isn’t a perfect solution. When I started looking into the Power BI reporting my friend Dom Raniszewski asked me why I was using Table and not Blob, which would be more efficient. And he’s right: there are a number of inefficiencies in how the data is stored for reading into Power BI, but the reason is that I also wanted an easy programmatic model so I could build my own real-time reporting (Power BI only refreshes the data every 24 hours). I’m going to keep it as is for the moment, but we’ll see, I may revise it in the future.
And it turns out that future is now: while writing this post I realised I had a design flaw in the way I’m storing data for retrieval. The main view I want is at the day level, not the second level (which is the granularity I capture at), but I can’t get it, because the date is a timestamp down to the second and Table Storage’s query engine isn’t advanced enough to group by day. I’m going to think through how best to address this and retrofit the fix into the 30k+ records I already have in storage!
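One way this could be addressed (a sketch of an idea, not what the solution actually does) is to derive a day-granularity key from the capture timestamp when writing each record, so fetching a whole day becomes a simple equality filter rather than a grouping operation:

```fsharp
open System

// Sketch: derive a day-level key from a second-level capture timestamp.
// Storing this alongside (or as) the partition key means a day's worth
// of records can be fetched with a single equality filter.
let dayKey (timestamp: DateTimeOffset) =
    timestamp.ToString("yyyy-MM-dd")
```

With a key like this, a Table Storage filter such as `PartitionKey eq '2019-01-15'` would return everything captured on that day, no server-side date maths required.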
Data is often a cornerstone of an IoT project, and ensuring you have the right approach to storing it will dramatically improve the benefit you can get from it. As a technologist, your thoughts might immediately jump to choosing the right database type and then determining how to work your application into it.
Instead, I’d encourage you to flip the direction: start by thinking about what you want to do with your data, and then work out what will be the best fit for that.
For me Table Storage is the best fit for a number of reasons, but there are still imperfect edges that I’ll have to deal with.