Obviously, the post title is inspired by the Principia Mathematica, without claiming to be as comprehensive or influential. With that out of the way, let's focus on the underlying characteristics of data and the implications they have for processing and storing it.
Data comes in different shapes: tabular, nested, graphy.
As I have pointed out here and there, in terms of data shapes, one can distinguish between logical and physical data layouts.
When you process data, you want to be aware of the physical layout so that you can exploit it (for example, column-oriented formats such as Parquet for analytical workloads), and you also want to accommodate the logical layout, be it explicitly or through interfaces (CSV file vs. Google Spreadsheet).
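To make the logical vs. physical distinction a bit more tangible, here is a minimal sketch in plain Python. The records, field names, and the `to_columns` helper are made up for illustration; real column-oriented formats such as Parquet add encoding, compression, and metadata on top of this basic idea.

```python
# Logical view: a small table of three records (hypothetical sample data).
rows = [
    {"user": "alice", "country": "IE", "amount": 12.5},
    {"user": "bob", "country": "DE", "amount": 7.0},
    {"user": "carol", "country": "IE", "amount": 30.0},
]


def to_columns(records):
    """Pivot a row-oriented list of dicts into a column-oriented dict of lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks over every whole record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same query only reads the one list it needs.
total_from_columns = sum(columns["amount"])
```

Both layouts describe the same logical table; only the physical arrangement differs, and that arrangement is what an analytical query reading a single field can benefit from.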
Data has granularity, or, better said, we choose to treat it with a certain granularity.
One valid definition of data granularity is Wikipedia's, although it's arguably a simplified one. To appreciate its real depth, I suggest you read Martin Kleppmann's post Stream Processing, Event Sourcing, Reactive, CEP … and making sense of it all, where he concisely makes the case for raw events vs. aggregates, including their use cases and pros and cons.
In this post Martin also makes the case that one can have both fast reads & writes when decoupling the input and output schemata. Just read the post; I honestly can't add more here ;)
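For a rough feel of the raw events vs. aggregates distinction, here is a small sketch. The shopping cart events and the `cart_sizes` function are hypothetical and only meant to show the shape of the idea: writes append fine-grained facts, and reads are served from a coarser view derived from them.

```python
from collections import Counter

# Raw events: immutable facts, appended as they happen (hypothetical data).
events = [
    {"user": "alice", "action": "add_to_cart", "item": "book"},
    {"user": "bob", "action": "add_to_cart", "item": "pen"},
    {"user": "alice", "action": "add_to_cart", "item": "pen"},
    {"user": "alice", "action": "remove_from_cart", "item": "book"},
]


def cart_sizes(event_log):
    """Derive an aggregate (number of items per user's cart) from raw events."""
    sizes = Counter()
    for event in event_log:
        if event["action"] == "add_to_cart":
            sizes[event["user"]] += 1
        elif event["action"] == "remove_from_cart":
            sizes[event["user"]] -= 1
    return dict(sizes)


# The aggregate is a derived, read-optimized view; the events remain the
# source of truth and can be replayed to build other views later.
current_sizes = cart_sizes(events)
```

The aggregate answers "how big is each cart right now?" cheaply, while the event log still retains the finer-grained history, such as the fact that alice once removed a book.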
Data has gravity.
That means, in a nutshell, that it tends to put up resistance when moved and tends to be stickier than, for example, code. Say you've got a cluster with 3 PB on-prem and your objective is to design a DR solution hosted in a public cloud. What would you do?
Again, I'm not gonna drill down here, I'll just refer you to experts on this topic: datagravity.org.
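A quick back-of-the-envelope calculation shows why the 3 PB example is not trivial. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a hypothetical, fully dedicated 10 Gbit/s link; both the link speed and the `transfer_days` helper are illustrative assumptions, not a sizing recommendation.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the days needed to move size_bytes over a network link.

    efficiency is the fraction of nominal bandwidth actually achieved
    (1.0 means the link is perfectly and exclusively utilized).
    """
    bits = size_bytes * 8
    seconds = bits / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


three_petabytes = 3 * 10**15   # 3 PB in bytes, decimal units
ten_gigabit = 10 * 10**9       # assumed 10 Gbit/s link

ideal_days = transfer_days(three_petabytes, ten_gigabit)
realistic_days = transfer_days(three_petabytes, ten_gigabit, efficiency=0.5)
```

Even under ideal conditions the initial copy takes roughly four weeks, and about twice that at 50% effective throughput, before accounting for any data that changes in the meantime.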
Data has a temperature.
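Temperature here refers to how frequently data is accessed: hot data is read often, cold data rarely. One of the most interesting and practical pieces on this topic I came across is HDFS Storage Efficiency using Tiered Storage from the eBay engineering team.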
Do you know of other insightful ones?
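As a minimal sketch of the idea, one could classify a dataset by how recently it was last accessed and pick a storage tier accordingly. The thresholds and tier names below are made up for illustration and are not taken from the eBay article; a real policy would likely also consider access counts and data age.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def temperature(last_access, now):
    """Classify data as hot, warm, or cold by time since last access."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2024, 1, 31)
labels = [
    temperature(datetime(2024, 1, 30), now),  # accessed yesterday
    temperature(datetime(2023, 12, 1), now),  # accessed about two months ago
    temperature(datetime(2022, 6, 1), now),   # untouched for over a year
]
```

The resulting label could then drive placement, for example keeping hot data on fast disks and moving cold data to denser, cheaper storage.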
There we go: data has a temperature, gravity, granularity, and comes in different shapes. I'm sure that as we explore this space together, we will encounter even more underlying characteristics of data.