A couple of days ago, I searched Google for the options available to developers who need to visualize large numbers of data points within applications. To my surprise, I couldn't find many solutions, and most of the results were already several years old.
One result did grab my attention: a Reddit post from a user asking how to plot billions of data points. Some users replied "try using Power BI or Excel", while others suggested workarounds such as using a Pandas DataFrame in Python to store and manipulate the data. But the user who posted the question didn't want to store and manipulate data; the user wanted to visualize it!
So, how do you visualize billions of data points, and why would you want to?
Table of Contents
- Why Visualize Billions of Data Points?
- High-Performance in Data Visualization
- Charting Libraries
- .NET Charting Libraries
Why Visualize Billions of Data Points?
Going back to the Reddit thread, some users suggested that visualizing billions of data points was beyond the scope of what most people do.
In fact, the author of the thread later explained that, as a developer working with physicists on the design of high-power supplies, they needed to understand the effects of active components at high frequencies.
There are several lessons we can draw from this real-world scenario. The first is why, and in what cases, you would need to visualize billions of data points.
Just as in the user's case, several demanding industries need to process amounts of data that go beyond what most people need to visualize.
For instance, medical visualization is one of those demanding industries: it requires specific technical characteristics and chart types that can process massive amounts of data.
It can use data visualization in almost every process, from administration and healthcare data monitoring to research, molecular analysis, and medical device development.
While solutions such as Power BI can be useful for visualizing thousands of data points of medical administration data, charting components are a must when visualizing complex data for high-end analyses.
Another real-world case is motorsports, where billions of data points are transmitted in real time from high-performance vehicles (F1, NASCAR, etc.) to data logging systems that process the data instantly and support racing engineers in decision-making.
Data Logger Application
The list of industries that need to visualize billions of data points could go on: vibration analysis, industrial automation, satellites, 3D mobile mapping, and so on. But the second lesson to highlight from the Reddit post is the consequences of implementing workarounds.
The Reddit user shares that fast-changing properties with high sampling rates may take hours to render, and that temporary solutions such as scaling down the datasets result in a loss of information, something that often happens with downsampling methods as well.
The third lesson concerns the charting solutions that the user tried to implement. The user tried Matplotlib, Plotly with WebGL, and d3.js, and in the user's own words: "all solutions seem to struggle with scaling up to this [billion data points] size".
Those are good solutions for data visualization, but they are not high-performance oriented. That leads us to two questions: what is high-performance data visualization, and what are the characteristics of high-performance charts?
High-Performance in Data Visualization
Let's start with the definition of high performance, which refers to something "better, faster, or more efficient than others". In the context of data visualization, high-performance charts (a.k.a. charting controls or components) are those that are better, faster, and more efficient than other solutions.
We identify 6 characteristics of high-performance charting libraries, their impact, and how they are used in the real world.
|High performance enables:||Why?||Real-world use case:|
|Streaming live charts with high data rates.||Visualization happens in real-time supporting smoothly scrolling charts and up to thousands of data feeds at the same time.||Industries using it: vibration research, machine condition monitoring, instrumentation, industrial automation, medical applications (ECG, EEG), seismic monitoring, Fintech, etc.|
|Very high refresh rate.||A chart can update 100 times/sec instead of 1 time/minute. It doesn’t twitch providing a more pleasant visualization and interaction.||Applications need to show dynamic 2D or 3D charts with the lowest possible latency. Allows visualizing data instantly without presenting delays in the data streaming. Supports effective and real-time decision-making with minimal lags. Industries using it: racing telemetry systems, aviation, trading, medicine, etc.|
|Full accuracy data.||All the data can be visualized with full precision without reducing the size of the datasets using workarounds like downsampling which can lead to losing valuable information.||In medical visualization, Electrocardiograms (ECG) data typically streams 1000 data points/sec. Reducing visualized values by downsampling loses vital information.|
|To reduce resources consumption (energy, work, time).||Intelligent algorithms reduce the requirement of computing power and consumption of energy. No super-computers are needed for a high-performance visualization. Instant visualization of big and complex datasets saves work time remarkably.||Visualizing a dataset with a traditional chart can take hours, while the CPU is 100% occupied with max energy consumption. Remarkable saving of energy and work time.|
|A better user experience.||More appealing charts look and feel, and interactive usage.||All applications benefit.|
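To get a feel for how quickly a live stream adds up to "billions", here is a rough back-of-the-envelope sketch. It uses the 1000 data points/sec ECG rate from the table; the figure of 1,000 parallel channels is only an assumption standing in for "thousands of data feeds", and the helper name is made up for illustration.

```python
def seconds_to_reach(total_points: int, points_per_second: int, channels: int = 1) -> float:
    """Seconds of continuous streaming needed to accumulate total_points."""
    return total_points / (points_per_second * channels)

ONE_BILLION = 1_000_000_000

# One channel at 1000 points/sec (the ECG rate quoted in the table).
single_channel = seconds_to_reach(ONE_BILLION, 1_000)

# Assumed: 1,000 such channels streaming in parallel.
thousand_channels = seconds_to_reach(ONE_BILLION, 1_000, channels=1_000)

print(f"1 channel:     {single_channel / 86_400:.1f} days")     # 11.6 days
print(f"1000 channels: {thousand_channels / 60:.1f} minutes")   # 16.7 minutes
```

Under these assumptions, a single feed takes more than a week to reach a billion points, while a multi-channel monitoring setup gets there in well under an hour.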
That was an extensive but necessary recap of what counts as high performance in data visualization, since not all data visualization tools are high-performance oriented; many are aimed at more basic uses.
Now, let's talk about downsampling, the opposite of full-accuracy, high-performance visualization and one of the most common workarounds for visualizing millions or billions of data points. Even among so-called "high-performance" charting tools, downsampling is widespread, as it is often implemented in order to render large datasets. But what is downsampling?
Downsampling is a technique where only part of the data is processed, for instance, visualizing only every 100th data point and discarding the remaining 99% of the data, resulting in a huge loss of information.
First, here's an example of what a dataset looks like when all of its data has been rendered normally:
In contrast, here's what a dataset looks like when it has been downsampled and a lot of information has been lost:
So, when a dataset contains 1,000,000 points and a downsampling factor of 100 is applied, only 10,000 data points are processed. The loss of information is huge, and applications that depend on live data streaming, such as real-time monitoring in medical visualization or racing telemetry systems, will display charts and analyses with incomplete information.
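The numbers above can be reproduced with a minimal sketch of the "keep every 100th point" approach. The signal here is synthetic and the single-sample spike is a made-up event, chosen only to show how a short feature can vanish entirely after downsampling.

```python
def decimate(samples, factor):
    """Naive downsampling: keep every `factor`-th sample, drop the rest."""
    return samples[::factor]

# Synthetic signal: 1,000,000 flat samples with one short spike.
samples = [0.0] * 1_000_000
samples[123_456] = 5.0  # hypothetical event lasting a single sample

reduced = decimate(samples, 100)

print(len(reduced))   # 10000
print(max(samples))   # 5.0
print(max(reduced))   # 0.0 (the spike fell between the kept samples)
```

The downsampled series is 100 times smaller, but it no longer contains the one value that mattered.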
Basic use cases may tolerate downsampling, but we're talking about high-performance data visualization for industries such as medical visualization, motorsports, telecommunications, and Fintech, which need to process all the incoming data. In these cases, the real-world application is more demanding, and simple solutions, such as open-source libraries or simple data visualization tools, may not get the job done.
Looking at the Reddit user's real-world scenario, we can see that the user was struggling to find a solution that could visualize billions of data points without downsampling and without giving up features such as zooming and panning. How can this problem be solved?
Charting Libraries
A charting library is a comprehensive collection of charts that developers can integrate into applications. High-performance charting libraries are built to process millions or billions of data points while rendering the data into easy-to-understand visualizations or dashboards.
Creating visualizations and dashboards supports an organization's decision-making process and also allows live streaming of data, process monitoring, and analysis. Depending on the industry and the level of insight required, the use of data visualization will vary.
In recent years, the demand for data visualization tools (charting libraries, BI tools, infographics) has increased as a consequence of the constant growth of data generated from multiple sources and the valuable information that professionals can get from their data.
Charting libraries, for instance, require another level of expertise, as only developers can integrate them into applications, but the resulting visualizations can be consumed by whichever end-users they are aimed at.
Here's an example of an interactive Electrocardiogram for JS:
JS Electrocardiogram (ECG/EKG)
A JS chart can be displayed in web and mobile applications, so cross-platform compatibility makes a JS charting library a powerful visualization tool for almost all devices. Some key characteristics of a high-performance JS charting library are:
➡️ Rendering technology. WebGL is an advanced rendering technology for visualizing 2D and 3D charts in compatible browsers, and it runs on the GPU.
Although various libraries feature WebGL technology, unlocking the full potential of visualizations still depends on other variables such as resource consumption and algorithms.
➡️ GPU acceleration. Charting components featuring GPU acceleration are more efficient because the graphics processing unit (GPU) works alongside the device's CPU, speeding up processing by offloading tasks to separate processing units. In practice, GPU-accelerated visualizations are processed faster, which is one of the secrets of high-performance charting controls.
➡️ Library collection. A charting library needs to feature a wide collection of visualizations, interactive examples, and visualization types for dedicated purposes or industries. For example, LightningChart JS features more than 100 interactive visualization examples.
➡️ Algorithms. As mentioned before, high-end technologies such as WebGL rendering and GPU acceleration don't necessarily guarantee high performance without intelligent algorithms.
An intelligent algorithm involves innovation and combines different techniques to solve data visualization challenges more efficiently than currently available technologies.
➡️ CPU & resource consumption. In data visualization, average charting libraries tend to have extremely high CPU consumption. In such cases, the device's resources are tied up in a single process, leaving none available for other tasks.
For instance, with extremely high resource consumption the user cannot interact with the chart, and the system will most likely freeze. This is a serious issue for industries and end-users that need fast responses from their applications.
An application will always benefit from charting components that consume minimal CPU resources and do not get in the way of other tasks.
➡️ Frames Per Second (FPS)/Refresh rate. Why is the refresh rate an important factor in data visualization? In graphics processing generally, the refresh rate determines how smoothly the graphics are displayed.
No one enjoys low-quality visualization experiences, and it is recommended that charting libraries render at 40 FPS as a minimum.
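One way to read a frame rate target is as a time budget: everything the chart does for one frame has to fit inside it. The helper below is just this arithmetic; the 10 and 60 FPS rows are included only for comparison and are not recommendations from any particular library.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame at the given frame rate."""
    return 1000.0 / fps

for fps in (10, 40, 60):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms per frame")
# 10 FPS -> 100.0 ms per frame
# 40 FPS -> 25.0 ms per frame
# 60 FPS -> 16.7 ms per frame
```

At the 40 FPS minimum, data processing and drawing together get 25 ms per frame, which is why inefficient data handling shows up so quickly as stutter.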
➡️ Load-up speed. Similar to how high-performance cars measure their acceleration from 0 to 100 km/h in just a couple of seconds, charting components measure their rendering time, but in milliseconds.
The load-up speed therefore measures how many milliseconds it takes, once the rendering process has started, for a chart to be fully visible to the user. The faster the load-up time, the better the user experience.
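As a rough illustration of how such a measurement could be taken, the sketch below times a single call with a monotonic clock. `fake_render` is a hypothetical stand-in for a chart's first draw, not a real charting API; in practice you would time from the start of rendering until the chart reports that it is fully visible.

```python
import time

def measure_load_ms(render) -> float:
    """Wall-clock milliseconds taken by one call to `render`."""
    start = time.perf_counter()
    render()
    return (time.perf_counter() - start) * 1000.0

def fake_render():
    # Hypothetical stand-in for a chart's first draw.
    sum(i * i for i in range(100_000))

elapsed = measure_load_ms(fake_render)
print(f"load-up: {elapsed:.2f} ms")
```

`time.perf_counter` is used because it is monotonic and has the highest available resolution, which matters for measurements in the millisecond range.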
➡️ The maximum amount of data visualized. Simply put, how many data points can a data visualization feature or render before the application crashes? The visualization capacity will vary from technology to technology or by chart type.
For instance, the maximum amounts of data that the highest-performing JS Surface charts can visualize on a standard low-end device are:
|Chart type||Max. amount of data (low-end device)|
|Static Surface||144 million|
|Refreshing Surface||16 million|
➡️ Incoming data points. This parameter only applies to certain types of charts and represents the amount of incoming data per second that the chart can handle. It is particularly important when running demanding applications: if the chart cannot take in all the incoming data, the application will run out of memory or crash.
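To see why the incoming rate matters for memory, here is a small estimate of how long an application could buffer a stream before exhausting a fixed budget. All three inputs (10 million points per second, 4 bytes per point, 8 GiB of memory) are assumptions chosen for illustration, not measurements of any library.

```python
def seconds_until_full(points_per_second: int, bytes_per_point: int,
                       memory_budget_bytes: int) -> float:
    """How long a stream can be buffered before the memory budget is used up."""
    return memory_budget_bytes / (points_per_second * bytes_per_point)

rate = 10_000_000        # assumed incoming points per second
bytes_per_point = 4      # assumed: one 32-bit float per sample
budget = 8 * 1024**3     # assumed: 8 GiB available to the application

t = seconds_until_full(rate, bytes_per_point, budget)
print(f"{t:.0f} s (~{t / 60:.1f} minutes)")   # 215 s (~3.6 minutes)
```

Under these assumptions the buffer fills in under four minutes, so how a chart stores and discards incoming samples is as important as how fast it draws them.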
You can take the parameters of LightningChart JS' Appending Surface charts as a good reference of high-performance:
|Chart type||Incoming data points per second|
That being said, the performance of a JS chart will always depend on the chart type (Surface, heatmaps, line charts, etc.), the JS charting library, and the device's characteristics.
.NET Charting Libraries
Similar to a JS charting library, a .NET charting library contains an extensive collection of chart types and visualizations, but it is written to run in applications developed with .NET. Developing applications with .NET also harnesses the power of cross-platform compatibility.
According to Microsoft's official "What is .NET?" page, a .NET application can be written in C#, F#, or Visual Basic. There are three different .NET implementations:
- .NET, for cross-platform development of websites, servers, and console applications.
- .NET Framework, which supports websites, services, and desktop applications on Windows.
- Xamarin, for running applications on different mobile operating systems.
Here's an example of a .NET 3D spectrogram commonly used in the audio engineering industry:
For data visualization, there are several .NET charting libraries, both open-source and commercial, that focus on delivering charting components that can be integrated into .NET applications. Some key characteristics of a high-performance .NET charting library are:
➡️ GPU acceleration. Regardless of the programming language, a high-performance data visualization library should always aim to make the most of the device's resources and deliver quickly rendered visualizations. Some of the benefits of GPU acceleration are:
- real-time monitoring applications
- high-resolution visualizations
- smooth interactivity without rendering delays or flickering
- efficient management of the device's resources (⚠️important for developers working with high-performance applications⚠️)
➡️ Fallback rendering. In order to always deliver the highest performing data visualizations, a high-performance charting library must have fallback rendering availability. For instance, LightningChart .NET uses DirectX11/DirectX9 WARP software rendering whenever GPU rendering is not available.
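The general idea of fallback rendering can be sketched as trying backends in order of preference and using the first one that works. The backend names and availability probes below are hypothetical placeholders; this is not how LightningChart .NET selects its renderer, only an illustration of the pattern.

```python
def pick_renderer(candidates):
    """Return the name of the first backend whose availability probe succeeds."""
    for name, is_available in candidates:
        if is_available():
            return name
    raise RuntimeError("no rendering backend available")

# Hypothetical probes; a real library would query the graphics driver.
candidates = [
    ("gpu-hardware", lambda: False),        # pretend there is no usable GPU
    ("software-fallback", lambda: True),    # software rendering always works here
]

chosen = pick_renderer(candidates)
print(chosen)  # software-fallback
```

Ordering the list from fastest to most compatible means the chart still renders on machines without a suitable GPU, just more slowly.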
➡️ Library collection & functionality. Data visualization libraries should cover a variety of both 2D and 3D charts that can be implemented in any demanding industry.
Commonly, the examples included in these libraries are XY charts, spectrograms, heatmaps, line charts, surface charts, scatter charts, medical visualizations, Smith charts, polar charts, etc. As a reference, highly detailed .NET plotting libraries have at least 100 visualization examples.
➡️ .NET compatibility. A high-performance charting library can be written primarily to support the .NET Framework and additionally support other .NET versions such as .NET Core 3.0, .NET 5, and .NET 6.
➡️ UI features & interactivity. Interactive visualizations shape how end-users engage with their data, their analyses, and ultimately their knowledge. Often, charting libraries (both open-source and commercial) struggle to maintain performance while delivering interactivity.
High-performance data visualization solutions combine both: interactivity is delivered without sacrificing performance. As a good reference, a chart should have full mouse interaction, touchscreen support, and, if possible, customizable mouse functions.
➡️ The maximum amount of data visualized. Similar to JS charts, the maximum amount of data that can be rendered in .NET charts will depend on its type, the charting library, and the resources available. For instance, the highly-advanced SampleDataBlockSeries is a line series visualization that visualizes up to 16 billion data points.
The SampleDataBlockSeries works by storing data in memory blocks: new incoming data is stored in additional blocks. The result is much lower consumption of memory and CPU resources.
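To make the block idea more concrete, here is a minimal sketch of an append-only store that keeps samples in fixed-size blocks. It is an illustration of the general technique only, not LightningChart's implementation: the class name, method names, and block size are all made up, and samples are assumed to be 32-bit floats.

```python
from array import array

class BlockStore:
    """Append-only sample store built from fixed-size blocks (illustrative only)."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks = []   # each block is an array of 32-bit floats
        self.count = 0

    def add_samples(self, samples):
        for value in samples:
            # Start a new block when the last one is full; older blocks are never copied.
            if not self.blocks or len(self.blocks[-1]) == self.block_size:
                self.blocks.append(array("f"))
            self.blocks[-1].append(value)
            self.count += 1

    def __getitem__(self, index: int) -> float:
        if not 0 <= index < self.count:
            raise IndexError(index)
        block, offset = divmod(index, self.block_size)
        return self.blocks[block][offset]

store = BlockStore(block_size=4)
store.add_samples([0.5, 1.5, 2.5, 3.5, 4.5])
print(store.count, len(store.blocks), store[4])  # 5 2 4.5
```

The design point is that appending never moves existing data: a single contiguous buffer has to be reallocated and copied as it grows, while a list of blocks only ever allocates one more small block.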
The types of applications that benefit from such advanced algorithms in their visualizations include, for instance, medical monitoring applications, vibration monitoring, and data logger systems.
This type of advanced visualizations and high-performance charting libraries would have been the solution for the Reddit user who needed a massive visualization capacity without compromising performance!
The SampleDataBlockSeries is the highest-performing line series visualization ever made to handle real-time applications built to process an extremely high number of data points with the least CPU and memory consumption.
Developed by LightningChart .NET, the SampleDataBlockSeries (SDBS) allows adding new samples by using the AddSamples method:
// Add samples to the end.
Another useful property of the SampleDataBlockSeries is PointCount, which returns the current total number of data samples in the line series:
// Get the total number of samples.
int samplesCount = _chart.ViewXY.SampleDataBlockSeries.PointCount;
Here's an example of the SampleDataBlockSeries line series for .NET that showcases 16 billion data points:
LightningChart® Data Visualization Library for Desktop, Web & Mobile Application Development
A good charting component may use WebGL and even GPU-accelerated charts but still not be high-performance. Why is that?
Mainly, it is the development of intelligent algorithms aimed at the most difficult engineering problems that determines how effectively a library uses world-class technologies such as GPU acceleration and DirectX 11 and DirectX 9 support.
Both LightningChart .NET and LightningChart JS have innovated in the development of high-performance algorithms that cover an extensive number of applications in scientific, engineering, industrial, motorsports, and telecommunications fields, among others.