Svelte Correlation Matrix
This is an experiment in data density and client-side performance.
It renders a vast, hierarchically clustered correlation matrix of time series data. All statistical computations, including time series detrending, correlation coefficient computation, and hierarchical clustering, are performed in a web worker on the client. Although the data is currently simulated, all computations on it are real. The experiment uses 2D virtualization to render only the currently visible matrix cells to the DOM, which keeps the DOM small.
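2D virtualization comes down to computing which row and column indices intersect the viewport and rendering only those cells. The following is a minimal sketch that assumes uniform square cells. The names `visibleRange`, `visibleCells`, `cellSize`, and `overscan` are hypothetical and are not taken from the experiment's code.

```typescript
// Hypothetical sketch: which matrix cells intersect the viewport,
// assuming every cell is a square of the same size in pixels.
interface IndexRange {
  start: number; // first index to render (inclusive)
  end: number; // last index to render (exclusive)
}

function visibleRange(
  scrollOffset: number, // scrollTop for rows, scrollLeft for columns
  viewportSize: number, // clientHeight for rows, clientWidth for columns
  cellSize: number,
  count: number, // number of rows or columns in the matrix
  overscan = 2, // extra cells on each side so fast scrolling shows no blank edges
): IndexRange {
  const first = Math.floor(scrollOffset / cellSize);
  const last = Math.ceil((scrollOffset + viewportSize) / cellSize);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(count, last + overscan),
  };
}

// All (row, column) pairs that should currently exist in the DOM.
function visibleCells(rows: IndexRange, cols: IndexRange): Array<[number, number]> {
  const cells: Array<[number, number]> = [];
  for (let r = rows.start; r < rows.end; r++) {
    for (let c = cols.start; c < cols.end; c++) {
      cells.push([r, c]);
    }
  }
  return cells;
}

// A 1000 x 1000 matrix with 20px cells, shown in an 800 x 600 viewport.
const rows = visibleRange(2000, 600, 20, 1000);
const cols = visibleRange(0, 800, 20, 1000);
console.log(visibleCells(rows, cols).length); // 1428 of 1,000,000 cells
```

Because the number of rendered cells depends on the viewport size and not on the matrix size, the DOM stays roughly constant in size as the matrix grows.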
Correlation
The Pearson correlation coefficient is used to measure the correlation. It ranges from -1 to +1. A value of -1 indicates a perfect negative linear correlation, i.e. the two time series always move in opposite directions. A value of +1 indicates a perfect positive linear correlation, i.e. the two time series always move together. A value of 0 indicates no linear correlation.
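For n paired values, the coefficient is r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The following is a minimal sketch of this textbook formula, assuming two complete arrays of equal length. The function name `pearson` is hypothetical, and the experiment's own implementation may compute the same value differently.

```typescript
// Pearson correlation coefficient of two equally long numeric arrays.
// Returns NaN if the lengths differ, there are fewer than two points,
// or either series has zero variance (the coefficient is then undefined).
function pearson(x: number[], y: number[]): number {
  const n = x.length;
  if (n !== y.length || n < 2) return NaN;
  let meanX = 0;
  let meanY = 0;
  for (let i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;
  let cov = 0; // sum of products of deviations
  let varX = 0; // sum of squared deviations of x
  let varY = 0; // sum of squared deviations of y
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  if (varX === 0 || varY === 0) return NaN;
  return cov / Math.sqrt(varX * varY);
}

console.log(pearson([1, 2, 3, 4], [2, 4, 6, 8])); // 1
console.log(pearson([1, 2, 3, 4], [8, 6, 4, 2])); // -1
```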
The displayed values +0.00 and -0.00 preserve the sign of values that round to zero at two decimal places: +0.00 stands for a positive value below 0.005, and -0.00 for a negative value above -0.005.
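A formatter that keeps the sign after rounding can be sketched as follows. The name `formatCorrelation` is hypothetical, and so is the choice to print an exact zero without a sign, because the text does not say how exact zero is displayed.

```typescript
// Format a correlation coefficient with two decimals and an explicit sign,
// so values that round to zero still show their direction (+0.00 / -0.00).
function formatCorrelation(r: number): string {
  const digits = Math.abs(r).toFixed(2); // round the magnitude only
  if (r > 0) return `+${digits}`;
  if (r < 0) return `-${digits}`;
  return digits; // exactly zero: no sign (assumption)
}

console.log(formatCorrelation(0.004)); // "+0.00"
console.log(formatCorrelation(-0.004)); // "-0.00"
console.log(formatCorrelation(-0.25)); // "-0.25"
```

Rounding the absolute value and prepending the sign avoids relying on how `toFixed` treats negative inputs.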
Correlation Matrix
Some of the features described below are only available in the full-screen UI of the matrix experiment, not in the preview on the left.
Hovering over a matrix cell displays its rolling correlation in the chart below. Clicking a matrix cell keeps it selected and makes its rolling correlation chart interactive. To deselect the cell, click it again, double-click anywhere in the matrix, or press Esc.
The detrend data setting removes trends from the time series data via first-differencing, i.e. each value is replaced by its difference from the previous value.
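First-differencing computes dₜ = xₜ − xₜ₋₁. The following minimal sketch assumes that missing values are represented as `null`; the function name `firstDifference` is hypothetical. The first element of the result is always missing, which is the one missing leading data point per time series that detrending introduces.

```typescript
// First-differencing: d[t] = x[t] - x[t - 1].
// Missing values are null (an assumption). A difference is missing whenever
// either of its two inputs is missing; d[0] is missing because x[0] has no
// predecessor.
function firstDifference(x: Array<number | null>): Array<number | null> {
  return x.map((value, t) => {
    if (t === 0) return null;
    const prev = x[t - 1] ?? null;
    return value === null || prev === null ? null : value - prev;
  });
}

console.log(firstDifference([10, 12, 15, null, 20, 19]));
// → [null, 2, 3, null, null, -1]
```

Note that a single missing value in the input produces two missing differences (at the gap and right after it).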
The rows and columns of the matrix (i.e. the time series) can be ordered in ascending or descending order by hierarchical clustering (Cluster), by the average correlation of each time series with all other time series (Avg.Corr), or by the variable's name (Name).
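The Avg.Corr and Cluster orderings can be sketched as functions that take a symmetric correlation matrix and return a permutation of the time series indices. The experiment itself uses ml-hclust for clustering, and it does not state its linkage method or distance measure. This sketch instead uses a naive, from-scratch average-linkage agglomerative clustering on the distance 1 − r. All names here are hypothetical.

```typescript
// corr[i][j] is the correlation of time series i and j (symmetric matrix).

// Avg.Corr: order series by their mean correlation with all *other* series.
function averageCorrelationOrder(corr: number[][], descending = true): number[] {
  const n = corr.length;
  const avg = corr.map((row, i) => {
    let sum = 0;
    for (let j = 0; j < n; j++) if (j !== i) sum += row[j];
    return n > 1 ? sum / (n - 1) : 0;
  });
  const order = avg.map((_, i) => i);
  order.sort((a, b) => (descending ? avg[b] - avg[a] : avg[a] - avg[b]));
  return order;
}

// Cluster: naive agglomerative clustering with average linkage on the
// distance 1 - r. Repeatedly merges the two closest clusters and returns the
// leaf order, which places strongly correlated series next to each other.
// Roughly O(n^3): a stand-in for a library such as ml-hclust, not a replacement.
function clusterOrder(corr: number[][]): number[] {
  let clusters: number[][] = corr.map((_, i) => [i]);
  while (clusters.length > 1) {
    let best = Infinity;
    let bestA = 0;
    let bestB = 1;
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        // Average linkage: mean distance over all cross-cluster pairs.
        let sum = 0;
        for (const i of clusters[a]) for (const j of clusters[b]) sum += 1 - corr[i][j];
        const d = sum / (clusters[a].length * clusters[b].length);
        if (d < best) {
          best = d;
          bestA = a;
          bestB = b;
        }
      }
    }
    const merged = [...clusters[bestA], ...clusters[bestB]];
    clusters = clusters.filter((_, k) => k !== bestA && k !== bestB);
    clusters.push(merged);
  }
  return clusters[0] ?? [];
}

// Series 0 and 2 move together, as do series 1 and 3.
const corr = [
  [1.0, 0.1, 0.9, 0.0],
  [0.1, 1.0, 0.2, 0.8],
  [0.9, 0.2, 1.0, 0.1],
  [0.0, 0.8, 0.1, 1.0],
];
console.log(clusterOrder(corr)); // [0, 2, 1, 3]
console.log(averageCorrelationOrder(corr)); // [2, 1, 0, 3]
```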
To navigate the matrix quickly, press the keys 1, 2, 3, or 4 to jump to the edges of the corresponding quadrants.
Minimap
On laptop screens and larger, the minimap shows the entire correlation matrix at once, which makes patterns across the whole matrix easier to spot. It is particularly useful when the matrix is ordered by hierarchical clustering or average correlation. The minimap supports click-and-drag navigation, and its white overlay rectangle marks the currently visible region of the matrix.
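The overlay rectangle and the click-and-drag navigation both come down to a linear scaling between matrix pixels and minimap pixels. The following minimal sketch assumes a square matrix drawn into a square minimap. The names `overlayRect` and `scrollFromMinimap` are hypothetical, and so is the choice to center the viewport on the pointer while dragging.

```typescript
interface OverlayRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Map the visible part of the matrix (in matrix pixels) to the overlay
// rectangle on the minimap (in minimap pixels).
function overlayRect(
  scrollLeft: number,
  scrollTop: number,
  viewportWidth: number,
  viewportHeight: number,
  matrixSize: number, // full matrix width = height, in pixels
  minimapSize: number, // minimap width = height, in pixels
): OverlayRect {
  const scale = minimapSize / matrixSize;
  return {
    x: scrollLeft * scale,
    y: scrollTop * scale,
    width: Math.min(viewportWidth, matrixSize) * scale,
    height: Math.min(viewportHeight, matrixSize) * scale,
  };
}

// While dragging: center the viewport on the pointer along one axis,
// clamped so the viewport never scrolls past the matrix edges.
function scrollFromMinimap(
  pointer: number, // pointer x (or y) in minimap pixels
  viewportSize: number,
  matrixSize: number,
  minimapSize: number,
): number {
  const target = (pointer / minimapSize) * matrixSize - viewportSize / 2;
  return Math.max(0, Math.min(matrixSize - viewportSize, target));
}

console.log(overlayRect(2000, 4000, 800, 600, 20000, 200));
console.log(scrollFromMinimap(100, 800, 20000, 200)); // 9600
```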
Rolling Correlation Chart
The chart visualizes two time series and how the correlation between them changes over time. The rolling correlation is shown as a channel of three lines: the correlation coefficient in the middle, the upper bound of the confidence interval above it, and the lower bound of the confidence interval below it. To make correlations easy to compare, the y-axis of the correlation channel always spans -1 to +1, even if the channel never reaches these values. For every pair of time series, the time series data and the correlation channel are plotted on the same full time axis, even where the time series have missing data.
If a matrix cell is selected by click or touch, the rolling correlation chart becomes interactive. When the chart is hovered, a legend appears in the top left corner, together with a blue overlay rectangle that ends at the cursor's horizontal position. A vertical blue line at the rectangle's right edge marks this intersection point.
The legend shows the time of the intersection point, the values of the two correlated time series at that time, and the rolling correlation over the selected window width together with its upper and lower confidence interval (CI) bounds at the selected CI level.
The blue overlay rectangle visualizes the width of the rolling window that slides along the two time series to compute the rolling correlation. Higher window widths result in a wider overlay rectangle, a smoother correlation channel, and narrower confidence bands. Higher values are better for detecting long-term changes in the correlation, lower values for detecting short-term changes.
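A rolling correlation can be sketched as a trailing window that slides one time point at a time and computes a Pearson coefficient over the pair-wise complete observations inside it. This minimal sketch makes several assumptions. Missing values are represented as `null`, the window is trailing rather than centered, and each window needs at least `minObs` pair-wise complete observations (a hypothetical threshold of 3). The source confirms none of these. The function names are also hypothetical.

```typescript
// Pearson correlation; null if undefined (fewer than two points or zero variance).
function pearson(x: number[], y: number[]): number | null {
  const n = x.length;
  if (n < 2) return null;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return varX === 0 || varY === 0 ? null : cov / Math.sqrt(varX * varY);
}

// Rolling correlation over a trailing window of `windowSize` time points.
// A result point is null (a gap) if the window is not yet full (leading gap),
// holds fewer than `minObs` pair-wise complete observations, or has zero
// variance in either series (intermediate gaps).
function rollingCorrelation(
  x: Array<number | null>,
  y: Array<number | null>,
  windowSize: number,
  minObs = 3, // hypothetical minimum number of observations per window
): Array<number | null> {
  const out: Array<number | null> = [];
  for (let t = 0; t < x.length; t++) {
    if (t < windowSize - 1) {
      out.push(null);
      continue;
    }
    const xs: number[] = [];
    const ys: number[] = [];
    for (let k = t - windowSize + 1; k <= t; k++) {
      const a = x[k];
      const b = y[k];
      if (a != null && b != null) {
        xs.push(a);
        ys.push(b);
      }
    }
    out.push(xs.length >= minObs ? pearson(xs, ys) : null);
  }
  return out;
}

console.log(rollingCorrelation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], 3));
// → [null, null, 1, 1, 1]
```

In this sketch, the first `windowSize - 1` points are always gaps, and each window is recomputed from scratch in O(n · w). Maintaining running sums as the window slides could make it faster.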
The CI level controls the vertical extent of the confidence interval band: higher values result in a wider band. If the upper and lower bounds of the confidence interval are both positive or both negative, the interval excludes zero. The rolling correlation is then statistically significant over the time range given by the window width, at the significance level that corresponds to the CI level (e.g. 0.05 for a 95% CI). If one bound is positive and the other negative, the rolling correlation is not statistically significant at that level. The confidence interval band is narrower for correlation coefficients closer to -1 and +1.
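The text does not say how the confidence interval is computed. A common method, and one consistent with the band narrowing near -1 and +1, is the Fisher z-transformation: z = atanh(r), standard error 1/√(n − 3), bounds tanh(z ± z_crit · SE). The following minimal sketch assumes this method. It hard-codes two-sided critical values for a few common CI levels, because the levels the experiment offers are not stated. All names are hypothetical.

```typescript
// Two-sided standard-normal critical values for a few common CI levels.
const Z_CRITICAL = new Map<number, number>([
  [0.8, 1.2816],
  [0.9, 1.6449],
  [0.95, 1.96],
  [0.99, 2.5758],
]);

interface Interval {
  lower: number;
  upper: number;
}

// Confidence interval for a Pearson r computed from n pair-wise complete
// observations, via the Fisher z-transformation (assumed method).
// Returns null for unsupported levels or when n <= 3 (SE undefined).
function correlationCI(r: number, n: number, level: number): Interval | null {
  const zCrit = Z_CRITICAL.get(level);
  if (zCrit === undefined || n <= 3) return null;
  const center = Math.atanh(r); // the distribution of atanh(r) is approximately normal
  const se = 1 / Math.sqrt(n - 3);
  return {
    lower: Math.tanh(center - zCrit * se),
    upper: Math.tanh(center + zCrit * se),
  };
}

// Significant at the level matching the CI iff the interval excludes zero.
function isSignificant(ci: Interval): boolean {
  return (ci.lower > 0 && ci.upper > 0) || (ci.lower < 0 && ci.upper < 0);
}

const weak = correlationCI(0.1, 30, 0.95);
const strong = correlationCI(0.9, 30, 0.95);
console.log(weak, weak && isSignificant(weak));
console.log(strong, strong && isSignificant(strong));
```

Because tanh compresses values near ±1, the same interval width on the z scale maps to a narrower band for strong correlations, which matches the behavior described above.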
The chart always displays the original, non-detrended time series data, but if detrend data is checked, the correlation channel is computed on the detrended data.
Leading gaps in the correlation channel are caused by the width of the rolling window and by detrending (one additional missing leading data point if detrend data is checked). Intermediate gaps are caused by missing time series data, or by time series data with too little variance within the window to compute the rolling correlation.
Status Bar
The status bar at the bottom displays various information, including the average correlation (avg. corr) and median correlation (med. corr) of all time series. The average and median correlation help to put the individual correlation coefficients in the matrix into context. Note that the number of correlated data points decreases slightly when detrend data is checked, because detrending produces one missing leading data point per time series.
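The following is a minimal sketch of the two summary statistics. It assumes that they are taken over every distinct pair of time series, meaning the upper triangle of the matrix without the diagonal of self-correlations, which are always 1. The text does not state this, and the name `summarize` is hypothetical.

```typescript
// Average and median over all distinct pairs (upper triangle, diagonal
// excluded), skipping undefined (non-finite) coefficients.
function summarize(corr: number[][]): { avg: number; median: number } {
  const values: number[] = [];
  for (let i = 0; i < corr.length; i++) {
    for (let j = i + 1; j < corr.length; j++) {
      if (Number.isFinite(corr[i][j])) values.push(corr[i][j]);
    }
  }
  if (values.length === 0) return { avg: NaN, median: NaN };
  values.sort((a, b) => a - b);
  const avg = values.reduce((s, v) => s + v, 0) / values.length;
  const mid = Math.floor(values.length / 2);
  const median =
    values.length % 2 === 1 ? values[mid] : (values[mid - 1] + values[mid]) / 2;
  return { avg, median };
}

console.log(
  summarize([
    [1, 0.5, -0.25],
    [0.5, 1, 0.75],
    [-0.25, 0.75, 1],
  ]),
); // { avg: 0.3333333333333333, median: 0.5 }
```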
The term pair-wise complete refers to pair-wise complete observations, i.e. points in time at which both correlated time series have a data point. If one or both time series lack a data point at a given time, the observation is not pair-wise complete. Trivially, only pair-wise complete observations can be correlated.
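Selecting pair-wise complete observations can be sketched as a filter that keeps only the time points where both series have a value. The following minimal sketch assumes that missing values are represented as `null`; the function name `pairwiseComplete` is hypothetical.

```typescript
// Keep only pair-wise complete observations: time points at which both
// series have a value. Missing values are null (an assumption).
function pairwiseComplete(
  x: Array<number | null>,
  y: Array<number | null>,
): { xs: number[]; ys: number[] } {
  const xs: number[] = [];
  const ys: number[] = [];
  const n = Math.min(x.length, y.length);
  for (let t = 0; t < n; t++) {
    const a = x[t];
    const b = y[t];
    if (a != null && b != null) {
      xs.push(a);
      ys.push(b);
    }
  }
  return { xs, ys };
}

const { xs, ys } = pairwiseComplete([1, null, 3, 4, null], [2, 5, null, 8, 9]);
console.log(xs, ys); // [1, 4] [2, 8]
```

Here `xs.length` is the number of pair-wise complete observations that can enter the correlation for this pair.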
Besides Svelte, the only dependency of this experiment is ml-hclust.