SciChart® the market leader in Fast WPF Charts, WPF 3D Charts, and iOS Chart & Android Chart Components
Hi
I have experienced some performance issues when accessing the YMin and YMax properties of data series containing lots of NaNs.
I have a chart with the following setup:
XAxis = DateTime / YAxis = Numeric
1 FastLineRenderableSeries series / XyDataSeries<DateTime, double>
I have prefilled the data series with 1 million data points. Some of the values are randomly set to NaN.
If 1/2 of the Y values are NaNs, accessing the YMin or YMax property takes ~117 ms.
If 1/10 of the values are NaNs, it “only” takes ~27 ms.
…and if no NaNs are added to the data series, it takes ~3 ms.
This is a huge problem for us, as we have large data series containing lots of NaNs, and we must access YMin/YMax each time data is added in order to do special scale fitting. That is normally once per second – but with up to 20 data series (117 ms × 2 × 20) it takes over 4 seconds.
If I have 10 million data points it takes 10 times as long – so it looks like the min/max is recalculated on every access.
/Flemming
Hi Flemming,
You always ask good questions! 🙂
DataSeries.YMax/YMin is implemented as follows:
/// <summary>
/// Gets the computed YRange, which is an <see cref="IRange"/> wrapping the YMin and YMax properties
/// </summary>
/// <remarks>Note: the performance implication of calling this getter is that the DataSeries will perform a full recalculation on each get. It is recommended to get once and cache if this property is needed more than once</remarks>
public virtual IRange YRange
{
    get
    {
        lock (SyncRoot)
        {
            TY min, max;
            _yColumn.GetMinMax(out min, out max);
            return RangeFactory.NewRange(min, max);
        }
    }
}
/// <summary>
/// Gets the computed Minimum value in Y for this series
/// </summary>
public IComparable YMin { get { return YRange.Min; } }
/// <summary>
/// Gets the computed Maximum value in Y for this series
/// </summary>
public IComparable YMax { get { return YRange.Max; } }
Notice the remarks on YRange getter.
Now _yColumn.GetMinMax is a highly optimised routine using templates and unsafe code. However, if NaNs are present in the array, we have to do extra work to determine the min and max, testing each element for NaN first. As a result, it is slower.
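To illustrate the cost (a minimal sketch, not SciChart's actual GetMinMax implementation): Math.Min and Math.Max propagate NaN, so a naive scan over data containing NaNs would return NaN for both bounds. The NaN-safe variant must test every element with double.IsNaN first, which is the extra per-element work described above.

```csharp
using System;

static class MinMaxScan
{
    // Fast path: assumes no NaNs. Note Math.Min(NaN, x) == NaN, so a single
    // NaN in the input would poison the result of this version.
    public static (double Min, double Max) GetMinMax(double[] values)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (double v in values)
        {
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }
        return (min, max);
    }

    // NaN-aware path: every element is tested first, which is the extra
    // per-element cost when NaNs may be present.
    public static (double Min, double Max) GetMinMaxNaNAware(double[] values)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (double v in values)
        {
            if (double.IsNaN(v)) continue;  // skip NaNs entirely
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }
        return (min, max);
    }
}
```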
How many data-points are we talking about? I’m guessing a lot.
Why are you calling YMin/YMax (to what end), and is there another way you can avoid calling these so often? (For example, if you append one point, you don’t need to call YMin/YMax – you can just test the new point against the existing min/max.)
Best regards,
Andrew
Update:
We used to have an implementation in an older version of SciChart where YMin and YMax were computed whenever the series was updated. I think you could reproduce it with something like this:
public class XyDataSeriesCalculateOnAppend<TX> : XyDataSeries<TX, double>
    where TX : IComparable
{
    private DoubleRange _yrange;

    public XyDataSeriesCalculateOnAppend()
    {
        ResetYRange();
    }

    private void ResetYRange()
    {
        _yrange = new DoubleRange(double.MaxValue, double.MinValue);
    }

    public override IRange YRange => _yrange;

    public override void Append(TX x, double y)
    {
        // Skip NaNs: Math.Min/Math.Max would otherwise propagate NaN into the range
        if (!double.IsNaN(y))
        {
            _yrange.Min = Math.Min(_yrange.Min, y);
            _yrange.Max = Math.Max(_yrange.Max, y);
        }
        base.Append(x, y);
    }

    protected override void ClearColumns()
    {
        ResetYRange();
        base.ClearColumns();
    }
}
Notice you would need to override the methods that you use, such as Append, Insert, Remove and Clear, and reset or update the YRange accordingly. If you do the calculation on append or update, you take a small performance hit when updating the series, but the getters for YMin and YMax become instantaneous.
If you’re appending to your data series on a background thread, don’t forget to lock(SyncRoot) around critical operations:
public class XyDataSeriesCalculateOnAppend<TX> : XyDataSeries<TX, double>
    where TX : IComparable
{
    private DoubleRange _yrange;

    public XyDataSeriesCalculateOnAppend()
    {
        ResetYRange();
    }

    private void ResetYRange()
    {
        _yrange = new DoubleRange(double.MaxValue, double.MinValue);
    }

    public override IRange YRange
    {
        get
        {
            lock (SyncRoot)
            {
                return _yrange;
            }
        }
    }

    public override void Append(TX x, double y)
    {
        lock (SyncRoot)
        {
            // Skip NaNs so they cannot poison the cached range
            if (!double.IsNaN(y))
            {
                _yrange.Min = Math.Min(_yrange.Min, y);
                _yrange.Max = Math.Max(_yrange.Max, y);
            }
            base.Append(x, y);
        }
    }

    protected override void ClearColumns()
    {
        lock (SyncRoot)
        {
            ResetYRange();
            base.ClearColumns();
        }
    }
}
Let me know if this helps!
Best regards,
Andrew
Hi Andrew
Thank you for your answer.
In our real-world scenario, we log data for scientific experiments, typically once per second. Some experiments can take weeks and we want to be able to view at least a few days of data. Some graphs have up to 20 data series – so for 5 days of data logging we will have 60 × 60 × 24 × 5 × 20 = 8,640,000 data points – and we don’t know how many NaNs an experiment will generate.
I know that there are several workarounds – as you mention, we can track whether new data affects the min/max values. It gets a little more complicated when we remove data again, as the min/max will in some cases have to be recalculated.
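The removal case can be kept cheap in the common case: a full rescan is only needed when the removed value sat on the cached min or max. A self-contained sketch of that idea (illustrative only, not SciChart code – a real implementation would live in an XyDataSeries subclass like the one Andrew shows):

```csharp
using System;
using System.Collections.Generic;

// Cached min/max with removal support: removing an interior value cannot
// change the range, so the O(n) rescan only runs when the removed value
// touches the cached bounds.
class CachedRangeList
{
    private readonly List<double> _values = new List<double>();
    public double Min { get; private set; } = double.MaxValue;
    public double Max { get; private set; } = double.MinValue;

    public void Add(double y)
    {
        _values.Add(y);
        if (double.IsNaN(y)) return;        // NaNs never affect the range
        if (y < Min) Min = y;
        if (y > Max) Max = y;
    }

    public void RemoveAt(int index)
    {
        double removed = _values[index];
        _values.RemoveAt(index);
        // Only a value sitting on the cached min/max can invalidate it.
        if (!double.IsNaN(removed) && (removed <= Min || removed >= Max))
            Rescan();
    }

    private void Rescan()
    {
        Min = double.MaxValue;
        Max = double.MinValue;
        foreach (double v in _values)
        {
            if (double.IsNaN(v)) continue;
            if (v < Min) Min = v;
            if (v > Max) Max = v;
        }
    }
}
```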
But what I was hoping for was that you had some super-optimized algorithm that took everything into account and only did the recalculation when necessary 😉
I can also remove consecutive NaNs – keeping at least one so the gaps are preserved – but the effect will depend a bit on the distribution of the NaNs.
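That thinning idea could look something like this (a sketch of the approach described above, not SciChart API): collapse each run of consecutive NaNs to a single NaN, so line gaps survive while the NaN count drops.

```csharp
using System;
using System.Collections.Generic;

static class NaNThinning
{
    // Collapses runs of consecutive NaNs to a single NaN. One NaN per run is
    // kept so FastLineRenderableSeries still draws a gap at that position.
    public static List<double> CollapseNaNRuns(IEnumerable<double> values)
    {
        var result = new List<double>();
        bool previousWasNaN = false;
        foreach (double v in values)
        {
            bool isNaN = double.IsNaN(v);
            if (!(isNaN && previousWasNaN))  // drop 2nd and later NaNs in a run
                result.Add(v);
            previousWasNaN = isNaN;
        }
        return result;
    }
}
```

As noted, the savings depend on the NaN distribution: long runs compress well, while alternating value/NaN data is barely reduced.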
I want to use YMin/YMax in order to do my own autoscale calculations. I have also tried implementing a custom viewport manager (deriving from DefaultViewportManager) and overriding OnCalculateNewYRange. When calling CalculateYRange(renderPassInfo) on the Y axis, the same performance problem appears.
I am not sure I understand why the performance penalty is so severe – don’t you just have to skip any NaNs in your calculations?
Update:
I see your new response and will give it a try 🙂
/Flemming
Hi Andrew
You said: “To skip a NaN you have to test for NaN using double.IsNaN(d). When you call this 10,000,000 times it is a bottleneck.”
– you are right, that will take some time 🙂 – but what I do not understand is why the time varies with the amount of NaNs, if all Y values are tested anyway?
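(One plausible explanation – an assumption, not confirmed anywhere in this thread – is branch prediction: a per-element double.IsNaN branch that is almost always true or almost always false is predicted nearly perfectly by the CPU, while a ~50/50 mix causes constant mispredictions. That would also fit the later observation that ~5% and ~95% NaN densities behave acceptably while 50% does not. A sketch of a variant that avoids the data-dependent skip:)

```csharp
using System;

static class BranchlessScan
{
    // Instead of skipping NaNs with a branch, substitute values that cannot
    // win the comparisons. The JIT can typically compile these ternaries to
    // conditional moves, so the loop has no hard-to-predict branch whose
    // outcome depends on the NaN density of the data.
    public static (double Min, double Max) GetMinMax(double[] values)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (double v in values)
        {
            bool nan = double.IsNaN(v);
            double lo = nan ? double.MaxValue : v;  // NaN can never lower the min
            double hi = nan ? double.MinValue : v;  // NaN can never raise the max
            if (lo < min) min = lo;
            if (hi > max) max = hi;
        }
        return (min, max);
    }
}
```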
I will try doing the min/max the way you have described 🙂
But I might have been too focused on the time it took to access the YMin/YMax properties as being responsible for the performance problems. I have since removed all code related to this issue and am still having major performance issues when zooming and panning the chart.
I have a chart with the following setup:
XAxis = DateTime / YAxis = Numeric
4 FastLineRenderableSeries series / XyDataSeries<DateTime, double>
I have prefilled each data series with 1 million data points. Some of the values are randomly set to NaN. No data is added in realtime.
If 50% of the Y values are NaNs, the UI gets very unresponsive.
If fewer than ~5% or more than ~95% of the Y values are NaNs, the UI response times are acceptable – which is a bit strange.
Again, with no NaNs the chart works great.
I have attached a test project.
/Flemming
Hi Andrew
Did you have a chance to look at this issue?
/Flemming
Update:
After deeper investigation of the provided sample we have found a solution. Please see the blog post below, with a YouTube video showing how we solved it!
Performance debugging: Improving the speed of charts when many NaN
Best regards,
Andrew