System.Collections.Immutable performance impact on Actor create / terminate #1766

Closed
Aaronontheweb opened this issue Mar 9, 2016 · 6 comments

@Aaronontheweb
Member

So the PRs implemented to resolve #1676 appear to have had a negative performance impact, as measured by NBench.

Before

Akka.Tests.Performance.Actor.ActorMemoryFootprintSpec+ReceiveActorMemoryFootprint

Measures the amount of memory used by 10,000 ReceiveActors
2/18/2016 7:31:02 PM

System Info

NBench=NBench, Version=0.1.5.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=4
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=4

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=13, MaximumRunTime=00:00:01

Data


Totals

| Metric | Units | Max | Average | Min | StdDev |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 28,008,832.00 | 26,902,120.00 | 25,250,416.00 | 843,641.85 |
| [Counter] ActorCreateThroughput | operations | 10,000.00 | 10,000.00 | 10,000.00 | 0.00 |

Per-second Totals

| Metric | Units / s | Max / s | Average / s | Min / s | StdDev / s |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 69,230,977.62 | 54,114,108.66 | 43,021,475.73 | 9,931,200.88 |
| [Counter] ActorCreateThroughput | operations | 24,717.55 | 20,062.96 | 16,390.97 | 3,260.75 |

Raw Data

TotalBytesAllocated

| Run # | bytes | bytes / s | ns / bytes |
|---|---|---|---|
| 1 | 27,785,784.00 | 65,412,083.32 | 15.29 |
| 2 | 26,527,208.00 | 43,496,367.60 | 22.99 |
| 3 | 26,171,392.00 | 43,021,475.73 | 23.24 |
| 4 | 27,783,040.00 | 65,285,911.84 | 15.32 |
| 5 | 26,294,304.00 | 43,098,902.10 | 23.20 |
| 6 | 26,784,064.00 | 51,933,288.84 | 19.26 |
| 7 | 25,250,416.00 | 56,717,526.66 | 17.63 |
| 8 | 27,330,832.00 | 45,403,462.19 | 22.02 |
| 9 | 26,174,512.00 | 49,001,863.51 | 20.41 |
| 10 | 28,008,832.00 | 69,230,977.62 | 14.44 |
| 11 | 26,903,016.00 | 51,317,600.50 | 19.49 |
| 12 | 26,706,744.00 | 50,842,175.48 | 19.67 |
| 13 | 28,007,416.00 | 68,721,777.24 | 14.55 |

[Counter] ActorCreateThroughput

| Run # | operations | operations / s | ns / operations |
|---|---|---|---|
| 1 | 10,000.00 | 23,541.56 | 42,478.06 |
| 2 | 10,000.00 | 16,396.89 | 60,987.18 |
| 3 | 10,000.00 | 16,438.36 | 60,833.32 |
| 4 | 10,000.00 | 23,498.48 | 42,555.95 |
| 5 | 10,000.00 | 16,390.97 | 61,009.22 |
| 6 | 10,000.00 | 19,389.62 | 51,573.98 |
| 7 | 10,000.00 | 22,462.02 | 44,519.60 |
| 8 | 10,000.00 | 16,612.54 | 60,195.48 |
| 9 | 10,000.00 | 18,721.21 | 53,415.34 |
| 10 | 10,000.00 | 24,717.55 | 40,457.08 |
| 11 | 10,000.00 | 19,075.04 | 52,424.54 |
| 12 | 10,000.00 | 19,037.20 | 52,528.72 |
| 13 | 10,000.00 | 24,536.99 | 40,754.79 |

After

Akka.Tests.Performance.Actor.ActorMemoryFootprintSpec+ReceiveActor_memory_footprint

Measures the amount of memory used by 10,000 ReceiveActors
3/8/2016 8:24:26 PM

System Info

NBench=NBench, Version=0.1.5.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=4
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=4

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=13, MaximumRunTime=00:00:01

Data


Totals

| Metric | Units | Max | Average | Min | StdDev |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 29,654,304.00 | 27,118,249.85 | 24,547,896.00 | 1,541,224.07 |
| [Counter] ActorCreateThroughput | operations | 10,000.00 | 10,000.00 | 10,000.00 | 0.00 |

Per-second Totals

| Metric | Units / s | Max / s | Average / s | Min / s | StdDev / s |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 42,763,484.96 | 40,006,147.21 | 33,907,384.29 | 2,906,162.54 |
| [Counter] ActorCreateThroughput | operations | 16,743.36 | 14,769.52 | 12,760.13 | 1,014.47 |

Raw Data

TotalBytesAllocated

| Run # | bytes | bytes / s | ns / bytes |
|---|---|---|---|
| 1 | 28,114,184.00 | 41,539,661.30 | 24.07 |
| 2 | 25,325,848.00 | 36,981,834.31 | 27.04 |
| 3 | 28,610,080.00 | 42,763,484.96 | 23.38 |
| 4 | 27,992,736.00 | 41,929,710.56 | 23.85 |
| 5 | 26,663,536.00 | 38,547,409.34 | 25.94 |
| 6 | 29,654,304.00 | 42,292,005.23 | 23.65 |
| 7 | 25,928,976.00 | 41,496,958.26 | 24.10 |
| 8 | 24,547,896.00 | 41,101,418.20 | 24.33 |
| 9 | 27,938,712.00 | 42,204,372.24 | 23.69 |
| 10 | 25,504,792.00 | 33,907,384.29 | 29.49 |
| 11 | 28,258,064.00 | 42,567,665.91 | 23.49 |
| 12 | 28,010,536.00 | 35,741,797.82 | 27.98 |
| 13 | 25,987,584.00 | 39,006,211.25 | 25.64 |

[Counter] ActorCreateThroughput

| Run # | operations | operations / s | ns / operations |
|---|---|---|---|
| 1 | 10,000.00 | 14,775.34 | 67,680.34 |
| 2 | 10,000.00 | 14,602.41 | 68,481.86 |
| 3 | 10,000.00 | 14,947.00 | 66,903.06 |
| 4 | 10,000.00 | 14,978.78 | 66,761.10 |
| 5 | 10,000.00 | 14,456.98 | 69,170.76 |
| 6 | 10,000.00 | 14,261.68 | 70,117.99 |
| 7 | 10,000.00 | 16,004.09 | 62,484.04 |
| 8 | 10,000.00 | 16,743.36 | 59,725.18 |
| 9 | 10,000.00 | 15,106.06 | 66,198.62 |
| 10 | 10,000.00 | 13,294.52 | 75,218.99 |
| 11 | 10,000.00 | 15,063.90 | 66,383.87 |
| 12 | 10,000.00 | 12,760.13 | 78,369.13 |
| 13 | 10,000.00 | 15,009.56 | 66,624.22 |

@Aaronontheweb
Member Author

The impact of this is pretty visible on PRs like #1765. Calling the correct shutdown method was going to add overhead no matter what (which is what that PR was for), but the 26% decrease in throughput for adding and terminating children is going to sting.
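
As a quick sanity check of the 26% figure, the average ActorCreateThroughput values from the Before and After "Per-second Totals" tables above can be compared directly. This is only a small Python sketch; the helper name `percent_decrease` is illustrative and is not part of NBench or Akka.NET.

```python
def percent_decrease(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, expressed as a percentage."""
    return (before - after) / before * 100.0


# Average ActorCreateThroughput (operations / s) taken from the reports above.
BEFORE_AVG = 20_062.96  # run dated 2/18/2016
AFTER_AVG = 14_769.52   # run dated 3/8/2016

drop = percent_decrease(BEFORE_AVG, AFTER_AVG)
print(f"{drop:.1f}% lower average throughput")  # prints "26.4% lower average throughput"
```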

We can either just "eat" the overhead and accept that shutdowns and creating lots of siblings under one actor are going to be more expensive on average... Or we can revert back to the previous hand-rolled immutable collections for the ChildContainer (which we know had bugs and some of its own performance issues)... Or we can try a third option, using something like red/black trees.

In general I think it makes a lot of sense to keep a reference to System.Collections.Immutable within the root project - tons of other modules take it as a dependency and it's useful in defining messages with immutable collections on them anyway. But there's definitely a real performance cost beyond what we had before with the hand-rolled ImmutableAvlTree and I think it's worth discussing that before we include that in a release.

My vote, for what it's worth: we should pick a different strategy than the built-in System.Collections.Immutable. Its performance is unwieldy even at relatively small numbers of siblings (tens of thousands). We should be able to support an order of magnitude more than that.

@JeffCyr
Contributor

JeffCyr commented Mar 9, 2016

@Aaronontheweb I think the problem is here

_actor.SupervisorStrategyInternal.HandleChildTerminated(this, child, GetChildren());

The parent calls GetChildren() once for each of its children while it is in the terminating state.

I think the SupervisorStrategy should not be called when the parent is terminating, what do you think?
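
To see why a per-child GetChildren() call could hurt at this scale, here is a rough cost model in Python. It assumes (this is an assumption, not something measured here) that every child termination enumerates all children still registered with the parent; the function name `elements_visited` is hypothetical.

```python
def elements_visited(child_count: int) -> int:
    """Total elements enumerated if each child termination lists every
    child that is still registered (assumed cost model, not a measurement)."""
    total = 0
    remaining = child_count
    while remaining > 0:
        total += remaining  # one full listing of the remaining children
        remaining -= 1      # the terminated child is then removed
    return total


for n in (100, 1_000, 10_000):
    print(n, elements_visited(n))
```

Under that assumption the work grows quadratically: 10,000 children lead to 50,005,000 enumerated elements, which is n(n+1)/2.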

@Aaronontheweb
Member Author

Updated numbers from #1778

Akka.Tests.Performance.Actor.ActorMemoryFootprintSpec+ReceiveActor_memory_footprint

Measures the amount of memory used by 10,000 ReceiveActors
3/11/2016 7:21:51 AM

System Info

NBench=NBench, Version=0.1.5.0, Culture=neutral, PublicKeyToken=null
OS=Microsoft Windows NT 6.2.9200.0
ProcessorCount=4
CLR=4.0.30319.42000,IsMono=False,MaxGcGeneration=2
WorkerThreads=32767, IOThreads=4

NBench Settings

RunMode=Iterations, TestMode=Measurement
NumberOfIterations=13, MaximumRunTime=00:00:01

Data


Totals

| Metric | Units | Max | Average | Min | StdDev |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 26,819,080.00 | 25,156,843.69 | 24,004,848.00 | 883,339.10 |
| [Counter] ActorCreateThroughput | operations | 10,000.00 | 10,000.00 | 10,000.00 | 0.00 |

Per-second Totals

| Metric | Units / s | Max / s | Average / s | Min / s | StdDev / s |
|---|---|---|---|---|---|
| TotalBytesAllocated | bytes | 45,138,386.03 | 37,806,156.54 | 31,624,124.12 | 4,249,110.43 |
| [Counter] ActorCreateThroughput | operations | 18,452.58 | 15,046.79 | 13,174.06 | 1,813.49 |

Raw Data

TotalBytesAllocated

| Run # | bytes | bytes / s | ns / bytes |
|---|---|---|---|
| 1 | 25,860,232.00 | 35,620,532.06 | 28.07 |
| 2 | 25,304,016.00 | 34,788,416.88 | 28.75 |
| 3 | 24,658,816.00 | 34,976,136.08 | 28.59 |
| 4 | 24,461,832.00 | 45,138,386.03 | 22.15 |
| 5 | 24,004,848.00 | 31,624,124.12 | 31.62 |
| 6 | 24,509,984.00 | 42,148,418.59 | 23.73 |
| 7 | 25,355,896.00 | 34,850,519.17 | 28.69 |
| 8 | 25,069,928.00 | 36,960,965.45 | 27.06 |
| 9 | 26,819,080.00 | 42,371,488.10 | 23.60 |
| 10 | 26,459,424.00 | 36,470,087.33 | 27.42 |
| 11 | 24,239,056.00 | 36,451,467.81 | 27.43 |
| 12 | 24,433,424.00 | 44,514,212.67 | 22.46 |
| 13 | 25,862,432.00 | 35,565,280.77 | 28.12 |

[Counter] ActorCreateThroughput

| Run # | operations | operations / s | ns / operations |
|---|---|---|---|
| 1 | 10,000.00 | 13,774.25 | 72,599.23 |
| 2 | 10,000.00 | 13,748.18 | 72,736.90 |
| 3 | 10,000.00 | 14,184.03 | 70,501.83 |
| 4 | 10,000.00 | 18,452.58 | 54,192.97 |
| 5 | 10,000.00 | 13,174.06 | 75,906.76 |
| 6 | 10,000.00 | 17,196.43 | 58,151.61 |
| 7 | 10,000.00 | 13,744.54 | 72,756.15 |
| 8 | 10,000.00 | 14,743.15 | 67,828.12 |
| 9 | 10,000.00 | 15,799.01 | 63,295.11 |
| 10 | 10,000.00 | 13,783.40 | 72,551.03 |
| 11 | 10,000.00 | 15,038.32 | 66,496.79 |
| 12 | 10,000.00 | 18,218.57 | 54,889.04 |
| 13 | 10,000.00 | 13,751.72 | 72,718.20 |

@Aaronontheweb
Member Author

My take: the changes in #1772 don't impact the throughput of creating children very much, but they have a huge impact on the time it takes to shut them down. Shaved a solid 8 minutes off of the benchmark run time - most of that time doesn't get counted inside the benchmarks themselves because it occurs during the cleanup phase.

Aaronontheweb added a commit that referenced this issue Mar 11, 2016
Issue #1766 - Lazy evaluation of ChildrenContainer.Children and ChildrenContainer.Stats
@MrTortoise

I had actually implemented a red-black tree just before picking this issue up (straight out of intro to algorithms). Whilst it's tested, it has no performance tests. Moreover, I think the problem here is likely to be the builder pattern that System.Collections.Immutable uses. I think building some tests that compare the collections in isolation would be a good starting point. The old AVL tree looked quite canonical.

@JeffCyr
Contributor

JeffCyr commented Mar 23, 2016

@MrTortoise We found out that the real issue was that the tree was copied to a List each time ChildrenContainer.Children was accessed.

PR #1772 fixed that and the ImmutableDictionary doesn't seem to be a performance bottleneck anymore.
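
For readers who want to see the difference between copying on every access and evaluating lazily, here is a minimal sketch, written in Python rather than C# and not taken from the Akka.NET source. The class names `EagerContainer` and `LazyContainer`, the `copies` counter, and the helper `read_children` are all hypothetical; they only model the access pattern described above.

```python
from types import MappingProxyType
from typing import Mapping, Optional, Tuple


class EagerContainer:
    """Copies the child map into a new tuple on every `children` access."""

    def __init__(self, children: Mapping[str, object]) -> None:
        # Private copy behind a read-only view, so it never changes afterwards.
        self._map = MappingProxyType(dict(children))
        self.copies = 0  # number of full copies made so far

    @property
    def children(self) -> Tuple[object, ...]:
        self.copies += 1
        return tuple(self._map.values())


class LazyContainer(EagerContainer):
    """Builds the tuple on first access and reuses it afterwards.

    Caching is safe in this sketch because the map is never mutated
    after construction.
    """

    def __init__(self, children: Mapping[str, object]) -> None:
        super().__init__(children)
        self._cached: Optional[Tuple[object, ...]] = None

    @property
    def children(self) -> Tuple[object, ...]:
        if self._cached is None:
            self.copies += 1
            self._cached = tuple(self._map.values())
        return self._cached


def read_children(container: EagerContainer, times: int) -> int:
    """Access `children` repeatedly and report how many copies were made."""
    for _ in range(times):
        _ = container.children
    return container.copies


kids = {f"child-{i}": object() for i in range(10_000)}
print(read_children(EagerContainer(kids), 100))  # 100 full copies
print(read_children(LazyContainer(kids), 100))   # 1 full copy
```

The trade-off is one extra cached reference per container in exchange for skipping the repeated copies; a container that is never asked for its children never pays for the copy at all.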
