Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] JB: fixes and adjusments #2337

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jchavanton
Copy link
Contributor

@jchavanton jchavanton commented Dec 18, 2023

  • [Core] JB: if the jitter buffer size is above the its max size, we force accelerate

  • [Core] JB: proper handling of seq roll-over

  • [Core] JB: fix seq decrement

  • [Core] JB: add buffering skip counter

  • [Core] JB: reset when expanding above the maximum size

  • [Core] JB: disable skip buffering when acceleration is active

These fixes and the adjustment were thoroughly tested, they were found while testing the packet relay metrics. #2292

A/B testing with 100 sessions and no artificial jitter, to make sure everything is working well under normal conditions.

  • less packets are replaced by PLC or simply skipped with the default JB
  • less delays is introduced

EJB : more elastic jitter buffer
image

The packet relay metrics are still not ready for merge but I am able to use them to precisely monitor the jitter buffer in production.


Here is one report of real traffic facing jitter, I selected it randomly from production traffic.
We can use it as a source of validation evidence.
Configuration:

<action application="set" data="rtp_jitter_buffer_accelerate=true"/>
<action application="set" data="jitterbuffer_msec=20:1200"/>

End of call report:

{
  "rp": {
    "call_id_in": "rt342cma1dgelvphc5pb",
    "call_id_out": "c454bbd8-4e26-4049-85bb-20569a0b2990",
    "report": {
      "in": {
        "ssrc": "0x95262C82",
        "remote_socket": "172.11.61.245:18550",
        "local_socket": "172.11.61.245:35930",
        "codec": "opus",
        "count": 43181,   // ~15 minutes call
        "plc": 732   //  1.6% PLC
      },
      "out": {
        "ssrc": "0xA95A90EE",
        "remote_socket": "172.11.50.19:34402",
        "local_socket": "172.11.61.245:13842",
        "codec": "PCMU",
        "count": 43181,
        "max_ms": 1060,   // this is the maximum time spent by any packet inside FS, we can see close to matching the JB max size.
        "avg_ms": 74.57   // this is the average time spent by any packet inside FS
      }
    }
  },
  "jb": {
    "in_call_id": "rt342cma1dgelvphc5pb",
    "out_call_id": "c454bbd8-4e26-4049-85bb-20569a0b2990",
    "jb": {
      "size_max_ms": 1220,   // we can see that the maximum jitter buffer size was respected
      "size_est_ms": 20,
      "acceleration_ms": 13680,
      "expand_ms": 13580,
      "fast_acceleration_ms": 0,
      "forced_acceleration_ms": 60,   // at one point the jitter buffer was above its max size
      "jitter_max_ms": 140,   // very high jitter was seen during that call
      "jitter_est_ms": 8,   // near the end there was not that much jitter
      "reset": 7,
      "reset_too_big": 0
    }
  }
}

@signalwire-ci
Copy link

signalwire-ci bot commented Dec 18, 2023

@signalwire-ci
Copy link

signalwire-ci bot commented Dec 18, 2023

@dragos-oancea
Copy link
Contributor

src/switch_jitterbuffer.c:1741:27: error: 'switch_jb_stats_t' {aka 'struct switch_jb_stats_s'} has no member named 'expand_frame_len'
 1741 |       if (jb->jitter.stats.expand_frame_len < 0) jb->jitter.stats.expand_frame_len = 0;
      |                           ^
src/switch_jitterbuffer.c:1741:66: error: 'switch_jb_stats_t' {aka 'struct switch_jb_stats_s'} has no member named 'expand_frame_len'
 1741 |       if (jb->jitter.stats.expand_frame_len < 0) jb->jitter.stats.expand_frame_len = 0;

@jchavanton , you should fix the build.

@jchavanton
Copy link
Contributor Author

src/switch_jitterbuffer.c:1741:27: error: 'switch_jb_stats_t' {aka 'struct switch_jb_stats_s'} has no member named 'expand_frame_len'
 1741 |       if (jb->jitter.stats.expand_frame_len < 0) jb->jitter.stats.expand_frame_len = 0;
      |                           ^
src/switch_jitterbuffer.c:1741:66: error: 'switch_jb_stats_t' {aka 'struct switch_jb_stats_s'} has no member named 'expand_frame_len'
 1741 |       if (jb->jitter.stats.expand_frame_len < 0) jb->jitter.stats.expand_frame_len = 0;

@jchavanton , you should fix the build.

Thanks !
Yes will test this exact version soon, I am working on a fork to have more metrics.
But I will make sure to test it properly.
I will convert it to a draft for now to avoid confusion.

@jchavanton jchavanton marked this pull request as draft December 19, 2023 13:58
@jchavanton
Copy link
Contributor Author

jchavanton commented Dec 21, 2023

A/B testing on real traffic.

So far I have been to rigorous on not accepting to reset unless I know exactly why but in some case the remote RTP endpoint may behave unexpectedly and I added a reset to deal with exceptional cases. I am expecting that the reset will take good care of them
image

Lets hope I will have the visit of santa 🎅 I think I deserve it 🙂

Edit : 3 hours
image
image

@jchavanton
Copy link
Contributor Author

jchavanton commented Dec 29, 2023

running A/B testing to make sure I am not missing anything
image

@jchavanton
Copy link
Contributor Author

jchavanton commented Jan 9, 2024

After several days of A/B testing of the elastic jitter buffer
Measurements have confirmed that this will :

  • Reduce the delay introduced by the jitter buffer by half (on average 30ms instead of 70ms), while still reducing the amount of packets dropped or missing significantly.

One of the reason why we faced difficulties (that were identified during the ramp-up A/B testing) was that the default jitter buffer is resetting under various conditions some of which are a bit obscure (expecting the unexpected), however even if these resets are damaging the flow of packets they tend to restore the conditions to normal.

Adjustments were made to fine tune the reset on the elastic jitter buffer as well.

image

@jchavanton
Copy link
Contributor Author

image

- add buffering skip counter
- jitter buffer reset when too many missed frames
- jb disable when facing too many consecutive missing frames
- fix check_jb_size while expanding
- disable non elactic skip due to buffering
- another check size roll-over issue
- reset when expanding above the maximum size
- buffer size management
- proper handling of seq roll-over and fix seq decrement
- force accelerate when above the max size
@signalwire-ci
Copy link

signalwire-ci bot commented Jan 11, 2024

@signalwire-ci
Copy link

signalwire-ci bot commented Jan 11, 2024

@signalwire-ci
Copy link

signalwire-ci bot commented Jan 11, 2024

@signalwire-ci
Copy link

signalwire-ci bot commented Jan 11, 2024

@jchavanton
Copy link
Contributor Author

jchavanton commented Jan 11, 2024

@dragos-oancea I made a rebase and commit clean up, looks like it is time to review.

My take away on the testing is that this is statistically significant, enough to convince my that the elastic jitter buffer is indeed reducing the impact of degradation caused by jitter without introducing new problems.
As we may have expected, but due to all the little thing that can go wrong, I wanted to make sure.

I think, we can already observe the sample distribution is not that random because A and B are always not that far from each other.

Anyway I can still dig on poor performing calls and investigate the buffer behavior that was seen, to corner case even more unexpected conditions.

But it seems we are in a very good spot !
image

@jchavanton jchavanton marked this pull request as ready for review January 17, 2024 21:41
@jchavanton
Copy link
Contributor Author

@andywolk Hi Andrey, will you or someone else be able to help with the review.

Should I update FS documentation to reflect the new feature and make sure users are aware about it ?

@jchavanton
Copy link
Contributor Author

jchavanton commented Feb 13, 2024

image

Today I am completing the A/B testing, it lasted 30 days and 10M minutes of very long calls (real users) were compared, we have built enough confidence in the value of this improvement.

Considering that the calls have less latency, less missing packets and higher average call duration.
@andywolk hopefully, we can review, document and merge.

@jchavanton
Copy link
Contributor Author

@andywolk I can not make it to cluecon this year, but maybe we could look at merging that MR ?
What do you think is missing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants