LTE & NR Coexistence #422

Open
wants to merge 17 commits into
base: improve_5g_ims

@Rashed97 Rashed97 commented Feb 22, 2025

This adds support for IMS routing between devices on LTE and NR.

Calls and SMS should work fine. I have tested SMS, but I cannot test calls yet because I still have QoS issues on NR. According to the logs, calls should work.

Additionally it adds some cleanup for the mysql database container. This makes it far more robust.

These shouldn't be hardcoded, but rather use the IP addresses declared in .env
Rework the MySQL container:

- Change the mysql_init.sh to be bind mounted:
  This allows dynamic changes to the script without having to recompile the container.

- Change the way the container data is first set up:
  The current process depends on the container being started the first time
  with an empty docker volume, since the docker volume will then implement a
  "copy on first use" method to populate itself with data. This data is in the
  docker image from the initial compilation, as installing mysql-server also
  initializes the data directory. If for some reason, the docker volume is not
  "empty", then the generic MySQL data directory from the image is never copied
  into it, resulting in an invalid data directory. This adds checks to ensure
  that the data directory is present, and if not, then initialize one before
  trying to use it.

- Change the permission setting:
  The current usermod -d call doesn't guarantee that the data directory is
  owned by mysql:mysql, and therefore leaves open the possibility for the
  data directory to fail to be used by the MySQL daemon. This occurs when
  migrating the volume from one machine to another. Adding the chown -R call
  ensures that the owner is set properly.

- Don't use mysql restart
  Some edge cases exist where restart doesn't actually stop any running mysql
  instances. Use stop and start instead, and also add a kill call to ensure
  that all running mysql instances are fully killed before we make changes.
- Don't use pkill or kill with database processes. These could lead to database corruption
- Make sure to gracefully shut down the database daemon/services when the script receives a terminate or interrupt signal
- Ensure that mysqld_safe doesn't take over the PID so that interrupt or terminate signals reach the script
- Change the image to use ENTRYPOINT to ensure that signals reach it
- Add health check: Docker will ping the mysql database service every 30 seconds to ensure it's still up
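
The container changes listed above could be sketched roughly as follows. This is a minimal illustration, not the actual `mysql_init.sh` from this PR: the `DATADIR` default, the function names, and the choice of `mysqld --initialize-insecure` and `mysqladmin shutdown` are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the init flow; not the script from this PR.

DATADIR="${DATADIR:-/var/lib/mysql}"   # assumed default data directory

# Succeeds (returns 0) when the directory has no "mysql" system schema
# directory, i.e. the volume never received a usable data directory.
datadir_needs_init() {
    [ ! -d "$1/mysql" ]
}

# Initialize the data directory if needed, then fix ownership so the
# daemon can use a volume that was migrated from another machine.
prepare_datadir() {
    if datadir_needs_init "$1"; then
        mysqld --initialize-insecure --user=mysql --datadir="$1"
    fi
    chown -R mysql:mysql "$1"
}

# Ask the server to shut down cleanly instead of killing the process.
shutdown_db() {
    mysqladmin shutdown
    wait "$DB_PID"
}

# Start the server in the background so this script stays the process
# that receives SIGTERM/SIGINT, then block until the server exits.
run_db() {
    prepare_datadir "$DATADIR"
    trap shutdown_db TERM INT
    mysqld_safe --datadir="$DATADIR" &
    DB_PID=$!
    wait "$DB_PID"
}
```

Calling `run_db` as the last step of the entrypoint script would keep the script in the foreground, which is what lets the trap handle the signals that Docker sends on `docker stop`.
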
Currently the P-CSCF can handle LTE-LTE and NR-NR communications.
Cross-technology communications are still a work in progress.
Using the PANI header, we can determine what interface this UE should
register via.

TODO:
- Implement detection for additional network types (ex. for IWLAN)
- Add some sort of database store for network access type, to be used
  in the mt config, as P-CSCF cannot query the network type from the
  device before starting a call
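
As an illustration of the PANI-based decision described above, the mapping from access type to interface could look like the sketch below. It is written in shell only to show the logic; the real routing lives in the Kamailio config. The function name is hypothetical, and the LTE to Rx / NR to N5 mapping is an assumption based on the table naming used in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a P-Access-Network-Info value to the policy
# interface (assumed here: Rx for LTE, N5 for NR).
# Assumes the value is passed without leading whitespace.

tech_from_pani() {
    # Only the access-type token before the first ";" matters here.
    local access_type="${1%%;*}"
    case "$access_type" in
        3GPP-E-UTRAN*) echo "Rx" ;;       # LTE (FDD or TDD)
        3GPP-NR*)      echo "N5" ;;       # NR (FDD or TDD)
        IEEE-802.11*)  echo "unknown" ;;  # IWLAN: still listed as a TODO
        *)             echo "unknown" ;;
    esac
}
```

For example, `tech_from_pani '3GPP-NR-TDD; utran-cell-id-3gpp=...'` prints `N5`. Because MT requests arrive without a PANI header, the result would have to be stored at registration time and looked up later.
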
MT is currently broken due to the lack of a PANI header. This needs
to be reconfigured to read out of a database that has updated values
that are written during UE registration.
Precursor to moving the N5 code to its own configs
This allows us to have entirely different routes to select from depending on
the technology, versus having to do switching in the middle of the routes.
Required after the route config file split
@herlesupreeth
Owner

Thank you very much for your contribution. I will try to review this MR as soon as possible.

One thing which prevents co-existence from working is I believe there should be two instances of SMF and UPF (one handling 4G and another 5G), referring to deploy-all.yaml.

@@ -45,7 +45,7 @@ elif [[ "$COMPONENT_NAME" =~ ^(pcscf-[[:digit:]]+$) ]]; then
/mnt/pcscf/pcscf_init.sh && \
mkdir -p /var/run/kamailio_pcscf && \
rm -f /kamailio_pcscf.pid && \
-kamailio -f /etc/kamailio_pcscf/kamailio_pcscf.cfg -P /kamailio_pcscf.pid -DD -E -e
+kamailio -M 16 -m 128 -f /etc/kamailio_pcscf/kamailio_pcscf.cfg -P /kamailio_pcscf.pid -DD -E -e
Owner

Can you let me know why this is needed?

Author

Not entirely sure what the root issue is, but I was seeing weird/random memory freeing issues when including all 5 route configs (register, mo_N5, mo_Rx, mt_N5, mt_Rx). Excluding any one of those configs made the errors disappear, and my research seemed to indicate that the process was running out of memory. Bumping the memory up to 16M private (-M) and 128M shared (-m) like this resolved the issues.

@@ -75,11 +75,11 @@ server_header="Server: TelcoSuite Proxy-CSCF"
log_facility=LOG_LOCAL0

fork=yes
-children=4
+children=16
Owner

Can you let me know why this is increased?

Author

I was seeing some weird errors with TCP and HTTP connections after splitting the configs. Bumping this (and the TCP processes) up seemed to resolve it, though I want to check whether this was related to the memory issues noted above.

Owner

In my experience, increasing this value increases the memory requirement, which is why I kept it at 4 (compared to the sample configuration).


#!ifndef TCP_PROCESSES
# Number of TCP Processes
-#!define TCP_PROCESSES 16
+#!define TCP_PROCESSES 32
Owner

same as above

Author

see above

@@ -96,7 +96,7 @@ enable_tls=yes
#!endif
#!ifndef TCP_PROCESSES
# Number of TCP Processes
-#!define TCP_PROCESSES 3
+#!define TCP_PROCESSES 12
Owner

same as above

Author

see above

@@ -226,6 +226,7 @@ loadmodule "websocket.so"
loadmodule "cdp"
loadmodule "cdp_avp"
loadmodule "ims_qos"
+loadmodule "ims_diameter_server"
Owner

Please remove this if it's not used.

Author

Ah, sorry I thought I removed this. This was in an attempt to read the MSISDN from the Diameter Sh interface, which would have been my preferred approach vs reading from S-CSCF. I would still like to get this working, but I can remove it for now.

#!endif

# Tables to store users subscription tech (Rx or N5)
modparam("htable", "htable", "sub_tech=>size=4096;autoexpire=UE_SUBSCRIPTION_EXPIRES;")
Owner

I believe there is no UE_SUBSCRIPTION_EXPIRES defined in this branch (it's in the other branch, improve_xxx).

Author

It should be in this branch, since I'm based on the improve_5g_ims branch which has it: c3c907f


# Try retrieving the IMPI from S-CSCF using the MSISDN
$var(msisdn_sub_id) = $ru;
route(GET_IMPI_FROM_SCSCF);
Owner

Not much of a fan of having to contact S-CSCF to fetch the IMPI. Instead, I would suggest parsing all the IMPUs present in P-Associated-URI at registration time and associating all of them with either NR or LTE.

Author

It wasn't my first approach either, I didn't want to have external connections, but it's the most reliable method I've found. At least in my research, it didn't seem that P-Associated-URI is always present, but perhaps I'm mistaken there? Maybe we could add a check against P-Associated-URI first and then fall back to S-CSCF if P-Associated-URI isn't present?

Owner

> I didn't want to have external connections, but it's the most reliable method I've found. At least in my research, it didn't seem that P-Associated-URI is always present, but perhaps I'm mistaken there?

That SIP header is present most of the time (the first value in this header sets the calling number the UE needs to use). If it's not present, we can always use the IMPI (IMSI-based, used for registration).
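
The suggestion above (take every IMPU from P-Associated-URI, otherwise fall back to the IMPI) could be sketched as follows. This is only an illustration in shell; the function name and the example identities are hypothetical, and the actual change would be made in the Kamailio routes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the identities to associate with the access
# technology of a registration. Prints one identity per line.
#   $1: value of the P-Associated-URI header (may be empty)
#   $2: IMPI used for registration, used as the fallback
identities_for_registration() {
    local header="$1" impi="$2" uris
    # Extract every <...> URI and strip the angle brackets.
    uris="$(printf '%s\n' "$header" | grep -o '<[^>]*>' | tr -d '<>' || true)"
    if [ -n "$uris" ]; then
        printf '%s\n' "$uris"
    else
        printf '%s\n' "$impi"
    fi
}
```

Each printed identity would then be stored with the registration's technology (NR or LTE), so that MO and MT routes can look up the technology by whichever identity the request carries.
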

$var(imsi) = $tU;
#xlog("L_INFO", "IMSI: $var(imsi)\n");

#!ifdef WITH_RX
Owner

Please remove commented-out code for features which may be added in the future.

Author

Sorry, missed this (attempt at Diameter Sh)

#!endif

#!ifdef WITH_N5
if (!$var(msisdn)) {
Owner

I didn't understand why this MSISDN retrieval is needed.

Author

This was an attempt to fetch the MSISDN to use for checking the subscription in MO and MT routes, since the devices do not use the IMPI then, but this isn't functional on Open5GS due to missing APIs.

}

#!ifdef WITH_RX
#event_route[ims_diameter_server:sh-User-Data-Answer] {
Owner

please remove this as well

@Rashed97
Author

> Thank you very much for your contribution. I will try to review this MR as soon as possible.
>
> One thing which prevents co-existence from working is I believe there should be two instances of SMF and UPF (one handling 4G and another 5G), referring to deploy-all.yaml.

I've not found any issues with running both LTE and NR with a single SMF/UPF pair, but maybe I'm missing something. Everything I've tested on LTE seems to work fine but if you know what I should be on the lookout for, I can test.

Additionally, see my replies to the comments above.

@herlesupreeth
Owner

Thank you for addressing the comments. Is it okay if I delay merging this MR until calling in NR is verified?

> I've not found any issues with running both LTE and NR with a single SMF/UPF pair, but maybe I'm missing something. Everything I've tested on LTE seems to work fine but if you know what I should be on the lookout for, I can test.

That's because SMF is deployed in 4G mode when deployed using deploy-all.yaml.

@Rashed97
Author

> Thank you for addressing the comments. Is it okay if I delay merging this MR until calling in NR is verified?

Sure, I'm fine with that, but I personally can't do that given my continued QoS bearer issues (still not sure why it's not working even on a Threadripper 7960X with 256GB RAM). I'll upload a new chain today with some of the cleanup you requested around the commented-out code, and I'll test identity retrieval from the P-Associated-URI header like you suggested as well.

> That's because SMF is deployed in 4G mode when deployed using deploy-all.yaml.

Are you sure? Looking at smf_init.sh, it looks like the 4G SMF config is only loaded if DEPLOY_MODE is set to 4G: https://github.com/herlesupreeth/docker_open5gs/blob/improve_5g_ims/smf/smf_init.sh#L40

When using deploy-all, it's set to ALL tho, not 4G: https://github.com/herlesupreeth/docker_open5gs/blob/improve_5g_ims/deploy-all.yaml#L278

Maybe I'm missing something tho?

@Rashed97
Author

Rashed97 commented Mar 1, 2025

@herlesupreeth cleaned up the unused, commented out code (see above commit).

Also, I added code to print the P-Associated-URI header in MO and MT when it's present, and I'm not seeing it in any of the headers from my devices. Any ideas on why it's not showing?

Rashed97 and others added 4 commits March 5, 2025 13:11
This completes the changes required for dynamic N5 and Rx route selection.

This adds the following:
- A table to store subscribers and their registration tech
- A table to store contact subscribers for contact aors
- An HTTP endpoint in S-CSCF to query for an IMPI given a public identity (ex. MSISDN)
- Proper retrieval of the IMPI in MO and MT routes
- Routing based on the registration tech retrieved from the table given an IMPI
@Rashed97
Author

Rashed97 commented Mar 5, 2025

@herlesupreeth I've updated the branch with some additional fixes and I've now been able to verify that calls work back and forth from an NR device to LTE (I can upload videos if you'd like verification). Not exactly sure why but my srsRAN setup suddenly stopped having the QoS PDU modification issues it was having previously.

I've also included the srsRAN update commit from the old VoNR branch, and the SGsAP commit from the main branch to bring this branch up to date.

Regarding the P-Associated-URI parsing, I still do not see that header element passed from my UEs in any of the packets received by the P-CSCF. I don't really like the S-CSCF connection either, so I'm going to look at better ways of doing this (perhaps saving it in the MySQL database or something), but as this chain stands right now, voice calls, video calls, and SMS all work between LTE and NR devices. I'm working on a new custom environment actually that may necessitate a better "tech storage" mechanism anyway: deploying an AMF, SMF, and UPF in a second location, with their own gNB, that all connect back to this main core. This will likely need a second P-CSCF at the second location as well, but still working on that.

Let me know if you'd like to see anything else or address any additional concerns with this series. Thanks!

@herlesupreeth
Owner

Hey!! Thanks a lot for addressing all the comments. Regarding P-Associated-URI, it's sent only in the 200 OK for SIP REGISTER. Anyway, I will take a look at the changes done recently in this PR.

@Rashed97
Author

Rashed97 commented Mar 5, 2025

Thanks! I’ll take a look at the 200 OK for REGISTER and see what I can parse there.

One more note: I’ve been playing with the processes/children again, and it looks like that change may not be needed, so I may revert that (or remove from the series). But in doing so, I’ve found something strange: the P-CSCF opens A LOT of connections to the MySQL database. Like an insane number. With the old 4 children and 16 TCP processes, it opens around 100 connections to the MySQL database. With my current changes, it’s around 350. For comparison, S-CSCF opens one connection and SMSC opens two. Any ideas on why this might be happening? It definitely doesn’t seem normal, since really it shouldn’t need that many at all, maybe 4 connections or so for usrloc, pcscf_usrloc and ims_dialog, but maybe I’m missing something.
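
One way to check which client holds the connections is to group the server's process list by client host. The helper below is a hedged sketch: the function name is hypothetical, and the `mysql` invocation in the comment assumes credentials are supplied by other means.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize "host:port" values (one per line on
# stdin) as "<count> <host>" lines, busiest client first.
count_connections_by_client() {
    cut -d: -f1 | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# Assumed usage against a running server (credentials omitted):
#   mysql -N -e "SELECT host FROM information_schema.processlist" \
#       | count_connections_by_client
```

Comparing the per-client counts before and after changing `children` and `TCP_PROCESSES` would show whether the connection count scales with the number of P-CSCF processes.
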

@herlesupreeth
Owner

> Any ideas on why this might be happening? It definitely doesn’t seem normal, since really it shouldn’t need that many at all, maybe 4 connections or so for usrloc, pcscf_usrloc and ims_dialog, but maybe I’m missing something.

Sadly no. I haven't observed that closely.
