Using async API can re-order the arrival of packets. #54
Hi @sonatique. OK, I think I know what is going on. The problem here is that you rack up 10 transfers, which are all directly submitted to the underlying driver (USBD). Since the last patch, each transfer is broken into several USBD requests (like libusbK does). When you submit multiple requests at the same time, there is no guarantee that the reads will be done in order. So the first transfer might do:

1: packet 1, packet 2, packet 6

And the second will do:

2: packet 3, packet 4, packet 5

Are you sure you are missing data, and that the packets are not just out of order?

Thanks, /pedro
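For context, a minimal libusb-1.0 sketch of the pattern being described here: several asynchronous bulk IN reads queued at once. The endpoint address, transfer count, and size are illustrative placeholders, not values from the reporter's actual application.

```c
/* Sketch: queue several asynchronous bulk IN transfers at once.
 * Endpoint 0x81 and the sizes below are placeholders. */
#include <libusb-1.0/libusb.h>
#include <stdlib.h>

#define NUM_TRANSFERS 10
#define TRANSFER_SIZE (2 * 1024 * 1024)

static void LIBUSB_CALL on_complete(struct libusb_transfer *xfer)
{
    /* Every queued transfer completes here; before the fix discussed
     * below, the arrival order of the data was not guaranteed. */
}

static void queue_reads(libusb_device_handle *handle)
{
    for (int i = 0; i < NUM_TRANSFERS; i++) {
        struct libusb_transfer *xfer = libusb_alloc_transfer(0);
        unsigned char *buf = malloc(TRANSFER_SIZE);
        libusb_fill_bulk_transfer(xfer, handle, 0x81 /* bulk IN */,
                                  buf, TRANSFER_SIZE, on_complete,
                                  NULL, 5000 /* ms timeout */);
        libusb_submit_transfer(xfer); /* all transfers now in flight */
    }
}
```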
@dontech: thanks for the explanation. I guess your explanation is absolutely correct, but then this is a big problem, at least for me (and for other people using more than one concurrent transfer)... :-( I tested with libusbK.sys and I have no issues there: I can have multiple concurrent transfers, each of 2MBytes or so... Thanks. Do you think there is any chance you could fix this while retaining the possibility of having transfers >64K?
Just verified it. This is exactly what is happening:

libusb0-sys:[transfer_complete] sequence 7: 64 bytes transmitted, maximum_packet_size=64, totalLength=128

We cannot fix this without serializing what goes into the driver. We could put a lock on it or something.
> The basic problem is that you expect the data to arrive in the same order as your read requests.

Yes, indeed. There have been several discussions on the libusb-1.0 GitHub regarding whether this is guaranteed to happen in all cases, and the conclusion was: yes. And in fact it is the case with the latest official libusb-win32 release. Looks like I'll have to choose between more than one transfer or a transfer size >64K... Damned.
> Looks like I'll have to choose between more than one transfer or a transfer size >64K... Damned.

It's not related to 64K directly; it is related to the endpoint size. In my case I reproduced it with an endpoint size of just 64 bytes. Why do you need all those queued requests? Do they actually help performance?
What I meant is that I have to choose either to stay with the current official release and have more than one buffer, but none larger than 64KBytes, or to use the latest patch and get an arbitrarily large buffer size, but only one buffer at a time... Well, I technically do not need 16 buffers; it's an old number put there to be sure it was large enough. My recent tests with a USB3 device producing a sustained 180MBytes/s indicate that my sweet spot is 3 buffers of 2MBytes each.
@sonatique we will get there, do not worry. What is really needed is a sort of "mode" switch. There are 2 scenarios: the sync API and the async API.
The answer is simple: when the driver issues the next USBD request, it should check whether another request has already been issued. If it has, it should not re-request another for the same transfer, so that the order stays intact.
@dontech Thanks a lot for considering finding a solution. When you write about the "async API" vs the "sync API", which API are you referring to? The one from libusb0 or the one from libusb-1.0?
Both libraries support both modes.
|
OK, I fixed the problem. So now, if another request is queued for the same endpoint, the IRP cannot be resubmitted, as that would cause the order to go bonkers, and it would not achieve anything anyway since another request is picking up the response. This does however mean that if you are submitting a lot of requests, the re-queueing of IRPs is not likely to happen, which means you will be using more IOCTL requests for the transfer. Currently this is the only way I see to fix this while guaranteeing data integrity.
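A schematic sketch of the guard being described; this is not the actual libusb0.sys source, and the names and structure are illustrative only:

```c
/* Schematic illustration of the ordering guard (not the real
 * libusb0.sys code): a USBD request is only re-queued for an endpoint
 * when no other request is outstanding for that endpoint. */
struct endpoint_state {
    long pending_requests; /* USBD requests currently in flight */
};

static int may_requeue(const struct endpoint_state *ep)
{
    /* If another request is already queued for this endpoint, do not
     * resubmit the IRP: it would break ordering, and the other request
     * will pick up the response anyway. The transfer then completes
     * with what it has, at the cost of extra IOCTL round-trips. */
    return ep->pending_requests <= 1;
}
```

The trade-off mentioned above falls out directly: whenever `may_requeue` returns false, the remaining data has to be fetched by a fresh request from user mode, which is why heavy queueing costs more IOCTLs.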
@dontech: yeah! I no longer observe "corruption" with my high-level application, neither with 16 x 16K buffers nor with 3 x 2M buffers. That is very cool, thanks a lot! My question about sync and async was related to the fact that the libusb-1.0 sync API is implemented using its async API plus a wait. So I was concerned that libusb0.sys could never really "understand" whether the requests are conceptually sync or async: whatever the higher-level code does at the libusb-1.0 level, libusb0.sys always sees async calls. Anyway, you seem to have fixed the problem, which is amazing. I am not sure I understand all the theoretical implications for performance, but I will test and see what I get.
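On that point: libusb-1.0's sync API is indeed layered on top of its async API. A rough, self-contained sketch of the idea (this mirrors the principle, not libusb's exact internal code):

```c
/* Why the driver only ever sees async requests: a blocking read can be
 * built as "submit an async transfer, then wait for its completion". */
#include <libusb-1.0/libusb.h>

static void LIBUSB_CALL sync_cb(struct libusb_transfer *xfer)
{
    *(int *)xfer->user_data = 1; /* signal completion */
}

static int blocking_bulk_read(libusb_device_handle *h, unsigned char ep,
                              unsigned char *buf, int len, int *transferred)
{
    int done = 0;
    struct libusb_transfer *xfer = libusb_alloc_transfer(0);
    libusb_fill_bulk_transfer(xfer, h, ep, buf, len, sync_cb, &done,
                              0 /* no timeout */);
    libusb_submit_transfer(xfer);   /* async submit... */
    while (!done)
        libusb_handle_events(NULL); /* ...then block until it completes */
    *transferred = xfer->actual_length;
    libusb_free_transfer(xfer);
    return 0; /* error handling omitted for brevity */
}
```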
@dontech I tested for performance with my dedicated setup and I have bad news, unfortunately. Here are some numbers (complementing what I posted in #45 (comment)):

V1.3.0.1 (10/20/2023) snapshot libusb0.sys:

libusbK0.sys:

This PR's libusb0.sys:

These results are consistently reproducible on my machine. Basically, the benefit of using a larger buffer size is completely nullified, as far as CPU usage goes, when using more than one concurrent transfer. ;-(

EDIT: whoops, I just got a BSOD when I physically unplugged my device while running the N=3, S=2048K test with this PR's libusb0.sys. The stop code is KERNEL_MODE_HEAP_CORRUPTION. I never experienced a BSOD with libusb0.sys when unplugging before.
@sonatique thanks for testing.
Thanks, /pedro |
@sonatique I re-fixed this and also fixed up some things that were not safe on multi-core setups.
I am eager to hear your feedback on this. Please post new results: |
New test release here: https://github.com/mcuee/libusb-win32/releases/tag/test_release_async_order2 |
@dontech: thanks a lot for your hard work. Unfortunately, your latest "test release 1.3.0.1-async-order-2" seems to give even higher CPU usage than the previous one for concurrent transfers. As for the BSOD: I didn't get it when unplugging during the 1 x 2MB test, but I got it immediately during the 3 x 2MB test.
@sonatique I can see that USBPcap appears in your stack trace. USBPcap is notorious for causing problems. You probably installed it along with Wireshark, I guess. Could you uninstall USBPcap and make sure it does not attach to your driver before testing?
@dontech: Thanks for looking at it. I indeed had USBPcap installed (an old version from 2020) along with other Pcap stuff. I uninstalled it, and when the computer rebooted, I ran another test; this time I was no longer able to get a BSOD despite trying several times to unplug while running transfers. So you probably found the cause of it, thanks a lot! But anyway, I am still wondering why I never had this before (while having USBPcap installed from the very beginning). It seems to mean that you added something that now "conflicts" somehow with USBPcap. I am concerned that some people who could potentially use a product containing this newer libusb0.sys may have USBPcap installed and therefore suffer instabilities they didn't have before... Regarding the high CPU usage: as you expected, I guess: there is no improvement after I removed USBPcap.
Hi @sonatique
Thanks, /pedro |
|
@sonatique We need to quantify the problem in a way we can both see it. Could you re-run your tests with this app? It is a modification of your original app, so it runs directly against the libusb-win32 library.
|
Maybe it is good to add an intermediate example. With the existing ones, there is no intermediate example between the simple test and the full application.
Hi, thanks. My point is that the only version of libusb-win32 (libusb0.sys) I have ever experienced a BSOD with is your last experimental one. I have used libusb-win32 for years (official release versions up to the last one) with USBPcap installed (the latest version, installed since 2020).
I only used release version as far as I can tell. My last one was this: https://github.com/mcuee/libusb-win32/releases/download/test_release_async_order2/libusb-win32-bin-1.3.0.1.zip |
For sure I will test with your new app as soon as I can. Quick question though: when you say "high number of completed transactions", I have trouble understanding how this relates to libusb-1.0 transfers. I mean: I submit N (=3) libusb-1.0 transfers for the same IN endpoint, and I have a single callback function to be called for all three of them. This is my only interaction with the process of submitting and receiving something. I do not have access to the transactions done by libusb0.sys; it is a black box to me, so I cannot log or do anything with what happens below libusb-1.0, and I don't know how many low-level interactions are done with the hardware. Do you mean that I will now get several callback calls for each submitted transfer? That instead of getting one libusb-1.0 callback when my first submitted transfer is done receiving 2MB, I will now get many callback calls carrying small amounts of bytes?
I have moved this to a new ticket: #56 |
What I mean is that, to fix the ordering of packets, I have to allow transfers to complete with smaller amounts of data. You will probably notice that some of your reads complete with only 64K of data or less. To fix this I need to fix #52.
Ok, got it. I added some display to my test app in order to see this, and indeed, all read transfers (but the very first one) complete with 64K when using more than 1 concurrent transfer. When using only 1 transfer, I see completions at the maximum value (2MB). The benefit of increasing the buffer size is thus lost when using more than one buffer (basically all results are equal to the test with N buffers of 64K, whatever maximum size I use).
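The kind of instrumentation described here amounts to comparing each completed transfer's `actual_length` with the requested length in the libusb-1.0 callback; a minimal sketch (names illustrative):

```c
/* Sketch: log how much data each transfer actually completed with,
 * versus what was requested. */
#include <stdio.h>
#include <libusb-1.0/libusb.h>

static void LIBUSB_CALL on_complete(struct libusb_transfer *xfer)
{
    if (xfer->actual_length < xfer->length)
        printf("short completion: %d of %d bytes\n",
               xfer->actual_length, xfer->length);
}
```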
Yep, and I am fixing this now. I will push a fix soon that will allow each transfer to be up to 32MB for SuperSpeed. You should see a very big boost... :-)
OK, here is the new version with the fix in for the max transfer size: https://github.com/mcuee/libusb-win32/releases/tag/test_release_async_order4
@sonatique Please check whether your transfer sizes go up and whether it helps with CPU usage.
@dontech Woohoo, yes, this time I can have several concurrent transfers of 2MB or more (I tested with 32MB) and low CPU usage. That is really super good, thanks! (Side note: I experienced a BSOD while removing my unit, given that I still have USBPcap 1.5.0.3 installed. Not a big deal here, since that issue is addressed in #56, so I am just mentioning it.)
@sonatique What are the latest numbers? CPU usage? Let's close this one for now, so we can focus on the remaining BSOD, which is probably the last major pain on the way to a release stage, so you can use it for production.
@dontech I am consistently at around 1.5% CPU or less (I guess the error bars are quite large for such small percentages) for 3 concurrent transfers, each with a buffer size of 2MB (2048KB), at a transfer rate of 180MBytes/s. Adding more concurrent transfers doesn't change anything. Using a larger buffer size doesn't significantly decrease CPU usage (and does not increase the transfer rate, given that I am already saturating my device), but I guess this could be application-, scenario-, or device-dependent. I also checked for "corruption" and there is none. I think we're all good here!
Excellent. Another one bites the dust! :-) |
Found during the investigation of: #51
@dontech Well, I quickly modified my loopback test program to handle multiple transfers at once, but it doesn't show the issue...
The difference between this simple version and my full application is that in the loopback test each transfer has a fixed, constant buffer pointer, and buffer contents are verified sequentially, given that all transfer callbacks happen within the same "handling thread". Since verifying buffers is a "long" process, I suspect that all transfers are done while I am verifying the first one; I then resubmit it, verify the second, etc., so de facto there might be only one active transfer at a time.
In my full application, what I do in the callback function is: put the transfer's buffer pointer into a list, then quickly allocate a new buffer, set the transfer's buffer pointer to it, and submit the transfer again, roughly as in the sketch below.
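A sketch of that callback pattern, with `enqueue_for_verification` standing in for the application's hand-off to the verification thread (the name and the hand-off mechanism are hypothetical):

```c
/* Sketch: hand the filled buffer off, attach a fresh one, resubmit. */
#include <stdlib.h>
#include <libusb-1.0/libusb.h>

/* Hypothetical hand-off to the verification thread. */
void enqueue_for_verification(unsigned char *buf, int len);

static void LIBUSB_CALL on_complete(struct libusb_transfer *xfer)
{
    /* Stash the filled buffer for verification elsewhere... */
    enqueue_for_verification(xfer->buffer, xfer->actual_length);

    /* ...then swap in a new buffer and resubmit immediately, so several
     * transfers stay genuinely in flight (unlike the loopback test,
     * where in-callback verification serializes them). */
    xfer->buffer = malloc(xfer->length);
    libusb_submit_transfer(xfer);
}
```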
I will try to improve the loopback test to simulate this, but it might not be easy.
In the meantime: do you have any idea how your changes could create the behavior I am observing? Thanks.