Bug: Sync crashes/exits with core.time.TimeException: Invalid ISO Extended String #2813
Comments
@phlibi This issue was never seen in >12 months of testing, so it could be unique to your environment. Please do the following:
When building from source, please enable debugging:
To create a debug log output, please follow:
The above steps are required to decode the following:
Without rebuilding your client, this cannot be decoded correctly and, more importantly, it is impossible to work out what/where this is coming from. Thanks. |
@phlibi Please can you compile the following PR:
To build the PR, please follow the steps to prepare your system to build the client as detailed here: https://github.com/abraunegg/onedrive/blob/master/docs/install.md#building-from-source---high-level-requirements When building the client, please use the following:
When running the PR, your version should be: |
@abraunegg Wow, thanks for the quick response! Here's what happens when I run the same with the PR:
Using |
@phlibi |
@phlibi |
@abraunegg The archive with the files is ready, but where should I send it to? I can't find your e-mail address and the support@mynas.com.au mentioned on the issue creation page seems to be a bit unspecific. |
That is the correct email address |
@abraunegg Did you receive my e-mail? |
No email or anything in spam |
@phlibi |
@phlibi |
@abraunegg Thanks, I'll continue here so the history stays tracked in the issue. First try without a resync asserts as expected:
Then with the resync:
It was running like this for some time, apparently spitting out the same "correcting timestamp online" message for each present file. I'm a bit worried about it messing with the data on the server, which is why I stopped the process at that point. I also don't know where those mismatches could come from, as I didn't modify the local files and nobody would have changed all of them online. Just to be sure, I checked my local time; it is correct and synchronized via NTP, so that should not be an issue.

To me, it looks like the online files have their timestamp set to whenever they were actually created (which is fine), while the local file timestamps are from yesterday when I first synchronized the data. So if it now overwrites all those original timestamps with yesterday, that is a bit, well, uncool. To avoid messing with the online data, I then added
Strange thing: It says that it modified the local modification time of the files, but when I check on my disk, almost all of them still have an mtime from the first synchronization yesterday. Now another run without special flags:
So it looks as if it is now working properly. Whether this is simply due to the resync, the patch or something else, I don't know. |
Funny. With one of the folders now fixed, I went ahead to synchronize another one, this time directly starting with
I then restarted the exact same process and it completed without the exception within a few seconds. So it appears that a run with |
Yes .. this is NOT right .... but also shows an issue .. in a
This is what should also be happening in a --resync scenario
So the sync yesterday would have been the 'first' time, which is also why the local timestamps are potentially not what they should be.
A combination of the patch and
That is good news! What I will do for this PR is:
Regarding this item:
Please raise a new bug so it is tracked separately. Unfortunately I am not going to be able to look at anything today - so looking at these issues any further will be another ~24 hrs or so - sorry. |
@abraunegg Thanks! I'm currently travelling with only a slow internet connection, so I wasn't yet able to repeat the original process. But a fresh sync on a single directory containing mostly small text files still leads to an mtime error:
Again, the files seem to have been downloaded just fine and the stored mtimes also seem to be correct, which is certainly a good thing. Running the same process again with
Out of curiosity: What is this new assertion, or what is it asserted against? Is it a data string not matching an expected format (YYYY-MM-DDTHH:MM:SSZ or something), or a timestamp value that is out of bounds, or what exactly triggers the exception? Could this have anything to do with my local timezone or locale settings?
Please note that I'm using a "mixed" locale, with US English for the language, but Swiss German for some other formats, including date/time and numbers. |
You need to add |
@abraunegg |
That assertion is triggered when the 'mtime' data in the database (which is stored as a string):
Essentially - the data that was stored in the DB record is 'bad' for some reason .. |
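For context on that assertion: in D's standard library, `SysTime.fromISOExtString` throws a `TimeException` (`DateTimeException` is an alias for it) whenever the input is not a well-formed ISO extended timestamp, which is exactly the error in the issue title. A minimal, self-contained sketch of the failure mode (not the client's actual code):

```d
// Minimal sketch: why a malformed 'mtime' string from the database surfaces as
// "core.time.TimeException: Invalid ISO Extended String".
import std.datetime.systime : SysTime;
import std.stdio : writeln;
import core.time : TimeException;

void main() {
    // A well-formed value round-trips cleanly.
    auto good = SysTime.fromISOExtString("2024-08-26T09:15:00Z");
    writeln(good.toISOExtString());

    // Anything else - e.g. corrupted bytes read back from the DB - throws.
    try {
        SysTime.fromISOExtString("garbage-from-db");
    } catch (TimeException e) {
        writeln("rejected: ", e.msg);
    }
}
```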
To narrow this down, I have updated the PR to add the following:
If you rebuild the PR, and use |
@abraunegg It's really strange. Since the
I then hacked a bit of debug output into the code to show the actual string it's getting from the DB in this case and got this:
Here's my change in
My assumption is that this just gets some random data, which will then either completely fail UTF-8 decoding as in this example, leading to the UTFException, or be some random bytes that happen to be valid UTF-8, which then obviously don't pass the check for a valid timestamp. I guess that #2813 and #2829 really are due to the same cause, but manifest with a different exception depending on what random data is returned. I'm now running an initial sync again, to see what it comes up with this time.

When you checked the SQLite file that I sent you, did you find any such bogus mtime entries? I guess they would need to show up somehow there; if not, that would mean that the data gets corrupted during reading. But that wouldn't explain why it works after a |
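To illustrate that assumption: a string read from a corrupted TEXT column can fail at two different points in D, depending on what the bytes happen to be. A rough sketch with a hypothetical interpretMtime helper (not the client's DB layer):

```d
// Rough sketch: corrupted bytes either fail UTF-8 validation (UTFException) or,
// if they happen to decode as valid UTF-8, fail the timestamp parse (TimeException).
import std.datetime.systime : SysTime;
import std.utf : validate;
import std.stdio : writeln;

void interpretMtime(string raw) {
    try {
        validate(raw);                           // throws std.utf.UTFException on bad UTF-8
        auto t = SysTime.fromISOExtString(raw);  // throws TimeException on a bad format
        writeln("parsed: ", t.toISOExtString());
    } catch (Exception e) {
        writeln(typeid(e).name, ": ", e.msg);
    }
}

void main() {
    interpretMtime("2024-08-26T09:15:00Z");   // parses fine
    interpretMtime("not-a-timestamp");        // TimeException
    immutable ubyte[] junk = [0xFF, 0xFE, 0x00];
    interpretMtime(cast(string) junk);        // UTFException
}
```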
Run is done already, again with a very bad string:
|
@phlibi So .. the application is converting this correctly internally - what SQLite version are you running?

Now - I use ZFS myself heavily - potentially there is even some ZFS | database issue going on? Does your system use ECC memory? What version of ZFS are you using?

Other cases of ZFS | SQLite corruption: https://forums.unraid.net/bug-reports/stable-releases/sqlite-database-corruption-back-r2566/

So the question is .. is this a true bug? or a file system bug ... or a memory bug on your platform? |
@phlibi SQLite default page size is 16KB ... so |
@abraunegg I now reproduced it a few times and it really always affects the
About the versions and recordsize:
The recordsize would be far too large, but if I'm not mistaken, this only affects performance. Just to be sure, I ran the same sync on a completely different (and much older) machine and got the same issue there. This time it was a
Out of curiosity, I then patched the code a bit, in the hope that it would crash when the broken item is written to the DB, instead of only when it is read:

```diff
diff --git a/src/itemdb.d b/src/itemdb.d
index f726d25..a42c987 100644
--- a/src/itemdb.d
+++ b/src/itemdb.d
@@ -657,6 +657,14 @@ final class ItemDatabase {
 	private void bindItem(const ref Item item, ref Statement stmt) {
 		with (stmt) with (item) {
+			const string mtimeStr = mtime.toISOExtString();
+			try {
+				assert(isValidUTCDateTime(mtimeStr), "Bad mtime string: '" ~ mtimeStr ~ "'");
+			} catch (Exception e) {
+				writeln("Exception during isValidUTCDateTime assertion: '" ~ mtimeStr ~ "'");
+				throw e; // Re-throw the exception after logging
+			}
+
 			bind(1, driveId);
 			bind(2, id);
 			bind(3, name);
@@ -673,7 +681,7 @@ final class ItemDatabase {
 			bind(5, typeStr);
 			bind(6, eTag);
 			bind(7, cTag);
-			bind(8, mtime.toISOExtString());
+			bind(8, mtimeStr);
 			bind(9, parentId);
 			bind(10, quickXorHash);
 			bind(11, sha256Hash);
```

But this was not triggered. So either the corruption really happens inside the SQLite DB, or my patch just doesn't work as expected. As said, I've never used D before and of course I'm not deep enough into your code base to judge whether this is the right place to add such a check, or the right way to do it. I see that items are downloaded multithreaded. Is there a way to force it to run with only a single thread, to rule out any concurrency issues? |
It is also interesting that, outside of the initial report .. there are zero other reports of this .... maybe it is even something 'locale' related ...
See the 'threads' value - https://github.com/abraunegg/onedrive/blob/master/docs/application-config-options.md#threads |
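For reference, the linked documentation describes 'threads' as a normal config-file option, so a single-threaded test run would look roughly like this (assuming the default config location):

```
# ~/.config/onedrive/config
# Rule out concurrency effects by disabling parallel transfers
threads = "1"
```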
just to let you know....
|
Some update or progress, whatever it is:
Regarding the "zero other reports" (which is apparently no longer true, thanks @advorsky73 for sharing): I don't know what the real cause of the issue is, but since it only seems to (reproducibly) appear with an initial sync and maybe only with Sharepoint (not plain OneDrive) and maybe only with a certain number of files and probably only with the 2.5 version, there is plenty of potential to get around it, especially for existing users. Time will tell whether more users are affected. |
After the second try, there is the same outcome: No issues with v2.5 and threads=1 and also no issues with v2.4.25. I guess I'll stick for now with the current version and threads=1, at least until there is more insight. |
@advorsky73
|
Very curious | interesting .... In your debug logfile:
This is usually the 'default'; it means that your SQLite library is compiled in serialized mode, the safest option for multithreaded use .. and it should be used by default when opening the database file ... maybe it is not ...... I was relying on this to ensure the DB was being opened in a threadsafe manner. I have updated the PR to ensure/force that the database is opened in the safest multithreaded way. Please can you rebuild the client to use |
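For illustration only (this is not necessarily what the PR changes): with the SQLite C API, serialized mode can also be requested explicitly per connection, independent of the library's compile-time default. A sketch using the C bindings that ship with D:

```d
// Sketch: open the database while forcing SQLite's serialized (fully
// mutex-protected) threading mode for this connection.
import etc.c.sqlite3;
import std.string : toStringz;
import std.stdio : writeln;

bool openSerialized(string path, out sqlite3* db) {
    // A return of 0 means the library itself was compiled with no mutex
    // support at all; serialized mode cannot be forced at runtime then.
    if (sqlite3_threadsafe() == 0) {
        writeln("SQLite was compiled single-threaded");
        return false;
    }

    // SQLITE_OPEN_FULLMUTEX requests serialized mode for this connection,
    // regardless of the library's compile-time default.
    int rc = sqlite3_open_v2(path.toStringz, &db,
            SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_FULLMUTEX,
            null);
    if (rc != SQLITE_OK) {
        writeln("sqlite3_open_v2 failed: ", rc);
        return false;
    }
    return true;
}
```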
Unfortunately no luck:
And to be sure:
I see that according to all logic and documentation, this should be just fine. But obviously it is not :( Just to make sure that I'm not hunting a phantom, I did it twice (clearing all the files, the config backup and the DB before each run) and it failed twice. Then I did the same again with
Maybe a dumb question, since I don't know much about D and the application design: Is it possible that your threads use individual SQLite objects, which don't share the serialization mutex and instead access the DB file concurrently? Or is there some other possibility that SQLite's own locking could be defeated by how it is used by the onedrive client? |
The only threading really happens when the client is uploading|downloading actual files/data from Microsoft OneDrive. Whilst this is set in 'main.d':
This only governs the thread pools that are used when creating the threads for the upload|download processes. With
See if that makes a difference ..... |
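As a rough sketch of what such a 'threads' value typically controls (names assumed, not main.d verbatim): it sizes a std.parallelism worker pool, and the parallel upload/download loop runs on that pool:

```d
// Sketch: the configured thread count just sizes the worker pool that the
// upload/download loop iterates over in parallel.
import std.parallelism : defaultPoolThreads, taskPool;
import std.stdio : writeln;

void main() {
    uint configuredThreads = 1; // e.g. the value of the 'threads' config option

    // The pool size excludes the main thread, so a value of 1 means the
    // parallel foreach effectively degenerates into a single-threaded loop.
    defaultPoolThreads = configuredThreads - 1;

    auto pendingFiles = ["a.txt", "b.txt", "c.txt"];
    foreach (file; taskPool.parallel(pendingFiles)) {
        writeln("transferring ", file); // real upload/download work would go here
    }
}
```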
@phlibi I want to try and replicate your system as closely as possible, given that I cannot (no matter how hard I try) replicate your issue using a Sharepoint folder and/or 50K worth of objects online and/or using Debian 12 (albeit in a VM, not using ZFS, but the VM, via VMware ESXi, is on a data store served via NFS from a ZFS volume). |
Just tried this and it still failed. I first thought that this would be exactly the same as just setting
Are there different thread pools in use?
Not much special there.
Secondary notebook (off the top of my head, as I'm out of the office and can't check right now):
At least from the client side, I don't think there is anything that would lead to my systems behaving differently than what you can easily replicate, be it with a VM or on bare metal. I see the same effect no matter whether I'm connected to a 1GBit/s wired internet connection or an unstable link using my mobile phone. Where I see most potential for differences is with the server stuff. We use the "Microsoft 365 Business Standard" subscription with close to 20 users. As far as I know, our data is hosted in Swiss or at least EU datacenters, or at least it is supposed to be. Since you already found some differences in behavior when the client connects to EU datacenters, this might be relevant. We organize the data mostly in Teams, with Sharepoint just used as the storage backend. To replicate our situation on the server side, it should be sufficient to create a Team in Teams and drop a number of directories and relatively small files into it. I currently synchronize a single directory of a Team/site with the following figures:
~3000 files in 28 directories with a total size of ~100MB, or an average file size of ~33kB. This already reliably triggers the failure in, say, 90% of all initial synchronization runs. I haven't counted exactly, but would say that there happen to be 5-10 severely corrupted mtime entries in the DB file afterwards. This includes the ones flagged as "BLOB", but not those which contain a string in a bad format. While the server/datacenter certainly can't cause such a crash on the client directly, I see several notifications of throttling (too many requests), leading to delays in all threads of sometimes up to ~200s. Since this happens more frequently for us when threads are used, it also greatly reduces the speed gain of multithreading, as the client then simply wastes much more time waiting for the server to accept new requests. Do you see this effect as well? If not, it could be the reason why we see the corruption and you don't:
Does this sound reasonable? |
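As a side note on the throttling observation: Microsoft Graph signals throttling with HTTP 429 and a Retry-After header, so a client mostly just has to sit out the indicated interval. A rough, hypothetical sketch of such handling (not the onedrive client's actual retry logic):

```d
// Hypothetical sketch: back off for the server-supplied Retry-After interval
// whenever a request is throttled with HTTP 429.
import std.net.curl : HTTP;
import std.conv : to;
import core.thread : Thread;
import core.time : seconds;
import std.stdio : writeln;

void requestWithBackoff(string url) {
    foreach (attempt; 0 .. 5) {
        auto http = HTTP(url);
        http.perform();
        if (http.statusLine.code != 429)
            return; // not throttled (either success or some other error)

        // std.net.curl lower-cases response header names.
        auto hdr = "retry-after" in http.responseHeaders;
        int wait = (hdr !is null) ? (*hdr).to!int : 30;
        writeln("throttled, waiting ", wait, "s before retrying");
        Thread.sleep(wait.seconds);
    }
}
```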
Describe the bug
I started actual use of the onedrive client today by synchronizing a few Sharepoint/O365 folders to my PC. This worked fine for one folder, another one is still running, but the third seems to crash the client due to a TimeException:

This happened after a lot of data was synchronized (about 16GB), probably everything that is in the folder. But now the process exits with this error message, also when started with --monitor.

Operating System Details
Debian Bookworm (12) with backports enabled
Linux phiptp 6.10.6+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.6-1~bpo12+1 (2024-08-26) x86_64 GNU/Linux
Client Installation Method
From 3rd Party Source (PPA, OpenSuSE Build Service etc)
OneDrive Account Type
SharePoint
What is your OneDrive Application Version
onedrive v2.5.0-1+np1+1.1
What is your OneDrive Application Configuration
What is your 'curl' version
Where is your 'sync_dir' located
Local
What are all your system 'mount points'
What are all your local file system partition types
All local data is stored on ZFS
How do you use 'onedrive'
The folder is a company share, about 10 other people have access. It is not under heavy use, though. Most likely, there weren't any remote changes between my subsequent synchronization runs. All other people synchronize via the Windows client.
Steps to reproduce the behaviour
See above: Just running --sync or --monitor with the offending configuration reliably triggers the exception.

Complete Verbose Log Output
Screenshots
No response
Other Log Information or Details
No response
Additional context
No response