fuzzysearch quits without looking up all items #5

Open
reiuwu opened this issue Mar 2, 2023 · 4 comments

@reiuwu

reiuwu commented Mar 2, 2023

When I run ./fuzzysearch-cli.exe match-images --api-key [key] D:\images\ all-sources sources.csv, it hashes all the files, then exits without any errors after looking up only some of the items. This is the output before it quits:

⠒ [00:00:02] [>---------------------------------------] 9/32183 (79m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 42
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 42
⠁ [00:00:03] [>---------------------------------------] 18/32183 (69m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 33
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 41
⠴ [00:00:04] [>---------------------------------------] 27/32183 (65m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 24
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 40
⠖ [00:00:05] [>---------------------------------------] 36/32183 (63m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 15
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 39
⠈ [00:00:06] [>---------------------------------------] 45/32183 (62m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 6
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 38
⠒ [00:00:07] [>---------------------------------------] 54/32183 (62m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 1
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 37
  [00:00:07] [>---------------------------------------] 59/32183 (0s):
 INFO  fuzzysearch_cli              > Calculating image sources
 INFO  fuzzysearch_cli              > Sources calculated, writing output
 INFO  fuzzysearch_cli              > Done!

The same thing happens with an offline database.

@Syfaro
Owner

Syfaro commented Mar 2, 2023

Hm, that's a weird one! The only thing that breaks it out of the loop for fetching hashes from the API is getting 0 rows back from the database, and that should only happen once it has finished. I pushed up some changes that should at least add a little more visibility into what's going wrong. Could you try updating and let me know what happens?

I'm also extremely confused about how the same behavior could happen with an offline database, because the process for pulling hashes into the local database is entirely different: it only stops loading records when it's done or when it hits an error (which would crash the program).

It looks like you're on Windows. Did you build the program yourself or download it from the GitHub Actions build? And have you tried deleting the local database it creates yet? It should be somewhere like C:\Users\USER\AppData\Roaming\fuzzysearch.

@reiuwu
Author

reiuwu commented Mar 2, 2023

I built the code myself on Windows; I couldn't find a binary on GitHub. This is the output with the existing database:

 INFO  fuzzysearch_cli > Performing lookups
⠁ [00:00:00] [----------------------------------------] 0/42719 (0s):
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠄ [00:00:01] [----------------------------------------] 0/42719 (0s):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 51
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 7
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠐ [00:00:02] [>---------------------------------------] 9/42719 (2h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 42
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 6
⠐ [00:00:02] [>---------------------------------------] 9/42719 (2h):
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠁ [00:00:03] [>---------------------------------------] 18/42719 (2h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 33
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 5
⠉ [00:00:03] [>---------------------------------------] 18/42719 (2h):
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠴ [00:00:04] [>---------------------------------------] 27/42719 (89m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 24
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 4
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠖ [00:00:05] [>---------------------------------------] 36/42719 (84m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 15
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 3
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠁ [00:00:06] [>---------------------------------------] 45/42719 (83m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 6
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 2
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 6
⠴ [00:00:07] [>---------------------------------------] 54/42719 (83m):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 1
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 1
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 1
 DEBUG fuzzysearch_cli::fuzzysearch > rows was empty
  [00:00:07] [>---------------------------------------] 59/42719 (0s):
 INFO  fuzzysearch_cli              > Calculating image sources
 INFO  fuzzysearch_cli              > Sources calculated, writing output
 INFO  fuzzysearch_cli              > Done!

A new database has the same issue:

⠤ [00:33:17] [#>--------------------------------------] 2018/45032 (13h):
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠒ [00:33:18] [#>--------------------------------------] 2027/45032 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 4
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 53
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 4
⠁ [00:33:19] [#>--------------------------------------] 2036/45032 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 1
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 52
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 1
 DEBUG fuzzysearch_cli::fuzzysearch > rows was empty
  [00:33:19] [#>--------------------------------------] 2039/45032 (0s):
 INFO  fuzzysearch_cli              > Calculating image sources
 INFO  fuzzysearch_cli              > Sources calculated, writing output
 INFO  fuzzysearch_cli              > Done!

@Syfaro
Owner

Syfaro commented Mar 2, 2023

You can find the latest binary here on the most recent run, although I don't think that's the issue.

I think I might have figured out what was happening; could you try again now? It was checking the row count after filtering out null values, but the query might have been returning files with a null hash, so it would see 0 rows before reaching the end.
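A minimal sketch of the failure mode described above, in Rust. The `Row` type, the offset-based paging, and the names `fetch_rows`, `lookup_buggy`, and `lookup_fixed` are hypothetical stand-ins, not the project's actual code. The buggy loop treats a batch as "the end" when it is empty *after* null hashes are dropped; the fixed loop stops only when the query itself returns no rows. With a chunk size of 1, a single file with a null hash is enough to end the buggy loop early. That would be consistent with both failing runs logging "using chunk size: 1" right before "rows was empty", though that link is an inference from the logs.

```rust
// Hypothetical stand-in for a database row: a file's perceptual hash,
// or None if no hash was stored for that file.
type Row = Option<i64>;

/// Hypothetical query: up to `limit` rows starting at `offset` (offset paging is an assumption).
fn fetch_rows(all: &[Row], offset: usize, limit: usize) -> Vec<Row> {
    all.iter().skip(offset).take(limit).copied().collect()
}

/// Buggy loop: checks for "done" only after dropping null hashes.
fn lookup_buggy(all: &[Row], chunk_size: usize) -> usize {
    let (mut offset, mut looked_up) = (0, 0);
    loop {
        let rows = fetch_rows(all, offset, chunk_size);
        let hashes: Vec<i64> = rows.iter().filter_map(|r| *r).collect();
        if hashes.is_empty() {
            // A batch made only of null hashes is mistaken for the end of the data.
            break;
        }
        looked_up += hashes.len();
        offset += rows.len();
    }
    looked_up
}

/// Fixed loop: checks the raw row count before filtering.
fn lookup_fixed(all: &[Row], chunk_size: usize) -> usize {
    let (mut offset, mut looked_up) = (0, 0);
    loop {
        let rows = fetch_rows(all, offset, chunk_size);
        if rows.is_empty() {
            break; // only an empty query result means every file was visited
        }
        offset += rows.len();
        // Null hashes are skipped here instead of ending the loop.
        looked_up += rows.iter().filter_map(|r| *r).count();
    }
    looked_up
}

fn main() {
    // Five files; the fourth has no stored hash.
    let all: Vec<Row> = vec![Some(10), Some(11), Some(12), None, Some(14)];
    println!("buggy: {}", lookup_buggy(&all, 1)); // buggy: 3
    println!("fixed: {}", lookup_fixed(&all, 1)); // fixed: 4
}
```

With a chunk size of 10, the buggy loop happens to finish, because the null row shares a batch with real hashes. The bug only surfaces when a batch contains nothing but null hashes. If the real query instead selects "files not yet looked up", a null-hash row would also need to be excluded or marked as handled, or the fixed loop could keep fetching it forever.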

@reiuwu
Author

reiuwu commented Mar 3, 2023

Using the second database file, the API method works:

⠉ [00:02:10] [>---------------------------------------] 190/42814 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 40
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 57
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠂ [00:02:11] [>---------------------------------------] 200/42814 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 30
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 56
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠠ [00:02:12] [>---------------------------------------] 210/42814 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 20
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 55
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠓ [00:02:13] [>---------------------------------------] 220/42814 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 10
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 54
 DEBUG fuzzysearch_cli::fuzzysearch > using chunk size: 10
⠚ [00:02:14] [>---------------------------------------] 230/42814 (9h):
 DEBUG fuzzysearch_cli::fuzzysearch > found next rate limit: 0
 DEBUG fuzzysearch_cli::fuzzysearch > found rate limit reset: 53
⠉ [00:03:04] [>---------------------------------------] 240/42814 (9h): Reached rate limit, waiting 54 seconds  
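The debug lines in this thread suggest how the client paces itself. It uses chunks of 10 while at least 10 requests remain, drops to the remaining count (6, 4, 1) as the limit runs out, and waits for the reset once it reaches 0. The following is a rough Rust sketch of that behavior, inferred only from the logs. The `plan` function, the `Next` enum, and the assumption that "rate limit reset" is measured in seconds are hypothetical. The real client logged a 54-second wait for a reset of 53, so it may add a small margin that this sketch leaves out.

```rust
use std::time::Duration;

// Largest batch seen in the logs ("using chunk size: 10"); the real limit may differ.
const MAX_CHUNK_SIZE: u64 = 10;

/// Hypothetical decision for the next API call.
#[derive(Debug, PartialEq)]
enum Next {
    /// Look up this many hashes in one request.
    Lookup { chunk_size: u64 },
    /// Out of requests: sleep until the rate-limit window resets.
    Wait(Duration),
}

/// `remaining` and `reset_secs` mirror the "found next rate limit" and
/// "found rate limit reset" debug lines.
fn plan(remaining: u64, reset_secs: u64) -> Next {
    if remaining == 0 {
        Next::Wait(Duration::from_secs(reset_secs))
    } else {
        // Never request more hashes than the remaining budget allows.
        Next::Lookup { chunk_size: remaining.min(MAX_CHUNK_SIZE) }
    }
}

fn main() {
    // (remaining, reset) pairs copied from the debug output in this thread.
    for (remaining, reset) in [(51, 7), (6, 2), (4, 53), (1, 1), (0, 53)] {
        println!("{remaining:>2} left, reset in {reset}s -> {:?}", plan(remaining, reset));
    }
}
```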

Using the second database file, the offline database crashes right away:

 INFO  fuzzysearch_cli > Found 5 files needing hashing
 TRACE fuzzysearch_cli > Opening image: D:\export\a.jpg
 TRACE fuzzysearch_cli > Opening image: D:\export\b.jpg
 TRACE fuzzysearch_cli > Opening image: D:\export\c.png
 TRACE fuzzysearch_cli > Opening image: D:\export\d.jpg
 TRACE fuzzysearch_cli > Opening image: D:\export\e.jpg
  [00:00:00] [----------------------------------------] 0/5 (0s)
 INFO  fuzzysearch_cli > Loading index
  [00:44:37] [########################################] 49555952/49555952 (0s)
 INFO  fuzzysearch_cli > Calculating image sources
 INFO  fuzzysearch_cli > Sources calculated, writing output
 INFO  fuzzysearch_cli > Done!

Can you confirm whether the artifact is publicly viewable? I still can't find any binaries to download.

Edit: Using the API, it completed all 42k hashes without crashing.
