Skip to content

Conversation

@ben-beauhurst
Copy link
Contributor

Added a concurrent version of check_links for speed.

When I tested this out on 1000 URLs, check_links took 1min 37s and concurrent_check_links took 7.85 s (with 20 workers) so a massive speed up is possible by utilizing concurrent.futures with the existing code.

There wasn't any code coverage of check_links so I added that, however it was a bit tricky to write good tests for the concurrent_check_links since this project is using sqlite as it's test db so concurrent writes aren't really possible.

Was raising a `RuntimeWarning: DateTimeField Url.last_checked received a naive datetime ... while time zone support is active.`
Since we're using sqlite as the test db backend,
we can't effectively test the `Url` objects being saved in the `ThreadPoolExecutor` futures,
but there's enough test coverage that under the hood `Url.check_url` is doing the right thing
@claudep
Copy link
Contributor

claudep commented Sep 15, 2025

Technically speaking, it's nice!

However, I'm worried that this may trigger spam/attack detectors when you suddenly burst head requests almost simultaneously to some websites. Of course, if all links are target to different websites or for internal links, that's not an issue. Ideally, the command would group links by domain and then concurrently call "normal" check_links" for each group, but I fear this would add much complexity to the command. Thoughts?

@ben-beauhurst
Copy link
Contributor Author

ben-beauhurst commented Sep 19, 2025

However, I'm worried that this may trigger spam/attack detectors when you suddenly burst head requests almost simultaneously to some websites. Of course, if all links are target to different websites or for internal links, that's not an issue. Ideally, the command would group links by domain and then concurrently call "normal" check_links" for each group, but I fear this would add much complexity to the command. Thoughts?

Yes, that is possibly a risk, but as you point out not for all use cases. I can add a docstring that explains this risk, and users can make their own judgement about whether it's the appropriate choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants