feat:
1. Make active_crawls a dict instead of a set and remove the jobs array, for efficient lookup and storage of active crawls and crawl control. 2. Put a lock on active_crawls so simultaneous push and pop by coroutines don't cause a race condition. 3. Move the depth check outside the child-link for loop, since source_url doesn't change within the loop.
@@ -188,11 +188,11 @@ if __name__ == "__main__":
import time


# Run basic example
# start_time = time.perf_counter()
# print("Running basic scraper example...")
# asyncio.run(basic_scraper_example())
# end_time = time.perf_counter()
# print(f"Basic scraper example completed in {end_time - start_time:.2f} seconds")
start_time = time.perf_counter()
print("Running basic scraper example...")
asyncio.run(basic_scraper_example())
end_time = time.perf_counter()
print(f"Basic scraper example completed in {end_time - start_time:.2f} seconds")

# # Run advanced example
print("\nRunning advanced scraper example...")
||||