Issue 5289 - Relating Browser and User-Agent #10926

Merged
merged 5 commits into from
Jun 21, 2023

Conversation

louisaturn
Contributor

Choosing a browser now comes with a suggested compatible user-agent. The user can also enter a different user-agent of their choice.
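The suggestion-with-override behaviour described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `resolve_user_agent`, the `SUGGESTED_USER_AGENTS` table, and the `chromium` and `webkit` entries are assumptions. Only the `firefox` identifier and its user-agent string appear in the sample collector of this PR.

```python
from typing import Optional

# Hypothetical suggestion table. Only the "firefox" entry comes from the
# sample collector in this PR; the other two are illustrative assumptions.
SUGGESTED_USER_AGENTS = {
    "firefox": "Mozilla/5.0 (X11; Linux i686; rv:111.0) Gecko/20100101 Firefox/111.0",
    "chromium": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    ),
    "webkit": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/16.4 Safari/605.1.15"
    ),
}


def resolve_user_agent(browser_type: str, custom_user_agent: Optional[str] = None) -> str:
    """Return the user's own user-agent if given, else the suggestion for the browser."""
    # A non-blank custom value always wins over the suggestion.
    if custom_user_agent and custom_user_agent.strip():
        return custom_user_agent.strip()
    try:
        return SUGGESTED_USER_AGENTS[browser_type]
    except KeyError:
        raise ValueError(f"no suggested user-agent for browser {browser_type!r}") from None
```

Calling `resolve_user_agent("firefox")` returns the suggested Firefox string, while `resolve_user_agent("firefox", "MyAgent/1.0")` returns the custom value unchanged.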

In tests against browser and user-agent detection sites, the detected browser and user-agent matched the chosen ones, and no 404 bad request errors were observed.
The motivation for resolving this issue is described in the issue specification and in the linked discussion.
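A check of this kind can also be approximated offline, before visiting a detection site, by testing whether a user-agent string carries the product tokens expected for the chosen browser. The sketch below is a heuristic under stated assumptions: the function name is hypothetical, the `chromium` and `webkit` identifiers do not appear in this PR, and real detection sites use far more signals than these substrings.

```python
def user_agent_matches_browser(browser_type: str, user_agent: str) -> bool:
    """Heuristically check that a user-agent string looks like the given browser."""
    ua = user_agent.lower()
    if browser_type == "firefox":
        # Firefox advertises both a Gecko/ and a Firefox/ product token.
        return "gecko/" in ua and "firefox/" in ua
    if browser_type == "chromium":
        return "chrome/" in ua or "chromium/" in ua
    if browser_type == "webkit":
        # Chromium-based strings also contain Safari/, so exclude them.
        return "safari/" in ua and "chrome/" not in ua and "chromium/" not in ua
    raise ValueError(f"unknown browser type: {browser_type!r}")
```

A mismatch here (for example, a Chrome string paired with `firefox`) is the kind of inconsistency that a detection site would likely report.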

The next step is to check whether a suitable browser/user-agent combination can unblock collections that seemed to require Mozilla Firefox (according to tests building standalone collectors).

Simple tests can be run with the collector below, varying the web browser type:

{
    "source_name": "Teste user-agent",
    "base_url": "https:\/\/searchenginereports.net\/what-is-my-browser",
    "obey_robots": false,
    "ignore_data_crawled_in_previous_instances": false,
    "crawler_description": "Teste 5289",
    "crawler_type_desc": "Contratos",
    "crawler_issue": 0,
    "data_path": "navegador",
    "sc_scheduler_persist": true,
    "sc_scheduler_queue_refresh": 10,
    "sc_queue_hits": 10,
    "sc_queue_window": 60,
    "sc_queue_moderated": true,
    "sc_dupefilter_timeout": 600,
    "sc_global_page_per_domain_limit": null,
    "sc_global_page_per_domain_limit_timeout": 600,
    "sc_domain_max_page_timeout": 600,
    "sc_scheduler_ip_refresh": 60,
    "sc_scheduler_backlog_blacklist": true,
    "sc_scheduler_type_enabled": true,
    "sc_scheduler_ip_enabled": true,
    "sc_scheduler_item_retries": 3,
    "sc_scheduler_queue_timeout": 3600,
    "sc_httperror_allow_all": true,
    "sc_retry_times": 3,
    "sc_download_timeout": 10,
    "antiblock_download_delay": 2,
    "antiblock_autothrottle_enabled": false,
    "antiblock_autothrottle_start_delay": 2,
    "antiblock_autothrottle_max_delay": 10,
    "antiblock_ip_rotation_enabled": false,
    "antiblock_ip_rotation_type": "tor",
    "antiblock_max_reqs_per_ip": 10,
    "antiblock_max_reuse_rounds": 10,
    "antiblock_proxy_list": "",
    "antiblock_user_agent_rotation_enabled": false,
    "antiblock_reqs_per_user_agent": 100,
    "antiblock_user_agents_list": "",
    "antiblock_insert_cookies_enabled": false,
    "antiblock_cookies_list": "",
    "captcha": "none",
    "has_webdriver": false,
    "webdriver_path": "",
    "img_xpath": "",
    "sound_xpath": "",
    "dynamic_processing": true,
    "browser_type": "firefox",
    "browser_user_agent": "Mozilla\/5.0 (X11; Linux i686; rv:111.0) Gecko\/20100101 Firefox\/111.0",
    "skip_iter_errors": false,
    "browser_resolution_width": 1280,
    "browser_resolution_height": 720,
    "create_trace_enabled": false,
    "video_recording_enabled": false,
    "explore_links": false,
    "link_extractor_max_depth": null,
    "link_extractor_allow_url": "",
    "link_extractor_allow_domains": "",
    "link_extractor_tags": "",
    "link_extractor_attrs": "",
    "link_extractor_check_type": false,
    "link_extractor_process_value": "",
    "download_files": false,
    "download_files_allow_url": "",
    "download_files_allow_extensions": "",
    "download_files_allow_domains": "",
    "download_files_tags": "",
    "download_files_attrs": "",
    "download_files_process_value": "",
    "download_files_check_large_content": true,
    "download_imgs": false,
    "steps": "{\"step\":\"root\",\"depth\":0,\"children\":[{\"step\":\"espere\",\"depth\":1,\"arguments\":{\"segundos\":\"8\"}},{\"step\":\"screenshot\",\"depth\":1,\"arguments\":{}}]}",
    "encoding_detection_method": 1,
    "expected_runtime_category": "medium",
    "templated_url_parameter_handlers": [],
    "templated_url_response_handlers": [],
    "instance_id": "16802724450290",
    "crawler_id": 3
}
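To vary the web browser type across test runs, one option is to generate one copy of the collector configuration per browser/user-agent pair. The sketch below assumes the configuration is available as a Python dict; the helper `make_variants` and the `chromium` pair are hypothetical, while the keys `source_name`, `browser_type`, and `browser_user_agent` are taken from the collector above.

```python
import copy


def make_variants(config: dict, pairs: dict) -> list:
    """Return one deep copy of `config` per (browser_type, user_agent) pair."""
    variants = []
    for browser_type, user_agent in pairs.items():
        variant = copy.deepcopy(config)  # leave the original config untouched
        variant["browser_type"] = browser_type
        variant["browser_user_agent"] = user_agent
        # Tag the name so runs can be told apart in the results.
        variant["source_name"] = f"{config['source_name']} ({browser_type})"
        variants.append(variant)
    return variants


# Trimmed-down stand-in for the full collector configuration above.
base_config = {
    "source_name": "Teste user-agent",
    "base_url": "https://searchenginereports.net/what-is-my-browser",
    "dynamic_processing": True,
    "browser_type": "firefox",
    "browser_user_agent": "Mozilla/5.0 (X11; Linux i686; rv:111.0) Gecko/20100101 Firefox/111.0",
}

# The "chromium" pair is an illustrative assumption, not a value from this PR.
pairs = {
    "firefox": base_config["browser_user_agent"],
    "chromium": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36"
    ),
}

variants = make_variants(base_config, pairs)
```

Each entry in `variants` can then be serialized with `json.dumps` and imported as a separate collector.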

@codecov

codecov bot commented Jun 13, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (2f0d56f) 68.23% compared to head (5a487f2) 68.23%.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev   #10926   +/-   ##
=======================================
  Coverage   68.23%   68.23%           
=======================================
  Files          20       20           
  Lines        1212     1212           
  Branches      228      228           
=======================================
  Hits          827      827           
  Misses        343      343           
  Partials       42       42           


@rennancl rennancl merged commit d22e004 into dev Jun 21, 2023
3 checks passed

3 participants