TwoRavens/modelsof

modelsof

  1. python3 modelsof.py get_datasets jop

Scrapes Dataverse for all articles. Produces out/jop/datasets.csv with title, href, date, description, keywords.

  1. python3 modelsof.py get_files jop

Scrapes Dataverse for all files associated with each article in datasets.csv. Produces out/jop/files.csv with title, href, date, filename, file_href.

  1. python3 modelsof.py get_downloads jop [2018]

Downloads every file listed in files.csv whose extension is one of .do, .7z, .7zip, .gz, .rar, .tar, or .zip, optionally limited to a single year (for example, 2018). Produces out/jop/downloads/{year}/{dataset}/{file}. Errors are logged to out/jop/downloads/errors.csv with title, href, date, filename, file_href, error.
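The selection rule above can be sketched as follows. This is a minimal illustration, not the code in modelsof.py: the function name `select_downloads`, the use of the title column as the {dataset} directory, and the assumption that the date column begins with a four-digit year are all hypothetical.

```python
from pathlib import PurePosixPath

# Extensions named in the step description above.
DOWNLOAD_EXTS = {".do", ".7z", ".7zip", ".gz", ".rar", ".tar", ".zip"}


def select_downloads(rows, year=None, root="out/jop/downloads"):
    """Yield the target path for each row of files.csv that should be downloaded.

    rows: iterable of dicts with at least "title", "date", and "filename".
    year: optional year (int or str) used to limit the selection.
    """
    for row in rows:
        ext = PurePosixPath(row["filename"]).suffix.lower()
        if ext not in DOWNLOAD_EXTS:
            continue
        # Assumption: the date column starts with a four-digit year.
        row_year = row["date"][:4]
        if year is not None and row_year != str(year):
            continue
        # Assumption: the dataset directory is named after the title column.
        yield str(PurePosixPath(root) / row_year / row["title"] / row["filename"])
```

Rows could come from `csv.DictReader` over out/jop/files.csv; passing `year=2018` mirrors the optional command-line argument.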

  1. python3 modelsof.py unzip jop

Recursively extracts every archive in downloads with an extension of .7z, .7zip, .gz, .rar, .tar, or .zip, including archives nested inside other archives. Requires 7zip (p7zip-full and p7zip-rar on Ubuntu).
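One way to picture the recursive extraction is a loop that keeps scanning for archives until no new ones appear. The sketch below is an assumption about the approach, not the project's implementation: the function names and the `_extracted` output directory suffix are hypothetical, and it shells out to the `7z` binary with its standard `x`, `-y`, and `-o` switches.

```python
import subprocess
from pathlib import Path

ARCHIVE_EXTS = {".7z", ".7zip", ".gz", ".rar", ".tar", ".zip"}


def find_archives(root):
    """Return every archive file under root, sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in ARCHIVE_EXTS
    )


def extract_command(archive):
    """Build the 7z command that extracts archive next to itself."""
    # Hypothetical naming: extract into "<archive name>_extracted".
    out_dir = archive.with_name(archive.name + "_extracted")
    return ["7z", "x", "-y", f"-o{out_dir}", str(archive)]


def unzip_all(root):
    """Extract archives repeatedly so nested archives are handled too."""
    seen = set()
    while True:
        pending = [p for p in find_archives(root) if p not in seen]
        if not pending:
            break
        for archive in pending:
            seen.add(archive)
            # check=False: a corrupt archive should not stop the whole run.
            subprocess.run(extract_command(archive), check=False)
```

Tracking `seen` paths keeps the loop from extracting the same archive twice while still picking up archives that only appear after an outer archive has been unpacked.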

  1. python3 modelsof.py get_all_files jop

Takes the union of the files listed in files.csv and the files found in downloads. Produces out/jop/all_files.csv with a single column, file.
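A minimal sketch of the union, assuming it is taken over file names: the function name `all_files` is hypothetical, and how the real script normalizes paths from downloads is not stated here.

```python
import csv
import io


def all_files(files_csv_text, downloaded_names):
    """Return the sorted union of listed and downloaded file names.

    files_csv_text: contents of files.csv, which has a "filename" column.
    downloaded_names: names of files found under downloads (assumed to be
    comparable to the "filename" values).
    """
    reader = csv.DictReader(io.StringIO(files_csv_text))
    listed = {row["filename"] for row in reader}
    return sorted(listed | set(downloaded_names))


def to_csv(names):
    """Render the names as a one-column CSV with the header "file"."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file"])
    writer.writerows([name] for name in names)
    return buffer.getvalue()
```

Using a set means a file that appears both in files.csv and in downloads is written only once.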

  1. python3 modelsof.py plot_files

Uses out/**/all_files.csv to produce distribution counts at out/files_dist.csv and out/files_by_datasets_dist.csv, then runs plots.R to produce out/files_dist.png and out/files_by_datasets_dist.png (whether a dataset contains a kind of file).
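The two distributions differ in what they count: one counts files, the other counts datasets that contain at least one file of a given kind. The sketch below assumes that "kind of file" means the file extension; the function name and input shape are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def file_distributions(files_by_dataset):
    """Return (files_dist, files_by_datasets_dist) as Counters keyed by extension.

    files_by_dataset: dict mapping a dataset name to its list of file names.
    """
    files_dist = Counter()
    by_datasets_dist = Counter()
    for files in files_by_dataset.values():
        exts = [PurePosixPath(name).suffix.lower() for name in files]
        # Every file counts once.
        files_dist.update(exts)
        # Every dataset counts at most once per extension.
        by_datasets_dist.update(set(exts))
    return files_dist, by_datasets_dist
```

For example, a dataset with two .do files adds 2 to the .do entry of the first counter but only 1 to the second.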

  1. python3 stata.py jop

Parses all .do files in out/jop/downloads and produces corresponding .do.json at out/jop/results/{year}/{dataset}/{file} as well as out/jop/files.json and out/jop/stats.json.

  1. python3 modelsof.py plot_commands

Uses out/**/stats.json to produce distribution counts at out/commands_dist.csv, then runs plots.R to produce out/commands_dist.png.

stats.json counts

Some prefix commands are run on their own rather than as a prefix to another command. These are counted as len_prefix. Prefix commands that are actually used as a prefix to another command are counted as len_prefix_as_prefix. The latter are not included in the overall count (len).
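The three counters can be illustrated with a small tally. This is a sketch of one reading of the rule above, in which a standalone prefix command still counts toward len; the `PREFIX_COMMANDS` set is an illustrative subset, and stata.py's real parser is certainly more involved than splitting on a colon.

```python
# Illustrative subset of Stata prefix commands (an assumption, not the full list).
PREFIX_COMMANDS = {"svy", "by", "bysort", "xi", "quietly"}


def tally(lines):
    """Count commands, standalone prefix commands, and prefixes used as prefixes."""
    stats = {"len": 0, "len_prefix": 0, "len_prefix_as_prefix": 0}
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        head, sep, rest = line.partition(":")
        head_tokens = head.split()
        used_as_prefix = (
            bool(sep)
            and bool(head_tokens)
            and head_tokens[0] in PREFIX_COMMANDS
            and bool(rest.strip())
        )
        if used_as_prefix:
            # The prefix itself stays out of len; only the prefixed command counts.
            stats["len_prefix_as_prefix"] += 1
            stats["len"] += 1
        else:
            if tokens[0].rstrip(":") in PREFIX_COMMANDS:
                stats["len_prefix"] += 1
            stats["len"] += 1
    return stats
```

With the input `["svy: reg y x", "xi i.region", "reg y x"]`, this sketch reports three commands overall, one standalone prefix command, and one prefix used as a prefix.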

The first item is a count of regression commands in all files. Given two commands:

svy: reg ...
reg ...

the count will be:

svy:reg = 1
reg = 1

The remaining items (counts per file) count the prefix and the "command" (regression or otherwise) separately, except for the 'regressions' key, which is counted the same way as the first item.
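The regression count from the example above, where a prefixed regression is keyed as prefix:command, can be sketched like this. The `PREFIXES` and `REGRESSIONS` sets are illustrative assumptions, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative subsets; the real lists used by stata.py are not given here.
PREFIXES = {"svy", "by", "bysort", "xi"}
REGRESSIONS = {"reg", "regress", "logit", "probit"}


def count_regressions(lines):
    """Count regression commands, keeping a prefix and its command together."""
    counts = Counter()
    for line in lines:
        head, sep, rest = line.partition(":")
        head_tokens = head.split()
        if sep and head_tokens and head_tokens[0] in PREFIXES:
            rest_tokens = rest.split()
            if rest_tokens and rest_tokens[0] in REGRESSIONS:
                counts[f"{head_tokens[0]}:{rest_tokens[0]}"] += 1
        else:
            tokens = line.split()
            if tokens and tokens[0] in REGRESSIONS:
                counts[tokens[0]] += 1
    return counts
```

Applied to the two commands in the example, `svy: reg ...` and `reg ...`, the sketch yields one count for `svy:reg` and one for `reg`.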

errors

Some files have syntax errors. Missing closing delimiters are closed automatically. If a closing */ is missing, the comment is assumed to extend to the end of the file.
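The rule for an unclosed comment can be sketched as a small comment stripper. This is an illustration under simplifying assumptions, not the parser in stata.py: the function name is hypothetical, and nested comments and `/*` inside string literals are ignored.

```python
def strip_block_comments(source):
    """Remove /* ... */ comments; an unclosed comment runs to the end of the file."""
    kept = []
    pos = 0
    while True:
        start = source.find("/*", pos)
        if start == -1:
            # No more comments: keep the rest of the file.
            kept.append(source[pos:])
            break
        kept.append(source[pos:start])
        end = source.find("*/", start + 2)
        if end == -1:
            # Missing closing */: treat everything after /* as comment.
            break
        pos = end + 2
    return "".join(kept)
```

The consequence is that any commands after an unclosed /* are not parsed or counted.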
