Commit
shorten methodology
caimeng2 committed Mar 5, 2024
1 parent bdf3145 commit 522f731
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -86,7 +86,7 @@ Please run `example.ipynb` to see more example usage.

## Methodology

- In an era of large language models, `seesus` chooses to use predefined regular expression patterns instead of machine learning for text classification, because this method is more transparent, replicable, and controllable. Users of `seesus` can examine the matching logic and customize the syntax if necessary, so users can always understand and maintain control over the results. The regular expression syntax was developed for the 17 SDGs and the 169 SDG targets, including both direct and indirect matching. The accuracy of the matching syntax was manually tested, reviewed, and improved using randomly selected statements from corporate reports. Three rounds of adjustments were conducted to finalize the syntax. `seesus` achieves an accuracy rate of 76%, as determined by alignment with manual coding. Human intercoder agreement on the same text stands at 83%. Considering the inherent ambiguity and complexity of language, as well as the interconnected nature of the SDGs, the accuracy of `seesus` is rather high. Detailed information on the accuracy evaluation and manual refinement can be found in <a target="_blank" href="https://github.com/Yingjie4Science/SDGdetector">`SDGdector`</a>.
+ In an era of large language models, `seesus` chooses to use predefined regular expression patterns instead of machine learning, because this method is more transparent, replicable, and controllable. Users of `seesus` can examine the matching logic and customize the syntax if necessary, so users can always understand and maintain control over the results. The regular expression syntax was developed for the 17 SDGs and the 169 SDG targets, including both direct and indirect matching. The accuracy of the matching syntax was manually tested, reviewed, and improved using randomly selected statements from corporate reports. Three rounds of adjustments were conducted to finalize the syntax. `seesus` achieves an accuracy rate of 76%, as determined by alignment with manual coding. Human intercoder agreement on the same text stands at 83%. Considering the inherent ambiguity and complexity of language, as well as the interconnected nature of the SDGs, the accuracy of `seesus` is rather high. Detailed information on the accuracy evaluation and manual refinement can be found in <a target="_blank" href="https://github.com/Yingjie4Science/SDGdetector">`SDGdector`</a>.
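
The pattern-based approach described in the Methodology paragraph can be illustrated with a minimal sketch. The patterns, the `PATTERNS` dictionary, and the `match_labels` helper below are hypothetical and written only for illustration; they are not the syntax shipped with `seesus` and do not use its API. The point is that each label is tied to a readable expression that a user can inspect and edit.

```python
import re

# Hypothetical patterns for illustration only; NOT the syntax used by seesus.
PATTERNS = {
    "SDG6": re.compile(r"\b(clean|safe|drinking)\s+water\b|\bsanitation\b", re.IGNORECASE),
    "SDG7": re.compile(r"\b(renewable|clean|solar|wind)\s+energy\b", re.IGNORECASE),
    "SDG13": re.compile(r"\bclimate\s+(change|action)\b|\bgreenhouse\s+gas(es)?\b", re.IGNORECASE),
}


def match_labels(text):
    """Return the labels whose pattern matches the text, in dictionary order."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    statement = "We invest in solar energy and safe drinking water."
    print(match_labels(statement))
```

Because the rules are plain expressions rather than learned weights, changing what counts as a match means editing one pattern, and the same input always produces the same labels.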


## Maintenance