Skip to content

Code to add segmentation of message text for messages from the IETF mailarchive.

Notifications You must be signed in to change notification settings

sodestream/metadata_seg

Repository files navigation

metadata-seg

Code to add segmentation of message text for messages from the IETF mailarchive.

To generate segmentations install the ietfdata library available at this link.

Then checkout this repository and run:

./preprocess-seg-to-csv.sh

This may take some time (possibly a few hours). Then run:

`python preprocess-seg-to-db.py

This will write the results to the your mongodb mailarchive2 instance. You can now access the segmentations using this code:

ml_name = "100all"
ml = archive.mailing_list(ml_name)
if ml:
   threads_dict = ml.threads(this_list_only = True)
   for thread_root_msg in threads_dict.values():
       for msg in iterate_over_thread(thread_root_msg):                            
           # segmentations 
           for segment in get_segmentation(msg):
               print("Segment %d is quoting segment %d" % (segment.id, segment.antecedent))
               print()
               print("Content of segment:")
               print(segment.content)
               print()
               print("Type of segment:")
               print(segment.type)

About

Code to add segmentation of message text for messages from the IETF mailarchive.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published