ADMIN_HOWTO_decide_on_Tool_Needs.md

Choosing Disinformation Deployment Tools

Disinformation deployments are not all the same

CogSecCollab supports deployments of different sizes and types. Deployments we've run include:

  • Single analyst working with open datasets, looking for patterns of behaviour and extending knowledge about disinformation creators' assets
  • Small team (4-6 people) collaborating on a short-term data task
  • Mid-sized team (20-50 people) collecting data and doing initial analysis on disinformation incidents lasting days
  • Large team (50+ people) going from collection to response

Some of the factors in tool and process choice will include:

  • Team size: big teams need different resources to keep them organised
  • Tempo: longer, more academic investigations will need different tooling from real-time responses
  • Environment: expected output forms and formats will affect tool choice, as will any tool constraints created by being embedded in a larger team or system
  • Localisation: is the deployment working in a single language or several? If maps are being used, do the map tools cover the areas of interest (e.g. non-US regions, disputed territories)?

Example: larger deployment

We will have n people distributed around the world, adding data to our system. That data is:

  • Instances of disinformation. These will mostly be single examples of text, images, video or audio with associated metadata, but could be groups of examples. The metadata will need to include the date and time an instance appeared; where it appeared (Twitter, Facebook, etc.); the user, group, etc. it appeared from or in; hashtags and other information added to the instance (e.g. image descriptions); and a URL to the original instance if possible.
  • New disinformation narratives, e.g. “black people can’t catch Covid19”.
  • New hashtags and other high-level information associated with disinformation incidents.
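The instance metadata listed above could be pinned down as a small record type. This is a minimal sketch, not CogSecCollab's actual schema: the class name `DisinformationInstance` and every field name are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class DisinformationInstance:
    """One observed example of disinformation plus its metadata (hypothetical schema)."""

    content_type: str                 # "text", "image", "video" or "audio"
    appeared_at: datetime             # date and time the instance appeared
    platform: str                     # where it appeared, e.g. "twitter"
    account: Optional[str] = None     # user or group it appeared from/in
    hashtags: List[str] = field(default_factory=list)
    notes: str = ""                   # added information, e.g. an image description
    source_url: Optional[str] = None  # URL of the original instance, if available
    narrative: Optional[str] = None   # narrative this instance is associated with


# Example values are made up for illustration.
example = DisinformationInstance(
    content_type="text",
    appeared_at=datetime(2020, 4, 1, 12, 30, tzinfo=timezone.utc),
    platform="twitter",
    hashtags=["#example"],
    source_url="https://example.com/post/1",
)
```

Optional fields default to empty values so that contributors can submit an instance even when the account or original URL is unknown.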

We would like to take these instances into our system as raw data, and:

  • Add them to our datastore
  • Triage them
  • Where possible, add them to external storage (e.g. archive.org, archive.is) so that a record is kept if the original data is removed from its source, e.g. social media
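The three intake steps above can be sketched as one function. Everything here is an assumption for illustration: the in-memory `datastore` list stands in for a real datastore, the hashtag-based `triage` rule is a placeholder for whatever triage process a team adopts, and the archive step only builds a Wayback Machine "save" URL (the `https://web.archive.org/save/` prefix should be checked against archive.org's current documentation) without making any network request.

```python
from typing import Dict, List, Optional, Set

WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"  # assumed endpoint, verify before use

datastore: List[Dict] = []  # stand-in for a real datastore


def triage(instance: Dict, watched_hashtags: Set[str]) -> str:
    """Return "review_first" if the instance uses a watched hashtag, else "backlog"."""
    tags = {tag.lower() for tag in instance.get("hashtags", [])}
    return "review_first" if tags & watched_hashtags else "backlog"


def archive_request_url(source_url: Optional[str]) -> Optional[str]:
    """Build the URL that would ask the Wayback Machine to capture the original."""
    return WAYBACK_SAVE_PREFIX + source_url if source_url else None


def ingest(instance: Dict, watched_hashtags: Set[str]) -> Dict:
    """Store the raw instance untouched, alongside its triage label and archive request."""
    record = {
        "raw": instance,
        "triage": triage(instance, watched_hashtags),
        "archive_request": archive_request_url(instance.get("source_url")),
    }
    datastore.append(record)
    return record


stored = ingest(
    {"hashtags": ["#Example"], "source_url": "https://example.com/post/1"},
    watched_hashtags={"#example"},
)
```

Keeping the submitted instance under a separate `raw` key means triage labels and archive links never overwrite what the contributor originally sent.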

We will also have people in our communities who will add datasets to our system. That data is:

  • Results of searches for messages, images, video and audio associated with disinformation-related phrases, images, groups etc. Much of this data will be in a format specific to the social media channel it was scraped from, often JSON or CSV.
  • Datasets provided by external disinformation researchers, relevant to Covid19, e.g. datasets released by social media platforms and other providers.

We would like to take these datasets into our system as raw datasets, and:

  • Add them to our datastore
  • Convert them to a common format for analysis
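Converting channel-specific JSON and CSV into a common format can be sketched as a per-source field mapping. The source names (`source_a_json`, `source_b_csv`), the raw field names and the common field list below are all hypothetical; real platform exports will use their own field names and may need nested lookups.

```python
import csv
import io
import json
from typing import Dict, List

COMMON_FIELDS = ("platform", "posted_at", "author", "text", "url")

# Hypothetical mapping from each common field to the raw field name per source.
FIELD_MAPS = {
    "source_a_json": {"posted_at": "created_at", "author": "user",
                      "text": "full_text", "url": "link"},
    "source_b_csv": {"posted_at": "timestamp", "author": "account",
                     "text": "message", "url": "permalink"},
}


def to_common(row: Dict, source: str) -> Dict:
    """Rename one raw row's fields to the common names; missing fields become None."""
    mapping = FIELD_MAPS[source]
    record = {common: row.get(raw_name) for common, raw_name in mapping.items()}
    record["platform"] = source
    return record


def load_json(text: str, source: str) -> List[Dict]:
    """Parse a JSON array of objects and convert each one."""
    return [to_common(row, source) for row in json.loads(text)]


def load_csv(text: str, source: str) -> List[Dict]:
    """Parse CSV text with a header row and convert each row."""
    return [to_common(row, source) for row in csv.DictReader(io.StringIO(text))]
```

Adding a new source then only needs a new entry in `FIELD_MAPS`, and the raw dataset stays in the datastore unchanged.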

After that, we still have analysis, action and dissemination. Dissemination includes, for example, making data and results available to journalists and other researchers in a way that makes the provenance clear and keeps the data reasonably clean, while still giving them access to the raw data if they need it.
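One possible way to keep provenance clear when sharing cleaned data is to attach a pointer to the raw file plus a checksum of its contents. This is a sketch under stated assumptions rather than a described CogSecCollab practice: the function names, the `provenance` field layout and the example location string are all hypothetical.

```python
import hashlib
from typing import Dict


def raw_checksum(raw_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw data, so recipients can verify the raw copy."""
    return hashlib.sha256(raw_bytes).hexdigest()


def publishable_record(cleaned: Dict, raw_bytes: bytes,
                       raw_location: str, provider: str) -> Dict:
    """Bundle cleaned data with where the raw data came from and where it is kept."""
    return {
        "data": cleaned,
        "provenance": {
            "provider": provider,          # who supplied the raw data
            "raw_location": raw_location,  # where the untouched raw copy is stored
            "raw_sha256": raw_checksum(raw_bytes),
        },
    }


raw = b'{"full_text": "example post"}'
shared = publishable_record(
    cleaned={"text": "example post"},
    raw_bytes=raw,
    raw_location="datastore/raw/example.json",  # hypothetical path
    provider="example contributor",
)
```

A journalist or researcher who later fetches the raw file can recompute the digest and confirm it matches `raw_sha256`.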