Update 04-workflow.Rmd
@fawda123 : Thank you for putting Section 4.1.3=5 together, I've attempted to generalize in some places to not put as much focus on ChatGPT's use. Please review my changes to see if you agree. Otherwise, thanks again for putting this together. Once complete, please feel free to pass along to the ANEP Directors and Science listserves for their input, comment, discussion.
esherwoo77 committed Sep 21, 2023
1 parent 5650da6 commit d989d50
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions 04-workflow.Rmd
@@ -113,7 +113,7 @@ The TBEP has a [group GitHub page](https://github.com/tbep-tech) where all of ou

### Use of artificial intelligence {#aiuse}

Applications that use artificial intelligence (AI) to generate seemingly novel content have received increased attention for their potential to improve, support, or even replace workflows. How these tools can be used or misused to support open science applications is worth discussing. In particular, the [ChatGPT](https://chat.openai.com/){target="_blank"} interface provided by the OpenAI research organization can be leveraged to support open science, in addition to supporting countless other applications. It's important to understand the advantages of these tools, while respecting their limitations and potential for misuse.
Applications that use artificial intelligence (AI) to generate seemingly novel content have received increased attention for their potential to improve, support, or even replace workflows. How these tools can be used or misused to support open science applications is worth discussing. In particular, the [ChatGPT](https://chat.openai.com/){target="_blank"} interface provided by the OpenAI research organization, as well as other emerging interfaces, can be leveraged to support open science, in addition to supporting countless other applications. It's important to understand the advantages of these tools, while respecting their limitations and potential for misuse.

"Artificial intelligence" (AI) is a generic term that the public and private spheres use loosely, much like "big data" was a buzzword a few years ago. AI doesn't describe a specific method or tool, but rather the collective body of research and applications focused on using computers to mimic human logic and reasoning. What makes ChatGPT notable among other AI applications is its ability to produce text-based content that reads like it was written by a human. This content is created based on a user prompt, such as a question or request. The response is, in all practicality, informed by all of the information on the internet.

@@ -144,15 +144,15 @@ provide real-time health insights.
etc.
```

And so on - the answer continues with a few more examples. It's apparent that the text goes well beyond the information you might get from a standard search engine. You can even ask ChatGPT to generate novel text for you, not just provide answers to questions. For example, you might ask ChatGPT to write a cover letter for you based on job application information copied/pasted into your prompt. The answer you receive will be highly tailored to the content contained in your prompt.
And so on - the answer continues with a few more examples. It's apparent that the text goes well beyond the information you might get from a standard search engine. You can even ask generative AI technologies to create novel text for you, not just provide answers to questions. For example, you might ask ChatGPT to write a cover letter for you based on job application information copied/pasted into your prompt. The answer you receive will be highly tailored to the content contained in your prompt.

It may seem like ChatGPT has achieved the holy grail of AI research, but the underlying model is not fundamentally different than any other predictive tool used in conventional statistics. ChatGPT uses a "large language model", which is fancy speak for a predictive model that is based on "learned" associations between words and grammar rules. By evaluating millions of lines of online text, certain words are found to be more likely associated with one another, e.g., "red ball" as compared to "red newspaper". Even grammar rules can be identified as emergent properties from text, so words that commonly appear together can be combined into sentences and even whole paragraphs. The response returned by a prompt is simply the best guess provided by the model of what is the most likely answer to your query. These models are not new - what's new is the computational power to train the models on very, very large datasets.
It may seem like new generative AI technologies are achieving the holy grail of AI research, but the underlying models are not fundamentally different than any other predictive tool used in conventional statistics. For example, ChatGPT uses a "large language model", which is fancy speak for a predictive model that is based on "learned" associations between words and grammar rules. By evaluating millions of lines of online text, certain words are found to be more likely associated with one another, e.g., "red ball" as compared to "red newspaper". Even grammar rules can be identified as emergent properties from text, so words that commonly appear together can be combined into sentences and even whole paragraphs. The response returned by a prompt is simply the best guess provided by the model of what is the most likely answer to your query. These models are not new - what's new is the computational power to train the models on very, very large datasets.
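The idea of "learned" word associations can be illustrated with a toy sketch. The example below counts adjacent word pairs in a small made-up corpus and uses those counts to guess the most likely word to follow "red". It is only an illustration under strong simplifying assumptions: the corpus and the function names (`bigram_counts`, `next_word_probabilities`) are hypothetical, and real large language models use neural networks trained on far more text rather than simple pair counts.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only; real models train on vastly more text.
corpus = [
    "the red ball rolled down the hill",
    "she threw the red ball to the dog",
    "he read the newspaper on the red bench",
    "the dog chased the red ball",
]


def bigram_counts(sentences):
    """Count how often each ordered pair of adjacent words appears."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def next_word_probabilities(counts, word):
    """Estimate P(next word | word) from the bigram counts."""
    following = {pair[1]: n for pair, n in counts.items() if pair[0] == word}
    total = sum(following.values())
    return {nxt: n / total for nxt, n in following.items()}


counts = bigram_counts(corpus)
probs = next_word_probabilities(counts, "red")

# The "best guess" is simply the most frequent continuation seen in the corpus.
best_guess = max(probs, key=probs.get)

print(counts[("red", "ball")])       # pair seen several times
print(counts[("red", "newspaper")])  # pair never seen, so the count is 0
print(best_guess)
```

In this corpus "red ball" is a common pairing and "red newspaper" never occurs, so the sketch predicts "ball" after "red". The prediction is only as good as the text it was counted from, which is the same caveat that applies to the much larger models.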

ChatGPT is not perfect and it's guaranteed that you will receive incorrect answers to questions. The answers will only be correct insofar as information on the internet is correct, which of course is a risky assumption. Therefore, interpreting answers from AI services like ChatGPT should be done with extreme caution. A more dangerous application is claiming its output as your own, such as the example above describing how to use ChatGPT to write a cover letter. This content is not a reflection of your abilities and using it in professional or academic settings could be considered plagiarism, in addition to risking your career.
Most, if not all, generative AI technologies are not perfect and it's guaranteed that you will receive incorrect answers to questions. The answers will only be correct insofar as the information the technologies rely upon is correct, which of course is a risky assumption when some of the models rely on internet sources. Therefore, interpreting answers from AI services like ChatGPT should be done with extreme caution. A more dangerous application is claiming its output as your own, such as the example above describing how to use ChatGPT to write a cover letter. This content is not a reflection of your abilities and using it in professional or academic settings could be considered plagiarism, in addition to risking your career.

Dismissing ChatGPT as an immoral application that provides incorrect information would completely ignore its potential to enhance your own workflows. In reality, most of the information provided by ChatGPT is reasonably correct and most users are probably not applying it for nefarious purposes. Simply being aware of these caveats is requisite to its use. As such, the following describes how the TBEP is using ChatGPT to enhance our open science workflows without irresponsible application beyond its limitations.
Dismissing generative AI as an immoral application that provides incorrect information would completely ignore its potential to enhance your own workflows. In reality, most of the information provided by these technologies is reasonably correct and most users are probably not applying them for nefarious purposes. Simply being aware of these caveats is requisite to their use. As such, the following describes how the TBEP is using one generative AI technology (ChatGPT) to enhance our open science workflows without irresponsible application beyond its limitations.

ChatGPT is an outstanding resource for coding, particularly for open-source applications like R, where millions of lines of code are readily available for training AI. Further, identifying incorrect or inaccurate answers to code-based questions is much simpler than for answers to knowledge-based prompts. The code works or it doesn't. As such, the TBEP is currently using ChatGPT to assist with coding to support the dozens of open science applications used by the program. Many of these activities are considered routine and require minimal thought by the programmer, thereby freeing time to focus on more thoughtful coding applications. These activities are summarized below.
ChatGPT is an outstanding resource for computer coding, particularly for open-source applications like R, where millions of lines of code are readily available for training generative AI. Further, identifying incorrect or inaccurate answers to code-based questions is much simpler than for answers to knowledge-based prompts that mimic human linguistics. The code works or it doesn't. As such, the TBEP is currently using ChatGPT to assist with coding to support the dozens of open science applications used by the program. Many of these activities are considered routine and require minimal thought by the programmer, thereby freeing time to focus on more thoughtful coding applications. These activities are summarized below.

1. __Developing code templates__: Creating templates to begin more comprehensive applications, such as initial code for a plot or Shiny application.
1. __Assisting with package development__: Many aspects of package development are routine, such as writing code documentation or unit tests. Entire functions can be copied/pasted into a prompt for a request to generate supporting information. All answers are thoroughly reviewed for accuracy and verified to work as intended before use in a package.
@@ -196,7 +196,7 @@ $ Color_345_F45_PCU <chr> …
$ Color_345_F45_Q <chr> …
```

Many also worry that ChatGPT will outcompete humans for their jobs, particularly in the tech industry. This fear is unfounded (at least at this time) - ChatGPT is not a replacement for the programmer with years of experience. Although it's true you can quickly get usable code through a simple request, this doesn't mean you'll understand everything the code does if you don't know the language. More importantly, it also means that you won't be able to debug or easily modify the code for your own use. Most code returned by ChatGPT includes generic placeholders or object names that require modification, preventing the code from running in isolation. The code may also simply include incorrect information. ChatGPT can sometimes create packages or function names that have never existed. These issues will be very difficult to diagnose without prior experience. The more important question you should be asking is how you can benefit from the positive aspects of ChatGPT to simplify your existing workflows.
Many also worry that generative AI technologies will outcompete humans for their jobs, particularly in the tech industry. This fear is unfounded (at least at this time), as these technologies are not a replacement for a programmer with years of experience. Although it's true you can quickly get usable code through a simple request, this doesn't mean you'll understand everything the code does if you don't know the language. More importantly, it also means that you won't be able to debug or easily modify the code for your own use. For example, most code returned by ChatGPT includes generic placeholders or object names that require modification, preventing the code from running in isolation. The code may also simply include incorrect information. Further, generative AI can sometimes create packages or function names that have never existed. These issues will be very difficult to diagnose without prior experience. The more important question you should be asking is how you can benefit from the positive aspects of the technology to simplify your existing workflows.
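One inexpensive safeguard against invented package or function names is to check that a suggestion exists before building on it. The sketch below is a minimal illustration in Python using the standard `importlib` module. The helper name `suggestion_exists` and the example names passed to it are hypothetical, and the check only confirms that a name exists, not that it does what the generated code assumes. In R, `requireNamespace()` can serve a similar purpose for packages.

```python
import importlib
import importlib.util


def suggestion_exists(module_name, function_name=None):
    """Return True if the module can be found and, when a function name
    is given, the module exposes a callable with that name."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised, for example, when the parent of a dotted name is missing.
        return False
    if spec is None:
        return False
    if function_name is None:
        return True
    module = importlib.import_module(module_name)
    return callable(getattr(module, function_name, None))


# A real standard-library function.
print(suggestion_exists("json", "dumps"))

# A plausible-sounding but made-up function name (hypothetical).
print(suggestion_exists("json", "dump_pretty_table"))

# A made-up package name (hypothetical).
print(suggestion_exists("surely_not_a_real_package_xyz"))
```

A check like this catches names that do not exist at all, but it does not replace reading the documentation and testing the code, since a real function can still be used incorrectly.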

Tools like ChatGPT continue to evolve and how TBEP, as well as the broader scientific and management communities, engage with them will also change. This will require thorough understanding of their limitations and pitfalls before they can be used responsibly. Moral and ethical implications aside, these tools are here to stay and the responsibility of understanding how best to leverage these applications to improve efficiency in creating open science applications is paramount.
