Ethical considerations

Mastering AI Chatbots: Optimizing Research through Effective Use

Konstantin Hebenstreit

2024-10-29

Ethics in using AI for research

  1. General topics of ethics and AI
  2. What it means for research

Reminder: Chatbots Are Hard to Control


AI companies invest significant effort to ensure their chatbots behave ethically.

For example, chatbots should not assist users in harmful or destructive behaviors, but challenges remain…

Example: Jailbreaking

  • Jailbreaking: Techniques to bypass AI safety measures.

Bias and Fairness in AI Outputs

  • AI can perpetuate societal biases present in training data.
  • Examples:
    • Associating “doctor” with male gender.
    • Associating “nurse” with female gender.

Unintended Consequences of Debiasing

Gemini: “generate a picture of a US senator from the 1800s.” The debiased model produced demographically diverse but historically inaccurate senators.

Transparency

  • “Black Box”: Opaque how models work.
  • Even “white box” open-weights models such as Llama do not disclose which data they were trained on.

Accountability

Example:

2023: Two US lawyers were fined $5,000 for (unknowingly) submitting fictitious court cases, fabricated by ChatGPT, as precedents.

Data privacy concerns

  • Training data extraction: reconstruct training data from model outputs
  • LLMs can combine large amounts of ‘harmless’ user data to predict sensitive attributes

What does this mean for research?

AI use in research

Increasing use of LLMs in paper writing

The ICLR conference is using AI to help reviewers:

1. Encouraging reviewers to rephrase vague review comments, making them more actionable for the authors.
2. Highlighting sections of the paper that may already address some of the reviewer’s questions.
3. Identifying and addressing unprofessional or inappropriate remarks in the review.

The feedback system will not replace any human reviewers.

AI in Journal Policies

  • Policies are evolving rapidly in response to AI developments.
  • Always review journal guidelines before submitting your work.

Elsevier’s Policies for Journals

This is a summary; for details, please check their website.

  • For Authors:
    • AI tools can only be used to improve language and readability.
    • Must disclose AI use in manuscripts.
    • AI cannot be listed as an author.
  • Figures & Images:
    • AI cannot alter or create images (except if part of research methods).
    • AI use in research must be documented in methods.

Elsevier’s Policies for Journals 2

  • For Reviewers:
    • Do not upload manuscripts to AI tools (confidentiality breach).
    • AI should not assist in peer review.
  • For Editors:
    • AI tools must not be used to evaluate or make decisions on manuscripts.
    • Maintain confidentiality in all communications.

MedUni Vienna PhD thesis guideline

“A doctoral thesis is expected to contain text that is the intellectual product of the student. If, for any reason, a student feels that he or she must rely on an AI algorithm or a writing enhancement tool (Chat-GPT, Grammarly, Google Translate, DeepL, etc.) to translate, generate, or paraphrase texts, this must be indicated as such with an appropriate citation at the end of the relevant paragraph and in the Bibliography. While this indication will serve to defend the student against any claims of reviewer deception, there is no guarantee that the reviewers will be appreciative of ample AI-generated texts in the thesis. Accordingly, theses relying extensively on AI may fail to receive a positive evaluation by the reviewers. Students should note AI-detection algorithms will still recognise machine-generated texts even after being paraphrased by the authors, and that subsequent revised thesis submissions remain marked. Hence, students should be aware that relying on AI may incur a risk, which is solely their own, and it therefore is strongly recommended that they disclose all AI use in the Declaration, the List of Algorithms, the body of the text, and the Bibliography. For more information, please see https://apastyle.apa.org/blog/how-to-cite-chatgpt”

Implications for Academic Research

  • Hallucinations:
    • Citation
    • Wrong information
  • Plagiarism issues:
    • Although verbatim repetitions of text are very rare in LLM outputs, the use of a plagiarism checker is advised.
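As a rough illustration of what such a check looks for, the sketch below flags verbatim word n-gram overlap between a draft and a source text. The function names and the choice of 7-word n-grams are illustrative assumptions, not how any real plagiarism checker works internally.

```python
# Minimal sketch: flag verbatim n-gram overlap between a draft and a source.
# Illustration only -- not a substitute for a real plagiarism checker.

def ngrams(text: str, n: int = 7) -> set:
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(draft: str, source: str, n: int = 7) -> float:
    """Fraction of the draft's n-grams that also occur verbatim in the source."""
    draft_ngrams = ngrams(draft, n)
    if not draft_ngrams:
        return 0.0
    return len(draft_ngrams & ngrams(source, n)) / len(draft_ngrams)
```

For LLM-generated text the overlap should usually be near zero; any shared long n-gram is worth checking against the source by hand.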

What happens with your input data?

  • AI companies are not transparent about this
  1. Model training: most data is used for future model training at OpenAI/GPT & Google/Gemini (not Anthropic/Claude)
  2. Security check of all data for safety violations
  • Opt out of (1) in Settings - Data Controls:
    • Turn off the option “Improve model for everyone”
  • Claude does not train on your data by default
  • Security checks happen regardless, including human review

General Data Protection Regulation

  • For all individuals within the EU

  • Avoid entering sensitive data that identifies individuals

  • Turning off model training on your inputs does not imply GDPR compliance.

  • If you are using LLMs to process personal data of subjects, please consult experts at your university, especially for anything that concerns patient data.
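To make the advice about sensitive data concrete, the sketch below redacts obvious identifiers before text is pasted into a chatbot. The regexes and placeholder names are illustrative assumptions; simple pattern matching like this is NOT sufficient for GDPR compliance and does not replace expert review.

```python
import re

# Illustrative sketch: redact e-mail addresses and phone-like numbers
# before sending text to a chatbot. Pattern matching alone is NOT
# sufficient for GDPR compliance -- consult your university's experts.

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s/-]{7,}\d")

def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Proper pseudonymisation would also have to cover names, addresses, dates of birth, and free-text identifiers, which is exactly why expert consultation is recommended above.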

Examples for GDPR compliance with sensitive data

  • Microsoft and OpenAI offer the possibility of servers in Europe that comply with GDPR rules.

  • Using local models and servers

    • challenges: setup effort and server costs

Possible Future

How AI will be used in research

Self-driving car

Autonomous research systems


Human accountability