LLM Research Reports


Mastering AI Chatbots: Optimizing Research through Effective Use


Mag. Dr. Hannah Metzler & Konstantin Hebenstreit, M.Sc.
Complexity Science Hub & Medical University of Vienna

Deep Research Mode vs. “academic” LLMs

Agentic in-depth research

Connected to scientific databases

Let’s set up an exercise now and return to it later: Exercise (10 min)

Define a research question in your area of expertise, where you can judge accuracy. Be specific about the study population, methodology (e.g. RCTs), type of study (e.g. meta-analyses), outcome variables, etc.

For example: “What are effects of social media use on well-being and mental health, including depression, anxiety and sleep? How does the effect differ between particular subgroups (like teenagers, or girls vs. boys, individuals with and without a mental health condition)? Does it matter how much individuals use social media (dose-dependent effects), and how they use it (for example, passive vs. active use)? Overall, please focus on how robust the evidence is and on what the highest-quality studies say. Prioritize meta-analyses and causal studies (RCTs, natural experiments) whenever they exist. Please include only articles from scientific journals. Please cite all studies in APA format and include a list of references at the end.”

Open a tab for each of these models & enter the prompt. Let the models compute while we continue with our content.

  1. Gemini 2.5 Pro (Not Deep Research) (or ChatGPT 4.5)
  2. Consensus and/or Dimensions GPT
  3. Gemini Deep Research
  4. ChatGPT 4.5 Deep Research mode if you have a paid account
  5. Genspark.ai Deep Research

What is Deep Research Mode?

  • Agentic AI built on fine-tuned reasoning models (e.g. o3)
  • Suggests & follows an (editable) multi-step research plan
  • Collects sources: Iterative autonomous web search
  • Reasons through information in multiple rounds
  • Synthesizes information: long & structured research reports
  • Your prompt really matters: compute is expensive & usage is limited
  • Use all the prompting tips we talked about to write detailed & specific prompts.

Deep Research Mode

Limitations

  • Factual errors will happen
  • Not good at communicating uncertainty
  • Not restricted to scientific sources (except Perplexity)
    • But: requesting only scientific sources worked for me in ChatGPT
  • Only open-access sources

“Academic” LLMs

  • LLMs connected to scientific article databases
  • Database: third-party provider, only open-access papers
  • References: only scientific sources

Limitations:

  • Reports much shorter
  • Often bullet point format
  • Often single citations
  • Quality of studies varies greatly

Retrieval Augmented Generation (RAG)

  • Retrieval: the LLM searches external sources
  • Augmented: retrieved information is added to the prompt as context
  • Generation: the model answers based on this augmented prompt
  • Advantages:
    • Current information & specific data access
    • Fewer hallucinations
    • References

How do models identify relevant information/papers?

Each word or document is represented as a vector of numbers

Word/Token/Document: “Even”

Embedding: [-0.3185, 0.5976, 0.4817, 0.7306, -0.5938, -0.6372, 0.9381, -0.9165, -0.9396, 0.3540, 0.0262, -0.6131, 0.3634, -0.0391, -0.4732, -0.2341, -0.8044, -0.3637, -0.5958, -0.8710, 0.3722, -0.8544, -0.7819, -0.5487, -0.9314, 0.3949, -0.3168, -0.3363, -0.6973, -0.3789, 0.7200, -0.6201, -0.7010, -0.3735, 0.7437, -0.9795, -0.4916, 0.2130, 0.6817, 0.1972, 0.8518, -0.8700, -0.4013, -0.6310, -0.9597, 0.2763, -0.9173, 0.2900, -0.1896, 0.8286, -0.8617, 0.2566, 0.7024, -0.2448, 0.0994, -0.6664, -0.0699, -0.5830]

How do embeddings encode meaning?

Word embeddings

Semantic dimensions: gender & royalty (the classic example: king - man + woman ≈ queen)

RAG:

Embedding vectors of the query and of the search results are compared for semantic search.
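This comparison is typically done with cosine similarity between embedding vectors. A toy sketch: the 3-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions and come from a trained model, as in the 58-dimensional example above).

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings (tiny, made-up vectors for illustration).
doc_embeddings = {
    "paper on teen social media use": [0.9, 0.1, 0.2],
    "paper on crop rotation":         [0.1, 0.9, 0.1],
}
# Hypothetical embedding of the query "social media and mental health".
query_embedding = [0.8, 0.2, 0.3]

# Semantic search: return the document whose vector points most
# in the same direction as the query vector.
best = max(
    doc_embeddings,
    key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]),
)
# → "paper on teen social media use"
```

Because similar meanings produce vectors pointing in similar directions, this retrieves the social-media paper even though the query and document share no exact wording.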

Ideas for use cases

by Andrew Stapleton

  • Doing literature reviews
  • Overview of existing research areas
  • Ask for research gaps
  • Brainstorm ideas in a research area
  • Grant applications
  • Ask about common methodologies for a specific question
  • Ask about the work of specific researchers
  • Translate your research for a general audience
  • Fact-check (very) specific claims

Andy Stapleton prompt ideas

For inspiration only: there is zero guarantee that any of these produces better results than prompts you would come up with yourself!

  • Doing regular literature reviews to keep up to date: Summarize the current research on […]. Include the main findings, major authors and cite recent key studies in the field.
  • Research gaps: Perform an analysis on […]. Find research gaps or open questions that need further investigation. Provide references to support these identified gaps.
  • Brainstorm some potential research questions about the effects of […] on […]. Provide a list of unique questions or hypotheses, each with a brief explanation or rationale drawn from existing studies.
  • Grant applications: Gather facts and references for a grant proposal. Outline the current challenges, why this problem is important to solve, and how recent advances suggest our approach will work. Include key statistics…
  • Methodologies: I am planning a study on […]. Which methodologies are commonly used for […]? Compare approaches and summarize the best practices and recommendations from recent studies with references.
  • Keep up with new trends: Tell me about what new developments occurred in […] research in the past year. Highlight any breakthrough studies, new treatment trials, or changing theories. Provide a brief summary of each development along with the citation or source (journal or conference) where it was reported.
  • Asking about the work of specific researchers: summarize their most influential work and current research focus. Include a few example papers (or other contributions). Check if their research group has any recent publications I should be aware of.
  • Prepare a layperson-friendly explanation of my research on […]. [Provide your paper(s), presentations, any references.] The goal is a 2-minute talk for a science festival. Explain what […] is and why it can improve […]. Also mention real-world examples (with sources) that highlight the potential of […].
  • Fact-checking: Check the following statement: [Biodiversity has declined by over 50% in the last 40 years.] Is this accurate, and does it reflect current empirical evidence? If so, provide several sources.

Hannah’s preliminary conclusions

  • Excellent for getting a first overview much more quickly; a great start for many research tasks, but often a bit superficial.
  • Hallucinations: You need to check before you cite in your papers.
  • No actual critical thinking: summarizes majority opinions, misses critical perspectives/constraints if you don’t highlight them, and offers no strong arguments in the face of conflicting evidence.
  • Check the number and quality of sources. Often limited.
  • Ask Deep Research mode to include only academic sources
  • Only open access: do additional research so you don’t miss half of the literature (& publish open access!)

A possible iterative workflow

  • AI provides draft literature reviews, research summaries or research ideas
  • Apply traditional critical analysis to identify gaps, errors, or misrepresentations. Manually check references.
  • Return to AI with refined questions informed by deeper understanding
  • Repeat until achieving a comprehensive, accurate review

Back to the exercise output (15 min)

  • How accurate are the reports?
  • Do they reflect your expert knowledge on the field?
  • Which differences between models/services did you notice?
    • Quality of sources
    • Number of sources searched
    • Length
    • Format …
  • You can also use my example reports on the next slide.

Research report examples on social media & mental health

(Always check the prompt version; it evolved slightly between the different trials.)

Some impressions from comparing

Gemini

  • Intermediate search time: 5-10 min
  • Integration with Google drive
  • Large number of websites searched
  • Follow-up questions create new reports

ChatGPT

  • Slowest: 15-30 minutes
  • Most comprehensive research capabilities
  • Deep complex topics
  • Most expensive

Consensus/Dimensions

  • Very similar to each other
  • Short, but accurate

Genspark

  • high quality
  • Table of contents
  • Supplementary resources (videos), similar knowledge section
  • Mixture of Agents: “best” combination of models for your specific research question; creative output formats (e.g. mind maps)

Perplexity

  • Shorter reports: extracts key facts
  • Pretty wide range of sources
  • You can ask follow-up questions
  • Small context window (memory)
  • Fewer sources
  • ~30-minute search time

Prompt suggestion for the future

  1. Context (your background and goal): what you already know, what you are researching.
  2. Central research question
  3. Specifications: time period, population, study type, output variables…
  4. Desired Report Output: Structure, Content Elements (Table, Graphs, Subtitles, Policy Implications, Case Studies…), Target length, Citation style
  5. Source preferences:
    • Prioritize: [e.g., “Peer-reviewed journals,” “Government reports,” “Reputable news sources,” “Industry analysis reports”]
    • Avoid: [e.g., “Blog posts,” “Opinion pieces,” “Websites with known biases,” “Social media”]
    • Bias Considerations: [e.g., “Acknowledge potential biases in industry-funded research,” “Consider perspectives from multiple stakeholders,” “Prioritize sources with transparent methodologies”]
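The five components above can be assembled into a single Deep Research prompt. A sketch of one way to do this programmatically; all section values are illustrative placeholders, not recommended wording:

```python
# Assemble a Deep Research prompt from the components listed above.
# Every value below is a made-up example; fill in your own research details.
sections = {
    "Context": "I am a psychologist reviewing evidence for a grant proposal.",
    "Central research question": (
        "What are the effects of social media use on adolescent well-being?"
    ),
    "Specifications": (
        "2015-2025; adolescents; meta-analyses and RCTs; "
        "outcomes: depression, anxiety, sleep."
    ),
    "Desired report output": (
        "Structured report with subtitles, a summary table, "
        "APA citations, ~2000 words."
    ),
    "Source preferences": (
        "Prioritize peer-reviewed journals; avoid blog posts and opinion pieces; "
        "acknowledge potential biases in industry-funded research."
    ),
}

# One labeled block per component, separated by blank lines.
prompt = "\n\n".join(f"{name}:\n{text}" for name, text in sections.items())
```

Keeping the components in a dict like this makes it easy to reuse the same skeleton across research questions and to track how the prompt evolves between trials.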