LLM Research Reports


Mastering AI Chatbots: Optimizing Research through Effective Use


Mag. Dr. Hannah Metzler & Konstantin Hebenstreit, M.Sc.
Complexity Science Hub & Medical University of Vienna

Deep Research Mode vs. “academic” LLMs

Agentic in-depth research

Connected to scientific databases

Let’s set up an exercise now and return to it later: Exercise (10 min)

Define a research question in your area of expertise, where you can judge accuracy. Be specific about the study population, methodology (e.g. RCTs), type of study (e.g. meta-analyses), outcome variables, etc.

For example: “What are effects of social media use on well-being and mental health, including depression, anxiety and sleep? How does the effect differ between particular subgroups (like teenagers, or girls vs. boys, individuals with and without a mental health condition)? Does it matter how much individuals use social media (dose-dependent effects), and how they use it (for example, passive vs. active use)? Overall, please focus on how robust the evidence is and on what the highest-quality studies say. Prioritize meta-analyses and causal studies (RCTs, natural experiments) whenever they exist. Please include only articles from scientific journals. Please cite all studies in APA format and include a list of references at the end.”

Open a tab for each of these models & enter the prompt. Let the models compute while we continue with our content.

  1. Gemini 2.5 Pro (Not Deep Research) (or ChatGPT 4.5)
  2. Consensus and/or Dimensions GPT
  3. Gemini Deep Research
  4. ChatGPT 4.5 Deep Research mode if you have a paid account
  5. Genspark.ai Deep Research

What is Deep Research Mode?

  • Agentic AI built on fine-tuned reasoning models (e.g. o3)
  • Suggests & follows an (editable) multi-step research plan
  • Collects sources: Iterative autonomous web search
  • Reasons through information in multiple rounds
  • Synthesizes information: long & structured research reports
  • Your prompt really matters: compute is expensive & usage is limited
  • Use all the prompting tips we talked about to write detailed & specific prompts.

Deep Research Mode

Limitations

  • Factual errors will happen
  • Not good at communicating uncertainty
  • Not restricted to scientific sources (except Perplexity)
    • But: requesting only scientific sources worked for me in ChatGPT
  • Only open-access sources

“Academic” LLMs

  • LLMs connected to scientific article databases
  • Database: third-party provider, only open-access papers
  • References: only scientific sources

Limitations:

  • Reports much shorter
  • Often bullet point format
  • Often single citations
  • Quality of studies varies greatly

Retrieval Augmented Generation (RAG)

  • Retrieval: the LLM searches external sources
  • Augmented: retrieved information is added to the prompt as context
  • Generation: the model answers based on this augmented prompt
  • Advantages:
    • Current information & specific data access
    • Fewer hallucinations
    • References

How do models identify relevant information/papers?

Each word or document is represented as a vector of numbers

Word/Token/Document: “Even”

Embedding: [-0.3185, 0.5976, 0.4817, 0.7306, -0.5938, -0.6372, 0.9381, -0.9165, -0.9396, 0.3540, 0.0262, -0.6131, 0.3634, -0.0391, -0.4732, -0.2341, -0.8044, -0.3637, -0.5958, -0.8710, 0.3722, -0.8544, -0.7819, -0.5487, -0.9314, 0.3949, -0.3168, -0.3363, -0.6973, -0.3789, 0.7200, -0.6201, -0.7010, -0.3735, 0.7437, -0.9795, -0.4916, 0.2130, 0.6817, 0.1972, 0.8518, -0.8700, -0.4013, -0.6310, -0.9597, 0.2763, -0.9173, 0.2900, -0.1896, 0.8286, -0.8617, 0.2566, 0.7024, -0.2448, 0.0994, -0.6664, -0.0699, -0.5830]

How do embeddings encode meaning?

Word embeddings

Semantic dimensions: gender & royalty (the classic example: king - man + woman ≈ queen)

RAG:

Embedding vectors of the query and of the search results are compared for semantic search.
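This comparison is typically done with cosine similarity between embedding vectors. A toy sketch: the 3-dimensional vectors below are invented for illustration (real embeddings have hundreds of dimensions and come from a trained model, as in the 58-dimensional example above).

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 = same direction, near 0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings (tiny, made-up vectors for illustration).
doc_embeddings = {
    "paper on teen social media use": [0.9, 0.1, 0.2],
    "paper on crop rotation":         [0.1, 0.9, 0.1],
}
# Hypothetical embedding of the query "social media and mental health".
query_embedding = [0.8, 0.2, 0.3]

# Semantic search: return the document whose vector points most
# in the same direction as the query vector.
best = max(
    doc_embeddings,
    key=lambda d: cosine_similarity(query_embedding, doc_embeddings[d]),
)
# → "paper on teen social media use"
```

Because similar meanings produce vectors pointing in similar directions, this retrieves the social-media paper even though the query and document share no exact wording.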

Ideas for use cases

by Andrew Stapleton

  • Doing literature reviews
  • Overview of existing research areas
  • Ask for research gaps
  • Brainstorm ideas in a research area
  • Grant applications
  • Ask about common methodologies for a specific question
  • Ask about the work of specific researchers
  • Translate your research for a general audience
  • Fact-check (very) specific claims

Andy Stapleton prompt ideas

For inspiration only: there is zero guarantee that any of these produces better results than prompts you would come up with yourself!

  • Doing regular literature reviews to keep up to date: Summarize the current research on […]. Include the main findings, major authors and cite recent key studies in the field.
  • Research gaps: Perform an analysis on […]. Find research gaps or open questions that need further investigation. Provide references to support these identified gaps.
  • Brainstorm some potential research questions about the effects of […] on […]. Provide a list of unique questions or hypotheses, each with a brief explanation or rationale drawn from existing studies.
  • Grant applications: Gather facts and references for a grant proposal. Outline the current challenges, why this problem is important to solve, and how recent advances suggest our approach will work. Include key statistics…
  • Methodologies: I am planning a study on […]. Which methodologies are commonly used for […]? Compare approaches and summarize the best practices and recommendations from recent studies with references.
  • Keep up with new trends: Tell me about what new developments occurred in […] research in the past year. Highlight any breakthrough studies, new treatment trials, or changing theories. Provide a brief summary of each development along with the citation or source (journal or conference) where it was reported.
  • Asking about the work of specific researchers: summarize their most influential work and current research focus. Include a few example papers (or other contributions). Check if their research group has any recent publications I should be aware of.
  • Prepare a layperson-friendly explanation of my research on […]. [Provide your paper(s), presentations, any references.] The goal is a 2-minute talk for a science festival. Explain what […] is and why it can improve […]. Also mention real-world examples (with sources) that highlight the potential of […].
  • Fact-checking: Check the following statement: [Biodiversity has declined by over 50% in the last 40 years.] Is this accurate, and does it reflect current empirical evidence? If so, provide several sources.

Hannah’s preliminary conclusions

  • Excellent for getting a first overview much more quickly; a great start for many research tasks, but often a bit superficial.
  • Hallucinations: You need to check before you cite in your papers.
  • No actual critical thinking: summarizes majority opinions, misses critical perspectives/constraints if you don’t highlight them, and offers no strong arguments in the face of conflicting evidence.
  • Check the number and quality of sources. Often limited.
  • Ask Deep Research mode to include only academic sources
  • Only open access: do additional research so you don’t miss half of the literature (& publish open access!)

A possible iterative workflow

  • AI provides draft literature reviews, research summaries or research ideas
  • Apply traditional critical analysis to identify gaps, errors, or misrepresentations. Manually check references.
  • Return to AI with refined questions informed by deeper understanding
  • Repeat until achieving a comprehensive, accurate review

Back to the exercise output (15 min)

  • How accurate are the reports?
  • Do they reflect your expert knowledge on the field?
  • Which differences between models/services did you notice?
    • Quality of sources
    • Number of sources searched
    • Length
    • Format …
  • You can also use my example reports on the next slide.

Research report examples on social media & mental health

(Always check the prompt version; it evolved slightly between the different trials.)

Some impressions from comparing

Gemini

  • Intermediate search time: 5-10 min
  • Integration with Google drive
  • Large number of websites searched
  • Follow-up questions create new reports

ChatGPT

  • Slowest: 15-30 minutes
  • Most comprehensive research capabilities
  • Deep complex topics
  • Most expensive

Consensus/Dimensions

  • Very similar to each other
  • Short, but accurate

Genspark

  • high quality
  • Table of contents
  • Supplementary resources (videos), similar knowledge section
  • Mixture of Agents: “best” combination of models for your specific research question; creative output formats (e.g. mind maps)

Perplexity

  • Shorter reports: extracts key facts
  • Pretty wide range of sources
  • You can ask follow-up questions
  • Small context window (memory)
  • Fewer sources
  • ~30-minute search time

Prompt suggestion for the future

  1. Context (your background and goal): what you already know, what you are researching.
  2. Central research question
  3. Specifications: time period, population, study type, output variables…
  4. Desired Report Output: Structure, Content Elements (Table, Graphs, Subtitles, Policy Implications, Case Studies…), Target length, Citation style
  5. Source preferences:
    • Prioritize: [e.g., “Peer-reviewed journals,” “Government reports,” “Reputable news sources,” “Industry analysis reports”]
    • Avoid: [e.g., “Blog posts,” “Opinion pieces,” “Websites with known biases,” “Social media”]
    • Bias Considerations: [e.g., “Acknowledge potential biases in industry-funded research,” “Consider perspectives from multiple stakeholders,” “Prioritize sources with transparent methodologies”]
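The five components above can be assembled into a single Deep Research prompt. A sketch of one way to do this programmatically; all section values are illustrative placeholders, not recommended wording:

```python
# Assemble a Deep Research prompt from the components listed above.
# Every value below is a made-up example; fill in your own research details.
sections = {
    "Context": "I am a psychologist reviewing evidence for a grant proposal.",
    "Central research question": (
        "What are the effects of social media use on adolescent well-being?"
    ),
    "Specifications": (
        "2015-2025; adolescents; meta-analyses and RCTs; "
        "outcomes: depression, anxiety, sleep."
    ),
    "Desired report output": (
        "Structured report with subtitles, a summary table, "
        "APA citations, ~2000 words."
    ),
    "Source preferences": (
        "Prioritize peer-reviewed journals; avoid blog posts and opinion pieces; "
        "acknowledge potential biases in industry-funded research."
    ),
}

# One labeled block per component, separated by blank lines.
prompt = "\n\n".join(f"{name}:\n{text}" for name, text in sections.items())
```

Keeping the components in a dict like this makes it easy to reuse the same skeleton across research questions and to track how the prompt evolves between trials.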