Beste Aydemir

I am an MSc student in Data Science at the Ludwig Maximilian University of Munich (LMU).

My research interests are vision-language models and their applications to embodied agents and autonomous driving.

Email  /  GitHub  /  LinkedIn


Education

Ludwig Maximilian University of Munich
Master of Science in Data Science, GPA: 1.3 (German scale; 1.0 is best)
  • Relevant Courses: Statistical Reasoning and Inference, Predictive Modeling, Advanced Deep Learning for Robotics
Bilkent University
Bachelor of Science in Electrical and Electronics Engineering (Comprehensive Scholarship); GPA: 3.68/4
  • Relevant Courses: Probability and Statistics, Fundamental Structures of Computer Science, Neural Networks, Algorithms and Programming I-II, Stochastic Models
  • Signals and Systems, Linear Algebra and Differential Equations, Nonlinear Systems Theory

Projects

AViLA: Asynchronous Vision-Language Agent
arXiv Paper
  • Addressing the problem that, in streaming multimodal data, questions and the evidence needed to answer them can appear at different times
  • Introducing a three-module architecture: memory retention, evidence retrieval, and evidence-grounded trigger for answering at the right time
  • Including AnytimeVQA-1K, a new benchmark of 1,000 Q&A pairs over 189 long videos to test temporal awareness
Navigation and Manipulation with Vision-Language Models
Collaboration between the Technical University of Munich (TUM) and the University of Oxford
Report
  • Generated class-agnostic 3D instance masks on the SceneFun3D dataset using Mask3D + OpenMask3D pipelines, which rely on CLIP features and 2D SAM masks
  • Built a hierarchical open-vocabulary scene graph with CLIP embeddings, using Qwen3-14B to extract contextual objects, spatial relations, and functional components
  • Used PartField (trained on Objaverse) to decompose point clouds into functional parts (e.g., handles, knobs, cushions)
  • Used Qwen2.5-VL-3B to select the correct functional part from the segmented components via labeled 2D renderings
Autonomous Driving LLM-based Agent in Streaming Videos
Report
  • Utilized multimodal models such as InstructBLIP and Video-LLaVA for in-context learning on driving datasets
  • Implemented text-based memory systems for LLM agents in streaming environments
Diffusion Robot Path Planning
Project Poster
  • Replication of "Planning with Diffusion for Flexible Behavior Synthesis" (Janner et al., 2022) for a 2D maze environment in PyTorch
  • Extension of the paper to a 3D environment with a 7-DoF robotic arm
  • Training on Google Compute Engine

Experience

Huawei Munich Research Center
Student Worker (November 2024 - April 2025)
  • Using video-language models for autonomous driving tasks
  • Running experiments on fine-grained action understanding in videos of traffic scenarios
  • Curating a more descriptive dataset for retrieval from existing autonomous driving datasets
Fraunhofer
Student Worker (November 2023 - November 2024)
  • Simulation of centralized MPC control of a multi-agent drone system for navigation, with accompanying experiments
  • Development of learning-based alternatives that bypass CasADi's linear solvers for speedup
The University of Edinburgh
Student Researcher (June 2020 - April 2024, Online)
  • Built Q-learning-based models for behavioral analysis of the Morris Water Maze experiment
  • Developed a cognitive value metric to assess spatial navigation capabilities in the simulations
  • Presented a poster at the 49th European Brain and Behaviour Society (EBBS) Meeting

Design and source code from Jon Barron's website