Beste Aydemir
I am an MSc student in Data Science at Ludwig-Maximilian University of Munich.
My research interests are vision-language models and their applications to embodied agents and autonomous driving.
Email / GitHub / LinkedIn
Education
Ludwig Maximilian University of Munich
Master of Science in Data Science, GPA: 1.3 (German scale; 1.0 is best)
- Relevant Courses: Statistical Reasoning and Inference, Predictive Modeling, Advanced Deep Learning for Robotics
Bilkent University
Bachelor of Science in Electrical and Electronics Engineering (Comprehensive Scholarship); GPA: 3.68/4
- Relevant Courses: Probability and Statistics, Fundamental Structures of Computer Science, Neural Networks, Algorithms and Programming I-II, Stochastic Models, Signals and Systems, Linear Algebra and Differential Equations, Nonlinear Systems Theory
Projects
AViLA: Asynchronous Vision-Language Agent
arXiv Paper
- Addressing the problem that, in streaming multimodal data, a question and the evidence needed to answer it can arrive at different times
- Introducing a three-module architecture: memory retention, evidence retrieval, and an evidence-grounded trigger for answering at the right time
- Including AnytimeVQA-1K, a new benchmark of 1,000 Q&A pairs over 189 long videos for testing temporal awareness
Navigation and Manipulation with Vision-Language Models
Collaboration between the Technical University of Munich (TUM) and the University of Oxford
Report
- Generating class-agnostic 3D instance masks with the Mask3D + OpenMask3D pipeline, which uses CLIP features and 2D SAM masks, on the SceneFun3D dataset
- Building a hierarchical open-vocabulary scene graph with CLIP embeddings, using Qwen3-14B to extract contextual objects, spatial relations, and functional components
- Using PartField (trained on Objaverse) to decompose point clouds into functional parts (e.g., handles, knobs, cushions)
- Using Qwen2.5-VL-3B to select the correct functional part from the segmented components via labeled 2D renderings
Autonomous Driving LLM-based Agent in Streaming Videos
Report
- Utilized multimodal models such as InstructBLIP and Video-LLaVA for in-context learning on driving datasets
- Implemented text-based memory systems for LLM agents in streaming environments
Diffusion Robot Path Planning
Project Poster
- Replicated "Planning with Diffusion for Flexible Behavior Synthesis" (Janner et al., 2022) for a 2D maze environment in PyTorch
- Extended the paper to a 3D environment with a 7-DoF robotic arm
- Trained models on Google Cloud Compute Engine
Experience
Huawei Munich Research Center
Student Worker (November 2024 - April 2025)
- Using video-language models for autonomous driving tasks
- Running experiments on fine-grained action understanding of videos in traffic scenarios
- Curating a more descriptive dataset for retrieval from existing autonomous driving datasets
Fraunhofer
Student Worker (November 2023 - November 2024)
- Simulating centralized MPC control of a multi-agent drone system for navigation, and running experiments
- Developing learning-based alternatives to CasADi's linear solvers for speedup
The University of Edinburgh
Student Researcher (June 2020 - April 2024, Online)
- Built Q-learning-based models for behavioral analysis in the Morris Water Maze experiment
- Developed a cognitive value metric to assess spatial navigation capabilities in the simulations
- Presented a poster at the 49th European Brain and Behaviour Society (EBBS) Meeting