AI for Software Engineering
My research investigates how artificial intelligence, and especially Large Language Models, can be applied to core software engineering tasks: code generation, translation, testing, verification, and trustworthiness assessment. A central thread is rigorous benchmarking: building the evaluation frameworks needed to measure what AI tools can and cannot do reliably in real engineering contexts. I also write opinion and perspective pieces on what the AI revolution means for software quality, human expertise, and professional responsibility.
Opinion & Perspectives
Thoughts, analyses, and reflections on the impact of artificial intelligence on the software engineering profession.
Teaching Software Engineering in the Age of Generative AI: What Is Really at Stake?
Read Full Opinion
Why We Should Trust Systems, Not Just Their AI/ML Components
Read at IEEE Computer
Leveraging LLMs for Trustworthy Software Engineering: Insights and Challenges
Read at IEEE Computer
Courses, Keynotes & Tutorials
University courses, presentations, and lectures on the intersection of generative AI and software engineering.
Benchmarking GenAI for Software Engineering: Challenges and Insights
View details & presentation
LLMs for Trustworthy Software Engineering: Insights and Challenges
View details & presentation
Research Papers & Frameworks
Selected works on AI and ML applied to software engineering tasks. Full list available on the publications page.
Open Research Artifacts
Interactive dashboards and open datasets published alongside our accepted papers, freely accessible for replication and further research.
Polyglot Code Translation Dashboard
Polyglot · ASE 2025
Interactive exploration of LLM performance across code translation tasks. Filter by model, prompting strategy, and problem complexity across multiple target languages.
TestForge Benchmark Dashboard
TestForge · SANER 2026
Benchmarking results for LLM-based test case generation. Explore model performance across different programming languages, test oracle types, and generation strategies.
PROBE Benchmark Dataset
PROBE · EMSE 2026
Raw benchmark data for evaluating code generation quality in LLMs, covering syntactic correctness, execution reliability, and semantic preservation metrics across multiple models.