Bhaskar Ganesh Devalla (邓柏森) — NLP2CT, University of Macau

§ 01

About

关于

Profile 个人简介

I am a Master's student in Computer Science at the University of Macau, working under the supervision of Prof. Derek F. Wong at the NLP & Portuguese–Chinese Machine Translation Laboratory (NLP2CT). My research focuses on trustworthy large language models, with an emphasis on LLM-generated text detection, adversarial robustness, and safety alignment across multilingual and low-resource settings.

I am additionally interested in LLM agents and machine-translation evaluation, as well as the robustness and reliability of detectors across languages and model families. I am fortunate to be closely mentored by Junchao Wu, whose guidance has been instrumental in shaping my research trajectory.

I am actively looking for collaborations on trustworthy LLMs, multilingual NLP, and LLM-generated content detection — please feel free to reach out via email.

我是澳门大学计算机科学专业的硕士研究生，在自然语言处理与葡中机器翻译实验室（NLP2CT）由 Derek F. Wong 教授指导开展研究。我的研究方向聚焦于可信赖大语言模型，尤其关注大语言模型生成文本检测、对抗鲁棒性，以及多语言与低资源环境下的安全对齐。

我同时关注大语言模型智能体与机器翻译评估，以及检测器在不同语言和模型族之间的鲁棒性与可靠性。我有幸得到吴俊超师兄的悉心指导，他的建议对我的研究方向起到了关键作用。

我正在积极寻找合作，方向包括可信赖大语言模型、多语言自然语言处理以及大语言模型生成内容检测，欢迎通过邮件与我联系。

§ 02

Research Overview

研究概述

Thematic pillars 研究方向

I.

LLM-Generated Text Detection

大语言模型生成文本检测

Building cross-lingual benchmarks and adversarial stress tests for detectors across Brahmic-script languages, with a focus on robustness under generator shifts and adversarial perturbations.

构建面向婆罗米文字系语言的跨语言基准与对抗性压力测试，重点关注检测器在生成器切换与对抗扰动下的鲁棒性。

II.

Trustworthy & Multilingual NLP

可信赖与多语言自然语言处理

Adversarial robustness, safety alignment, and evaluation of large language models in low-resource languages (Hindi, Telugu, Tamil); machine-translation evaluation and failure triage.

大语言模型在低资源语言（印地语、泰卢固语、泰米尔语）中的对抗鲁棒性、安全对齐与评估；机器翻译评估与错误分析。

III.

LLM Agents & Evaluation

大语言模型智能体与评估

Reliable evaluation protocols for agentic LLM systems; reproducible experimentation pipelines combining Docker, MLflow, and W&B across HPC and cloud environments.

面向智能体大语言模型系统的可靠评估协议；基于 Docker、MLflow 与 W&B 的可复现实验流程，覆盖高性能计算与云端环境。

§ 03

News

动态

Latest updates 最新消息

May 2026 2026年5月

Submitted CultMetric, a gloss-grounded cultural-fidelity evaluation framework for machine translation, to ACL Rolling Review (ARR). 向 ACL Rolling Review (ARR) 投稿 CultMetric——一套基于词条释义的机器翻译文化忠实度评估框架。

Feb 2026 2026年2月

Submitted IndicDetect to ACL Rolling Review (ARR) — manuscript under review. 向 ACL Rolling Review (ARR) 投稿 IndicDetect——论文审稿中。

Sep 2025 2025年9月

Started MSc in Computer Science at the University of Macau; appointed Teaching Assistant for Discrete Structures and Software Project Management. 开始在澳门大学攻读计算机科学硕士学位；担任《离散结构》与《软件项目管理》课程助教。

May 2025 2025年5月

Graduated with B.Tech (Honors) in CSE from KL University — GPA 9.00/10.00, First Class with Distinction. 获 KL 大学计算机科学与工程荣誉工学学士学位——GPA 9.00/10.00，一等优异荣誉。

May 2024 2024年5月

Joined the NLP2CT Laboratory as an undergraduate exchange researcher. 加入 NLP2CT 实验室，担任本科交流研究员。

§ 04

Education

教育背景

Academic record 学业履历

University of Macau 澳门大学 Aug 2025 – Present 2025年8月 – 至今

Master of Science in Computer Science

计算机科学理学硕士

Macau SAR, China

中国澳门特别行政区

GPA 3.72 / 4.00 · Supervisor 导师 Prof. Derek F. Wong

Koneru Lakshmaiah Education Foundation (KL University) Koneru Lakshmaiah 教育基金会（KL 大学） Aug 2021 – May 2025 2021年8月 – 2025年5月

Bachelor of Technology (Honors) in Computer Science and Engineering

计算机科学与工程荣誉工学学士

Guntur, India

印度贡土尔

CGPA 9.00 / 10.00 · Classification 等级 First Class with Distinction 一等优异荣誉

§ 05

Experience

经历

Research & Teaching 研究与教学

NLP2CT Laboratory · University of Macau NLP2CT 实验室 · 澳门大学 May 2024 – Present 2024年5月 – 至今

Research Assistant (Prev. Undergraduate Exchange Researcher)

研究助理（原本科交流研究员）

Macau SAR, China

中国澳门特别行政区

Joined as an undergraduate exchange researcher in May 2024 and continued as a graduate researcher from August 2025, contributing continuously over two years to multilingual NLP and LLM evaluation projects.
Conceived and led IndicDetect, a cross-lingual benchmark for LLM-generated text detection across Hindi, Telugu, and Tamil; drove dataset construction, evaluation framework design, and analysis of zero-shot and fine-tuned detector performance.
Built CultMetric, a reference-free, gloss-grounded framework for evaluating cultural fidelity in machine translation, reproducing human rankings with a deterministic LLM-judge protocol.
Developed and optimised Transformer-based models (BERT, RoBERTa, GPT-style) for multilingual machine translation, specialising in failure triage and configuration tuning for low-resource language pairs.
Built automated evaluation pipelines (BLEU, ROUGE, BERTScore, WER) and containerised workflows with Git, Docker, MLflow, and Weights & Biases to streamline experimentation and ensure reproducibility.
Co-authored manuscripts submitted to ACL Rolling Review (ARR) 2026, contributing experimental design, writing, and result analysis.

2024 年 5 月以本科交流研究员身份加入，2025 年 8 月起继续以研究生身份开展研究，在两年间持续参与多语言自然语言处理与大语言模型评估项目。
主导并提出 IndicDetect——面向印地语、泰卢固语与泰米尔语的跨语言大语言模型生成文本检测基准；负责数据集构建、评估框架设计，以及零样本与微调检测器的性能分析。
构建 CultMetric——一套无参考译文、基于词条释义的机器翻译文化忠实度评估框架，通过确定性的大语言模型评判协议复现人工排名。
针对多语言机器翻译开发并优化 Transformer 模型（BERT、RoBERTa、GPT 系列），专注于低资源语对的错误分析与配置调优。
构建自动化评估流水线（BLEU、ROUGE、BERTScore、WER），并基于 Git、Docker、MLflow 与 Weights & Biases 搭建容器化工作流，以简化实验流程并确保可复现性。
合作撰写并向 ACL Rolling Review (ARR) 2026 投稿论文，参与实验设计、写作与结果分析。

University of Macau 澳门大学 Sep 2025 – Present 2025年9月 – 至今

Teaching Assistant

助教

Macau SAR, China

中国澳门特别行政区

Facilitated weekly tutorials for undergraduate courses in Discrete Structures and Software Project Management, mentoring students in algorithmic complexity and systematic quality-assurance practice.
Engineered assessment frameworks and grading rubrics, applying test-coverage principles and edge-case analysis to evaluate student submissions.
Collaborated with lead faculty to design supplementary instructional materials and coordinate grading across large-scale computer-science classes.

为本科课程《离散结构》与《软件项目管理》主持每周辅导课，指导学生掌握算法复杂度与系统化质量保障实践。
设计评估框架与评分标准，运用测试覆盖率原则与边界情况分析评估学生作业。
与主讲教师协作设计辅助教学材料，并协调大规模计算机科学课程的评分工作。

CISC 1002 · Discrete Structures; CISC 4002 · Software Project Management

CISC 1002 · 离散结构；CISC 4002 · 软件项目管理

§ 06

Selected Projects

代表项目

Research output 研究成果

IndicDetect — Cross-Lingual LLM-Generated Text Detection Benchmark IndicDetect——跨语言大语言模型生成文本检测基准 Under Review · ACL ARR 2026 审稿中 · ACL ARR 2026

Sep 2025 – Feb 2026 2025年9月 – 2026年2月

Curated a benchmark of 84K human-written and LLM-generated samples across Hindi, Telugu, and Tamil, spanning four domains (academic, news, creative, movie reviews) using GPT-4.1, Qwen-Plus, and DeepSeek-v3.2.
Engineered seven Brahmic-script-aware adversarial attacks (paraphrase via back-translation, character perturbation, whitespace, insert-paragraph, alternative spelling, misspelling, synonym swap) to stress-test detector robustness.
Benchmarked eight detectors, including zero-shot statistical methods (Log-Likelihood, Log-Rank, LRR, FastDetectGPT, Binoculars) and supervised neural models (XLM-RoBERTa Base/Large, QLoRA fine-tuned Qwen 2.5-7B) across six evaluation settings.
Showed that the QLoRA fine-tuned Qwen 2.5-7B achieved the highest average scores: 87.16 (Telugu), 85.74 (Hindi), and 87.23 (Tamil), while zero-shot detectors degraded sharply under generator shifts and adversarial perturbations.

构建了涵盖印地语、泰卢固语与泰米尔语的 8.4 万条人写与大语言模型生成样本基准，跨越四个领域（学术、新闻、创意写作、影评），使用 GPT-4.1、Qwen-Plus 与 DeepSeek-v3.2 生成。
设计了七种面向婆罗米文字系的对抗性攻击（回译改写、字符扰动、空白字符、段落插入、替代拼写、拼写错误、同义词替换），用于压力测试检测器的鲁棒性。
在六个评估设置下对八种检测器进行基准测试，包括零样本统计方法（Log-Likelihood、Log-Rank、LRR、FastDetectGPT、Binoculars）与监督式神经模型（XLM-RoBERTa Base/Large、QLoRA 微调的 Qwen 2.5-7B）。
结果表明 Qwen 2.5-7B 取得最佳平均分——泰卢固语 87.16、印地语 85.74、泰米尔语 87.23，而零样本检测器在生成器切换与对抗扰动下表现显著下降。
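
As a rough illustration of the character-perturbation and whitespace attacks listed above, the sketch below swaps short and long Devanagari vowel signs and doubles spaces at a chosen rate. The function names, the swap table, and the `rate` and `seed` parameters are assumptions made for this example; they are not taken from the IndicDetect code, and Telugu or Tamil would need their own swap tables.

```python
import random

# Hypothetical swap table: short/long Devanagari vowel signs (matras).
# These pairs are an assumption for illustration, not IndicDetect's actual rules.
MATRA_SWAPS = {
    "\u093F": "\u0940",  # short i sign -> long i sign
    "\u0940": "\u093F",  # long i sign -> short i sign
    "\u0941": "\u0942",  # short u sign -> long u sign
    "\u0942": "\u0941",  # long u sign -> short u sign
}


def perturb_matras(text: str, rate: float, seed: int = 0) -> str:
    """Swap each swappable vowel sign with probability `rate`."""
    rng = random.Random(seed)  # seeded so a perturbed set can be regenerated
    out = []
    for ch in text:
        if ch in MATRA_SWAPS and rng.random() < rate:
            out.append(MATRA_SWAPS[ch])
        else:
            out.append(ch)
    return "".join(out)


def perturb_whitespace(text: str, rate: float, seed: int = 0) -> str:
    """Double each space character with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch == " " and rng.random() < rate:
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    sample = "\u0939\u093F\u0902\u0926\u0940 \u092D\u093E\u0937\u093E"
    print(perturb_matras(sample, rate=1.0))
    print(perturb_whitespace(sample, rate=1.0))
```

Both functions leave the text human-readable while changing the token sequence a detector sees, which is the property such stress tests typically rely on.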

CultMetric — Gloss-Grounded Cultural Fidelity Evaluation for Machine Translation CultMetric——基于词条释义的机器翻译文化忠实度评估 Under Review · ACL ARR 2026 审稿中 · ACL ARR 2026

Jan 2026 – May 2026 2026年1月 – 2026年5月

Designed a reference-free MT evaluation framework that grounds an LLM judge in a curated 872-entry glossary of culture-specific items, classifies failures into five typed categories, and produces a deterministic 0–100 score with bit-exact reproducibility.
Curated the CSI glossary from Classical Chinese source texts via LLM-assisted extraction and expert validation against authoritative scholarly translations, covering religious, social, material, ecological, and linguistic categories.
Evaluated four MT systems (GLM-5.1, DeepSeek-V4 Flash, Llama-3, Google Translate) on ~6,400 segments, reproducing the human cultural-fidelity ranking exactly, with a Spearman correlation 2.7× stronger than that of the best non-judge baseline.
Ran ablation studies across two independent judge models (GPT-4o, Qwen-3.6 Flash) and culturally-flattened paraphrase conditions, demonstrating ranking robustness across all configurations.

设计了无参考译文的机器翻译评估框架，将大语言模型评判器锚定于含 872 条文化特有项的人工词表，将翻译错误归为五类，并生成确定性的 0–100 分评分，具备逐位可复现性。
基于古汉语源文本，通过大语言模型辅助抽取并经专家对照权威学术译本校验，构建文化特有项（CSI）词表，覆盖宗教、社会、物质、生态与语言五类范畴。
在约 6,400 个句段上评估四套机器翻译系统（GLM-5.1、DeepSeek-V4 Flash、Llama-3、Google 翻译），完全复现人工文化忠实度排名，Spearman 相关性较最佳非评判器基线强 2.7 倍。
在两个独立评判模型（GPT-4o、Qwen-3.6 Flash）及文化扁平化改写条件下开展消融实验，验证排名在所有配置下的鲁棒性。
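
For readers unfamiliar with how agreement between a metric's system ranking and a human ranking is usually quantified, the sketch below computes Spearman's rank correlation for a small set of systems using the tie-free formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)). The helper names and the scores in the usage lines are hypothetical placeholders, not CultMetric results, and the code assumes no tied scores.

```python
def ranks(scores):
    """Return ranks where 1 is the highest score; assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman's rho via the tie-free rank-difference formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical scores for four systems (placeholders, not reported results).
    human_scores = [4.0, 3.0, 2.0, 1.0]
    metric_scores = [90.0, 70.0, 50.0, 20.0]
    print(spearman(human_scores, metric_scores))
```

A value of 1.0 means the metric orders the systems exactly as the human judgments do; with only four systems, a single adjacent swap already lowers rho to 0.8.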

Emotion Aid — Emotion Speech Recognition for Disordered Speech Emotion Aid——针对障碍语音的情感语音识别 Undergraduate · Completed 本科 · 已完成

Aug – Dec 2023 2023年8月 – 12月

Built a CNN-LSTM pipeline for multi-class emotion recognition from disordered clinical speech, extracting MFCC, prosodic, and spectral features with Librosa and addressing class imbalance via pitch shifting, time stretching, and noise injection.
Automated the end-to-end evaluation pipeline (feature extraction, inference, multi-class reporting) and conducted systematic failure analysis across emotion categories to guide targeted augmentation.

构建 CNN-LSTM 流水线，用于从障碍临床语音中进行多类别情感识别，使用 Librosa 提取 MFCC、韵律与频谱特征，并通过音调变换、时间拉伸与噪声注入处理类别不平衡问题。
自动化端到端评估流水线（特征提取、推理、多类别报告），并对各情感类别进行系统性错误分析，以指导针对性数据增强。

§ 07

Publications

论文发表

Papers 论文

C = Conference · J = Journal · S = In Submission · T = Thesis · * = equal contribution

C = 会议 · J = 期刊 · S = 投稿中 · T = 学位论文 · * = 共同一作

S.1 In Submission · ACL ARR 2026 投稿中 · ACL ARR 2026
IndicDetect: Evaluating Cross-Lingual LLM-Generated Text Detection for Hindi, Telugu, and Tamil

IndicDetect：面向印地语、泰卢固语与泰米尔语的跨语言大语言模型生成文本检测评估

Bhaskar Ganesh Devalla, Greeshma Yaluru, Junchao Wu, Nilesh Dokuparthi, Tatiana Muniz Rodriguez, Lidia S. Chao, and Derek F. Wong. (2026).

Manuscript submitted to ACL Rolling Review (ARR).

论文已投稿至 ACL Rolling Review (ARR)。
S.2 In Submission · ACL ARR 2026 投稿中 · ACL ARR 2026
CultMetric: Gloss-Grounded Cultural Fidelity Evaluation for Machine Translation

CultMetric：基于词条释义的机器翻译文化忠实度评估

Bhaskar Ganesh Devalla, Yuting Zhong, Dejing Zhou, Junchao Wu, Shudong Liu, Lidia S. Chao, and Derek F. Wong. (2026).

Manuscript submitted to ACL Rolling Review (ARR).

论文已投稿至 ACL Rolling Review (ARR)。

§ 08

Technical Skills

技术技能

Competencies 专业能力

Programming Languages

编程语言

Python, C/C++, Java, SQL, R, Bash

Python、C/C++、Java、SQL、R、Bash

Deep Learning & LLM Frameworks

深度学习与大语言模型框架

PyTorch, TensorFlow / Keras, Hugging Face Transformers, QLoRA, Fairseq, OpenNMT; Transformer architectures (BERT, RoBERTa, XLM-RoBERTa, GPT-style); BPE, SentencePiece.

PyTorch、TensorFlow / Keras、Hugging Face Transformers、QLoRA、Fairseq、OpenNMT；Transformer 架构（BERT、RoBERTa、XLM-RoBERTa、GPT 系列）；BPE、SentencePiece。

NLP & Speech

自然语言处理与语音

NLTK, SpaCy, SpeechBrain, ESPnet, Librosa; MFCC, prosodic, and spectral feature extraction; CNN-LSTM classifiers; cross-lingual benchmarking; low-resource NLP.

NLTK、SpaCy、SpeechBrain、ESPnet、Librosa；MFCC、韵律及频谱特征提取；CNN-LSTM 分类器；跨语言基准测试；低资源自然语言处理。

Evaluation Metrics

评估指标

AUROC, F1, BLEU, ROUGE, BERTScore, WER.

AUROC、F1、BLEU、ROUGE、BERTScore、WER。

MLOps & Reproducibility

MLOps 与可复现性

MLflow, Weights & Biases, Docker, Git, CI pipeline integration; pytest and shell-based test automation.

MLflow、Weights & Biases、Docker、Git、CI 流水线集成；pytest 与基于 shell 的测试自动化。

Data & Analysis

数据与分析

pandas, NumPy, SciPy, scikit-learn, Matplotlib, Seaborn.

pandas、NumPy、SciPy、scikit-learn、Matplotlib、Seaborn。

Developer Tools

开发工具

Linux / HPC cluster environments, Google Cloud Platform, LaTeX, Jupyter, VS Code.

Linux / 高性能计算集群环境、Google Cloud Platform、LaTeX、Jupyter、VS Code。

Languages

语言

English (Fluent) · Telugu (Native) · Hindi (Fluent) · Mandarin (Conversational).

英语（流利）· 泰卢固语（母语）· 印地语（流利）· 普通话（日常交流）。

§ 09

Relevant Coursework

相关课程

§ 10

Academic Service

学术服务

Reviewing 审稿工作

Student Reviewer

学生审稿人

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) — reviewed three manuscripts under the supervision of Prof. Suryakanth V. Gangashetty. 2023 — 2024

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)——在 Suryakanth V. Gangashetty 教授指导下审阅了三篇论文。 2023 — 2024

§ 11

References

推荐人