Nguyễn Ngọc Song Thương

Computer Science researcher working at the intersection of machine learning and the systems that put it to use: recommender systems, knowledge-graph reasoning, and LLM evaluation and serving.

Email GitHub LinkedIn CV

About

I am a recent Computer Science graduate of Ho Chi Minh City University of Technology (HCMUT), VNU-HCM. Most recently I studied the accuracy–diversity trade-off as a way to tackle popularity bias in knowledge-graph conversational recommenders; that work led to a peer-reviewed, co-first-author paper. As a software engineer, I also build and evaluate large-scale AI and LLM systems in production.

I am applying to graduate research programs and am most excited by problems in data-intensive systems, information retrieval, and their integration with language models.

Recommendation systems Information retrieval Data-intensive systems Language models

Education

Ho Chi Minh City University of Technology (HCMUT), VNU-HCM 2022 – Jun 2026

B.S. Computer Science — GPA 4.0/4.0 (9.18/10)

Academic Incentive Scholarship for 7 consecutive semesters (top 5% of cohort by GPA).

View transcript

Relevant coursework: Probability & Statistics (10), Big Data (9.9), Data Mining (9.0), Natural Language Processing (8.8), Linear Algebra (8.8), Machine Learning (8.2).

Le Hong Phong High School for the Gifted 2019 – 2022

Major in Mathematics

Research

Programming Integration Project Sep 2024 – Dec 2024

Survey of session-based recommendation systems

Individual project

Recommender systems Sequential models Ranking metrics

Reviewed sequential and session-based recommender systems and evaluated how a sequential recommender (BERT4Rec) performs when the user history is a short session rather than a long interaction history.

Report

Multidisciplinary Project Jan 2025 – May 2025

Pretrained SASRec for conversational recommendation

Conversational recommendation Pretrained models Transfer learning

Studied transfer learning in conversational recommendation: compared learning item representations from scratch against reusing pretrained ones. Found that pretraining improves accuracy at the top ranks but weakens it deeper in the recommendation list.
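The finding above depends on where the ranking cutoff is placed. As a minimal sketch, assuming a single target item per test case, hit rate and NDCG at a cutoff k could be computed as follows. The function names, item ids, and toy rankings are illustrative assumptions and are not taken from the project.

```python
import math

def hit_rate_at_k(ranked_items, target, k):
    """Return 1.0 if the target item appears in the top-k of the ranked list, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with one relevant item: 1 / log2(rank + 1) if it is in the top-k, else 0."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1  # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

# Hypothetical rankings of the same target "i7" produced by two models.
model_a = ["i7", "i3", "i9", "i1", "i5"]  # target at rank 1
model_b = ["i3", "i9", "i1", "i7", "i5"]  # target at rank 4

for k in (1, 5):
    print(k, hit_rate_at_k(model_a, "i7", k), hit_rate_at_k(model_b, "i7", k))
```

Reporting the same metric at a small and a large k is one way to separate accuracy at the top ranks from accuracy deeper in the list.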

Slides

Specialized Project Jun 2025 – Dec 2025

Knowledge-graph integration and diversity in CRS

Knowledge graphs Diversity Literature review

Reviewed knowledge-graph conversational recommenders across the two main types: attribute-based and dialogue-based. Compared how each handles the accuracy–diversity trade-off and found that diversity is largely unaddressed in dialogue-based models — the gap my thesis targets.

Report

Undergraduate Thesis Sep 2025 – Jun 2026

Diversity Approach to KG Conversational Recommenders

Team of 2 · Scored 9.7/10 · Advisor: A/Prof. Thoai Nam

Diversity Reinforcement learning Knowledge graphs

Diversity measures — catalog coverage, novelty, intra-list diversity — cannot be optimised directly by standard training, so models learn to ignore them. I proposed two methods that bring diversity into training, tested on six knowledge-graph CRS models across two benchmarks:

Soft-Rank Diversity (SRD) — a diversity-aware training loss that can be optimised directly, so the model learns diversity alongside accuracy.
DivKG — a reinforcement-learning fine-tuning step that rewards coverage and novelty while keeping accuracy stable.
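For reference, the three diversity measures named above can be written down in a few lines. This is a minimal sketch using common textbook definitions, which may differ in detail from the ones used in the thesis; the function names, toy recommendation lists, popularity values, and the Jaccard distance over hypothetical genre sets are all assumptions made for illustration.

```python
import math
from itertools import combinations

def catalog_coverage(rec_lists, catalog_size):
    """Fraction of the catalog that appears in at least one recommendation list."""
    recommended = set()
    for items in rec_lists:
        recommended.update(items)
    return len(recommended) / catalog_size

def novelty(rec_lists, item_popularity):
    """Mean self-information -log2 p(item) over all recommended items."""
    scores = [-math.log2(item_popularity[i]) for items in rec_lists for i in items]
    return sum(scores) / len(scores)

def intra_list_diversity(items, distance):
    """Mean pairwise distance between the items of one recommendation list."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)

# Toy data (hypothetical): two users, a catalog of five items.
rec_lists = [["a", "b"], ["a", "c"]]
popularity = {"a": 0.5, "b": 0.25, "c": 0.125}   # share of users who interacted
genres = {"a": {"x"}, "b": {"x", "y"}, "c": {"z"}}

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the items' genre sets."""
    return 1.0 - len(genres[a] & genres[b]) / len(genres[a] | genres[b])

print(catalog_coverage(rec_lists, catalog_size=5))
print(novelty(rec_lists, popularity))
print(intra_list_diversity(["a", "b", "c"], jaccard_distance))
```

All three are computed from the set of items that end up in the final top-k list, which is why they are awkward to use as a training signal in their raw form.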

+20–40% catalog coverage

~95% of baseline accuracy retained

6 × 2 backbones × benchmarks

Thesis report Defense slides Publication

Publications

SOMET 2026 · Intl. Conference on Intelligent Software Methodologies, Tools and Techniques

Soft-Rank Diversity: Diversity Regularisation for Multi-hop Knowledge Graph Reasoning in Conversational Recommendation

Nguyễn Ngọc Song Thương*, Nguyễn Hữu Thanh* (*equal contribution / co-first author)

Peer-reviewed Accepted — Full paper Co-first author

Selected Projects

Team of 3 Sep 2025 – Dec 2025

Personalized News Recommender System

A lazy-loading approach to training with large-scale interaction data

Turned user-behavior logs into ranking signals to predict engagement over large-scale user–article interactions.
Avoided materializing a ~125k × 1M item–user similarity matrix (~500 GB) by delegating nearest-neighbor retrieval to Elasticsearch vector search and lazy-loading embeddings on demand.
Served low-latency real-time inference through a Kafka + Redis pipeline.
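The ~500 GB figure above follows from simple arithmetic if each matrix entry is stored as a 4-byte float, which is an assumption here. The sketch below checks that number and shows the general idea of lazy-loading embeddings on demand. The in-memory dictionary is only a stand-in for an external store; the project itself delegated retrieval to Elasticsearch, and none of the names below come from its code.

```python
from functools import lru_cache

N_ITEMS = 125_000        # ~125k, from the project description
N_USERS = 1_000_000      # ~1M, from the project description
BYTES_PER_ENTRY = 4      # assumption: float32 entries

# Size of the dense item-user matrix if it were fully materialized.
dense_bytes = N_ITEMS * N_USERS * BYTES_PER_ENTRY
print(dense_bytes / 1e9, "GB")

# Hypothetical stand-in for an external embedding store.
EMBEDDING_STORE = {
    "a1": [0.1, 0.2],
    "a2": [0.3, 0.4],
}

@lru_cache(maxsize=10_000)
def load_embedding(item_id):
    """Fetch one embedding on first use; repeated requests are served from the cache."""
    return tuple(EMBEDDING_STORE[item_id])

first = load_embedding("a1")    # cache miss: read from the store
second = load_embedding("a1")   # cache hit: no second read
print(load_embedding.cache_info())
```

Only the embeddings that are actually requested occupy memory, bounded by `maxsize`, instead of the full matrix.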

Report GitHub

Team of 5 Sep 2025 – Dec 2025

Brain Tumor Classification: Traditional ML to Deep Learning

A controlled comparison across the ML spectrum

Implemented and benchmarked models from classical machine learning through deep learning on a brain-tumor image-classification task.
Evaluated each method on suitability and accuracy rather than raw performance alone, characterizing the trade-offs across the spectrum.

GitHub

Experience

East Agile — Software Engineer Jun 2025 – Present

Rememberizer — agentic document search (Jun 2025 – Jan 2026)

Replaced manual relevance review with an automated LLM-as-evaluator pipeline (OpenAI & Anthropic SDKs) that scores retrieval quality at scale.
Enhanced the document-search MCP servers used by Claude through clearer system/user prompt engineering and improved server–backend I/O compatibility.
Refactored the embedding pipeline to embed chunks from multiple documents in parallel, improving throughput.
Diagnosed and fixed production bugs across on-premise and AWS infrastructure.
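The parallel embedding step described above can be illustrated with a thread pool. This is a minimal sketch under stated assumptions, not the production pipeline: `embed_chunk` is a hypothetical placeholder for a real embedding call (typically a network request), and `embed_documents` and its parameters are names invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def embed_chunk(chunk):
    """Hypothetical placeholder for an embedding call; returns a tiny fake vector."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]

def embed_documents(documents, max_workers=8):
    """Embed the chunks of several documents concurrently.

    `documents` maps a document id to its list of text chunks. The result maps
    each document id to its vectors, in the same order as the chunks.
    """
    # Flatten all chunks into one job list so the pool is shared across documents.
    jobs = [
        (doc_id, idx, chunk)
        for doc_id, chunks in documents.items()
        for idx, chunk in enumerate(chunks)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the submitted jobs.
        vectors = list(pool.map(lambda job: embed_chunk(job[2]), jobs))

    result = {doc_id: [None] * len(chunks) for doc_id, chunks in documents.items()}
    for (doc_id, idx, _), vector in zip(jobs, vectors):
        result[doc_id][idx] = vector
    return result

docs = {"d1": ["ab", "c"], "d2": ["xyz"]}
print(embed_documents(docs))
```

Threads suit this case when the embedding call is I/O-bound; a CPU-bound local model would more likely call for batching or process-level parallelism.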

TalkDoc — speech-based depression analysis (Feb 2026 – Present)

Built a depression-analysis quiz web app that submits user speech via presigned URLs to a model on AWS serverless.
Cut the model's memory footprint to fit a 6 GB serverless limit via feature-importance pruning, retaining 90% of baseline accuracy.
Automated MongoDB backups on AWS EC2 with scheduled daily and incremental dumps.

Technical Skills

Languages

Python · C/C++ · SQL

ML & Frameworks

PyTorch · Hugging Face · scikit-learn · pandas · OpenAI SDK · Anthropic SDK · Django · FastAPI · ReactJS

Data & Tools

PostgreSQL (pgvector) · MySQL · Redis · Elasticsearch · Kafka · BigQuery · Docker · Git · LaTeX

Languages (human)

English (IELTS 8.0) · Vietnamese (native)

Awards & Activities

Academic Incentive Scholarship — HCMUT 7 consecutive semesters

Awarded for seven consecutive semesters to students in the top 5% of the cohort by GPA.

Busan Global Data Hackathon Jul 2025

Built data-driven shipping-load insights for Busan Ship Transport Co. in a Korean–Vietnamese team.

Big Data Club — Member · HPCC Lab Sep 2024 – Present

Contributing to big-data and high-performance-computing research projects.