Nguyễn Ngọc Song Thương

Computer Science researcher working at the intersection of machine learning and the systems that put it to use: recommender systems, knowledge-graph reasoning, and LLM evaluation and serving.

About

I am a recent Computer Science graduate of Ho Chi Minh City University of Technology – VNUHCM (HCMUT). My most recent research studies the accuracy–diversity trade-off as a way to tackle popularity bias in knowledge-graph conversational recommenders; this work led to a peer-reviewed paper on which I am co-first author. I also build and evaluate large-scale AI and LLM systems in production as a software engineer.

I am applying to graduate research programs and am most excited by problems involving data-intensive systems, information retrieval, and their integration with language models.

Recommendation systems · Information retrieval · Data-intensive systems · Language models
Education
Research
Programming Integration Project Sep 2024 – Dec 2024

Survey of session-based recommendation systems

Individual project
Recommender systems · Sequential models · Ranking metrics

Reviewed sequential and session-based recommendation systems and evaluated how a sequential recommender (BERT4Rec) performs when the user history is a short session rather than a long interaction history.

Multidisciplinary Project Jan 2025 – May 2025

Pretrained SASRec for conversational recommendation

Conversational recommendation · Pretrained models · Transfer learning

Studied transfer learning in conversational recommendation: compared learning item representations from scratch against reusing pretrained ones. Found that pretraining improves accuracy at the top ranks but weakens it deeper in the recommendation list.

Specialized Project Jun 2025 – Dec 2025

Knowledge-graph integration and diversity in CRS

Knowledge graphs · Diversity · Literature review

Reviewed knowledge-graph conversational recommender systems (CRS) of the two main types, attribute-based and dialogue-based, and compared how each handles the accuracy–diversity trade-off. Found that diversity is largely unaddressed in dialogue-based models, which is the gap my thesis targets.

Undergraduate Thesis Sep 2025 – Jun 2026

Diversity Approach to KG Conversational Recommenders

Team of 2 · Scored 9.7/10 · Advisor: A/Prof. Thoai Nam
Diversity · Reinforcement learning · Knowledge graphs

Diversity measures (catalog coverage, novelty, intra-list diversity) are computed over discrete top-k recommendation lists, so standard gradient-based training cannot optimize them directly, and models trained for accuracy alone tend to neglect them. I proposed two methods that bring diversity into training and tested them on six knowledge-graph CRS models across two benchmarks:

  • Soft-Rank Diversity (SRD): a diversity-aware training loss that can be optimized directly, so the model learns diversity alongside accuracy.
  • DivKG: a reinforcement-learning fine-tuning step that rewards coverage and novelty while keeping accuracy stable.
+20–40% catalog coverage
~95% of baseline accuracy retained
6 backbones × 2 benchmarks
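Catalog coverage, novelty, and intra-list diversity each have several variants in the literature. As an illustration only, the Python sketch below computes one common form of each: coverage as the share of the catalog recommended to at least one user, novelty as the mean self-information of item popularity, and intra-list diversity as the mean pairwise cosine distance within a list. These are not necessarily the exact definitions used in the thesis, and the helper names (`top_k`, `catalog_coverage`, `novelty`, `intra_list_diversity`) and toy data are hypothetical. The sketch also makes the optimization problem concrete: every metric is computed on the output of a hard top-k selection (a sort), which gives a training loss no useful gradient.

```python
import math
from itertools import combinations


def top_k(scores, k):
    """Indices of the k highest-scoring items: a hard, non-differentiable selection."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def catalog_coverage(rec_lists, num_items):
    """Fraction of the catalog that appears in at least one recommendation list."""
    recommended = set()
    for rec in rec_lists:
        recommended.update(rec)
    return len(recommended) / num_items


def novelty(rec_lists, popularity):
    """Mean self-information -log2(p(i)) over all recommended items.

    popularity[i] is assumed to be the fraction of users who interacted
    with item i, and must be > 0.
    """
    values = [-math.log2(popularity[i]) for rec in rec_lists for i in rec]
    return sum(values) / len(values)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def intra_list_diversity(rec, item_vectors):
    """Mean pairwise cosine distance (1 - similarity) within one list."""
    pairs = list(combinations(rec, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs
    return sum(1.0 - cosine(item_vectors[a], item_vectors[b]) for a, b in pairs) / len(pairs)


# Toy usage: 2 users, 4 items, top-2 lists.
scores_per_user = [[0.9, 0.8, 0.1, 0.0], [0.7, 0.2, 0.9, 0.1]]
rec_lists = [top_k(s, 2) for s in scores_per_user]  # [[0, 1], [2, 0]]
popularity = [0.5, 0.25, 0.25, 0.125]
item_vectors = [[1, 0], [0, 1], [1, 0], [1, 1]]

print(catalog_coverage(rec_lists, 4))  # 0.75: items 0, 1, 2 of 4
print(novelty(rec_lists, popularity))  # 1.5
print([intra_list_diversity(r, item_vectors) for r in rec_lists])  # [1.0, 0.0]
```

In this toy data the second user's list pairs two items with identical vectors, so its intra-list diversity is zero even though coverage is reasonably high, which is why the three measures are usually reported together.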
Publications
SOMET 2026 · Intl. Conference on Intelligent Software Methodologies, Tools and Techniques
Soft-Rank Diversity: Diversity Regularisation for Multi-hop Knowledge Graph Reasoning in Conversational Recommendation
Nguyễn Ngọc Song Thương*, Nguyễn Hữu Thanh*  (*equal contribution / co-first author)
Peer-reviewed · Accepted (full paper) · Co-first author
Selected Projects
Team of 3 Sep 2025 – Dec 2025

Personalized News Recommender System

A lazy-loading approach to training with large-scale interaction data

  • Turned user-behavior logs into ranking signals to predict engagement over large-scale user–article interactions.
  • Avoided materializing a ~125k × 1M item–user similarity matrix (~500 GB) by delegating nearest-neighbor retrieval to Elasticsearch vector search and lazy-loading embeddings on demand.
  • Served low-latency real-time inference through a Kafka + Redis pipeline.
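The ~500 GB figure is consistent with storing the full matrix densely as 32-bit floats; the 4-bytes-per-entry precision is an assumption, since it is not stated above:

```latex
\[
125{,}000 \times 1{,}000{,}000 = 1.25 \times 10^{11}\ \text{entries},
\qquad
1.25 \times 10^{11} \times 4\ \text{bytes} = 5 \times 10^{11}\ \text{bytes} \approx 500\ \text{GB}
\]
```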
Team of 5 Sep 2025 – Dec 2025

Brain Tumor Classification: Traditional ML to Deep Learning

A controlled comparison across the ML spectrum

  • Implemented and benchmarked models from classical machine learning through deep learning on a brain-tumor image-classification task.
  • Evaluated each method on its suitability for the task as well as its accuracy, characterizing the trade-offs across the spectrum.
Experience
East Agile — Software Engineer Jun 2025 – Present

Rememberizer — agentic document search (Jun 2025 – Jan 2026)

  • Replaced manual relevance review with an automated LLM-as-evaluator pipeline (OpenAI & Anthropic SDKs) that scores retrieval quality at scale.
  • Enhanced the document-search MCP (Model Context Protocol) servers used by Claude through clearer system and user prompts and improved I/O compatibility between the servers and the backend.
  • Refactored the embedding pipeline to embed chunks from multiple documents in parallel, improving throughput.
  • Diagnosed and fixed production bugs across on-premise and AWS infrastructure.

TalkDoc — speech-based depression analysis (Feb 2026 – Present)

  • Built a depression-analysis quiz web app that uploads user speech through presigned URLs for analysis by a model deployed on AWS serverless infrastructure.
  • Cut the model's memory footprint to fit a 6 GB serverless limit via feature-importance pruning, retaining 90% of baseline accuracy.
  • Automated MongoDB backups on AWS EC2 with scheduled daily and incremental dumps.
Technical Skills
Languages
Python · C/C++ · SQL
ML & Frameworks
PyTorch · Hugging Face · scikit-learn · pandas · OpenAI SDK · Anthropic SDK · Django · FastAPI · ReactJS
Data & Tools
PostgreSQL (pgvector) · MySQL · Redis · Elasticsearch · Kafka · BigQuery · Docker · Git · LaTeX
Languages (human)
English (IELTS 8.0) · Vietnamese (native)
Awards & Activities
Academic Incentive Scholarship, HCMUT · 7 consecutive semesters

Awarded each semester to students in the top 5% of the cohort by GPA; received for seven consecutive semesters.

Busan Global Data Hackathon Jul 2025

Built data-driven shipping-load insights for Busan Ship Transport Co. in a Korean–Vietnamese team.

Big Data Club (Member) · HPCC Lab Sep 2024 – Present

Contributing to big-data and high-performance-computing research projects.