Research Portfolio / Systems-Oriented Builder

AI & Systems

AI Systems / Efficient Inference / Hardware-Aware Execution

AI Systems Researcher and Engineer

I study and build efficient AI systems across models, inference pipelines, and hardware-aware execution. My current interests include multimodal learning, CUDA optimization, runtime scheduling, and edge-oriented AI infrastructure.

Efficient Inference / CUDA & Systems / Multimodal AI

Beyond model usage, toward systems thinking.

I am interested in how AI models actually run in practice, including inference efficiency, memory behavior, runtime scheduling, and the coupling between models and hardware.

My work ranges from CUDA kernel optimization and RAG-based code analysis to multimodal research and system-level reasoning about inference.

5+ selected technical projects
1 manuscript in preparation
50+ technical blog posts
3.7/4.0 Master's CAP
Yiming Huang / Research + Engineering / Model to Hardware

Focus

Research Focus

My current focus is how modern AI models can run more efficiently in real systems, especially under constraints of memory, latency, and hardware resources.

01

Efficient AI Inference

I care about real deployment behavior, not only benchmark numbers. That includes latency, memory movement, and inference efficiency under practical constraints.

02

Compiler and Runtime Co-optimization

I am interested in treating quantization, operator fusion, code generation, and runtime scheduling as one connected systems problem.

03

Hardware-Aware Systems Thinking

My perspective is shaped by memory hierarchy, data locality, register pressure, instruction throughput, and the realities of edge and GPU platforms.

Work

Selected Projects

These projects show how I approach AI systems problems through both research reasoning and hands-on implementation.

01

AI Systems

CUDA GEMM Optimization and Architectural Analysis

Implemented and optimized CUDA GEMM kernels using shared-memory tiling and register blocking, with profiling guiding each step, to raise arithmetic intensity and execution performance.

CUDA / Profiling / Memory Hierarchy
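The tiling idea can be illustrated outside CUDA. The following is a minimal Python sketch, not the kernel from this project. It assumes row-major matrices stored as lists of lists and uses hypothetical names (`tiled_matmul`, `block_tile_intensity`, `tile`). The loop nest mirrors a shared-memory GEMM: each (i0, j0) tile plays the role of a thread block, and the k0 sweep is where a kernel would stage A and B tiles into shared memory. A rough arithmetic-intensity estimate shows why larger block tiles help.

```python
from typing import List

Matrix = List[List[float]]


def tiled_matmul(a: Matrix, b: Matrix, tile: int = 32) -> Matrix:
    """C = A @ B computed tile by tile (illustrative loop tiling, not CUDA).

    (i0, j0) loops ~ CUDA thread blocks; the k0 loop ~ the K-dimension sweep
    where a kernel would load A/B tiles into shared memory; inner loops ~ the
    per-tile multiply-accumulate.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Tile bounds; edge tiles may be partial.
                i1, j1, k1 = min(i0 + tile, m), min(j0 + tile, n), min(k0 + tile, k)
                for i in range(i0, i1):
                    row_c = c[i]
                    for kk in range(k0, k1):
                        a_ik = a[i][kk]  # reused across the j loop (register-like reuse)
                        row_b = b[kk]
                        for j in range(j0, j1):
                            row_c[j] += a_ik * row_b[j]
    return c


def block_tile_intensity(bm: int, bn: int, bk: int, bytes_per_elem: int = 4) -> float:
    """Approximate FLOPs per byte of global-memory loads for one K-step of a BM x BN tile.

    One step loads a BM x BK slice of A and a BK x BN slice of B and performs
    2 * BM * BN * BK FLOPs (a multiply and an add per term). Ignores C writes
    and caching; a back-of-envelope model only.
    """
    flops = 2 * bm * bn * bk
    bytes_loaded = (bm * bk + bk * bn) * bytes_per_elem
    return flops / bytes_loaded


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    print(tiled_matmul(a, b, tile=1))  # [[19.0, 22.0], [43.0, 50.0]]
    print(block_tile_intensity(32, 32, 8))  # 8.0
    print(block_tile_intensity(128, 128, 8))  # 32.0
```

For FP32 the estimate simplifies to BM·BN / (2·(BM + BN)) FLOPs per byte, so growing the block tile from 32×32 to 128×128 quadruples the work done per byte loaded. This is the motivation behind shared-memory tiling and register blocking, and it is limited in practice by shared-memory capacity and register pressure.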

02

AI Systems

LLM + RAG Code Architecture Analysis System

Built a repository analysis system that combines LLMs, AST-based chunking, vector retrieval, and CUDA-aware parsing for structured code understanding.

LLM / RAG / AST

03

AI Systems

Multimodal Video Captioning Research

Designed a multimodal video-to-text pipeline using transformer-based cross-modal alignment, with attention to inference efficiency at the system level.

Multimodal / ViT / Inference

Profile

Background

My background combines research-oriented study with practical engineering experience across data systems, system maintenance, networking, and building technical tools.

Capability Thread

  • 01 Master's research in multimodal AI and efficient inference
  • 02 Publication manuscript in preparation
  • 03 Engineering experience in data pipelines and automation
  • 04 Earlier systems and network operations experience
  • 05 Ongoing study of CSAPP (Computer Systems: A Programmer's Perspective), systems programming, and performance analysis

Links

Materials

Currently focused on machine learning systems, low-level operator optimization, GPU programming, and AI infrastructure engineering.

GitHub / Blog / Email / Tech Insights

WeChat

Tech Insights

A small window into my ongoing technical notes, system-level observations, and engineering reflections, published through my WeChat account.

