Research

  • LLM-Supported Database Manual Reader and Query Optimizer at USC, Los Angeles, CA, guided by Professor Ibrahim Sabek, Jia Robin, and Ryan Marcus (UPenn) 08 2024 -
    • Reproduced related work, including DB-BERT, GPTuner, and LLM-Tune, and gained hands-on experience with these open-source projects.
    • Learned to collect relevant text corpora and fine-tune LLMs through the OpenAI API.
    • Completed the architecture design of a two-phase optimizer covering knob tuning and query optimization, drawing on insights from the NLP domain to design the prompts.
    • Completed a preliminary implementation of the second phase.
  • Lifelong Modular Reinforcement Learning Database Query Optimizer Design at USC, Los Angeles, CA, guided by Professor Ibrahim Sabek 09 2023 - 06 2024
    • Reproduced the Bao, Balsa, and Neo models and gained an understanding of how they work and interact with PostgreSQL internals through extensions.
    • Designed new experiments exposing limitations of the original systems, using benchmarks such as TPC-H, IMDB, and StackOverflow data.
    • Explored query plan tree decomposition algorithms to identify the most suitable one for integration into these systems.
    • Introduced a lifelong modular RL algorithm to address data drift, workload shifts, etc.
    • Substantially modified the original system to handle data drift and dynamic environments.
    • Under review at VLDB 2025.
    • An extended version of the paper manuscript is available here.
  • Distributed System Design and Distributed Graph Database Kernel Development at UESTC, Chengdu, China, guided by Professor Hancong Duan (GitHub), Undergraduate Thesis 05 2022 - 05 2023
    • Studied the typical architectures of open-source distributed graph database platforms such as Neo4j, Dgraph, NebulaGraph, and SurrealDB. Analyzed the technical principles behind logical and physical query plans, and learned the classical query optimization algorithms. Studied how network protocols work and how to implement them in Java.
    • Learned JanusGraph's distributed cluster and cloud deployment methods, and deployed a JanusGraph cluster on AliCloud with Cassandra as the storage backend and Elasticsearch as the index backend.
    • Addressed key problems in the optimal placement of physical partitions and in dynamic scheduling, mainly through graph partitioning and abstraction algorithms. Designed DGSS, a distributed graph sketch algorithm, in Java; it achieved good experimental results and was successfully integrated into JanusGraph's internals.
  • Reinforcement Learning Algorithm Design in Social Commerce at UESTC, Chengdu, China, guided by Professor Dong Hao, 12 2020 - 10 2021
    • Built a social commerce framework in Python with NetworkX, PyTorch, NumPy, and pandas.
    • Implemented and ran the existing virus-spreading model on the framework, introduced a reinforcement learning algorithm, and implemented automatic selection of node-ranking and infection strategies based on network characteristics.
    • Extended the existing model into an SEIRSC model with an improved algorithm and a competition mechanism, and validated the new model with reinforcement learning algorithms such as DQN.
    • Led the writing of the final report and the oral defense at the project's conclusion (awarded Excellent Project Conclusion).