Work Experience

University of Michigan Library

Data Engineer

Sep 2023 - Present

  • Collect relevant publication information via REST APIs and web scraping.
  • Clean the data and store it in both an RDBMS and MongoDB, chosen according to query-efficiency and data-scalability needs.
  • Use Python (pandas, scikit-learn) to analyze publications involving self-plagiarism from different perspectives, such as reasons and categories; visualize results with seaborn and matplotlib.
  • Manage the project from the outset: design and schedule the work, and automate code for later reuse.

Ernst & Young

Data Analytics Intern

Dec 2022 - Mar 2023

  • Participate in depth in data analysis projects for four client companies; research related projects, financial data, and revenue.
  • Participate in the IPO project of a publicly listed company (TUHU Car Inc.): conduct user purchase analysis, identify abnormal store operations, and compare third-party transactions using structural data analysis and machine learning methods such as SVM and regression models.
  • Use SQL and Python to normalize databases and perform basic ETL operations.
  • Use big data tools such as Spark SQL and Hive to analyze datasets with millions of entries.
  • Improve database query efficiency by 50% in PostgreSQL by adding B-tree indexes and introducing inverted indexes such as GIN.

Shanghai Jiao Tong University, Joint Institute

Teaching Assistant

Sep 2022 - Dec 2022

  • Serve as Teaching Assistant for VE215 Circuit Analysis.
  • Conduct tutorials and review sessions to reinforce understanding of circuit analysis techniques.
  • Assist students during laboratory sessions to ensure smooth execution of experiments.
  • Offer regular office hours to address individual student concerns and provide additional assistance.

Project Experience

Information Retrieval System Implementation

  • Implement a Google-like search engine based on an inverted index and text analysis.
  • Use different scorers, such as BM25 and TF-IDF, to rank the results returned by the search algorithm.
  • Apply a neural learning-to-rank model to examine and re-rank search results, improving accuracy by 25%.
  • Add deep learning features using bi-encoders and cross-encoders to take queries into consideration and further increase model robustness and completeness.

Data Augmentation in Deep Reinforcement Learning

  • Adapt the baseline from the MuJoCo platform and DrQ-v2, an open-source, advanced deep reinforcement learning method implemented in PyTorch.
  • Implement basic data augmentation methods on the original inputs as well as the feature maps, including RandomRotation, RandomAffine, etc.
  • Use saliency maps from deep learning models to extract important features from images and augment the inputs, improving accuracy by over 20%.

Alzheimer MRI Diagnosis and Reproduction with CNN and GAN

  • Use classification methods such as Naive Bayes, logistic regression, and SVM to classify degrees of Alzheimer's disease.
  • Increase accuracy to up to 98.30% using deep learning models such as CNNs.
  • Apply data augmentation techniques such as rotation, squeezing, and clipping to produce more training samples.
  • Use a GAN to generate similar MRI images for different degrees of illness, with GLoss tuned to 5.1 and DLoss to 0.4.

Visual Analysis on Exploring Cyber Asset Graphs of Cybercrime Gangs

  • Crawl the web to gather data on the edges and vertices of the cyber asset graph.
  • Utilize search algorithms including BFS, DFS and A* to detect potential relationships between nodes.
  • Create an interactive web page with Flask and the JavaScript D3 library to show relations between different cyber assets.

A Risk-Resistant Trading Strategy Modeled by VaR-LP based on SW-ANN Prediction

  • Forecast stock prices using various regression models and compare their performance.
  • Develop a strategy combining sliding-window and neural network ideas to build portfolio investments that increase the original assets by up to 40 times on the given dataset.