Work Experience

University of Michigan Library
Data Engineer
Sep 2023 - Present
- Collect relevant publication information using REST APIs and web scraping.
- Clean data and store it in both an RDBMS and MongoDB, chosen according to query-efficiency and scalability needs.
- Use Python (pandas, scikit-learn) to analyze publications involving self-plagiarism from perspectives such as reasons and categories; visualize results with seaborn and matplotlib.
- Manage the project from the outset: design and schedule the work, and automate code for later reuse.

Ernst & Young
Data Analytics Intern
Dec 2022 - Mar 2023
- Participate in data analysis projects for four client companies; research related projects, financial data, and revenue.
- Participate in the IPO project of a publicly listed company (TUHU Car Inc.): conduct user purchase analysis, identify abnormal store operations, and compare third-party transactions using structured data analysis and machine learning methods such as SVM and regression models.
- Use SQL and Python to normalize databases and perform basic ETL operations.
- Use big data tools such as Spark SQL and Hive to analyze datasets with millions of entries.
- Improve database query efficiency by 50% in PostgreSQL by adding B-tree indexes and introducing inverted indexes such as GIN.

Shanghai Jiao Tong University, Joint Institute
Teaching Assistant
Sep 2022 - Dec 2022
- Serve as teaching assistant for the course VE215 Circuit Analysis.
- Conduct tutorials and review sessions to reinforce understanding of circuit analysis techniques.
- Assist students during laboratory sessions to ensure smooth execution of experiments.
- Offer regular office hours to address individual student concerns and provide additional assistance.

Project Experience

Information Retrieval System Implementation
- Implement a Google-like search engine based on an inverted index and text analysis.
- Use different scorers such as BM25 and TF-IDF to rank the results returned by the search algorithm.
- Use a learning-to-rank model to re-rank search results, improving accuracy by 25%.
- Add deep learning features from bi-encoders and cross-encoders that take the query into account, further improving model robustness and completeness.

Data Augmentation in Deep Reinforcement Learning
- Adapt the baseline from the MuJoCo platform and DrQ-v2, an open-source deep reinforcement learning method implemented in PyTorch.
- Implement basic data augmentation methods, including RandomRotation and RandomAffine, on both the original inputs and the feature maps.
- Use saliency maps from deep learning models to extract important image features and augment the inputs, improving accuracy by over 20%.

Alzheimer MRI Diagnosis and Reproduction with CNN and GAN
- Use classification methods such as Naive Bayes, logistic regression, and SVM to classify the severity of Alzheimer's disease.
- Increase accuracy to as high as 98.30% using deep learning models such as CNNs.
- Apply data augmentation techniques such as rotation, squeezing, and clipping to produce more training samples.
- Use a GAN to generate similar MRI images for different degrees of illness, tuned to a generator loss (GLoss) of 5.1 and a discriminator loss (DLoss) of 0.4.

Visual Analysis on Exploring Cyber Asset Graphs of Cybercrime Gangs
- Crawl the web to gather data describing the edges and vertices of the cyber asset graph.
- Utilize search algorithms including BFS, DFS and A* to detect potential relationships between nodes.
- Create an interactive web page with Flask and the JavaScript D3 library to show relations between cyber assets.

A Risk-Resistant Trading Strategy Modeled by VaR-LP based on SW-ANN Prediction
- Forecast stock prices with various regression models and compare their performance.
- Develop a strategy combining sliding windows and neural networks for portfolio allocation, which increases the original assets by up to 40 times on the given dataset.