Contact: Institute of Computing Technology, Chinese Academy of Sciences, No. 6 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190, China
About me
I am an assistant professor at the Institute of Computing Technology, Chinese Academy of Sciences. I received my PhD degree from the University of Chinese Academy of Sciences in 2019 under the guidance of Prof. Zhaoqi Wang. I was a visiting professor in Prof. Zhigang Deng's lab at the University of Houston from Oct. 2015 to Dec. 2018.
My research primarily centers on computer vision and computer graphics, especially visual traffic simulation, trajectory prediction with deep learning and autonomous driving.
Synthesizing indoor scene layouts is challenging and critical, especially for digital design and gaming entertainment. Although there has been significant research on indoor layout synthesis for rectangular or L-shaped architecture, little is known about synthesizing plausible layouts for more complicated indoor architecture while fully considering both its geometric and semantic information. In this paper, we propose an effective and novel framework to synthesize plausible indoor layouts for diverse and complicated architecture. The given indoor architecture is first encoded into our proposed representation, called InAiR, based on its geometric and semantic information. The indoor objects are grouped and then arranged by functional blocks, represented by oriented bounding boxes, using dynamic convolution networks based on their functionality and human activities. Through comparisons with other approaches as well as comparative user studies, we find that the indoor scene layouts we generate for diverse, complicated indoor architecture are visually indistinguishable and reach state-of-the-art performance.
Realistic 3D facial modeling and animation have been increasingly used in many graphics, animation, and virtual reality applications. However, generating realistic fine-scale wrinkles on 3D faces, in particular on animated 3D faces, is still a challenging problem that is far from being resolved. In this paper, we propose an end-to-end system to automatically augment coarse-scale 3D faces with synthesized fine-scale geometric wrinkles. By formulating the wrinkle generation problem as a supervised generation task, we implicitly model the continuous space of face wrinkles via a compact generative model, such that plausible face wrinkles can be generated through effective sampling and interpolation in the space. We also introduce a complete pipeline to transfer the synthesized wrinkles between faces with different shapes and topologies. Through many experiments, we demonstrate that our method can robustly synthesize plausible fine-scale wrinkles on a variety of coarse-scale 3D faces with different shapes and expressions.
This work presents a novel First-person View based Trajectory predicting model (FvTraj) to estimate the future trajectories of pedestrians in a scene given their observed trajectories and the corresponding first-person view images. First, we render first-person view images using our in-house built First-person View Simulator (FvSim), given the ground-level 2D trajectories. Then, based on multi-head attention mechanisms, we design a social-aware attention module to model social interactions between pedestrians, and a view-aware attention module to capture the relations between historical motion states and visual features from the first-person view images. Our results show that the dynamic scene contexts with ego-motions captured by first-person view images via FvSim are valuable and effective for trajectory prediction. Using these simulated first-person view images, our well-structured FvTraj model achieves state-of-the-art performance.
Most existing traffic simulation methods focus on simulating vehicles on freeways or city-scale urban networks. However, relatively little research has been done to date on simulating intersectional traffic, despite its obvious importance in real-world traffic phenomena. In this paper, we propose a novel deep learning-based framework to simulate and edit intersectional traffic. Specifically, based on an in-house collected intersectional traffic dataset, we employ a combination of a convolutional neural network (CNN) and a recurrent neural network (RNN) to learn the patterns of vehicle trajectories in intersectional traffic. Besides simulating novel intersectional traffic, our method can be used to edit existing intersectional traffic. Through many experiments as well as comparative user studies, we demonstrate that the results of our method are visually indistinguishable from ground truth and that it performs better than other methods.
Accurate vehicle trajectory prediction can benefit many Intelligent Transportation System (ITS) applications such as traffic simulation and advanced driver assistance systems. This ability has become especially important with the emergence of autonomous vehicles, as they require the prediction of nearby agents' trajectories to navigate safely and efficiently. Recent studies based on deep learning have greatly improved prediction accuracy. However, one prominent issue is that these models often lack explainability. We alleviate this issue by proposing STA-LSTM, an LSTM model with spatial-temporal attention mechanisms. STA-LSTM not only outperforms other state-of-the-art models in prediction accuracy but also identifies the influence of historical trajectories and neighboring vehicles on the target vehicle via spatial-temporal attention weights. We provide analyses of the learned attention weights in various traffic scenarios based on target vehicle class, target vehicle location, and traffic density. We also provide an analysis showing that STA-LSTM can capture fine-grained lane-changing behaviors.
Virtualizing traffic through simulation models and real-world traffic data is a promising approach to reconstructing detailed traffic flows. A variety of applications can benefit from virtual traffic, including, but not limited to, video games, virtual reality, traffic engineering, and autonomous driving. In this survey, we provide a comprehensive review of the state-of-the-art techniques for traffic simulation and animation. We start with a discussion of three classes of traffic simulation models applied at different levels of detail. Then, we introduce various data-driven animation techniques, including existing data collection methods, and the validation and evaluation of simulated traffic flows. Next, we discuss how traffic simulations can benefit the training and testing of autonomous vehicles. Finally, we discuss the current state of traffic simulation and animation and suggest future research directions.
Trajectory prediction for objects is challenging and critical for various applications (e.g., autonomous driving and anomaly detection). Most existing methods focus on homogeneous pedestrian trajectory prediction, where pedestrians are treated as particles without size. However, they fall short of handling crowded vehicle-pedestrian-mixed scenes directly, since vehicles, which are constrained by kinematics in reality, should ideally be treated as rigid, non-particle objects. In this paper, we tackle this problem using separate LSTMs for heterogeneous vehicles and pedestrians. Specifically, we use an oriented bounding box to represent each vehicle, calculated based on its position and orientation, to denote its kinematic trajectories. We then propose a framework called VP-LSTM to predict the kinematic trajectories of both vehicles and pedestrians simultaneously. To evaluate our model, we built a large dataset containing the trajectories of both vehicles and pedestrians in vehicle-pedestrian-mixed scenes. Through comparisons of our method with state-of-the-art approaches, we show the effectiveness and advantages of our method on kinematic trajectory prediction in vehicle-pedestrian-mixed scenes.
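As a rough illustration of the oriented bounding box representation mentioned above, the following sketch computes the four corners of a box from a vehicle's position and heading. It is not the VP-LSTM implementation: the function name `oriented_bounding_box`, the `length` and `width` parameters, and the corner ordering are assumptions made for this example.

```python
import math


def oriented_bounding_box(cx, cy, heading, length, width):
    """Return the four corners of an oriented bounding box as (x, y) tuples.

    cx, cy  : box center in world coordinates
    heading : orientation in radians, measured counter-clockwise from +x
    length, width : assumed vehicle dimensions (hypothetical parameters)
    Corner order: front-left, rear-left, rear-right, front-right.
    """
    half_l, half_w = length / 2.0, width / 2.0
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Corners in the vehicle's local frame (x forward, y to the left).
    local = [(half_l, half_w), (-half_l, half_w),
             (-half_l, -half_w), (half_l, -half_w)]
    # Rotate each corner by the heading, then translate to the center.
    return [(cx + x * cos_h - y * sin_h, cy + x * sin_h + y * cos_h)
            for x, y in local]
```

For a vehicle at the origin with heading 0, length 4, and width 2, the front-left corner lies at (2, 1); rotating the heading by 90 degrees moves that corner to approximately (-1, 2).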
Human trajectory prediction is challenging and critical in various applications (e.g., autonomous vehicles and social robots). Because pedestrian movements are continuous and forward-looking, pedestrians moving in crowded spaces consider both spatial and temporal interactions to avoid future collisions. However, most existing methods ignore the temporal correlations of interactions with other pedestrians involved in a scene. In this work, we propose a Spatial-Temporal Graph Attention network (STGAT), based on a sequence-to-sequence architecture, to predict future trajectories of pedestrians. Besides the spatial interactions captured by the graph attention mechanism at each time-step, we adopt an extra LSTM to encode the temporal correlations of interactions. In comparisons with state-of-the-art methods, our model achieves superior performance on two publicly available crowd datasets (ETH and UCY) and produces more "socially" plausible trajectories for pedestrians.
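To make the graph attention step at a single time-step more concrete, here is a minimal sketch in the style of a standard graph attention layer: each neighbor is scored with a LeakyReLU over a learned vector applied to the concatenated hidden states, the scores are normalized with a softmax, and the neighbor states are averaged with those weights. This is an illustration under assumptions, not the STGAT code: the function names, the fixed attention vector `a`, and the premise that hidden states have already been linearly transformed are all hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation with an assumed negative slope of 0.2."""
    return x if x > 0 else slope * x


def gat_attention(h_i, neighbors, a):
    """Softmax-normalized attention weights of pedestrian i over its neighbors.

    h_i       : hidden state of pedestrian i (list of floats)
    neighbors : hidden states of the neighbors (in practice often including i)
    a         : attention vector of length 2 * len(h_i) (learned in a real model)
    """
    scores = []
    for h_j in neighbors:
        concat = h_i + h_j  # concatenation [h_i || h_j]
        scores.append(leaky_relu(sum(w * x for w, x in zip(a, concat))))
    # Numerically stable softmax over the neighbor scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbors, alphas):
    """Attention-weighted sum of the neighbor hidden states."""
    dim = len(neighbors[0])
    return [sum(alpha * h[k] for alpha, h in zip(alphas, neighbors))
            for k in range(dim)]
```

In a full model, the aggregated vector for each pedestrian at each time-step would be passed to a recurrent network so that interactions are also correlated over time.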
In this paper, we propose a new data-driven model to simulate the process of lane-changing in traffic simulation. Specifically, we first extract the features from surrounding vehicles that are relevant to the lane-changing of the subject vehicle. Then, we learn the lane-changing characteristics from the ground-truth vehicle trajectory data using random forest and back-propagation neural network algorithms. Our method enables the subject vehicle to consider more gap options in the target lane when cutting in, and it produces more realistic lane-changing trajectories for the subject vehicle and the follower vehicle. Through many experiments and comparisons with selected state-of-the-art methods, we demonstrate that our approach can soundly outperform them in terms of the accuracy and quality of lane-changing simulation. Our model can be flexibly used together with a variety of existing car-following models to produce natural traffic animations in various virtual environments.
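The idea of considering several gap options in the target lane can be sketched as a simple enumeration step that would come before any learned model. The example below is hypothetical and is not the feature set used in the paper: the function name `candidate_gaps`, the one-dimensional front-bumper positions, and the fixed `vehicle_length` are assumptions chosen to keep the sketch short.

```python
def candidate_gaps(subject_pos, target_lane, vehicle_length=5.0):
    """List the gaps between consecutive vehicles in the target lane.

    subject_pos    : longitudinal position (m) of the subject vehicle
    target_lane    : front-bumper positions (m) of vehicles in the target lane
    vehicle_length : assumed uniform vehicle length (hypothetical value)
    Each gap records its follower, leader, free size, and the signed
    offset of the gap center relative to the subject vehicle.
    """
    ordered = sorted(target_lane)
    gaps = []
    for follower, leader in zip(ordered, ordered[1:]):
        # Free space from the follower's front bumper to the leader's rear bumper.
        size = (leader - vehicle_length) - follower
        if size <= 0:
            continue  # vehicles overlap or touch; no usable gap
        center = follower + size / 2.0
        gaps.append({
            "follower": follower,
            "leader": leader,
            "size": size,
            "offset": center - subject_pos,
        })
    return gaps
```

Features such as `size` and `offset` for each candidate gap could then be fed to a classifier or regressor, which is where a random forest or neural network would come in.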