Li Jie
Senior Manager, eBay Payments Risk Control
Li Jie holds a B.S. in Computer Science and Technology from Xi'an Jiaotong University and an M.S. in Computer Science and Technology from Peking University. He has worked at Morgan Stanley, Baidu, CITIC Securities, and other companies. He is currently a senior manager in eBay's payments risk control department, where he leads development of eBay's real-time risk control feature platform. He has many years of hands-on experience in payments, clearing and settlement, and big data for risk control.
Topic
eBay Real-Time Feature Platform for Risk Control
eBay's transaction risk control business uses a large number of machine learning models and rules to combat online fraud. The machine learning models require the feature platform to quickly generate hundreds of millions of simulated feature records for model training, to keep offline simulation data consistent with online feature data so that models do not underperform after deployment, and to complete batch reads of hundreds of features within tens of milliseconds to meet the latency requirements of real-time inference. The risk control rules require the feature platform to release customized feature computation logic within tens of minutes in response to unexpected fraudulent activity, and to complete a cold start within a few hours for features whose look-back window spans several months. The eBay Real-Time Feature Platform for Risk Control is an intelligent data processing engine that supports a dynamic computation paradigm. The platform offers high data accuracy, low data update latency, and online/offline data consistency. It supports real-time rule decision-making, real-time model inference, large-scale feature simulation backtracking, feature cold start, automated model iteration, and other scenarios in eBay's risk control domain, allowing eBay to respond efficiently and flexibly to a variety of online fraud. The platform's technical features are detailed below:
1. Efficient data storage model. For Sliding Window and Lastk, the two most widely used feature types, the platform provides advanced storage models that achieve efficient read, write, and storage performance.
For Sliding Window features, storing the "day", "hour", and "minute" granularities in the same data storage unit reduces the number of batch reads needed to fetch a feature. For Lastk features, instead of the traditional approach of encoding and decoding the feature object as a whole, the platform uses a redesigned data model based on index blocks that unifies the computation and storage models. This avoids repeated encoding and decoding when feature values are updated and increases the throughput of a single Flink TaskManager (TM) by more than 10 times.
2. Dynamically defined feature computation logic. In addition to standard operators such as Min, Max, Count, Sum, Distinct, Average, Standard Deviation, Time Decay, and ZScore, algorithm teams can define arbitrary computation logic on demand using the DSL provided by the platform.
3. Online and offline data consistency. The platform builds a low-latency online computing engine on Flink and a high-throughput offline computing engine on Spark. The high match rate between online and offline data is ensured in three ways: 1) features are defined in the DSL, which both Flink and Spark can execute dynamically, so the online and offline tasks run the same logic; 2) snapshots of the Event data consumed by the online Flink tasks are written to HDFS files and used as the data source of the offline Spark tasks, so both engines read the same data; 3) the online Flink tasks combine an intelligent snapshot data model with an advanced feature data model to ensure that every Event is processed exactly once, even when underlying components such as the snapshot storage file system or the database system are unstable, while the offline Spark tasks achieve exactly-once processing by combining a de-duplication mechanism with the at-least-once guarantee of the tasks that persist Event snapshots. With this scheme, the platform keeps the statistical match rate between online and offline feature values above 99% for most features.
4. Large-scale feature simulation backtracking. By interpreting the DSL definition of each feature together with the metadata of the Parquet files that store the Event snapshots, the offline Spark tasks load only the Event columns that take part in the computation. The platform also applies a salting mechanism based on the feature query key, merges features that depend on the same storage unit so that intermediate results are reused, and dynamically tunes the number of dataset partitions to reduce data skew and avoid out-of-memory errors. Together with other optimizations, this lets the platform complete the point-in-time (Point-In-Time) computation of billion-scale feature values in less than 1 hour.
5. Real-time feature cold start. Building on the efficient large-scale simulation backtracking capability and online/offline data consistency, the platform provides a complete cold-start mechanism for new features, so that a feature that needs to look back over a one- or two-year time window can finish its cold start within a few hours and begin serving online models and rules.
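To make the Sliding Window storage idea in point 1 more concrete, here is a minimal Python sketch, not the platform's actual implementation. It assumes a count-style feature, in-memory dictionaries standing in for the real store, and minute-aligned window bounds. The names `SlidingWindowUnit`, `record`, and `store` are hypothetical. Each query key maps to one unit that holds day, hour, and minute buckets, so a single read is enough to answer a window of any length.

```python
from dataclasses import dataclass, field
from typing import Dict

MINUTE, HOUR, DAY = 60, 3600, 86400  # bucket widths in seconds


@dataclass
class SlidingWindowUnit:
    """Hypothetical storage unit: all three granularities for one query key."""
    minutes: Dict[int, int] = field(default_factory=dict)
    hours: Dict[int, int] = field(default_factory=dict)
    days: Dict[int, int] = field(default_factory=dict)

    def add(self, ts: int, value: int = 1) -> None:
        """Fold one event (epoch seconds) into every granularity."""
        for buckets, width in ((self.minutes, MINUTE),
                               (self.hours, HOUR),
                               (self.days, DAY)):
            key = ts // width
            buckets[key] = buckets.get(key, 0) + value

    def window_sum(self, start: int, end: int) -> int:
        """Sum over [start, end); both bounds must be minute-aligned."""
        if start % MINUTE or end % MINUTE:
            raise ValueError("window bounds must be multiples of 60 seconds")
        total, t = 0, start
        while t < end:
            # Use the coarsest bucket that starts at t and fits in the window.
            if t % DAY == 0 and t + DAY <= end:
                total += self.days.get(t // DAY, 0)
                t += DAY
            elif t % HOUR == 0 and t + HOUR <= end:
                total += self.hours.get(t // HOUR, 0)
                t += HOUR
            else:
                total += self.minutes.get(t // MINUTE, 0)
                t += MINUTE
        return total


# Assumed in-memory stand-in for the feature store: one entry per query key.
store: Dict[str, SlidingWindowUnit] = {}


def record(key: str, ts: int, value: int = 1) -> None:
    """Update the single storage unit that belongs to `key`."""
    store.setdefault(key, SlidingWindowUnit()).add(ts, value)
```

Because a long window is covered mostly by day and hour buckets, with minute buckets only at its edges, the work per query stays small even for look-back windows of months. The sketch omits bucket expiry and serialization, which a production store would need.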