Machine Learning and Data Science (ML) is starting to take the place in industry that "Information Technology" held in the late 1990s: businesses of all sizes and in all sectors are recognizing that developing predictive capabilities has become necessary to keep their core competencies profitable. To be effective, ML algorithms rely on high-quality training data. Not just any data will do: it must be specific to the business problem to which ML is applied. Obtaining relevant training data can be very difficult for firms to do on their own, especially for those early in their path toward incorporating ML into their operations. The problem is further exacerbated as businesses increasingly need to solve these prediction problems in real time (e.g., a ride-share company setting prices, or retailers and restaurants sending targeted coupons to clear inventory), which means that data becomes "stale" quickly. It is therefore imperative that real-time market structures exist for buying and selling ML training data.

Further, it is insufficient to view ML performance metrics (e.g., RMSE) in isolation from real-world applications. For example, a 10% increase in prediction accuracy means very different things for a hedge fund maximizing profits, a retailer decreasing inventory costs, and a hospital trying to save lives. Hence, the value of a dataset must account for more than simply the prediction accuracy it provides. Domain knowledge will be just as essential, if not more so, if we aim to view data as an asset and to create a rigorous method for defining its value.