There are so many good talks happening at the same time that missing out on some great sessions is unavoidable. But imagine I had a time-turner necklace and could actually “attend” three (maybe five) sessions happening simultaneously. Taking into account my current personal interests and tastes, here’s how my day would look:
11:00 a.m.
- Apache Spark and real-time analytics: From interactive queries to streaming
- Netflix: Making big data small (the Netflix big data platform)
- How LinkedIn built a text analytics platform at scale
- In search of database nirvana: The challenges of delivering HTAP (the state of hybrid transactional/analytical processing systems)
11:50 a.m.
- Taking Spark Streaming to the next level with DataFrames
- Data scientists, you can help save lives (deep learning)
- Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo
- Old industries, sexy data: How machine learning is reshaping the world’s backbone industries
1:50 p.m.
- Ask me anything: Apache Spark
- Twitter Heron at scale (Twitter’s streaming system; API compatible with Apache Storm)
- Architecting immediacy: The design of a high-performance, portable wrangling engine
- Toppling the mainframe: Enterprise-grade streaming under 2 ms on Hadoop
2:40 p.m.
- Scalable schema management for Hadoop and Spark applications
- Architecting distributed systems for failure: How Druid guarantees data availability
- Ask me anything: Apache Kafka
- Building DistributedLog, a high-performance replicated log service (Twitter’s replicated log service, built on top of Apache BookKeeper)
4:20 p.m.
- Large-scale product classification via text and image-based signals using a fusion of discriminative and deep learning-based classifiers
- Vowpal Wabbit: The essence of speed in machine learning
- Cancer genomics analysis in the cloud with Apache Spark and ADAM
- Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid