The document discusses the urgency for enterprises to transition from traditional Hadoop architectures to cloud-based solutions like Databricks due to rising costs and inefficiencies. It highlights significant business benefits, including increased revenue and productivity, as well as the advantages of a unified data platform for analytics and AI workloads. The content emphasizes the importance of modernization in achieving innovation and competitive advantage in an era of accelerated digital transformation.
Introduction by Guido Oswald and Matt Graves on modernizing to cloud data architecture.
Key areas of focus: modernization reasons, success stories, fast migrations, and fireside chat.
Impact of accelerating digital transformation in sectors like e-commerce and IoT: cloud adoption is accelerating by $100B from 2021 to 2023, and the data surge is straining traditional infrastructure.
Enterprises face complexity with siloed data, rising costs of Hadoop, and the need for real-time data access.
Critical requirements for modern architectures: cost-effectiveness, manageability, and the need for predictive insights.
Forrester TEI study indicates 417% ROI from Databricks, with 47% cost savings and increased team productivity.
Introduction to the Databricks Lakehouse Platform as a unified solution for data, analytics, and AI.
Overview of Databricks capabilities: ETL, streaming, and enhanced processing speeds using Spark.
Automation tools can reduce migration costs by 55-66% and timelines by 2-3x compared to manual migrations.
Overview of partner ecosystem for assisting migrations, including tools and consulting services.
Summary of benefits of modernization with Databricks, emphasizing cost, productivity and innovation.
Invitation to visit Databricks for further information on migration resources.
Introduction to the fireside chat with Matt Graves discussing enterprise data and analytics.
Additional backup information related to the presentation.
Encouragement for attendees to provide feedback on the presentation.
Modernizing to a cloud data architecture
Guido Oswald, Solutions Architect, Databricks
Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
Agenda
• Top reasons to modernize from Hadoop to Databricks
• Success stories, technical and business benefits
• Fast migrations with low costs & low risk
• Fireside Chat: Matt Graves
Digital transformation is accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food service, grocery deliveries…
The data surge is placing tremendous pressure on traditional data and analytics infrastructure
Source: Gartner cited by Battery Ventures - Open Cloud report
Cloud adoption is accelerating by $100B from 2021 to 2023
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
• Data Warehousing (structured data; extract, transform, load; data warehouse; data marts; analytics and BI): Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
• Data Engineering (structured, semi-structured and unstructured data; data prep; data lake): Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
• Streaming (streaming data sources; streaming data engine; real-time database): Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, Tibco Spotfire, Google Dataflow, Confluent
• Data Science & Machine Learning (structured, semi-structured and unstructured data; data lake; machine learning; data science): Jupyter, Amazon SageMaker, Azure ML Studio, MATLAB, Domino Data Labs, SAS, TensorFlow, PyTorch
Disconnected systems and proprietary data formats make integration difficult
Siloed data teams (data analysts, data engineers, data scientists) decrease productivity
Is your architecture enabling growth?
Legacy on-premises data and analytics architectures are not keeping up
• Hadoop costs are rising when costs need to be cut
• Innovation hinges on ML and predictive insights
• Business agility requires real-time data
Hadoop is costly, complex and ineffective
• DEVOPS INTENSIVE: The Hadoop ecosystem is complex, hard to manage, and prone to failures. Result: low productivity.
• RIGID AND INELASTIC: 24/7 HDFS clusters need to be built for peak usage and are costly to upgrade. Result: cost prohibitive.
• LACKS AI CAPABILITIES: No out-of-box support for ML/AI, and separate data and AI environments. Result: slow innovation.
Enterprises need a modern data and analytics architecture
CRITICAL REQUIREMENTS
• Cost-effective scale and performance in the cloud
• Easy to manage and highly reliable for diverse data
• Predictive and real-time insights to drive innovation
Modernization delivers business value
Forrester TEI study finds 417% ROI for companies switching to Databricks
• 47% cost savings from retiring legacy infrastructure
• 5% increase in revenue
• 25% increase in data team productivity
Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads
Original creators of popular data and machine learning open-source projects
Global company with over 5,000 customers and more than 450 partners
Lakehouse Platform
• Workloads: Data Engineering; BI & SQL Analytics; Real-time Data Applications; Data Science & Machine Learning
• Data Management & Governance
• Open Data Lake
• Data types: structured, semi-structured, unstructured, streaming
SIMPLE, OPEN, COLLABORATIVE
From BI to AI: all your data, analytics and AI on one Lakehouse platform
Technology mapping: deliver better outcomes
• Data Eng, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop shop)
• Scalable apps on columnar store (HBase) → Databricks Spark integrates with HBase on cloud (alternatively: use cloud data stores well integrated with Databricks)
• ETL, SQL (Hive / Impala) → Databricks jobs / Delta Lake / SparkSQL (highly tuned Spark engine: faster, less compute, one-stop shop)
• Batch Process (MapReduce) → Databricks Spark jobs (orders of magnitude faster, but may need manual work)
• Real-time Event Processing (Storm / Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming + batch ingest)
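As a rough illustration, the technology mapping on this slide can be written down as a small lookup table, for example to tag an inventory of existing Hadoop jobs with a candidate Databricks target during assessment. The sketch below is plain Python and only restates the slide; the names `Target`, `TECHNOLOGY_MAP`, and `targets_for` are hypothetical and are not part of any Databricks API or migration tool.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """One row of the slide's mapping (structure is an assumption)."""
    hadoop_components: tuple[str, ...]
    databricks_target: str
    note: str


# Keys are workload labels; values restate the slide's mapping.
TECHNOLOGY_MAP: dict[str, Target] = {
    "data engineering / ML": Target(
        ("Spark",), "Databricks jobs / Delta Lake",
        "highly tuned Spark engine"),
    "apps on columnar store": Target(
        ("HBase",), "Databricks Spark with HBase on cloud",
        "alternatively, a cloud data store integrated with Databricks"),
    "ETL / SQL": Target(
        ("Hive", "Impala"), "Databricks jobs / Delta Lake / SparkSQL",
        "highly tuned Spark engine"),
    "batch processing": Target(
        ("MapReduce",), "Databricks Spark jobs",
        "may need manual work"),
    "real-time event processing": Target(
        ("Storm", "Spark"), "Databricks Structured Streaming",
        "streaming + batch ingest with Delta Lake"),
}


def targets_for(component: str) -> list[str]:
    """Return the Databricks targets for every workload that uses `component`."""
    wanted = component.lower()
    return [
        row.databricks_target
        for row in TECHNOLOGY_MAP.values()
        if wanted in (c.lower() for c in row.hadoop_components)
    ]


if __name__ == "__main__":
    for name in ("MapReduce", "Hive", "Spark"):
        print(name, "->", targets_for(name))
```

Spark appears under two workloads on the slide, so a lookup by component can return more than one target; which one applies depends on whether the job is batch or streaming.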
55-66% reduction in costs and 2-3x reduction in timelines by using automation tools
• Manual migration (17-20 weeks): Assessment & Design; Data Migration; Workloads Migration, Validation; Cutover; Operations
• Using automation (8 weeks): Accelerated Assessment & Design; Accelerated Data & Workloads Migration, Validation; Cutover; Operations
• The same tool is used for the pre-migration assessment
* Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads considered
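The timeline claim can be checked with simple arithmetic on the figures shown on this slide (17-20 weeks manual, 8 weeks with automation). The sketch below assumes both figures cover the same migration scope, which the slide does not state explicitly; the function and variable names are illustrative only. It does not attempt to reproduce the 55-66% cost figure, which cannot be derived from the timeline alone.

```python
# Figures taken from the slide; the assumption is that both cover the same scope.
MANUAL_WEEKS = (17, 20)
AUTOMATED_WEEKS = 8


def speedup(manual_weeks: float, automated_weeks: float) -> float:
    """How many times shorter the automated timeline is than the manual one."""
    return manual_weeks / automated_weeks


def weeks_saved(manual_weeks: float, automated_weeks: float) -> float:
    """Absolute calendar time saved, in weeks."""
    return manual_weeks - automated_weeks


low, high = (speedup(w, AUTOMATED_WEEKS) for w in MANUAL_WEEKS)
saved_low, saved_high = (weeks_saved(w, AUTOMATED_WEEKS) for w in MANUAL_WEEKS)

if __name__ == "__main__":
    print(f"Timeline reduction: {low:.1f}x to {high:.1f}x")
    print(f"Weeks saved: {saved_low} to {saved_high}")
```

On these numbers the reduction works out to a little over 2x at the low end and 2.5x at the high end, which sits in the lower part of the slide's stated 2-3x range.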
Our partner ecosystem accelerates migrations
• ISV Partners and Migration Tools
• Security
• Governance
• Cloud
• Consulting & SI Partners
• Databricks Migration SWAT team + CS Packaged Services for Migration
Modernization with Databricks: recap
• Why: costs, productivity, innovation → business impact
• Your competitors and market leaders are doing it NOW
• Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk