Cloud Infrastructure Monitoring with Grafana Cloud | Webinar
Imma Valls
Staff Developer Advocate
This webinar is being recorded.
Submit questions in the Q&A feature; they will be answered at the end of the presentation.
All registrants will receive access to the recording.
Agenda
● Introduction to Cloud Service Models
● Implementation Strategies
● Collecting Telemetry Data
● Enhancing Visibility
● Q&A
What are the Cloud Service Models?
DO I CARE!?
MAYBE… IT DEPENDS
"Customer said the app is down!"
● Server Team: "Servers are UP!"
● Network Team: "Network is UP!"
● Cloud Team: "Cloud is UP!"
● App Team: "It's somewhere in the stack!"
● Undetected incidents
● Undefined RCAs
● Extended MTTR
● Ineffective collaboration
● SLA breaches
How does Grafana help?
Your open and composable observability stack
● Collect: Grafana Alloy gathers telemetry from your environment (applications and infrastructure)
● Native OTel and Prometheus support
● Store: Mimir (metrics), Loki (logs), Tempo (traces), Pyroscope (profiles)
● Visualize: Grafana
● No lock-in: open standards, composable stack, OSS or commercial
● Keep data where it is: "Big Tent" data sources
● Platform capabilities: SLO & Alerting | AI/ML Insights | Cost Management | Security & Governance | Configuration (as code)
● Solution areas: Core Observability, Application Observability, Infrastructure Observability, Testing, Incident Response & Management, Contextual Root Cause Analysis
● Building blocks shown: Frontend, Application, Service Maps, Serverless, eBPF, Kubernetes, Server/VM, Cloud Providers, Database, Performance Testing, Synthetic Monitoring, OnCall, Incident, Asserts
1. Understand: How are components connected across deployments?
● Measure reliability across heterogeneous environments
● Correlate data sources to assist in root cause analysis
● Infer interactions between infrastructure and application components
● Follow transactions end to end across networks and environments
2. Unify: How do we break data silos?
● Aggregate data from a variety of monitoring tools
● Unify ingestion and querying
● Gather on-prem and cloud observability data
● Link infrastructure with applications and middleware
3. Access: How can we connect to isolated networks?
● Collect data through different protocols, push/pull models, etc.
● Securely connect to on-prem data sources
● Reduce egress costs and secure data on the Internet
● Monitor environments that cannot be exposed to the Internet
4. Govern: How can we control and optimize the cost of our observability?
● Monitor costs and usage
● Reduce and optimize costs
● Attribute costs per deployment, datacenter, region, etc.
● Understand and analyze costs
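Attributing metrics cost per deployment, datacenter, or region usually comes down to counting active series, because every unique combination of metric name and label values is its own series. The minimal Python sketch below assumes a hypothetical in-memory list of label sets (in practice these would come from your metrics backend); it counts unique series per label value, computes each value's share, and estimates the worst-case cardinality of one metric as the product of distinct values per label.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical active series: one dict of labels per series.
# The last entry repeats an earlier one to show deduplication.
SERIES: List[Dict[str, str]] = [
    {"__name__": "http_requests_total", "region": "eu-west", "deployment": "checkout"},
    {"__name__": "http_requests_total", "region": "eu-west", "deployment": "cart"},
    {"__name__": "http_requests_total", "region": "us-east", "deployment": "checkout"},
    {"__name__": "node_cpu_seconds_total", "region": "us-east", "deployment": "infra"},
    {"__name__": "http_requests_total", "region": "eu-west", "deployment": "cart"},
]


def unique_series(series: Iterable[Dict[str, str]]) -> set:
    """A series is identified by its full label set, so deduplicate on that."""
    return {frozenset(labels.items()) for labels in series}


def series_per_label(series: Iterable[Dict[str, str]], label: str) -> Counter:
    """Count unique series per value of `label` (e.g. region or deployment)."""
    counts: Counter = Counter()
    for labels in unique_series(series):
        counts[dict(labels).get(label, "<unset>")] += 1
    return counts


def cost_share(counts: Counter) -> Dict[str, float]:
    """Fraction of all series attributable to each label value."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def worst_case_series(distinct_values_per_label: Dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return math.prod(distinct_values_per_label.values())


by_region = series_per_label(SERIES, "region")
print(by_region)
print(cost_share(by_region))
print(worst_case_series({"route": 10, "status_code": 5, "pod": 20}))
```

The cardinality estimate is why label choice matters for cost: a single metric with 10 routes, 5 status codes, and 20 pods can reach 1,000 series.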
Query the data in place…
Keep data where it is: "Big Tent" data sources
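Querying in place means the data source asks the backend's own API for results when a dashboard loads, instead of copying the data first. As a minimal sketch, assuming a Prometheus-compatible backend at a hypothetical address, the Python below builds a request for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and parses a vector response; the network call itself is deliberately left out.

```python
import json
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str, eval_time: Optional[float] = None) -> str:
    """Build a GET URL for the Prometheus HTTP API instant-query endpoint."""
    params = {"query": promql}
    if eval_time is not None:
        params["time"] = str(eval_time)
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Turn an instant-vector JSON response into (labels, value) pairs."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's value is [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


# Hypothetical backend address; no request is sent in this sketch.
url = instant_query_url("http://prometheus.example:9090/", 'up{job="node"}')

# A hand-written response body in the shape the Prometheus API returns.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "10.0.0.1:9100"},
             "value": [1700000000, "1"]},
        ],
    },
})
samples = parse_vector(SAMPLE_BODY)
print(url)
print(samples)
```

Grafana's real data source plugins do considerably more (authentication, range queries, caching); the point here is only that the data stays in the backend and is fetched on demand.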
● Observability & Monitoring (Google Monitoring, Splunk, Dynatrace, ...)
● ITSM (Jira, ServiceNow, etc.)
● Dev tools (GitHub, GitLab, Sentry, ...)
● Databases (BigQuery, MongoDB, Databricks, ...)
● Cloud
● OSS & Enterprise
150 data sources
First Pane of Glass across data stores ...
First Pane of Glass across your Cloud(s)
Go deeper with component-level dashboards or deep-link into the data source.
Native Cloud Provider Metrics
● No consistent, industry-wide naming standard (such as the Prometheus naming conventions)
● Different aggregations
● Different API calls
● Different identifiers
● Proprietary, controlled by the vendor
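One practical consequence of inconsistent naming is that metrics from different providers have to be mapped onto a single convention before they can be compared side by side. The Python below is a minimal sketch of such a mapping into Prometheus-valid snake_case names. The `provider_namespace_metric_statistic` layout is a hypothetical convention chosen for illustration, not what any particular exporter emits; keeping the statistic in the name reflects that providers expose different aggregations.

```python
import re

# Prometheus metric names must match this pattern (colons are reserved
# for recording rules, so this sketch never produces them).
VALID_METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*$")

_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
_INVALID_CHARS = re.compile(r"[^a-z0-9_]")
_REPEATED_UNDERSCORES = re.compile(r"_+")


def to_prometheus_name(provider: str, namespace: str, metric: str, statistic: str) -> str:
    """Map a provider-specific metric onto one consistent snake_case name.

    The provider_namespace_metric_statistic layout is a hypothetical
    convention for this sketch.
    """
    parts = [_CAMEL_BOUNDARY.sub("_", p) for p in (provider, namespace, metric, statistic)]
    name = "_".join(parts).lower()
    name = _INVALID_CHARS.sub("_", name)            # '.', '/', spaces, etc. become '_'
    name = _REPEATED_UNDERSCORES.sub("_", name).strip("_")
    if not name or name[0].isdigit():               # names may not start with a digit
        name = "_" + name
    return name


examples = [
    ("aws", "EC2", "CPUUtilization", "Average"),
    ("azure", "Microsoft.Compute/virtualMachines", "Percentage CPU", "Average"),
    ("gcp", "compute.googleapis.com", "instance/cpu/utilization", "mean"),
]
for ex in examples:
    print(to_prometheus_name(*ex))
```

Names alone do not solve differing identifiers or aggregation semantics, but a consistent naming layer is what makes cross-cloud dashboards and alerts reusable.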
Ship data to our telemetry backends…
How Infrastructure Integrations work
Grafana Alloy collects metrics and logs from your services and forwards them to Grafana Cloud, where each integration bundles prebuilt dashboards and alerts for the metrics and/or logs it collects.
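Under the hood, many integrations are an exporter plus a scrape config: the exporter serves metrics over HTTP in the Prometheus text exposition format, and the collector pulls (scrapes) that page and forwards the samples. The simplified Python parser below only illustrates what a scrape reads. It is not Alloy's implementation, it ignores OpenMetrics features such as exemplars, and the example payload is hypothetical (modeled on node_exporter metric names).

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<timestamp>-?\d+))?\s*$"
)
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')
_ESCAPES = {"n": "\n", "\\": "\\", '"': '"'}


@dataclass
class Sample:
    name: str
    labels: Dict[str, str]
    value: float
    timestamp_ms: Optional[int]


def _unescape(value: str) -> str:
    """Undo the \\n, \\\\ and \\" escapes allowed in label values."""
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), value)


def parse_exposition(text: str) -> Tuple[List[Sample], Dict[str, str]]:
    """Parse a simplified Prometheus text exposition payload."""
    samples: List[Sample] = []
    types: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line.split(maxsplit=3)
            if len(parts) == 4 and parts[1] == "TYPE":
                types[parts[2]] = parts[3]
            continue  # HELP and other comments are ignored here
        match = SAMPLE_LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse line: {raw!r}")
        labels = {k: _unescape(v) for k, v in LABEL_PAIR.findall(match["labels"] or "")}
        ts = match["timestamp"]
        samples.append(Sample(match["name"], labels, float(match["value"]),
                              int(ts) if ts is not None else None))
    return samples, types


EXAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 890.1
# TYPE node_load1 gauge
node_load1 0.42 1700000000000
"""
samples, types = parse_exposition(EXAMPLE)
for s in samples:
    print(s)
```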
…with a very convenient collector, Alloy
● Supports collecting metrics, logs, traces, and profiles
● Service discovery (file, DNS, Kubernetes, etc.)
● Deployable locally or remotely
● Can be clustered
● Integrations are easy-mode exporters/scrape configs
● Get started with the Helm chart/Operator and minimal configs
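File-based service discovery is the simplest of the listed mechanisms: a JSON (or YAML) file lists target groups, each with addresses and shared labels. This is the layout Prometheus's `file_sd_configs` reads, and, to our understanding, Alloy's `discovery.file` component works similarly. A minimal Python sketch, assuming the JSON form and a hypothetical targets file, shows how groups expand into per-target label sets that a later stage can filter before scraping.

```python
import json
from typing import Dict, List

# Hypothetical targets file in the Prometheus file_sd JSON layout.
TARGETS_JSON = """
[
  {"targets": ["10.0.0.1:9100", "10.0.0.2:9100"],
   "labels": {"env": "prod", "region": "eu-west"}},
  {"targets": ["10.0.1.5:9100"],
   "labels": {"env": "staging", "region": "us-east"}}
]
"""


def expand_file_sd(raw: str) -> List[Dict[str, str]]:
    """Turn target groups into one label set per target.

    Every target gets its group's labels plus __address__, the label
    Prometheus-style scrapers use for the host:port to scrape.
    """
    targets = []
    for group in json.loads(raw):
        for address in group["targets"]:
            target = dict(group.get("labels", {}))
            target["__address__"] = address
            targets.append(target)
    return targets


def keep(targets: List[Dict[str, str]], label: str, value: str) -> List[Dict[str, str]]:
    """Rough stand-in for a 'keep' relabel rule, using exact match instead of a regex."""
    return [t for t in targets if t.get(label) == value]


prod_targets = keep(expand_file_sd(TARGETS_JSON), "env", "prod")
for t in prod_targets:
    print(t["__address__"], t["region"])
```

Because discovery only produces label sets, the same filtering and scraping logic works unchanged whether targets come from a file, DNS, or the Kubernetes API.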
Why OpenTelemetry (OTel)
No Vendor Lock-In
● Open standards for telemetry data (metrics, logs, traces, …)
● Vendor-neutral, governed by the CNCF (like Prometheus)
Open Source Instrumentation of Applications
● SDKs for a large number of programming languages
● Auto-instrumentation libraries for many popular application frameworks
● Tools for collecting, processing, and exporting telemetry data
Integration with the Open Source Monitoring Ecosystem
● Prometheus metrics support out of the box
● Support for the Grafana open source telemetry databases (Mimir, Loki, Tempo)
● Currently 80+ receivers and 40+ exporters
How metrics, logs, traces, and profiles complement each other
● Metrics: Is something happening?
● Logs: What is happening?
● Traces: Where is it happening?
● Profiles: How do I fix it?
Links between the signals:
● Exemplars & data links
● Service discovery & labels
● LogQL metric extraction
● Derived fields or automated logging
● Labels
● Metrics from spans & custom query specification
● Trace links and stack traces
● Resource usage over time
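Two of the log-side links are easy to sketch. A LogQL metric query such as `count_over_time({app="checkout"} |= "level=error" [5m])` turns matching log lines into a number you can graph or alert on, and a derived field extracts a value (typically a trace ID) from each line with a regex and turns it into a link to the trace. The Python below imitates both on in-memory data; the logfmt log lines, the `traceID=` key, the `app` label in the example query, and the link URL are all hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical log lines: (unix timestamp in seconds, line) in logfmt style.
LOGS: List[Tuple[int, str]] = [
    (100, 'level=info msg="payment accepted" traceID=4bf92f3577b34da6'),
    (160, 'level=error msg="card declined" traceID=00f067aa0ba902b7'),
    (390, 'level=error msg="upstream timeout" traceID=a3ce929d0e0e4736'),
]

# Derived field: a regex that pulls the trace ID out of a log line.
TRACE_ID = re.compile(r"traceID=(\w+)")
# Hypothetical link template; in Grafana this would point at your tracing data source.
TRACE_LINK = "https://grafana.example.com/explore?traceId={trace_id}"


def derived_trace_link(line: str) -> Optional[str]:
    """Return a trace link if the line carries a trace ID, else None."""
    match = TRACE_ID.search(line)
    return TRACE_LINK.format(trace_id=match.group(1)) if match else None


def count_over_time(logs: List[Tuple[int, str]], contains: str, end: int, window: int) -> int:
    """Count lines containing `contains` with end - window < ts <= end.

    Mimics a count_over_time line-filter query at one evaluation time;
    the open/closed window boundary is a choice made for this sketch.
    """
    return sum(1 for ts, line in logs if end - window < ts <= end and contains in line)


errors_last_5m = count_over_time(LOGS, "level=error", end=400, window=300)
links = [derived_trace_link(line) for _, line in LOGS if "level=error" in line]
print(errors_last_5m, links)
```

Together these show the pivot the slide describes: a metric derived from logs says something is happening, and the extracted trace ID takes you to where it is happening.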
Out-of-the-box experience including dashboards, alerts, and infrastructure cost-monitoring tools: zero to complete Kubernetes monitoring in under 5 minutes per cluster.
So, how do I get started?
Shipping off infrastructure directly: collect from your environment (applications, infrastructure, etc.) with OpenTelemetry or Grafana Alloy, then store and visualize.
So, how do I get started?
Ship from the CSP (cloud service provider) monitoring service, then store and visualize.
A Private Link option is available for Grafana Cloud.
Infrastructure vs. Data Source Integrations
● Infrastructure integrations: monitor health & performance
● Data source integrations: visualize service data
General Rule of Thumb
Use Grafana Alloy or the OTel Collector when possible.
Sometimes installing an agent on cloud-based services is not possible; that is when to use plugins or integrations.
What types of dashboards can I build?
● Infrastructure/Application Monitoring
● Database/Data Warehouse
● CI/CD Pipeline and Sprint Tracking
● Analytics
● Cloud Financials
● Management and Governance
● Storage
Recap
Best Practices
● Use Alloy or OTel where possible
● Use integrations or plugins when you can't install a collector
● Correlate, don't silo
● Think outside the box: get creative with your cloud dashboards!
Demo Dashboards (play.grafana.org)
● Grafana QuickPizza
● Grafana Cloud Database Observability
The fastest way to get started.
Includes free forever access to:
● 3 users
● 10,000 series for metrics
● 50 GB of logs
● 50 GB of traces
● 14-day retention
Get started: grafana.com/cloud
Please complete our webinar survey.
Have more questions? Join us at community.grafana.com or on the Grafana public Slack (grafana.slack.com, #general).