Google Cloud: Professional Cloud Architect (PCA) Exam Notes – Part X

Machine Learning and AI Products

Heads up: this section will change dramatically after 2023, with all things Bard and ChatGPT changing the world, but for now it covers the exam topics as of April 2023.

  • AutoML builds on Google’s ML capabilities so you can create your own custom ML models without any coding
    • AutoML Vision: when you have image data
    • AutoML Natural Language: when you have text data
    • AutoML Tables: when you have structured data
  • Cloud ML Engine is a massively scalable managed service for training ML models and making predictions
    • Based on TensorFlow; works with datasets of any size and a wide range of use cases
    • Integrates with GCS/BQ, Cloud Datalab, Cloud Dataflow (preprocessing)
    • Supports online and batch predictions
    • Download model and make predictions at the edge
    • HyperTune automatically tunes model hyperparameters
  • Cloud Vision API classifies images into categories, detects objects/faces, and finds/reads printed text
    • Pre-trained ML model to analyze images and discover their contents
    • Upload images or point to ones stored in GCS
  • Cloud Speech API is an automatic speech recognition (ASR) service that turns spoken words into text; the reverse direction is handled by the separate Text-to-Speech API. Pre-trained ML model for recognizing speech in 110+ languages
    • Accepts pre-recorded or real-time audio
    • Handles noisy audio and supports content (profanity) filtering
  • Cloud Natural Language API analyzes text for sentiment, intent, and content classification, and extracts entities from it
    • Combines well with the other APIs
    • Extracts entities (people, places, things) and links them to Wikipedia articles
  • Cloud Translation API translates 100+ languages and includes semantics, not just syntax. Send plain text or HTML and receive translation in kind
  • Dialogflow is used to build conversational interfaces for websites, mobile apps, messaging platforms, and IoT devices. Pre-trained ML model for chatbots; provide a dataset of examples to define custom entity types
  • Cloud Video Intelligence API annotates videos in GCS with info about what they contain. Enables you to search a video catalog the same way you search text documents
  • Explainable AI provides tools and frameworks to understand and interpret your ML models
  • Transcoder API allows you to convert video files and package them for optimized delivery to web, mobile and connected TVs
  • Cloud Job Discovery helps career sites, company job boards, etc. to improve engagement and conversion. Integrates with many job/hiring systems 

Big Data & Eventing Services

  • 4 Steps of Data: Ingest, Store, Process & Analyze, Explore & Visualize
  • Cloud Pub/Sub is a global, infinitely scalable, at-least-once messaging service for ingestion, decoupling, etc. “Shock absorber” when scaling systems: the glue that links everything together
    • Cloud Pub/Sub has topics and subscriptions. Use Cloud Tasks for queues and when you need explicit invocation by a publisher/controller
    • Can publish from anywhere and consume from anywhere; access is served through an HTTPS load balancer
    • Messages can be up to 10MB, and unacknowledged messages are retained for 7 days. There is no dead-letter queue out of the box, but you can attach another topic to a subscription as a dead-letter topic to collect messages that repeatedly fail delivery; this does require some configuration
    • By default, subscriptions with no activity (push or pull requests) for 31 days may be deleted automatically
    • Push mode delivers to HTTPS endpoints and ramps up delivery with a “slow start” algorithm; failed deliveries are retried with a configurable exponential backoff
    • Pull mode delivers messages to requesting clients and waits for ACK to delete
    • Long polling 
    • Pay for data volume, min 1KB
  • Cloud Datalab is a Jupyter Notebook based tool for data exploration, analysis, visualization, and machine learning
    • Spins up a repository and VM under the hood
    • Cloud Datalab is being deprecated; use Vertex AI instead
  • Cloud Data Studio is a big data visualization tool for dashboards and reporting. Combined with BigQuery BI Engine, an in-memory analysis service, data exploration and visual interactivity reach sub-second speeds over massive datasets.
  • Cloud Genomics stores and processes genomic data and related experiments using open industry standards
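
To make the Pub/Sub delivery notes above a bit more concrete, here is a minimal pure-Python sketch. It does not call the real Pub/Sub client library. It illustrates two ideas from the bullets: a retry schedule with exponential backoff bounded by a minimum and maximum delay, and a handler that tolerates the duplicate deliveries that at-least-once messaging allows. The names `backoff_schedule` and `IdempotentHandler`, the doubling factor, and the 10 s / 600 s bounds in the usage note are illustrative assumptions, not Pub/Sub's actual implementation.

```python
from typing import Callable, List, Set


def backoff_schedule(min_delay_s: float, max_delay_s: float, attempts: int) -> List[float]:
    """Return the wait before each retry: doubles every attempt, capped at max_delay_s."""
    delays = []
    delay = min_delay_s
    for _ in range(attempts):
        delays.append(min(delay, max_delay_s))
        delay *= 2  # assumption: a simple doubling factor
    return delays


class IdempotentHandler:
    """Wraps a handler so a redelivered message ID is acknowledged but not reprocessed."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._seen: Set[str] = set()

    def handle(self, message_id: str, data: str) -> bool:
        """Run the handler at most once per message ID; return True to signal ACK."""
        if message_id in self._seen:
            return True  # duplicate delivery: ACK again, skip the side effects
        self._handler(data)  # if this raises, the ID is not recorded and no ACK is sent
        self._seen.add(message_id)
        return True
```

For example, `backoff_schedule(10, 600, 8)` yields waits of 10, 20, 40, 80, 160, 320, 600, and 600 seconds. In a real subscriber the set of seen IDs would need to live in shared, durable storage rather than in memory.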

Some Web Review Notes

  • TCP connections use a three-way handshake (SYN, SYN-ACK, ACK) to establish a reliable connection. HTTP/S connections run over TCP, and HTTPS adds a TLS handshake on top to set up secure communication. What Is a Three-Way Handshake in TCP? ← Quick video to review.
    • Similarly, a TLS handshake is the process that kicks off a communication session that uses TLS. During a TLS handshake, the two communicating sides exchange messages to acknowledge each other, verify each other, and pick which cryptographic algorithms they will use, ultimately agreeing on the session keys they will use. Breaking Down the TLS Handshake ← a video by F5 Networks to review.
  • Cross-Origin Resource Sharing (CORS) is an HTTP-header based mechanism that allows a server to indicate which origins (domain, scheme, or port) other than its own a browser should permit to load its resources. For security reasons, browsers restrict cross-origin HTTP requests initiated by scripts.
  • RTT/TTFB
    • Round-Trip Time (RTT) is the amount of time it takes for a signal to be sent plus the amount of time it takes for acknowledgement of that signal having been received.
    • Time to first byte (TTFB) is a measurement used as a latency metric for a web server or other network resources. It measures the duration from the user or client making an HTTP request to the first byte of the page being received by a browser.
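
The handshake and RTT/TTFB notes above fit together as simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement tool. It assumes the TCP handshake costs one round trip before the client can send data, a full TLS 1.3 handshake costs one more round trip (two for TLS 1.2), and the request itself costs one round trip plus server processing time. DNS lookups, packet loss, and bandwidth are ignored, and the function name and parameters are made up for illustration.

```python
def estimate_ttfb_ms(rtt_ms: float, server_ms: float,
                     tls_round_trips: int = 1, new_connection: bool = True) -> float:
    """Estimate time to first byte (ms) for a single HTTP request.

    tls_round_trips: 0 for plain HTTP, 1 for a full TLS 1.3 handshake,
    2 for a full TLS 1.2 handshake (assumptions stated in the text above).
    """
    round_trips = 1  # request goes out, first response byte comes back
    if new_connection:
        round_trips += 1                # TCP: SYN out, SYN-ACK back, then data can flow
        round_trips += tls_round_trips  # TLS handshake on top of TCP
    return round_trips * rtt_ms + server_ms
```

With a 50 ms RTT and 20 ms of server time, this gives 120 ms for plain HTTP, 170 ms for TLS 1.3, and 220 ms for TLS 1.2 on a new connection, versus 70 ms when an existing connection is reused. That gap is why connection reuse and serving users from nearby locations matter for latency.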

HTTP Verbs for RESTful Services

  • POST: Used to create a subordinate resource (child) to the parent resource collection identified by a URI.
  • GET: Used to read or retrieve a representation of a resource; a successful response usually returns XML or JSON with HTTP 200
  • PUT: Used to create or update a resource identified by a URI
  • DELETE: Deletes a resource identified by a URI
  • OPTIONS: Provides a list of options/supported HTTP verbs for URI
  • HEAD: Asks for response identical to GET but without response body
  • CONNECT: Establishes a tunnel to the server identified by the target URI
  • TRACE: Performs a message loopback test along the path to the target URI – [Unknown compatibility with modern browsers]
  • PATCH: Applies partial modifications to a resource.
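
As a rough illustration of how the main verbs above behave, here is a tiny in-memory stand-in for a `/notes` collection. It is a sketch only: there is no real HTTP involved, the `NotesApi` class and its method names are hypothetical, and the status codes chosen (including 204 for a successful DELETE) reflect common REST conventions rather than anything mandated by the notes above.

```python
from typing import Any, Dict, Optional, Tuple

Response = Tuple[int, Optional[Dict[str, Any]]]  # (status_code, body)


class NotesApi:
    """In-memory stand-in for a /notes collection."""

    def __init__(self) -> None:
        self._notes: Dict[str, Dict[str, Any]] = {}
        self._next_id = 1

    def post(self, body: Dict[str, Any]) -> Response:
        # POST /notes: the server picks the ID of the new child resource.
        note_id = str(self._next_id)
        self._next_id += 1
        self._notes[note_id] = dict(body)
        return 201, {"id": note_id, **body}

    def get(self, note_id: str) -> Response:
        # GET /notes/{id}: read-only, returns a copy of the stored representation.
        if note_id not in self._notes:
            return 404, None
        return 200, dict(self._notes[note_id])

    def put(self, note_id: str, body: Dict[str, Any]) -> Response:
        # PUT /notes/{id}: full replacement, creating the resource if it is absent.
        # Repeating the same PUT leaves the same state (idempotent).
        created = note_id not in self._notes
        self._notes[note_id] = dict(body)
        return (201 if created else 200), dict(body)

    def patch(self, note_id: str, changes: Dict[str, Any]) -> Response:
        # PATCH /notes/{id}: partial modification of an existing resource.
        if note_id not in self._notes:
            return 404, None
        self._notes[note_id].update(changes)
        return 200, dict(self._notes[note_id])

    def delete(self, note_id: str) -> Response:
        # DELETE /notes/{id}: remove the resource; 404 if it was already gone.
        if self._notes.pop(note_id, None) is None:
            return 404, None
        return 204, None
```

The contrast to notice is that POST targets the collection and lets the server choose the URI, PUT targets a specific URI and replaces the whole resource, and PATCH only touches the fields it is given.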

Status Codes (Source)

  • 200 – The request succeeded, OK
  • 201 – The request has succeeded and a new resource has been created as a result. This is typically the response sent after POST requests, or some PUT requests.
  • 202 – The request has been received but not yet acted upon. It is noncommittal, since there is no way in HTTP to later send an asynchronous response indicating the outcome of the request. It is intended for cases where another process or server handles the request, or for batch processing.
  • 301 – The URL of the requested resource has been changed permanently. The new URL is given in the response and client is informed that it should not request old URL anymore
  • 302 – The URL of the requested resource has been changed temporarily. The new URL is given in the response and the client should request the resource again, because the redirection URL may change in the future.
  • 304 – This is used for caching purposes. It tells the client that the response has not been modified, so the client can continue to use the same cached version of the response.
  • 400 – Bad Request. The server could not understand the request due to invalid syntax.
  • 403 – Forbidden. The client does not have access rights to the content; that is, it is unauthorized, so the server is refusing to give the requested resource. Unlike 401, the client’s identity is known to the server.
  • 404 – Not Found. The endpoint may be valid but the resource doesn’t exist
  • 429 – Too Many Requests (check quotas, number of replicas, limits, etc.)
  • 500 – Internal Server Error. The server has encountered a situation it doesn’t know how to handle.
  • 503 – Service Unavailable – May be overloaded or down for maintenance
  • Google Cloud Storage Specific Status Codes
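
A small sketch of how a client might act on the status codes above, using Python's standard `http.HTTPStatus` enum for the reason phrases. The retry policy here (retry only 429 and 503) is an assumption made for illustration; real clients often differ, and the helper names are hypothetical.

```python
from http import HTTPStatus

# Assumption for this sketch: only throttling (429) and temporary
# unavailability (503) are treated as worth retrying.
RETRYABLE = {429, 503}


def status_class(code: int) -> str:
    """Map a status code to its class based on the leading digit."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client_error"
    if 500 <= code < 600:
        return "server_error"
    return "other"


def should_retry(code: int) -> bool:
    """Return True when waiting and retrying the same request might help."""
    return code in RETRYABLE


def describe(code: int) -> str:
    """Return a short label such as '404 Not Found (client_error)'."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"
```

The rule of thumb behind it: a 4xx response generally means the request itself needs to change (429 being the exception, where waiting helps), while a 5xx response points at the server side.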

Commonly Used Port Numbers

  • HTTP: TCP/80
  • HTTPS: TCP/443
  • FTP: TCP/21
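  • SFTP / SSH: TCP/22 (FTPS, which is FTP over TLS, uses TCP/21 in explicit mode or TCP/990 in implicit mode)
  • SMTP: TCP/25 (some hosts offer the nonstandard TCP/26 as an alternate)
  • SMTP submission with STARTTLS: TCP/587 (implicit TLS uses TCP/465)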
  • MySQL: TCP/3306
  • RDP: TCP/3389
  • DNS: UDP (or TCP)/53
  • ICMP (Ping): IP protocol 1 (it’s neither UDP nor TCP, so it has no port number)
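
One place these defaults show up in practice is URL handling: when a URL carries no explicit port, the client falls back to the scheme's default. The sketch below shows that lookup with Python's standard `urllib.parse`. The `DEFAULT_PORTS` table and the `effective_port` helper are illustrative and only cover a few schemes from the list above.

```python
from urllib.parse import urlsplit

# Illustrative subset of the defaults listed above, keyed by URL scheme.
DEFAULT_PORTS = {
    "http": 80,
    "https": 443,
    "ftp": 21,
    "ssh": 22,
    "sftp": 22,   # SFTP runs over SSH
    "mysql": 3306,
}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, else the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"no default port known for scheme {parts.scheme!r}")
    return DEFAULT_PORTS[parts.scheme]
```

For instance, `effective_port("https://example.com/path")` falls back to 443, while `effective_port("http://example.com:8080")` returns the explicit 8080.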

Other DevOps Tools & Practices

  • Jenkins is open source and can help automate integrating and deploying software. We GA’d Cloud Deploy in early 2022 as a managed deployment service.
  • Ansible, Chef, and Puppet are tools that can automate the configuration management of computers and VMs
  • Terraform is a tool for creating infrastructure as code
  • Post-mortems should be blameless and identify what processes may need to be changed. They should not blame individuals but instead try to understand the underlying causes that led to a particular event. Remember that identifying and fixing one issue late in the product cycle can be as expensive as handling 100 issues earlier on. Fix things early!
  • In Scrum, the product owner is responsible for defining and prioritizing the product backlog items that dev teams will work on. For additional context, see the Scrum Guide.

Firebase

Firebase is a suite of products to help app developers build and run successful apps. Three categories of tools fall under Firebase; quick overview here.

Build

  • Cloud Firestore is the same service listed above in NoSQL databases
  • Realtime Database allows you to store and sync JSON data between users in near-realtime or offline with strong user-based security
  • Remote Config is for creating feature flags so that you can dynamically control and optimize the user experience in production
  • Cloud Functions for Firebase are just like Cloud Functions: write the code and string together event-driven architectures
  • Authentication is a feature to easily add an end-to-end identity solution to your app
  • Cloud Messaging provides a reliable and battery-efficient connection between your server and devices that allows you to deliver and receive messages and notifications on iOS, Android, and the web at no cost.
  • Hosting is self-explanatory and includes the global CDN
  • Cloud Storage is the same as GCS

Release & Monitor

  • Crashlytics helps you track, prioritize, and fix stability issues that erode app quality, in real time
  • Google Analytics is just like GA anywhere else
  • Performance Monitoring gives insight into how your app performs so you can address issues quickly
  • Test Lab spots errors by validating your app on physical and virtual devices

Engage

  • Predictions applies the power of ML to predict future user behavior
  • Dynamic Links help grow your app organically by deep linking users to the right place so they can find and share content