    Google BigQuery Connector for AWS Glue

    Deployed on AWS
    Easily connect to Google BigQuery from AWS Glue

    Overview

    The Google BigQuery Connector for AWS Glue simplifies connecting AWS Glue jobs to BigQuery, both to extract data from it and to load data into it. The connector provides comprehensive access to BigQuery data, facilitating cloud ETL processes for operational reporting, backup and disaster recovery, data governance, and more.

    Highlights

    • Connect to Google BigQuery from AWS Glue jobs
    • Simplify data extracts from Google BigQuery
    • Simplify data loads to Google BigQuery

    Details

    Delivery method

    Supported services

    Delivery option
    Glue 3.0
    Glue 1.0/2.0

    Latest version

    Operating system
    Linux

    Pricing

    Google BigQuery Connector for AWS Glue

    This product is available free of charge. Free subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Vendor refund policy

    No Refunds

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    Glue 3.0

    Supported services:
    • Amazon ECS
    • Amazon EKS
    Container image

    Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.

    Version release notes

    Google BigQuery Connector for AWS Glue 0.24.2.

    • This version is built with spark-bigquery-connector 0.24.2.
    • This version is compatible with AWS Glue 3.0, 2.0, and 1.0.
    • This version supports both reading from and writing to Google BigQuery.

    Additional details

    Usage instructions

    Subscribe to the product in AWS Marketplace, then activate the Glue connector from AWS Glue Studio.

    Prerequisites

    • A Google Cloud account with a service account that has permissions for Google BigQuery
    • GCP credentials (service_account_json_file)
    • GCS bucket (only for writes)
    • BigQuery dataset (only for writes)
    • AWS Secrets Manager secret (you can create the secret in the following steps)

    Create a new secret for Google BigQuery in AWS Secrets Manager

    We create a secret in AWS Secrets Manager to store the Google service account file contents as a base64-encoded string.

    1. Download the service account credentials JSON file from Google Cloud.
       • For base64 encoding, you can use an online utility or a system command. On Linux and macOS, base64 [service_account_json_file] prints the file contents as a base64-encoded string.
    2. On the Secrets Manager console, choose Store a new secret.
    3. For Secret type, select Other type of secret.
    4. Enter credentials as the key and the base64-encoded string as the value.
    5. Leave the rest of the options at their defaults.
    6. Choose Next.
    7. Name the secret bigquery_credentials.
    8. Follow the remaining steps to store the secret.
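
    The console steps above can also be scripted. The following is a minimal sketch, not an official procedure: it base64-encodes the key file contents and builds the credentials key/value pair described above. The helper names build_secret_string and store_secret are hypothetical; store_secret assumes boto3 is installed and that AWS credentials are configured, and it calls the Secrets Manager create_secret API.

```python
import base64
import json


def build_secret_string(service_account_json: bytes) -> str:
    """Build the secret value: a JSON object whose "credentials" key holds
    the base64-encoded service account file contents."""
    # Fail early if the file is not valid JSON.
    json.loads(service_account_json)
    # b64encode emits a single line with no wrapping.
    encoded = base64.b64encode(service_account_json).decode("ascii")
    return json.dumps({"credentials": encoded})


def store_secret(secret_string: str, name: str = "bigquery_credentials") -> None:
    """Create the secret in AWS Secrets Manager.

    Assumption: boto3 is installed and AWS credentials are configured.
    boto3 is imported lazily so the encoding helper works without it.
    """
    import boto3

    client = boto3.client("secretsmanager")
    client.create_secret(Name=name, SecretString=secret_string)


if __name__ == "__main__":
    # Placeholder content; a real key file has more fields.
    sample = b'{"type": "service_account", "project_id": "example-project"}'
    print(build_secret_string(sample))
```

    To store a real key file, read it in binary mode, pass the bytes to build_secret_string, and pass the result to store_secret.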

    Connection options

    You can pass the following options to the connector.

    • parentProject (required): The Google Cloud Project ID of the table
    • dataset (optional unless omitted in table): The BigQuery dataset containing the table
    • table (required): The BigQuery table in the format [[project:]dataset.]table
    • temporaryGcsBucket (optional; required for writes): The GCS bucket that temporarily holds the data before it is loaded into BigQuery

    You can see other available options here: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/tree/0.24.2 
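
    As an illustration of how these options fit together, the sketch below assembles them into a dictionary and checks the required/optional rules listed above. The helper build_bigquery_options is hypothetical. The connectionName key and the "marketplace.spark" connection type in the trailing comment reflect what AWS Glue Studio typically generates for Marketplace connectors; treat both as assumptions and compare them with the script generated for your own job.

```python
from typing import Dict, Optional


def build_bigquery_options(
    connection_name: str,
    parent_project: str,
    table: str,
    dataset: Optional[str] = None,
    temporary_gcs_bucket: Optional[str] = None,
    for_write: bool = False,
) -> Dict[str, str]:
    """Assemble connector options and enforce the rules from the list above."""
    if not parent_project or not table:
        raise ValueError("parentProject and table are required")
    # Simplified check: a table written as dataset.table already names its dataset.
    if "." not in table and not dataset:
        raise ValueError("dataset is required when table does not include it")
    if for_write and not temporary_gcs_bucket:
        raise ValueError("temporaryGcsBucket is required for writes")

    options = {
        "connectionName": connection_name,  # assumed name of the Glue connection
        "parentProject": parent_project,
        "table": table,
    }
    if dataset:
        options["dataset"] = dataset
    if temporary_gcs_bucket:
        options["temporaryGcsBucket"] = temporary_gcs_bucket
    return options


# Inside a Glue job (not runnable locally), the options might be used like this:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="marketplace.spark",
#     connection_options=build_bigquery_options(
#         "my-bigquery-connection", "my-project", "my_dataset.my_table"
#     ),
# )
```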

    Spark configurations

    The following Spark configurations are required only for writes into BigQuery.

    • spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
    • spark.hadoop.fs.gs.auth.service.account.enable=true

    You also need to configure credentials using one of the following sets of configurations.

    Credential file

    • spark.hadoop.google.cloud.auth.service.account.json.keyfile=credentials.json

    Upload credentials.json to your S3 bucket, and set its S3 path in the job's Referenced files path.

    Private key

    • spark.hadoop.fs.gs.auth.service.account.email=[your-email-extracted-from-service_account_json_file]
    • spark.hadoop.fs.gs.auth.service.account.private.key.id=[your-private-key-id-extracted-from-service_account_json_file]
    • spark.hadoop.fs.gs.auth.service.account.private.key=[your-private-key-body-extracted-from-service_account_json_file]

    You can set these Spark configurations in one of the following ways.

    • The --conf Glue job parameter
    • SparkConf in the job script

    from pyspark.conf import SparkConf

    conf = SparkConf()
    conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    conf.set("spark.hadoop.fs.gs.auth.service.account.enable", "true")
    conf.set("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "credentials.json")
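
    For the Private key option, the three values can be read from the service account JSON file rather than copied by hand. The sketch below is one possible approach, not the connector's documented method: the helper private_key_conf is hypothetical, and it assumes the key file uses the usual client_email, private_key_id, and private_key field names found in Google service account key files.

```python
import json
from typing import Dict

_PREFIX = "spark.hadoop.fs.gs.auth.service.account."


def private_key_conf(service_account_json: str) -> Dict[str, str]:
    """Map service account key file fields to the Spark settings for writes."""
    info = json.loads(service_account_json)
    return {
        "spark.hadoop.fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
        _PREFIX + "enable": "true",
        # Assumed field names from a standard service account key file.
        _PREFIX + "email": info["client_email"],
        _PREFIX + "private.key.id": info["private_key_id"],
        _PREFIX + "private.key": info["private_key"],
    }


# In the job script, the result could be applied to a SparkConf like this:
# conf = SparkConf()
# for key, value in private_key_conf(key_file_text).items():
#     conf.set(key, value)
```

    Because the private key body then lives in the job configuration, the Credential file option may be preferable where exposing the key in job parameters is a concern.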

    Support

    Vendor support

    Please allow 24 hours

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    4 out of 5 (1 rating)
    5 star: 0%, 4 star: 100%, 3 star: 0%, 2 star: 0%, 1 star: 0%
    1 AWS review | 1 external review
    Star ratings include only reviews from verified AWS customers. External reviews can also include a star rating, but star ratings from external reviews are not averaged in with the AWS customer star ratings.

    Aws

    Reviewed on Apr 23, 2024
    Review from a verified AWS customer

    This is the coolest product ever and it's so useful, and really amazing I appreciate it, so have, it guys

    Education Management

    Glue Connector Integrations with BQ

    Reviewed on Apr 08, 2022
    Review provided by G2
    What do you like best about the product?
    AWS Glue has been a game changer for me. We've been using the Glue Schema Registry, which provides schema versioning that wasn't available when we were dealing with Pub/Sub schemas.
    What do you dislike about the product?
    Unfortunately the Glue client is available only in Java. Particularly for the SerDe operations on our Avro Data.
    What problems is the product solving and how is that benefiting you?
    We use it mostly to provide schemas for our tables in BigQuery. The idea behind using Glue is to infer the Avro schema from the data we get from CDC and move it to BigQuery.
    Recommendations to others considering the product:
    Strong Tool to manage out your metadata!