MindPyramid Limited is a multinational information technology and outsourcing company headquartered in Vizag, India and New Jersey, USA.
Founded in 2003, the company has approximately 2,000 employees.
The company offers consulting services in cloud computing, big data and analytics.
It provides services on major cloud platforms, including AWS.
The team is working with a major client whose infrastructure is built on AWS.
The client is currently experiencing many performance issues in its data warehouse (DWH) built on Redshift and wants to learn Redshift design best practices from the MindPyramid team.
Please suggest best practices for improving the queries.
Select 4 options.
Answer: A, B, D, E.
Option A is correct: Use a CASE expression to perform complex aggregations instead of selecting from the same table multiple times.
Option B is correct: Using functions in query predicates can drive up the cost of the query by requiring large numbers of rows to resolve the intermediate steps of the query.
Option C is incorrect: Predicates should be used to restrict the dataset as much as possible.
Option D is correct: With a WHERE clause restricting the dataset, the query planner can use row order to help determine which records match the criteria, so it can skip scanning large numbers of disk blocks.
Option E is correct: A query might qualify for one-phase aggregation when its GROUP BY list contains only sort key columns, one of which is also the distribution key.
The sort key columns in the GROUP BY list must include the first sort key, then other sort keys that you want to use in sort key order.
Option F is incorrect: Subqueries should be used in this case, because they perform well when one table is used only for predicate conditions and the subquery returns a small number of rows.
Reference: https://docs.aws.amazon.com/redshift/latest/dg/c_designing-queries-best-practices.html
Explanation of each option:
A. Use a CASE expression to perform complex aggregations instead of selecting from the same table multiple times (correct): A single query with CASE expressions can compute several conditional aggregates in one pass, so the table is read once rather than once per aggregate. This simplifies the query and reduces the amount of data that needs to be read.
B. Avoid using functions in query predicates (correct): Functions in query predicates can drive up the cost of the query by requiring large numbers of rows to resolve the intermediate steps. Where possible, compare the column directly against constant values or ranges.
C. Avoid using predicates to restrict the dataset (incorrect): This is the opposite of the recommended practice. Predicates should restrict the dataset as much as possible, so that Redshift reads and processes fewer rows.
D. Use a WHERE clause to restrict the dataset (correct): The WHERE clause is where restricting predicates are applied. The query planner can use row order to help determine which records match the criteria, so it can skip scanning large numbers of disk blocks and read only the data the query needs.
E. Use sort keys in the GROUP BY clause to improve aggregations (correct): When the GROUP BY list contains only sort key columns, one of which is also the distribution key, the query might qualify for one-phase aggregation, which is more efficient. The GROUP BY list must include the first sort key, then any other sort keys in sort key order.
F. Do not use subqueries in cases where one table in the query is used only for predicate conditions and the subquery returns a small number of rows (incorrect): The recommended practice is the opposite. Subqueries should be used in exactly these cases, where one table is used only for predicate conditions and the subquery returns a small number of rows.
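The idea behind option A can be sketched with a small, self-contained example. This is only an illustration of the query shape, not a Redshift benchmark: it runs the SQL against an in-memory SQLite database as a stand-in, and the `sales` table, its columns, and the region values are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Redshift (assumption:
# only the SQL shape is illustrated, not Redshift performance).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);  -- hypothetical table
INSERT INTO sales VALUES
  ('east', 10), ('east', 30), ('west', 5), ('west', 20), ('north', 7);
""")

# Multi-pass version: the same table is queried once per aggregate.
multi_pass = tuple(
    conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()[0]
    for region in ("east", "west")
)

# Single-pass version: CASE expressions compute both aggregates in one scan.
single_pass = conn.execute("""
SELECT
  SUM(CASE WHEN region = 'east' THEN amount ELSE 0 END) AS east_total,
  SUM(CASE WHEN region = 'west' THEN amount ELSE 0 END) AS west_total
FROM sales
""").fetchone()

print(multi_pass, single_pass)
```

Both versions return the same totals; the CASE form simply gets them from one read of the table instead of two.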
Overall, the practices in options A, B, D, and E help Redshift read less data and aggregate more efficiently, which improves query performance.
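Options B and D can be illustrated together: a predicate that wraps the column in a function can often be rewritten as a plain range comparison in the WHERE clause. The sketch below again uses in-memory SQLite as a stand-in, with a hypothetical `orders` table and ISO-formatted date strings; `strftime` is SQLite's function, and on Redshift the function-based form would more likely use something like `EXTRACT(year FROM order_date)`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Redshift (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, order_date TEXT);  -- hypothetical table
INSERT INTO orders VALUES
  (1, '2022-12-31'), (2, '2023-01-01'), (3, '2023-06-15'),
  (4, '2023-12-31'), (5, '2024-01-01');
""")

# Function in the predicate: the function must be evaluated for every row
# before the filter can be applied.
with_function = conn.execute("""
SELECT order_id FROM orders
WHERE strftime('%Y', order_date) = '2023'
ORDER BY order_id
""").fetchall()

# Equivalent range predicate on the bare column: the engine can compare
# stored values directly against the two constants.
with_range = conn.execute("""
SELECT order_id FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'
ORDER BY order_id
""").fetchall()

print(with_function, with_range)
```

The two queries select the same rows. The range form leaves the column untouched in the predicate, which is the shape that lets a planner use row order to skip blocks when the column is a sort key.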