My work often require that I analyze flat files to understand the data, relationships, cardinality, the unique keys etc. To do this effectively, I always:
If running through Teradata Studio Express, it will prompt you for the following:
dbname - the database the table is in
tname - the name of the table
The output will be two data sets:
- Load the data into a relational DB so that I can run queries and test theories.
- Profile the data to get a sense of the the likely values, the frequency of null, etc.
If running through Teradata Studio Express, it will prompt you for the following:
dbname - the database the table is in
tname - the name of the table
The output will be two data sets:
- One row per column containing the following fields:
- table_name - name of the table
- seq - sequence of the column in the table, this does not start with 1, but it is sequential
- column_name - name of the column
- #_of_records - the number of rows in the table
- distinct_values - the number of distinct values in the table for this column
- min_value - the smallest value in the table for this column
- max_value - the largest value in the table for this column
- min_length - the size of the value with the smallest size
- max_length - the size of the value with the largest size
- num_nulls - the number of nulls in the table for this column
- num_emptys - the number of empty strings in the table for this column
- Up to 100 rows per column containing the following fields for the top occurring values:
- table_name - name of the table
- seq - sequence of the column in the table, this does not start with 1, but it is sequential
- column_name - name of the column
- occurrences - the number of times this value occurs in the data
- val - the value of the column
SELECT 'select '''|| trim(TableName) || ''' table_name,
cast('''||ColumnId||''' as integer) as seq,
cast('''||TRIM(ColumnName)||''' as varchar(100)) column_name,
count(*) as #_OF_RECORDS,
count(distinct "'||TRIM(ColumnName)||'") as DISTINCT_VALUES,
cast(min("'||TRIM(ColumnName)||'") as varchar(255)) as MIN_VALUE,
cast(max("'||TRIM(ColumnName)||'") as varchar(255)) as MAX_VALUE,
min(character_length(cast("'||TRIM(ColumnName)||'" as varchar(2000)))) as MIN_LENGTH,
max(character_length(cast("'||TRIM(ColumnName)||'" as varchar(2000)))) as MAX_LENGTH,
sum(case when "'||TRIM(ColumnName)||'" IS NULL then 1 else 0 end) AS NUM_NULLS,
sum(case when TRIM(cast("' || Trim(ColumnName) || '" as
varchar(255))) = '''' then 1 else 0 end) AS NUM_EMPTYS
from ' || trim(DatabaseName) || '.' || trim(TableName) ||'
union all' q
FROM DBC.COLUMNS
WHERE DatabaseName = ?\dbname
AND TableName = ?\tname;
SELECT 'select * from (select top 100 '''|| trim(TableName) || ''' table_name,
cast('''||ColumnId||''' as integer) as seq,
cast('''||TRIM(ColumnName)||''' as varchar(100)) column_name,
count("' ||TRIM(ColumnName)|| '") occurences,' ||
case when columnType = 'CV' and columnLength > 100 then
'"' || TRIM(ColumnName) || '"' || ' val' else
'cast("' ||TRIM(ColumnName)|| '" as varchar(100)) val' end || '
from ' || trim(DatabaseName) || '.' || trim(TableName) || '
group by 5 order by 4 desc) "' || trim(ColumnName) || '"
union all' q
FROM DBC.COLUMNS
WHERE DatabaseName = ?\dbname
AND TableName = ?\tname;