Find max value without grouping

JPowell · 12 September 2024 02:42

This pipeline will return the maximum value for each unique group in column ‘A’

-> group(
by: A,
select: {max: {A, max(B)}}
)
-> save(name: 'output')

so

A,B
a,1
a,4
g,3
g,2

will return

max.A,max.max_B
a,4
g,3

I would like to get the maximum value in column ‘B’ but removing the group function

input(relation: 'test.csv')
-> select({max(B)})
-> save(name: 'output')

gives the error

Failed to validate model for execution
  - Failed to validate 'select({max(B)})' step for execution
    - Failed to validate expression '{max(B)}' against input type {A=>Text, B=>Text}
      - Problems found with 'max' riskscape function
        - Wrong number of arguments provided. Expected 2, but got 1

How can I use max with a column of data outside of a group function?

timbeale · 12 September 2024 03:15

Hi John,

Why do you want to get rid of the group step?

Basically the group step aggregates across several rows of data, whereas the select step processes each row of data separately (i.e. it operates on a scalar value). So the select step only ever sees one value for column B at a time.

The slightly confusing thing here is that the max() function can be used both in scalar-mode (i.e. select step) and aggregate-mode (i.e. group step). In the scalar version it expects to compare two numbers and return the higher number, e.g. max(B, C). This is why it’s giving you this error.

If all you want to do here is get the max across all the rows of data, then you could just remove the by clause from the group step, i.e.

input(relation: 'test.csv')
-> group({max(B)})
-> save(name: 'output')

Cheers,
Tim

Topic		Replies	Views
Assign values to strings for sorting Community	1	11	10 March 2025
Access items in a list Community	2	14	6 March 2025
Use group step to produce a shapefile with aggregated information - help! Community	2	151	3 October 2023
Insert output file into the next step of pipeline Community	1	181	14 April 2023
Aggregating outputs Community	1	218	15 November 2022

Find max value without grouping

Related topics