Convert long data to wide data

JPowell · 12 September 2023 00:19

Is it possible in Riskscape to convert long data into wide eg

address	ARI	depth
1	    10	0
1	    20	1
1	    30	2
2	    10	10
2	    20	15
2	    30	20

to

address	Depth_ARI10	Depth_ARI20	Depth_ARI30
1	    0	        1	        2
2	    10	        15	        20

timbeale · 12 September 2023 00:38

Sure, you can use the bucket() aggregation function for that, E.g.

input(value: { depth: 15, ARI: 20, address: 'foo' }) ->
group(by: address,
      select: {
        *,
        bucket(by: ARI,
               pick: b -> b = ARI,
               select: {
                    max(depth) as Depth
               },
               buckets: {
                    "10": 10,
                    "20": 20,
                    "30": 30,
               }) as ARI
      })

evaluating that pipeline produces:

address,ARI.10.Depth,ARI.20.Depth,ARI.30.Depth
foo,,15,

JPowell · 12 September 2023 23:54

I have added

-> select({*, sample_one(geometry: geom, coverage: event.coverage) as hazard_sampled})

-> group(by: address_id,
      select: {
        *,
        bucket(by: event.ari,
               pick: b -> b = event.ari,
               select: {
                    max(hazard_sampled) as Depth
               },
               buckets: {
                    "2": 2,
                    "5": 5,
                    "10": 10,
                    "20": 20,
                    "50": 50,
                    "100": 100,
                    "200": 200,
                    "1000": 1000,
               }) as ARI
      })

to my pipeline. I now get a csv in the shape I want but with no data in it

address_id	ARI.2.Depth	ARI.5.Depth	ARI.10.Depth	ARI.20.Depth	ARI.50.Depth	ARI.100.Depth	ARI.200.Depth	ARI.1000.Depth
1709559								
1709558								
1641441								
845209								
845208								
1641442								
1641440								
845210								
1709560								
845216								
1641445								
845215								
1641446								
845218								
1641443								
845217								
1641444								
845212								
845211								
845214								
845213

If I remove the group by step then my output file does have values in the hazard_sampled column

JPowell · 12 September 2023 23:59

Might be relevant that for each ARI each address point will have 2 values, one with a float and one blank eg

address_id	cell	ari	hazard_sampled
1	        10	    2	
1	        30	    2	1.5

timbeale · 13 September 2023 00:29

I presume the event data is being loaded from CSV, in which case event.ari is probably a text string (i.e. '2') rather than an integer value (i.e. 2), so the pick expression won’t find any matches. Try changing the pick expression so that it converts the event.ari value from a text string to an integer, e.g.

pick: b -> b = int(event.ari),

JPowell · 13 September 2023 02:28

Thanks Tim that has worked. I thought it might have been something to do with casting but I was focused on the

select: {
                    max(hazard_sampled) as Depth
               },

part.

JPowell · 13 September 2023 21:58

I ended up grouping by 2 clauses so this was the final code

-> group(by: [address_id,event.cell],
      select: {
        mode(address_id) as Address_ID,
        mode(event.cell) as Cell,
        bucket(by: event.ari,
               pick: b -> b = int(event.ari),
               select: {
                    max(hazard_sampled) as Depth
               },
               buckets: {
                    "2": 2,
                    "5": 5,
                    "10": 10,
                    "20": 20,
                    "50": 50,
                    "100": 100,
                    "200": 200,
                    "1000": 1000,
               }) as "event.ari"
      })

timbeale · 13 September 2023 23:22

Just a note that mode() can be quite memory-intensive, as it collects all the values into a list. It’s probably fine in your case, but could become a memory bottleneck with larger datasets. A better approach is to use something like:

-> group(by: { address_id as Address_ID, event.cell as Cell },
      select: {
        *,
        bucket(by: event.ari,
       ...

Here the * includes all the attributes in the by: clause.

Topic		Replies	Views
Use group step to produce a shapefile with aggregated information - help! Community	2	151	3 October 2023
Riskscape slowing down at the end of a job Community	3	19	29 May 2025
Aggregating outputs Community	1	218	15 November 2022
Average Annualised Loss (old vers) - output being overwritten? Community	11	257	3 July 2023
Modify all values in a hazard geotiff Community	4	8	19 May 2025

Convert long data to wide data

Related topics