Convert long data to wide data

Is it possible in Riskscape to convert long data into wide eg

address	ARI	depth
1	    10	0
1	    20	1
1	    30	2
2	    10	10
2	    20	15
2	    30	20

to

address	Depth_ARI10	Depth_ARI20	Depth_ARI30
1	    0	        1	        2
2	    10	        15	        20

Sure, you can use the bucket() aggregation function for that, E.g.

input(value: { depth: 15, ARI: 20, address: 'foo' }) ->
group(by: address,
      select: {
        *,
        bucket(by: ARI,
               pick: b -> b = ARI,
               select: {
                    max(depth) as Depth
               },
               buckets: {
                    "10": 10,
                    "20": 20,
                    "30": 30,
               }) as ARI
      })

evaluating that pipeline produces:

address,ARI.10.Depth,ARI.20.Depth,ARI.30.Depth
foo,,15,
1 Like

I have added

-> select({*, sample_one(geometry: geom, coverage: event.coverage) as hazard_sampled})

-> group(by: address_id,
      select: {
        *,
        bucket(by: event.ari,
               pick: b -> b = event.ari,
               select: {
                    max(hazard_sampled) as Depth
               },
               buckets: {
                    "2": 2,
                    "5": 5,
                    "10": 10,
                    "20": 20,
                    "50": 50,
                    "100": 100,
                    "200": 200,
                    "1000": 1000,
               }) as ARI
      })

to my pipeline. I now get a csv in the shape I want but with no data in it

address_id	ARI.2.Depth	ARI.5.Depth	ARI.10.Depth	ARI.20.Depth	ARI.50.Depth	ARI.100.Depth	ARI.200.Depth	ARI.1000.Depth
1709559								
1709558								
1641441								
845209								
845208								
1641442								
1641440								
845210								
1709560								
845216								
1641445								
845215								
1641446								
845218								
1641443								
845217								
1641444								
845212								
845211								
845214								
845213								

If I remove the group by step then my output file does have values in the hazard_sampled column

Might be relevant that for each ARI each address point will have 2 values, one with a float and one blank eg

address_id	cell	ari	hazard_sampled
1	        10	    2	
1	        30	    2	1.5

I presume the event data is being loaded from CSV, in which case event.ari is probably a text string (i.e. '2') rather than an integer value (i.e. 2), so the pick expression won’t find any matches. Try changing the pick expression so that it converts the event.ari value from a text string to an integer, e.g.

pick: b -> b = int(event.ari),

Thanks Tim that has worked. I thought it might have been something to do with casting but I was focused on the

select: {
                    max(hazard_sampled) as Depth
               },

part.

I ended up grouping by 2 clauses so this was the final code

-> group(by: [address_id,event.cell],
      select: {
        mode(address_id) as Address_ID,
        mode(event.cell) as Cell,
        bucket(by: event.ari,
               pick: b -> b = int(event.ari),
               select: {
                    max(hazard_sampled) as Depth
               },
               buckets: {
                    "2": 2,
                    "5": 5,
                    "10": 10,
                    "20": 20,
                    "50": 50,
                    "100": 100,
                    "200": 200,
                    "1000": 1000,
               }) as "event.ari"
      })

Just a note that mode() can be quite memory-intensive, as it collects all the values into a list. It’s probably fine in your case, but could become a memory bottleneck with larger datasets. A better approach is to use something like:

-> group(by: { address_id as Address_ID, event.cell as Cell },
      select: {
        *,
        bucket(by: event.ari,
       ...

Here the * includes all the attributes in the by: clause.