2  Vector Data

2.1 sf package

Maintained by Pebesma (2018), sf is the main package used in R to manipulate data with geographical attributes. It allows to encode spatial vector data in computers and make the link between geometries and their properties.

sf stands for Simple Features : feature geometry refers to the spatial properties (location or extent) of a feature, and can be described by a point, a point set, a linestring, a set of linestrings, a polygon, a set of polygons, or a combination of these. The simple adjective refers to the property that linestrings and polygons are built from points connected by straight line segments.

© Allison Horst, 2018

Main features:

  • import / export
  • display
  • geoprocessing
  • support for unprojected data (on the globe)
  • use of the simple feature standard
  • compatibility with the pipe operators (|> or %>%)
  • compatibility with tidyverse operators.

2.2 Resources

TipExercice

Install the sf package.

2.3 sf format

The three classes used to represent simple features are:

  • sf, the table (data.frame) with feature attributes and feature geometries, which contains
  • sfc, the list-column with the geometries for each feature (record), which is composed of a crs (coordinates reference system), and
  • sfg, the feature geometry of an individual simple feature.

We see:

  • in green a simple feature: a single record, or data.frame row, consisting of attributes and geometry
  • in blue a single simple feature geometry (an object of class sfg)
  • in red a simple feature list-column (an object of class sfc, which is a column in the data.frame) that although geometries are native R objects, they are printed as well-known text.
NoteWell-known text

There are two serialisation standards for writing geometry of an object : well-known-text (WKT) and well-known-binary (WKB). When written as well-known text, the localisation of an object is defined by the coordinates of its points (XY), in a counterclockwise direction.

Source : Well-known text representation of geometry, Wikipedia (2026)

2.4 Import and export

The st_read() and st_write() functions can be used to import and export a wide range of file types.

2.4.1 Import

The st_read() function imports geographic layers in sf format.
The following lines import the layer of municipalities (communes) of the Lot department provided in the geopackage file lot.gpkg.

library(sf)
mun <- st_read("data/lot.gpkg", layer = "communes")
#> Reading layer `communes' from data source 
#>   `C:\Users\Louis\Desktop\cours\cartography_with_r\data\lot.gpkg' 
#>   using driver `GPKG'
#> Simple feature collection with 313 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 539668.5 ymin: 6346290 xmax: 637380.9 ymax: 6439668
#> Projected CRS: RGF93 v1 / Lambert-93
class(mun)
#> [1] "sf"         "data.frame"
NoteGeopackage

The geopackage format lets you store multiple layers in a single file.
The st_layers() function provides an overview of the layers present in a geopackage file.

st_layers("data/lot.gpkg")
#> Driver: GPKG 
#> Available layers:
#>     layer_name geometry_type features fields              crs_name
#> 1     communes Multi Polygon      313     12 RGF93 v1 / Lambert-93
#> 2 departements Multi Polygon       96      5 RGF93 v1 / Lambert-93
#> 3  restaurants         Point      694      2 RGF93 v1 / Lambert-93
#> 4   elevations         Point     5228      1 RGF93 v1 / Lambert-93
#> 5       routes   Line String     1054      2 RGF93 v1 / Lambert-93

2.5 Export

The following lines export the mun object to the communes layer of the com.gpkg geopackage in the data folder.

st_write(obj = mun, dsn = "data/com.gpkg", layer = "communes")
#> Writing layer `communes' to data source `data/com.gpkg' using driver `GPKG'
#> Writing 313 features with 12 fields and geometry type Multi Polygon.
TipExercice

Import restaurants layer in an object called resto.

2.6 Explore and display

2.6.1 Overview of variables

The sf objects are data.frame. We can use the head() or summary() functions.

mun <- st_read("data/lot.gpkg", layer = "communes", quiet = TRUE)
head(mun, n = 3)
#> Simple feature collection with 3 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 557759.2 ymin: 6371852 xmax: 607179 ymax: 6418606
#> Projected CRS: RGF93 v1 / Lambert-93
#>   INSEE_COM  NOM_COM         STATUT POPULATION     AGR_H    AGR_F     IND_H
#> 1     46001    Albas Commune simple        522  4.978581 0.000000  4.936153
#> 2     46002   Albiac Commune simple         67  0.000000 9.589041  0.000000
#> 3     46003 Alvignac Commune simple        706 10.419682 0.000000 10.419682
#>      IND_F     BTP_H BTP_F     TER_H     TER_F                           geom
#> 1 0.000000  9.957527     0 44.917145 34.681799 MULTIPOLYGON (((559262 6371...
#> 2 0.000000  4.794521     0  4.794521  9.589041 MULTIPOLYGON (((605540.7 64...
#> 3 5.209841 10.419682     0 57.308249 78.147612 MULTIPOLYGON (((593707.7 64...

2.6.2 Plot

Data overview with plot(). This function will display geometries for each column of the sf object with colors associated, according to the column type.

plot(mun)
#> Warning: plotting the first 9 out of 12 attributes; use max.plot = 12 to plot
#> all

Display a single variable:

plot(mun["POPULATION"])

Display geometry only:

plot(st_geometry(mun), col = "ivory4", border = "ivory")

You can also use the mapsf package (Giraud 2026) to display sf objects.

library(mapsf)
mf_map(mun, col = "ivory4", border = "ivory")

You can remove the geometry column of an sf object. The object becomes then a data frame.

mun_df <- st_set_geometry(mun, NULL)
mun_df <- st_drop_geometry(mun) # same
head(mun_df)
#>   INSEE_COM         NOM_COM         STATUT POPULATION     AGR_H    AGR_F
#> 1     46001           Albas Commune simple        522  4.978581 0.000000
#> 2     46002          Albiac Commune simple         67  0.000000 9.589041
#> 3     46003        Alvignac Commune simple        706 10.419682 0.000000
#> 4     46004         Anglars Commune simple        219  0.000000 0.000000
#> 5     46005 Anglars-Juillac Commune simple        329  4.894895 4.894895
#> 6     46006   Anglars-Nozac Commune simple        377  4.840849 0.000000
#>       IND_H     IND_F     BTP_H    BTP_F     TER_H     TER_F
#> 1  4.936153  0.000000  9.957527 0.000000 44.917145 34.681799
#> 2  0.000000  0.000000  4.794521 0.000000  4.794521  9.589041
#> 3 10.419682  5.209841 10.419682 0.000000 57.308249 78.147612
#> 4 20.000000 15.000000 10.000000 0.000000 20.000000 20.000000
#> 5  4.894895  0.000000  0.000000 0.000000 29.369369 29.369369
#> 6  0.000000  0.000000  9.681698 4.840849 43.567639 38.726790
TipExercice
  1. Plot municpalities with sf. The filling color must be blue and borders must be thick.
  2. Plot municipalities with mapsf. The filling color must be lightgreen and borders must be white and thin.
  3. Add the restaurants to the municpalities map.
  4. How many municipalities are there?
  5. Which commune has the largest population?

Hint: Write ?mf_map() to check different arguments of the function

2.7 Coordinate Reference Systems (CRS)

The coordinate reference system allows to give specific coordinates for each place of the earth. For mapping object at planet scale, we use WGS 84 (World Geodetic System 1984, EPSG:4326), that describes the objects with X/Y coordinates according to their angle from the center of the earth. For example, the top of the Kilimanjaro is located on coordinates -3.067 (latitude), 37.355 (longitude), based respectively on the equator and the Greenwich meridian.

When we display a sphere to a flat surface, it inevitably includes deformations. Some projections prioritize the shapes over the distances, and vice versa. That is why we use local ellipsoids, in order to obtain precise coordinates for a small region. The coordinates are then expressed in distance to a reference point, still in two directions (north/south and east/west). For the case study of France, we use the Lambert 93 coordinate reference system (EPSG:2154).

2.7.1 Inspect the CRS of an sf object

The st_crs() function lets you inspect the CRS used by an sf object.

st_crs(x = mun)
#> Coordinate Reference System:
#>   User input: RGF93 v1 / Lambert-93 
#>   wkt:
#> PROJCRS["RGF93 v1 / Lambert-93",
#>     BASEGEOGCRS["RGF93 v1",
#>         DATUM["Reseau Geodesique Francais 1993 v1",
#>             ELLIPSOID["GRS 1980",6378137,298.257222101,
#>                 LENGTHUNIT["metre",1]]],
#>         PRIMEM["Greenwich",0,
#>             ANGLEUNIT["degree",0.0174532925199433]],
#>         ID["EPSG",4171]],
#>     CONVERSION["Lambert-93",
#>         METHOD["Lambert Conic Conformal (2SP)",
#>             ID["EPSG",9802]],
#>         PARAMETER["Latitude of false origin",46.5,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8821]],
#>         PARAMETER["Longitude of false origin",3,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8822]],
#>         PARAMETER["Latitude of 1st standard parallel",49,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8823]],
#>         PARAMETER["Latitude of 2nd standard parallel",44,
#>             ANGLEUNIT["degree",0.0174532925199433],
#>             ID["EPSG",8824]],
#>         PARAMETER["Easting at false origin",700000,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8826]],
#>         PARAMETER["Northing at false origin",6600000,
#>             LENGTHUNIT["metre",1],
#>             ID["EPSG",8827]]],
#>     CS[Cartesian,2],
#>         AXIS["easting (X)",east,
#>             ORDER[1],
#>             LENGTHUNIT["metre",1]],
#>         AXIS["northing (Y)",north,
#>             ORDER[2],
#>             LENGTHUNIT["metre",1]],
#>     USAGE[
#>         SCOPE["Engineering survey, topographic mapping."],
#>         AREA["France - onshore and offshore, mainland and Corsica (France métropolitaine including Corsica)."],
#>         BBOX[41.15,-9.86,51.56,10.38]],
#>     ID["EPSG",2154]]

2.7.2 Transform the CRS of an sf object

The st_transform() function can be used to change the coordinate system of an sf object and reproject it.

mf_map(mun, expandBB = c(0, .12, 0, 0))
mf_graticule(x = mun)
mf_title("RGF93 / Lambert-93")
# CRS change
mun_reproj <- st_transform(x = mun, crs = "EPSG:3035")
mf_map(mun_reproj, expandBB = c(0, .12, .0, 0))
mf_graticule(x = mun_reproj)
mf_title("ETRS89-extended / LAEA Europe")

2.8 Attribute selection and join

2.8.1 Selection by attributes

As sf objects are data.frames, we can select their rows and columns in the same way as data.frames.

# row selection
mun[1:2, ]
#> Simple feature collection with 2 features and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 557759.2 ymin: 6371852 xmax: 607179 ymax: 6410204
#> Projected CRS: RGF93 v1 / Lambert-93
#>   INSEE_COM NOM_COM         STATUT POPULATION    AGR_H    AGR_F    IND_H IND_F
#> 1     46001   Albas Commune simple        522 4.978581 0.000000 4.936153     0
#> 2     46002  Albiac Commune simple         67 0.000000 9.589041 0.000000     0
#>      BTP_H BTP_F     TER_H     TER_F                           geom
#> 1 9.957527     0 44.917145 34.681799 MULTIPOLYGON (((559262 6371...
#> 2 4.794521     0  4.794521  9.589041 MULTIPOLYGON (((605540.7 64...
mun[mun$NOM_COM == "Gramat", ]
#> Simple feature collection with 1 feature and 12 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 593605.6 ymin: 6402330 xmax: 602624.6 ymax: 6413784
#> Projected CRS: RGF93 v1 / Lambert-93
#>     INSEE_COM NOM_COM         STATUT POPULATION    AGR_H    AGR_F    IND_H
#> 119     46128  Gramat Commune simple       3468 10.19868 15.29802 122.3842
#>        IND_F    BTP_H BTP_F    TER_H    TER_F                           geom
#> 119 107.0862 56.09275     0 260.0664 304.1941 MULTIPOLYGON (((594713.1 64...
# column selection
mun[, "POPULATION"]
#> Simple feature collection with 313 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 539668.5 ymin: 6346290 xmax: 637380.9 ymax: 6439668
#> Projected CRS: RGF93 v1 / Lambert-93
#> First 10 features:
#>    POPULATION                           geom
#> 1         522 MULTIPOLYGON (((559262 6371...
#> 2          67 MULTIPOLYGON (((605540.7 64...
#> 3         706 MULTIPOLYGON (((593707.7 64...
#> 4         219 MULTIPOLYGON (((613211.3 64...
#> 5         329 MULTIPOLYGON (((556744.9 63...
#> 6         377 MULTIPOLYGON (((576667.2 64...
#> 7         988 MULTIPOLYGON (((581404 6370...
#> 8         203 MULTIPOLYGON (((558216 6389...
#> 9         642 MULTIPOLYGON (((612729.6 63...
#> 10        367 MULTIPOLYGON (((581404 6370...
mun[mun$NOM_COM == "Gramat", 1:4]
#> Simple feature collection with 1 feature and 4 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 593605.6 ymin: 6402330 xmax: 602624.6 ymax: 6413784
#> Projected CRS: RGF93 v1 / Lambert-93
#>     INSEE_COM NOM_COM         STATUT POPULATION                           geom
#> 119     46128  Gramat Commune simple       3468 MULTIPOLYGON (((594713.1 64...

2.8.2 Attribute join

We can join a data.frame and an sf object using merge() if they share a common identifier.
Be careful with the order of the arguments, as the object returned will be of the same type as x. It is not possible to make an attribute join using two sf objects.

# import additional data 
mun_df <- read.csv(file = "data/com.csv")

# common identifiers?
names(mun_df)
#> [1] "INSEE_COM" "ACT"       "IND"       "SACT"      "SACT_IND"
names(mun)
#>  [1] "INSEE_COM"  "NOM_COM"    "STATUT"     "POPULATION" "AGR_H"     
#>  [6] "AGR_F"      "IND_H"      "IND_F"      "BTP_H"      "BTP_F"     
#> [11] "TER_H"      "TER_F"      "geom"
# attribute join 
mun_final <- merge(
  x = mun, # the sf object
  y = mun_df, # the data.frame
  by.x = "INSEE_COM", # identifier in x
  by.y = "INSEE_COM", # identifier in y
  all.x = TRUE # keep all lines
)

# the two objects have been joined
head(mun_final, 3)
#> Simple feature collection with 3 features and 16 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 557759.2 ymin: 6371852 xmax: 607179 ymax: 6418606
#> Projected CRS: RGF93 v1 / Lambert-93
#>   INSEE_COM  NOM_COM         STATUT POPULATION     AGR_H    AGR_F     IND_H
#> 1     46001    Albas Commune simple        522  4.978581 0.000000  4.936153
#> 2     46002   Albiac Commune simple         67  0.000000 9.589041  0.000000
#> 3     46003 Alvignac Commune simple        706 10.419682 0.000000 10.419682
#>      IND_F     BTP_H BTP_F     TER_H     TER_F       ACT       IND     SACT
#> 1 0.000000  9.957527     0 44.917145 34.681799  99.47120  4.936153 19.05579
#> 2 0.000000  4.794521     0  4.794521  9.589041  28.76712  0.000000 42.93600
#> 3 5.209841 10.419682     0 57.308249 78.147612 171.92475 15.629522 24.35195
#>   SACT_IND                           geom
#> 1 4.962393 MULTIPOLYGON (((559262 6371...
#> 2 0.000000 MULTIPOLYGON (((605540.7 64...
#> 3 9.090909 MULTIPOLYGON (((593707.7 64...
TipExercice
  1. Import the com.csv file. This dataset covers all municipalities and includes several additional variables:
    • working population (ACT)
    • working population in the industrial sector (IND)
    • working population as a percentage of total population (SACT)
    • share of the working population working in the industrial sector (SACT_IND)
  2. Join this dataset to the municipality layer.
  3. Extract municipalities with more than 500 active workers and whose share of active workers in the total population is greater than 30%.
  4. Plot a map with all municipalities in gray and selected municipalities in red.