Setting up a reproducible data analysis project in R

featuring GitHub, {renv}, {targets} and more

Olivia Angelin-Bonnet

The New Zealand Institute for Plant and Food Research

20 August 2024

The project

A typical data science project would include:

  • cleaning the data
  • creating some visualisations for EDA
  • fitting a model
  • generating a report for the stakeholders  

We will analyse the palmerpenguins dataset: size measurements for three penguin species observed in the Palmer Archipelago (Antarctica)

Artwork by @allison_horst

Outline

Project setup

  • Creating a GitHub repository

  • Turning it into an R project

  • Initialising {renv} for the project

  • Setting up the directory structure

  • Populating the README

Tidying code

  • Modularising code into functions

  • Turning code into a {targets} pipeline

Bonus

  • Setting up testing of input data with {assertr}

  • Automating report rendering with Quarto + {targets}

Let’s get started!

Creating a new GitHub repository

Many ways to do this!

  • create a new repository online, from github.com

  • create a new folder locally, then push it online

  • use a GitHub GUI (e.g. GitHub Desktop)

 

Reference

For more information, refer to Happy Git and GitHub for the useR (chapters 14 to 16).

Creating a new GitHub repository

Cloning the repository

If you created the repository online, you need to clone it to your machine – again, several ways to do so.

Need to get our git credentials sorted!

Using the usethis::create_from_github() function:

  • clones the repository to a directory of your choice

  • creates an RStudio project in the cloned directory

  • updates the .gitignore file

  • opens the RStudio project in a new RStudio session
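
For example, a minimal call could look like this (a sketch – the repository owner and destination directory below are placeholders):

# if Git credentials are not sorted yet, gitcreds::gitcreds_set() can help
usethis::create_from_github(
  "your-username/palmerpenguins_analysis",
  destdir = "~/Desktop/GitHub"
)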

Cloning the repository

Initialising {renv}

The {renv} package helps to create reproducible environments for R projects:

  • records which packages and which versions we are using in the project

  • isolates the package library used in the project from the rest of the computer

 

Set up {renv} in our directory by running renv::init():

  • creates a renv.lock file that records all dependencies (needs to be tracked on GitHub)

  • creates a renv/ folder, inside which packages used in the project will be installed (will not be tracked on GitHub)

Initialising {renv}

Using {renv} in four commands

  • To install a package: renv::install("tidyverse")

  • Before every commit:

    • check whether lock file is up-to-date: renv::status()

    • if not, record the new dependencies: renv::snapshot()

  • To install dependencies recorded in the lockfile: renv::restore()
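
Put together, a typical {renv} session could look like this (a minimal sketch; the package name is an example):

# install a package into the project library
renv::install("tidyverse")

# before committing: check whether renv.lock is up to date...
renv::status()

# ...and record any new dependencies
renv::snapshot()

# on a fresh clone of the repository: reinstall the recorded package versions
renv::restore()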

Setting up directory structure

Always follow the same organisation – it makes it easy for people to understand and find things:

  • data/: where the data necessary for the project will live (if using local version of data)
  • analysis/: where to put the analysis scripts
  • R/: where to put scripts with helper functions
  • output/: where to save generated tables, figures, etc.
  • reports/: where to put the files that will be used to generate reports
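
These folders can be created in one go from the console (a sketch using the {fs} package; base R's dir.create() works just as well):

fs::dir_create(c("data", "analysis", "R", "output", "reports"))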

References

For more information about organising your R project folders, see Marwick et al. (2017) “Packaging Data Analytical Work Reproducibly Using R (and Friends)” and the {rrtools} package.

Setting up directory structure

After creating the folders, here is what our directory looks like:

.
├── .git/
│   └── ...
├── .Rproj.user/
│   └── ...
├── renv/
│   └── ...
├── data/
│   └── penguins_raw.csv
├── analysis/
├── R/
├── reports/
├── output/
├── README.md
├── renv.lock
├── palmerpenguins_analysis.Rproj
├── .Rprofile
└── .gitignore

Populating the README

README should contain a minimal set of information about the project:

  • a brief description of the project: context, overall aim, some key information about the experiment and the analyses intended/performed

  • a list of key contributors and their roles, as bullet points

  • information about the input data: where it is stored, who provided it, what it contains

  • what analyses were performed, and which files were generated as a result (e.g. cleaned version of the data)

  • a list of the key folders and files

  • which sets of commands to run if you want to reproduce the analysis

Populating the README

README.md
# Analysis of penguins measurements

This project aims to understand the differences in size between three species of penguins (Adelie,
Chinstrap and Gentoo) observed in the Palmer Archipelago, Antarctica. The data was collected by Dr Kristen
Gorman between 2007 and 2009.

## Key contributors

- Jane Doe: project leader

- Kristen Gorman: data collection

- Olivia Angelin-Bonnet: data analysis

## Input data

The raw penguin measurements, stored in the `penguins_raw.csv` file, were downloaded from the 
[`palmerpenguins` package](https://allisonhorst.github.io/palmerpenguins/index.html).

Data source: [...]

## Analysis

- Data cleaning: a cleaned version of the dataset was saved as `penguins_cleaned.csv` on
[OneDrive](path/to/OneDrive/folder).

- *To be filled*

## Repository content

- `renv.lock`: list of packages used in the analysis (and their version)
- `analysis/`: collection of `R` scripts containing the analysis code
- `R/`: collection of `R` scripts containing helper functions used for the analysis
- `reports/`: collection of `Quarto` documents used to generate the reports

## How to reproduce the analysis

```{r}
# Install the necessary packages
renv::restore()
```

Commit and push

Start coding!

I have written some code…

analysis/first_script.R
library(tidyverse)
library(janitor)
library(ggbeeswarm)
library(here)

# Reading and cleaning data
penguins_df <- read_csv(here("data/penguins_raw.csv"), show_col_types = FALSE) |> 
  clean_names() |> 
  mutate(
    species = word(species, 1),
    year = year(date_egg),
    sex = str_to_lower(sex),
    year = as.integer(year),
    body_mass_g = as.integer(body_mass_g),
    across(where(is.character), as.factor)
  ) |> 
  select(
    species,
    island,
    year,
    sex,
    body_mass_g,
    bill_length_mm = culmen_length_mm,
    bill_depth_mm = culmen_depth_mm,
    flipper_length_mm
  ) |> 
  drop_na()

## Violin plot of body mass per species and sex
penguins_df |> 
  ggplot(aes(x = species, colour = sex, fill = sex, y = body_mass_g)) +
  geom_violin(alpha = 0.3, scale = "width") +
  geom_quasirandom(dodge.width = 0.9) +
  scale_colour_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()

## Violin plot of flipper length per species and sex
penguins_df |> 
  ggplot(aes(x = species, colour = sex, fill = sex, y = flipper_length_mm)) +
  geom_violin(alpha = 0.3, scale = "width") +
  geom_quasirandom(dodge.width = 0.9) +
  scale_colour_brewer(palette = "Set1") +
  scale_fill_brewer(palette = "Set1") +
  theme_minimal()

## Scatter plot of bill length vs depth, with species and sex
penguins_df |> 
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, colour = species, shape = sex)) +
  geom_point() +
  scale_colour_brewer(palette = "Set2") +
  theme_minimal()

I have written some code…

Good start, but often:

  • many more steps in the analysis → script becomes long and convoluted

  • harder to get an overview of the analysis, and to find things

  • don’t want to re-run all the code every time I make a change

Solution

Turn the code into a {targets} pipeline!

Step 1: Turn your code into functions

From:

# Reading and cleaning data
penguins_df <- read_csv(
  here("data/penguins_raw.csv"), 
  show_col_types = FALSE
) |> 
  clean_names() |> 
  mutate(
    species = word(species, 1),
    year = year(date_egg),
    sex = str_to_lower(sex),
    year = as.integer(year),
    body_mass_g = as.integer(body_mass_g),
    across(where(is.character), as.factor)
  ) |> 
  select(
    ## all relevant columns
  ) |> 
  drop_na()

To:

read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
  janitor::clean_names() |> 
  dplyr::mutate(
    species = stringr::word(species, 1),
    year = lubridate::year(date_egg),
    sex = stringr::str_to_lower(sex),
    year = as.integer(year),
    body_mass_g = as.integer(body_mass_g),
    dplyr::across(
      dplyr::where(is.character), 
      as.factor
    )
  ) |> 
  dplyr::select(
    ## all relevant columns
  ) |> 
  tidyr::drop_na()
}

Step 1: Turn your code into functions

Don’t forget to document your functions! ({roxygen2}-style)

#' Read and clean data
#' 
#' Reads in the penguins data, renames and selects relevant columns. The
#' following transformations are applied to the data: 
#' * only keep species common name
#' * extract observation year
#' * remove rows with missing values
#' 
#' @param file Character, path to the penguins data .csv file.
#' @returns A tibble.
read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
  janitor::clean_names() |> 
  dplyr::mutate(
    ## modifying columns
  ) |> 
  dplyr::select(
    ## all relevant columns
  ) |> 
  tidyr::drop_na()
}

Step 1: Turn your code into functions

Improved script:

R/helper_functions.R
#' Read and clean data
#' 
#' ...
read_data <- function(file) { ... }

#' Violin plot of variable per species and sex
#' 
#' ...
violin_plot <- function(df, yvar) { ... }

#' Scatter plot of bill length vs depth
#' 
#' ...
plot_bill_length_depth <- function(df) { ... }
analysis/first_script.R
library(here)

source(here("R/helper_functions.R"))

penguins_df <- read_data(
  here("data/penguins_raw.csv")
)

body_mass_plot <- violin_plot(
  penguins_df, 
  body_mass_g
)

flipper_length_plot <- violin_plot(
  penguins_df, 
  flipper_length_mm
)

bill_scatterplot <- plot_bill_length_depth(
  penguins_df
)
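
As an aside, the elided violin_plot() could look like this, using {{ }} to pass the column name into ggplot2::aes() (a sketch consistent with the plotting code shown earlier):

violin_plot <- function(df, yvar) {
  df |> 
    ggplot2::ggplot(
      ggplot2::aes(x = species, colour = sex, fill = sex, y = {{ yvar }})
    ) +
    ggplot2::geom_violin(alpha = 0.3, scale = "width") +
    ggbeeswarm::geom_quasirandom(dodge.width = 0.9) +
    ggplot2::scale_colour_brewer(palette = "Set1") +
    ggplot2::scale_fill_brewer(palette = "Set1") +
    ggplot2::theme_minimal()
}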

Step 2: Turn your main script into a {targets} pipeline!

From:

library(here)

source(here("R/helper_functions.R"))

penguins_df <- read_data(here("data/penguins_raw.csv"))

body_mass_plot <- violin_plot(penguins_df, body_mass_g)

flipper_length_plot <- violin_plot(penguins_df, flipper_length_mm)

bill_scatterplot <- plot_bill_length_depth(penguins_df)

Step 2: Turn your main script into a {targets} pipeline!

To:

library(targets)
library(here)

source(here("R/helper_functions.R"))

list(
  tar_target(penguins_raw_file, here("data/penguins_raw.csv"), format = "file"),
  
  tar_target(penguins_df, read_data(penguins_raw_file)),

  tar_target(body_mass_plot, violin_plot(penguins_df, body_mass_g)),

  tar_target(flipper_length_plot, violin_plot(penguins_df, flipper_length_mm)),

  tar_target(bill_scatterplot, plot_bill_length_depth(penguins_df))
)
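
Before running anything, tar_manifest() (part of {targets}) gives a quick overview of the targets defined in the pipeline – run it in the console:

targets::tar_manifest()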

Step 2: Turn your main script into a {targets} pipeline!

Aside: where should my targets script live?

  • by default, the pipeline script lives in a _targets.R file in the main directory

  • to choose a custom folder and file name, we need to set the {targets} configuration

 

In the console, run (from the main directory):

targets::tar_config_set(script = "analysis/_targets.R", store = "analysis/_targets")

This will create a _targets.yaml file:

_targets.yaml
main:
  script: analysis/_targets.R
  store: analysis/_targets
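
The active configuration can be double-checked from the console with tar_config_get():

targets::tar_config_get("script")
targets::tar_config_get("store")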

Visualise your pipeline

In the console, run:

targets::tar_visnetwork()

Execute your pipeline

In the console, run:

targets::tar_make()

here() starts at C:/Users/hrpoab/Desktop/GitHub/palmerpenguins_analysis
> dispatched target penguins_raw_file
o completed target penguins_raw_file [0 seconds]
> dispatched target penguins_df
o completed target penguins_df [0.85 seconds]
> dispatched target body_mass_plot
o completed target body_mass_plot [0.16 seconds]
> dispatched target bill_scatterplot
o completed target bill_scatterplot [0.02 seconds]
> dispatched target flipper_length_plot
o completed target flipper_length_plot [0.02 seconds]
> ended pipeline [1.31 seconds]

Get the pipeline results

targets::tar_read(bill_scatterplot)
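
tar_read() returns the stored object, so it can be used like any other R object – for example, to save the figure into the output/ folder (a sketch; the file name and dimensions are arbitrary):

ggplot2::ggsave(
  here::here("output/bill_scatterplot.png"),
  plot = targets::tar_read(bill_scatterplot),
  width = 7,
  height = 5
)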

Change in a step

Hi Olivia,

Great work! Just a minor comment, could you change the colours in the bill length/depth scatter-plot? It’s hard to see the difference between the species.

 

R/helper_functions.R
plot_bill_length_depth <- function(df) {
  df |> 
    ggplot2::ggplot(
      ggplot2::aes(
        x = bill_length_mm, 
        y = bill_depth_mm, 
        colour = species, 
        shape = sex
        )
    ) +
    ggplot2::geom_point() +
    ggplot2::scale_colour_brewer(palette = "Set2") +
    ggplot2::theme_minimal()
}

Change in a step

Hi Olivia,

Great work! Just a minor comment, could you change the colours in the bill length/depth scatter-plot? It’s hard to see the difference between the species.

 

R/helper_functions.R
plot_bill_length_depth <- function(df) {
  df |> 
    ggplot2::ggplot(
      ggplot2::aes(
        x = bill_length_mm, 
        y = bill_depth_mm, 
        colour = species, 
        shape = sex
        )
    ) +
    ggplot2::geom_point() +
    ggplot2::scale_colour_brewer(palette = "Set1") +
    ggplot2::theme_minimal()
}

Change in a step

targets::tar_visnetwork()

Change in a step

targets::tar_make()

here() starts at C:/Users/hrpoab/Desktop/GitHub/palmerpenguins_analysis
v skipped target penguins_raw_file
v skipped target penguins_df
v skipped target body_mass_plot
> dispatched target bill_scatterplot
o completed target bill_scatterplot [0.35 seconds]
v skipped target flipper_length_plot
> ended pipeline [0.53 seconds]

Change in a step

targets::tar_read(bill_scatterplot)

Change in the data

Hi Olivia,

Oopsie! We realised there was a mistake in the original data file. Here is the updated spreadsheet, could you re-run the analysis with this version?

targets::tar_visnetwork()

Change in the data – using {assertr} for data checking

Problem: a change in the data format (column names or types, range of values, etc.) might cause the pipeline to fail

Solution

Use {assertr} to check that assumptions about the data are correct!

Using {assertr} for data checking

 

Adding the following checks:

read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
    janitor::clean_names() |> 
    ## data cleaning code
}

Using {assertr} for data checking

 

Adding the following checks:

  • Are all necessary columns present in the dataset?
read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
    janitor::clean_names() |> 
    assertr::verify(
      assertr::has_all_names(
        "species", "island", "date_egg", "sex",
        "body_mass_g", "culmen_length_mm",
        "culmen_depth_mm", "flipper_length_mm"
      )
    ) |>
    ## data cleaning code
}

Using {assertr} for data checking

 

Adding the following checks:

  • Are all necessary columns present in the dataset?

  • Is the body mass already an integer?

read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
    janitor::clean_names() |> 
    assertr::verify(
      assertr::has_all_names(
        "species", "island", "date_egg", "sex",
        "body_mass_g", "culmen_length_mm",
        "culmen_depth_mm", "flipper_length_mm"
      )
    ) |>
    assertr::assert(rlang::is_integerish, body_mass_g) |>
    ## data cleaning code
}

Using {assertr} for data checking

 

Adding the following checks:

  • Are all necessary columns present in the dataset?

  • Is the body mass already an integer?

  • Are all flipper length values positive (after removing NAs)?

read_data <- function(file) {
  readr::read_csv(file, show_col_types = FALSE) |> 
    janitor::clean_names() |> 
    assertr::verify(
      assertr::has_all_names(
        "species", "island", "date_egg", "sex",
        "body_mass_g", "culmen_length_mm",
        "culmen_depth_mm", "flipper_length_mm"
      )
    ) |>
    assertr::assert(rlang::is_integerish, body_mass_g) |>
    ## data cleaning code
    assertr::verify(flipper_length_mm > 0)
}
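
The last check could also be written with one of {assertr}'s built-in predicates – note that within_bounds() includes its bounds by default, so the lower bound is excluded explicitly here (a sketch):

assertr::assert(
  assertr::within_bounds(0, Inf, include.lower = FALSE),
  flipper_length_mm
)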

Note

More on data checking in Danielle Navarro’s blog post on Four ways to write assertion checks in R.

Writing a report with Quarto

reports/palmerpenguins_report.qmd
---
title: "Analysis of penguins measurements from the palmerpenguins dataset"
author: "Olivia Angelin-Bonnet"
date: today
format:
  docx:
    number-sections: true
---

```{r setup}
#| include: false

library(knitr)

opts_chunk$set(echo = FALSE)
```

This project aims to understand the differences in size between three species of penguins (Adelie, Chinstrap and Gentoo) observed in the Palmer Archipelago, Antarctica, using data collected by Dr Kristen Gorman between 2007 and 2009.

## Distribution of body mass and flipper length

@fig-body-mass shows the distribution of body mass (in grams) across the three penguin species. We can see that, on average, the Gentoo penguins are the heaviest, with Adelie and Chinstrap penguins more similar in terms of body mass. Within a species, the females are on average lighter than the males.

```{r fig-body-mass}
#| fig-cap: "Distribution of penguin body mass (g) across species and sex."

# code for plot
```

Similarly, Gentoo penguins have the longest flippers on average (@fig-flipper-length), and Adelie penguins the shortest. Again, within a species, females have shorter flippers on average than males.

```{r fig-flipper-length}
#| fig-cap: "Distribution of penguin flipper length (mm) across species and sex."

# code for plot
```


## Association between bill length and depth

In this dataset, bill measurements refer to measurements of the culmen, which is the upper ridge of the bill. There is a clear relationship between bill length and depth, but it is masked in the dataset by differences between species (@fig-bill-scatterplot), with Gentoo penguins exhibiting longer but shallower bills, and Adelie penguins shorter and deeper bills.

```{r fig-bill-scatterplot}
#| fig-cap: "Scatterplot of penguin bill length and depth."

# code for plot
```

 

Writing a report – Quarto + {targets}

Two advantages of using a Quarto document alongside {targets}:

  • can read results from the {targets} pipeline inside the report: no computation is done during report generation

  • can add the rendering of the report as a step in the pipeline: ensures that the report is always up-to-date

Writing a report – Quarto + {targets}

reports/palmerpenguins_report.qmd
---
title: "Analysis of penguins measurements from the palmerpenguins dataset"
author: "Olivia Angelin-Bonnet"
date: today
format: docx
---

```{r setup}
#| include: false

library(knitr)

opts_chunk$set(echo = FALSE)
```

This project aims to understand the differences...

## Distribution of body mass and flipper length

@fig-body-mass shows...

```{r fig-body-mass}
#| fig-cap: "Distribution of penguin body mass (g) across species and sex."

# code for plot
```

Two steps to use the {targets} pipeline results in a Quarto document:

Writing a report – Quarto + {targets}

reports/palmerpenguins_report.qmd
---
title: "Analysis of penguins measurements from the palmerpenguins dataset"
author: "Olivia Angelin-Bonnet"
date: today
format: docx
---

```{r setup}
#| include: false

library(knitr)
library(here)

opts_chunk$set(echo = FALSE)
opts_knit$set(root.dir = here())
```

This project aims to understand the differences...

## Distribution of body mass and flipper length

@fig-body-mass shows...

```{r fig-body-mass}
#| fig-cap: "Distribution of penguin body mass (g) across species and sex."

# code for plot
```

Two steps to use the {targets} pipeline results in a Quarto document:

  • Make sure the report ‘sees’ the project root directory

Writing a report – Quarto + {targets}

reports/palmerpenguins_report.qmd
---
title: "Analysis of penguins measurements from the palmerpenguins dataset"
author: "Olivia Angelin-Bonnet"
date: today
format: docx
---

```{r setup}
#| include: false

library(knitr)
library(here)
library(targets)

opts_chunk$set(echo = FALSE)
opts_knit$set(root.dir = here())
```

This project aims to understand the differences...

## Distribution of body mass and flipper length

@fig-body-mass shows...

```{r fig-body-mass}
#| fig-cap: "Distribution of penguin body mass (g) across species and sex."

tar_read(body_mass_plot)
```

Two steps to use the {targets} pipeline results in a Quarto document:

  • Make sure the report ‘sees’ the project root directory

  • Read targets objects with targets::tar_read()

Writing a report – Quarto + {targets}

Adding the Quarto report as a step in the pipeline (need {tarchetypes} and {quarto}):

analysis/_targets.R
library(targets)
library(tarchetypes)
library(here)

source(here("R/helper_functions.R"))

list(
  tar_target(
    penguins_raw_file, 
    here("data/penguins_raw.csv"), 
    format = "file"
  ),
  tar_target(
    penguins_df, 
    read_data(penguins_raw_file)
  ),
  # etc
  tar_quarto(
    report, 
    here("reports/palmerpenguins_report.qmd")
  )
)

     

{targets} – how to reproduce the analysis?

README.md
## How to reproduce the analysis

```{r}
# Install the necessary packages
renv::restore()

# Run the analysis pipeline
targets::tar_make()
```

Conclusion

Creating reproducible R code for data science projects

  • The effort put into making a project tidy and reproducible depends on its size, context, etc.

  • Often, simple projects turn into something more! It pays to make things tidy early on

  • Following the guidelines exactly is less important than following a system that:

    • is consistent across projects
    • is easily understandable by the next users (future you, colleagues, collaborators)
    • safeguards against mistakes (e.g. the wrong version of the data used for a plot)
    • improves reproducibility

Further reading and inspiration

Thank you for your attention!

olivia.angelin-bonnet@plantandfood.co.nz

Presentation disclaimer

Presentation for
ECSSN and NZSA Joint Webinar, 20 August 2024

Publication data:
Angelin-Bonnet O. September 2024. Setting up a reproducible data analysis project in R featuring GitHub, {renv}, {targets} and more. A Plant & Food Research PowerPoint presentation. SPTS No. 26044.

Presentation prepared by:
Olivia Angelin-Bonnet
Scientist, Data Science
September 2024

Presentation approved by:
Mark Wohlers
Science Group Leader, Data Science
September 2024

For more information contact:
Olivia Angelin-Bonnet
DDI: +64 6 355 6156
Email: Olivia.Angelin-Bonnet@plantandfood.co.nz

 

This report has been prepared by The New Zealand Institute for Plant and Food Research Limited (Plant & Food Research).
Head Office: 120 Mt Albert Road, Sandringham, Auckland 1025, New Zealand, Tel: +64 9 925 7000, Fax: +64 9 925 7001.
www.plantandfood.co.nz

 

DISCLAIMER

The New Zealand Institute for Plant and Food Research Limited does not give any prediction, warranty or assurance in relation to the accuracy of or fitness for any particular use or application of, any information or scientific or other result contained in this presentation. Neither The New Zealand Institute for Plant and Food Research Limited nor any of its employees, students, contractors, subcontractors or agents shall be liable for any cost (including legal costs), claim, liability, loss, damage, injury or the like, which may be suffered or incurred as a direct or indirect result of the reliance by any person on any information contained in this presentation.

© COPYRIGHT (2024) The New Zealand Institute for Plant and Food Research Limited. All Rights Reserved. No part of this report may be reproduced, stored in a retrieval system, transmitted, reported, or copied in any form or by any means electronic, mechanical or otherwise, without the prior written permission of The New Zealand Institute for Plant and Food Research Limited. Information contained in this report is confidential and is not to be disclosed in any form to any party without the prior approval in writing of The New Zealand Institute for Plant and Food Research Limited. To request permission, write to: The Science Publication Office, The New Zealand Institute for Plant and Food Research Limited – Postal Address: Private Bag 92169, Victoria Street West, Auckland 1142, New Zealand; Email: SPO-Team@plantandfood.co.nz.