#Problem 1
##Result:
#Output
#First String,Second,1.22,3.4
#Second,More Text,1.555555,2.2220
#Third,x,3,124
#First step was to find all the tab spaces in order to turn them into commas to fit the csv format. Finding two tab spaces is necessary to only replace the spaces we wish to replace. The command was; [ \t]{2,} which was then replaced with ','.
#Problem 2:
##Result:
#Bryan Ballif (University of Vermont)
#Aaron Ellison (Harvard Forest)
#Sydne Record (Bryn Mawr)
#I captured the names first with (\w+), (\w+) and then tried to capture the institution with (.*). In order to reverse the order of the names I did the replace as such \2 \1 \(\3\). This yielded the correct names and parenteses around the institutions.
#Problem 3:
##Result:
#0001 Georgia Horseshoe.mp3
#0002 Billy In The Lowground.mp3
#0003 Winder Slide.mp3
#0004 Walking Cane.mp3
#Since all files ended with .mp3 and were followed by digits in the next file, I captured the mp3 followed by the digits as such; (.mp3) (\d) and replaced them with \1\n\2. This produced each file on its own line as seen above.
#Problem 4:
##Result:
#Georgia Horseshoe_0001.mp3
#Billy In The Lowground_0002.mp3
#Winder Slide_0003.mp3
#Walking Cane_0004.mp3
#I captured the digits, the bulk of the file text, and the .mp3 seperately as such;(\d{4,})(.*)(.mp3) and replaced them with \2_\1\3. This yielded the correct formatting for the mp3 files with the digits at the end followed by .mp3.
#Problem 5:
##Result
#C_pennsylvanicus,44
#C_herculeanus,3
#M_punctiventris,4
#L_neoniger,55
#I used this as my find function; (\w)\w+\,(\w+)(.*)(\,\d{1,}). This allowed me to capture the first letter, the species, and the numbers we wanted at the end. To display the proper result I used this as my replace function ;\1_\2\4.
#Problem 6:
##Result:
#C_penn,44
#C_herc,3
#M_punc,4
#L_neon,55
#This one was tricky for me... capturing the first digit and 4 letters of the species were simple, I had more trouble figuring out how to deal with seperating the digits in the end such that I would only retain the last couple. I did so with this find; (\w)\w+\,(\w{4})\w+\,(.*)(\,\d+) and paired it with this replace;\1_\2\4.
#Problem 7:
##Result:
#Campen,44,10.2
#Camher,3,10.5
#Myrpun,4,12.2
#Lasneo,55,3.3
#I believe I did not get the spacing quite correct, but I am darn close. I used this find function; (\w{3})\w+\,(\w{3})\w+(.*)(\,\d+) paired with this replace function;\1\2\4\3.
#Problem 8:
#Result
#I believe I will be somewhat convoluted in my approach to this, requiring multiple individual commands to find and replace the 'special' characters with nothing.
#To begin, I found all asterix '\*' and replaced with nothing. I followed this method for '\!', '\&', '\$', '\%', and '\+'. At this point I realized I could use a command like [\@\^\#\_\(\)\-] to find a bunch of special characters in once string and was able to remove them all at once.
#Per the hint regarding the pathogen_binary, I assume that the data has to be either '0' or '1'. Thus, the fact that only 1s live in this space leads me to believe that the NA's in this field are meant to be 0s. For this typical find and replace; find 'NA' and replace '0' was used. This is something that I would be extra certain with regarding an acive data set since I am entering data and need to make sure these are meant to be zeros. The last thing to clean was a few rogue spaces which was done by finding '\ ,' and replacing with nothing.
##Cleaned csv data file output:
id,target_name,pathogen_binary,sampling_date,site_code,field_id,bombus_spp,host_plant,bee_caste,sampling_event,pathogen_load 1,BQCV,1,9/9/2020,BOST,6,impatiens,solidago,worker,4,2414168.805 3,BQCV,0,9/10/2020,MUDGE,18,impatiens,crown vetch,worker,4,760793.2324 4,BQCV,0,9/10/2020,MUDGE,11,impatiens,crown vetch,worker,4,0 5,BQCV,0,9/9/2020,BOST,11,impatiens,solidago,male,4,0 6,BQCV,0,9/10/2020,CIND,14,impatiens,knot weed,worker,4,124236.7921 8,BQCV,0,9/10/2020,MUDGE,4,impatiens,crown vetch,worker,4,413814.5638 10,BQCV,1,9/10/2020,CIND,13,impatiens,red clover,worker,4,1028831.605 11,BQCV,0,7/7/2020,BOST,38,impatiens,red clover,worker,2,307696378.8 12,BQCV,0,9/10/2020,MUDGE,5,impatiens,crown vetch,worker,4,0 13,BQCV,1,9/9/2020,BOST,12,impatiens,solidago,male,4,311873.0526 15,BQCV,0,9/9/2020,BOST,18,impatiens,solidago,worker,4,0 16,BQCV,0,9/9/2020,BOST,23,impatiens,solidago,male,4,1674951.391 17,BQCV,0,9/10/2020,CIND,20,impatiens,red clover,worker,4,3214026.976 18,BQCV,1,9/10/2020,CIND,11,impatiens,birdsfoot,worker,4,995592.63 19,BQCV,0,9/10/2020,CIND,17,impatiens,red clover,worker,4,0 20,BQCV,1,9/10/2020,CIND,16,impatiens,red clover,worker,4,200804.8542 21,BQCV,1,9/9/2020,BOST,17,impatiens,solidago,worker,4,228085.8104 22,BQCV,1,9/9/2020,BOST,7,impatiens,solidago,worker,4,7261151.315 23,BQCV,0,7/7/2020,COL,22,impatiens,red clover,worker,2,817906.8276 24,BQCV,1,7/7/2020,BOST,45,impatiens,red clover,worker,2,858658.6884