Salesforce.com provides an easy way to periodically download your entire organization’s data. Located in Setup | Data Management | Data Export, you can either export the information immediately or create a schedule to do so periodically. Once the file is ready, you will receive an email notification where you can download the file and store it safely.
Recently I downloaded the zip file and began looking into its contents.
The structure of the zip file is pretty straightforward. Data from standard and custom objects are located in the top level folder with a .csv (Comma Separated Values) file extension. All account data is found in a file called “Account.csv” that you can open up in Excel to peruse. Custom objects use their API name, so if you had a custom object called “Case Study”, the name of the file would be something like “Case_Study__c.csv”.
Also found in the top level folder are three directories: one for attachments, another for content versions and another for documents. These folders are where the export process puts all of the documents, attachments and content. Navigating into these directories, you'll find a set of files named with their unique Salesforce ID. In this example, I navigated into the Documents folder and found a file that is named: 01500000000guQwAAI
No file extension is given, so you have no idea whether a particular file is a PDF, Word document, Excel spreadsheet or some other file type. And if you have thousands of records, locating the file you need will be cumbersome (unless of course you search for it within Salesforce).
Fortunately, you can figure out this information from the corresponding CSV file.
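To make the lookup concrete, here is a minimal sketch of finding the friendly name for one ID with Awk. It is not taken from the script described in this post. It assumes that every field in the CSV is double-quoted, that no field contains a line break, and that the columns are named Id, Name and Type; check the header row of your own export before relying on those names. The lookup function and the sample row are hypothetical.

```shell
# Sketch: print "Name.Type" for one record ID in an export CSV.
# Assumptions (verify against your own file): all fields are double-quoted,
# no field contains a line break, and the header has Id, Name and Type columns.
lookup() {
  # $1 = CSV file, $2 = Salesforce ID to look up
  awk -v id="$2" '
    {
      line = $0
      sub(/\r$/, "", line)                      # tolerate Windows line endings
      sub(/^"/, "", line); sub(/"$/, "", line)  # drop the outermost quotes
      n = split(line, f, "\",\"")               # split on the "," between fields
    }
    NR == 1 { for (i = 1; i <= n; i++) col[f[i]] = i; next }  # map header names to positions
    f[col["Id"]] == id { print f[col["Name"]] "." f[col["Type"]] }
  ' "$1"
}

# Example with a made-up one-row CSV in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '"Id","Name","Type"' \
  '"01500000000guQwAAI","Meeting Notes","doc"' > "$tmp/Document.csv"
result=$(lookup "$tmp/Document.csv" 01500000000guQwAAI)
echo "$result"
```

Splitting on the quote-comma-quote sequence rather than on a bare comma keeps names that contain commas intact, although a field with embedded quotes would still need a real CSV parser.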
We wanted a way to automate this process and rename the files to something more meaningful, so I set about writing a short script to create renamed copies of the files. Rather than fire up Eclipse and write Java, or Microsoft Visual Studio and write C#, I decided to use tools that already existed and settled upon Bash for the script and Awk for handling the parsing and file copying.
The result was a script file named renameDocs.sh that requires 3 parameters:
- The CSV file
- The Source Directory
- The Target or Renamed Directory
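A script that takes these three parameters can check them before doing any work. The following is only a sketch of such a check, not the contents of renameDocs.sh; the check_args function name and the error messages are hypothetical.

```shell
# Sketch: validate the three parameters (CSV file, source dir, target dir).
# Hypothetical helper; the real script may handle its arguments differently.
check_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: renameDocs.sh <csv-file> <source-dir> <target-dir>" >&2
    return 2
  fi
  [ -f "$1" ] || { echo "CSV file not found: $1" >&2; return 1; }
  [ -d "$2" ] || { echo "source directory not found: $2" >&2; return 1; }
  [ -d "$3" ] || { echo "target directory not found: $3 (create it first)" >&2; return 1; }
}
```

Calling check_args "$@" at the top of a script and exiting on a non-zero status would stop a run before any files are copied.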
Before running the script, be sure that the target or renamed directory exists. What I recommend is to create three folders: RenamedAttachments, RenamedContentVersion and RenamedDocuments.
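Assuming you are in the top level folder of the extracted export, the three folders can be created in one step. The -p flag makes the command safe to repeat if the folders already exist.

```shell
# Create the three target folders; -p does nothing for folders that already exist.
mkdir -p RenamedAttachments RenamedContentVersion RenamedDocuments
```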
If you copy the script into the top level folder where the CSV files are located, then renaming can be accomplished by running the following commands:
./renameDocs.sh Attachment.csv Attachments RenamedAttachments
./renameDocs.sh ContentVersion.csv ContentVersion RenamedContentVersion
./renameDocs.sh Document.csv Documents RenamedDocuments
After the commands run, navigate to the new Renamed directories, where you will find files with much friendlier names along with file extensions, if available.
One edge case I discovered while testing this tool is duplicate file names, which occur when multiple revisions are uploaded into Salesforce. If this situation is detected, the target file name is prefixed with the unique Salesforce ID, e.g. 01500000000guQwAAI-Meeting Notes.doc
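The duplicate handling can be sketched as follows. This is an illustration of the idea rather than the actual renameDocs.sh. It assumes fully quoted CSV fields and columns named Id, Name and Type, and the rename_copy function, the sample rows and the second ID are made up for the example.

```shell
# Sketch: copy ID-named files to friendly names, prefixing the ID on a clash.
# Assumptions: all CSV fields are double-quoted, no embedded line breaks,
# and the header has Id and Name columns plus an optional Type column.
rename_copy() {
  # $1 = CSV file, $2 = source directory, $3 = existing target directory
  tab=$(printf '\t')
  awk '
    {
      line = $0
      sub(/\r$/, "", line)
      sub(/^"/, "", line); sub(/"$/, "", line)
      n = split(line, f, "\",\"")
    }
    NR == 1 { for (i = 1; i <= n; i++) col[f[i]] = i; next }
    {
      name = f[col["Name"]]
      ext = f[col["Type"]]
      gsub(/\//, "_", name)                 # a slash is not valid in a file name
      if (ext != "") name = name "." ext    # add the extension if one is listed
      print f[col["Id"]] "\t" name          # emit: ID <TAB> friendly name
    }
  ' "$1" |
  while IFS="$tab" read -r id name; do
    [ -f "$2/$id" ] || continue             # skip rows with no exported file
    target="$3/$name"
    if [ -e "$target" ]; then
      target="$3/$id-$name"                 # name already taken: prepend the ID
    fi
    cp "$2/$id" "$target"
  done
}

# Example with two made-up revisions that share the same name.
demo=$(mktemp -d)
mkdir "$demo/Documents" "$demo/RenamedDocuments"
printf '%s\n' '"Id","Name","Type"' \
  '"01500000000guQwAAI","Meeting Notes","doc"' \
  '"01500000000guQxAAI","Meeting Notes","doc"' > "$demo/Document.csv"
echo "v1" > "$demo/Documents/01500000000guQwAAI"
echo "v2" > "$demo/Documents/01500000000guQxAAI"
rename_copy "$demo/Document.csv" "$demo/Documents" "$demo/RenamedDocuments"
ls "$demo/RenamedDocuments"
```

In this sketch the first file to claim a name keeps it and only later files with the same name receive the ID prefix, so the result depends on row order in the CSV.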
Finally, if you are using Windows, this script requires Cygwin. If you are a developer and already run Git Bash (you want msysGit), you have everything you need.
You can download the script from here.