Gis Tricks

Batch merge ArcGIS

Last week I had to merge more than 2000 shapefiles. The Merge tool in ArcGIS usually merges files without much fuss, but once the number of input files grows past a certain point it becomes incredibly slow. One of my colleagues suggested quite an elegant solution in Python to batch merge a large number of files in ArcGIS.

Here are the steps:

  • create a list of files to merge
  • merge the first 10 files into 1
  • remove the 10 files from the list
  • append the 1 result into the list.

This way the list shrinks by 9 files in every step, until 10 or fewer files remain and one final merge writes the output. It turned out to merge 2000 files within 10 minutes, which is quite alright in the ArcGIS world 🙂
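The steps above can be tried out without ArcGIS. What follows is only a minimal sketch: `batch_merge`, `fake_merge` and `fake_temp_name` are made-up names for illustration, and the merge itself is replaced by a stand-in that merely records which inputs ended up in which output. It shows that every input reaches the final result and how many merge calls are needed.

```python
import itertools


def batch_merge(files, merge, make_temp_name, output, batch_size=10):
    """Merge `files` into `output`, passing at most `batch_size` inputs per call.

    `merge(inputs, out)` and `make_temp_name()` are supplied by the caller.
    Returns the number of merge calls that were made.
    """
    queue = list(files)
    calls = 0
    while len(queue) > batch_size:
        temp = make_temp_name()
        merge(queue[:batch_size], temp)   # merge the first batch into one file
        del queue[:batch_size]            # remove that batch from the list
        queue.append(temp)                # append the single result
        calls += 1
    merge(queue, output)                  # final step: whatever is left
    return calls + 1


# Stand-in for the real merge: remember which original inputs each output holds.
contents = {}


def fake_merge(inputs, out):
    merged = set()
    for name in inputs:
        merged |= contents.get(name, {name})
    contents[out] = merged


counter = itertools.count()


def fake_temp_name():
    return "temp_%d" % next(counter)


inputs = ["file_%d.shp" % i for i in range(2000)]
n_calls = batch_merge(inputs, fake_merge, fake_temp_name, "merged.shp")
print(n_calls)  # 222 intermediate merges plus the final one
```

With 2000 inputs and batches of 10, the list shrinks by 9 per step, so 222 intermediate merges leave 2 entries for the final call.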

Here is the piece of code that does the magic:


while len(lijst_merge) > 0:

    if len(lijst_merge) > 10:
        print 'List length : %s time: %s' % (len(lijst_merge), datetime.now())
        # merge the first 10 files into one temporary file
        temp_file = get_random_file_name(temp_dir)
        merge_from_list_of_files(gp, lijst_merge[0:10], temp_file)
        # remove those 10 files and append the single result once
        del lijst_merge[0:10]
        lijst_merge.append(temp_file)
    else:
        print 'Final step... %s' % datetime.now()
        merge_from_list_of_files(gp, lijst_merge, output_file)
        print 'Successfully merged everything'
        # empty list to get out of the while loop
        lijst_merge = []

As you can see I use a function to create temporary files. Afterwards I remove the temporary directory with all the files.

def get_random_file_name(temp_dir, suffix=None):
    """Return a random name in the temp directory (existence is not checked)."""
    # inside a geodatabase, overwrite suffix to None
    if os.path.splitext(temp_dir)[1] in ('.gdb', '.mdb'):
        suffix = None
    name = "".join(random.sample(string.ascii_lowercase, 8))
    if suffix is None:
        random_filename = temp_dir + "/" + name
    else:
        random_filename = temp_dir + "/" + name + suffix
    return random_filename
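The function above picks a random name but never looks whether that name is already taken. A possible variant that does check is sketched below. `unique_random_path` and `max_tries` are hypothetical names, and the check uses `os.path.exists`, which only sees the file system; for names inside a geodatabase something like the `gp.Exists` call from the full script would presumably be needed instead.

```python
import os
import random
import string
import tempfile


def unique_random_path(temp_dir, suffix="", max_tries=100):
    """Return a path in temp_dir that does not exist on disk at call time."""
    for _ in range(max_tries):
        # 8 distinct lowercase letters, the same scheme as get_random_file_name
        name = "".join(random.sample(string.ascii_lowercase, 8))
        candidate = os.path.join(temp_dir, name + suffix)
        if not os.path.exists(candidate):
            return candidate
    raise RuntimeError("no free name found in %s" % temp_dir)


# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = unique_random_path(demo_dir, suffix=".shp")
print(demo_path)
```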

And of course a function in which the actual merge is performed:

def merge_from_list_of_files(gp, list_of_files_to_merge, output_merge_file):
    """
    input: [filename1, filename2, filename3 ....]
    output: filename_out
    """
    # build one ';'-separated string of inputs for the geoprocessor
    count = 1
    for file in list_of_files_to_merge:
        if count == 1:
            merge_list = file
            count = 2
        else:
            merge_list = file + ";" + merge_list

    gp.Merge_management(merge_list, output_merge_file)

I’m sure that last function could be written more nicely in Python, but I was going for results. If anyone has a good tip to make that function more Pythonic, I’d be happy to hear it!
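One possible answer, as a small sketch: the loop only builds a semicolon-separated string, which `str.join` does in one line. `build_merge_string` is a made-up helper name. `reversed` is used because the loop prepends each file, so the last file ends up first; if the input order does not matter to the merge, a plain `";".join(files)` would presumably do as well.

```python
def build_merge_string(files):
    """Join paths with ';', last file first, like the prepending loop does."""
    return ";".join(reversed(files))


print(build_merge_string(["a.shp", "b.shp", "c.shp"]))  # c.shp;b.shp;a.shp
```

The merge function would then shrink to a single call that passes this string to `gp.Merge_management` together with the output name.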

 

Here is the full script

# (c) Nelen & Schuurmans. GPL licensed, see LICENSE.txt
# -*- coding: utf-8 -*-

import sys
import os
import traceback
from datetime import datetime

import random
import string
import arcgisscripting

def merge_from_list_of_files(gp, list_of_files_to_merge,
                             output_merge_file):
    """
    input: [filename1, filename2, filename3 ....]
    output: filename_out
    """
    # build one ';'-separated string of inputs for the geoprocessor
    count = 1
    for file in list_of_files_to_merge:
        if count == 1:
            merge_list = file
            count = 2
        else:
            merge_list = file + ";" + merge_list

    gp.Merge_management(merge_list, output_merge_file)

def get_random_file_name(temp_dir, suffix=None):
    """Return a random name in the temp directory (existence is not checked)."""
    # inside a geodatabase, overwrite suffix to None
    if os.path.splitext(temp_dir)[1] in ('.gdb', '.mdb'):
        suffix = None
    name = "".join(random.sample(string.ascii_lowercase, 8))
    if suffix is None:
        random_filename = temp_dir + "/" + name
    else:
        random_filename = temp_dir + "/" + name + suffix
    return random_filename

if __name__ == '__main__':

    if len(sys.argv) != 4:
        print "usage: <input_dir_to_merge> <output_file> <temp_dir>"
        sys.exit(1)
    input_dir_to_merge = sys.argv[1]
    output_file = sys.argv[2]
    temp_dir = sys.argv[3]

    gp = arcgisscripting.create()
    try:
        if gp.Exists(output_file):
            print "%s already exists" % output_file
            sys.exit(1)
        print datetime.now()
        # create list of files to merge
        lijst_merge = []
        for file in os.listdir(input_dir_to_merge):
            if file.endswith(".shp"):
                lijst_merge.append(input_dir_to_merge + os.sep + file)

        print "merge list complete"
        print datetime.now()

        while len(lijst_merge) > 0:
            if len(lijst_merge) > 10:
                print 'List length : %s time: %s' % (len(lijst_merge), datetime.now())
                # merge the first 10 files into one temporary file
                temp_file = get_random_file_name(temp_dir)
                merge_from_list_of_files(gp, lijst_merge[0:10], temp_file)
                # remove those 10 files and append the single result once
                del lijst_merge[0:10]
                lijst_merge.append(temp_file)
            else:
                print 'Final step... %s' % datetime.now()
                merge_from_list_of_files(gp, lijst_merge, output_file)
                print 'Successfully merged everything'
                # empty list to get out of the while loop
                lijst_merge = []

    except Exception:
        # sys.exit() raises SystemExit, which is deliberately not caught here
        print traceback.format_exc()
        sys.exit(1)

    finally:
        del gp

 

