Return Boolean True: Converting LDIF/LDAP data into a CSV file

Tuesday, January 27, 2009

Converting LDIF/LDAP data into a CSV file

LDIF to CSV

I work with LDAP on a regularly basis. I frequently have to pull data using ldapsearch. While the data that ldapsearch spits out is a decent representation sometimes I want something a bit easier to work with.

The format I use most tends to be CSV. While the acronym stands for comma separated value the format may be used to describe data that is printed out line by line and separated by anything you can imagine.

The problem for me is finding a decent LDIF to CSV converter. The ones that I have tried tend to choke on any number of issues. Some of these include binary data or failing to normalize the multivalued attributes.

I finally got sick enough of these issues that I decided to write my own. You can download it here:

Python/Linux/Source

Windows

Update: Pushed the code to GitHub: https://github.com/tachang/ldiftocsv

Hopefully the Windows binary works for you. If it doesn't then just download Python 2.6 and run the source manually. I am providing the binary just for convenience.

Using LDIFtoCSV
Running "python LDIFtoCSV.py" should give you the usage text.

usage: LDIFtoCSV.py [options]

-o <filename> : File to write output. By default this is set to sys.stdout
-l <filename> : File to write logging output. By default there is no logging.
-F <char> : Character to separate the fields by. By default this is a comma. i.e. -F","
-D <char> : Character to delimit the text value by. By default this is a double quote. i.e. -D"""
-M <num> : The maximum number of columns a multivalued attribute should take up (default: 5). This is common with the objectClass attribute where it can have over 20 values. Do you want to have 20 columns each with the same heading objectClass or do you want to limit it.

Here are some common command lines that I use (assuming you have a test.ldif):

Outputs the CSV straight to standard output:
python LDIFtoCSV.py test.ldif

Outputs CSV to standard output with semicolons as the delimiter:python LDIFtoCSV.py -F";" test.ldif
Outputs CSV to standard output with pipes as the delimiter and text surrounded by carrots:
python LDIFtoCSV.py -F"|" -D"^" test.ldif

52 comments:

aabtzu said...: the website for the csv2ldif seems offline or non existent. did you move it somewhere else ?; March 29, 2009 at 5:08 AM
Jeff Tchang said...: Should be back up. If you decide to try the tool out please let me know what you think!; March 29, 2009 at 10:42 AM
Unknown said...: Very useful script - you've just saved me a lot of time. Thanks!; April 4, 2009 at 11:24 AM
Unknown said...: Hello,

Thanks for this great program. Is it possible to include all multiple values for the same LDAP attribute as one csv entry separated by semicolons instead of setting a maximum number and limiting the amount of multiple values stored in the CSV?

Ie -
favorite - "Ice cream"
favorite - "Apples"

In the csv file:
..., "Ice cream";"Apples",....; April 7, 2009 at 1:52 PM
Jeff Tchang said...: @Mujtaba - Right now there isn't but that is a great idea for the next version. Thanks!; April 7, 2009 at 1:57 PM
Unknown said...: Any idea on when the next version will be out? I need this functionality for a project. :); April 15, 2009 at 9:41 AM
Jeff Tchang said...: @Mujtaba Right now I do not have any date for a new version. If I need the tool to do additional functionality then I will release a new version. Of course you are free to download the source and modify it to fit your needs. If you are really in a need of customizations feel free to e-mail me directly and we can discuss a nominal rate.; April 15, 2009 at 10:06 AM
Unknown said...: Thank you, I will modify it myself.; April 15, 2009 at 10:40 AM
register said...: I have addressed similar problems in C++. My solution is simplier but could fit the needs of someone who adopts C++ as the language of choice. Here is the link: http://register72.wordpress.com/2009/07/09/ldif-to-tsv-conversion/; July 9, 2009 at 2:09 AM
Unknown said...: This is wonderful, exactly what I need.

Is the .exe and/or source redistributable?; September 30, 2009 at 1:49 PM
Jeff Tchang said...: @Kenneth

Not sure about the executable (since there might be some Microsoft DLL stuff in there) but feel free to use the source and downloading a python interpreter.; September 30, 2009 at 1:52 PM
Ramiz said...: This is brilliant. Just as you experienced; I could not get any softwarew that coudl do the conversion well but this has save me a lot of hard work.

I hope to see future enhancements; as this is a tool for the future for all that use the LDAP platform.; December 3, 2009 at 1:32 AM
Grant Prellwitz said...: We used it to convert our 3000 employee LDIF to a spreadsheet, and it worked like a charm. Thanks! Only thing that would make it better would be the ability to either make a connection directly to the LDAP or to accept the input file from stdin so we could pipe the ldapsearch results to it! For a one-time use, though, it's great as is.; December 21, 2009 at 1:46 PM
Unknown said...: Thanks a lot, just converted a LDIF-file containing 72000 entries. It worked well for me.; January 2, 2010 at 7:24 AM
ExpatZ said...: Excellent script!

Thank you very much for sharing it.; January 26, 2010 at 6:45 AM
MGT 5050 Brian G said...: Thank you, Thank you, Thank you!

Worked perfect the first try!; January 29, 2010 at 7:37 AM
DJ Alik said...: Awesome tool!!! Works like a charm.; April 16, 2010 at 5:27 PM
Akshay Kulkarni said...: This is perfect. I found it after a long struggle with various other tools..

Thanks much!; May 10, 2010 at 10:51 AM
Anonymous said...: I am also fed up with all tools that tries to convert ldif to csv, but having tried this one, it suffer from the same two problems of every other tool I used. Both two problems are because most such tools are written by people from the west with little care on ideographs.

First, your tool like other don't distinguis ideograph string from binary pictures. It is difficult to tell ideographs from pictures, despite different attribute type, they all look like binaries on the command line pipe. The application that can correctly write out ideograph to csv/tsv must either allow user to configure which field is a string and which is binary, or, even better, be schema-aware. To be schema-aware the application must act as ldap client, instead of reading from ldif file, because when exported to ldif file the schema is already lost.

Second, it should offer the possibility to only convert one language version to csv/tsv, or offer user to consider multiple language versions as multiple values. Language extension is not much used in the west. Consider this:

line100: cn: Zhang Weiwu
line101: cn;lang-zh: 張韡武

Line 101 is the Chinese spelling of the same name.

It is not possible, as far as I know, to produce ldif with ldapsearch so that only include line100 but not line101, thus tools must be careful when handling such data source.

I know it is difficult to make tools that doesn't suck for all users, but if you can manage that, it is the kind of achievement that many others before you never even manage to get close to.; June 13, 2010 at 8:54 AM
Jeff Tchang said...: @Zhang Out of curiosity how does the program fail?; June 13, 2010 at 10:30 AM
Anonymous said...: This comment has been removed by the author.; June 14, 2010 at 7:32 PM
Anonymous said...: Behavior of your system:

# the blog doesn't accept HTML pre tag. From now on is start of pre
$ ldapsearch_intranet cn=Yu*Liansu givenName sn cn title jobtitle organization organizationalUnit postalcode country st locality street telephonenumber fax labeledURI description | tee /tmp/test.ldif
# extended LDIF
#
# LDAPv3
# base with scope subtree
# filter: cn=Yu*Liansu
# requesting: givenName sn cn title jobtitle organization organizationalUnit postalcode country st locality street telephonenumber fax labeledURI description
#

# yuliansu, contacts, realss.com, eoa.cn
dn: uid=yuliansu,ou=contacts,ou=realss.com,dc=eoa,dc=cn
description: Assistant, started to work in mid-2005
sn:: 5L+e
sn;lang-en: yu
givenName:: 6I6y6IuP
cn:: 5L+e6I6y6IuP
cn;lang-en: yuliansu
givenName;lang-en: liansu
userPassword:: e21kNX1RdmdhTytSZkw5NTB0bUNDM09JamtBPT0=
telephoneNumber: 010 69020190
st:: 5YyX5Lqs
st;lang-en: beijing
street:: 5rW35reA5Yy65LiH5p+z5LiH5rOJ5paw5pawMTHmoIsz6ZeoMTAx5a6k
street;lang-en: haidianquwanliuwanquanxinxin11dong3men101shi
l:: 5YyX5Lqs5biC
l;lang-en: beijingshi
postalCode: 100089
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: evolutionPerson
objectClass: posixAccount
objectClass: shadowAccount
objectClass: phpgwAccount
objectClass: qmailUser

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1

$ python /tmp/LDIFtoCSV.py /tmp/test.ldif
"cn","cn;lang-en","description","dn","givenName","givenName;lang-en","l","l;lang-en","objectClass","objectClass","objectClass","objectClass","objectClass","postalCode","result","search","sn","sn;lang-en","st","st;lang-en","street","street;lang-en","telephoneNumber","userPassword",
"'\xe4\xbf\x9e\xe8\x8e\xb2\xe8\x8b\x8f'","yuliansu","Assistant, started to work in mid-2005","uid=yuliansu,ou=contacts,ou=realss.com,dc=eoa,dc=cn","'\xe8\x8e\xb2\xe8\x8b\x8f'","liansu","'\xe5\x8c\x97\xe4\xba\xac\xe5\xb8\x82'","beijingshi","top","person","organizationalPerson","evolutionPerson","posixAccount","100089","","","'\xe4\xbf\x9e'","yu","'\xe5\x8c\x97\xe4\xba\xac'","beijing","'\xe6\xb5\xb7\xe6\xb7\x80\xe5\x8c\xba\xe4\xb8\x87\xe6\x9f\xb3\xe4\xb8\x87\xe6\xb3\x89\xe6\x96\xb0\xe6\x96\xb011\xe6\xa0\x8b3\xe9\x97\xa8101\xe5\xae\xa4'","haidianquwanliuwanquanxinxin11dong3men101shi","010 69020190","{md5}QvgaO+RfL950tmCC3OIjkA==",
"","","","None","","","","","","","","","","","0 Success","2","","","","","","","","",
# end pre tag here

If problem 1 solved, all appearance of things like \x9f\xb should be ideograhs like 莲.

If problem 2 solved, there should be ways to omit every column whose field name is "*;lang-en".

It's great you want to look at this issue. As an asian user I cannot help feeling this:

the world is not better because there are 10+ tools, it would be better if there is only one tool that does things well.

And often I read "I decide to make my own tool" on the web is a creation again ignoring Asian text, not only ldap tools but all tools. You cannot help being critical after all such experiences, even given all tools are granted as free in freedom without charge:); June 14, 2010 at 7:35 PM
Jason said...: The download appears not to be working again. Can you check it?; July 11, 2011 at 1:30 PM
netarc said...: The download links don't appear to be working, fyi. BTW, is your tool the same as the "LDIF2CSV" one on softpedia, wondering if you published it there?; September 28, 2011 at 2:41 PM
ganesh said...: the download isn't working?? do you know of a way where i can convert multiple ldif files 19000 to be precise into csv and load it onto a DB? if anyone has any answers please lemme knw..; October 31, 2011 at 9:24 AM
Jeff Tchang said...: I just updated the links. I will be putting it on github shortly.; October 31, 2011 at 10:00 AM
'Spark said...: Thanks very much for your useful tool. If you ever have time to implement Mujtaba's suggestion (collect multi-values into one delimited string), it would be extremely beneficial for my use-case. Cheers :); November 16, 2011 at 2:00 PM
Unknown said...: Thanks for the very useful tool.; February 14, 2012 at 12:03 PM
Anonymous said...: I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

Frases de Amor; February 24, 2012 at 1:22 AM
Anonymous said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

Houston Electrician; February 24, 2012 at 3:31 AM
smash john said...: I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

Reliable VPN Proxy Service; February 24, 2012 at 9:03 PM
harrywatson said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

cheap voip call; February 24, 2012 at 10:29 PM
john ibrahim said...: I would like to thank you for the efforts you have made in writing this post. I am hoping

the same best work from you in the future as well.

compra venta; February 24, 2012 at 10:30 PM
geetika madan said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

segunda mano; February 24, 2012 at 11:10 PM
Anonymous said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

vpn liberty reserve; February 25, 2012 at 3:25 AM
charlie smash said...: I would like to thank you for the efforts you have made in writing this post. I am hoping the same best work from you in the future as well.

Escorts San Diego; February 29, 2012 at 12:51 AM
charlie smash said...: I would like to thank you for the efforts you have made in writing this post. I am hoping the same best work from you in the future as well.

Escorts San Diego; February 29, 2012 at 12:51 AM
craig pattinson said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit

used cnc router; February 29, 2012 at 4:15 AM
Anonymous said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit
Electrician Greenville SC; March 17, 2012 at 3:32 AM
harrywatson said...: Nice stuff you got, very interesting to read.
Well, I do have also in my sleeves, if you have time don't forget to visit
Greenville air conditioning; March 17, 2012 at 4:21 AM
Alessandro said...: Fantastic, it helped me to migrate contacts in LDIF format into GMail!!!; September 7, 2012 at 7:23 AM
Alessandro said...: This comment has been removed by the author.; September 7, 2012 at 7:24 AM
Unknown said...: Hello Jeff, sorry to contact you so randomly.. I was seeking some help from online and I believe you might be able to help me. I am currently working on a project where I have to migrate a table from DB2 to LDIF and import into IBM LDAP. Would you please able to share some knowledge on this?

Thanks
Ankur; July 19, 2013 at 7:50 AM
Arjen van Bochoven said...: I think I fixed the utf character problem:

In ldif.py, I added .encode('utf8') to line 325:

attr_value = base64.decodestring(unfolded_line[colon_pos+2:]).decode('utf8')

in LDIFtoCSV.py I changed line 128 to 139 to

if( attributeName in entry ):
i = 0
while( i < numberOfTimesToPrint ):

if( i < len(entry[attributeName])):
self.defaultOutput.write(self.textDelimiter + entry[attributeName][i].encode('utf8') + self.textDelimiter + self.fieldSeparatorCharacter)
else:
self.defaultOutput.write(self.textDelimiter + self.textDelimiter + self.fieldSeparatorCharacter)

i = i + 1

# If the attribute name is dn, print the fully qualified distinguished name

Now this ldif:

dn: CN=MACHINE,OU=Domain Controllers,DC=our,DC=domain
name: MACHINE
operatingSystem:: V2luZG93cyBTZXJ2ZXLCriAyMDA4IFN0YW5kYXJk

Converts correctly to:

Windows Server® 2008 Standard; May 24, 2014 at 11:24 PM
Arjen van Bochoven said...: I meant

In ldif.py, I added .decode('utf8') to line 325:

attr_value = base64.decodestring(unfolded_line[colon_pos+2:]).decode('utf8'); May 24, 2014 at 11:27 PM
Sushmi said...: Hi,

I am loooking fro CSV to Ldif conversion,and this is very urgent for me...I see so many tools online and getting confused.
Can you pl. advise?; June 2, 2014 at 10:01 AM
Unknown said...: Perfect !! - Cojonuda !!

It does exactly what I wanted; September 18, 2015 at 2:52 AM
Anonymous said...: Great Work, Thanks for providing this Utility, windows utility worked great for me.; December 16, 2015 at 12:36 PM
Larry W. Virden said...: Unfortunately, the link to your source no longer appears to be valid. Is there a place where it is available?; March 7, 2018 at 8:15 AM
123456789 said...: Regardless of what stage you are at in your profession movement, we have agreed elite to more readily assist you with planning for your vocation. Professional graphic design; November 5, 2020 at 3:23 AM
dean said...: This is a very useful article that explains how LDIF/LDAP data can be converted into CSV format for easier processing and analysis. The discussion on handling multivalued attributes, binary data, and delimiter customization demonstrates the practical challenges involved in directory service data management. Developing a custom LDIF-to-CSV converter using Python is an excellent example of how scripting can simplify data transformation tasks and improve interoperability between different systems.; June 20, 2026 at 11:55 PM
dean said...: The process of extracting, transforming, and organizing directory information into structured formats such as CSV is an important step in modern analytics workflows. These data preparation techniques are widely used in Data Science Projects For Final Year, where raw data from multiple sources is cleaned, transformed, and analyzed to derive meaningful insights and support intelligent decision-making.; June 20, 2026 at 11:57 PM

Return Boolean True

Tuesday, January 27, 2009

Converting LDIF/LDAP data into a CSV file

52 comments:

About Me

Blog Archive