The split command is very useful when you are managing large files. Suppose you have a CSV file with millions of records and it takes too long to open. In that case you can split the file into smaller pieces that open easily in any GUI.
By default, each output file holds 1000 lines and the default PREFIX is "x". However, we can split a file by number of lines or by bytes, and we can change the prefix as well. In this article I will show you how to use the split command with examples.
Let us assume we have a file testfile.csv with 1342 records.
[~]$ cat testfile.csv | wc -l
1342
1) Simple split example:
As you can see below, split breaks testfile.csv into 2 pieces with the default prefix x. Since testfile.csv has 1342 records in total, the first file, xaa, gets the default 1000 lines and the second file, xab, gets the remaining 342 records.
[~]$ split testfile.csv
[~]$ ls
testfile.csv xaa xab
[~]$ cat xaa | wc -l
1000
[~]$ cat xab | wc -l
342
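To check that nothing is lost by the split, we can join the pieces again and compare the result with the original. The following is a minimal sketch, assuming a scratch directory and a generated stand-in for testfile.csv (1342 numbered lines made with seq); the name rejoined.csv is only illustrative.

```shell
# Work in a scratch directory so existing files are untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for testfile.csv: 1342 numbered records.
seq 1 1342 > testfile.csv

# Default split: 1000 lines per piece, prefix "x".
split testfile.csv

# Line counts of the two pieces.
first=$(wc -l < xaa)
second=$(wc -l < xab)
echo "xaa: $first lines, xab: $second lines"

# The shell expands x?? in alphabetical order, so cat rebuilds the original.
cat x?? > rejoined.csv
if cmp -s testfile.csv rejoined.csv; then
    result="identical"
else
    result="different"
fi
echo "rejoined copy is $result"
```

Because the suffixes sort alphabetically, a plain glob is enough to put the pieces back in the right order.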
2) Split file with specific number of lines:
We can use the -l option to set the number of lines in each split file. For example, to split the file into pieces of 500 records each, use the following command.
[~]$ split -l 500 testfile.csv
[~]$ ls
testfile.csv xaa xab xac
[~]$ cat xaa | wc -l
500
[~]$ cat xab | wc -l
500
[~]$ cat xac | wc -l
342
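The number of pieces follows from a ceiling division of the record count by the -l value. Here is a small sketch of that arithmetic, using the numbers from this example; the variable names are only illustrative.

```shell
total=1342      # records in the example file
per_piece=500   # value passed to -l

# Ceiling division: how many pieces split will write.
pieces=$(( (total + per_piece - 1) / per_piece ))

# Lines left over for the last piece.
last=$(( total - (pieces - 1) * per_piece ))

echo "$pieces pieces, last one holds $last lines"
# prints: 3 pieces, last one holds 342 lines
```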
3) Split file with a specific prefix:
To use our own prefix, for example "NEW", pass it as the last argument after the input file name.
[~]$ split -l 500 testfile.csv NEW
[~]$ ls
NEWaa NEWab NEWac testfile.csv
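The prefix may also contain a directory part, which keeps the pieces out of the current directory. The sketch below assumes a scratch directory, a generated stand-in for testfile.csv and a hypothetical directory named parts; split does not create the directory, so it has to exist first.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for testfile.csv: 1342 numbered records.
seq 1 1342 > testfile.csv

# The target directory must already exist.
mkdir parts
split -l 500 testfile.csv parts/NEW

# Collect the generated piece names without parsing ls output.
set -- parts/NEW*
piece_count=$#
first_piece=$1
echo "$piece_count pieces, first is $first_piece"
```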
4) Split file with numeric suffix:
With the -d option, split uses numeric suffixes such as 00, 01, 02, ... instead of the default alphabetic suffixes aa, ab, ac, ..., as follows.
[~]$ split -l 50 -d testfile.csv NEW
[~]$ ls
NEW00 NEW02 NEW04 NEW06 NEW08 NEW10 NEW12 NEW14 NEW16 NEW18 NEW20 NEW22 NEW24 NEW26
NEW01 NEW03 NEW05 NEW07 NEW09 NEW11 NEW13 NEW15 NEW17 NEW19 NEW21 NEW23 NEW25 testfile.csv
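By default the numeric suffix has 2 digits, which allows at most 100 output files (00 to 99). If the split needs more than 100 files, the split version used here stops with the "output file suffixes exhausted" message after writing NEW99, and the remaining records are not written to any piece. Newer GNU coreutils releases may widen the suffix automatically when -a is not given, so you might not see this message on a current system.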
[~]$ split -l 10 -d testfile.csv NEW
split: output file suffixes exhausted
[~]$ ls
NEW00 NEW05 NEW10 NEW15 NEW20 NEW25 NEW30 NEW35 NEW40 NEW45 NEW50 NEW55 NEW60 NEW65 NEW70 NEW75 NEW80 NEW85 NEW90 NEW95 testfile.csv
NEW01 NEW06 NEW11 NEW16 NEW21 NEW26 NEW31 NEW36 NEW41 NEW46 NEW51 NEW56 NEW61 NEW66 NEW71 NEW76 NEW81 NEW86 NEW91 NEW96
NEW02 NEW07 NEW12 NEW17 NEW22 NEW27 NEW32 NEW37 NEW42 NEW47 NEW52 NEW57 NEW62 NEW67 NEW72 NEW77 NEW82 NEW87 NEW92 NEW97
NEW03 NEW08 NEW13 NEW18 NEW23 NEW28 NEW33 NEW38 NEW43 NEW48 NEW53 NEW58 NEW63 NEW68 NEW73 NEW78 NEW83 NEW88 NEW93 NEW98
NEW04 NEW09 NEW14 NEW19 NEW24 NEW29 NEW34 NEW39 NEW44 NEW49 NEW54 NEW59 NEW64 NEW69 NEW74 NEW79 NEW84 NEW89 NEW94 NEW99
To avoid this, increase the number of digits in the suffix with the -a option, as follows.
[~]$ split -l 10 -a 3 -d testfile.csv NEW
[~]$ ls
NEW000 NEW007 NEW014 NEW021 ......... NEW099 NEW100 NEW101 ......... NEW132
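If you do not want to guess the -a value, it can be derived from the expected number of pieces. The following is a rough sketch, assuming a scratch directory, a generated stand-in for testfile.csv and a split that supports -d (GNU split does); the variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for testfile.csv: 1342 numbered records.
seq 1 1342 > testfile.csv

total=$(wc -l < testfile.csv)
per_piece=10
pieces=$(( (total + per_piece - 1) / per_piece ))

# Smallest number of decimal digits whose capacity covers all pieces.
digits=1
capacity=10
while [ "$capacity" -lt "$pieces" ]; do
    digits=$(( digits + 1 ))
    capacity=$(( capacity * 10 ))
done
echo "need -a $digits for $pieces pieces"

split -l "$per_piece" -a "$digits" -d testfile.csv NEW

# Count the generated pieces without parsing ls output.
set -- NEW*
written=$#
echo "$written pieces written"
```

For the 1342-record example this works out to 135 pieces and a 3-digit suffix, which matches the -a 3 used above.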
5) Split file with 4000 bytes output:
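We can use the -b option to split by size in bytes instead of by lines. Note that the two forms below are not identical: -b4000 writes pieces of exactly 4000 bytes, while -b4k means 4 KiB, that is 4096 bytes per piece. The listing that follows comes from the -b4k run. Also keep in mind that -b cuts at byte offsets, so a CSV record can end up split across two pieces.
[~]$ split -b4000 testfile.csv
(or)
[~]$ split -b4k testfile.csv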
[~]$ ls -ltr x*
-rw-rw-r-- 1 mukesh mukesh 3888 Oct 7 21:14 xae
-rw-rw-r-- 1 mukesh mukesh 4096 Oct 7 21:14 xad
-rw-rw-r-- 1 mukesh mukesh 4096 Oct 7 21:14 xac
-rw-rw-r-- 1 mukesh mukesh 4096 Oct 7 21:14 xab
-rw-rw-r-- 1 mukesh mukesh 4096 Oct 7 21:14 xaa
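The difference between -b 4000 and -b 4k is easy to see on a file of known size. Here is a small sketch, assuming a scratch directory and a hypothetical 20272-byte file named sample.bin (the total size implied by the listing above, 4 x 4096 + 3888); the prefixes dec_ and kib_ are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample of 20272 bytes.
head -c 20272 /dev/zero > sample.bin

# 4000 means exactly 4000 bytes; 4k means 4 * 1024 = 4096 bytes.
split -b 4000 sample.bin dec_
split -b 4k sample.bin kib_

dec_first=$(wc -c < dec_aa)
kib_first=$(wc -c < kib_aa)

# Count the pieces of each run without parsing ls output.
set -- dec_*
dec_count=$#
set -- kib_*
kib_count=$#

echo "-b 4000: $dec_count pieces, first has $dec_first bytes"
echo "-b 4k:   $kib_count pieces, first has $kib_first bytes"
```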
6) Split file into 2 files of equal size:
We can use the -n option in place of -l to produce a specific number of output files. Note that -n N divides the input by size in bytes, not by number of records.
[~]$ split -n 2 -d testfile.csv NEW
[~]$ ls
NEW00 NEW01 testfile.csv
[~]$ cat NEW00 | wc -l
670
[~]$ cat NEW01 | wc -l
672
[~]$ cat testfile.csv | wc -l
1342
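In the above example one might expect 671 lines in each of NEW00 and NEW01, but the counts are 670 and 672. This is because -n 2 splits the file into 2 chunks of equal size in bytes, not equal line counts, so the cut falls wherever the midpoint byte happens to be, possibly in the middle of a record. With GNU split, -n l/2 keeps lines intact but still balances the chunks by bytes; to get exactly 671 lines in each file, use -l 671 instead.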
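One way to get two pieces with the same number of lines is to compute the per-piece line count and pass it to -l. The sketch below assumes a scratch directory, a generated stand-in for testfile.csv and GNU split (for -d and -n l/2); the prefixes HALF and CHUNK are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for testfile.csv: 1342 numbered records.
seq 1 1342 > testfile.csv

# Ceiling of total / 2, so an odd line count still gives two pieces.
total=$(wc -l < testfile.csv)
half=$(( (total + 1) / 2 ))

# Equal line counts: split by lines, not by bytes.
split -l "$half" -d testfile.csv HALF
a=$(wc -l < HALF00)
b=$(wc -l < HALF01)
echo "HALF00: $a lines, HALF01: $b lines"

# GNU only: 2 chunks balanced by size, but cut only at line boundaries.
split -n l/2 -d testfile.csv CHUNK
c0=$(wc -l < CHUNK00)
c1=$(wc -l < CHUNK01)
echo "CHUNK00: $c0 lines, CHUNK01: $c1 lines"
```

The HALF pieces hold 671 lines each for this input. The CHUNK pieces contain only whole lines, but their line counts depend on how long the individual lines are.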
***End***