
How to align columns of different lengths in a txt file based on each column's maximum length?

Stack Overflow user
Asked 2018-04-03 06:30:25
Answers: 1, Views: 161, Followers: 0, Votes: 3

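I have a txt file with 18 columns, where string values are delimited with '' and columns are separated by commas (,). Each line represents one row of an insert statement for a SQLite query: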

    (1999,1999,1999,1999,1999,0,0,'flaggr.png',261,     'Βάκχειος',             'Spl-up','B ',  'Pagrati/Athens,Attica,Greece',     'N/A',   'Hellenic Mythology',      '','', ''),
    (2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов',              'Act',    'G/D ',   'Bryansk,Russia',       '2008-2009(as Vampire''s Crypt),2010-present',   'N/A',     '','', ''),
    (2001,2001,2001,2001,2001,0,2002,'flagru.png',271,  'Аркона',               'Act','P/FO ',  'Moscow,Russia',        '2002(as Гиперборея),2002-present',  'Slavic Pism and FOtales, Legends, Mythology',     '', '', ''),
    (2002,2002,2002,2002,2002,0,1988,'flagru.png',470,      'Аспид',                'Spl-up','PROG ',   'Volgodonsk,Rostovregion,Russia',       '1988-1997,2010-?',  'Politics, Horror, Death',     '', '', ''),
    (2003,2003,2003,2003,2003,0,2000,'flagua.png',359,  'Ірій',             'Unknown','FO D /G ',   'Lviv,Ukraine',     '2000-?',    'Slavic mythology, Ukrainian FOlore',      '', '', ''),
    (2004,2004,2004,2004,2004,0,2011,'flagru.png',3036579,  'Лесьяр',               'Act','P FO ',  'Moscow,Russia',        '2011-present',  'Pism, FOlore, Social matters, Feelings',      '', '', ''),
    (2005,2005,2005,2005,2005,0,2003,'flagru.png',218,  'М8Л8ТХ',               'Act','B  with RAC',    'Tver,Ukraine(posterior),Russia',       '2003-present',  'National Pride, National Socialism, Hatred, War, Intolerance, Pism',      '', '', ''),
    (2006,2006,2006,2006,2006,0,0,'flagru.png',354037,      'Рельос',               'Act','PR/POST-/ (early), G/POST-, Ambient (later)',    'Baltiisk,Kaliningradregion,Russia',        'N/A',   'N/A',     '', '',''),
    (2007,2007,2007,2007,2007,0,2006,'flagru.png',32937,    'Сивый Яр',             'Act','P/POST-B ',  'Vyritsa,Leningradregion,Russia',       '2006-present',  'Pism, Pride, Heritage, Poetry, Slavonic Mythology',       '', '', ''),
    (2008,2008,2008,2008,2008,0,2001,'flagru.png',44,       'Темнозорь',                'Act','FO/B ',  'Moscow,Russia',        '2001-present',  'Nature, Slavonic Pism, War, Right-wing nationalism',      '4394', '', ''),
    (2009,2009,2009,2009,2009,0,1993,'flagru.png',80,       'Эпидемия',             'Act','Pow ',   'Moscow,Russia',        '1993-present',  'Fantasy, Tolkien, Elves',     '', '', ''),
    (2010,2010,2010,2010,2010,0,0,'flagjp.png',354039,      'こくまろみるく',              'Act','G/Pow ', 'N/A,Japan',        'N/A',   'Bizarre, Macabre',        '', '', ''),
    (2011,2011,2011,2011,2011,0,2012,'flagus.png',38723,    'מזמור',                'Act','B/Drone/D ', 'Portland,Oregon,United States',        '2012-present',  'N/A',     '', '', ''),
    (2012,2012,2012,2012,2012,0,2004,'flaglb.png',67,   'دمار',             'Spl-up','B/Death ',    'Hamra,Beirut,Lebanon',     '2004-2006',     'War, Pride, Blasphemy, Supremacy',        '', '', ''),
    (2013,2013,2013,2013,2013,0,2006,'flagcn.png',760,  '原罪',               'Act','B  (early), G/B  (later)',   'Chengdu,SichuanProvince,China',        '2006-present',  'Misanthropy, Hatred, Depression, War, Revelation',        '', '', ''),
    (2014,2014,2014,2014,2014,0,1995,'flagtw.png',443,      '閃靈',               'Act','Melodic B/Death/FO ',    'Taipei,Taiwan',        '1995-present',  'Taiwanese Myths and Legends, Anti-Fascism, History',      '4443', '', ''),
    (2015,2015,2015,2015,2015,0,2001,'flagjp.png',31450,    '電気式華憐音楽集団',                'Act','Pow/G',  'N/A,Japan',        '2001-present',  'Anime, Fantasy, Liberty',     '', '', '');

What is the best way to align all the columns so that, for example, the first two lines become:

(1999,1999,1999,1999,1999,0,0,   'flaggr.png',261,  'Βάκχειος',     'Spl-up',   'B ',   'Pagrati/Athens,Attica,Greece', 'N/A',                                          'Hellenic Mythology',   '','', ''),
(2000,2000,2000,2000,2000,0,2010,'flagru.png',3340, 'Анклав Снов',  'Act',      'G/D ', 'Bryansk,Russia',               '2008-2009(as Vampire''s Crypt),2010-present',  'N/A',                  '','', ''),

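My plan was:

  1. Split every line of the file on commas.
  2. Calculate the maximum length of each column and store it in memory.
  3. Loop over the file again, this time padding each column to the calculated maximum length and writing the output.

I came up with the code below, but then I noticed a problem: some columns contain commas inside the single quotes, like 'bla1,bla2,bla3' (columns 12 to 18 could have inner commas...), so if I split the string on commas I won't get 18 columns.

I don't know how to continue from there. How can I split on commas while taking the single-quoted strings into account?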

    private static void AdjustColumnsInFile(string filePath, string outputFile)
    {
        //array to store max size of each column
        int[] sizes = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        foreach (var line in File.ReadLines(filePath))
        {
            var words = line.Split(',');
            if (words.Length == 18)
            {
                var i = 0;
                //get max value of each column
                foreach (var word in words)
                {
                    sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
                    i++;
                }
            }
        }

        ...

        using (var sw = new StreamWriter(outputFile))
        {
            foreach (var l in newLines)
            {
                sw.WriteLine($"{l}");
            }
        }
    }

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-04-03 07:27:30

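As far as I understand, your only problem is how to split the string on commas, given that some commas may appear inside '' quotes. You can do that with a regular expression: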

,(?=(?:[^\']*\'[^\']*\')*[^\']*$)

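It matches a comma only when the rest of the line after it contains an even number of quote characters ('), zero included. In a well-formed line, a comma inside '' quotes is followed by an odd number of quotes, so it does not match. An escaped quote inside a string is written as two quotes (as in 'Vampire''s Crypt'), so it does not change the parity.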
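
To see the pattern at work, here is a minimal sketch that applies it to a single shortened row. The class name QuoteAwareSplitDemo, the helper SplitColumns, and the sample row are made up for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteAwareSplitDemo
{
    // Pattern from the answer: a comma followed by an even number of quotes
    // (possibly zero) up to the end of the input.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(@",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");

    // Hypothetical helper: splits one row (without the surrounding
    // parentheses) into its columns, ignoring commas inside quotes.
    public static string[] SplitColumns(string row)
    {
        return CommaOutsideQuotes.Split(row);
    }

    public static void Main()
    {
        // Shortened sample row, assumed for illustration only.
        string row = "2000,'flagru.png','Bryansk,Russia','2008-2009(as Vampire''s Crypt),2010-present'";
        foreach (string column in SplitColumns(row))
        {
            Console.WriteLine(column);
        }
    }
}
```

The sample row splits into four columns: the commas inside 'Bryansk,Russia' and inside the last string are left alone, because each is followed by an odd number of quotes.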

The rest should be straightforward. First calculate the sizes:

// requires: using System; using System.IO; using System.Text.RegularExpressions;
// array to store max size of each column
int[] sizes = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
foreach (var line in File.ReadLines(filePath)) {
    var tmp = line.Trim(); // remove leading and trailing whitespace
    tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
    tmp = tmp.Remove(0, 1); // remove opening (   
    // split by comma                 
    var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
    if (words.Length == 18) {
        for (int i = 0; i < words.Length; i++) {
            var word = words[i].Trim(); // remove whitespace
            sizes[i] = sizes[i] < word.Length ? word.Length : sizes[i];
        }
    }
    else throw new Exception("Invalid number of columns");
}

Then loop over the file again and append spaces to every column that is shorter than its maximum size:

// requires: using System.IO; using System.Linq; using System.Text.RegularExpressions;
using (var writer = new StreamWriter(outputFile)) {
    foreach (var line in File.ReadLines(filePath)) {                    
        var tmp = line.Trim(); // remove leading and trailing whitespace
        bool hadTrailingComma = tmp.EndsWith(",");
        tmp = tmp.Remove(tmp.Length - 2, 2); // remove closing ) and , or ;
        tmp = tmp.Remove(0, 1); // remove opening (                                                            
        var words = Regex.Split(tmp, @",(?=(?:[^\']*\'[^\']*\')*[^\']*$)");
        var newLine = String.Join(",", words.Select((w, i) =>
        {
            w = w.Trim();
            var targetSize = sizes[i];
            if (w.Length < targetSize)
                return w + new string(' ', targetSize - w.Length); // append spaces until max length
            return w;
        }));

        writer.WriteLine($"({newLine}){(hadTrailingComma ? "," : ";")}");
    }
}
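
As a possible simplification of the padding step, String.PadRight pads a string with spaces up to a total width and returns it unchanged when it is already that long, which is what the manual branch above does. The sketch below is only an illustration: PadColumnsDemo, AlignRow, and the widths are hypothetical names and values, not part of the original code.

```csharp
using System;
using System.Linq;

public static class PadColumnsDemo
{
    // Hypothetical helper: trims each column, pads it to the width recorded
    // for its position, and joins the columns with commas.
    public static string AlignRow(string[] columns, int[] widths)
    {
        return string.Join(",", columns.Select((c, i) => c.Trim().PadRight(widths[i])));
    }

    public static void Main()
    {
        // Assumed widths for three columns; in the real program these would
        // come from the sizes array computed in the first pass.
        int[] widths = { 4, 12, 8 };
        Console.WriteLine("(" + AlignRow(new[] { "1999", " 'flaggr.png'", "'Spl-up'" }, widths) + "),");
        Console.WriteLine("(" + AlignRow(new[] { "2000", "'flagru.png'", " 'Act'" }, widths) + "),");
    }
}
```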

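Note that because of Unicode characters (such as こくまろみるく), the output file may not look aligned in your editor even though it actually is (that is, every column contains the same number of characters).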
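
If visual alignment matters, one option is to pad by an estimated display width instead of by character count. The sketch below is a rough heuristic under stated assumptions: it treats only a few ranges (Hiragana, Katakana, CJK Unified Ideographs, fullwidth forms) as two columns wide, which is not the full Unicode East Asian Width data, and it does not handle right-to-left scripts, combining marks, or surrogate pairs. DisplayWidthDemo and its methods are hypothetical names.

```csharp
using System;

public static class DisplayWidthDemo
{
    // Assumption: characters in these ranges occupy two columns in a
    // monospaced font. This is an approximation, not a complete table.
    private static bool IsWide(char c)
    {
        return (c >= '\u3040' && c <= '\u30FF')   // Hiragana and Katakana
            || (c >= '\u4E00' && c <= '\u9FFF')   // CJK Unified Ideographs
            || (c >= '\uFF01' && c <= '\uFF60');  // fullwidth forms
    }

    // Estimated number of columns the text occupies on screen.
    public static int DisplayWidth(string text)
    {
        int width = 0;
        foreach (char c in text)
        {
            width += IsWide(c) ? 2 : 1;
        }
        return width;
    }

    // Appends spaces until the estimated display width reaches targetWidth.
    public static string PadToDisplayWidth(string text, int targetWidth)
    {
        int missing = targetWidth - DisplayWidth(text);
        return missing > 0 ? text + new string(' ', missing) : text;
    }

    public static void Main()
    {
        Console.WriteLine(PadToDisplayWidth("'こくまろみるく'", 20) + "|");
        Console.WriteLine(PadToDisplayWidth("'Аркона'", 20) + "|");
    }
}
```

With this approach the first pass would also need to record the maximum DisplayWidth per column rather than the maximum Length. Whether the result lines up still depends on the font and editor used to view the file.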

Votes: 2

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/49623603